Studio Chingie

Games, apps, tools, and the side quests between them.

Launch Bay

Platformer Slice Benchmark

AI benchmark

A browser platformer benchmark for comparing how different AI models build and revise a small playable slice.

AI benchmark | Browser prototype | Platformer slice | Local model test
Contact sheet comparing platformer-slice benchmark outputs from several AI models

Details

Status: Local model benchmark
Images: 3 images linked

Stack

HTML Canvas | Playwright | Local models | Benchmark reports

Source

platformer-slice-world1 benchmark reports
Private | Markdown / HTML | Updated 2026-06-28
Local benchmark reports and generated browser platformer slices.

Benchmark

Result table

Scores are out of 100. The Final column is the score from the last benchmark shot (Shot 3).

Model                                 Final   Shot 1  Shot 2  Shot 3  Wall time
ds4-100k-nothink                      79/100  72      72      79      638.2s
qwen27-mtp-fast                       75/100  73      73      75      347.7s
qwen122-q4xl-vision-64k-think         75/100  67      68      75      271.4s
step37-unsloth-iq4xs-text-mtp2-r2048  75/100  66      74      75      524.3s
qwen35-a3b-no-think                   74/100  47      67      74      115.6s
qwen122-q4xl-vision-64k               74/100  71      71      74      296.5s
nex-n2-mini-q8-vision-64k             74/100  64      63      74      222.7s
step37-unsloth-iq4xs-vision-r2048     71/100  62      68      71      522.6s
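
The table can also be read for how much each model improved across shots, not only where it finished. The sketch below is one way to do that, with the scores copied from the table by hand. The `gain` and `points_per_minute` helpers are illustrative names, not part of the benchmark's own reports, and the page does not say whether wall time covers one shot or the whole run, so the points-per-minute figure is only a rough guide.

```python
# Scores copied from the result table:
# (model, shot 1, shot 2, shot 3, wall time in seconds).
RESULTS = [
    ("ds4-100k-nothink", 72, 72, 79, 638.2),
    ("qwen27-mtp-fast", 73, 73, 75, 347.7),
    ("qwen122-q4xl-vision-64k-think", 67, 68, 75, 271.4),
    ("step37-unsloth-iq4xs-text-mtp2-r2048", 66, 74, 75, 524.3),
    ("qwen35-a3b-no-think", 47, 67, 74, 115.6),
    ("qwen122-q4xl-vision-64k", 71, 71, 74, 296.5),
    ("nex-n2-mini-q8-vision-64k", 64, 63, 74, 222.7),
    ("step37-unsloth-iq4xs-vision-r2048", 62, 68, 71, 522.6),
]


def gain(row):
    """Points gained between the first and the last shot."""
    _model, shot1, _shot2, shot3, _wall_s = row
    return shot3 - shot1


def points_per_minute(row):
    """Final score divided by wall time in minutes (illustrative metric only)."""
    _model, _shot1, _shot2, shot3, wall_s = row
    return shot3 / (wall_s / 60)


most_improved = max(RESULTS, key=gain)
fastest = min(RESULTS, key=lambda row: row[4])

# Print models ordered by improvement, largest first.
for row in sorted(RESULTS, key=gain, reverse=True):
    print(f"{row[0]:<38} +{gain(row):>2}  {points_per_minute(row):5.1f} pts/min")
```

Read this way, the top final score and the largest improvement belong to different models, which is the kind of difference a single Final column hides.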

Playable Builds

Load a benchmark run

Sandboxed browser builds from the final benchmark shots.

Screenshots

Images from the project

Some are current captures; some are concept images or source art.

Contact sheet comparing several AI-generated browser platformer benchmark outputs
Current build

Cropped benchmark contact sheet comparing final outputs from the local AI model runs.

DS4 model browser platformer benchmark capture after playtest input
Current build

Playtest capture from the top-scoring DS4 run after automated movement input.

Nex model browser platformer benchmark capture after playtest input
Current build

Playtest capture from the Nex run after automated movement input.
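
For readers curious how a capture "after automated movement input" might be produced, here is a minimal sketch using Playwright's Python API. It is not the benchmark's actual harness: the key names, hold times, viewport size, and the `capture_playtest` and `total_hold_ms` helpers are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical input script: (key, hold time in ms). The real benchmark's
# keys and timings are not published on this page.
MOVEMENT_SCRIPT = [
    ("ArrowRight", 800),
    ("Space", 150),
    ("ArrowRight", 600),
    ("ArrowLeft", 300),
]


def total_hold_ms(script):
    """Total time the script spends holding keys, in milliseconds."""
    return sum(hold_ms for _key, hold_ms in script)


def capture_playtest(build_html: Path, out_png: Path, script=MOVEMENT_SCRIPT) -> None:
    """Open a local build, replay the key script, and save a screenshot."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Viewport size is an arbitrary choice for this sketch.
        page = browser.new_page(viewport={"width": 960, "height": 540})
        page.goto(build_html.resolve().as_uri())
        for key, hold_ms in script:
            # Hold each key for its duration, then release it.
            page.keyboard.down(key)
            page.wait_for_timeout(hold_ms)
            page.keyboard.up(key)
        page.screenshot(path=str(out_png))
        browser.close()
```

Calling `capture_playtest(Path("build/index.html"), Path("capture.png"))` with a hypothetical local build path would replay the same inputs against any model's output, which keeps the captures comparable.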

Overview

This benchmark asks several models to build the same small browser platformer slice and revise it over three shots, then compares the final playable outputs.

Now

This page currently publishes the June benchmark results, selected playtest captures, and a cropped contact sheet of the final outputs.

Why I'm making it

Small playable tests make model differences easier to see than chat transcripts: movement, layout, errors, and polish all show up on screen.