Vibe Coding Games: What Ships and What Breaks
Vibe coding is real for game prototyping. Developers are building playable games in hours instead of weeks. Pieter Levels earns $50K+/mo from an AI-built flight sim. Nicolas Zullo got 45,000 players with zero human-written code.
But code quality collapses fast. A Carnegie Mellon study of 807 repos found AI-assisted code had 41% higher complexity. Velocity gains disappeared within three months while the technical debt stayed.
Platform choice decides whether you ship. Browser games (Three.js) are easiest to vibe code but hardest to turn into real products. Game engines like Godot offer a middle path: AI-friendly scripting with actual export pipelines.
Pieter Levels built an in-browser MMO flight simulator and earns over $50,000 per month from it. Bartek Bogacki built a retro browser game in one hour using Cursor, but the result was a single 1,400-line JavaScript file where the AI agent “reached a dead end and couldn’t recover.”
These two stories capture the state of vibe coding for games in 2026. The technique works, sometimes spectacularly. It also fails in predictable ways once projects cross a complexity threshold.
What vibe coding means (and doesn’t)
Andrej Karpathy coined the term in February 2025: “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” Collins Dictionary named it Word of the Year by November. Google Keyword Planner shows 110,000 monthly searches with over 1,200% year-over-year growth.
For game development specifically, vibe coding means describing what you want your game to do in natural language and letting an AI tool (Cursor, Claude Code, Copilot) write the implementation. You iterate by playtesting and feeding observations back to the model.
This is different from using AI as an autocomplete. Vibe coding means the human sets direction and evaluates output but does not write or deeply review the code. That distinction matters for understanding where it breaks.
The prototyping speed is real
The numbers from real developers are consistent: AI compresses game prototyping from days into hours.
Nicolas Zullo built a multiplayer WW2 dogfight arena with zero human-written code. Twenty hours, 500 prompts, twenty euros of API costs. The game attracted 45,000 players and 1.5 million views on X. He used Cursor with Claude for code and Grok 3 for planning.
Ágoston Torok vibe-coded a CEO Simulator that drew 10,000 players in one weekend using Windsurf and DeepSeek R1. He even built a reinforcement learning loop with Claude 3.5 to automatically playtest and balance the game.
Bill Prin, building a deckbuilder for the Pieter Levels Vibe Coding Game Jam (over 1,000 entries), estimated it would have taken “at least a week to get the game feeling right, but it took the AI about 90 minutes.”
For prototyping, vibe coding delivers. But these success stories share a pattern: browser-based games running on web stacks that LLMs know extremely well. None shipped through Steam or had save systems, multiplayer infrastructure, or platform-specific builds.
The complexity wall is measured, not anecdotal
Carnegie Mellon researchers studied 807 GitHub projects that adopted Cursor versus 1,380 control repos. The velocity spike was dramatic: 281% more lines of code in the first month. By month two it dropped to 48%. By month three it was gone. But code complexity increased 41% and static analysis warnings rose 30%, and those numbers did not come back down. The technical debt accumulated during the fast phase slowed everything afterward.
This matches what GitClear found analyzing 211 million changed lines across repos from Google, Microsoft, and Meta: copy-pasted code surpassed refactored code for the first time in 2024. Code churn (new code rewritten within two weeks) nearly doubled from 2020 to 2024.
The METR randomized trial added a psychological dimension. Sixteen experienced open-source developers (repos averaging 22,000+ stars) completed 246 tasks with and without AI. With Cursor Pro and Claude 3.5 Sonnet, they were 19% slower. But they perceived themselves as 24% faster. Even after seeing the data, they still believed AI had helped.
For game development, the complexity wall hits especially hard. Games have tightly coupled systems: physics, rendering, input, audio, state management, saving, and UI all interact. When AI generates code that works in isolation but couples poorly, debugging requires understanding the full system. Faros AI’s analysis of 10,000+ developers found that higher AI adoption correlated with 9% more bugs and 91% longer code reviews.
Game engine developers are hitting this wall
Lucca Sanwald tried to build a pixel-art RTS in Godot with Claude writing 100% of the code. After eight hours he had a working prototype. Then he scrapped the project. “I’ve decided to scrap this experiment because the loss of control that I experienced was not sensible to me.” He had lost touch with his own codebase and could not fix issues independently.
Olga Biro vibe-coded a match-3 game in Godot and the AI’s first attempt hallucinated an entirely wrong game: “space invaders with cats shooting lasers at baked goods” instead of a candy-matching game. She burned through 130 of 150 monthly AI credits in one session and concluded that “every time I asked for a minor tweak, the result was overcomplicated.”
Jack Le Hamster, who vibe-coded a physics game for GameDev.js Jam 2025, identified the core issue: “Why can’t it test its own code, how can it build something without ever trying to see if it actually works?” He called vibe coders “testers for a dev who never tests.”
The prototype compiles and runs, but the developer cannot maintain or debug it. For a game jam, that is fine. For a game you want to sell on Steam, the approach breaks down.
Platform choice decides what you can ship
There is a reason every viral vibe-coded game is a browser game. JavaScript and Three.js dominate LLM training corpora, and the deployment path is minimal: the AI generates a file, you open it in a browser, and it runs.
But browser games have real limits. No Steam integration, no console export, limited performance for complex 3D, and their own set of platform quirks (WebGL differences between browsers, mobile audio policies). If you want to ship a game people pay for, you need an engine.
Among game engines, Godot has a structural advantage for AI-assisted development. GDScript is a purpose-built, Python-like language with a small surface area. The entire engine ships as a 120MB binary. Scenes are stored as readable .tscn text files that AI models can parse and modify directly. The MIT license means no revenue thresholds or per-seat fees complicating an indie developer’s path to market.
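To make “readable text files” concrete, here is a minimal, hand-written sketch of a Godot 4 scene, not taken from any of the projects above; the node names, file paths, and values are illustrative only.

```
[gd_scene load_steps=2 format=3]

[ext_resource type="Script" path="res://player.gd" id="1_player"]

[node name="Player" type="CharacterBody2D"]
script = ExtResource("1_player")

[node name="Sprite" type="Sprite2D" parent="."]
```

And the script it references, player.gd:

```gdscript
extends CharacterBody2D

const SPEED := 300.0  # pixels per second, an arbitrary illustrative value

func _physics_process(_delta: float) -> void:
	# Move left/right using Godot's built-in input actions.
	velocity.x = Input.get_axis("ui_left", "ui_right") * SPEED
	move_and_slide()
```

Both files are plain text with a small, regular grammar, which is exactly the kind of artifact an AI tool can diff, regenerate, and review without a proprietary binary format in the way.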
Unity and Unreal have deeper ecosystems and more shipped AAA titles. But their codebases are orders of magnitude larger, their build systems more complex, and C++/C# require more precision from AI models. The 2024 GMTK Game Jam saw Godot’s share surge from 19% to 37%, nearly matching Unity, suggesting the engine’s simplicity attracts the same developers who gravitate toward AI-assisted workflows.
How to use vibe coding without getting stuck
The data points toward a hybrid approach. Use AI for the fast parts, stay hands-on for the structural ones.
Prototype with AI, architect yourself. Let the AI generate gameplay code, dialog trees, and UI layouts. Define the scene structure, file organization, and system boundaries yourself. The complexity wall hits when the AI decides how systems connect, not when it implements a single system.
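One lightweight way to hold that boundary in Godot, sketched here with hypothetical names rather than any project’s real code: the human defines a global signal bus as an autoload, and AI-generated systems are only allowed to talk to each other through it.

```gdscript
# events.gd — registered as an autoload named "Events" (a human-defined boundary).
# AI-generated systems communicate through these signals rather than
# reaching into each other's nodes directly.
extends Node

signal player_damaged(amount: int)
signal enemy_died(points: int)
```

```gdscript
# health_ui.gd — an AI-implemented system that stays inside the boundary:
# it reacts to the bus instead of querying the player node.
extends Control

@onready var label: Label = $Label

func _ready() -> void:
	Events.player_damaged.connect(_on_player_damaged)

func _on_player_damaged(amount: int) -> void:
	label.text = "Took %d damage" % amount
```

When the AI rewrites one system, the damage is contained to that script; the connections it is allowed to touch are already spelled out.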
Read what it writes. The Qodo survey found the most dangerous pattern: junior developers reported the lowest quality improvements from AI (51.9%) yet the highest confidence in shipping unreviewed code (60.2%). Senior developers, who got the most quality benefit (68.2%), were least likely to skip review. As we wrote about in our piece on AI and expertise, AI amplifies what you already know. It does not teach you what you are missing.
Pick the right scope. Game jam entries and prototypes are ideal vibe coding territory. Commercial releases with save systems, multiplayer, modding support, and platform exports need human architectural decisions that current AI cannot reliably make.
Tools that integrate with game engines rather than replacing them help bridge this gap. Ziva, for example, works inside the Godot editor and understands scene trees, node types, and GDScript APIs, which means its suggestions stay within the engine’s idioms instead of hallucinating frameworks that do not exist.
The Stack Overflow 2025 survey found that 77% of developers say vibe coding is not part of their professional work. The GDC 2026 survey reports only 7% of game developers view AI positively, down from 13% the year before. The hype phase is over. What remains is a tool that works well within a defined scope and fails badly outside it.