Bubble Pop

What started as an inventory control system for a client became a home security project, which somehow turned into an AR game my kids play. Here's the whole story.

๐Ÿซง Wave your hands to pop bubbles
โšก Clap to unleash a blast
๐Ÿ”ฅ Dodge lasers with your whole body
๐Ÿ’ƒ Dance party mode with your kids
29
Git Commits
17
Versions
1
Night
~3,400
Lines of JS

// The Story

It started with a client project โ€” an inventory control system that needed computer vision to track items. That got me deep into YOLO object detection and real-time video processing. Once I saw what was possible, I thought: why not build a security system for my own house?

So I built Sentry โ€” a fully local computer vision pipeline that captures video, runs YOLO object detection, tracks people with pose estimation, and logs events to a web dashboard. All on a Mac mini M4, nothing leaves the machine. Real-time person detection, skeleton tracking, the whole deal.

Then one night, staring at the skeleton tracking output, I had a completely different idea.

Then one night at 11:15 PM, I had a thought:

"Let's make a pose detection game that would be fun for a few little kids to play and interact with what's on the screen."

Three minutes later, a coding agent had built v1 from the Sentry baseline. It was terrible. It used the wrong camera, froze after a few seconds, and the hands were mirrored โ€” move left, your on-screen hand goes right. Classic.

But it proved the concept. Seeing yourself on screen popping bubbles with your actual hands was immediately fun. Even broken, you could feel it.

So I kept going.

By 2 AM, we'd built 17 versions. Full body tracking with a neon skeleton overlay. Laser beams you dodge by ducking and jumping. A clap-to-blast mechanic where you charge up energy between your palms and fire it across the screen. Spring physics that make everything feel smooth and alive. Gesture-controlled menus โ€” no mouse needed. And a progressive soundtrack built entirely from Web Audio oscillators.

But the real pivot came when I tested it with my family.

The original plan was arcade-style: punishing escape penalties, bomb bubbles, difficulty that pushes you to fail. Classic arcade design. Then I watched my daughter play and realized I had it all wrong.

"I don't want to punish the user for bubbles escaping. This is supposed to be fun for kids too. I want them to just be able to play for a long timeโ€ฆ shared rewards so if something is coming down and two players are running to collect it, both people should get it. This is a fun community game where we are all having fun together."

That killed the arcade version and birthed what Bubble Pop actually is: a family celebration engine that happens to use your body. Think of it like a digital piรฑata โ€” everyone swings, everyone cheers, candy falls for everyone. Nobody loses.

"The game should make you feel good and be happy to play."

Before going to sleep at 2 AM, I told my AI assistant one last thing:

"When I wake up in the morning, I can't wait to play it with my daughter to show her all of the amazing stuff we built!"

That's the soul of this project. It's not about the tech. It's about seeing two glowing skeletons on screen โ€” a parent and a kid โ€” popping bubbles together.

// The Build Log

Version by version โ€” the interesting pivots from one overnight session.

v1 ~11:16 PM

๐Ÿซง "The First Pop"

Built from the Sentry baseline in 3 minutes. Camera feed as background, colorful bubbles floating across the screen, wave your hands to pop them. Four bubble types, combo multipliers, Web Audio synth pops.

What went wrong: Wrong camera. Froze after a few seconds. Hands mirrored. No error handling โ€” any hiccup killed the whole pipeline.

The lesson: The first version always breaks. But it proved the concept โ€” seeing yourself on screen popping bubbles was immediately fun.

v2 ~12:08 AM

๐Ÿ’€โšก "Neon Skeleton"

Five rounds of iteration from live playtesting. This is where the game got its identity.

  • Full 17-keypoint body skeleton drawn on screen with color-coded neon bones โ€” cyan head, pink arms, purple torso, green legs
  • Laser dodge obstacles โ€” horizontal and vertical beams sweep across the screen. Duck, jump, sidestep.
  • Clap-to-blast โ€” bring hands together, charge for 0.8 seconds, fire a glowing projectile that plows through everything
  • 3 lives with screen shake on hit
  • Camera picker, mirror fix, hardened error handling, auto-reconnecting SSE

Architecture locked in: FastAPI + YOLO pose on M4 GPU pushes data via SSE โ†’ vanilla JS canvas at 60fps interpolates between updates.

v3 ~12:35 AM

๐ŸŽฎ๐Ÿ”ฅ "It Feels REAL"

1,567 lines of handcrafted game engine. This is where it stopped feeling like a tech demo and started feeling like an actual game.

The smoothness breakthrough: Spring physics. Every skeleton keypoint lives inside a spring-damper system โ€” pulled toward the real tracked position, but with momentum and damping. Camera sends data at 20fps, player sees 60fps butter. The actual AI model didn't change โ€” we just made the presentation layer respect the limits of the data.

  • Scale awareness โ€” measures shoulder distance, scales everything proportionally. A 4-year-old at 3 feet and an adult at 8 feet both get a playable experience.
  • Feet are live โ€” ankles as poppers, stomp gesture creates expanding shockwave
  • Four powerups โ€” Shield, Fire Hands, Slow-Mo, Magnet
  • Gesture menus โ€” hover hand over button for 1.5s to activate, fully hands-free
  • Progressive audio engine โ€” adaptive kick, hi-hat, bass, tempo changes
  • Game feel juice โ€” hit stop, screen flash, score ticker, streak fire
v3.2โ€“3.5 12:51 AM โ€“ 1:06 AM

๐Ÿ”ง "Making It Feel Right"

Four rapid-fire fixes from live playtesting in 15 minutes:

  • The jiggle fix โ€” replaced spring physics with EMA (exponential moving average). Dead simple: pos += (target - pos) * alpha. No overshoot, no jiggle. Just butter.
  • Benchmarked YOLO at different sizes โ€” settled on imgsz=384 for 96 FPS on M4
  • Stuck keypoints โ€” points that never got real data were drawing at (0,0). Added validity tracking.
  • Weapon lock system โ€” clasp hands once, rapid fire with aim, body-scale-relative thresholds
  • MPS GPU โ€” enabled Metal acceleration for pose estimation. ~15% faster.
THE PIVOT ~1:10 AM

๐ŸŽฏ "This Is a Family Game"

The moment that changed everything. Tested with the family, had a revelation.

Arcade Thinking
Punish missed bubbles
One player competes
"GAME OVER"
Bombs cost hearts
Family Thinking
Bubbles just float away
Everyone plays together
"GREAT GAME! ๐ŸŽ‰"
Only positive feedback
"The game should make you feel good and be happy to play. It shouldn't be overloading."

Co-op gesture designed: two players hold all four hands together โ†’ 1 second charge โ†’ screen-clearing rainbow shockwave. Nearly impossible to trigger accidentally, incredibly satisfying when intentional.

v4 ~1:25 AM

๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘ง "Family Fun Update"

Multiplayer, waves, shared rewards. Multiple skeletons tracked simultaneously, wave-based progression, wave names that feel like chapters: "๐Ÿซง Let's Pop!", "โœจ Golden Bubbles!", "๐ŸŒง๏ธ Bubble Rain!", "๐Ÿ’ƒ Dance Party!"

Then I went to sleep and my AI assistant kept building through the night โ€” weapon evolution, sound redesign around wind chimes and music boxes, progressive tutorials, brand identity alignment.

v4.2โ€“4.6 Next Morning

โšก๐ŸŒ‹ "Polish & Floor Is Lava"

Morning playtesting and a string of rapid improvements:

  • One weapon, one team โ€” stripped the confusing 3-weapon system, built a single satisfying beam with physical reload (pull hands apart to reload like pump-action)
  • The shadowBlur massacre โ€” found 55 Gaussian blur calls per frame killing performance. Eliminated all of them. Simple shapes with alpha look 90% as good at 10% of the GPU cost.
  • Floor Is Lava โ€” backend silently scans for furniture with YOLO every 5 seconds. Player sees nothing. Then randomly: "๐ŸŒ‹ THE FLOOR IS LAVA!" โ€” furniture zones glow green as safe spots, 12 seconds to get your feet on something.
"I don't want the furniture or anything to be known until it's about to be floor is lava time. That's when the glowing or whatever can start."
v4.7โ€“4.8 Latest

๐ŸŽ๏ธ "Lean & Mean"

Final performance pass. Furniture detector off by default (enable with --lava). MJPEG background dropped to 480ร—270 at quality 35. Inference optimized to imgsz=256 โ€” 2.3x faster. Combined: ~40% less CPU/GPU load.

Two clean run modes: --game for fast and smooth, --game --lava when you want the surprise.

// The Tech

Everything runs locally on a Mac mini M4. No cloud, no subscriptions, no accounts. Just a computer and a webcam.

๐Ÿง 

YOLOv11n-pose

17-keypoint body tracking on Apple Silicon GPU (MPS). ~96 FPS inference at 256px. Real-time skeleton of everyone in frame.

๐Ÿ“ก

SSE Streaming

Server pushes pose data at 30fps via Server-Sent Events. Browser interpolates with EMA smoothing for 60fps rendered movement.

๐ŸŽต

Web Audio Synth

All sounds generated from oscillators. C major pentatonic scale, triangle waves, no audio files. Mathematically incapable of sounding bad.

โšก

Zero Dependencies

The entire game is one HTML file โ€” ~3,400 lines of vanilla JS. No React, no frameworks, no build step. Canvas 2D + requestAnimationFrame.

๐Ÿ

FastAPI Backend

Python server handles camera capture, YOLO inference, and SSE streaming. Async everywhere, lightweight, stable.

๐ŸŽ

Apple Silicon

Mac mini M4, 24GB RAM. Metal Performance Shaders for GPU inference. Everything stays on-device โ€” nothing leaves the machine.

// What's Next

This is a living project. Ideas on the table:

// About

Built by Bryan Brodsky โ€” exploring what's possible when you point AI at real problems and refuse to stop iterating. Bubble Pop started as an inventory system for a client, became a home security project, and turned into a family-tested AR experience by morning. That's the fun of building things.

The whole codebase is 29 commits of "what if we tried this?" โ€” and this page will keep growing as the project does.