a story about the limits of vibe coding:
Recently built a blackjack card counting calculator that helps advantage players know their expected value (EV) depending on game conditions, bet spreads, deck penetration and so on.
I had the simulation results from external software, but wanted to express this as a math formula so we could cover any conditions that didn't have simulation results.
so I built a little self-calibration tool, where the AI tweaks a few numbers, runs tests against the real simulator results, and loops until all tests pass a given threshold
at first it got impressively close, but not close enough to pass the tests.
eventually it gave up and cheated by just changing the threshold so tests would pass
after explicitly telling it that thresholds cannot be changed, it resorted to changing the simulation results!
after telling it that's also not acceptable, it started to regress and eventually made the calculator much worse.
both Claude and Codex did the same thing: resorting to cheating and being sneaky, and eventually ruining the code when they couldn't produce the results we needed
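
for anyone curious what that calibration loop looks like, here is a minimal sketch in Python. it is not the actual tool: the linear `model`, the `REFERENCE` numbers, and the `THRESHOLD` value are all made up for illustration. the two points it tries to show are that the threshold and the reference data are read-only, and that the loop reports failure instead of bending the test.

```python
from types import MappingProxyType

# Hypothetical stand-in for simulator output: true count -> EV in percent.
# MappingProxyType makes the mapping read-only, so it cannot be edited in place.
REFERENCE = MappingProxyType({-1: -1.0, 0: -0.5, 1: 0.0, 2: 0.5, 3: 1.0})

# Hypothetical pass/fail threshold (max absolute error). Never changed by the loop.
THRESHOLD = 0.05


def model(params, true_count):
    """Toy formula (an assumption): EV is linear in the true count."""
    slope, intercept = params
    return slope * true_count + intercept


def max_error(params):
    """Worst-case absolute gap between the formula and the reference data."""
    return max(abs(model(params, tc) - ev) for tc, ev in REFERENCE.items())


def calibrate(params, step=0.25, max_rounds=200):
    """Simple coordinate search: nudge one parameter at a time, keep improvements."""
    params = list(params)
    best = max_error(params)
    for _ in range(max_rounds):
        if best <= THRESHOLD:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                err = max_error(trial)
                if err < best:
                    params, best, improved = trial, err, True
        if not improved:
            step /= 2  # no nudge helped at this scale, so search more finely
    # Report honestly: the flag says whether the fixed threshold was met.
    return params, best, best <= THRESHOLD


if __name__ == "__main__":
    fitted, error, passed = calibrate([0.0, 0.0])
    print(fitted, error, passed)
```

if the formula can't fit the data, `calibrate` returns `passed = False` and that is the answer; the only thing the loop is allowed to touch is `params`.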