Hoshino Lina (星乃リナ) 🩵 3D Yuri Wedding 2026!!! · 3w
So it's not surprising that an LLM can solve them, because it automates the process. That just takes all the fun and all the learning out of it, completely defeating the purpose. I'm sure you could s...
This is, quite frankly, the same problem LLM agents are causing in software engineering and such, just way worse. Because with CTFs, there is no "quality metric". Once you get the flag you get the flag. It doesn't matter if your approach was ridiculous or you completely misunderstood the problem or "winged it" in the worst way possible or the solver is a spaghetti ball of technical debt. It doesn't matter if Claude made a dozen reasoning errors in its chain that no human would (which it did). Every time it gets it wrong it just tries again, and it can try again orders of magnitude faster than a human, so it doesn't matter.

I don't have a solution for this. You can't ban LLMs; people will use them regardless. You could try interviewing teams one-on-one after the challenge to see if they actually have a coherent story and clearly did the work, but even then you could conceivably cheat using an LLM, wait a bit to make the time spent plausible, study the reasoning chain, and convince someone that you did the work. It's like LLMs in academia, but much worse due to the time constraints and explicitly competitive nature of CTFs.

LLMs broke CTFs.
Hoshino Lina (星乃リナ) 🩵 3D Yuri Wedding 2026!!! · 3w
And honestly, reading the Claude output, it's just ridiculous. It clearly has no idea what it's doing and it's just pattern-matching. Once it found the flag it spent 7 pages of reasoning and four more scripts trying to verify it, and failed to actually find what went wrong. It just concluded after a...
Nathan :ver: :aro: :pride: · 3w
nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpq6tx08mwy9vkkjen5s8ahy9e3x5z4dmykefvs7u6wex0s02puuskqv750mt How does this statement differ from "Deep Blue broke chess"? Cheat engines are similarly impossible to deterministically detect in online competition, yet the game is more popular than e...
Ivan Molodetskikh · 3w
nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpq6tx08mwy9vkkjen5s8ahy9e3x5z4dmykefvs7u6wex0s02puuskqv750mt perhaps having separate categories for LLMs allowed vs. banned would help with 90% of this problem? So ppl who want to use LLM can do so at their pleasure, and only ppl who actively want...
Григорий Клюшников · 3w
Asahi Linya (朝日りにゃ〜), I really hope that LLMs are a temporary phenomenon. Sure, the local ones will remain even after the bubble finally bursts, but they're ridiculously bad; you need millions of dollars' worth of GPUs to get to that "it's still bad but it looks plausible" level of outp...