Carter Temm

On AI Detection

When you read something, you probably make a subconscious assumption. You assume a person sat down, thought for a while, wrote stuff, deleted stuff, restructured the rest, and hit publish when it felt ready. You don't think about this assumption because it's been true since we began carving portraits on the walls of caves.

Unfortunately, it isn't true anymore, and the tools we have to deal with this new reality don't work.

The contract

Peter Steinberger, a name that wasn't on my radar until mid-January, burst onto the AI scene and almost single-handedly built Clawdbot/Moltbot/Openclaw. On a recent episode of the Lex Fridman Podcast (super thought-provoking episode btw) he mentioned that he's started blocking anything that "smells like AI," with absolutely zero tolerance.

The person who open-sourced a tool that can spew mountains of autonomously generated content across what were once human-only spaces... now blocks autonomous content on sight.

Anyone who has spent five minutes on the theatrical hellscapes that are Facebook or LinkedIn knows what he's reacting to: the hollow confidence, the bullet-pointed wisdom, the unoriginal ideas dressed up as novel insight and the most important thing you'll read that day.

Steinberger calls it a broken psychological contract, and I agree with this framing. When I realize I've been reading machine output, the feeling isn't "oh well, that was still useful." It's "I could have generated that myself." Regardless of how good or useful the content was to me, I walk away feeling played.

I read articles and blogs because I want to hear from people. When I just want an answer, I have AI subscriptions for that. I resent the environment we're creating when one pretends to be the other.

So the obvious response: build tools that tell the difference. Around the end of 2022, GPTZero, Grammarly's AI detector, and dozens of others flooded the market.

This would be great... if they worked.

The detectors

These tools are pattern matchers, a lot like the LLMs that created the need for them.

They're built by ingesting text in two buckets: AI generated and human generated. From there they look for statistical signatures like word frequency, sentence structure, perplexity (roughly, how unpredictable the text is), and structural variation. When writing is too smooth and predictable, it gets flagged.
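To make that concrete, here's a toy sketch of one such signature: "burstiness," the variation in sentence length. Real detectors use model-based perplexity from a language model, not this; the function name, sample texts, and scoring are entirely my own illustration, using only the standard library.

```python
import statistics

def burstiness_score(text: str) -> float:
    """Variation in sentence length relative to the mean, a crude
    proxy for the 'structural variation' signal detectors look at.
    Low score = suspiciously uniform = more likely to get flagged."""
    # naive sentence split on terminal punctuation
    sentences = [s.strip() for s in
                 text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

# Uniform rhythm: every sentence the same length
uniform = "The cat sat here. The dog sat there. The bird sat up. The fish swam by."
# Human rhythm: a fragment, a long ramble, a one-word sentence
varied = "Stop. When I finally got around to reading the manual, everything changed. Really."

assert burstiness_score(uniform) < burstiness_score(varied)
```

The point of the sketch is how shallow the signal is: it measures the shape of the text, not whether anyone was thinking when they wrote it.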

GPTZero claims:

GPTZero has an accuracy rate of 99% when detecting AI-generated text versus human writing, meaning we correctly classify AI writing 99 out of 100 times. When testing samples where there's a mix of AI and human writing in one submission, we have a 96.5% accuracy rate.

I tested this. At first I was impressed. Then I fed in a rant I wrote years ago, long before the Generative Pre-trained Transformer was a concept. It scored positive for AI. Then I ran some actual AI output through the same detector: "Likely human."

Plenty of humans write predictably. AI is getting better at not writing like AI. These trends only go one direction.

Meanwhile, anyone who wants to beat detection can do it in about thirty minutes, less if they really know what they're doing.

How to beat an AI detector

Wikipedia has been dealing with a flood of AI-generated edits. Most are "AI slop", which is to say poorly sourced, factually wrong, generated without thought and shoved into the world because creating it costs nothing.

In response, the community at Wikipedia put together a comprehensive guide to spot signs of AI writing as part of their AI cleanup effort. It catalogs a bunch of tells across vocabulary (apparently AI overuses words like "delve," "tapestry," "pivotal moment"), structural patterns (rule-of-three lists, em dash abuse), and tone (promotional filler, sycophantic openers). It's definitely worth reading if you want to understand why AI text feels off to us.
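Checks like the ones in that guide are mechanical enough to script. Here's a minimal sketch of a tell-counter; the word list and the choice of tells are a small illustrative sample, not Wikipedia's actual ruleset.

```python
import re

# A handful of the vocabulary tells from the guide; illustrative only.
TELL_WORDS = {"delve", "tapestry", "pivotal", "showcase", "testament"}

def count_tells(text: str) -> dict:
    """Count occurrences of known AI tells in a piece of text."""
    words = re.findall(r"[a-z]+", text.lower())
    # sorted() keeps the output order deterministic
    hits = {w: words.count(w) for w in sorted(TELL_WORDS) if w in words}
    em_dashes = text.count("\u2014")  # em dash abuse is its own tell
    if em_dashes:
        hits["em_dash"] = em_dashes
    return hits

sample = "Let's delve into the rich tapestry of this pivotal moment."
print(count_tells(sample))  # {'delve': 1, 'pivotal': 1, 'tapestry': 1}
```

Which is exactly the problem: anything this easy to check is equally easy to avoid.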

Of course, the guide is public because it's on Wikipedia, meaning it's in the training data too.

I took that list, handed it to a model that isn't GPT, and told it to avoid every pattern on the page and package the result into a reusable skill. The output didn't trip a single tell.

Apparently I'm not the only one to have this idea. Within the same hour I found someone who'd already packaged this as a shareable tool.

If nothing else, there are GPTZero MCP servers now, so you can have Claude Code/Cowork (anything but ChatGPT) loop its way to perceived humanity. Just generate some text, check the score, revise, repeat until it passes.
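The loop itself is trivial. Here's a sketch; `detector_score` and `revise` are stand-ins for whatever detector API and model call you'd actually wire in, and the stubs below are purely illustrative.

```python
def humanize_until_passing(draft, detector_score, revise,
                           threshold=0.5, max_rounds=5):
    """Score the draft, revise it if flagged, repeat until it
    passes or we give up. Both callables are hypothetical hooks."""
    text = draft
    for _ in range(max_rounds):
        if detector_score(text) < threshold:
            return text  # passes as "human"
        text = revise(text)
    return text

# Stub detector: flags any text containing "delve".
# Stub reviser: swaps the phrase out. Illustrative only.
score = lambda t: 0.9 if "delve" in t else 0.1
revise = lambda t: t.replace("delve into", "dig into")

print(humanize_until_passing("Let's delve into the details.", score, revise))
# Let's dig into the details.
```

Swap the stubs for a real detector endpoint and a real model and you have the whole "humanizer" category in a dozen lines.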

I also tested a few of the free AI humanizer tools that have popped up. Half of them didn't even work. The others passed the detection check, but only by flattening everything, stripping out the voice until what came out had less color overall. That color is what defines human writing. It's not always clean or presentable. "What if I just, like, really like using the word like, man?"

This is a defeating cycle through and through. AI generates text with obvious patterns. People document those patterns. The documentation becomes training data. AIs learn to avoid the patterns. Detectors catch less. New patterns get identified. With each iteration, the gap narrows.

I think we have about a year before detection becomes impossible. Probably less.

The merge

The detectors don't assume a clear binary, but the humans making decisions based on their output certainly do. You either used AI, or you did not. If your text is flagged, it must have been generated at least in part by a computer. That binary is dissolving from both directions.

The AI side is obvious. Models keep getting better at producing human-sounding text by default. Implement a few of the tricks above and you're done.

The human side is weirder. We're reading AI-generated text all the time now (in emails, documentation, social media posts, and articles that may or may not have a person behind them). My totally unsubstantiated theory is that since we're prone to absorbing the patterns of what we read, this exposure is steadily turning AI-isms into human usage too. Linguists call this process convergence.

Maybe my GPTZero experiment wasn't showing that the detector is bad at its job, but that the categories don't hold up anymore.

I use AI tools constantly. A lot of what I write starts with me throwing back-of-a-napkin thoughts at Claude, and we go back and forth as sparring partners until I know what I want to say. Other stuff I write from scratch. Even the "from scratch" stuff is shaped by years of interacting with AI output, so the line between these processes is blurrier than one might think.

There's an asymmetry here. People freely admit to using AI to write code. Nobody cares unless the code sucks. Admitting you used AI while writing prose, on the other hand, feels like confessing to cheating. I think it's because we treat writing as proof of thought and code as proof of function. When AI touches the prose, it feels like fraud in a way that AI-assisted code doesn't.

Perhaps the fraud isn't in the tool as much as in the absence of thought. Others have said this better than I can.

David McCullough (American historian and two-time Pulitzer winner) put it well: "Writing is thinking. To write well is to think clearly. That's why it's so hard." Leslie Lamport, a pioneer of distributed computing and the initial developer of LaTeX, said something similar: "If you're thinking without writing, you only think you're thinking."

The tool doesn't change whether you're thinking. I only care whether you cared enough to work through what you were saying and make it yours.

Steinberger's approach (explicitly labeling AI agents so people can choose what to engage with) is closer to a real answer. It's not a solution, but it's significantly better than any detector. Just tell people what's AI and let them decide.

We're nowhere near that being standard. For now we're stuck with detectors that don't work, generators that keep improving, and people getting falsely accused of being machines when they turn in their homework.

That last one is worth sitting with. Tools like Turnitin are now baked into the academic pipeline at most universities. When a student's work gets flagged as AI-generated, they often have to retake the course and pay for it again.

To be clear, AI-assisted cheating is a real problem that institutions need to solve, but not this way. The present situation is that the accused student is asked to prove a negative (that they didn't use AI) and the burden falls entirely on them. The adjudicator is the same institution that collects tuition if the student has to retake the course. I'm not saying there's extensive precedent of institutions exploiting this, but the financial incentive exists, and it's nearly impossible for a student to challenge.

My underlying point is that the tools that were supposed to protect human writing have rapidly become a way to punish human writers who happen to conform to a certain style. That's not a detection problem; detection is a ship that sails farther away with every passing day. It's a problem with the way we're planning for a future that we don't yet understand.

Further Reading

Semantic ablation: Why AI Writing is Boring and Dangerous