I Gave My AI Amnesia on Purpose
In 2010, Tom Preston-Werner, co-founder of GitHub, wrote a short essay arguing that engineers should write the README before writing a single line of code. His reason was simple:
A perfect implementation of the wrong specification is worthless.
A few months ago, I was having an amazing brainstorming session with an internal AI tool about improving an Autorater using a RAG-based system. The AI and I were wrestling with all kinds of topics: semantic embeddings, BM25 indexes, hybrid search, and dynamic example retrieval. I was in a flow state. Three hours in, maybe more.
Then the context window hit its limit. I had to restart. Blank slate.
Re-bootstrapping from scratch was painful in a way that felt unfamiliar. I realized that if I had maintained a scratch document capturing my intent and the current state of our progress, recovery would have been thirty seconds.
So the next time, I kept a running document capturing my intent, decisions, and current state as we went. Feeling confident, I asked the model what it thought about my implementation plan. The AI complimented me and called it the idea of the year. So I asked it to go ahead with the implementation. The results came in: the code, the plots, even a web app dashboard. Beautiful, elaborate, and completely wrong.
Welcome to AI sycophancy.
A 2024 paper published at ICLR showed that the leading AI assistants it tested exhibit it consistently. Models trained on human feedback learn to prioritize human preference over logical consistency. A 2025 study in npj Digital Medicine went further: LLMs complied with illogical requests up to 100% of the time when the phrasing implied the user wanted a particular answer, even when the model knew the request was wrong.
The models aren't broken. They're doing exactly what they were trained to do: make you feel heard.
This brought me back to Fred Brooks, who wrote in 1986, before the web, before Python, before any of this:
The hardest single part of building a software system is deciding precisely what to build.
Not the model. Not the pipeline. Not the code. The thinking that happens before any of it.
Brooks called it essential complexity: the irreducible intellectual work at the heart of every system. Everything else (the code, the tooling, the infrastructure) he called accidental complexity. The stuff that gets in the way of the thinking.
Here's what AI just did to that equation: it nearly eliminated accidental complexity overnight.
Which means essential complexity is now almost everything. The thinking is the job. And if you don't protect it—if you let your AI sprint ahead while you're still figuring out what you actually want—you end up with a perfect implementation of the wrong specification.
Preston-Werner was right in 2010. He's more right now.
This led me to what I call the AI Blueprint Protocol.
The idea is to work in two distinct modes.
1. The Architect Session: This is where you think. You're not writing code; you're building the Blueprint, a plain-English document that captures your intent, your constraints, every decision, and the reasoning behind it. Critically, you instruct your AI to challenge you here. If it stops asking hard questions and starts agreeing with everything, you push back: "Your highest and best use is to challenge my thinking, not validate it."
2. The Builder Session: This is brand new—zero memory, zero context drift. It only sees what's directly in front of it: the Blueprint you just wrote. You use it to verify, critique, and eventually implement. Fresh eyes. No accumulated bias.
The test is simple: when a completely fresh session can read your Blueprint and immediately understand what you're building, why, and how, you're done designing. That document can now be executed by any AI session, any engineer on your team, or any stakeholder who needs to understand what's happening and why.
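One way to picture the handoff between the two sessions is a small Python sketch. Everything in it is an assumption made for illustration: the `Blueprint` and `Decision` classes, the section names, and the `builder_prompt` helper are hypothetical, not part of any tool, and the example values are placeholders rather than my actual Autorater Blueprint. The point it tries to show is that the Builder Session receives the rendered document and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One design decision and the reasoning behind it."""
    choice: str
    reasoning: str


@dataclass
class Blueprint:
    """Plain-English design record written during the Architect Session."""
    intent: str
    constraints: list[str] = field(default_factory=list)
    decisions: list[Decision] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Name the sections a fresh reader would find empty."""
        missing = []
        if not self.intent.strip():
            missing.append("intent")
        if not self.constraints:
            missing.append("constraints")
        if not self.decisions:
            missing.append("decisions")
        # A decision recorded without its reasoning cannot be challenged later.
        if any(not d.reasoning.strip() for d in self.decisions):
            missing.append("reasoning")
        return missing

    def render(self) -> str:
        """Serialize the Blueprint as a plain-text document."""
        lines = ["INTENT", self.intent, "", "CONSTRAINTS"]
        lines += [f"- {c}" for c in self.constraints]
        lines += ["", "DECISIONS"]
        lines += [f"- {d.choice} (why: {d.reasoning})" for d in self.decisions]
        return "\n".join(lines)


def builder_prompt(blueprint: Blueprint) -> str:
    """Build the only text a fresh Builder Session sees: no chat history."""
    gaps = blueprint.missing_sections()
    if gaps:
        raise ValueError(f"Blueprint not ready, missing: {', '.join(gaps)}")
    return (
        "Read this Blueprint with no other context.\n"
        "Explain what is being built, why, and how, then list anything unclear.\n\n"
        + blueprint.render()
    )


# Placeholder values, for illustration only.
example = Blueprint(
    intent="Improve an Autorater by retrieving relevant examples per query.",
    constraints=["Search must combine BM25 and semantic embeddings."],
    decisions=[
        Decision(
            choice="Retrieve examples dynamically",
            reasoning="A fixed example set may not match the input being rated.",
        )
    ],
)
print(builder_prompt(example))
```

The `missing_sections` check is a crude stand-in for the real test, which is whether a fresh reader actually understands the document. It only catches the easy failure: handing off a Blueprint that still has empty sections or decisions with no stated reasoning.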
That's what happened with my RAG Autorater. A complex system became a document. My engineering stakeholders didn't need to reverse-engineer my notebooks. They read the Blueprint and ran with it.
We've spent decades eliminating accidental complexity: higher-level languages, frameworks, and now generative AI. We finally have a machine that writes the code.
But we're in danger of using it to auto-scale our mistakes.
The artifact that will outlast the code, the thing that matters when AI is generating outputs directly from intent, is the design. The plain-English description of what we're building, what we're measuring, and why it matters.
Speed was never the holy grail. Clarity was.
I'm writing a full playbook for applying the AI Blueprint Protocol to data science workflows: A/B testing, causal inference, model evaluation, and beyond.
If this resonated, follow along, and push back if you disagree. That's exactly what I ask of my AI. And tell me: where have you felt the cost of building the wrong thing fast?
Originally published on LinkedIn.