I'm a staff-level software engineer with twenty years of experience, and I'm basically a hallucinating language model with a small context window and unreliable fact recall.
That's not self-deprecation. It's a design constraint.
Breadcrumbs
For as long as I can remember, I've been leaving breadcrumbs everywhere. Issue trackers, readmes, changelogs, commit messages, chat messages to myself scattered across whatever app or server I happened to be standing in. I once journaled in book and quill on a Minecraft server because that was the application I had open and the thought needed to go somewhere before I lost it.
There was never a grand system. Just a hard-earned understanding that if it's not written down, it doesn't exist. I get distracted, move on, and the context evaporates.
I tried every respectable version of "be more disciplined about note-taking." It never stuck. Writing documentation in the middle of solving a problem is context-switching away from the problem, and for someone with a small context window, that switch is expensive enough to be destructive. I lose the thread. Every time.
So for twenty years I left a trail without knowing where it led.
The missing piece was never capture. It was readback.
Readback
Sometime in the last year, frontier models crossed a threshold that mattered to me more than benchmark scores: they got good enough to ingest a pile of structured artifacts and reconstruct the context I'd lost between sessions. The breadcrumb habit I'd been maintaining for two decades suddenly had a payoff mechanism.
Then the second threshold hit, and this one mattered even more: the models got good enough to write the documentation too.
That's what made it actually work. Capturing artifacts by hand always failed me: it was boring, interruptive, and easy to defer. Agents don't have that problem. I do the work. The agent captures what happened. Future-me shows up three days later and says "brief me," and another agent reconstructs context from the docs, issues, commit history, and review findings the first one left behind.
I'm not in the documentation path anymore. I look at output. I ask questions. I course-correct. I approve or push back. The clerical layer is mostly gone.
Invisible work
The work itself is less changed than people seem to expect.
For years I was a prolific implementer. I built tools people adopted and depended on. They worked, but they often lacked the rigor and scaffolding that make systems durable.
Later, especially at larger scale, the center of gravity shifted. More and more of my time went to dissecting intractable problems spanning interconnected systems. Finding the fracture that requires coordination across six teams to resolve. Figuring out which system is lying to you when the dashboards say everything's fine while the users say it isn't.
I still implemented. I'd write error classifiers or work out the ordering of complex recovery workflows, partly to triage and partly because I needed to understand the domain deeply enough to articulate better requirements for the people building the larger system. But the thing I spent the most time on was holding a broken system in my head just long enough to see where the fracture was, then convincing people who own different pieces to fix it together.
That kind of work can feel oddly invisible. Somebody else lands the bigger fix. Somebody else closes the ticket. You surface the fracture, maybe build the narrow piece that makes the problem legible, and then watch the visible output happen elsewhere.
The agents haven't changed the core of the job. They've mostly stripped away the clerical parts that were never the bottleneck.
Practice, not orchestration
So I got specific about the plumbing. If the whole point is to stay at the gate and make good calls, the stuff flowing through the gate has to be reliable.
There's a popular vision of agentic development where you throw thirty agents at a problem and let the robot mayor run the factory. That's not what I'm doing.
What I built is not an orchestrator. It's a practice.
- briefing reconstructs context.
- develop does phased work with explicit gates.
- code-review verifies independently.
- land closes the loop: commits, tracking, handoff notes, recorded wins.
- edit-skill lets the system modify itself.
I use multiple agents in parallel, but the parallelism is across phases, not tasks. Planning for one project, implementation for another, review for a third. Different phases have different cognitive profiles, so they get different models. Judgment where judgment matters. Execution where execution matters. The human stays at every gate. Agents do most of the building. I make sure what they build is right.
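The routing is simpler than it sounds. A minimal sketch of phase-based assignment, where the phase names come from the skills above but the model tiers and gate descriptions are placeholders rather than real model names:

```python
# Hypothetical phase -> profile routing. The tiers are illustrative labels,
# not actual models; the point is that each phase gets a different
# cognitive profile and every phase keeps a human gate.
PHASE_PROFILES = {
    "briefing":    {"model": "long-context-tier", "gate": "human reads the brief"},
    "develop":     {"model": "execution-tier",    "gate": "human approves each phase"},
    "code-review": {"model": "reasoning-tier",    "gate": "human triages findings"},
    "land":        {"model": "execution-tier",    "gate": "human approves the merge"},
}

def model_for(phase: str) -> str:
    """Pick the model tier for a phase; raises KeyError for unknown phases."""
    return PHASE_PROFILES[phase]["model"]
```

Parallelism then falls out naturally: three projects in three different phases can run side by side without two agents ever touching the same task.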
Feedback loops
The code-review skill runs three parallel reviewers with different lenses: correctness, design, and architecture+security. They don't know the implementation agent exists. They produce findings tagged by severity. Then a coordinator verifies every critical and important finding against the actual code, because LLM reviewers hallucinate.
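The fan-out-and-verify shape can be sketched in a few lines. Everything here is illustrative: the toy reviewers just match substrings, where the real ones are models prompted with a diff and a lens. But the structure is the one described above: parallel lenses, severity-tagged findings, and a coordinator that checks every critical or important claim against the actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Finding:
    lens: str
    severity: str   # "critical", "important", or "minor"
    claim: str      # what the reviewer says appears in the code

# Toy reviewers: each lens flags one pattern. When a lens finds nothing,
# it "hallucinates" a finding, to show why the verification pass exists.
def reviewer(lens: str, diff: str) -> list[Finding]:
    patterns = {
        "correctness":           ("== None", "critical"),
        "design":                ("TODO", "minor"),
        "architecture+security": ("eval(", "critical"),
    }
    needle, severity = patterns[lens]
    if needle in diff:
        return [Finding(lens, severity, needle)]
    return [Finding(lens, "important", "os.system(")]  # hallucinated

def verify(finding: Finding, source: str) -> bool:
    """Coordinator re-checks the reviewer's claim against the actual code."""
    return finding.claim in source

def review(diff: str, source: str) -> list[Finding]:
    lenses = ["correctness", "design", "architecture+security"]
    with ThreadPoolExecutor(max_workers=len(lenses)) as pool:
        batches = pool.map(lambda lens: reviewer(lens, diff), lenses)
    findings = [f for batch in batches for f in batch]
    # Minor findings pass through; critical/important must survive verification.
    return [f for f in findings if f.severity == "minor" or verify(f, source)]
```

The reviewers never see each other, and they never see the implementation agent; they only see the diff and their lens.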
False positives and missed findings break trust in different ways. A false positive is noise that distracts from signal. A missed finding, once it surfaces, is signal you can turn into better practice.
When a finding reveals a pattern rather than a one-off bug, that pattern gets written to persistent review memory. The next review reads that memory before it starts.
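A minimal sketch of that memory, assuming it's nothing fancier than an append-only markdown file in the repo (the filename and helper names here are hypothetical):

```python
from pathlib import Path

# Hypothetical location; in practice this would live in git alongside the code.
MEMORY = Path("review-memory.md")

def record_pattern(pattern: str) -> None:
    """Append a pattern-level finding so future reviews start with it."""
    existing = MEMORY.read_text() if MEMORY.exists() else ""
    if pattern not in existing:   # keep the memory deduplicated
        with MEMORY.open("a") as f:
            f.write(f"- {pattern}\n")

def load_patterns() -> list[str]:
    """Read the accumulated patterns before a review begins."""
    if not MEMORY.exists():
        return []
    return [line[2:].strip() for line in MEMORY.read_text().splitlines()
            if line.startswith("- ")]
```

Storing it as markdown in git means the memory is diffable, reviewable, and editable by hand, which matters when a recorded pattern turns out to be wrong.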
The system gets better at reviewing code every time it reviews code.
That same loop runs upstream. The develop skill has a checklist distilled from prior review findings: security guard consistency, lazy resource acquisition, cleanup on error paths. Each item exists because it previously caused a real failure and a fix-and-re-review cycle. The system is teaching itself to catch recurring classes of mistakes earlier in the pipeline. Rules like "simplicity is a security property" and "plans describe intent, not contracts" didn't start as principles. They started as bugs.
Even the flight log, the end-of-session record of what I built and shipped, applies the same idea. Continuous improvement, but pointed inward. Reflection as a feedback loop. Keeping a clear head about what I'm doing and where I'm oriented.
The skills are backed by boring but necessary infrastructure: semantic memory stored as markdown in git, policy-gated shell execution, notification hygiene, tracked tasks, PRs as the unit of change, branch protection even when I'm working alone, linear history, templatized CI enforcing format/lint/test gates on every push. None of this was designed as a grand platform. Each piece exists because I hit a specific gap and got tired of working around it.
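Policy-gated shell execution, for instance, can be as simple as an allow-list check in front of every command an agent wants to run. This sketch invents its own tiny policy format; the real gate enforces whatever policies the author's tooling defines:

```python
import shlex

# Hypothetical policy: command -> allowed subcommands (None = any arguments).
ALLOWED = {
    "git": {"status", "log", "diff", "add", "commit"},
    "ls": None,
    "cat": None,
}

def gate(command: str) -> bool:
    """Return True only if the command passes the allow-list policy."""
    parts = shlex.split(command)
    if not parts or parts[0] not in ALLOWED:
        return False
    subcommands = ALLOWED[parts[0]]
    if subcommands is None:
        return True
    return len(parts) > 1 and parts[1] in subcommands
```

Note that destructive git operations like `push --force` are denied by default simply because they were never allowed, which is the right failure mode for a gate.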
None of it is super polished. But the bones are there, and I know where I'm headed and what I want. I'm constantly iterating on the skills and the subsystems underneath them, partnering with agents in the refinement process itself. The practice is part of its own feedback loop.
Completing the circuit
The interesting part is what happened when the pieces started reinforcing each other.
I feel more productive now than I have in a decade, but not in the generic "AI makes coding faster" sense. The multiplier hit hard because I already had the instinct. Twenty years of leaving breadcrumbs taught me that externalizing is how I think. What changed is that the notes are no longer write-only. What I learn in one session feeds the next. Review findings become development checklists. Development patterns become review criteria. The system gets better because I use it, and I get better because it remembers what I forget.
The joke version is that I'm a hallucinating language model with a small context window. The real version is that I spent a career compensating for the shape of my mind, and only recently got tools that could close the loop.
The workflow works because it was built around what I actually am, not around an imaginary, more disciplined version of me.
The tools referenced here are open source at github.com/butterflyskies.