> My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent".
>
> (Dijkstra, EWD 1036)
Inspired by the avalanche of coding agent endorsements from legitimate coding experts like Liz, Steve, Salvatore, Mitchell, Addy and obviously Simon and Andrej - but also by the excellent o16g outcome engineering manifesto (via Charity) - I felt it might be interesting to explain my view from two sides: o16g as seen from integration engineering, and architecting for cost. I am not trying to come up with general ideas here; there is already excellent work on making sense of agents, like Birgitta's Context Engineering for Coding Agents.
Beginning with the second one, cost - I always knew the death of (capital-A) Agile would wear a familiar face, just not that of (capital-A) Architecture. o16g says "The Backlog is Dead", but the sudden swing back to Spec engineering (instead of Architecture and Management we just call it Plan Mode) is somewhat ironic, and maybe a sign of the wider socio-political times, where compliance and loyalty are treated as somehow equal to ethics and safety. Anyways, o16g goes on: "Cost not time". I am not nostalgic about coding, but what surprised me about the epiphany posts of the coding experts above was that they obviously have no concern whatsoever for money, especially when calling out that only the newest frontier models work. Cost is treated as just another metric; needing multiple Claude accounts is the humble brag of the hour. It's hard to burn fewer than $100 USD in credits a day, which makes a Max 20x plan at $200 USD per month seem like a good investment. Certainly for corporations, given that's roughly the daily salary of an offshore software engineer, but for individuals? On the surface it is, and I admit vibe coding helps me a lot every day, but at the same time this lavishness sounds too much like an anthropomorphized man-month / lines-of-code metric, showing off how much can be produced rather than asking: why?
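The break-even here is simple arithmetic; the figures below are the ones from this post, not a pricing table:

```python
# Back-of-envelope: pay-as-you-go credits vs. a flat Max 20x plan,
# using the numbers above ($100/day credits, $200/month plan).
credits_per_day = 100          # ~$100 USD of credits burned per day
days_per_month = 30
max_plan_per_month = 200       # Max 20x plan, $200 USD per month

pay_as_you_go = credits_per_day * days_per_month
print(pay_as_you_go)                       # 3000
print(pay_as_you_go / max_plan_per_month)  # 15.0 - the flat plan is ~15x cheaper
```

Which is exactly why the plan looks like an obvious investment - for anyone who can front $200 a month in the first place.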
(the next paragraph is a bit ranty, you might want to skip it)
One of my side projects, a medium-complexity data pipeline under time pressure, used ~700M tokens against ~700K tokens of code, a factor of 1000, in a week. Yes, it costs as much as other hobbies, like an expensive gym membership or gambling, but it's still a product for the privileged, where power and influence stand above money - just as addictive and even less sustainable. If you are not already privileged, as I am, you have to use cheaper tools such as OpenCode with Ollama, but that won't boost your productivity as much and probably just intensifies your work. Never has the leverage of privilege and virtue been clearer in our industry. Instead of the Jevons Paradox we should talk about frontier models and disposable software the way we talk about other hobbies - as Veblen goods - and about these endorsements as tantalizing, especially when they go along with status and virtue signalling in society: "I can spend on AI, I can build resonance far away from the environmental damage by Ralphing resources, I can communicate via AI, because through my well paid job I am virtuous - my reach, my karma, is justified to be boosted".
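For scale, the ratio above is straightforward; the blended per-token price below is an assumption for illustration only, not a quote of anyone's list prices:

```python
# Rough back-of-envelope for the pipeline week described above.
tokens_used = 700_000_000      # ~700M tokens consumed by the agent
tokens_of_code = 700_000       # ~700K tokens of code actually kept
ratio = tokens_used / tokens_of_code
print(ratio)  # 1000.0 - three orders of magnitude of "lines spent"

# Hypothetical blended price of $5 per million tokens (an assumption):
blended_price_per_mtok = 5.0
cost = tokens_used / 1_000_000 * blended_price_per_mtok
print(f"${cost:,.0f}")  # $3,500 for the week at that assumed rate
```

Whatever the actual rate, the point stands: the tokens spent dwarf the tokens kept by three orders of magnitude.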
Something big is happening for people who believe they can grace humanity with just another app, but as long as Anthropic's AI-powered customer service still can't figure out my overbilling issue two weeks in, and their three distinct billing systems are mutually incompatible, it's clear they can't manage human bottlenecks either. Exploitation has reached its peak, and skill-lessness returns to its feudal origins, where we have to be grateful to contribute only to the secret police and not directly to carte-blanche military use. The junior developer doesn't get revenge, because they are poor.
Sorry for the rant. So what are my patterns for saving cost and trying to stay sustainable?
I agree that using the latest thinking models is unfortunately mandatory for many tasks, anything that's non-linear. I had Claude write undeletable files to my filesystem while stumbling over setting a parameter with "--", kernel-panic-restart my Mac (without remembering it, obviously), introduce a performance regression because it wrote the experiment result to file A but read from file B, destroy layouts by not understanding what a grid is, fork a library instead of suggesting an update to the latest version, come up with endless loops that initialized a large transformer model inside the loop - and so on and so forth. And I'm not even talking about the general issues with folders and modules, the bias to write more code, and the tendency to create a garbage dump in the root folder along with a strong tendency to ignore rules, especially due to context drift in long sessions (yes, I still feel a codebase eventually needs to be human-readable, in addition to observability - ops bias). This class of weird linearity-bias errors, out of all the reasoning failures, can be significantly reduced by using frontier models with large context to build a harness, or by "doing the work twice".
So here is my current workflow, with ~3 instances of Claude plus agents running in parallel, trying to follow universal principles (but skipping the obvious, like one session per task, planning, and reviewing the plan like a staff engineer):
- I use flagship frontier models like Opus for planning, and that means I always let them write documents, rarely execute. Planning here means anything ordered, i.e. anything that targets a larger codebase, a span of time, a hierarchy, or spatial understanding beyond linear steps. In other words, anything with a larger-than-single-task scope; it can also mean documentation, visual design, or refactoring like splitting a module. Experiments are multi-step, so they count as planning too - in all these cases I ask for a clear, stored writeup on main with specific file references, and then restart the context to avoid compaction or rate limiting in the middle of reasoning. It also helps human reasoning to not do something, or to simplify: I can decide if a refactoring is worth it, or if something really needs to be a platform, framework or service, and fan these out as experiments on separate branches. Refactorings like renaming a variable or method consume a huge amount of tokens and are slow and unreliable - I often use IDE functions for that instead, to be sure (until we have better tools).
- Cost-effective models like Haiku I use for anything highly localized: for example agentic background tasks (especially when based on skills, e.g. writing an "evaluate experiment using the local Python environment" skill), running batches, tests or experiments, configuration changes, or moving something trivial without type dependencies (because Claude just cannot stop using the root folder, regardless of how many rules one defines). Usually I use Opus to write a plan and split it into stages for different types of models - the stages meant for Haiku are usually also fine to run with offline models, which is useful from a data governance perspective. I execute them on a different branch and occasionally ask Opus to judge the branch implementation. I usually ask for this plan to be persisted in a gitignored folder with a running version number that experiments can refer to.
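  As a sketch of what such a skill could look like - the folder layout (`.claude/skills/<name>/SKILL.md` with YAML frontmatter) follows Claude Code's skills convention, but the name and body here are hypothetical:

  ```markdown
  ---
  name: evaluate-experiment
  description: Evaluate an experiment run using the local Python environment
  ---

  1. Activate the existing environment: `source /venv/bin/activate`.
  2. Run the evaluation script for the experiment named in the request.
  3. Write metrics into the experiment's own folder, never the repository root.
  4. Report the metrics as a short table; do not modify any source files.
  ```

  Keeping the skill this small is the point: a cheap model only has to follow four steps, not rediscover the project's conventions.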
- "Best of both worlds" models like Sonnet I use the least; usually it's for smaller standalone tools like a quick data conversion or a crawler, but I rarely see the benefit over cost-effective models. I do start with Sonnet on a new task though, because I feel the extra context helps with cost-saving rules such as "always activate the Python 3.14 environment in /venv first and only use existing dependencies", which saves tokens by reducing double work. Usually I run these tasks on a separate branch, as the models still ignore rules. Sometimes I use them to summarize a larger task for a smaller model, or to make slightly bigger changes, like writing a changelog or skills after a session. But even writing a claude.md memory often misses important detail and is best done with flagship frontier models, given the outsized downstream impact. Maybe adaptive thinking will solve that?
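  For illustration, such cost-saving rules could live as a short memory-file excerpt like this (the wording is mine, the /venv path is the one from this post):

  ```markdown
  <!-- claude.md (hypothetical excerpt) -->
  - Always activate the Python 3.14 environment first: `source /venv/bin/activate`.
  - Only use existing dependencies; never add a package without asking.
  - Never write files into the repository root; use the relevant module's folder.
  - Experiment outputs go into the experiment's gitignored folder.
  ```

  Each rule that prevents one round of doubled work pays for itself many times over in tokens.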
- If possible I use offline models, preferably Qwen3 via Ollama: when I know I will be offline, asleep or intermittently connected; when it's a quick question, like a bash script, and I want to be mindful of resources; or when I want to make sure no data is sent to a cloud API, for example when investigating local data or test data, or even when looking at someone else's codebase. I tend to keep data and project-specific documentation away, as I don't trust local security settings and external libraries or code. Here and there I spin up a VM or container from a snapshot to dangerously skip permissions, aka yolo.
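  One way to sketch such a throwaway sandbox - the npm package name and the CLI flag are Claude Code's own, but the rest of this file is a hypothetical setup, not a hardened recipe:

  ```dockerfile
  # Disposable sandbox image for "yolo" sessions: the agent runs with
  # permissions skipped, but can only touch this container, not the host.
  FROM node:22-slim
  RUN npm install -g @anthropic-ai/claude-code
  WORKDIR /workspace
  # Run with only the project mounted, e.g.:
  #   docker run --rm -it -v "$PWD":/workspace sandbox \
  #     claude --dangerously-skip-permissions
  ```

  A snapshot or `--rm` container makes the blast radius of a rogue `rm -rf` exactly one throwaway filesystem.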
- For inspiration I often use other models like Gemini Deep Research (in NotebookLM) or even ChatGPT, especially when it feels like a conversation is more helpful to arrive at a solution, e.g. when researching a certain algorithm and its tradeoffs. Occasionally I use a "best of both worlds" model to write a summary (e.g. an architecture.md) for a certain component to send to another tool in a conversation format, making sure any identifying or sensitive data is removed.
Cost obviously connects to integration engineering and field engineering - we never had the luxury of strategic moonshot product budgets, just as we never had the luxury of nostalgia or romanticism for code: we kill with fire. As Benedict Evans said, "there’s a huge gap between what looks cool in demos and all of the work and thought in the interaction models and the workflows in the actual product". And as Chad Fowler puts it, AI gives us "industrial generation immediately. It does not automatically give us industrial regeneration" - i.e. yield. Code does not create value in itself. Yes, you can reduce some engineering cost, but technology companies pride themselves on creating products with value, not on racing cost to the bottom. Which brings us back to the point of asking "why", despite code being cheap now. Integration engineering is about a sense of product, about stability and user trust. Anthropic's three broken and incompatible billing systems can't use the excuse of "move fast and break things" anymore - it's a sign either of a product decision or of having no control over their own technology. In both cases they outsource the risk to their users (not to their investors, who care about MRR) via "all credits just expire" rules. That's consumer product thinking, not a hack you can play on corporate users over the long term (hype aside); you can't just outsource the risk to the pointy end of the integration and then refuse support, stability and quality.
The death of Agile comes with the face of Architecture because it's more important than ever to consider Technical Debt and Cognitive Debt, in the same way that Task Isolation is positioned against Context Drift. Design decisions become brutally visible, and products expose their inner workings - a material witness to business priorities, where the cost of code is no excuse anymore. Charity is spot on that observability (o11y) is now the primary tooling to establish, monitor, experience, learn from and feed feedback loops. It never was Architecture of the code; it always was Architecture of the product: your user experience. And in probabilistic software, user experience is your reliability - the future of engineering was always SRE.