Saturday, 28 February 2026

The death of Agile comes with the face of Architecture

My point today is that, if we wish to count lines of code, we should not regard them as
"lines produced" but as "lines spent"

EWD 1036

Inspired by the avalanche of coding agent endorsements from legitimate coding experts like Liz, SteveSalvatore, Mitchell, Addy and obviously Simon and Andrej - but also by the excellent o16g outcome engineering manifesto (via Charity) - I felt it might be interesting to explain my view from two sides: o16g as seen from integration engineering, and architecting for cost. I am not trying to come up with general ideas here - there is already excellent work on making sense of agents, like Birgitta's Context Engineering for Coding Agents.

Beginning with the second one, cost - I always knew the death of (capital-A) Agile would wear a familiar face, just not that of (capital-A) Architecture. o16g says "The Backlog is Dead", but the sudden swing back to spec engineering (instead of Architecture and Management we just call it Plan Mode) is somewhat ironic, and maybe a sign of the wider socio-political times, where compliance and loyalty are treated as somehow equal to ethics and safety. Anyway, o16g goes on: "Cost not time". I am not nostalgic about coding, but what surprised me about the epiphany posts of the coding experts above was that they obviously have no concern whatsoever for money, especially when calling out that only the newest frontier models work. Cost is seen as just another metric; requiring multiple Claude accounts is the humble brag of the hour. It's hard to burn through fewer than $100 USD in credits a day, which makes a Max 20x plan at $200 USD per month seem a good investment. Certainly for corporations, given that's roughly the daily salary of an offshore software engineer, but for individuals? On the surface it is, and I admit vibe coding helps me a lot every day, but at the same time this lavishness sounds too much like an anthropomorphized man-month / lines-of-code metric, showing off how much can be produced rather than asking: why?
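The back-of-the-envelope math behind that "good investment" claim is worth spelling out, using only the figures above ($100/day in credits vs. a $200/month flat plan); a minimal sketch:

```python
# Back-of-the-envelope comparison using the figures from the text above:
# pay-as-you-go credit burn vs. a flat Max 20x subscription.
daily_credit_burn = 100   # USD per day, pay-as-you-go credits
max_plan_monthly = 200    # USD per month, flat Max 20x plan

monthly_credit_burn = daily_credit_burn * 30          # 3000 USD/month pay-as-you-go
break_even_days = max_plan_monthly / daily_credit_burn  # flat plan pays off after 2 days

print(f"Pay-as-you-go: ${monthly_credit_burn}/month")
print(f"Flat plan breaks even after {break_even_days:.0f} days of heavy use")
```

At heavy usage the flat plan wins after only two days - which is exactly why it reads as rational for a corporation and as a Veblen good for an individual.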

(the next paragraph is a bit ranty, you might want to skip it)

One of my side projects, a medium-complexity data pipeline under time pressure, used ~700M tokens against ~700K tokens of code, a factor of 1000, in a week. Yes, that costs about as much as other hobbies, like an expensive gym membership or gambling, but it's still a product for the privileged, where power and influence stand above money, just as addictive and even less sustainable. If you are not already privileged, like I am, you have to use cheaper tools such as OpenCode with Ollama, but that won't boost your productivity quite as much and probably just intensifies your work. Never has the leverage of privilege and virtue been clearer in our industry. Instead of the Jevons Paradox we should talk about frontier models and disposable software the way we talk about other hobbies, as Veblen goods, and about these endorsements as tantalizing, especially when they go along with status and virtue signalling in society: "I can spend on AI, I can build resonance far away from the environmental damage by Ralphing resources, I can communicate via AI, because through my well paid job I am virtuous - my reach, my karma, is justified to be boosted".

Something big is happening for people who believe they can grace humanity with just another app, but as long as Anthropic's AI-powered customer service still can't figure out my overbilling issue two weeks in, and their 3 distinct billing systems remain mutually incompatible, it's clear they can't manage their human bottlenecks either. Exploitation has reached its peak, and skill-lessness returns to its feudal origins, where we have to be grateful to only contribute to the secret police and not directly to carte blanche military use. The junior developer doesn't get their revenge, because they are poor.

Sorry for the rant. So what are my patterns for saving cost and trying to stay sustainable?

I agree that using the latest thinking models is unfortunately mandatory for many tasks - anything that's non-linear. I had Claude write undeletable files to my filesystem when stumbling over setting a parameter with "--", kernel-panic-restart my Mac (without remembering it, obviously), introduce a performance regression because it wrote the experiment result to file A but read from file B, destroy layouts by not understanding what a grid is, fork a library instead of suggesting to update it to the latest version, come up with endless loops while at the same time initializing a large transformer model inside the loop - and so on and so forth. And I'm not even talking about the general issues with folders and modules, the bias to write more code, and the habit of creating a garbage dump in the root folder while ignoring rules, especially due to context drift in long sessions (yes, I still feel a codebase eventually needs to be human readable in addition to observable - ops bias). This class of weird linearity-bias errors, out of all reasoning failures, can be significantly reduced by using frontier models with large context to build a harness, or by "doing the work twice".

So here is my current workflow with ~3 instances of Claude, plus agents, running in parallel, trying to go by universal principles (but skipping the obvious, like one session per task, planning, and reviewing the plan like a staff engineer):

  1. I use flagship frontier models like Opus for planning, and that means I always let them write documents and rarely execute. Planning here means anything ordered, i.e. anything that targets a larger codebase, span of time, hierarchy or spatial understanding beyond linear steps. In other words, anything with a larger-than-single-task scope, which can also mean documentation, visual design or refactoring like splitting a module. Experiments are multi-step, so they count as planning too. In all these cases I ask for a clear, stored writeup on main with specific file references, and then restart the context to avoid compaction or rate limiting in the middle of reasoning. It also helps human reasoning to not do something, or to simplify: I can decide if a refactoring is worth it, or if something really needs to be a platform, framework or service, and fan these out as experiments on separate branches. Refactorings like renaming a variable or method consume a huge number of tokens and are slow and unreliable - I often use IDE functions for that instead, to be sure (until we have better tools).
  2. Cost-effective models like Haiku I use for anything highly localized, for example agentic background tasks (especially when based on skills, e.g. writing an "evaluate experiment using local Python environment" skill), running batches, tests or experiments, configuration changes, or moving something trivial without type dependencies (because Claude just cannot stop using the root folder, regardless of how many rules one defines). Usually I use Opus to write a plan and split it into stages for different types of models - the stages meant for Haiku are usually also fine to run with offline models, which is useful from a data governance perspective. I execute them on a different branch and occasionally ask Opus to judge the branch implementation. The plan itself I usually ask to persist in a gitignored folder with a running version number that experiments can refer to.
  3. "Best of both worlds" models like Sonnet I use the least, usually for smaller standalone tools like a quick data conversion or crawler, but I rarely see the benefit over cost-effective models. I do start with Sonnet on a new task, though, because I feel the extra context helps with cost-saving rules such as "always activate the Python 3.14 environment in /venv first and only use existing dependencies", which saves tokens by reducing double work. Usually I run these tasks on a separate branch, as the models still ignore rules. Sometimes I use them to summarize a larger task for a smaller model, or to make slightly bigger changes, like writing a changelog or skills after a session. But even writing a claude.md memory often misses important detail and is best done with flagship frontier models, given the outsized downstream impact. Maybe adaptive thinking will solve that?
  4. If possible I use offline models, preferably Qwen3 via Ollama: when I know I will be offline, asleep or intermittently connected; when it's a quick question like a bash script and I want to be mindful of resources; or when I want to make sure no data is sent to a Cloud API, for example to investigate local data or test data, or even when looking at someone else's codebase. I tend to keep data and project-specific documentation away, as I don't trust local security settings and external libraries or code. Here and there I spin up a VM or container with a snapshot to dangerously skip permissions, aka yolo.
  5. For inspiration I often use other models like Gemini Deep Research (in NotebookLM) or even ChatGPT*, especially when it feels like a conversation is more helpful to arrive at a solution, e.g. when researching a certain algorithm with tradeoffs. Occasionally I use a "best of both worlds" model to write a summary (e.g. architecture.md) for a certain component to send to another tool in conversation format, making sure any identifying or sensitive data is removed.
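The tiering above (flagship for planning, cheap models for localized execution, offline models for anything sensitive) can be sketched as a simple routing function. This is a toy illustration, not a real API: the model names, task categories and the `pick_model` helper are my assumptions for this sketch.

```python
from dataclasses import dataclass

# Toy router for the tiered workflow described above. Task categories
# and model names are illustrative assumptions, not a real Claude API.

@dataclass
class Task:
    scope: str        # "multi_step" | "localized" | "standalone"
    sensitive: bool   # data must not leave the machine
    offline: bool     # no (reliable) network available

def pick_model(task: Task) -> str:
    if task.sensitive or task.offline:
        return "qwen3-via-ollama"   # offline model, data stays local (step 4)
    if task.scope == "multi_step":
        return "opus"               # flagship: plans, docs, refactoring splits (step 1)
    if task.scope == "localized":
        return "haiku"              # cost-effective: batches, tests, config (step 2)
    return "sonnet"                 # standalone tools, summaries (step 3)

assert pick_model(Task("multi_step", False, False)) == "opus"
assert pick_model(Task("localized", False, False)) == "haiku"
assert pick_model(Task("standalone", True, False)) == "qwen3-via-ollama"
```

The point of writing it down like this is that the routing decision is cheap and mechanical - the expensive part is the planning document that lets a cheaper model execute a stage at all.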

Cost obviously connects to integration engineering and field engineering - we never had the luxury of strategic moonshot product budgets, just as we never had the luxury of nostalgia or romanticism for code - we kill with fire. As Benedict Evans said, "there's a huge gap between what looks cool in demos and all of the work and thought in the interaction models and the workflows in the actual product". And as Chad Fowler put it, AI gives us "industrial generation immediately. It does not automatically give us industrial regeneration" - i.e. yield. Code does not create value in itself. Yes, you can reduce some engineering cost, but technology companies pride themselves on creating products with value, not on racing cost to the bottom. Which brings us back to the point of asking "why", despite code being cheap now. Integration engineering is about a sense of product, about stability, about user trust. Anthropic's 3 broken and incompatible billing systems can't hide behind "move fast and break things" anymore - it's a sign either of a product decision or of having no control over their own technology. In both cases they outsource the risk to their users (not their investors, who care about MRR) via "all credits just expire" rules. That's consumer product thinking, and not a hack you can play on corporate users over the long term (hype aside): you can't just outsource the risk to the pointy end of the integration and then refuse support, stability and quality.

The death of Agile comes with the face of Architecture because it's more important than ever to consider Technical Debt and Cognitive Debt, in the same way that Task Isolation is positioned against Context Drift. Design decisions become brutally visible, and products expose their inner workings, a material witness to business priorities, where the cost of code is no excuse anymore. Charity is spot on that observability (o11y) is now the primary tooling to establish, monitor, experience, learn from and feed feedback loops. It was never Architecture of the code, it was always Architecture of the product: your user experience - and in probabilistic software, user experience is your reliability. The future of engineering was always SRE.

*) uhm no, nevermind

Friday, 26 September 2025

Hope is not a strategy, it's a mission

I came across an interview* covering Palantir's Forward Deployed Engineer (FDE) model, which pointed to a trend I've definitely observed: post-ZIRP, many startups, specifically AI startups, didn't only scale revenue very quickly, treating their business model with as much priority as their engineering work - alongside revenue they also treated their users as first-class citizens, working much closer with them through outbound-focused product management, deployment engineers, field engineers, solution engineers and similar roles. Actually, one of the reasons I didn't continue my series on Professional Services a few years ago (yet) was precisely this shift.

At one point in the interview the interviewee says "that's services ... historically something you wanted to minimize". Amongst a lot of questionable points, this one is valid. Until Cloud became really big in the 2010s, Professional Services in general, and Consulting specifically, was seen either as something dirty (a cost center) or, worse, as just another revenue driver (a profit center) - which often conflicted with the product due to its high margin contribution. Services-heavy organizations often took hefty maintenance fees (around 20%) and on top of that required expensive business consulting services, often making it look like the product was not accidentally, but very much intentionally, complex - regardless of whether that complexity was essential or not. I think this changed with Cloud because it was the simplicity of AWS services, both in terms of integration and pricing model, that removed these entry barriers. Consequently, the Cloud providers turned Professional Services into a strategic cost center, an enabling function where margin was less important than adoption, in turn reducing friction. That didn't mean it would be free (I strongly believe it should never be, but that's another post), but it was margin neutral. In such setups, SaaS businesses would often track the need for services as "sub-linear", growing slower than revenue, reflecting essential complexity.

While Palantir made the FDE model famous, around 2010, when it was still called Delta (seriously, what's up with those military references, especially when you don't want to be the one Deltas land on), SRE had long been a thing inside Google with a similar user-first but less heroic mindset, and CRE followed not much later, just when I joined. Google at the time was not known as a specifically user-first company in the sense of being user-specific; it had to be focused on extreme scale. Ease of use was important, but in the sense of universal accessibility. So it was an interesting cultural change to work much deeper with users. There wasn't even a place or process for SCEs to write code on the boundary between Google3 and users' systems, and when I was working in the BigQuery (Dremel) codebase, introducing use-case- or even user-specific flags was heresy. AWS by that point had long written user-specific code, easier to do thanks to their multi-tenant architecture, often running custom-tuned versions of a product for large users to build new features before rolling them out. In other words, in my mental model (I have no insight into either Palantir or AWS), FDE was just one of the possible reactions to the "services over product" and outsourcing (BPO) trend of the late 1990s / early 2000s, and fits into the countermovement started by SRE and culminating with the AWS CAF around 2014 or so.

FDE was unique in one element, though - travel on-site. The Cloud providers wanted to prove, maybe too much, that local infrastructure is obsolete. Throwing out the baby with the bathwater, they missed that localization in itself had value, more often than not being late to local markets and enterprises - a wedge that Azure was able to leverage, especially in Europe and the Middle East. I see the current FDE hype more in line with Return to Office (RTO) mandates, where Microsoft is also setting a sad precedent. In a rare rejection of scientific findings about increased productivity, RTO is purely there to set the vibe of "we don't trust you", burying competition and discontent in loyalty. Through this lens, FDE embodies the current post-Twitter "working hardcore" style, with AI startups famously pushing for 007 work hours and embracing weird ideas about masculinity as some kind of "reaction" to DEI programs - when these programs, alongside remote work, actually proved successful. Tech is clearly reacting irrationally to changed politics and picking the prevailing loyalty-over-essence vibe.

I am glad for the DEI programs I participated in, and I don't miss high school gyms. I prefer scientific, rational, collaborative, unbiased and ultimately optimistic work. What made FDE unique was listening to their users wherever they are. Being users-first on their terms. But the important part was listening - that's what FDE, SRE and CAF have in common. And while each had an edge in one area or another (for example, Azure clearly lacks the resilience of SRE culture), it's what they have in common that proves the success. They all reacted to a mindset that forgot the product and long-term commitment over short-term margins, that exchanged strategy for tactics. Applying any of these models is not about more exclusion and taking shortcuts; it's about more inclusion, doing the hard work, staying with the trouble. That's a loyalty I can get behind: collaboration. Not sitting in front of your boss waiting for instructions. So please, don't turn into a military contractor, swinging war terms around and pretending to be a drill instructor. You're making the same mistake, sacrificing a better product for short-term margins. Try to build products as if lives depend on it, not as if your goal is to destroy life. In SRE we used to say, "Hope is not a strategy". That's true - Hope is a mission.

*) which I am not linking because it was a tasteless piece of military propaganda normalizing how PayPal, Palantir, OpenAI and the US Army fit together - I really wonder if in the current spirit of renaming they soon call themselves War Combinator

Sunday, 30 October 2022

Modern Professional Services - Part 1 - Types of Consulting Organizations

I am giving a talk at SREcon '22 APAC, "Deploying Humans at the Edge of SRE". It's meant as inspirational, so I'm starting a 3-part series on Professional Services. The bigger context is the discourse on interesting career paths into tech, which I've been working on since joining Google, particularly with my Awesome Tech Roles compilation (please contribute!).

A series on Professional Services

In this 3-part series I'd like to introduce "Professional Services" organizations in tech product companies. In times of standard convergence and consolidation, recession and "cash is king", it's easy to observe product companies pivoting to offer services, usually paid, as a way to ensure successful implementation and to provide additional value-add and thought leadership when product differences are minimal. But implementations can vary widely, and it's hard to compare how they work.

I'm excluding pure managed services, outsourcing and consulting firms, and the traditional industry (in-house consulting, ops) here, same as in my Tech Job Titles List. Product implementation may be outsourced in some tech firms and therefore relevant, but in tech product firms specifically it is either internal, effort-based ("what can I get done with the SWE cycles I have"), or covered by a "partner" organization that scopes projects in a similar way to what I am about to explain.

This series will be split in 3 parts:
  • Part 1 on Types of Consulting Organizations
  • Part 2 on Estimation, and Fordian vs Bayesian Timebooking
  • Part 3 on Impact Metrics, including Cost
To be clear, this is neither driven by Stripe or Google, nor does it represent Stripe's or Google's approach. It is purely based on my observations when talking to my peers - and given I am not in Silicon Valley, likely highly biased and not canonical at all.

Types of Consulting / Professional Services Organizations within Product Companies

When I say Professional Services I generally mean paid post-sales (post go-live / post-commit) services provided by the same company that built a product to the buyer or user of that product. These services go beyond standard case-based support and troubleshooting or continuous customer success plans, and are defined by a project scope and business impact, not a product pipeline or effort SLA, and are not billed outsourcing-style "time and materials" (unlike, say, an adoption-focused Customer Success Manager, Resident Architect or extended workbench).

Here is how I see the 4 extreme types of professional services organizations in tech product firms - this may sound negative, but the goal here is to show extreme cases, ymmv:

  1. Separate organization or profit center with completely separate P&L contribution, often even a different brand or subsidiary with different contracts and benefits. Usually these are product firms with a very strong partner ecosystem, making sure their consulting arm stays "independent", often due to a strong license lock-in ("stickiness") and fixed-support-fee business that does not depend on a SaaS / usage pricing model. In other words, the customer's success is not actually relevant post-sale; customers may even be forced to pay a maintenance fee, which puts the professional services organization close to "customer operations". As a separate organization, they may often do real implementation and delivery work. Therefore, individuals are often incentivized in their performance process purely based on P&L, e.g. chargeability metrics or thought leadership generating leads and opportunities. Due to this, it is not uncommon that this is actually a different legal entity with different contracts and employment plans.

    IBM Global Services used to be the classic example; SAP Advisory Services and, until recently, Microsoft are others, but there is a general trend to blur the lines with the shift to SaaS and customer success metrics. Not even outsourcers prefer this model anymore, as capacity planning is hard (more on this in the next post on estimation).

  2. Separate team within a larger organization, usually either the Sales / Go-to-Market or the Support / Customer Success organization, aligned to their goals but with different methods. The lucky variant is a prototyping, solution engineering or deployment engineering team, embedded in the rollout motion but connected to initial support. The less lucky variant is an intermediate stage to (1), as the KPIs / metrics / OKRs need to be hacked to fit a fundamentally different culture, e.g. via profit or revenue share in a value-selling approach. Therefore, individuals are often defined by roles rather than reporting lines, and incentivized in their performance process on metrics they have little control over, e.g. product upsell or utilization. This may lead to frequent re-organization, strategy and leadership changes, and associated employee retention issues.

    Usually these are SaaS usage-pricing companies that at the same time have lock-in potential and therefore see professional services as a retention channel. The classic examples used to be VMware, Salesforce and ServiceNow. Cloud providers may fall back into this as a default or crisis mode (as they often don't have natural "stickiness", e.g. license leverage, at best commit contracts), and while aspiring to be profit centers, in reality they often turn it into a cost center by discounting or waiving the cost of services, e.g. for migration.

  3. Horizontal enabling team, role, community of practice or speciality organization with fluidity and some fuzzy lines to presales / solution engineering, developer relations, evangelism or support. Usually these are SaaS / usage-pricing companies with a real long-term customer success vision beyond quarterly goals. Often for more experienced / senior individual contributors like principal architects; individual contributors who are ok with swarm-style, loosely defined reporting lines and define their own specialization, e.g. holacracy. In the extreme case, a catchall for all work not otherwise clearly defined, even to the extent of building unofficial tools and integrations. Therefore, individuals are often incentivized in their performance process on metrics that represent this long-term vision, like customer or partner happiness, public engagement, IP / asset generation or product enablement (e.g. sub-linear support growth). In this model professional services may be close to developer relations or field engineering.

    Often de-facto operated "cost neutral" as a cost or investment center, and therefore smaller than in comparable firms. Amazon, with its very independent teams, customer obsession and embedded roles, is the classic example. And every provider claiming to do "Digital Transformation" pretends to be here (e.g. Microsoft is moving from 1 to 3).

  4. A rare hybrid between types 2 and 3 are professional services organizations aligned to the product feature engineering org, for instance combined with Product Ops for customer empathy (also here), strategic customer advisory and lighthouse customer launch support, or with DevRel to cover OSS tools and integrations. In the best cases these are actually rotation-based, starting with a residency program. Often they are smaller organizations where "everything customer facing" is grouped, e.g. into Field or Deployment Engineering.

    Often these are data companies who build very specific use cases and can monetize their reuse, e.g. Palantir, Databricks and Looker; Google also used to be like this. A recent thread shows Color seems to do Solution Engineering this way (the right way). Seen as an investment or profit center, but via product usage or adoption, not direct payment for services. Unsurprisingly, all of these map onto Team Topologies - this type would be the Platform Team, the closest to how SRE works, just on the other end of the spectrum as the maximum customer empathy team (often Professional Services sees itself as the "spearhead" of the product).
A fifth type would be an organization that does not have in-house professional services. The classic examples here are Atlassian and Apple, arguably even Oracle, who always put self-service and partner networks first, with their field engineers primarily being the glue between partners and internal engineering teams. In Apple's case there is also simply the product decision not to be customizable, so naturally field engineering is more like DevRel, evangelizing the ideas, which are famously executed without considering the customer. Microsoft and SAP over the aeons have oscillated between this model and type 1 above.

In part 2, I will talk about estimation, and how work is prioritized in Professional Services teams across those 4 models.

Saturday, 15 January 2022

Letter to my future self 2022 edition

Reading my earlier “letter to my future self” from 2016 is interesting, because it still holds true, irrespective of how the pandemic changed the workplace: I wanted to work on delightful products with end-user impact, intelligent and intuitive products with a strong and clear vision. First class in technology, with information and technology as a “narrative”, meaning culture, and with a strong focus on operational excellence. Back then the SRE book had not yet been released, so it was hard to explain to classic “architects” what my role would be, yet I was keen to learn how to run things at scale. I wanted to be part of a global, distributed and diverse organization, a “swarm” with a focus on Asia (what I liked about consulting), not a manager.

All of that, and more, came true and I am grateful to Google for giving me this opportunity. I learned more about product development and excellence, customer empathy, tech leadership, reliability and eventually SRE than I ever expected - from the most brilliant and humble people I ever met. I was lucky to work on one of the largest scalable, concurrent and low latency systems in the industry, diving deep into (data) observability and machine learning. Helping migrate some of them to Spanner, I built up the experience to help launch Cloud Spanner in Asia, and with that moved from product technology management to strategic cloud engineering. I was again lucky to help integrate and migrate real-time streaming systems and data warehouses, and improve products like BigQuery, Dataflow / Beam and Kubernetes / Cloud Run. The only thing I couldn’t avoid was becoming a tech lead manager again - but I love building up teams, and so I focused a lot on creating coaching, scaling (security and scoping), hiring, onboarding and culture programs, with a special focus on the inclusion of diverse backgrounds and on community and student outreach to make tech legible.

Monday, 27 December 2021

My Tech Interview style (and the Integration Engineer role) [backdated draft]

Note: This post is backdated to the date of the last draft (27 Dec 2021) as I changed my job and role and didn't want to bias / inform myself by that. It's an unfinished fragment of my thinking at that point in time that I just cleaned up a little and added references where necessary, but it's still rough and incomplete.

I've never been happy with the Tech interview process and have been burned by it many times - being under-levelled, being in a role I barely understood (a reason I launched the Tech Job Titles list), being rejected over algorithm questions ("write a solver for Tetris but in n dimensions"), or simply not even being screened due to "no product experience". This form of gatekeeping in the tech industry is one of my pet peeves, but it's also simply unproductive and inefficient. It only works because of survivorship bias for people from top universities who are prepped with special courses, books, test interviews and special coaching by tech firms - as such it functions more as a ritual than as a test of actual role fit and/or culture add. It is basically a code (e.g. speaking out loud while programming), and by knowing that code the interviewee signals to the interviewer that they are part of the same group ("likeability").

My interview style

I've done about ~250 tech interviews at Google and ~250-500 at Accenture, plus quite a few in not-for-profit volunteering and my own startups - regardless of whether the role was called programmer, engineer, consultant or architect. Instead of going into depth on what the current interview process is or what's broken with it (the list has a few pointers and there is great research by others), let me highlight what is important for me in tech interviews (leaving behavioural and hypothetical questions aside):