The architecture decisions that compound: why early choices matter most
Some architecture decisions have linear consequences. Others compound. This piece looks at which early choices constrain what's economically possible later, and why the decisions that compound are almost always made by people who don't realise they're making them.
Some architecture decisions have linear consequences. You make a choice, you accept a cost, and the cost stays roughly proportional to the scope of the choice. You can undo it later at a predictable price.
Others compound. The early cost is small and the decision feels reversible, but the shape of everything you build afterwards is quietly constrained by it. By the time you want to change it, the change isn't a refactor. It's a migration, and you have to run two systems in parallel until you're done.
The decisions that compound are almost always made early, often by people who don't realise they're making them, and almost never revisited until they start to hurt. I want to talk about the ones I see most often, how they compound, and where the counter-arguments actually hold.
What compounding actually means here
Linear cost: you chose Redis, and now when you want to change the cache layer you pay to migrate a bounded set of call sites. Annoying, not existential.
Compounding cost: you chose a document store, so every query pattern in every feature you've built since then has been shaped by what indexes well in that store. When you realise you needed relational joins, you don't just have to change the database. You have to rethink the data model, which means rethinking every feature that reads or writes it, which means rethinking the API surface, which means rethinking the client code. The original decision has been amplified through every decision made on top of it.
The test for whether a decision compounds is roughly this: does this choice shape which options feel natural when you make future decisions? If yes, it compounds. If no, it's local.
With that framing, here are the six decisions I see compound most often. I have an opinion on each. Reasonable engineers will disagree, and I'll be explicit about where.
1. Monolith vs microservices (but actually: team topology)
The microservices literature sells itself on operational concerns: scaling, independent deployment, fault isolation. Those are real, but they aren't the compounding consequence. The compounding consequence is team topology.
A service boundary is a coordination boundary. Whoever owns the service owns its schema, its deployment, its on-call, and its interface contract. That only works if you've got enough people to own the services with some continuity. A ten-person team running thirty microservices isn't running thirty services. They're running thirty abandoned codebases with shared production incidents. The services become a liability that compounds with every new feature, because each feature now has to cross four or five service boundaries that no single person fully understands.
Monoliths defer this forcing function. They buy organisational optionality. You can restructure your teams without restructuring your system, which means you can reshape how people work as the business changes, and that happens more often than people expect.
My position: for a team under twenty-five engineers, a well-structured monolith is almost always the right choice. Module boundaries, not service boundaries. If you later need to extract a service, a disciplined monolith extracts cleanly. A premature microservice, once tangled with the others, rarely does.
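A minimal sketch of what "module boundaries, not service boundaries" can look like in code. The names here (`BillingPort`, `InProcessBilling`, `OrderService`) are hypothetical, and the ledger is a stand-in for whatever billing actually stores. The point is only that other modules depend on a narrow interface rather than on billing's internals.

```python
from dataclasses import dataclass
from typing import Protocol


class BillingPort(Protocol):
    """The only surface other modules may use to talk to billing."""

    def charge(self, customer_id: str, amount_cents: int) -> str: ...


@dataclass
class InProcessBilling:
    """Today's implementation: a plain in-process call inside the monolith."""

    ledger: dict[str, int]  # hypothetical stand-in for billing's own storage

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.ledger[customer_id] = self.ledger.get(customer_id, 0) + amount_cents
        return f"receipt-{customer_id}-{self.ledger[customer_id]}"


@dataclass
class OrderService:
    """Depends on the port, never on billing's internals or tables."""

    billing: BillingPort

    def place_order(self, customer_id: str, amount_cents: int) -> str:
        return self.billing.charge(customer_id, amount_cents)
```

If billing is ever extracted, the change under this sketch is a second implementation of `BillingPort` that makes a network call. `OrderService` stays as it is, which is roughly what "extracts cleanly" means in practice.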
Where I'm wrong: if you've got genuinely independent product lines, owned by genuinely independent teams, with genuinely different operational characteristics, microservices earn their place. But that's a rarer configuration than the architecture literature implies.
2. Database selection, and specifically query pattern lock-in
The relational-versus-document debate is tired, and most of the positions are wrong for a predictable reason: they focus on developer ergonomics at T0 rather than query patterns at T+3 years.
Document stores optimise for a specific access pattern. You know the aggregate root, you read and write it atomically. When that's what you need, they're excellent. The problem is that product requirements shift, and three years in you discover you need to answer a question the aggregate root doesn't support. Maybe it's reporting. Maybe it's a relationship you didn't anticipate. Maybe it's a compliance query.
The mechanism of the compounding: once you've shaped your data for aggregate access, every feature reads and writes it that way. The schema has no formal definition, which means the shape is enforced by implicit convention across the codebase. Retrofitting joins onto that requires either application-level joins (expensive, fragile) or a parallel relational store (a migration problem in slow motion).
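To make "application-level joins" concrete, here is a small sketch using plain dictionaries in place of a real document store. The collections, field names, and the `revenue_by_region` question are all invented for illustration. Note how the code has to carry the index, the join, and the handling of documents that predate a field.

```python
from collections import defaultdict

# Hypothetical documents, shaped for aggregate access.
orders = [
    {"_id": "o1", "customer_id": "c1", "total": 40},
    {"_id": "o2", "customer_id": "c2", "total": 15},
    {"_id": "o3", "customer_id": "c1", "total": 25},
]
customers = [
    {"_id": "c1", "region": "EU"},
    {"_id": "c2", "region": "US"},
    {"_id": "c3"},  # older document written before "region" existed
]


def revenue_by_region(orders, customers):
    """Join orders to customers in application code and sum totals per region."""
    # Build the lookup a relational database would maintain as an index.
    region_of = {c["_id"]: c.get("region", "UNKNOWN") for c in customers}
    totals = defaultdict(int)
    for order in orders:
        # A missing customer or field is a runtime surprise, not a schema error.
        region = region_of.get(order["customer_id"], "UNKNOWN")
        totals[region] += order["total"]
    return dict(totals)
```

Every reporting question that crosses aggregates needs its own version of this function, each with its own assumptions about which fields exist. That is the implicit convention doing the work a schema would otherwise do.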
Relational stores get this wrong in the opposite direction. Teams start relational, then discover they've poorly modelled their domain, and their schema becomes a poorly normalised mess that compounds through every migration. But relational schema drift has a well-understood remedy. Document store schema drift doesn't. The drift is distributed across the application code.
My position: default to relational. Not because it's always the right answer, but because its failure modes are better understood and more recoverable. Document stores are a considered choice for cases where the access pattern is genuinely aggregate-shaped and unlikely to change. That's a narrower set of cases than the NoSQL literature suggested.
Where the counter-argument bites: for genuinely document-shaped workloads (content management, catalog data, user-generated structured content), the aggregate model is the right abstraction. The mistake is reaching for a document store because "relational is old" rather than because the access pattern justifies it.
3. API design and the coupling topology it creates
Protocol wars (REST vs GraphQL vs RPC) miss the compounding decision. The compounding decision is coupling topology.
A chatty REST API with dozens of small, resource-oriented endpoints creates a coupling topology where the client knows a lot about the server's model. Every change to the underlying domain ripples through client code. GraphQL partially solves this at the query layer but introduces a different problem: clients now shape the query space, and the server becomes responsible for a Cartesian product of possible queries that it didn't design for. Performance issues compound in ways that didn't exist before the API was flexible.
RPC with a well-defined service contract gives you the tightest coupling at a single boundary, which sounds bad until you compare it to the alternatives. The coupling is explicit, versioned, and local. It changes when you change it.
The compounding mechanism: your API is the boundary that everything else hangs off. Once you've got dozens of clients consuming your API in a particular shape, changing the shape isn't a server concern. It's a coordination problem across every consumer. The choice of how you expose your domain shapes how expensive every future change will be.
My position: design the API around the operations the business actually does, not the entities the domain happens to have. Operations-oriented APIs evolve better than entity-oriented ones because operations are more stable than data shapes. Most teams do the opposite because REST nudges you towards entity thinking.
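A rough illustration of the difference, with an invented `Subscription` type and two invented handlers. The entity-oriented handler lets the client decide which fields change. The operation-oriented one keeps the meaning of "cancel" on the server. The rule that cancelling releases seats is an assumption made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    plan: str
    status: str
    seats: int


def patch_subscription(sub: Subscription, fields: dict) -> Subscription:
    """Entity-oriented: the client picks the fields and their combination."""
    for name, value in fields.items():
        setattr(sub, name, value)
    return sub


def cancel_subscription(sub: Subscription) -> Subscription:
    """Operation-oriented: the server owns what 'cancel' means."""
    if sub.status == "cancelled":
        raise ValueError("already cancelled")
    sub.status = "cancelled"
    sub.seats = 0  # assumed business rule: cancelling releases all seats
    return sub
```

With the patch-style handler, a client can set `status` to `"cancelled"` and leave five seats allocated, and every client has to know not to. If the definition of cancelling changes later, the operation-style handler changes in one place while its callers stay the same.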
4. Event sourcing and the schema you can't break
Event sourcing is adopted early for a predictable reason: "we might need it later." The promise is audit trails, time-travel debugging, replayable state. All real, all valuable, sometimes.
The compounding cost: your events become a schema you can never break. Every event you've ever emitted must remain readable, because replaying them is the whole point of the architecture. That means versioning isn't optional, it's the dominant design constraint. Schema evolution becomes a first-class engineering concern. Teams that adopted event sourcing because it sounded elegant spend the next three years maintaining backwards-compatible event transformers for scenarios that only ever happen in replay.
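For a sense of what those event transformers look like, here is a minimal sketch of a version chain (sometimes called upcasting). The event shape, the field names, and the three versions are hypothetical. The default locale for old events is an assumption of exactly the kind these transformers force you to make.

```python
def upcast_v1_to_v2(event: dict) -> dict:
    """v1 stored a single 'name'; v2 splits it into first and last name."""
    first, _, last = event["name"].partition(" ")
    return {"version": 2, "type": event["type"], "first_name": first, "last_name": last}


def upcast_v2_to_v3(event: dict) -> dict:
    """v3 adds a 'locale' field; old events get an assumed default."""
    return {**event, "version": 3, "locale": "en"}


# One transformer per historical version, kept for as long as old events exist.
UPCASTERS = {1: upcast_v1_to_v2, 2: upcast_v2_to_v3}
CURRENT_VERSION = 3


def upcast(event: dict) -> dict:
    """Apply transformers in sequence until the event is at the current version."""
    while event["version"] < CURRENT_VERSION:
        event = UPCASTERS[event["version"]](event)
    return event
```

Each new event version adds one more link to this chain, and none of the old links can be deleted while events of that version remain in the log.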
My position: don't adopt event sourcing as a default. Adopt it when the audit trail or the replay capability is a product requirement: regulated industries, systems where "what did we know when" is a legal question, genuinely collaborative systems where merging concurrent state changes matters. The rest of the time, an append-only audit log next to a conventional store gives you 80% of the benefit at 20% of the ongoing cost.
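As a sketch of the alternative, the following uses Python's built-in `sqlite3` module with an in-memory database. The table names, columns, and the `deposit` operation are invented for illustration. Current state lives in an ordinary table, and each change also appends a row to an audit table inside the same transaction. Nothing here enforces append-only behaviour; in this sketch it is a convention.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a state table and an audit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE audit_log ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " account_id TEXT NOT NULL,"
        " action TEXT NOT NULL,"
        " detail TEXT NOT NULL)"
    )
    return conn


def deposit(conn: sqlite3.Connection, account_id: str, amount: int) -> None:
    """Update current state and append an audit row in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT OR IGNORE INTO accounts (id, balance) VALUES (?, 0)",
            (account_id,),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO audit_log (account_id, action, detail) VALUES (?, ?, ?)",
            (account_id, "deposit", json.dumps({"amount": amount})),
        )
```

The audit rows answer "what happened and in what order" without being the source of truth, so their format can change without any replay breaking.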
The counter-argument: if you need event sourcing, you need it early. Retrofitting is horrendous. The judgement call is whether you actually need it, and the pattern I see most often is adoption based on aesthetic appeal rather than product requirement.
5. The identity and tenancy model
This one hides. At T0, single-tenant is simpler. One user record, one organisation, one set of permissions. Ship the product, find customers, iterate.
At T+18 months, the first enterprise prospect asks about SSO. Then about role-based access control with custom roles. Then about data isolation guarantees. Then about compliance attestations. The tenancy model you chose at T0 is now the gatekeeper on every enterprise deal you've tried to close.
The compounding mechanism: the identity model is entangled with every authorization check in the codebase, every data access pattern, every audit log entry. Refactoring it isn't a migration. It's rebuilding the spine of the application. Teams that hit this typically face a six-to-twelve-month detour before they can close the deal they found the problem on.
My position: even if you're certain you won't have enterprise customers in year one, model tenancy and identity as if you might. The cost of a slightly more abstract identity layer at T0 is small. The cost of inserting one at T+18 months, under commercial pressure, is substantial. This is one of the cheapest compounding decisions to get right early.
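One way to picture "a slightly more abstract identity layer", sketched with invented names (`Principal`, `DocumentStore`, `DEFAULT_TENANT`) and an in-memory list instead of a database. Every read and write takes a principal that carries a tenant, even while only one tenant exists.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    """Who is acting, and on behalf of which tenant."""

    user_id: str
    tenant_id: str
    roles: frozenset = frozenset()


DEFAULT_TENANT = "default"  # at T0, every user can live in one implicit tenant


@dataclass
class DocumentStore:
    """Hypothetical store where every row is stamped with its tenant."""

    _rows: list = field(default_factory=list)

    def add(self, principal: Principal, title: str) -> None:
        self._rows.append(
            {"tenant_id": principal.tenant_id, "owner": principal.user_id, "title": title}
        )

    def list_titles(self, principal: Principal) -> list:
        # Every read is filtered by tenant, even while only one tenant exists.
        return [r["title"] for r in self._rows if r["tenant_id"] == principal.tenant_id]
```

At T0 every caller passes a principal in `DEFAULT_TENANT` and the filter is a no-op. When the first multi-tenant customer arrives, the call sites already have the right shape, and the work is in how principals are issued, not in every query.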
Where the argument is less clear: if you're genuinely serving consumers and will never have an organisational identity layer, you can skip it. But "never" is a strong word. Check your assumption twice before committing to single-tenant.
6. Deployment pipeline as a cultural artefact
This is the decision that least feels like architecture and most behaves like it. The deployment pipeline you choose at T0 becomes a cultural artefact that shapes your release cadence, your risk appetite, and eventually the kind of engineer who will work for you.
A pipeline that requires a half-day of ceremony to ship a change produces a culture that doesn't ship on Friday. Not because anyone decided, but because the path of least resistance routes around the pain. A pipeline where deploys are cheap, reversible, and observable produces a culture that ships confidently. The culture hardens faster than most teams realise, and it's very hard to thaw.
The compounding mechanism: every developer who joins the team inherits the deployment culture. Norms calcify. The pipeline becomes a reflection of what the team believes is possible. Changing the pipeline later isn't a CI/CD project. It's a change to the team's relationship with its own work. That's why so many pipeline improvements stall: the technical change is easy; the cultural one isn't.
My position: treat the deployment pipeline as an architectural decision from day one. Not because you need to solve every problem immediately (you don't), but because the defaults you set will harden. Build for reversibility (blue/green, feature flags), observability (logs, metrics, traces as a first-class concern), and short feedback loops (fast tests, fast deploys). Every one of those will feel like overhead at T0 and like a foundation at T+1 year.
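Of the reversibility techniques mentioned, feature flags are the easiest to show in a few lines. This is a deliberately small sketch: the `Flags` class, the flag name `new_pricing`, and the discount rule are all hypothetical, and a real flag store would read from configuration or a flag service instead of an in-memory set.

```python
from dataclasses import dataclass, field


@dataclass
class Flags:
    """In-memory flag store, standing in for config or a flag service."""

    enabled: set = field(default_factory=set)

    def is_on(self, name: str) -> bool:
        return name in self.enabled


def price_v1(cents: int) -> int:
    """Existing behaviour: price is returned unchanged."""
    return cents


def price_v2(cents: int) -> int:
    """New behaviour under test: hypothetical 10% discount, rounded down."""
    return cents * 90 // 100


def quote(flags: Flags, cents: int) -> int:
    """Both code paths ship together; the flag decides which one runs."""
    if flags.is_on("new_pricing"):
        return price_v2(cents)
    return price_v1(cents)
```

Turning the flag off restores the old behaviour without a deploy, which is the property that makes shipping feel cheap enough to do often.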
Where the argument doesn't hold, and where it does
Two legitimate counter-positions are worth naming, because they aren't wrong, they're just about different decisions.
YAGNI. "You aren't gonna need it" is valid for specific features, for speculative generality in code, for infrastructure you're building on the off-chance. It's misapplied when used against architectural decisions whose reversal is asymmetric. The heuristic for telling the difference: if the decision can be deferred without changing the shape of the thing you build next, defer it. If it shapes the next ten decisions, it isn't a YAGNI candidate.
"Premature optimisation is the root of all evil." Donald Knuth's actual quotation is narrower than the folk version. He was writing about micro-optimisation in specific code paths. Architectural decisions about data shape, API boundaries, and team topology aren't premature optimisation. They're structural decisions that change what optimisations are even available to you later. Treating them as optimisation is a category error.
The distinction worth holding: don't sweat small decisions that are fully reversible. Think carefully about structural decisions whose reversal is asymmetrically expensive, even when the stakes appear low at T0.
What this means in practice
The takeaway isn't "agonise over every decision." It's "notice which decisions compound and spend disproportionate effort on those."
The heuristic I use when I sit down with a team at the start of a system: identify the two or three decisions that will shape everything downstream. The identity model. The data shape. The boundary between services, if any. The pipeline's reversibility properties. Spend real time on those. Accept that most other decisions can be corrected later, and move through them at pace.
The teams that do this well end up with fewer architectural regrets at T+3 years. The teams that treat every decision as equal-weight either agonise over the wrong ones and miss the important ones, or rush through the important ones because they look, at T0, like any other choice.
If you're facing one of these decisions now, that's exactly the conversation I find most useful. Architecture advisory engagements are often a single critical decision (the structural one that shapes the next three years) addressed with a senior outside perspective before the team commits. That's a different engagement shape to ongoing technical leadership, and it's often the highest-leverage intervention available.