
Three years ago the decision made sense. The team was smaller. The application was simpler. The traffic was predictable. You picked a deployment approach, a database, a cloud provider, and you shipped. It worked. The product grew.

That's the good news. The less good news is that every new feature got bolted onto an architecture that wasn't designed to carry it. More services depending on the same database. More deployments touching more shared code. More ways for one failure to affect something it should have nothing to do with. The team that used to ship confidently now triple-checks before deploying. The bills that used to feel reasonable are harder to explain in a quarterly review. Nobody made a bad decision. They just kept building on a foundation that was right for a smaller version of the product and never stopped to ask whether it was still right.

None of this is a failure. It's what growth looks like when the infrastructure didn't grow with it. Recognising that gap is the first step toward closing it.

The Monolith That Became a Problem

A lot of engineering teams built monolithic applications when their products were young because monoliths are genuinely faster to build and easier to reason about when the codebase is small and the team is tight. Every feature in one codebase, one deployment, one database. It's simple. It works.

The problems appear gradually. The codebase grows large enough that a change in one area has unpredictable effects on others. Deployments get slower because the entire application has to be built and released together: you can't ship a fix to the checkout flow without also shipping whatever else is in progress in the payments code. When traffic spikes on one part of the application, you can't scale just that part. The whole thing has to scale together, which means paying for compute across the entire application to handle load that only affects one component.
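To put rough numbers on that last point, here is a back-of-the-envelope sketch in Python. It assumes a monolith instance has to be sized (and priced) for the whole application, while a checkout-only instance can be much smaller and still serve the same checkout throughput. Every figure in it is invented for illustration, not taken from a real bill.

```python
import math

# All numbers below are hypothetical and exist only to show the arithmetic.
RPS_PER_INSTANCE = 100         # checkout requests/second one instance can serve
MONOLITH_COST_PER_HOUR = 0.40  # large instance: loads every module of the app
CHECKOUT_COST_PER_HOUR = 0.10  # small instance: runs only the checkout code


def instances_needed(peak_rps: float, rps_per_instance: float) -> int:
    """Number of instances required to absorb a given peak request rate."""
    return math.ceil(peak_rps / rps_per_instance)


def hourly_cost(instances: int, cost_per_instance: float) -> float:
    """Total hourly spend for a fleet of identical instances."""
    return instances * cost_per_instance


peak = 2000  # checkout requests/second during the spike (assumed)
fleet = instances_needed(peak, RPS_PER_INSTANCE)
monolith_spend = hourly_cost(fleet, MONOLITH_COST_PER_HOUR)
checkout_spend = hourly_cost(fleet, CHECKOUT_COST_PER_HOUR)
```

Under these assumptions both approaches need the same number of instances to absorb the spike. What differs is the price of each one, because the monolith carries every other part of the application along with the checkout flow.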

And by the time this becomes obviously painful, the codebase is usually large enough that changing it isn't a small project. It's months of migration work running in parallel with a product roadmap that isn't slowing down to wait for the infrastructure to catch up. The temptation is to keep going and deal with the architecture later. The cost of later tends to be higher than the cost of now, because every new feature added to the existing structure makes the eventual migration harder.

What Microservices Actually Solve — And What They Don't

The standard answer to a scaling monolith is microservices. Break the application into independently deployable services, each with its own data store, each deployable without affecting the others. The checkout service scales independently during peak periods. The payments service can be updated without touching the user management service. Incidents in one service don't cascade across the whole system.

This is all true and the benefits are real. The catch is that microservices introduce a different class of complexity that a monolith doesn't have. Network calls between services that were previously in-process function calls. Distributed transactions that are harder to reason about than local database transactions. Observability requirements that are significantly higher because a request now touches multiple services and you need to be able to trace it across all of them when something goes wrong.
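A minimal sketch of what "tracing a request across services" means in practice: every service copies the same trace ID onto the calls it makes and onto its own log lines, so the logs can be stitched back together later. The header name, the two service functions and the in-memory log are all hypothetical stand-ins. Real systems typically use a tracing library and a standard header such as W3C `traceparent` rather than hand-rolled code like this.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name for this sketch
LOG = []                     # stand-in for a centralised log store


def log(service, headers, message):
    """Record a log line tagged with the request's trace ID."""
    LOG.append({
        "service": service,
        "trace_id": headers[TRACE_HEADER],
        "message": message,
    })


def payments_service(headers, amount):
    """Hypothetical downstream service; logs under the caller's trace ID."""
    log("payments", headers, f"charging {amount}")
    return {"status": "charged"}


def checkout_service(headers, amount):
    """Hypothetical entry point; starts a trace if the caller didn't send one."""
    headers = dict(headers)  # don't mutate the caller's headers
    headers.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    log("checkout", headers, "order received")
    # In a real system this is a network call; the header travels with it.
    result = payments_service(headers, amount)
    log("checkout", headers, f"payment {result['status']}")
    return headers[TRACE_HEADER]


def entries_for(trace_id):
    """Everything any service logged for one request, in order."""
    return [entry for entry in LOG if entry["trace_id"] == trace_id]
```

Calling `checkout_service({}, 25)` and passing the returned ID to `entries_for` gives back the log lines from both services for that one request, which is the property you need when something goes wrong halfway through.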

Teams that migrate to microservices without first investing in the infrastructure to support them (automated deployment pipelines, distributed tracing, centralised logging, a service mesh for inter-service communication) often end up with an architecture that is harder to operate than the monolith was. The problem wasn't the monolith. The problem was trying to run microservices without the tooling that makes them manageable.

Getting the sequence right means building the operational foundation before breaking up the application. The deployment pipeline, the observability stack, the security model — these go first. Then the migration can happen incrementally, one service at a time, with each newly extracted service running in a properly instrumented environment from day one.
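One common way to do the "one service at a time" part is to put a routing layer in front of the monolith and move path prefixes over as each service is extracted, an approach often called the strangler fig pattern. The sketch below is an illustration of that idea, not a prescription: the class, the path prefixes and the service names are all hypothetical, and in practice this logic usually lives in a gateway or load balancer configuration rather than application code.

```python
MONOLITH = "monolith"


class MigrationRouter:
    """Sends requests to an extracted service if one owns the path,
    and to the monolith otherwise."""

    def __init__(self):
        self._extracted = {}  # path prefix -> name of the service that owns it

    def extract(self, prefix, service):
        """Mark a path prefix as served by a newly extracted service."""
        self._extracted[prefix.rstrip("/")] = service

    def route(self, path):
        """Return the backend for a path; the longest matching prefix wins."""
        matches = [
            prefix for prefix in self._extracted
            if path == prefix or path.startswith(prefix + "/")
        ]
        if not matches:
            return MONOLITH
        return self._extracted[max(matches, key=len)]


router = MigrationRouter()
before = router.route("/checkout/cart")   # nothing extracted yet
router.extract("/checkout", "checkout-service")
after = router.route("/checkout/cart")    # now owned by the new service
untouched = router.route("/users/42")     # still served by the monolith
```

The useful property is that each extraction is a small, reversible change: removing a prefix from the router sends that traffic back to the monolith.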

The Database That's Holding Everything Back

One architectural decision that quietly limits what a cloud infrastructure can do is the database layer. A single relational database that serves every part of the application is a natural starting point. It's also a ceiling. Every query from every feature competes for the same connection pool. Schema changes that one team needs can break queries another team depends on. Scaling the database means scaling a single instance, which gets expensive and eventually hits hardware limits.
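The connection pool contention is easy to picture with a toy model. In the sketch below a pool is just a counter of free slots. The pool sizes and the "reporting" and "checkout" features are hypothetical, and a real pool would queue or time out rather than fail immediately.

```python
import threading


class ConnectionPool:
    """Toy pool: tracks how many connections are free, nothing more."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def try_acquire(self):
        """Take a connection if one is free; return False instead of waiting."""
        return self._slots.acquire(blocking=False)

    def release(self):
        """Hand a connection back to the pool."""
        self._slots.release()


# One shared pool: a burst of slow reporting queries takes every connection...
shared = ConnectionPool(size=3)
reporting_held = [shared.try_acquire() for _ in range(3)]
# ...so an unrelated checkout query finds nothing left.
checkout_on_shared = shared.try_acquire()

# Separate pools: the same reporting burst no longer affects checkout.
reporting_pool = ConnectionPool(size=3)
checkout_pool = ConnectionPool(size=3)
for _ in range(3):
    reporting_pool.try_acquire()
checkout_on_own = checkout_pool.try_acquire()
```

In the shared case the checkout query is refused even though checkout itself is not under load, which is the coupling the paragraph above describes.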

The path forward usually involves decomposing the database alongside the application — giving each service its own data store, sized and configured for what that service actually needs. A search feature might move to Elasticsearch. An event log might move to a time-series database. A service that needs flexible schema might move to a document store. The core transactional data might stay relational but in a managed cloud service that handles replication and failover without the team having to manage it.

This is significant work and it introduces new complexity around data consistency across services. But it also removes the single-database bottleneck that caps how far the architecture can scale and how independently teams can work. Done incrementally, starting with the services that have the clearest data separation from everything else, it's achievable without disrupting the running system.

The Traffic Spike That Exposed the Gap

There's a specific moment that forces most cloud modernisation conversations. Not a gradual accumulation of small problems, but one event where the infrastructure failed visibly. A product launch that drove ten times the usual traffic and the application fell over. A promotional campaign that worked better than expected and the system couldn't handle the load. A viral moment the team should have been celebrating, but spent managing an incident instead.

These events are expensive in customer trust and engineering time. They're also diagnostic. A system that falls over under unexpected load is telling you something specific about where the architecture is fragile. Auto-scaling wasn't configured properly. A database that handles most traffic fine became the bottleneck under load. A component that nobody worried about turned out to be a single point of failure.
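As one example of how a misconfigured auto-scaler shows up, the sketch below uses the proportional rule that the Kubernetes Horizontal Pod Autoscaler documents (desired replicas = ceil(current replicas × current metric / target metric)), clamped to a configured minimum and maximum. The traffic figures and limits are invented, and other auto-scalers use different rules.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Proportional scaling rule, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical spike: 4 replicas, each meant to serve 100 requests/second,
# suddenly seeing 1000 requests/second apiece (ten times the usual load).
capped = desired_replicas(4, current_metric=1000, target_metric=100,
                          min_replicas=2, max_replicas=10)
roomy = desired_replicas(4, current_metric=1000, target_metric=100,
                         min_replicas=2, max_replicas=50)

# Load each replica ends up carrying once scaling has finished.
total_rps = 4 * 1000
per_replica_capped = total_rps / capped
per_replica_roomy = total_rps / roomy
```

With the maximum set to 10, the scaler does exactly what it was told and each replica still carries four times its target load. Raising the ceiling to 50 lets the same rule reach 40 replicas and bring the per-replica load back to target. The incident in that case was a limit chosen for a smaller product, not a failure of the scaling mechanism.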

The useful response to these events isn't just to fix the immediate problem. It's to treat the incident as a map of the architectural work that needs to happen before the next spike. Traffic spikes will happen again — either from growth or from success. The infrastructure should be built to handle them without requiring an all-hands incident response.

What Edge Computing Changes for Real-Time Use Cases

For most applications, central cloud infrastructure is fine. The latency of a request travelling to a cloud region and back is typically measured in tens of milliseconds, and users don't notice it. For some use cases, that delay is too long.

Manufacturing equipment that needs to respond to sensor data in real time can't wait for a round trip to a cloud region. A retail system running personalisation at the point of sale needs to make decisions in milliseconds, not hundreds of milliseconds. A healthcare application performing clinical inference at the bedside has latency requirements that central cloud infrastructure doesn't consistently meet.
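A quick way to sanity-check whether a use case can tolerate a central region is to compare its latency budget against the physical floor on the round trip. The sketch below uses the rough rule that light in optical fibre covers about 200 km per millisecond. The distances, processing times and budgets are assumptions for illustration, and real round trips are slower than this floor because of routing, queuing and protocol overhead.

```python
FIBRE_KM_PER_MS = 200.0  # approximate speed of light in optical fibre


def min_round_trip_ms(distance_km):
    """Lower bound on network round-trip time over a fibre path of this length."""
    return 2 * distance_km / FIBRE_KM_PER_MS


def fits_budget(distance_km, processing_ms, budget_ms):
    """True if the best-case round trip plus processing fits the latency budget."""
    return min_round_trip_ms(distance_km) + processing_ms <= budget_ms


# Hypothetical control loop with a 10 ms budget and 3 ms of processing time.
central_region_ok = fits_budget(distance_km=1000, processing_ms=3, budget_ms=10)
on_site_edge_ok = fits_budget(distance_km=1, processing_ms=3, budget_ms=10)
```

Under these assumptions a region 1,000 km away already uses the whole 10 ms budget on the best-case round trip alone, before any processing, while a node on site leaves almost all of the budget for the work itself.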
