Scalability is sometimes confused with performance, but it’s not at all the same thing.
In this article, we’ll explore the differences between them using a simple model!
We’ll start by answering a more fundamental question. It will allow us to determine the scope of our model.
What is performance?
When people talk about the performance of a distributed system, they’re usually considering one of two metrics:
- Latency, which is the time it takes the system to process a single job (on average).
- Throughput, which is the number of jobs the system can process in a given period.
Low Latency automatically means having higher Throughput. But in spite of that, software architecture typically focuses on increasing Throughput through other means.
This is because reducing Latency is something that can only be done on a per-component basis, it’s subject to severe diminishing returns, and there are inherent limits to how low you can go.
It’s much easier to increase Throughput, even if it comes at the expense of Latency. So that’s what our model is going to focus on.
Boxes and jobs
This model involves just two entities — boxes and jobs.
A job stands for pretty much anything a server could be doing — like answering an HTTP request, processing a credit card transaction, or running a database query.
Jobs are processed by boxes. A box could stand in for a VM, container, or maybe even a process inside a single machine.
The rules for how boxes process jobs are very simple:
- Every job takes a fixed amount of time to complete. That is, Latency is fixed.
- Every box is identical to any other. They all process jobs at the same rate.
- Each box can only process one job at a time.
None of these things is true of real-world jobs, but these simplifications are what makes the model easy to reason about.
And here it is so far, in all its glory:

With a single box, our performance metrics are directly connected:

Throughput = 1 / Latency
But it’s hardly a distributed system if there’s only one box! Which is why we have lots of them:

We’ll call the number of these boxes the BoxCount. In that case, the expression becomes:

Throughput = BoxCount / Latency

While Latency is fixed, the BoxCount isn’t. We can change it however we want. To increase Throughput, we just get more boxes!
Sadly, these boxes aren’t free. They actually cost money.
While in the real world, different boxes have different price tags, in our little model we just have one kind of box. And it comes at a fixed Cost per unit of time.
Since Throughput is proportional to the BoxCount times a constant factor, we can actually simplify the relation to just:

Cost ∝ Throughput
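To make those relations concrete, here’s a minimal sketch in Python (the specific numbers — the 0.2-second Latency and the per-box price — are made-up examples, not part of the model):

```python
# A tiny sketch of the model: fixed Latency per job, identical boxes,
# and Throughput that grows linearly with the BoxCount.

LATENCY = 0.2        # seconds per job (fixed, by assumption)
COST_PER_BOX = 0.05  # dollars per box per second (illustrative)

def throughput(box_count: int) -> float:
    """Jobs per second the system can handle: BoxCount / Latency."""
    return box_count / LATENCY

def cost(box_count: int) -> float:
    """Dollars per second we pay: proportional to BoxCount,
    and therefore proportional to Throughput."""
    return box_count * COST_PER_BOX

print(throughput(10))  # 50.0 jobs/s
print(cost(10))        # 0.5 dollars/s
```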
The JobRate
So we have this system and jobs are coming in. But how fast are they coming?
Having a given Throughput means we have the capacity to handle that many jobs per second (or whatever the unit is). But are we actually getting that many jobs per second?
Maybe we’re getting far fewer, or far more. This is the JobRate, and whatever it is, our system must be prepared to handle it.
The JobRate is not constant — it’s a function of time, JobRate(t), and in almost all cases we can neither predict nor control it. It could be a flat line or it could fluctuate like crazy.
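As a toy illustration, here are two made-up JobRate functions — one flat, one that swings over a daily cycle. The shapes and numbers are invented, purely to show that the JobRate is a function of time:

```python
import math

def flat_job_rate(t: float) -> float:
    """A JobRate that never changes: 100 jobs/s at all times."""
    return 100.0

def daily_job_rate(t: float) -> float:
    """A JobRate that swings over a 24-hour cycle (t in seconds),
    peaking during the 'day' and dropping at 'night'."""
    day_fraction = (t % 86_400) / 86_400
    return 100.0 + 80.0 * math.sin(2 * math.pi * day_fraction)
```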
Here are some examples of what it might look like:

Missed jobs
If JobRate > Throughput, there are jobs that aren’t being handled.
This usually means the jobs end up being queued somewhere (possibly multiple somewheres). For example:
- Your HTTP server.
- The user’s web browser.
- A message broker like Kafka.
As jobs get queued, one of the following happens:
- They get less and less relevant. Maybe they’re stock orders that become less valuable the longer we wait to fulfill them.
- Their worth is constant until some deadline, after which it drops to zero, like HTTP transactions that time out.
- So many jobs get queued that they cause the messaging infrastructure to crash, leading to total system failure.
The specifics don’t actually matter for the purposes of this article — in almost all cases, not handling jobs is a Very Bad Thing and somewhere we really, really don’t wanna be.
We’ll just consider it a fail state of the model.
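As a rough back-of-the-envelope sketch of why this is a fail state: as long as the JobRate exceeds Throughput, the backlog keeps growing without bound (the numbers below are made up):

```python
def backlog_after(seconds: float, job_rate: float, throughput: float) -> float:
    """Jobs sitting in a queue after `seconds` of sustained overload.
    Grows without bound as long as job_rate > throughput."""
    return max(0.0, (job_rate - throughput) * seconds)

# 120 jobs/s arriving, capacity for 100 jobs/s: after ten minutes,
# 12,000 jobs are waiting somewhere -- and still piling up.
print(backlog_after(600, job_rate=120, throughput=100))  # 12000.0
```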
Utilization
But what about having more Throughput than we need? That is, a situation where:

JobRate < Throughput

That sounds nice, but since Cost is proportional to Throughput, it basically means we’re paying for infrastructure that’s going to waste.
We can get a figure for Utilization — how much of our system is actually being used to process these jobs, versus standing idle and racking up Cost.
It’s simply:

Utilization = JobRate / Throughput
- Utilization < 1 is the scenario we’ve just described. We have more than enough Throughput to handle incoming jobs. We’re still paying for that extra Throughput, though.
- Utilization = 1 means we’re right on the money — every cent we’re pumping into our system is translated into jobs that get processed. It’s the ideal scenario. It’s also dangerously close to failure.
- Utilization > 1 just means we’re not handling jobs, which is a Very Bad Thing and something we’re not considering right now.
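In code, the formula and the three cases above might be sketched like this (the example numbers are arbitrary):

```python
def utilization(job_rate: float, throughput: float) -> float:
    """Utilization = JobRate / Throughput."""
    return job_rate / throughput

def describe(u: float) -> str:
    """Classify a Utilization value, mirroring the three bullets above."""
    if u > 1.0:
        return "missing jobs -- fail state"
    if u == 1.0:
        return "every cent turned into processed jobs, but no headroom"
    return "handling everything, paying for idle boxes"

print(describe(utilization(job_rate=80, throughput=100)))   # idle headroom
print(describe(utilization(job_rate=120, throughput=100)))  # fail state
```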
The theoretical maximum here is Utilization = 1, but having lower Utilization is almost always better than missing jobs, so you usually want to have some Throughput left over. That translates to having a Utilization that’s a little less than 1.
The exact amount of headroom you want is a hard question to answer and really depends on what you’re actually doing.
Which is why we’ll answer a different question instead. How would you want Utilization to change over time as the JobRate changes?
The answer is that you don’t!
If you can keep Utilization at 0.5 whether the JobRate is tiny or enormous, you’re playing it safe — half of your infrastructure is idling — but you’re still winning!
On the other hand, failure looks like Utilization dropping way down or climbing above 1. Both of those should be avoided.
Of course, if the JobRate keeps changing, the only way to keep Utilization the same is to change your Throughput to match. Latency is fixed, so we have to change the BoxCount instead.
That’s scaling. Being able to do that is called scalability.
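Here’s a minimal sketch of that idea in Python — picking a BoxCount so Utilization stays near a target, assuming we can re-measure the JobRate and re-provision boxes at will. The 0.7 target and the numbers are illustrative, not part of the model:

```python
import math

LATENCY = 0.2            # seconds per job (fixed)
TARGET_UTILIZATION = 0.7  # example headroom target, not a recommendation

def boxes_needed(job_rate: float) -> int:
    """Pick a BoxCount so that Utilization lands near the target.

    Utilization = JobRate / Throughput = JobRate * Latency / BoxCount,
    so BoxCount = JobRate * Latency / TargetUtilization (rounded up).
    """
    return max(1, math.ceil(job_rate * LATENCY / TARGET_UTILIZATION))

print(boxes_needed(100))     # 29 boxes
print(boxes_needed(10_000))  # 2858 boxes
```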
The point of it all
The goal of scaling your system is to meet your target Utilization. That doesn’t just mean getting more Throughput — it also means getting less of it as needed.
People tend to focus on the getting more part. I think that’s because it’s more glamorous, in the same way people tend to talk about climbing up mountains, even though climbing back down is often harder.
In any case, building a system like that is much harder than just getting high Throughput. A real-world system is a heterogeneous mass of different components that have very different scaling characteristics.
In some cases, it may not be obvious how some of these components can be scaled at all.
What’s more, as a system scales, its structure changes, and it becomes vulnerable to different sorts of issues. Problems that occur at one level of scale may not occur in another, making such issues hard to debug.
This kind of stuff happens whether you’re scaling up or down.
The cloud
In fact, scalability is so difficult that few companies attempt to achieve it by themselves.
Instead, they pay huge cloud providers to solve some parts of these problems for them, even if it means also paying a pretty steep markup on actual processing power — and on Latency.
The best example of all of this is the serverless platform.
One paper[1] found that, when compared to a virtual machine, serverless platforms cost several times more for the same Throughput, with several times the Latency.
That would be lunacy from a performance standpoint, but it makes perfect sense when considering scalability — your JobRate might vary by a huge factor over a single day.
Compared to that, even a steep cost multiplier might not be so scary, especially if it lets you avoid paying for so many engineers.
Conclusion
In short, scalability is being able to change your system’s throughput based on demand.
Sometimes, people talk about improving scalability when they actually just mean making stuff run faster.
That’s important too, but serverless platforms (and the cloud computing model in general) are proof that you can have scalability without high performance, and that people will happily pay lots of money to have it.
I hope you’ll join me for future articles, in which we’ll use slightly more complicated models to look at more advanced qualities of distributed systems!