Scalability is sometimes confused with performance, but they're not the same thing at all.
To show that, we’re going to build a simple model. But before we start, we need to answer a more basic question. It’ll allow us to decide the scope of our model.
What is performance?
When people talk about the performance of a distributed system, they’re usually considering one of two metrics:
- Latency, which is the time it takes the system to process a single job (on average).
- Throughput, which is the number of jobs the system can process in a given period.

Lower Latency automatically raises Throughput. But in spite of that, distributed systems generally ignore Latency.

That's because reducing Latency is very hard. There's no surefire way of doing it, and it suffers from severe diminishing returns.

It's much easier to increase Throughput, even if it comes at the expense of Latency. That's how modern distributed systems work, and it's what we're going to focus on.
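To see how the two metrics can move independently, here's a tiny arithmetic sketch (the systems and numbers are made up for illustration): two setups with identical Throughput but a 10x difference in Latency.

```python
# System A: a single fast worker.
latency_a = 0.1                  # seconds per job
throughput_a = 1 / latency_a     # 10 jobs per second

# System B: ten slow workers running in parallel.
latency_b = 1.0                  # seconds per job
throughput_b = 10 * (1 / latency_b)  # also 10 jobs per second

print(throughput_a, throughput_b)  # 10.0 10.0 -- same Throughput
print(latency_a, latency_b)        # 0.1 1.0   -- 10x worse Latency
```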
Boxes and jobs
This model involves just two entities – boxes and jobs.
A job stands for anything a server could be doing – like answering an HTTP request, processing a credit card transaction, or running a database query.

Jobs get handled by boxes. A box could stand in for a VM, a container, or maybe even a process inside a single machine.
The rules for how boxes handle jobs are very simple:
- Every job takes a fixed amount of time to complete. This is the JobTime.
- Every box handles jobs at the same rate.
- Every box can only process one job at a time.
None of these things is true of real-world jobs, but these simplifications are what makes the model easy to reason about.
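To make the rules concrete, here's a minimal Python sketch of the model. The names (Box, try_accept, JOB_TIME) are my own, and this is just one way to encode the three rules:

```python
from dataclasses import dataclass

JOB_TIME = 1.0  # rule 1: every job takes the same fixed time to complete


@dataclass
class Box:
    busy_until: float = 0.0  # when the job currently being processed will finish

    def try_accept(self, now: float) -> bool:
        """Rule 3: a box takes a new job only if it isn't already processing one."""
        if now < self.busy_until:
            return False  # still busy with the previous job
        # Rule 2: every box works at the same rate, so every job takes JOB_TIME.
        self.busy_until = now + JOB_TIME
        return True
```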
A single box
Here is the bare-bones version:
With a single box, our performance metrics are directly connected:

Throughput = 1 / JobTime

If the JobTime is 100 milliseconds, for instance, one box gives us a Throughput of 10 jobs per second.
Lots of boxes
But it's hardly a distributed system if there's only one box. Which is why we have lots of them:

Where do these boxes come from, though? The rules are really simple:
- We can create boxes instantly.
- When a box appears, it immediately starts accepting jobs.
- We can also remove boxes.
- But if a box is processing a job, we first wait for that job to complete.
We'll call the number of these boxes the BoxCount. In that case, our expression becomes:

Throughput = BoxCount / JobTime

While the JobTime is constant, the BoxCount isn't. We can change it in any way we want. To increase Throughput, we just get more boxes!
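As a quick sanity check, here's that expression in code (the function name and the numbers are just illustrative):

```python
def throughput(box_count: int, job_time: float) -> float:
    """Jobs per second the whole fleet of boxes can complete."""
    return box_count / job_time

# With a JobTime of 0.5 seconds, 4 boxes handle 8 jobs per second.
print(throughput(4, 0.5))  # 8.0
# Doubling the BoxCount doubles the Throughput.
print(throughput(8, 0.5))  # 16.0
```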
No such thing as a free lunch
Sadly, these boxes aren't free. They actually cost money.

While in the real world different boxes have different price tags, in our model every box has a fixed cost per unit of time. We'll pick a cost of 1 for simplicity, giving us:

Cost = BoxCount

Combining this with the last equation, we find that Cost and Throughput are directly correlated:

Cost = Throughput × JobTime
This tells us that if we want to handle more jobs, we need to spend more money.
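Here's the cost side sketched in code, under the fixed-price-per-box assumption above (the helper names are mine):

```python
COST_PER_BOX = 1.0  # the fixed price of one box per unit of time


def cost(box_count: float) -> float:
    return box_count * COST_PER_BOX


def cost_of_throughput(target_throughput: float, job_time: float) -> float:
    # Invert Throughput = BoxCount / JobTime to get the BoxCount we need.
    return cost(target_throughput * job_time)


# Handling twice the jobs costs twice the money.
print(cost_of_throughput(8, 0.5))   # 4.0
print(cost_of_throughput(16, 0.5))  # 8.0
```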
The JobRate
So we have this system, and jobs are coming in. But how fast are they coming?
Having a Throughput of, say, 10 means we have the capacity to handle 10 jobs per second. But are we actually getting 10 jobs per second?

Maybe we're getting 5, or maybe 50. That's the JobRate, and whatever it is, our system needs to handle it.

The JobRate isn't constant – it's actually a function of time, or JobRate(t), and in almost all cases we can neither predict nor control it. It could be a flat line, or it could fluctuate like crazy.
Here are some potential curves:
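For instance, a flat line, a day/night cycle, and a sudden spike might look like this (the shapes and numbers are made-up examples):

```python
import math


def flat(t: float) -> float:
    return 100.0  # a constant 100 jobs per second, forever


def daily_cycle(t: float) -> float:
    # Swings smoothly between 20 and 180 jobs per second over a 24-hour day.
    return 100.0 + 80.0 * math.sin(2 * math.pi * t / 24)


def traffic_spike(t: float) -> float:
    # A quiet baseline with a sudden 10x burst during hour 12.
    return 1000.0 if 12 <= t < 13 else 100.0
```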
Missed jobs
If JobRate > Throughput, we're not handling some of the jobs.

In the real world, that's inevitable. But in our model, any job we're not handling has nowhere to go.
It becomes a missed job, which is a Very Bad Thing in any system. We'll just say it's a fail state of the model.
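In code, the fail state is easy to detect – whenever the JobRate exceeds our Throughput, the difference is jobs we're dropping (a sketch, with illustrative numbers):

```python
def missed_jobs(job_rate: float, throughput: float) -> float:
    """Jobs per second that arrive when no box is free to take them."""
    return max(0.0, job_rate - throughput)


print(missed_jobs(120.0, 100.0))  # 20.0 -- the fail state: 20 jobs/second are lost
print(missed_jobs(80.0, 100.0))   # 0.0  -- everything gets handled
```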
Utilization
But what about having more Throughput than we need? That is, a situation where:

JobRate < Throughput

That sounds nice, but since Cost is proportional to Throughput, we're paying for hardware that's going to waste.

We can get a figure for Utilization – how much of our system is actually being used to process jobs, versus standing idle and racking up costs.

It's simply:

Utilization = JobRate / Throughput
- Utilization < 1 is the situation we've just described. We have more than enough Throughput to handle our jobs. We're still paying for that extra Throughput, though.
- Utilization = 1 means we're right on the money. We're translating every cent we're pumping into our system into processing jobs.
- Utilization > 1 just means we're not handling some of the jobs, which is the Very Bad Thing.
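Here's the same three-way split in code (a sketch; the function name is mine):

```python
def utilization(job_rate: float, throughput: float) -> float:
    return job_rate / throughput


u = utilization(80.0, 100.0)
if u > 1:
    print("missing jobs")                       # the Very Bad Thing
elif u == 1:
    print("every cent is processing jobs")
else:
    print(f"{1 - u:.0%} of our spend is idle")  # here: "20% of our spend is idle"
```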
In our model, we just want to have a Utilization of 1. But in the real world, we really don't want to miss any jobs, so it's better to have some Throughput left over.
That translates to having a Utilization that's a bit less than 1 – 0.8, say.
The exact amount of headroom we want is outside the scope of our model, and really depends on what we’re actually doing.
Which is why we'll answer a different question instead: how would we want Utilization to change over time as the JobRate changes?
We don’t want it to change
If we can keep Utilization at 0.5 whether the JobRate is low or high, we're playing it safe – half of our infrastructure is idling – but we're still winning!

On the other hand, failure looks like Utilization dropping toward 0 or rising above 1. Both are situations we want to avoid.

Of course, if the JobRate keeps changing, the only way to keep Utilization the same is changing our Throughput to match. We're fixing the JobTime in place, so we have to change the BoxCount instead.
In other words, we need the ability to create and remove boxes based on the amount of work we need to do.
That’s what it means to scale. Being able to do that is called scalability.
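In this model, a naive autoscaler is just a formula: solve Utilization = JobRate / (BoxCount / JobTime) for BoxCount at our target Utilization. This sketch assumes the 0.5 target from above and rounds up to stay on the safe side:

```python
import math

JOB_TIME = 1.0
TARGET_UTILIZATION = 0.5  # keep half the fleet idle as headroom


def desired_box_count(job_rate: float) -> int:
    # BoxCount = JobRate * JobTime / Utilization, rounded up so we
    # never have less headroom than we asked for.
    return math.ceil(job_rate * JOB_TIME / TARGET_UTILIZATION)


print(desired_box_count(100.0))  # 200 boxes
print(desired_box_count(10.0))   # 20 boxes -- scaling down saves money
```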
Why systems scale
The goal of scaling our system is to meet our target Utilization. That doesn't just mean getting more Throughput – it also means getting less of it as needed.
A real-world system is a mess of different components that behave and scale very differently. Building a system that can grow and shrink on demand is both harder and more important than raw performance.
Buying scalability
In fact, scalability is so difficult that few companies try to achieve it by themselves.
Instead, they pay huge cloud providers to solve some parts of these problems for them, even if it means paying a pretty steep markup on the hardware.
The serverless platform is the best example of this.
One paper[1] found that, when compared to a virtual machine, serverless platforms cost several times more for the same Throughput.
That would be lunacy from a performance standpoint, but it makes perfect sense when considering scalability – our JobRate might vary by orders of magnitude over a single day.

Compared to that, even a severalfold markup might not be so scary, especially if it lets us reduce costs in other ways.
Conclusion
Scalability is being able to change our system’s throughput based on demand, while performance typically refers to that throughput.
Sometimes, people talk about improving scalability when they just mean making stuff run faster.
That’s important too, but serverless platforms are proof that many people care more about scalability than performance.
Stay tuned for future posts, where we’ll dive into more advanced models of distributed systems.