System Scalability — Web servers

As the business grows, we need to prepare our application to handle a significant increase in traffic, such as millions of requests arriving at the same time.

Single Server = Single point of failure

A single-server design is suitable when the budget is tight, there is little time for server maintenance, and downtime causes no critical damage.

However, it poses a big challenge for a commercial website, because a single server is a single point of failure. If something goes wrong with that one and only server, we lose all the traffic. Even if we are able to recover it, customers have little patience in the digital world.

Vertical Scaling

Imagine the application is data-intensive and interacts with the database frequently. If the web server and the database run on the same machine, the database workload can consume most of the machine's resources, leaving little room for the web server to serve other HTTP connections.

In this case, we can put the web server and the database on separate machines. The database gets a more powerful box to do its job, and the web server has dedicated resources to process all HTTP requests.

The benefit of this approach is that we don't need to spend much time maintaining the servers. After all, there are probably only one or two boxes to take care of.

However, each role still runs on a single server, so we still face the risk of a single point of failure. We probably won't lose the web server and the database at the same time, but if either one goes down, our application loses its functionality.

And even if we keep moving to bigger and bigger machines to handle more and more traffic, machines only come so large; there is only so much CPU and memory we can get. At some point, we hit the wall.

Horizontal Scaling

The idea of horizontal scaling is that, in response to growing request volume, we add more servers to the fleet and set up a load balancer between the Internet and the server fleet to distribute the workload evenly.

If one server goes down, the load balancer should detect it and reroute the traffic automatically, so our application stays up and running. This is the beauty of horizontal scaling. On the other hand, more effort is needed to maintain a fleet of servers.
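
To make the idea concrete, here is a minimal sketch of a load balancer in Python. It assumes a simple round-robin policy, which is only one of several ways to distribute requests, and it assumes that some external health check calls `mark_down` when a server stops responding. The class name, method names, and server names are all hypothetical and chosen for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates through servers and skips unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        # Assumed to be called by a health check when a server stops responding.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Assumed to be called when a known server passes its health check again.
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Look at each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_round = [balancer.pick() for _ in range(3)]

# Simulate a failure: traffic keeps flowing to the remaining servers.
balancer.mark_down("web-2")
after_failure = [balancer.pick() for _ in range(4)]
```

With all three servers healthy, requests rotate through the whole fleet. Once `web-2` is marked down, it is skipped and the remaining two servers share the traffic.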

One thing to highlight is that, in this architecture, the web servers should be stateless. That means each server can handle any request at any time, but it does not know whether the current request is a follow-up to a previous one. This makes sense because every request is routed by the load balancer, so we cannot guarantee that requests from a given user will always hit the same server.
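
One common way to keep web servers stateless is to hold per-user data in a store that every server can reach, rather than in any one server's memory. The sketch below illustrates that idea under stated assumptions: the shared store is a plain dictionary standing in for an external database or cache, and the `WebServer` class, session ID, and cart contents are hypothetical.

```python
# Hypothetical shared store. In practice this would be an external service,
# such as a database or cache, that every web server can reach.
shared_sessions = {}


class WebServer:
    """Toy stateless server: keeps no per-user data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item=None):
        # Everything needed to serve the request comes from the shared store.
        cart = self.store.setdefault(session_id, [])
        if item is not None:
            cart.append(item)
        return {"served_by": self.name, "cart": list(cart)}


server_a = WebServer("web-1", shared_sessions)
server_b = WebServer("web-2", shared_sessions)

# The same user's requests land on different servers, yet the cart is intact.
server_a.handle("user-42", item="book")
response = server_b.handle("user-42", item="pen")
```

Because neither server holds the cart itself, the load balancer is free to send each request to whichever server is available.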


