System Scalability — Web servers
As the business grows, we need to prepare our application to handle a significant increase in traffic, potentially millions of requests arriving at the same time.
Single Server = Single Point of Failure
A single-server architecture may be good enough for simple purposes with little traffic, such as a personal website or a marketing site used only for demonstration.
This design is suitable when the budget is tight, there is little time to maintain the server, and there is no critical damage even if the server goes down.
However, it poses a serious challenge for a commercial website, because a single server is a single point of failure. If anything goes wrong with that one and only server, we lose all of the traffic, and even if we can recover it, customers have little patience in the digital world.
One approach to improving the single-server design is vertical scaling: separate the web server and the database so that each can be scaled independently.
Imagine a data-intensive application that interacts with the database frequently. If the web server and the database live on the same machine, the database workload will consume most of the computing resources, leaving no room for the web server to serve other HTTP connections.
In this case, we can put the web server and the database on separate machines. The database gets a more powerful box to do its job, and the web server has dedicated resources to process all HTTP requests.
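In configuration terms, the separation is often as simple as pointing the web server at a remote database host instead of a local one. A minimal sketch (the host names and port below are hypothetical; a real setup would load these from environment variables or a config file):

```python
# Single-box design: web server and database share one machine,
# so the application talks to the database over localhost.
DB_CONFIG_SINGLE_BOX = {"host": "localhost", "port": 5432}

# Separated design: the database runs on its own, more powerful machine,
# and the web server reaches it over the internal network.
DB_CONFIG_SEPARATED = {"host": "db.internal.example.com", "port": 5432}

def connection_url(cfg):
    """Build a database connection URL from a host/port config."""
    return f"postgresql://{cfg['host']}:{cfg['port']}/app"
```

The application code stays the same in both designs; only the connection target changes, which is what makes this an easy first step.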
The benefit of this approach is that we don't need to spend much time maintaining the servers. After all, there are only one or two boxes to take care of.
However, it is still a single-server design, and we still face the risk of a single point of failure. We probably won't lose the web server and the database at the same time, but if either one goes down, our application loses its functionality.
And even if we keep buying a bigger machine for more and more traffic, machines only come so large; there is only so much CPU and memory we can get. At some point, we hit a wall.
So, instead of a single server, let's put up multiple servers!
The idea of horizontal scaling is that, in response to growing request volume, we add more servers to the fleet and set up a load balancer between the Internet and the fleet to distribute the workload evenly.
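The simplest even-distribution strategy is round-robin: send each incoming request to the next server in rotation. A minimal sketch of that idea (the server addresses are hypothetical; a real balancer would discover its fleet dynamically):

```python
import itertools

# Hypothetical backend addresses standing in for the server fleet.
SERVERS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

class RoundRobinBalancer:
    """Hand out servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Each call returns the next server in the rotation.
        return next(self._cycle)

lb = RoundRobinBalancer(SERVERS)
picks = [lb.pick() for _ in range(6)]
# Six requests, three servers: each server is picked exactly twice.
```

Production load balancers offer richer strategies too (least connections, weighted round-robin), but the rotation above captures the core of "distribute the workload in an even manner."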
If one server goes down, the load balancer should detect it and reroute the traffic automatically, so our application is always up and running. This is the beauty of horizontal scaling; on the other hand, extra effort is needed to maintain a fleet of servers.
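The rerouting behavior can be sketched by extending round-robin with health tracking: the balancer simply skips any server currently marked unhealthy. This is a toy model, with the health checks themselves assumed to happen elsewhere (real balancers probe backends periodically, for example with an HTTP ping):

```python
class HealthAwareBalancer:
    """Round-robin over the fleet, skipping servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = {s: True for s in servers}
        self._i = 0

    def mark_down(self, server):
        # In practice a periodic health check would flip this flag.
        self.healthy[server] = False

    def mark_up(self, server):
        self.healthy[server] = True

    def pick(self):
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = self.servers[self._i % len(self.servers)]
            self._i += 1
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy servers in the fleet")

lb = HealthAwareBalancer(["server-a", "server-b", "server-c"])
lb.mark_down("server-b")
picks = [lb.pick() for _ in range(4)]
# Traffic flows around the failed server: server-b never appears in picks.
```

When `server-b` recovers and `mark_up` is called, it automatically rejoins the rotation, which is exactly the self-healing property described above.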
One thing to highlight: in this architecture, the web servers should be stateless. That means each server can handle any request at any time, but it does not know whether the current request is a follow-up to a previous one. This makes sense because every request is routed by the load balancer, so we cannot guarantee that requests from a given user will always hit the same server.
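The usual way to stay stateless is to keep per-user state out of server memory and in a shared external store (commonly Redis or a database). A minimal sketch, using a plain dict as a stand-in for that shared store; the handler and its parameters are hypothetical:

```python
# Stand-in for an external, shared session store (e.g. Redis or a database).
# No individual web server holds this state in its own memory.
SESSION_STORE = {}

def handle_request(session_id, action, server_name):
    """Any server in the fleet can process any request for any user,
    because the session state lives outside the servers."""
    session = SESSION_STORE.setdefault(session_id, {"count": 0})
    if action == "visit":
        session["count"] += 1
    return f"{server_name} served visit #{session['count']} for {session_id}"

# The load balancer may route consecutive requests from the same user
# to different servers, yet the visit count carries over correctly.
r1 = handle_request("user-42", "visit", "server-1")
r2 = handle_request("user-42", "visit", "server-2")
```

Because neither server remembers anything between calls, any of them can be added, removed, or replaced without losing user state, which is what makes horizontal scaling safe.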