Fault tolerance and load balancing have become an almost implicit requirement for a modern web app or service. Depending on the technologies involved, achieving a proper level of each can be difficult. This series will explore a few ways to handle both in PHP.
An extremely simplified definition of clustering is multiple computers working together toward a single goal. Typically you see it in distributed computing applications (network rendering of graphics, protein folding, etc.), but in the web world it means having multiple servers provide a website or service. In practice, however, this can be more complicated, especially when coupled with incorrect ideas about what clustering truly is.
Load balancing is the first step in setting up a clustered environment for a web application. For those of you unfamiliar with it, load balancing consists of having a master node that accepts web requests and funnels them to a farm of web servers, which actually process the requests.
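As a rough illustration of the funneling step, here is a minimal sketch of round-robin selection, one common way to spread requests across a farm. The `RoundRobinBalancer` class and the server names are hypothetical and exist only for this example; a real load balancer does this work at the network level rather than in application PHP.

```php
<?php
// Hypothetical sketch: round-robin selection over a farm of web servers.
// The class name and server names are made up for illustration.
final class RoundRobinBalancer
{
    /** @var string[] */
    private array $servers;
    private int $next = 0;

    public function __construct(array $servers)
    {
        if ($servers === []) {
            throw new InvalidArgumentException('At least one server is required.');
        }
        $this->servers = array_values($servers);
    }

    // Return the server that should handle the next request.
    public function pick(): string
    {
        $server = $this->servers[$this->next];
        $this->next = ($this->next + 1) % count($this->servers);
        return $server;
    }
}

$balancer = new RoundRobinBalancer(['server1', 'server2', 'server3']);
$picked = [];
for ($i = 0; $i < 4; $i++) {
    $picked[] = $balancer->pick();
}
echo implode(', ', $picked), PHP_EOL; // server1, server2, server3, server1
```

After the last server has been used, the counter wraps around, so the fourth request lands on the first server again.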
From the diagram above, you can see that each server has its own session. Because of this, the load balancer needs to keep track of which server the user hit first and keep routing them back to that same server on subsequent requests. This is accomplished with what are known as "sticky sessions." Once a user has a session on a particular server, the load balancer sends all of that user's traffic back to that server.
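The bookkeeping behind sticky sessions can be sketched as a lookup table from session ID to server. The `StickyRouter` class below is a hypothetical illustration, not the API of any particular load balancer; it assigns new sessions round-robin and then keeps returning the same server for a known session ID.

```php
<?php
// Hypothetical sketch of sticky-session routing; names are illustrative only.
final class StickyRouter
{
    /** @var string[] */
    private array $servers;
    /** @var array<string, string> session ID => server name */
    private array $assignments = [];
    private int $next = 0;

    public function __construct(array $servers)
    {
        if ($servers === []) {
            throw new InvalidArgumentException('At least one server is required.');
        }
        $this->servers = array_values($servers);
    }

    // First request for a session picks a server; later requests reuse it.
    public function route(string $sessionId): string
    {
        if (!isset($this->assignments[$sessionId])) {
            $this->assignments[$sessionId] = $this->servers[$this->next];
            $this->next = ($this->next + 1) % count($this->servers);
        }
        return $this->assignments[$sessionId];
    }
}

$router = new StickyRouter(['server1', 'server2']);
$first  = $router->route('alice'); // new session, assigned to server1
$second = $router->route('bob');   // new session, assigned to server2
$again  = $router->route('alice'); // known session, stays on server1
```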
On the surface, this setup appears to be clustering by the simple definition above, but in reality it only gives you load balancing, not fault tolerance, and both are needed for a proper cluster. If Server 1 goes offline, the load balancer will transfer its users to the remaining nodes, but their sessions will not follow them, which effectively logs out every user who was on Server 1.
For a cluster to work, not only do we need to load balance, but we also need to share the user sessions across all nodes. By default, PHP stores session information as files on the local file system. To get proper PHP session sharing, every node essentially needs access to the other nodes' file systems.
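To see where a given node keeps its sessions, you can ask PHP directly. This is a small sketch using the standard `session.save_handler` setting and `session_save_path()` function; the fallback to the system temp directory reflects the usual behavior of the file-based handler when no path is configured, and the output will differ from machine to machine.

```php
<?php
// Inspect how and where this PHP installation stores session data.
$handler  = ini_get('session.save_handler'); // "files" on a default install
$savePath = session_save_path();             // may be empty if not configured

// With an empty save path, the files handler typically uses the system temp dir.
if ($savePath === '' || $savePath === false) {
    $savePath = sys_get_temp_dir();
}

// Each session is stored as a file named sess_<session id> under this path.
printf("handler: %s, path: %s\n", $handler, $savePath);
```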
One way to get proper session sharing in our cluster is to set up an NFS (Network File System) share in a central location and have all of the PHP servers use it for session storage.
In this example, we have set up an NFS share on the load balancer and use it as the central session storage location. On each node, we need to mount the NFS share and change the PHP session storage location to point inside this mount point (symlinks are fine here).
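Changing the storage location can be done with the `session.save_path` setting in php.ini, or at runtime before the session starts. The sketch below takes the runtime route; the mount point `/mnt/php_sessions` and the helper `useSharedSessions()` are assumptions made for this example, so substitute whatever path your NFS share is mounted on.

```php
<?php
// Assumed mount point for the NFS share; adjust to your environment.
const SHARED_SESSION_DIR = '/mnt/php_sessions';

// Hypothetical helper: point PHP's session storage at the shared directory.
function useSharedSessions(string $dir): bool
{
    // Refuse to switch if the share is not mounted or not writable,
    // so a missing mount does not go unnoticed.
    if (!is_dir($dir) || !is_writable($dir)) {
        return false;
    }
    session_save_path($dir); // must be called before session_start()
    return true;
}

if (useSharedSessions(SHARED_SESSION_DIR)) {
    session_start();
}
```

Checking the directory first is a design choice for the sketch: if the mount is missing, the node declines to start a session rather than quietly storing it somewhere the other nodes cannot see.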
With this configuration, we have a true cluster. A request comes in and is sent to an available node, and that node creates a session file on the NFS share. The next request comes in (with sticky sessions off) and is routed to another node. Since the session file was created on the NFS share, the new node can read it from the share and the user's session continues. We can take down all of the server nodes except one and the site (with sessions) will remain up. The only drawback of this kind of setup is that it has a single point of failure: the load balancer both controls web traffic and stores all of the session information. If it goes down, not only will we lose the sessions, but user traffic will not be able to reach the back-end servers.
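The request flow described above can be imitated on a single machine. In this simulation, a temporary directory stands in for the NFS mount, and the hypothetical `SimulatedNode` class mimics (but is not) PHP's built-in file session handler: one node writes a session file and a second node reads it back.

```php
<?php
// Simulation only: two "nodes" sharing one session directory.
// A temp directory stands in for the NFS mount; names are illustrative.
final class SimulatedNode
{
    private string $sessionDir;

    public function __construct(string $sessionDir)
    {
        $this->sessionDir = $sessionDir;
    }

    private function fileFor(string $sessionId): string
    {
        return $this->sessionDir . '/sess_' . $sessionId;
    }

    // Write the session data to the shared directory.
    public function save(string $sessionId, array $data): void
    {
        file_put_contents($this->fileFor($sessionId), serialize($data), LOCK_EX);
    }

    // Read the session data back; an unknown session yields an empty array.
    public function load(string $sessionId): array
    {
        $file = $this->fileFor($sessionId);
        if (!is_file($file)) {
            return [];
        }
        $data = unserialize(file_get_contents($file));
        return is_array($data) ? $data : [];
    }
}

$shared = sys_get_temp_dir() . '/shared_sessions_' . uniqid();
mkdir($shared);

$node1 = new SimulatedNode($shared);
$node2 = new SimulatedNode($shared);

$node1->save('abc123', ['user' => 'alice']); // first request hits node 1
$seenByNode2 = $node2->load('abc123');       // second request hits node 2
echo $seenByNode2['user'], PHP_EOL;          // alice

// Clean up the stand-in share.
unlink($shared . '/sess_abc123');
rmdir($shared);
```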
In part two, we will explore some more advanced configurations of PHP clusters and the technologies involved to make it all work.