Replication - BangDB Server

Cluster

Using BangDB server, we can create a cluster of nodes which coordinate with each other and serve clients collectively. The cluster model that can be created using BangDB servers is the master-slave model. In this model we have one master and many slaves. The master is the server responsible for keeping all slaves in sync. The master can take both read and write requests, whereas a slave can take only read requests. This restriction is necessary in order to simplify the model and also to get optimal performance from the cluster.
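To make the read/write split concrete, here is a minimal sketch of how a client could route requests in such a cluster. The ClusterClient class and its put/get methods are assumptions made for illustration only and do not represent the actual BangDB client API.

```python
# Sketch of client-side routing in a master-slave cluster:
# writes must go to the master, reads may go to any node.
import random

class ClusterClient:
    def __init__(self, master_conn, slave_conns):
        self.master = master_conn          # single master: reads + writes
        self.slaves = list(slave_conns)    # slaves: reads only

    def put(self, key, value):
        # Writes are always sent to the master.
        return self.master.put(key, value)

    def get(self, key):
        # Reads can be spread over the master and the slaves.
        node = random.choice([self.master] + self.slaves)
        return node.get(key)
```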

Replication

BangDB Server follows a master-slave configuration for creating the cluster, where there is always exactly one master but there can be many slaves or secondaries. Writes always happen at the master, and reads can happen at the master or at the slaves. Slaves can attach themselves to the master any time they choose, without stopping the master or denying any response to clients. That is, the synchronization with slaves happens at run time without bringing the master or the cluster down.

There are two types of initial synchronization that can happen with the slaves when they join the cluster by attaching themselves to the master:

a. Using data files and record ops buffer, and/or

b. Using data files and write ahead logs

Using data files and record ops buffer

In this option a full sync always happens, no matter what. This means that even if the slave is just a few bytes away from the master, the master would do a full sync. When the slave joins the cluster by sending a sync message to the master, the master first sends the data files (idx, dat or dir) to the slave and at the same time starts recording all ops (updates) in a local buffer. Once the slave responds that it has read the data files, the master marks it as in sync and starts sending the ops buffer (the slave can respond to clients' queries now, though it may fail for a few queries until it catches up).

Note that the ops buffer need not be very large, as it only logs the minimum information required to track the ops later on; for example, 500MB is good enough for millions of operations. Also note that as soon as the slave is in sync it can start receiving requests, and it will slowly catch up with the recorded ops as well. This approach is good for situations where an almost full sync is required anyway, and for cross data center sync.
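The flow below is a rough sketch of this option from the master's point of view. All function and file names here (start_recording_ops, data_files, drain_ops_buffer, and so on) are assumptions made for illustration; they are not BangDB internals.

```python
# Sketch of the full-sync flow using data files plus a record ops buffer
# (option a). Names are illustrative only.

def full_sync_with_ops_buffer(master, slave):
    # Start recording every op (update) in a small local buffer; only
    # minimal info per op is kept, so e.g. 500MB covers millions of ops.
    master.start_recording_ops()

    # Ship the current data files (idx, dat, dir) to the slave.
    for f in master.data_files():
        slave.receive_file(f)

    # Slave confirms it has read the files; master marks it in sync.
    slave.wait_for_ack()
    master.mark_in_sync(slave)

    # Replay the buffered ops so the slave catches up with the writes
    # that happened while the files were being transferred.
    for op in master.drain_ops_buffer():
        slave.apply_op(op)
```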

Using data files and write ahead logs

In this option a partial sync can also happen (apart from a full sync). Hence it saves a lot of time and effort in the sync process when the slave is just a few bytes away from the master. When the slave sends the sync message, the master figures out how far the slave is lagging and starts the process accordingly. It computes the amount of data to be transferred and, instead of sending the data files, sends the write ahead log. The slave can mark itself in sync immediately (and may fail to respond to a few read queries until the log is fully applied) and starts replaying the log in parallel.

Eventually it will have applied all the log records and be in complete sync with the master. Note that if the slave is lagging too far behind then option (a) may work better, simply because the size of the write ahead log would be much larger than the size of the data files; adopting option (a) for the initial sync would therefore incur less network overhead and consume less time.
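The choice between the two options can be pictured roughly as below. The offsets, sizes and helper names are assumptions used only to illustrate the trade-off described above, not actual BangDB logic.

```python
# Sketch of how the master could pick a sync strategy for a joining slave.

def choose_sync_strategy(master, slave_state):
    # How far behind is the slave, in bytes of write ahead log?
    lag_bytes = master.wal_end_offset() - slave_state.last_applied_offset
    data_files_size = master.data_files_size()

    if lag_bytes < data_files_size:
        # Slave is close enough: ship only the missing portion of the WAL.
        # The slave marks itself in sync immediately and replays the log
        # in parallel, possibly failing a few reads until replay finishes.
        return ("partial_sync_wal", lag_bytes)
    else:
        # Slave is too far behind: the WAL to send would be larger than
        # the data files themselves, so a full sync (option a) is cheaper.
        return ("full_sync_data_files", data_files_size)
```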

Benefits of replication

The primary benefit of replication is redundancy, that is, in case of failure of the master node, a slave can take over and continue serving the clients. Data loss can also be avoided even if the master's data gets corrupted or lost. This is very critical as far as data integrity and safety are concerned. But there is another benefit of replication, and that is performance. When we have a cluster of nodes with a master and many slaves, we can always distribute the reads over the slaves as well. This reduces the pressure on the master node, which can then take responsibility for updates alone, resulting in higher throughput of the cluster.

Replication also improves the elasticity of the cluster. With proper partitioning in place, nodes can be added to and removed from the cluster to grow or shrink it as required. This elasticity gives us the benefit of scaling up and down as needed, and also saves considerable money by keeping the business running continuously while reducing the infrastructure cost associated with committed resources.

Workflow

Here are the steps involved in the synchronization of a slave when it joins the cluster by sending register and sync messages to the master (a code sketch of this flow follows the list):

>slave marks itself out of sync

>slave initiates the registration process by sending the register message (MS_REGISTER_SLAVE)

>master receives the register msg, does some basic verifications, adds the slave into its list and sends the entire slave list to the registering slave along with an ack

>slave receives the list and the ack, and updates its list

>slave sends sync message to master (MS_SYNC_SLAVE) along with some info on its db files

>master computes the set of tables for which sync is required

>master switches the record ops on

>master dumps the data on disk

>master sends the db files to slave

>slave keeps sending acks as required

>master marks the slave in sync

>master starts replicating the updates to the slave

>master then sends the recorded ops to the slave

>slave applies the records at its end, contacting the master when required

>finally the sync is fully done
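
For reference, here is a sketch of the same workflow from the slave's side. The message names MS_REGISTER_SLAVE and MS_SYNC_SLAVE come from the steps above; all other names are illustrative pseudocode, not the actual BangDB implementation.

```python
# Slave-side sketch of the register-and-sync workflow described above.

def join_cluster(slave, master):
    slave.mark_out_of_sync()

    # Registration: master verifies, adds the slave to its list and
    # replies with the full slave list plus an ack.
    reply = master.send(("MS_REGISTER_SLAVE", slave.info()))
    slave.update_slave_list(reply.slave_list)

    # Sync request includes info about the slave's existing db files so
    # the master can compute which tables actually need syncing.
    sync = master.send(("MS_SYNC_SLAVE", slave.db_file_info()))

    # Receive the db files dumped by the master, acking as required.
    for f in sync.db_files:
        slave.receive_file(f)
        master.send(("ACK", f.name))

    # Master has marked the slave in sync and now streams the recorded
    # ops; the slave applies them, contacting the master when required.
    for op in sync.recorded_ops():
        slave.apply_op(op)
    # At this point the sync is fully done.
```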