Benchmark - Bangdb Embedded Concurrency

Bangdb is a highly concurrent database. This means that with more CPUs, bangdb tends to perform better. The sweet spot for bangdb is to run roughly as many threads as there are CPUs on the machine. Since performance was one of the main design goals for bangdb, it was clear early on that the db had to be concurrent and take advantage of the number of CPUs in the machine. Concurrency certainly adds complexity and overhead, but settling for low performance even on a higher-capacity machine was not an option.

Bangdb achieves high concurrency mainly for the following reasons:

  • The access methods (index) are highly concurrent

Bangdb implements a variant of the Btree known as a B+ link tree. The structure maintains links to the neighboring pages at the same level. The locking granularity is at the page level. During a write operation, in the worst case, the db locks two pages at a time; however, most of the time, and for most cases, the db can complete the write while holding only a single lock. This increases the chance of multiple threads working on different pages at the same time and completing their write tasks. A read operation on the Btree likewise takes a single page lock at any given time.
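The "at most two pages locked" property comes from the B+ link tree's sibling pointers: a writer that lands on a page whose key range has moved (because the page split) can chase the right link, briefly holding the old and new page latches together before releasing the old one. The sketch below illustrates this "move right" step with hypothetical `Page` and `move_right_locked` names; it is not bangdb's actual code.

```cpp
#include <climits>
#include <shared_mutex>
#include <vector>

// Hypothetical page: its own latch, a right-sibling link, and a high key
// bounding the keys stored on it, as in a B+ link tree.
struct Page {
    std::shared_mutex latch;
    std::vector<int> keys;
    Page* right = nullptr;   // link to the neighbor at the same level
    int high_key = INT_MAX;  // upper bound of keys on this page
};

// Writer-side "move right": while holding the latch on `p`, if the key
// now belongs on a neighbor (the page split after we navigated here),
// latch the right sibling before releasing `p`. At most two page
// latches are held at any instant; usually it is just one.
Page* move_right_locked(Page* p, int key) {
    p->latch.lock();
    while (key > p->high_key && p->right) {
        Page* r = p->right;
        r->latch.lock();    // brief two-latch window
        p->latch.unlock();  // back to a single latch
        p = r;
    }
    return p;  // returned latched; the caller inserts, then unlocks
}
```

Because each thread only ever holds one latch (two during the brief hand-over), threads inserting into different pages never block each other.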

The hash implementation is a variant of extendible hashing. A write operation takes a single page lock at any given time, and a read operation takes no lock at all: it relies on validation rather than locking to find the record. This speeds up reads in Ehash significantly; in fact, a read in Ehash is around 50% faster than a Btree read.
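A common way to read without locking, and then validate, is a seqlock-style version counter: the writer bumps a version to an odd value before modifying the page and back to even afterwards, while the reader snapshots the version, copies the record, and retries if the version changed underneath it. The sketch below shows this validation pattern with a hypothetical `Bucket`; bangdb's actual validation scheme may differ.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical bucket page with a version word (seqlock-style).
struct Bucket {
    std::atomic<uint64_t> version{0};  // even = stable, odd = write in flight
    int value = 0;
};

// Writer: bump to odd, modify, bump back to even.
void write_bucket(Bucket& b, int v) {
    b.version.fetch_add(1, std::memory_order_acq_rel);  // now odd
    b.value = v;
    b.version.fetch_add(1, std::memory_order_release);  // even again
}

// Lock-free read: snapshot the version, copy the record, then validate.
// The reader retries on conflict and never blocks a writer.
int read_bucket(const Bucket& b) {
    for (;;) {
        uint64_t v1 = b.version.load(std::memory_order_acquire);
        if (v1 & 1) continue;       // write in flight, retry
        int copy = b.value;
        uint64_t v2 = b.version.load(std::memory_order_acquire);
        if (v1 == v2) return copy;  // snapshot was consistent
    }
}
```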

  • The buffer pool is highly concurrent

Bangdb creates a separate pool for each file type. The db maintains different files for index, data, dir etc., and keeping separate pools allows more concurrency in the buffer area. For example, to get a buffer header for a dat file, the whole hash table doesn't need to be locked when a page header for the index is required at the same time. This lets multiple threads work on different pools at any given time, thereby increasing concurrency.

We can further fine-tune the flushing of dirty pages or the reclaiming of free pages depending upon the nature and criticality of the data. For example, for the Ehash type, the dir file is quite small, hence its entire data can always be kept in memory by disallowing flushing of dirty pages or reclaiming. The free list, however, is common: under memory pressure, all pools have equal rights to claim a free page.
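The per-file-type pool idea can be sketched as one small pool structure per file type, each with its own lock and its own flush policy, so a lookup in the data pool never contends with index lookups. The `FileType`, `Pool`, and `BufferManager` names below are illustrative assumptions, not bangdb's API.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// One pool per file type, each with a pool-local lock.
enum class FileType { Index, Data, Dir };

struct Pool {
    std::mutex lock;                              // pool-local, not global
    std::unordered_map<long, std::string> pages;  // page id -> frame (toy)
    bool allow_flush = true;                      // per-pool flush policy
};

struct BufferManager {
    Pool pools[3];  // indexed by FileType

    Pool& pool(FileType t) { return pools[static_cast<int>(t)]; }

    // Pin a page: only the pool for this file type is locked, so threads
    // working on other file types proceed without contention.
    std::string* get(FileType t, long page_id) {
        Pool& p = pool(t);
        std::lock_guard<std::mutex> g(p.lock);
        auto it = p.pages.find(page_id);
        return it == p.pages.end() ? nullptr : &it->second;
    }
};
```

Per the dir-file note above, a small pool can simply be pinned in memory, e.g. `mgr.pool(FileType::Dir).allow_flush = false;`, while the shared free list stays outside any single pool.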

  • The log is highly concurrent

The write-ahead log allows multiple threads to write to the log file at a time. The various data structures it maintains are also separated with sub use cases in mind. For example, the transaction table and the dirty page table only require locking a single header at a time, and since a thread will already have locked the corresponding page, there is no need to lock the write for that page in the log. Since the log is sequential, flushing it does not require locking the in-memory log. Similarly, checkpointing is lock free; a lock is required only while splitting the log.
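One way multiple threads can append to a log concurrently, without locking the log itself, is to reserve a byte range atomically and then copy the record into the reserved slot; the flusher simply writes everything up to the reserved tail. This is a minimal sketch of that idea under assumed names (`Wal`, `append`), not bangdb's actual log implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Sketch: threads reserve disjoint ranges of the in-memory log with a
// single atomic fetch_add, so appends never overlap and no lock is held.
struct Wal {
    std::atomic<size_t> tail{0};
    char buf[1 << 16];  // toy fixed-size in-memory log

    // Reserve `len` bytes and copy the record in. The fetch_add makes
    // the reservation atomic; two writers always get disjoint offsets.
    size_t append(const char* rec, size_t len) {
        size_t off = tail.fetch_add(len, std::memory_order_relaxed);
        std::memcpy(buf + off, rec, len);
        return off;
    }
};
```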

Here are some results showing how Btree and Ehash respond to an increasing degree of concurrency on a given machine. Please note that efficiency is best at around N, where N is the number of CPUs (4 in this case).