BangDB - Embeddable Flavor

Benchmark - BangDB Performance

The purpose of this performance analysis of BangDB under a few scenarios is to present high-level measurement figures that help users map their use cases and understand what to expect from BangDB. The measurements were done on commodity hardware without any customization.

These performance numbers will vary depending on the machine configuration, the OS, the sizes of keys and values, and other parameters; hence users may see different metrics when they run the tests on their own machines in different settings. However, the benchmark shown here, along with the comparison with a few other DBs in the next section, should help users make an informed decision.

The following machine (commodity hardware) was used for the test:

  • Model : 4 CPU cores, Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz, 64bit
  • CPU cache : 6MB
  • OS : Linux, 3.0.0-17-generic, Ubuntu, x86_64
  • RAM : 8GB
  • Disk : 500GB, 7200 RPM, 16MB cache
  • File System: ext4

The BangDB configuration:

  • Key size : 20 bytes - random
  • Val size : 100 - 400 bytes - randomly picked
  • Page size : 8KB
  • Write ahead log : ON
  • Log split check : ON, check every 30 ms
  • Buffer pool size : 512MB
  • Background log flush : every 50 ms
  • Checkpointing : every ~4 sec
  • Buffer pool background work : ON, every 60 ms
  • Number of concurrent threads : 4
  • Volume of data written : 32 MB – 320 MB (without compression)
  • Volume of log written : 56 MB – 520 MB (if Log = ON) (without compression)

Other parameters:

  • Write and Read : Random, continuous
  • Num of ops : 100K to 10M
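To make the parameters above concrete, the sketch below holds the benchmark configuration in a Python dict (the key names are our own shorthand, not BangDB's actual configuration keys) and derives the average record size and the log-to-data volume ratio implied by the numbers given.

```python
# Benchmark configuration from the text. Key names are illustrative
# shorthand, not BangDB's actual configuration keys.
config = {
    "key_size_bytes": 20,            # random keys
    "val_size_bytes": (100, 400),    # values picked randomly in this range
    "page_size_kb": 8,
    "write_ahead_log": True,
    "log_split_check_ms": 30,
    "buffer_pool_mb": 512,
    "log_flush_ms": 50,
    "checkpoint_sec": 4,
    "buffer_pool_work_ms": 60,
    "threads": 4,
}

# Average record size implied by the key/value sizes above: 20 + 250 = 270 B.
avg_val = sum(config["val_size_bytes"]) / 2
avg_record = config["key_size_bytes"] + avg_val

# Data vs log volume reported: 32-320 MB of data, 56-520 MB of log,
# i.e. with Log = ON the log volume is roughly 1.6x-1.75x the data volume.
log_to_data_low = 56 / 32      # 1.75
log_to_data_high = 520 / 320   # 1.625
```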

This configuration ensures that the DB runs in a conservative mode: if the process or the DB crashes, on restart the DB will recover to the point of the crash. Several background workers maintain the write-ahead logging mechanism and the health of the buffer pool. Note that in the table, the Log = ON column depicts the numbers for the above configuration; if we switch off the log and keep everything else as it is (apart from the log and related machinery), the numbers look like those given in the Log = OFF column.
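The recovery guarantee described above can be illustrated with a toy write-ahead-log sketch. This is the generic WAL pattern only, not BangDB's actual implementation: every update is appended to the log before the in-memory state is touched, so after a crash the state can be rebuilt by replaying the log.

```python
# Toy write-ahead log: a generic illustration of the recovery guarantee
# described above, not BangDB's actual implementation.

class ToyKV:
    def __init__(self):
        self.log = []    # stands in for the on-disk write-ahead log
        self.mem = {}    # stands in for the buffer pool / in-memory state

    def put(self, key, value):
        self.log.append((key, value))  # log the operation first...
        self.mem[key] = value          # ...then apply it in memory

    def recover(self):
        # After a crash, replay the log in order to rebuild state
        # up to the last logged operation.
        self.mem = {}
        for key, value in self.log:
            self.mem[key] = value

db = ToyKV()
db.put("user:1", "alice")
db.put("user:2", "bob")

db.mem = {}    # simulate a crash: all in-memory state is lost,
db.recover()   # but the log survives, so recovery rebuilds everything
assert db.mem == {"user:1": "alice", "user:2": "bob"}
```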

The tests were done under various conditions:

  • Normal: No Disk Ops

The buffer pool is allocated generously, so all reads and writes happen from the pool itself; the DB does not go to disk for any reason except the continuous log flush. This ensures that the performance data reflects the true in-memory performance of the DB. In the real world this condition is realistic when the DB is used purely as an in-memory cache, for example for session data; for a persistent database it cannot be guaranteed.

  • Stressed: Overflow to disk

The buffer pool allocated is smaller than the data the operations will touch, at around half the amount of data to be written or read. Since all operations are random and continuous, it is very difficult to flush out the right set of pages and bring the right set of pages into the buffer pool. In the real world, operations are less random and are not 100 percent continuous reads and writes, but this models the real world as closely as possible, slightly on the conservative side. It is important to note, and the performance report will show, that BangDB performs well in this condition too.

  • Totally Stressed

The buffer pool allocated is much smaller (around 5-10 times) than the data being read and written continuously and randomly. Flushing dirty pages to reclaim free ones forces the DB to write pages to disk almost continuously, and the amount of disk reads is also high for both read and write operations. It becomes very difficult to anticipate which pages to flush and which to fetch. BangDB takes extra care to ensure that performance does not degrade to the point where writing to disk halts the whole system; the results show that performance degrades gracefully as the ratio of data volume to buffer pool capacity rises.
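The three conditions can be caricatured with a small LRU buffer-pool simulation (our own sketch, not BangDB's eviction policy): with a pool as large as the working set almost every access is a hit after warm-up, while shrinking the pool to 1/2 and then 1/10 of the data steadily raises the miss rate, i.e. the fraction of operations that would force disk I/O, under random access.

```python
# LRU buffer-pool caricature of the three test conditions above.
# This is an illustrative model, not BangDB's actual eviction policy.
import random
from collections import OrderedDict

def miss_rate(num_pages, pool_pages, ops=20000, seed=7):
    """Fraction of uniformly random page accesses that miss an LRU pool."""
    rng = random.Random(seed)
    pool = OrderedDict()
    misses = 0
    for _ in range(ops):
        page = rng.randrange(num_pages)
        if page in pool:
            pool.move_to_end(page)        # hit: refresh LRU position
        else:
            misses += 1                   # miss: would go to disk
            pool[page] = True
            if len(pool) > pool_pages:
                pool.popitem(last=False)  # evict least-recently-used page
    return misses / ops

pages = 1000
normal   = miss_rate(pages, pages)        # pool holds the whole working set
stressed = miss_rate(pages, pages // 2)   # pool ~ half the data
totally  = miss_rate(pages, pages // 10)  # pool ~ 1/10 of the data
assert normal < stressed < totally        # disk pressure rises with the ratio
```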

Various sub-sections of the benchmark dig deeper into the performance analysis; please have a look at them to see how BangDB performs in different conditions. There is also a comparative analysis against Oracle's Berkeley DB and Google's LevelDB, to learn the conditions and scenarios under which each DB performs the way it does.

The average values for throughput (ops/sec) over the 100K – 1M operations are:

Index (Access Method)     Log = ON                             Log = OFF
                          Write (ops/sec)   Read (ops/sec)     Write (ops/sec)   Read (ops/sec)
Btree                     475,000           1,025,000          685,000           1,045,000
Hash                      500,000           1,690,000          790,000           1,675,000
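As a rough sanity check on the table (our own back-of-the-envelope arithmetic, not a figure from the benchmark), the write throughputs can be converted into logical bandwidth using the average record size implied by the configuration: a 20-byte key plus a 100-400 byte value, so about 270 bytes per operation.

```python
# Rough logical write bandwidth implied by the table, assuming the
# average record size from the configuration (20 B key + ~250 B value).
avg_record_bytes = 20 + (100 + 400) / 2   # 270 bytes per operation

def mb_per_sec(ops_per_sec):
    """Convert an ops/sec figure into MB/s of logical data moved."""
    return ops_per_sec * avg_record_bytes / (1024 * 1024)

btree_write_log_on = mb_per_sec(475_000)  # roughly 122 MB/s of logical data
hash_write_log_on = mb_per_sec(500_000)   # roughly 129 MB/s of logical data
```

Note that this counts only the logical record bytes; with Log = ON the actual bytes hitting disk are higher, since the reported log volume is around 1.6x the data volume.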