Log - BangDB

BangDB implements a write-ahead log (WAL): changes to data and index files are written only after the corresponding log record has been written. Logging is controlled by a configuration parameter, and the user can switch it on or off as required. Once set for a table, however, the setting (on or off) stays fixed for the life of that table.

The write-ahead log is always sequential, and multiple sequential log files are generated over the life of the db. BangDB tweaks the basic design and algorithm for write-ahead logging to reduce the amount of data that has to be logged, thereby improving performance without sacrificing the core guarantees and benefits of write-ahead logging. It uses a modified ARIES algorithm to implement the WAL; the modification optimizes the log-size/data-size ratio to improve performance.
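As a rough illustration of how an append-only log can end up as several sequential files, here is a minimal Python sketch. It is not BangDB's implementation: the class name `SegmentedLog`, the `wal.NNNNNN` file naming, and the size-based rollover rule are all assumptions made for the example.

```python
import os
import tempfile


class SegmentedLog:
    """Append-only log that rolls over to a new file once a segment is full.

    Hypothetical sketch: names and the rollover rule are illustrative only.
    """

    def __init__(self, directory, max_segment_bytes):
        self.directory = directory
        self.max_segment_bytes = max_segment_bytes
        self.segment_no = 0
        self.segment_bytes = 0
        self.file = open(self._segment_path(), "ab")

    def _segment_path(self):
        # Assumed naming scheme: wal.000000, wal.000001, ...
        return os.path.join(self.directory, f"wal.{self.segment_no:06d}")

    def append(self, record: bytes):
        data = record + b"\n"
        # Start a new segment if this record would overflow the current one.
        if self.segment_bytes > 0 and self.segment_bytes + len(data) > self.max_segment_bytes:
            self.file.close()
            self.segment_no += 1
            self.segment_bytes = 0
            self.file = open(self._segment_path(), "ab")
        self.file.write(data)  # always appended at the end: sequential IO
        self.segment_bytes += len(data)

    def close(self):
        self.file.close()

    def segments(self):
        return sorted(n for n in os.listdir(self.directory) if n.startswith("wal."))


def demo_segments():
    with tempfile.TemporaryDirectory() as d:
        log = SegmentedLog(d, max_segment_bytes=32)
        for i in range(6):
            log.append(f"put key{i} val{i}".encode())  # 14 bytes per record
        log.close()
        return log.segments()
```

With 14-byte records and a 32-byte segment limit, each file holds two records, so `demo_segments()` leaves three segment files behind.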

Following are the benefits of the write-ahead log:

  • Reduces the number of seeks, since the log is always sequential, which improves IO and performance
  • Checkpointing - speeds up recovery when data has to be recovered; can be switched on/off
  • Crash recovery - the DB can recover from any crash and bring the database to a consistent state (the state at the time of the crash)
  • Durability - since logs are written before the data or index file changes are actually committed, and a data recovery mechanism is in place, data durability does not depend on the writes to the data or index files
  • Transaction - Write ahead logging is the standard approach for transactional logging
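The checkpointing benefit listed above can be illustrated with a small Python sketch. It assumes a log of `(lsn, key, value)` records and a checkpoint consisting of an LSN plus a snapshot of the state at that LSN; these structures are hypothetical and only show why a checkpoint shortens replay, not how BangDB stores checkpoints.

```python
def recover(log, checkpoint=None):
    """Rebuild state from a log; return (state, records_replayed).

    log: list of (lsn, key, value) tuples in LSN order, LSNs starting at 1.
    checkpoint: optional (lsn, snapshot) where snapshot already reflects
    every record up to and including that LSN (an assumed format).
    """
    if checkpoint is None:
        start_lsn, state = 0, {}
    else:
        start_lsn, state = checkpoint[0], dict(checkpoint[1])
    replayed = 0
    for lsn, key, value in log:
        if lsn <= start_lsn:
            continue  # already covered by the checkpoint snapshot
        state[key] = value
        replayed += 1
    return state, replayed


# Ten writes spread over three keys.
log = [(lsn, f"k{lsn % 3}", lsn) for lsn in range(1, 11)]

full_state, full_count = recover(log)      # replay everything
snapshot, _ = recover(log[:8])             # state as of LSN 8
ckpt_state, ckpt_count = recover(log, checkpoint=(8, snapshot))
```

Both runs end in the same state, but the checkpointed run only replays the records written after LSN 8.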

The WAL can be used in the following ways:

  • Shared log
  • Private log

Shared and private are with respect to tables. If the shared log is used (the default), all tables share one log, which helps optimize IO. There are cases, however, where a table needs its own log, and there the private log comes in handy because it creates a separate log file for that table. A table that is used very often or holds a high volume of data are examples where a private log may make sense.
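The difference between the two modes can be pictured with a short Python sketch. The `LogRouter` class, its method names, and the table names are hypothetical; BangDB's actual configuration for choosing a private log is not shown in this document.

```python
class LogRouter:
    """Route log records to a shared log or to a per-table private log."""

    def __init__(self):
        self.shared_log = []    # one log used by all tables (the default)
        self.private_logs = {}  # table name -> that table's own log

    def use_private_log(self, table):
        self.private_logs[table] = []

    def log_write(self, table, record):
        # Tables without a private log fall back to the shared log.
        target = self.private_logs.get(table, self.shared_log)
        target.append((table, record))


router = LogRouter()
router.use_private_log("events")  # e.g. a hypothetical high-volume table
router.log_write("users", "put u1")
router.log_write("events", "put e1")
router.log_write("orders", "put o1")
router.log_write("events", "put e2")
```

Here `users` and `orders` interleave in the shared log, while `events` writes go to their own log and do not compete with the other tables.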

BangDB implements the write-ahead log, based on the ARIES algorithm, for data durability and atomicity. The db writes a log record for every write operation, even though it may not persist the actual data to disk right away. Because the db performs all its operations from the buffer pool and goes to disk only when needed, data written to the buffer pool can vanish if the process is killed. Logging every data modification, and flushing the log to disk frequently, ensures that the data can be recovered from the log when required. For example, after a process or machine crash, BangDB recovers the data on restart and brings the db back to the state it was in when it crashed.
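The log-then-apply ordering and the recovery on restart can be sketched in a few lines of Python. This is a toy model under stated assumptions: `TinyStore`, its in-memory "durable" list standing in for the on-disk log, and the `put`/`del` record format are all invented for illustration and are not BangDB's API or ARIES itself.

```python
class TinyStore:
    """Toy key-value store: log first, then change the in-memory buffer pool."""

    def __init__(self, durable_log=None):
        # durable_log stands in for log records already flushed to disk.
        self.durable_log = durable_log if durable_log is not None else []
        self.pending_log = []  # logged but not yet flushed
        self.buffer_pool = {}  # in-memory data, lost when the process dies
        # Recovery: replay every durable record to rebuild the buffer pool.
        for op, key, value in self.durable_log:
            self._apply(op, key, value)

    def _apply(self, op, key, value):
        if op == "put":
            self.buffer_pool[key] = value
        elif op == "del":
            self.buffer_pool.pop(key, None)

    def put(self, key, value):
        self.pending_log.append(("put", key, value))  # write-ahead: log first
        self._apply("put", key, value)

    def delete(self, key):
        self.pending_log.append(("del", key, None))
        self._apply("del", key, None)

    def flush_log(self):
        # In a real system this is the point where the log reaches disk.
        self.durable_log.extend(self.pending_log)
        self.pending_log.clear()


store = TinyStore()
store.put("a", 1)
store.put("b", 2)
store.delete("a")
store.flush_log()
store.put("c", 3)         # logged, but not flushed before the "crash"

disk = store.durable_log  # what survives the crash
del store                 # simulate the process being killed
recovered = TinyStore(durable_log=disk)
```

In this model the recovered store contains only `b`: the delete of `a` was flushed and is replayed, while the write of `c` happened after the last flush and is lost. That gap is what the flush frequency controls.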

The log is sequential and is flushed to disk frequently by background workers. The user can set the flush frequency as needed; a higher frequency (a shorter interval between flushes) means less data is lost if something goes wrong. BangDB exposes this setting with millisecond granularity (theoretically it could be microseconds). The default is 50 ms, and the db performs very well even at this setting.
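To make the trade-off concrete, the following Python sketch estimates how many log records could be sitting unflushed at the moment of a crash. The write rate of 100,000 operations per second is a made-up figure for the arithmetic, not a BangDB benchmark, and the function name is hypothetical.

```python
def records_at_risk(writes_per_second, flush_interval_ms):
    """Upper bound on records logged since the last flush.

    Assumes a steady write rate and a flush exactly every flush_interval_ms.
    """
    return writes_per_second * flush_interval_ms / 1000


ASSUMED_WRITE_RATE = 100_000  # hypothetical workload, ops per second

at_default = records_at_risk(ASSUMED_WRITE_RATE, 50)  # 50 ms default interval
at_one_ms = records_at_risk(ASSUMED_WRITE_RATE, 1)    # most aggressive ms setting
```

Under these assumptions, up to 5,000 records are exposed with the 50 ms default and up to 100 with a 1 ms interval, at the cost of flushing fifty times as often.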

The high-level view of the WAL is as follows:

When enabled, the write-ahead log (WAL) records each individual operation in the log file and rotates its buffer whenever the buffer fills up. The WAL provides log checkpointing and replay, which help recover data when the db was not closed properly or when the db or machine crashed. Several WAL workers run in the background and take care of various housekeeping jobs. One of them wakes up regularly and flushes the log if required. The log is flushed in bulk and, being a sequential write, this happens relatively quickly. All log flushes are synchronous, which guarantees that the log data is persisted; this is critical.
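The buffer rotation and the bulk, synchronous flush described above can be sketched as follows. This is an illustrative Python model, not BangDB code: `RotatingWalBuffer`, the record-count capacity, and the newline-delimited file format are assumptions. The background worker is left out; `flush()` is what such a worker would call each time it wakes up.

```python
import os
import tempfile


class RotatingWalBuffer:
    """Collect log records in memory, rotate full buffers, flush in bulk."""

    def __init__(self, path, buffer_capacity):
        self.file = open(path, "ab")
        self.buffer_capacity = buffer_capacity  # records per buffer (assumed unit)
        self.active = []        # buffer currently receiving records
        self.full_buffers = []  # rotated-out buffers waiting for a flush

    def append(self, record: bytes):
        self.active.append(record)
        if len(self.active) >= self.buffer_capacity:
            # Rotate: park the full buffer and continue with a fresh one.
            self.full_buffers.append(self.active)
            self.active = []

    def flush(self):
        """Write everything buffered so far in one sequential write."""
        to_write = self.full_buffers + [self.active]
        self.full_buffers, self.active = [], []
        data = b"".join(rec + b"\n" for buf in to_write for rec in buf)
        if data:
            self.file.write(data)
            self.file.flush()
            os.fsync(self.file.fileno())  # synchronous: return only once on disk
        return len(data)

    def close(self):
        self.flush()
        self.file.close()


def demo_flush():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.log")
        wal = RotatingWalBuffer(path, buffer_capacity=2)
        for i in range(5):
            wal.append(f"op{i}".encode())
        rotated = len(wal.full_buffers)  # buffers rotated out before the flush
        flushed_bytes = wal.flush()
        wal.close()
        with open(path, "rb") as f:
            lines = f.read().splitlines()
    return rotated, flushed_bytes, lines
```

Calling `os.fsync` after the write is what makes the flush synchronous in this sketch: the call does not return until the operating system reports the data as written to the device.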