Common Configuration and Explanation - BangDB Embedded

The configuration parameters that matter most in BangDB are listed below. For the complete list, see the BangDB configuration page.

  1. Server dir:

    [SERVER_DIR] Tells the db where to keep its files. The default is the application directory. The path can also be set when creating or opening the database by passing it as a parameter.
  2. Persist type:

    [BANGDB_PERSIST_TYPE] Sets how data is stored. The following types are supported:
    1. INMEM_ONLY – Uses the db as an embedded, in-process store. Data is always written to and read from the cache, and the db never goes to disk for any operation, so the application should allocate enough buffer space to work comfortably out of the cache. Overflow to disk is not supported. On close, the application can ask the db to flush all data to disk for later use, or simply discard it. Only a single process can access the db at a time, but it may do so from many threads concurrently.
    2. INMEM_PERSIST – Similar to INMEM_ONLY, except that data overflows to disk when the buffer pool is full. Only a single process can access the db, but it may do so from many threads.
    3. PERSIST_ONLY – Reads and writes go directly to files on disk; there is no buffer pool. This mode is meant for applications that want maximum durability and are not concerned about performance. Multiple processes can work on the same db simultaneously, but they cannot be multi-threaded.

    Picking the right value for the db or table is important. This setting can be applied per table, so a db can hold tables with different persist types. As a simple rule, INMEM_ONLY is the right choice when all data should always stay in memory and the table or db does not need a write-ahead log (WAL). It delivers the highest IOPS of the three options, which makes it the winner for small, heavily accessed tables; large tables can also be kept in memory for performance (for example, in analytics apps) if there is enough RAM.

    INMEM_PERSIST combines the benefits of both: allocate as much memory as possible to the buffer pool and back the data with files on disk, so that any overflow goes to disk. The db is designed to keep performance balanced even under high load, when most data is moving to and from disk. With this option the WAL can be enabled or disabled; with WAL enabled, the db can recover from a crash. It suits most cases, especially when the data is much larger than the memory allocated to the db or when WAL is needed. Transactions require WAL to be enabled with this option.

    PERSIST_ONLY is the most conservative option. It suits use cases with small but highly critical data. The db does IO directly on files on disk with no buffer pool or buffer manager in between, and since data is written to disk directly, WAL is not required.
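    The selection rules above can be summarised in a small decision helper. This is a hypothetical Python sketch, not part of BangDB: only the three returned mode names come from BANGDB_PERSIST_TYPE, while the function name and its boolean inputs are assumptions made for illustration.

```python
# Hypothetical helper (not a BangDB API) encoding the rules described above.
# Only the returned strings correspond to real BANGDB_PERSIST_TYPE values.

def choose_persist_type(fits_in_ram: bool,
                        needs_disk_durability: bool,
                        small_critical_data: bool = False) -> str:
    if small_critical_data:
        # Small but critical data: all IO goes straight to disk,
        # no buffer pool, and no WAL needed.
        return "PERSIST_ONLY"
    if fits_in_ram and not needs_disk_durability:
        # Everything stays in memory and no WAL is needed: highest IOPS.
        return "INMEM_ONLY"
    # Data larger than RAM and/or WAL (crash recovery, transactions) required:
    # buffer pool backed by disk files, with overflow going to disk.
    return "INMEM_PERSIST"


# Usage: a large table that must survive crashes.
print(choose_persist_type(fits_in_ram=False, needs_disk_durability=True))  # INMEM_PERSIST
```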

  3. Index type:

    [BANGDB_INDEX_TYPE] The currently supported index types are:
    1. EXTHASH – Extendible-hash based index. Keys are not kept in order in the index files; instead, the hash of the key is used to read and write the data, so scans and range scans are not possible with this index type. Constant-time key lookup gives the best read performance. See the perf doc for stats.
    2. BTREE – B+ link tree based index. Keys are kept in order as they are written, so range scans are possible. It performs equally well for reads and writes. See the perf doc for stats.

    In short, EXTHASH is an extendible-hash index on the keys, whereas BTREE arranges them in a B+ link tree. When key order does not matter, EXTHASH may be used; when order matters, BTREE is the only option. BTREE works for every scenario, but it is the natural fit when keys need to be ordered. Range queries cannot be done with hash-based indexing.

    This can also be set per table, so different tables in a db can use different index types as required.
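    To see why a hash index cannot serve range queries while a B+ tree can, here is a toy Python sketch. A dict stands in for the extendible hash and a sorted list searched with `bisect` stands in for the B+ link tree; the class and method names are hypothetical, and neither structure is BangDB's actual implementation.

```python
import bisect


class HashIndex:
    """Toy stand-in for a hash index: fast point lookups, no key order."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key):
        return self._table.get(key)
    # No range_scan(): hashing scatters neighbouring keys, so there is
    # no order to walk through.


class OrderedIndex:
    """Toy stand-in for a B+ tree: keys kept sorted, so ranges can be scanned."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys ordered on every write
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range_scan(self, lo, hi):
        """Return (key, value) pairs with lo <= key <= hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[start:end]]


idx = OrderedIndex()
for k in ["d", "a", "c", "b", "e"]:
    idx.put(k, k.upper())
print(idx.range_scan("b", "d"))  # [('b', 'B'), ('c', 'C'), ('d', 'D')]
```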

  4. Log:

    [BANGDB_LOG] BangDB implements a write-ahead log (WAL) for data recovery and durability, and the user can enable or disable it as needed. For example, with the INMEM_ONLY db type the user may want to disable the log, since the application may not be interested in persisting the data on disk. Disabling the log improves performance significantly.

    This can also be set per table, so different tables in a db can have different log settings as required.
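    The write-ahead principle behind [BANGDB_LOG] can be sketched in a few lines of Python: each change is appended to a log file and forced to disk before it is applied in memory, so the in-memory state can be rebuilt by replaying the log after a crash. This is a minimal illustration under assumed names (TinyWalStore, a JSON-lines log format), not BangDB's log format or API. With use_log=False, writes skip the disk entirely but nothing survives a restart, mirroring the trade-off described above.

```python
import json
import os


class TinyWalStore:
    """Toy key-value store illustrating write-ahead logging (not BangDB code)."""

    def __init__(self, log_path, use_log=True):
        self.log_path = log_path
        self.use_log = use_log
        self.data = {}  # in-memory state

    def put(self, key, value):
        if self.use_log:
            # 1. Append the change to the log and force it to disk first ...
            with open(self.log_path, "a", encoding="utf-8") as log:
                log.write(json.dumps({"k": key, "v": value}) + "\n")
                log.flush()
                os.fsync(log.fileno())
        # 2. ... and only then apply it in memory.
        self.data[key] = value

    @classmethod
    def recover(cls, log_path):
        """Rebuild in-memory state after a crash by replaying the log in order."""
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    store.data[record["k"]] = record["v"]
        return store
```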

  5. Data size:

    [DAT_SIZE] The maximum size of data (a single value) supported by the db. The default is 64KB, but it can be set higher or lower as needed. This value interacts with the buffer pool size and the RAM on the machine: with more RAM, more memory can be committed to the buffer pool, which in turn allows larger data sizes. With a moderate 256MB buffer pool, a maximum data size of 1-8MB is workable; with a 1GB buffer pool, 1-16MB is feasible. There is no theoretical limit, but for practical purposes this value should not be overcommitted. Data larger than the configured maximum must be split by the application before writing, then read back and recombined accordingly.
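    For values above [DAT_SIZE], the split-and-combine approach might look like the following Python sketch. The chunk-key naming scheme (base key plus a zero-padded sequence number) is an assumption, not a BangDB convention. The suffix adds 5 bytes, so chunk keys must still fit within the configured key size, and the application also needs to record the chunk count (for example under the base key) so it knows how many chunks to fetch.

```python
DAT_SIZE = 64 * 1024  # default maximum data size (64KB) from the text above


def split_value(key: str, value: bytes, max_size: int = DAT_SIZE):
    """Split value into (chunk_key, chunk) pairs, each chunk <= max_size bytes."""
    chunks = [value[i:i + max_size] for i in range(0, len(value), max_size)]
    if not chunks:
        chunks = [b""]  # keep one (empty) chunk so the value still exists
    # Hypothetical naming: a 4-digit zero-padded suffix keeps chunk keys in
    # sorted order for up to 10,000 chunks.
    return [(f"{key}:{n:04d}", chunk) for n, chunk in enumerate(chunks)]


def combine_value(pairs):
    """Reassemble the original value from (chunk_key, chunk) pairs in any order."""
    return b"".join(chunk for _, chunk in sorted(pairs))


blob = bytes(range(256)) * 1000          # 256,000 bytes, above DAT_SIZE
parts = split_value("img", blob)
print(len(parts))                        # 4
```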
  6. Key size:

    [KEY_SIZE] The size of the key used in the index; the default is 24 bytes. There is no theoretical limit, but it should be set with care: too high a value reduces both read and write performance. The key size must be smaller than the page size and should typically leave room for at least 16 keys per page (more is better). In practice keys are usually only a few bytes long, so set this to what the application requires without bloating the index.

    Although there is no theoretical limit, the key size is bounded by a few constraints, and for performance it should be kept as small as possible. The minimum allowed key size is 8 bytes.

    PAGE_SIZE: The key size cannot be greater than the page size. For performance, aim for at least 24-32 keys per page.

    This can also be set per table, so different tables in a db can have different key sizes as required.
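    The page-size constraint reduces to simple arithmetic. The Python sketch below ignores per-entry and page-header overhead (so real limits are somewhat lower), and the 8KB page size is a hypothetical value chosen for the example, not a documented BangDB default.

```python
def max_key_size(page_size: int, min_keys_per_page: int) -> int:
    """Largest key size (bytes) that still fits min_keys_per_page keys in a page.

    Per-entry and page-header overhead are ignored, so the real limit is lower.
    """
    return page_size // min_keys_per_page


def keys_per_page(page_size: int, key_size: int) -> int:
    """Upper bound on keys per page for a given key size (overhead ignored)."""
    if not 8 <= key_size <= page_size:
        raise ValueError("key size must be at least 8 bytes and at most the page size")
    return page_size // key_size


PAGE_SIZE = 8192  # hypothetical page size for this example only
print(keys_per_page(PAGE_SIZE, 24))   # 341 keys for the default 24-byte key
print(max_key_size(PAGE_SIZE, 32))    # 256 bytes for at least 32 keys per page
print(max_key_size(PAGE_SIZE, 16))    # 512 bytes for at least 16 keys per page
```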

  7. Buffer pool size:

    [BUFF_POOL_SIZE_HINT] Sets the buffer pool size. The value is a hint: BangDB computes the best possible buffer pool size that is less than, but as close as possible to, this hint. A larger value generally helps, but it is limited by the memory available on the machine.
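    One way to read "less than, but as close as possible to, the hint" is to round the hint down to a whole number of fixed-size pages. The Python sketch below does exactly that; it is an assumption for illustration, and BangDB's actual computation may take other factors into account.

```python
def effective_pool_size(hint_bytes: int, page_size: int) -> int:
    """Round the hint down to a whole number of fixed-size pages.

    Assumption for illustration: the pool is an array of equal-size pages.
    BangDB's actual sizing rule may differ.
    """
    if hint_bytes < page_size:
        raise ValueError("hint is smaller than a single page")
    return (hint_bytes // page_size) * page_size


print(effective_pool_size(1_000_000, 8192))  # 999424 (122 pages of 8192 bytes)
```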
  8. Log buffer size:

    [LOG_BUF_SIZE] When the log is enabled, this size is used for the circular log buffer. The default of 256MB works well in many scenarios, but it can be increased or decreased depending on the RAM size and how much memory you want to commit.
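    A circular log buffer reuses a fixed block of memory: records are appended until it is full, then the buffered records are written out so their slots can be reused. The Python ring buffer below illustrates the idea with a list standing in for the on-disk log; the class name and flush policy are hypothetical, not BangDB internals. A larger capacity means fewer forced flushes, which is the trade-off behind sizing LOG_BUF_SIZE against available RAM.

```python
class CircularLogBuffer:
    """Fixed-capacity ring buffer of log records (illustrative, not BangDB code).

    Records are appended at `head`; when every slot is in use, the buffered
    records are flushed (here: copied to a list standing in for the log file)
    so their slots can be reused.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0       # index of the next slot to write
        self.count = 0      # records buffered but not yet flushed
        self.on_disk = []   # stand-in for the on-disk log

    def append(self, record):
        if self.count == self.capacity:
            self.flush()    # buffer full: write records out to free space
        self.slots[self.head] = record
        self.head = (self.head + 1) % self.capacity
        self.count += 1

    def flush(self):
        """Write buffered records out, oldest first, and mark the buffer empty."""
        tail = (self.head - self.count) % self.capacity
        for i in range(self.count):
            self.on_disk.append(self.slots[(tail + i) % self.capacity])
        self.count = 0


buf = CircularLogBuffer(capacity=3)
for rec in range(5):
    buf.append(rec)
print(buf.on_disk)  # [0, 1, 2]  (forced flush when the 4th record arrived)
buf.flush()
print(buf.on_disk)  # [0, 1, 2, 3, 4]
```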
  9. Autocommit:

    [BANGDB_AUTOCOMMIT] When transactions are enabled and autocommit is off, the user must explicitly wrap every operation in a begin and commit/abort transaction boundary; without an explicit transaction, not even a single operation can run. When autocommit is on, the db can be used in the usual manner. This value can also be enabled or disabled at run time through the API, depending on the scenario.
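    The autocommit rule can be modelled with a toy store. The Python sketch below is hypothetical (ToyTxnStore and its begin/put/commit/abort methods are not BangDB's API): with autocommit on, a put outside a transaction is applied immediately; with autocommit off, the same put raises an error unless it is wrapped in begin() and commit() or abort().

```python
class ToyTxnStore:
    """Models the autocommit rule described above (hypothetical, not BangDB's API)."""

    def __init__(self, autocommit: bool):
        self.autocommit = autocommit
        self.data = {}        # committed state
        self._pending = None  # writes of the open transaction, if any

    def begin(self):
        self._pending = {}

    def put(self, key, value):
        if self._pending is not None:
            self._pending[key] = value   # buffered until commit
        elif self.autocommit:
            self.data[key] = value       # single op, applied immediately
        else:
            raise RuntimeError("autocommit is off: wrap the op in begin()/commit()")

    def commit(self):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self.data.update(self._pending)
        self._pending = None

    def abort(self):
        self._pending = None   # discard buffered writes


store = ToyTxnStore(autocommit=False)
store.begin()
store.put("a", 1)
store.commit()
print(store.data)  # {'a': 1}
```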
  10. Key comparison function ID:

    [KEY_COMP_FUNCTION_ID] Selects which of the db's key comparison methods to use: value 1 selects lexicographic sorting and value 2 selects quasi-lexicographic sorting.

    Value 1 - Lexicographic sorting: keys are sorted in strict lexicographic order. For example, the keys 2, 12, 1, 112, 1a, a2, abc are sorted as 1, 112, 12, 1a, 2, a2, abc.

    Value 2 - Quasi-lexicographic sorting: the same list is sorted as 1, 2, 12, 1a, a2, 112, abc.

    The choice depends purely on the use case, so pick whichever works best. For integer-based keys, value 2 is more natural, since numbers come out in numeric order (1, 2, 12, 112); for most other use cases, value 1 is suitable.
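    The two orders in the example can be reproduced in Python. Value 1 is ordinary string sorting. For value 2, the sketch assumes shortlex order (shorter keys first, ties broken lexicographically); this is inferred from the example above, which it matches exactly, and is not BangDB's documented comparison function.

```python
KEYS = ["2", "12", "1", "112", "1a", "a2", "abc"]


def lexicographic(keys):
    """KEY_COMP_FUNCTION_ID = 1: strict character-by-character order."""
    return sorted(keys)


def quasi_lexicographic(keys):
    """KEY_COMP_FUNCTION_ID = 2, modelled here as shortlex (an inference):
    shorter keys first, keys of equal length compared lexicographically."""
    return sorted(keys, key=lambda k: (len(k), k))


print(lexicographic(KEYS))        # ['1', '112', '12', '1a', '2', 'a2', 'abc']
print(quasi_lexicographic(KEYS))  # ['1', '2', '12', '1a', 'a2', '112', 'abc']
```

    For non-negative integer keys without leading zeros, shortlex order coincides with numeric order, which is why value 2 feels natural for such keys.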

  11. BangDB transaction cache size:

    [BANGDB_TRANSACTION_CACHE_SIZE] The size of the cache used for transaction housekeeping, expressed as the number of concurrent transactions. Increasing it lowers the probability of a transaction being aborted due to forced reclaim of cache nodes, but the default works well in most situations.
  12. Max resultset size:

    [MAX_RESULTSET_SIZE] The default maximum size (in MB) of the resultset returned by a range scan. A range scan query uses this value to limit the amount of data returned. The user can change it for a given query by setting the scan filter before running the scan or range query.
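    The effect of a result-size cap on a range scan can be sketched as follows. This hypothetical Python function stops before the accumulated value bytes would exceed the limit and returns the key to resume from; the resume-key mechanism and the use of bytes instead of MB are assumptions made for a small example, not a description of BangDB's scan API.

```python
import bisect


def range_scan(sorted_items, start_key, end_key, max_bytes):
    """Return (key, value) pairs with start_key <= key <= end_key, stopping
    before the total value size would exceed max_bytes.

    sorted_items: list of (key, value_bytes) tuples sorted by key.
    Returns (results, resume_key); resume_key is None once the range is done.
    """
    keys = [k for k, _ in sorted_items]
    i = bisect.bisect_left(keys, start_key)
    results, used = [], 0
    while i < len(sorted_items) and sorted_items[i][0] <= end_key:
        key, value = sorted_items[i]
        # Stop at the cap, but always return at least one item so that a
        # caller resuming from `key` makes progress.
        if results and used + len(value) > max_bytes:
            return results, key
        results.append((key, value))
        used += len(value)
        i += 1
    return results, None


items = [(f"k{n}", b"x" * 10) for n in range(5)]
page, resume = range_scan(items, "k1", "k4", max_bytes=25)
print([k for k, _ in page], resume)   # ['k1', 'k2'] k3
```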
  13. Application log:

    [BANGDB_APP_LOG] The BangDB logging mechanism can also be used for the application's own logging; BangDB uses this log for its internal error log as well. This variable switches between the BangDB app logger, syslog (on Linux) and standard output.

    BANGDB_APP_LOG takes one of three values: 2 uses the BangDB app logger (recommended), 1 uses syslog on Linux and stdout on Windows, and 0 uses stdout/stderr.

  14. Signal handler:

    [BANGDB_SIGNAL_HANDLER_STATE] BangDB registers signal handlers for crashes, illegal memory access and similar faults in order to protect the db's integrity. This flag enables or disables the db's signal handling.
  15. Sync transaction:

    [BANGDB_SYNC_TRAN] By default, a transaction writes its log to disk during commit; this flag changes that default behaviour.
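    The difference a synchronous commit makes comes down to whether the log write is forced to the storage device before commit returns. The Python sketch below (a hypothetical commit function, not BangDB code) uses `os.fsync` for the synchronous case; without it the record may sit in the operating system's cache, which is faster but can be lost on a power failure or OS crash.

```python
import os


def commit(log_file, record: bytes, sync: bool = True) -> None:
    """Append a commit record to a binary transaction log file (hypothetical helper).

    sync=True : the record is forced to the storage device before returning
                (durable, slower).
    sync=False: the record only reaches the OS cache (faster, but a power
                failure or OS crash can lose recently committed transactions).
    """
    log_file.write(record)
    log_file.flush()                  # move data from Python's buffer to the OS
    if sync:
        os.fsync(log_file.fileno())   # ask the OS to write it to the device
```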