Configuration for BangDB
BangDB exposes several parameters that users can set to tune the database for efficient running and performance. Let's first categorise these config params for better understanding and then go into each of them. We also provide recommendations for each param.
-----------------------------------------------------------------------------------------------------------
Usage: -i [master | slave] -r [yes | no] -t [yes | no] -d [dbname] -s [IP:PORT] -m [IP:PORT] -b [yes | no] -v
-----------------------------------------------------------------------------------------------------------
Options
--------
-i: defines the server's identity [master | slave], default is SERVER_TYPE as defined in bangdb.config
-r: defines replication state [yes | no], default is ENABLE_REPLICATION as defined in bangdb.config
-t: defines if transaction is enabled(yes) or disabled(no) [yes | no], default is no
-d: defines the dbname, default is BANGDB_DATABASE_NAME as defined in bangdb.config
-s: defines IP:Port of this server, default is SERVER_ID:SERV_PORT as defined in bangdb.config
-m: defines IP:Port of the master (required only for slave as it declares master with this option), default is MASTER_SERVER_ID:MASTER_SERV_PORT as defined in the bangdb.config
-b: defines if server to be run in background as daemon [yes | no]
-v: prints the alpha-numeric version of the executable

Hence to run master with other values as defined in the bangdb.config, issue following command
./bangdb-server -s 192.168.1.5:7887
To run slave for this master with default other values..
./bangdb-server -i slave -s 192.168.1.6:7887 -m 192.168.1.5:7887
etc...
-----------------------------------------------------------------------------------------------------------
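The flags in the usage text above can also be assembled programmatically, for example from a deployment script. The following is a minimal Python sketch, not part of BangDB: the helper name build_server_args and its keyword arguments are hypothetical, and only the flags and the ./bangdb-server binary name come from the usage text. Any option left as None is omitted, so the server falls back to the value in bangdb.config.

```python
def build_server_args(role=None, replication=None, transaction=None,
                      dbname=None, server=None, master=None,
                      background=None, binary="./bangdb-server"):
    """Build an argv list from the documented bangdb-server flags.

    Hypothetical helper: options left as None are skipped so that the
    server uses its defaults from bangdb.config.
    """
    if role is not None and role not in ("master", "slave"):
        raise ValueError("role must be 'master' or 'slave'")

    def yes_no(flag):
        # The usage text expects the literal words yes / no.
        return "yes" if flag else "no"

    args = [binary]
    if role is not None:
        args += ["-i", role]
    if replication is not None:
        args += ["-r", yes_no(replication)]
    if transaction is not None:
        args += ["-t", yes_no(transaction)]
    if dbname is not None:
        args += ["-d", dbname]
    if server is not None:
        args += ["-s", server]          # IP:PORT of this server
    if master is not None:
        args += ["-m", master]          # IP:PORT of the master
    if background is not None:
        args += ["-b", yes_no(background)]
    return args


if __name__ == "__main__":
    # Reproduces the slave example from the usage text.
    print(" ".join(build_server_args(role="slave",
                                     server="192.168.1.6:7887",
                                     master="192.168.1.5:7887")))
```

The resulting list can be passed to subprocess.run as is; keeping the arguments as a list avoids shell quoting issues.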
Set master or slave
SERVER_TYPE is the config param; use it to set whether this server is master or slave: 0 for master, 1 for slave.

Set whether replication is ON or OFF
ENABLE_REPLICATION is the config param to set it: 1 for ON and 0 for OFF.

Set db name
BANGDB_DATABASE_NAME is the param. By default it is mydb.

Set the (this) server ip and port
SERVER_ID for the IP address, SERV_PORT for the port. We can use the ip address or the name of the server.
SERVER_ID = 127.0.0.1
SERV_PORT = 10101

Set the master's ip and port
This is mainly for a slave, as it has to know where the master is. MASTER_SERVER_ID for the ip address of the master, MASTER_SERV_PORT for the port of the master.

Run the server in the background
Use the -b command line argument; this can't be set using bangdb.config as of now.
-b yes

Run the server with transaction
Use the -t command line argument; this can't be set using bangdb.config as of now.
-t yes

A. Configuration that affects BangDB execution
The following config params are for the DB, whether it is run in embedded or server manner.
Possible values
0: Critical
1: Error
2: Warning
3: Info
4: Debug
KEY_SIZE
This is again a config param for the table and not for the db. It sets the default key size used when none is specified through TableEnv. It should be set using the TableEnv type.

MAX_RESULTSET_SIZE
BangDB supports the scan method for running range queries. A scan returns a ResultSet which holds the list of keys/vals/docs as needed by the query. MAX_RESULTSET_SIZE defines the max size of such result sets.
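To picture what a size cap on a range scan means for a caller, here is a small Python sketch over an in-memory sorted list. It is a toy model, not the BangDB API: the function names capped_scan and scan_all are hypothetical, the cap is assumed to be counted in bytes of key plus value, and the idea that the caller resumes from the first key that did not fit is an assumption made for illustration.

```python
from bisect import bisect_left, bisect_right


def capped_scan(sorted_items, start_key, end_key, max_bytes):
    """Return (batch, next_key) for keys in [start_key, end_key].

    sorted_items is a list of (key, value) string pairs sorted by key.
    The batch stops growing once adding another pair would exceed
    max_bytes; next_key is the key to resume from, or None when done.
    """
    keys = [k for k, _ in sorted_items]
    lo = bisect_left(keys, start_key)
    hi = bisect_right(keys, end_key)
    batch, used = [], 0
    for key, value in sorted_items[lo:hi]:
        size = len(key) + len(value)
        # Always accept at least one pair so the scan makes progress.
        if batch and used + size > max_bytes:
            return batch, key
        batch.append((key, value))
        used += size
    return batch, None


def scan_all(sorted_items, start_key, end_key, max_bytes):
    """Collect the full range by repeating capped scans."""
    results, next_key = [], start_key
    while next_key is not None:
        batch, next_key = capped_scan(sorted_items, next_key, end_key, max_bytes)
        results.extend(batch)
    return results
```

A smaller cap means more round trips for the same range, while a larger cap means more memory held per result set.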
We can pass the server type as a command line arg as well when we run the server directly.
./bangdb-server-2.0 -i master
or
./bangdb-server-2.0 -i slave
Or we can set the SERVER_TYPE param in the config file for the db; this is needed when we run bangdb using the script (bangdb-server).

ENABLE_REPLICATION
We can run BangDB Server with replication ON (1) or OFF (0). If OFF then slaves can't be attached. We can do this with a command line arg as well:
./bangdb-server-2.0 -r yes
or
./bangdb-server-2.0 -r no

SERVER_ID
This sets the ip address or name of the server. We can do this with a command line arg as well:
./bangdb-server-2.0 -s 127.0.0.1:10101

SERV_PORT
This sets the port of the server. We can do this with a command line arg as well:
./bangdb-server-2.0 -s 127.0.0.1:10101

MASTER_SERVER_ID
When a server is a slave of another server, we need to tell this server about the master. This param gives the ip address of the master. We can do this using a command line arg as well:
./bangdb-server -m 127.0.0.1:10101

MASTER_SERV_PORT
When a server is a slave of another server, we need to tell this server about the master. This param gives the port of the master. We can do this using a command line arg as well:
./bangdb-server -m 127.0.0.1:10101

MAX_SLAVES
This is for the master; it sets the limit on the number of slaves.

OPS_REC_BUF_SIZE
BangDB allows read/write operations to continue even while a slave is syncing with the server. This is done using the ops record buffer while syncing with a slave is in progress. OPS_REC_BUF_SIZE sets the size in MB of the ops record buffer. The default is good for most cases.

PING_FREQ
Master and slaves check each other's liveness using UDP based ping pong. PING_FREQ sets the frequency of the ping pong. The default value of 10 sec is good enough; however, you may increase or decrease the frequency as needed.

PING_THRESHOLD
How many pings or pongs must fail before one can conclude that the other server is unreachable or down? PING_THRESHOLD defines that. The default of 5 times in a row is good enough.

CLIENT_TIME_OUT
All clients connect to the server using tcp. BangDB server handles tens of thousands of such concurrent connections. However, the user may define whether the server can time out connections on which no requests have been received for some period of time. CLIENT_TIME_OUT defines that period in seconds. Default is 720 seconds.

NUM_CONNECTIONS_IN_POOL
This is for clients only. It sets the number of connections with the server to be kept in the pool for performance and efficiency purposes. Default is 48; however, you may increase it as you need, no performance impact* due to this.

SLAB_ALLOC_MEM_SIZE
BangDB Server uses pre-allocated slabs for run time memory requirements. SLAB_ALLOC_MEM_SIZE defines the slab memory size in MB. The default value of 256MB is good enough.

TLS_IDENTITY
BangDB can run in secure mode as well, and clients have to connect using the secure channel. TLS_IDENTITY can be set (reset) by the user for security purposes.

TLS_PSK_KEY
BangDB can run in secure mode as well, and clients have to connect using the secure channel.
TLS_PSK_KEY can be set (reset) by the user for security purposes.

BANGDB_SYNC_TRAN
If set, BangDB will forcefully sync with the filesystem after flush. Ideally it should be OFF (0), but in case of hard need, you may set it ON (1).

BANGDB_SIGNAL_HANDLER_STATE
Various signal handlers are set already, but the user may add handlers for a few extra ones. Ideally this is not required, but the user may still switch them ON.

LISTENQ
Queue size for the listen() call. The default of 10000 is quite a good number.

MAX_CLIENT_EVENTS
Maximum number of concurrent connections to the server. The server can handle the default of 10000, but change it to a smaller number as suitable.

SERVER_STAGE_OPTION
BangDB server implements SEDA (Staged Event Driven Architecture). Therefore, we can organise the whole processing in different ways. There are two options available, and the user can select either.
The stage option tells the server how many stages to create to handle the clients and their requests. Two types of stages are supported as of now:
1. two stages, one for handling clients and the other for handling the requests
2. four stages, one for handling clients, one for read, one for ops and finally one for write
Note: default is option 1 and works well in most of the scenarios

SERVER_OPS_WORKERS
If SERVER_STAGE_OPTION = 2, this defines how many workers to allocate for db operations. Default 0 is fine; it allows the db to select the number of workers best suited for the given server configuration.

SERVER_READ_WORKERS
If SERVER_STAGE_OPTION = 2, this defines how many workers to allocate for read (network). Default 0 is fine; it allows the db to select the number of workers best suited for the given server configuration.

SERVER_WRITE_WORKERS
If SERVER_STAGE_OPTION = 2, this defines how many workers to allocate for write (network). Default 0 is fine; it allows the db to select the number of workers best suited for the given server configuration.

EXT_PROG_RUN_CHLD_PROCESS
For IE (information extraction) or ML/DL related activities, BangDB may run external code such as python or c. This flag tells whether the external libs or code may be run in the same process or must run in a separate process for safety. Default is to run in a separate process: 0 runs in a separate process, 1 allows the db to run in the same process in case running in a separate process fails.

C. AI/ML related server and db configuration
Check out this discussion on ML to know more on this.

BRS_ACCESS_KEY
BangDB supports large data as well. Such large data could be binary object data or files. While large object data is written into LARGE_TABLE, files are stored in BRS. BRS stands for BangDB Resource Server. BRS is like S3 and supports similar concepts and API.
BangDB can run as BRS or as DB + BRS, depending on configuration (as described below). User may create buckets and store files in these buckets. To access these buckets user may define the access key using this param.
ML_TRAINING_SERVER_IP
BangDB can run as a separate ML training server or as part of the DB. When it runs as part of the DB it shares the DB's IP, else it has its own IP. Using this param, you may set the IP of the training server accordingly.

ML_TRAINING_SERVER_PORT
BangDB can run as a separate ML training server or as part of the DB. When it runs as part of the DB it shares the DB's port, else it has its own port. Using this param, you may set the port of the training server accordingly.

ML_PRED_SERVER_IP
BangDB can run as a separate ML prediction server or as part of the DB. When it runs as part of the DB it shares the DB's IP, else it has its own IP. Using this param, you may set the IP of the prediction server accordingly.

ML_PRED_SERVER_PORT
BangDB can run as a separate ML prediction server or as part of the DB. When it runs as part of the DB it shares the DB's port, else it has its own port. Using this param, you may set the port of the prediction server accordingly.

BANGDB_ML_SERVER_TYPE
This is to set up the ML cluster, including the BRS. For any server, this param defines what type of server it is as far as ML is concerned.
0 - invalid [ default will be used - default is prediction server ]
1 - Training Server [ no prediction will happen, it's a standalone training server ]
2 - Prediction Server [ no training will happen, only for prediction ]
3 - Hybrid - both train and predict at a single place

TRAINING_PREDICT_FILES_LOC
During training or prediction, the DB keeps some files locally. This param defines where the DB keeps those files. Default is /tmp/BRS; you may change it as you like, but ensure that the DB has read/write access to the folder.

TRAIN_PRED_MEM_BUDGET
Since BangDB trains and predicts concurrently, it could hog memory as we do more of these operations, especially training. Also, for performance reasons it keeps models loaded in memory. Therefore it is important to put a limit on the memory it can use. TRAIN_PRED_MEM_BUDGET sets the amount of memory ML can use. The loaded models are kept in an LRU list and the DB auto loads or unloads them depending upon the usage pattern.
Memory budget for training or prediction: it depends on the kind of server. For a training server the mem budget is used for training only; similarly, for a prediction server it is solely for prediction. However, if the server is running in hybrid mode then it is for both, 50% each. The value is in MB.

MAX_CONCURRENT_PRED_MODEL
How many models could be trained or kept in the LRU list; this param sets that number. Default 32 is good for most scenarios; edit it as required.

D. Advanced configuration to tune core of BangDB
The following config params tune the internal working of core BangDB. Therefore we need to be really sure before editing these. Let's go through these params as well.

PAGE_SPILT_FACTOR
BangDB uses B+Tree*, which keeps keys in sorted order. When a page splits, we need to transfer keys from one page to the other. This variable decides the split factor. A simple rule: if the ingestion of data is going to be mostly sequential (and not random) or semi sequential, then a higher value is better. Else keep the default. As of now this applies to the entire db, although it should be per table. It will be made table specific in an upcoming release.

LOG_FLUSH_FREQ
This is the frequency of log flush initiation. It is tuned for higher performance in general cases; however, you may play with the number and set what works best for you.

CHKPNT_ENABLED
This enables checkpointing of the WAL. 0 means no checkpointing; any other value enables it. It's recommended to keep it ON, but for higher performance in certain cases you may turn it off as well.

CHKPNT_FREQ
If checkpointing is ON, this sets its frequency. Again, it is set for better performance in general; however, you may choose to edit it for experimentation and select the right value.

LOG_SPLIT_CHECK_FREQ
WAL maintains an append only rolling log file. It is recommended to keep checking at a certain frequency whether the log file needs a split.
The value is selected for higher performance for general use cases, however you may experiment and pick the right value.
Possible values for LOG_RECLAIM_ACTION: 0 means don't do anything, 1 means archive in the reclaim folder, 2 means delete the log files. Usually, 2 is good.

LOG_RECLAIM_DIR
If LOG_RECLAIM_ACTION = 1, this tells which directory the logs should be reclaimed to. Ideally, when we wish to keep the log files and not delete them, the reclaim folder should be on the network or on another disk where capacity is large.

BUF_FLUSH_RECLAIM_FREQ
This is for the buffer pool and defines the frequency, in microseconds, of the buffer cache dirty page flusher and the buffer cache memory reclaimer.
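The three LOG_RECLAIM_ACTION values described earlier (0 do nothing, 1 archive in the reclaim folder, 2 delete) can be pictured with the following Python sketch. It only models the policy on a single finished log file; it is not how BangDB implements log reclaim, and the function name reclaim_log and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def reclaim_log(log_file, action, reclaim_dir=None):
    """Apply a LOG_RECLAIM_ACTION-style policy to one finished log file.

    Returns the path where the file now lives, or None if it was deleted.
    """
    log_file = Path(log_file)
    if action == 0:
        # 0: don't do anything, the file stays where it is.
        return log_file
    if action == 1:
        # 1: archive in the reclaim folder (the role of LOG_RECLAIM_DIR).
        if reclaim_dir is None:
            raise ValueError("action 1 needs a reclaim directory")
        target_dir = Path(reclaim_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
        moved = shutil.move(str(log_file), str(target_dir / log_file.name))
        return Path(moved)
    if action == 2:
        # 2: delete the log file.
        log_file.unlink()
        return None
    raise ValueError("action must be 0, 1 or 2")
```

With action 1 the archive grows without bound, which is why the reclaim directory is best placed on a large disk or network location.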
PREFETCH_BUF_SIZE
The max size of the pre-fetch buffer, defined in MB. The DB treats this as the max limit for pre-fetching of pages in the pool.

PREFETCH_SCAN_WINDOW_NUM
Size of the window for prefetch scan.

PREFETCH_EXTENT_NUM
To what extent pages would be pre-fetched.