
bangdb_ml_helper – Embedded C++


static bangdb_ml_helper *getInstance(bangdb_database *bdb, long mem_budget = 0);
To get an instance of the ml helper, we call this api. It takes bangdb_database as a required argument and mem_budget as an optional parameter. The mem_budget defines the amount of memory allocated for ML related activities; BangDB will always respect this budget. This is important when we run the db and ml on the same box or in embedded mode, when multiple users are using it and we wish to serve all of them, or when we wish to ensure that ML memory overflow doesn't create problems for the users. Upon success it returns a reference to the ml_helper, else NULL.
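A minimal sketch of obtaining the helper is shown below; the header name, the way the bangdb_database instance is obtained, and the budget value are illustrative assumptions, not part of the api itself.

#include <cstdio>
#include "bangdb_ml_helper.h"   // assumed header name

// a minimal sketch, assuming bdb is an already opened bangdb_database instance;
// 512 is an illustrative budget value, 0 (the default) means no explicit budget
void ml_setup(bangdb_database *bdb)
{
    bangdb_ml_helper *mlh = bangdb_ml_helper::getInstance(bdb, 512);
    if (!mlh)
    {
        printf("could not get bangdb_ml_helper instance\n");
        return;
    }
    // ... use mlh for the apis described below ...
}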
int create_bucket(char *bucket_info);
All intermediate files, models, or training/testing related files are stored within BRS (bangdb resource server) in some bucket. This api allows us to create a bucket as defined by bucket_info, which looks like the following:
{access_key:, secret_key:, bucket_name:}
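For example, assuming mlh is the helper instance from the earlier sketch (the keys and bucket name below are placeholders; checking for a negative return follows the 0/-1 convention used by the other apis and is an assumption here):

const char *bucket_info =
    "{\"access_key\":\"akey\", \"secret_key\":\"skey\", \"bucket_name\":\"ml_bucket\"}";
if (mlh->create_bucket((char *)bucket_info) < 0)
    printf("create_bucket failed\n");
mlh->set_bucket((char *)bucket_info);   // make it the bucket in use (see set_bucket below)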
void set_bucket(char *bucket_info);
This is similar to create_bucket, but if there is an existing bucket with the given name then it sets that bucket as the one to be used.
long upload_file(char *key, char *fpath, insert_options iop);
This uploads any ml related file that we wish to further use for training, testing, or prediction. key is the id for the file and fpath is the path to the file including the file name. Please note it uploads to the default bucket; to upload to a particular bucket, please use the other api described below. It returns 0 for success, else -1 for error.
Please see AI section for more information.
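For example, assuming mlh from the earlier sketch and assuming INSERT_UNIQUE is an available insert_options value (the key and file path are placeholders):

// upload a local training file into the default bucket
long rv = mlh->upload_file((char *)"train_data_1",
                           (char *)"/tmp/train_data_1.csv",
                           INSERT_UNIQUE);
if (rv != 0)
    printf("upload_file failed\n");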
char *train_model(char *req);
This is to train a model. It takes a training request and returns the status of the training request. The training request looks like the following:

{ "schema-name": "id", "algo_type": "SVM", "algo_param": { "svm_type": 1, "kernel": 2, "degree": 3, "gamma": 0.2, "cost": 1.1, "cache_size": 50, "probability": 0, "termination_criteria": 0.001, "nu": 0.5, "coef0": 0.1 }, "attr_list": [ { "name": "a1", "position": 1 }, { "name": "a2", "position": 2 } ], "training_details": { "training_source": "infile", "training_source_type": "FILE", "file_size_mb": 110, "train_speed": 1 }, "scale": "Y/N", "tune_param": "Y/N", "attr_type": "NUM/STR", "re_format": "JSON", "custom_format": { "name": "ts_rollup", "fields": { "ts": "ts", "quantity": "qty", "entityid": "eid" }, "aggr_type": 2, "gran": 1 }, "model_name": "my_model1", "udf": { "name": "udf_name", "udf_logic": 1, "bucket_name": "udf_bucket" } }

Please see AI section for more information.
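A trimmed sketch of firing a training request, based on the template above (only a subset of the fields is shown; the schema, model, and attribute names are placeholders):

const char *train_req =
    "{\"schema-name\":\"myschema\", \"algo_type\":\"SVM\","
    " \"algo_param\":{\"svm_type\":1, \"kernel\":2, \"gamma\":0.2, \"cost\":1.1},"
    " \"attr_list\":[{\"name\":\"a1\",\"position\":1},{\"name\":\"a2\",\"position\":2}],"
    " \"training_details\":{\"training_source\":\"train_data_1\","
    " \"training_source_type\":\"FILE\", \"file_size_mb\":110, \"train_speed\":1},"
    " \"attr_type\":\"NUM\", \"scale\":\"Y\", \"tune_param\":\"N\","
    " \"model_name\":\"my_model1\"}";
char *tresp = mlh->train_model((char *)train_req);
if (tresp)
    printf("train_model returned: %s\n", tresp);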
char *get_model_status(char *req);
This is to get the status of the model once a training request has been fired. The req input parameter looks like the following:
req = {"schema-name":, "model_name": }
And the return value looks like the following:
{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state":}
The train_state tells the actual status of the model. The values for train_state are as follows:

enum ML_BANGDB_TRAINING_STATE
{
    //error
    ML_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
    ML_BANGDB_TRAINING_STATE_NOT_PRSENT,
    ML_BANGDB_TRAINING_STATE_ERROR_PARSE,
    ML_BANGDB_TRAINING_STATE_ERROR_FORMAT,
    ML_BANGDB_TRAINING_STATE_ERROR_BRS,
    ML_BANGDB_TRAINING_STATE_ERROR_TUNE,
    ML_BANGDB_TRAINING_STATE_ERROR_TRAIN,
    ML_FILE_TYPE_ERROR_VAL_TESTDATA,
    ML_FILE_TYPE_ERROR_VAL_TRAINDATA,
    ML_BANGDB_TRAINING_STATE_LIMBO,
    //intermediate states
    ML_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
    ML_BANGDB_TRAINING_STATE_BRS_GET_DONE,
    ML_BANGDB_TRAINING_STATE_REFORMAT_DONE,
    ML_BANGDB_TRAINING_STATE_SCALE_TUNING_DONE,
    ML_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
    //training done
    ML_BANGDB_TRAINING_STATE_TRAINING_DONE, //25
    ML_BANGDB_TRAINING_STATE_DEPRICATED,
};

The above applies to the ML related model status.
For IE (Information Extraction) related model status, use the following:

enum IE_BANGDB_TRAINING_STATE
{
    //error
    IE_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
    IE_BANGDB_TRAINING_STATE_NOT_PRSENT,
    IE_BANGDB_TRAINING_STATE_ERROR_BRS,
    IE_BANGDB_TRAINING_STATE_ERROR_HELPER_FILES,
    IE_BANGDB_TRAINING_STATE_ERROR_BRS_FEATURE_EX,
    IE_BANGDB_TRAINING_STATE_ERROR_BRS_HELP_FILES,
    IE_BANGDB_TRAINING_STATE_ERROR_PRE_NER_TRAIN,
    IE_BANGDB_TRAINING_STATE_LIMBO,
    IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN,
    IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN_BRS,
    IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN, //20
    IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN,
    IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN_BRS,
    IE_BANGDB_TRAINING_STATE_ERROR_REL_LIST_BRS,
    IE_FILE_TYPE_ERROR_VAL_TRAINDATA,
    IE_FILE_TYPE_ERROR_VAL_TESTDATA,
    IE_FILE_TYPE_ERROR_VAL_CLASSDATA,
    IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA,
    //intermediate states
    IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
    IE_BANGDB_TRAINING_STATE_BRS_GET_DONE,
    IE_BANGDB_TRAINING_STATE_HELPER_DONE, //30
    IE_BANGDB_TRAINING_STATE_PRE_NER_DONE,
    IE_BANGDB_TRAINING_STATE_NER_DONE,
    IE_BANGDB_TRAINING_STATE_PRE_REL_DONE,
    IE_BANGDB_TRAINING_STATE_REL_DONE,
    IE_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
    IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING,
    //training done
    IE_BANGDB_TRAINING_HELP_DONE, //37
    IE_BANGDB_TRAINING_STATE_TRAINING_DONE, //38
    IE_BANGDB_TRAINING_STATE_DEPRICATED,
};

Please see AI section for more information.
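As an illustration of checking training progress with get_model_status (schema and model names are placeholders; parsing train_state out of the returned json is left out):

const char *status_req = "{\"schema-name\":\"myschema\", \"model_name\":\"my_model1\"}";
char *mstatus = mlh->get_model_status((char *)status_req);
if (mstatus)
    // training is complete once train_state reaches ML_BANGDB_TRAINING_STATE_TRAINING_DONE (25)
    printf("model status: %s\n", mstatus);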
int del_model(char *req);
This is used to delete a model by passing the req parameter. Here is how req looks:
req = {"schema-name":, "model_name": }
int del_train_request(char *req);
This is to delete a training request. Helpful when training got stuck for some reason and the status was not updated properly. Here is how req looks:
req = {"schema-name":, "model_name": }
//predict request must contain the algo type as well
//void *arg is for sorted list of positions of features
char *predict(char *req, void *arg = NULL);
The predict api is used to predict for a particular data point or event. It takes req as a parameter and an optional parameter arg, which describes the sorted positions of the different features; arg is not required most of the time. Here is how the request looks:
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}
{schema-name, attr_type: NUM, data_type:FILE, re_arrange:N, re_format:N, model_name: model_name, data:inputfile}
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:JSON, model_name: model_name, data:{k1:v1, k2:v2, k3:v3}}
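A sketch of a single-event prediction using the first request form above (schema, model name, and data values are placeholders):

const char *pred_req =
    "{\"schema-name\":\"myschema\", \"attr_type\":\"NUM\", \"data_type\":\"event\","
    " \"re_arrange\":\"N\", \"re_format\":\"N\", \"model_name\":\"my_model1\","
    " \"data\":\"1 1:1.2 2:3.2 3:1.1\"}";
char *pred = mlh->predict((char *)pred_req);
if (pred)
    printf("prediction: %s\n", pred);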
resultset *get_training_requests(resultset *prev_rs, char *accountid);
This returns all the training requests made so far for an account (or schema). The prev_rs should be NULL for the first call; for subsequent calls, just pass the previous rs.
It returns the resultset upon success, else NULL for error.
char *get_request(char *req);
This returns the training request from the ml housekeeping. The req is as follows:
req = {"schema-name":, "model_name": }
It returns a response with the status, or NULL for error or if the request is not found.
int set_status(char *status);
This sets the status for a particular training request. The status is as follows:
status = {"schema-name":, "model_name":, "status": }
Upon success it returns 0 else -1 for error
char *get_model_pred_status(char *req);
Given a request, get the prediction status. The req is as follows:
req = {"schema-name":, "model_name": }
retval = {"schema-name":, "model_name":, "pred_req_state":, "file_name":}
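For example (placeholders as before):

const char *pstat_req = "{\"schema-name\":\"myschema\", \"model_name\":\"my_model1\"}";
char *pstat = mlh->get_model_pred_status((char *)pstat_req);
if (pstat)
    printf("prediction status: %s\n", pstat);   // contains pred_req_state and file_name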
int del_pred_request(char *req);
Deletes the prediction request. The input parameter req is as follows:
req = {"schema-name":, "model_name":, "file_name":}
It returns 0 for success and -1 for error
long upload_file(char *bucket_info, char *key, char *fpath, insert_options iop);
This uploads any ml related file that we wish to further use for training, testing, or prediction. key is the id for the file and fpath is the path to the file including the file name. Please note it uploads to the given bucket (bucket_info). It returns 0 for success, else -1 for error.
long download_file(char *bucket_info, char *key, char *fname, char *fpath);
It downloads the file identified by key from the given bucket. It renames the file as fname and stores it at fpath. It returns 0 for success, else -1 for error.
long get_object(char *bucket_info, char *key, char **data, long *datlen);
It gets the object (binary or otherwise) identified by key from the given bucket. It fills data with the object and sets datlen to the length (size) of the object.
It returns 0 for success else -1 for error.
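A short sketch of reading an object into memory (bucket credentials and key are placeholders; the ownership/free convention for the returned buffer is not specified here):

const char *bkt =
    "{\"access_key\":\"akey\", \"secret_key\":\"skey\", \"bucket_name\":\"ml_bucket\"}";
char *obj_data = NULL;
long obj_len = 0;
if (mlh->get_object((char *)bkt, (char *)"my_model1", &obj_data, &obj_len) == 0)
    printf("got object of %ld bytes\n", obj_len);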
int del_file(char *bucket_info, char *key);
This deletes the given file (key) from the given bucket (bucket_info). It returns 0 for success and -1 for error
int del_bucket(char *bucket_info);
This deletes the given bucket. It returns 0 for success and -1 for error.
LONG_T count_buckets();
This returns the number of buckets, else -1 for error.
int count_slices(char *bucket_info, char *key);
Since BRS (bangdb resource server) stores large files and objects in chunks, we can count how many slices exist for the given file (key) by calling this function.
It returns count of slices else -1 for error.
LONG_T count_objects(char *bucket_info);
This counts the number of objects in the given bucket else returns -1 for error.
char *count_objects_details(char *bucket_info);
This gives the details of all the objects in the given bucket (bucket_info), else returns NULL for error. Please note it may set the error within the returned json value as well.
long count_models(char *accountid);
This counts the models for a given account, else returns -1 for error.
int get_ref_count();
This returns the reference count of all the references to the ml_helper held by different objects.
//this is to test if brs is local to the BE server (DB)
bool is_brs_local();
This is useful to know if brs (bangdb resource server) is local or remote. Mostly used by clients.
char *list_objects(char *bucket_info, char *skey = NULL, int list_size_mb = MAX_RESULTSET_SIZE);
This returns a json string with the list of objects in a given bucket for a given key, or for all keys (in case skey is NULL). It may return NULL for error as well. list_size_mb defines the max size of the list; by default it would return 2MB of data or less.
//returns json with the name of buckets, else error
//{"access_key":"akey", "secret_key":"skey"}
char *list_buckets(char *user_info);
This returns the list of all buckets for the user given by user_info, which looks like the following:
{"access_key":"akey", "secret_key":"skey"}
It may return NULL as well in case of error.
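For example, listing the buckets for a user and then the objects in one bucket (keys and bucket name are placeholders):

const char *user_info = "{\"access_key\":\"akey\", \"secret_key\":\"skey\"}";
char *buckets = mlh->list_buckets((char *)user_info);
if (buckets)
    printf("buckets: %s\n", buckets);

const char *bkt_info =
    "{\"access_key\":\"akey\", \"secret_key\":\"skey\", \"bucket_name\":\"ml_bucket\"}";
char *objects = mlh->list_objects((char *)bkt_info);   // all keys, default size cap
if (objects)
    printf("objects: %s\n", objects);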
int close_bangdb_ml_helper(bool force = false);
This closes the bangdb ml helper. Since a reference count is maintained within the ml_helper, if force is set to false and there are open references then it will not close the ml_helper. But if force is set to true, or the number of references is 0, then it will close the ml_helper. It returns 0 for success, else -1 for error.
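Finally, a minimal close call; with the default force = false it only succeeds when no open references remain:

if (mlh->close_bangdb_ml_helper() < 0)
    printf("ml helper not closed; open references may still exist\n");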