
bangdb ml helper – Embedded

C++

static bangdb_ml_helper *getInstance(bangdb_database *bdb, long mem_budget = 0);
To get an instance of the ml helper, we call this api. It takes bangdb_database as a required argument and mem_budget as an optional parameter. The mem_budget defines the amount of memory allocated for ML related activities; bangdb will always respect this budget. This is important when db and ml run on the same box or in embedded mode, when multiple users are using it and we wish to serve all of them, or when we wish to ensure ML memory overflow doesn't create problems for the users. Upon success it returns a reference to the ml_helper, else NULL.
int create_bucket(char *bucket_info);
All intermediate files, models or training/ testing related files are stored within BRS (bangdb resource server) in some bucket. This api allows us to create bucket as defined by the bucket_info which looks like following;
{access_key:, secret_key:, bucket_name:}
void set_bucket(char *bucket_info);
This is similar to create_bucket, but if there is an existing bucket with the name then it will switch to this bucket.
long upload_file(char *key, char *fpath, insert_options iop);
This uploads any ml related file that we wish to use later for training, testing or prediction. key is the id for the file and fpath is the path to the file, including the file name. Please note it uploads to the default bucket; to upload to a particular bucket, use the other api described below. It returns 0 for success else -1 for error.
Please see AI section for more information.
char *train_model(char *req);
This is to train a model. It takes a training request and returns status of the training request. The training request looks like following;

{
  "schema-name": "id",
  "algo_type": "SVM",
  "algo_param": {
    "svm_type": 1,
    "kernel": 2,
    "degree": 3,
    "gamma": 0.2,
    "cost": 1.1,
    "cache_size": 50,
    "probability": 0,
    "termination_criteria": 0.001,
    "nu": 0.5,
    "coef0": 0.1
  },
  "attr_list": [
    { "name": "a1", "position": 1 },
    { "name": "a2", "position": 2 }
  ],
  "training_details": {
    "training_source": "infile",
    "training_source_type": "FILE",
    "file_size_mb": 110,
    "train_speed": 1
  },
  "scale": "Y/N",
  "tune_param": "Y/N",
  "attr_type": "NUM/STR",
  "re_format": "JSON",
  "custom_format": {
    "name": "ts_rollup",
    "fields": { "ts": "ts", "quantity": "qty", "entityid": "eid" },
    "aggr_type": 2,
    "gran": 1
  },
  "model_name": "my_model1",
  "udf": { "name": "udf_name", "udf_logic": 1, "bucket_name": "udf_bucket" }
}

Please see AI section for more information.
char *get_model_status(char *req);
This is to get the status of the model after a training request has been fired. The req input parameter is like following;
req = {"schema-name":, "model_name": }
And the return value is like following;
{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state":}
The train_state tells the status of the model. The values for train_state are as follows;

enum ML_BANGDB_TRAINING_STATE
{
//error
ML_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
ML_BANGDB_TRAINING_STATE_NOT_PRSENT,
ML_BANGDB_TRAINING_STATE_ERROR_PARSE,
ML_BANGDB_TRAINING_STATE_ERROR_FORMAT,
ML_BANGDB_TRAINING_STATE_ERROR_BRS,
ML_BANGDB_TRAINING_STATE_ERROR_TUNE,
ML_BANGDB_TRAINING_STATE_ERROR_TRAIN,
ML_FILE_TYPE_ERROR_VAL_TESTDATA,
ML_FILE_TYPE_ERROR_VAL_TRAINDATA,
ML_BANGDB_TRAINING_STATE_LIMBO,
//intermediate states
ML_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
ML_BANGDB_TRAINING_STATE_BRS_GET_DONE,
ML_BANGDB_TRAINING_STATE_REFORMAT_DONE,
ML_BANGDB_TRAINING_STATE_SCALE_TUNING_DONE,
ML_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
//training done
ML_BANGDB_TRAINING_STATE_TRAINING_DONE, //25
ML_BANGDB_TRAINING_STATE_DEPRICATED,
};

The above is true for ML related model status.
For IE (Information Extraction) related model status use following;

enum IE_BANGDB_TRAINING_STATE
{
//error
IE_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
IE_BANGDB_TRAINING_STATE_NOT_PRSENT,
IE_BANGDB_TRAINING_STATE_ERROR_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_HELPER_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_FEATURE_EX,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_HELP_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_LIMBO,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN, //20
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_REL_LIST_BRS,
IE_FILE_TYPE_ERROR_VAL_TRAINDATA,
IE_FILE_TYPE_ERROR_VAL_TESTDATA,
IE_FILE_TYPE_ERROR_VAL_CLASSDATA,
IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA,
//intermediate states
IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_GET_DONE,
IE_BANGDB_TRAINING_STATE_HELPER_DONE, //30
IE_BANGDB_TRAINING_STATE_PRE_NER_DONE,
IE_BANGDB_TRAINING_STATE_NER_DONE,
IE_BANGDB_TRAINING_STATE_PRE_REL_DONE,
IE_BANGDB_TRAINING_STATE_REL_DONE,
IE_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING,
//training done
IE_BANGDB_TRAINING_HELP_DONE, //37
IE_BANGDB_TRAINING_STATE_TRAINING_DONE, //38
IE_BANGDB_TRAINING_STATE_DEPRICATED,
};

Please see AI section for more information.
int del_model(char *req);
This is used to delete a model by passing the req parameter. Here is how req looks;
req = {"schema-name":, "model_name": }
int del_train_request(char *req);
This is to delete the training request. Helpful when training got stuck for some reason and the status was not updated properly. Here is how req looks;
req = {"schema-name":, "model_name": }
//predict request must contain the algo type as well
//void *arg is for sorted list of positions of features
char *predict(char *req, void *arg = NULL);
The predict api is used to predict for a particular data or event. It takes req as a parameter and an optional parameter arg, which describes the sorted positions of the different features; arg is not required most of the time. Here is how the request looks;
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}
{schema-name, attr_type: NUM, data_type:FILE, re_arrange:N, re_format:N, model_name: model_name, data:inputfile}
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:JSON, model_name: model_name, data:{k1:v1, k2:v2, k3:v3}}
resultset *get_training_requests( resultset *prev_rs, char *accountid);
This returns all the training requests made so far for an account (or schema). The prev_rs should be NULL for the first call and for subsequent calls, just pass the previous rs.
Upon success it returns the resultset, else NULL for error.
char *get_request(char *req);
This returns request (training) from the ml housekeeping. The request is as follows;
req = {"schema-name":, "model_name": }
It returns a response with the status, or NULL for error or if the request is not found.
int set_status(char *status);
This sets the status for a particular train request. The status is as follows;
status = {"schema-name":, "model_name":, "status": }
Upon success it returns 0 else -1 for error
char *get_model_pred_status(char *req);
Given a request get the prediction status. The req is as follows;
req = {"schema-name":, "model_name": }
retval = {"schema-name":, "model_name":, "pred_req_state":, "file_name":}
int del_pred_request(char *req);
Deletes the request. The input param req is as follows;
req = {"schema-name":, "model_name":, "file_name":}
It returns 0 for success and -1 for error
long upload_file(char *bucket_info, char *key, char *fpath, insert_options iop);
This uploads any ml related file that we wish to use later for training, testing or prediction. key is the id for the file and fpath is the path to the file, including the file name. Please note it uploads to the given bucket. It returns 0 for success else -1 for error.
long download_file(char *bucket_info, char *key, char *fname, char *fpath);
It downloads the file for the given bucket and key. It renames the file as "fname" and stores it at "fpath". It returns 0 for success else -1 for error.
long get_object(char *bucket_info, char *key, char **data, long *datlen);
It gets the object (binary or otherwise) for the given bucket and key. It fills data with the object and sets datlen as the length or size of the object.
It returns 0 for success else -1 for error.
int del_file(char *bucket_info, char *key);
This deletes the given file (key) from the given bucket (bucket_info). It returns 0 for success and -1 for error
int del_bucket(char *bucket_info);
This deletes the given bucket. It returns 0 for success and -1 for error.
LONG_T count_buckets();
This returns the number of buckets, else -1 for error.
int count_slices(char *bucket_info, char *key);
Since BRS (bangdb resource server) stores large files and objects in chunks, we can count how many slices there are for the given file (key) by calling this function.
It returns the count of slices, else -1 for error.
LONG_T count_objects(char *bucket_info);
This counts the number of objects in the given bucket else returns -1 for error.
char *count_objects_details(char *bucket_info);
This gives the details of all the objects in the given bucket (bucket_info), else returns NULL for error. Please note it may set an error in the returned json value as well.
long count_models(char *accountid);
This counts the models for a given account, else returns -1 for error.
int get_ref_count();
This returns the reference count of all the references of the ml_helper held by different objects.
//this is to test if brs is local to the BE server DB
bool is_brs_local();
This is useful to know whether brs (bangdb resource server) is local or remote. Mostly used by clients.
char *list_objects(char *bucket_info, char *skey = NULL, int list_size_mb = MAX_RESULTSET_SIZE);
This returns a json string with the list of objects in a given bucket, for a given key or for all keys (in case skey is NULL). It may return NULL for error as well. list_size_mb defines the max size of the list; by default it returns 2MB of data or less.
//returns json with the names of buckets, else error
//{"access_key":"akey", "secret_key":"skey"}
char *list_buckets(char *user_info);
This returns the list of all buckets for the user given by user_info, which looks like following;
{"access_key":"akey", "secret_key":"skey"}
It may return NULL in case of error.
int close_bangdb_ml_helper(bool force = false);
This closes the bangdb ml helper. Since a reference count is maintained within the ml_helper, if force is set to false and there are open references then it will not close the ml_helper. But if force is set to true or the number of references is 0 then it will close the ml_helper. It returns 0 for success else -1 for error.

Java

public static synchronized BangDBMLHelper getInstance(BangDBDatabase bdb, long mem_budget)
To get an instance of the ml helper, we call this api. It takes BangDBDatabase as a required argument and mem_budget to cap the amount of memory allocated for ML related activities; bangdb will always respect this budget. This is important when db and ml run on the same box or in embedded mode, when multiple users are using it and we wish to serve all of them, or when we wish to ensure ML memory overflow doesn't create problems for the users. Upon success it returns a reference to the ml helper, else NULL.
public String toString()
Returns the detail of the object as string
public int createBucket(String bucket_info)
All intermediate files, models or training/ testing related files are stored within BRS (bangdb resource server) in some bucket. This api allows us to create bucket as defined by the bucket_info which looks like following;
{access_key:, secret_key:, bucket_name:}
public void setBucket(String bucket_info)
This is similar to create_bucket, but if there is an existing bucket with the name then it will switch to this bucket.
public long uploadFile(String key, String path, InsertOptions flag)
This uploads any ml related file that we wish to use later for training, testing or prediction. key is the id for the file and path is the path to the file, including the file name. Please note it uploads to the default bucket; to upload to a particular bucket, use the other api described below.
It returns 0 for success else -1 for error. Please see AI section for more information.
public long uploadFile(String bucketInfo, String key, String path, InsertOptions flag)
Same as above, except it will upload the file to the given bucketInfo. The above api puts it in the default bucket.
public String trainModel(String req)
This is to train a model. It takes a training request and returns status of the training request. The training request looks like following;

{
  "schema-name": "id",
  "algo_type": "SVM",
  "algo_param": {
    "svm_type": 1,
    "kernel": 2,
    "degree": 3,
    "gamma": 0.2,
    "cost": 1.1,
    "cache_size": 50,
    "probability": 0,
    "termination_criteria": 0.001,
    "nu": 0.5,
    "coef0": 0.1
  },
  "attr_list": [
    { "name": "a1", "position": 1 },
    { "name": "a2", "position": 2 }
  ],
  "training_details": {
    "training_source": "infile",
    "training_source_type": "FILE",
    "file_size_mb": 110,
    "train_speed": 1
  },
  "scale": "Y/N",
  "tune_param": "Y/N",
  "attr_type": "NUM/STR",
  "re_format": "JSON",
  "custom_format": {
    "name": "ts_rollup",
    "fields": { "ts": "ts", "quantity": "qty", "entityid": "eid" },
    "aggr_type": 2,
    "gran": 1
  },
  "model_name": "my_model1",
  "udf": { "name": "udf_name", "udf_logic": 1, "bucket_name": "udf_bucket" }
}

Please see AI section for more information.
public long uploadStreamDataForTrain(String req)
public String getModelStatus(String req)
This is to get the status of the model after a training request has been fired. The req input parameter is like following;

req = {"schema-name":, "model_name": }
And the return value is like following;

{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state":}

The train_state tells the status of the model. The values for train_state are as follows;

enum ML_BANGDB_TRAINING_STATE
{
//error
ML_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
ML_BANGDB_TRAINING_STATE_NOT_PRSENT,
ML_BANGDB_TRAINING_STATE_ERROR_PARSE,
ML_BANGDB_TRAINING_STATE_ERROR_FORMAT,
ML_BANGDB_TRAINING_STATE_ERROR_BRS,
ML_BANGDB_TRAINING_STATE_ERROR_TUNE,
ML_BANGDB_TRAINING_STATE_ERROR_TRAIN,
ML_FILE_TYPE_ERROR_VAL_TESTDATA,
ML_FILE_TYPE_ERROR_VAL_TRAINDATA,
ML_BANGDB_TRAINING_STATE_LIMBO,
//intermediate states
ML_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
ML_BANGDB_TRAINING_STATE_BRS_GET_DONE,
ML_BANGDB_TRAINING_STATE_REFORMAT_DONE,
ML_BANGDB_TRAINING_STATE_SCALE_TUNING_DONE,
ML_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
//training done
ML_BANGDB_TRAINING_STATE_TRAINING_DONE, //25
ML_BANGDB_TRAINING_STATE_DEPRICATED,
};

The above is true for ML related model status.
For IE (Information Extraction) related model status use following;

enum IE_BANGDB_TRAINING_STATE
{
//error
IE_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
IE_BANGDB_TRAINING_STATE_NOT_PRSENT,
IE_BANGDB_TRAINING_STATE_ERROR_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_HELPER_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_FEATURE_EX,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_HELP_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_LIMBO,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN, //20
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_REL_LIST_BRS,
IE_FILE_TYPE_ERROR_VAL_TRAINDATA,
IE_FILE_TYPE_ERROR_VAL_TESTDATA,
IE_FILE_TYPE_ERROR_VAL_CLASSDATA,
IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA,
//intermediate states
IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_GET_DONE,
IE_BANGDB_TRAINING_STATE_HELPER_DONE, //30
IE_BANGDB_TRAINING_STATE_PRE_NER_DONE,
IE_BANGDB_TRAINING_STATE_NER_DONE,
IE_BANGDB_TRAINING_STATE_PRE_REL_DONE,
IE_BANGDB_TRAINING_STATE_REL_DONE,
IE_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING,
//training done
IE_BANGDB_TRAINING_HELP_DONE, //37
IE_BANGDB_TRAINING_STATE_TRAINING_DONE, //38
IE_BANGDB_TRAINING_STATE_DEPRICATED,
};

Please see AI section for more information.
public int setModelStatus(String req)
This sets the status for a particular train request. The status is as follows;
status = {"schema-name":, "model_name":, "status": }
Upon success it returns 0 else -1 for error
public int delModel(String req)
This is used to delete a model by passing the req parameter. Here is how req looks;
req = {"schema-name":, "model_name": }
public int delTrainRequest(String req)
This is to delete the training request. Helpful when training got stuck for some reason and the status was not updated properly. Here is how req looks;
req = {"schema-name":, "model_name": }
public String predict(String req)
The predict api is used to predict for a particular data or event. It takes req as the parameter. Here is how the request looks;
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}
{schema-name, attr_type: NUM, data_type:FILE, re_arrange:N, re_format:N, model_name: model_name, data:inputfile}
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:JSON, model_name: model_name, data:{k1:v1, k2:v2, k3:v3}}
public int predict_async(String req)
// not for embedded
public String getModelPredStatus(String req)
Given a request get the prediction status. The req is as follows;
req = {"schema-name":, "model_name": }
retval = {"schema-name":, "model_name":, "pred_req_state":, "file_name":}
public int delPredRequest(String req)
Deletes the request. The input param req is as follows;
req = {"schema-name":, "model_name":, "file_name":}
It returns 0 for success and -1 for error
public ResultSet getTrainRequests(String req, String levk)
This returns all the training requests made so far for an account (or schema). Pass NULL for levk for the first call; for subsequent calls, pass the last key from the previous resultset.
Upon success it returns the resultset, else NULL for error.
public String getRequestDetail(String req)
This returns request (training) from the ml housekeeping. The request is as follows;
req = {"schema-name":, "model_name": }
It returns a response with the status, or NULL for error or if the request is not found.
public String listBuckets(String req)
This returns the list of all buckets for the user given by the req argument, which looks like following;
{"access_key":"akey", "secret_key":"skey"}
It may return NULL in case of error.
public String listAllBuckets(String req)
public String listObjects(String req, String skey, int listSizeMB)
This returns a json string with the list of objects in a given bucket, for a given key or for all keys (in case skey is NULL). It may return NULL for error as well. listSizeMB defines the max size of the list; by default it returns 2MB of data or less.
public long getModelCount(String req)
This counts the models for a given account, else returns -1 for error.
public int reinitMDM(String req)
Only for the admin and server case; not applicable for embedded.
public boolean isBRSLocal()
Returns whether BRS is local; useful for distributed mode or server. For embedded it is always true.
public long downloadFile(String bucketInfo, String key, String fname, String fpath)
It downloads the file for the given bucket and key. It renames the file as "fname" and stores it at "fpath".
It returns 0 for success else -1 for error.
public byte[] getObject(String bucketInfo, String key)
It gets the object (binary or otherwise) for the given bucket and key.
It returns the object as a byte array, else NULL for error.
public long countBuckets()
This returns the number of buckets, else -1 for error.
public long countObjects(String bucket_info)
This counts the number of objects in the given bucket else returns -1 for error.
public String countObjectsDetails(String bucket_info)
This gives the details of all the objects in the given bucket (bucket_info), else returns NULL for error. Please note it may set an error in the returned json value as well.
public int countSlices(String bucket_info, String key)
Since BRS (bangdb resource server) stores large files and objects in chunks, we can count how many slices there are for the given file (key) by calling this function.
It returns the count of slices, else -1 for error.
public int delFile(String bucket_info, String key)
It deletes the file specified with key and bucket_info. Bucket info is as follows;
bucketInfo = {"bucket_name":"ml_bucket_info", "access_key":"brs_access_key", "secret_key":"brs_secret_key"}
it returns 0 for success else -1 for error
public int delBucket(String bucket_info)
It deletes the bucket as specified. Bucket info looks like following;
bucketInfo = {"bucket_name":"ml_bucket_info", "access_key":"brs_access_key", "secret_key":"brs_secret_key"}
it returns 0 for success else -1 for error
public synchronized void closeMLHelper(boolean force)
This closes the bangdb ml helper. Since a reference count is maintained within the ml_helper, if force is set to false and there are open references then it will not close the ml_helper. But if force is set to true or the number of references is 0 then it will close the ml_helper.