ML Helper

BangDB ML Helper offers several APIs that simplify ML-related activities, from training models, prediction, model versioning and deployment to managing large files and binary objects related to ML. Check out the real-world examples to learn more, or try them out on BangDB.

To create an MLHelper object

BangDBMLHelper(train_pred_brs_info *tpbinfo, const char *conf_path = NULL, bool isssl = true)

To create a bucket to store all intermediate training and testing files

int createBucket(const char *bucket_info)

bucket_info is the name for the bucket to be created. It returns -1 for error.

To set or change the name of the bucket

void setBucket(const char *bucket_info)

To upload the files required to train or predict

long uploadFile(const char *key, const char *fpath, InsertOptions iop)

key is the ID of the file, and fpath is the path to the file, including the file name.

It returns -1 for error.

To train a model, call the trainModel API. It returns immediately and, if successful, schedules training of the model. Call getModelStatus() periodically until it returns an end status.

int trainModel(const char *req)

It takes a training request and returns status of the training request. It returns -1 for error.
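Because trainModel only schedules training, the caller has to poll. The sketch below shows one way to structure that loop with a fixed interval and an attempt limit. fetchStatus is a hypothetical callback standing in for a call to getModelStatus() that extracts the train_state field, and isEndState is a caller-supplied predicate, since this page does not define which states count as "end". Neither is a BangDB API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Outcome of a polling run.
struct PollResult {
    bool finished;          // true if isEndState() accepted a status
    std::string lastStatus; // last status seen ("" if none)
    int attempts;           // number of status calls made
};

// Calls fetchStatus() until isEndState() returns true or maxAttempts is reached,
// sleeping `interval` between attempts (but not after the final one).
// fetchStatus is a hypothetical stand-in for getModelStatus() + field extraction.
PollResult pollUntilDone(const std::function<std::string()> &fetchStatus,
                         const std::function<bool(const std::string &)> &isEndState,
                         int maxAttempts,
                         std::chrono::milliseconds interval) {
    PollResult r{false, "", 0};
    while (r.attempts < maxAttempts) {
        r.lastStatus = fetchStatus();
        ++r.attempts;
        if (isEndState(r.lastStatus)) {
            r.finished = true;
            break;
        }
        if (r.attempts < maxAttempts) {
            std::this_thread::sleep_for(interval);
        }
    }
    return r;
}
```

If the loop gives up without reaching an end state, the training request may be stuck; delTrainRequest() is documented for exactly that situation.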

To get the status of a model after a training request has been submitted

char *getModelStatus(const char *req)

The req input parameter looks like the following:

req = {"schema-name":, "model_name": }

The return value looks like the following:

{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state": }

This applies to ML model status. For IE (Information Extraction) model status, use the following instead:

IE_BANGDB_TRAINING_STATE is an enum with the following values:

Error states:

IE_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
IE_BANGDB_TRAINING_STATE_NOT_PRSENT,
IE_BANGDB_TRAINING_STATE_ERROR_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_HELPER_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_FEATURE_EX,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_HELP_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_LIMBO,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN, //20
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_REL_LIST_BRS,
IE_FILE_TYPE_ERROR_VAL_TRAINDATA,
IE_FILE_TYPE_ERROR_VAL_TESTDATA,
IE_FILE_TYPE_ERROR_VAL_CLASSDATA,
IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA

Intermediate states:

IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_GET_DONE,
IE_BANGDB_TRAINING_STATE_HELPER_DONE, //30
IE_BANGDB_TRAINING_STATE_PRE_NER_DONE,
IE_BANGDB_TRAINING_STATE_NER_DONE,
IE_BANGDB_TRAINING_STATE_PRE_REL_DONE,
IE_BANGDB_TRAINING_STATE_REL_DONE,
IE_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING

Training done:

IE_BANGDB_TRAINING_HELP_DONE, //37
IE_BANGDB_TRAINING_STATE_TRAINING_DONE, //38
IE_BANGDB_TRAINING_STATE_DEPRICATED
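When polling an IE model, it helps to map a numeric state onto the three groups listed above. The sketch below assumes the enum values are consecutive starting at 10, which is consistent with the //20, //30, //37 and //38 markers in the listing; check the authoritative definition in bangdb common before relying on these numbers. IeStateKind and classifyIeState are hypothetical names, and LIMBO (17) is classed as an error only because the listing groups it there.

```cpp
#include <cassert>

enum class IeStateKind { Error, Intermediate, Done, Deprecated, Unknown };

// Boundaries derived from the listing, assuming consecutive values from 10.
constexpr int kFirstError = 10;        // IE_BANGDB_TRAINING_STATE_INVALID_INPUT
constexpr int kLastError = 27;         // IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA
constexpr int kFirstIntermediate = 28; // IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING
constexpr int kLastIntermediate = 36;  // IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING
constexpr int kFirstDone = 37;         // IE_BANGDB_TRAINING_HELP_DONE
constexpr int kLastDone = 38;          // IE_BANGDB_TRAINING_STATE_TRAINING_DONE
constexpr int kDeprecated = 39;        // IE_BANGDB_TRAINING_STATE_DEPRICATED

// Maps a raw state value to its group in the listing.
IeStateKind classifyIeState(int state) {
    if (state >= kFirstError && state <= kLastError) return IeStateKind::Error;
    if (state >= kFirstIntermediate && state <= kLastIntermediate) return IeStateKind::Intermediate;
    if (state >= kFirstDone && state <= kLastDone) return IeStateKind::Done;
    if (state == kDeprecated) return IeStateKind::Deprecated;
    return IeStateKind::Unknown;
}
```

A poller would typically stop calling getModelStatus() once the state is classed as Error or Done and keep waiting while it is Intermediate.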

See bangdb common for more information.

It returns NULL for error (or a response with errcode -1); otherwise it returns the status along with its errcode. User should free the memory using delete[].
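Several calls on this page return a char * that the caller must release with delete[]. Wrapping the pointer in std::unique_ptr<char[]> (whose default deleter calls delete[]) makes that release automatic, even on early returns. fakeGetModelStatus below is a hypothetical stub that allocates a placeholder JSON string with new[]; in real code the pointer would come from a call such as getModelStatus().

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// unique_ptr<char[]> releases its buffer with delete[].
using OwnedCStr = std::unique_ptr<char[]>;

// Hypothetical stub: returns a new[]-allocated copy of a placeholder JSON string.
char *fakeGetModelStatus(const char * /*req*/) {
    const char *json = "{\"errcode\":0}";
    char *buf = new char[std::strlen(json) + 1];
    std::strcpy(buf, json);
    return buf;
}

// Takes ownership of `raw`, copies it into `out`, and frees it with delete[]
// when `guard` goes out of scope. Returns false if `raw` is NULL (error).
bool fetchOwned(char *raw, std::string &out) {
    OwnedCStr guard(raw);
    if (!guard) return false;
    out = guard.get();
    return true;
}
```

The same pattern applies to predict(), getRequest(), getModelPredStatus(), listObjects() and the other calls documented as requiring delete[].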

To delete a model

int delModel(const char *req)

This deletes the model identified by req, where req = {"schema_name":, "model_name":}. It returns -1 for error.

To delete training request

int delTrainRequest(const char *req)

This deletes the training request. It is helpful when training gets stuck for some reason and the status is not updated properly. It returns -1 for error.

To predict for a particular data item or event

char *predict(const char *req)

Here is what req looks like:

{schema-name, attr_type: NUM, data_type:event, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}

It returns NULL for error (or a response with errcode -1); otherwise it returns the prediction along with its errcode. User should free the memory using delete[].
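The data field in the example, "1 1:1.2 2:3.2 3:1.1", appears to follow the LIBSVM sparse text format: a label followed by 1-based index:value pairs. Assuming that reading is correct, the helper below turns a label and a dense feature vector into such a line. toSparseLine is a hypothetical name, and whether BangDB expects a label at prediction time (or accepts omitted zero features) should be confirmed against the model's training data format.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats "label 1:v1 2:v2 ..." with 1-based feature indices.
// All features are written, including zeros.
std::string toSparseLine(int label, const std::vector<double> &features) {
    std::ostringstream os;
    os << label;
    for (std::size_t i = 0; i < features.size(); ++i) {
        os << ' ' << (i + 1) << ':' << features[i];
    }
    return os.str();
}
```

For example, toSparseLine(1, {1.2, 3.2, 1.1}) reproduces the data value shown in the request above.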

To get the training requests for all models of a particular schema

ResultSet *getTrainingRequests(const char *schema)

It returns NULL for error.

To get training request for a particular model

char *getRequest(const char *req)

req = {"schema_name":, "model_name":}

It returns NULL for error (or a response with errcode -1); otherwise it returns the request along with its errcode. User should free the memory using delete[].

This sets the status for a particular training request

int setModelStatus(const char *status)

status = {"schema_name":, "model_name":, "status":}

It returns -1 for error.

To get the prediction status

char *getModelPredStatus(const char *req)

req = {"schema-name":, "model_name": }

It returns NULL for error (or a response with errcode -1); otherwise it returns the status along with its errcode. User should free the memory using delete[].

To delete prediction request

int delPredRequest(const char *req)

req = {"schema-name":, "model_name":, "file_name":}

It returns -1 for error.

To upload any ML-related file to a given bucket

long uploadFile(const char *bucket_info, const char *key, const char *fpath, InsertOptions iop)

Key is the id for the file and fpath takes the path to the file including the file name.

To download a file from a given bucket

long downloadFile(const char *bucket_info, const char *key, const char *fname, const char *fpath)

It returns -1 for error.

To get an object (binary or otherwise) from a given bucket

long getObject(const char *bucket_info, const char *key, const char **data, long *datlen)

It gets the object (binary or otherwise) stored under key in the given bucket, fills data with the object, and sets datlen to the length (size) of the object. It returns -1 for error.
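Because the object may be binary, it can contain NUL bytes, so the caller should rely on datlen rather than strlen() to know its size. The sketch below copies the bytes into a std::string using the reported length. fakeGetObject is a hypothetical stub with getObject's parameter shape that points data at a static buffer; this page does not say who owns or frees the memory behind *data in the real call, so check the BangDB documentation for that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stub: exposes 5 bytes, one of them '\0', and reports their length.
long fakeGetObject(const char * /*bucket_info*/, const char * /*key*/,
                   const char **data, long *datlen) {
    static const char bytes[] = {'a', 'b', '\0', 'c', 'd'};
    *data = bytes;
    *datlen = static_cast<long>(sizeof bytes);
    return 0; // non-error result
}

// Copies the object into `out` using datlen, preserving embedded NUL bytes.
// Returns false if the call reports -1 or yields no data.
bool readObject(const char *bucket, const char *key, std::string &out) {
    const char *data = nullptr;
    long len = 0;
    if (fakeGetObject(bucket, key, &data, &len) == -1 || data == nullptr || len < 0) {
        return false;
    }
    out.assign(data, static_cast<std::size_t>(len));
    return true;
}
```

Here the copied string holds all 5 bytes, while strlen() on the same buffer would stop after 2.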

To delete a file from a bucket

int delFile(const char *bucket_info, const char *key)

It returns -1 for error.

To delete a bucket

int delBucket(const char *bucket_info)

It returns -1 for error.

To count the number of buckets

long countBuckets()

It returns -1 for error or count for success.

To get the number of slices for a given file

int countSlices(const char *bucket_info, const char *key)

Since BRS (BangDB Resource Server) stores large files and objects in chunks, this function counts how many slices are stored for the given file (key). It returns -1 for error, or the count for success.
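If the chunk size is known, the expected slice count for a file is a ceiling division, which can serve as a sanity check against countSlices(). The chunk size BRS uses is not stated on this page, so sliceBytes below is a hypothetical parameter, and expectedSlices is an illustrative helper rather than a BangDB function.

```cpp
#include <cassert>

// ceil(sizeBytes / sliceBytes) using integer arithmetic.
// Returns 0 for non-positive inputs; how BRS treats an empty file is not covered here.
long expectedSlices(long sizeBytes, long sliceBytes) {
    if (sizeBytes <= 0 || sliceBytes <= 0) return 0;
    return (sizeBytes + sliceBytes - 1) / sliceBytes;
}
```

For example, a 10-byte file with a 4-byte slice size needs 3 slices: two full slices and one partial slice.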

To count the objects in a given bucket

long countObjects(const char *bucket_info)

It returns -1 for error.

To get details of all the objects in a given bucket

char *countObjectsDetails(const char *bucket_info)

It returns NULL for error else the details. User should free memory using delete[].

To count the number of models for a schema

long countModels(const char *schema)

It returns -1 for error else count.

To get the list of objects in a given bucket

char *listObjects(const char *bucket_info, const char *key = NULL, int list_size_mb = 0)

This returns a JSON string with the list of objects in the given bucket, for a given key or for all keys. It returns NULL for error, else the object list. User should free the memory of returned data using delete[].

To get list of buckets present

char *listBuckets(const char *user_info)

This returns the list of all buckets for the user identified by user_info, which looks like the following:

{"access_key":"akey", "secret_key":"skey"}

It returns NULL for error, else the bucket list. User should free the memory of returned data using delete[].

To upload data from a stream for training a model

long uploadStreamDataForTrain(const char *req)

It returns -1 for error.

To close the BangDB MLHelper

void closeBangDBMLHelper()

To delete the MLHelper object

virtual ~BangDBMLHelper()