BangDB ML Helper API – BangDB = NoSQL + AI + Stream

BangDBMLHelper API

Client API

BangDBMLHelper offers several APIs that simplify ML-related activities. The type covers model training, prediction, model versioning, and deployment, as well as managing large files and binary objects related to ML.

C++

To create the ML helper object

BangDBMLHelper(train_pred_brs_info *tpbinfo, const char *conf_path = NULL, bool isssl = true)
To create a bucket to store all intermediate training and testing files.
int createBucket(const char *bucket_info)
bucket_info is the name of the bucket to be created. It returns -1 for error.
To create the bucket or to change its name
void setBucket(const char *bucket_info)
To upload the files required to train or predict.
long uploadFile(const char *key, const char *fpath, InsertOptions iop)
The key is the id of the file, and fpath is the path to the file, including the file name. InsertOptions is an enum with the values:
INSERT_UNIQUE, // if non-existing then insert else return
UPDATE_EXISTING, // if existing then update else return
INSERT_UPDATE, // insert if non-existing else update
DELETE_EXISTING, // delete if existing
UPDATE_EXISTING_INPLACE, // only for inplace update
INSERT_UPDATE_INPLACE, // only for inplace update
Please see more on this at bangdb common. It returns -1 for error.
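The comments on the InsertOptions values can be read as a small decision table. The following is a minimal sketch of that table against an in-memory map, not BangDB's implementation: `Store` and `apply_option` are hypothetical names, the enum is a local copy of the values listed above, and treating the `*_INPLACE` variants like their plain counterparts is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Local copy of the enum values listed above; the real definition ships with BangDB.
enum InsertOptions {
    INSERT_UNIQUE,            // if non-existing then insert, else return
    UPDATE_EXISTING,          // if existing then update, else return
    INSERT_UPDATE,            // insert if non-existing, else update
    DELETE_EXISTING,          // delete if existing
    UPDATE_EXISTING_INPLACE,  // only for inplace update
    INSERT_UPDATE_INPLACE     // only for inplace update
};

// Hypothetical in-memory stand-in for a bucket: key -> file contents.
using Store = std::map<std::string, std::string>;

// Applies one option to the store and reports whether the store changed.
// Assumption: the *_INPLACE variants behave like their plain counterparts here.
bool apply_option(Store &store, const std::string &key,
                  const std::string &value, InsertOptions iop) {
    const bool exists = store.count(key) > 0;
    switch (iop) {
    case INSERT_UNIQUE:
        if (exists) return false;
        store[key] = value;
        return true;
    case UPDATE_EXISTING:
    case UPDATE_EXISTING_INPLACE:
        if (!exists) return false;
        store[key] = value;
        return true;
    case INSERT_UPDATE:
    case INSERT_UPDATE_INPLACE:
        store[key] = value;
        return true;
    case DELETE_EXISTING:
        if (!exists) return false;
        store.erase(key);
        return true;
    }
    return false;  // not reached for valid enum values
}
```

With this reading, INSERT_UNIQUE is the safe choice for a first upload, while INSERT_UPDATE suits re-uploading a refreshed training file under the same key.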
To train a model, call the trainModel API. It returns immediately and, if successful, schedules training of the model. The user should then call getModelStatus() periodically until it returns an end status.
int trainModel(const char *req)
It takes a training request and returns the status of the training request. It returns -1 for error.
To get the status of the model once a training request has been fired
char *getModelStatus(const char *req)
The req input parameter looks like the following:
req = {"schema-name":, "model_name": }
The return value looks like the following:
{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state":___}
ML_BANGDB_TRAINING_STATE is an enum with the values:
// error
ML_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
ML_BANGDB_TRAINING_STATE_NOT_PRSENT,
ML_BANGDB_TRAINING_STATE_ERROR_PARSE,
ML_BANGDB_TRAINING_STATE_ERROR_FORMAT,
ML_BANGDB_TRAINING_STATE_ERROR_BRS,
ML_BANGDB_TRAINING_STATE_ERROR_TUNE,
ML_BANGDB_TRAINING_STATE_ERROR_TRAIN,
ML_FILE_TYPE_ERROR_VAL_TESTDATA,
ML_FILE_TYPE_ERROR_VAL_TRAINDATA,
ML_BANGDB_TRAINING_STATE_LIMBO,
// intermediate states
ML_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
ML_BANGDB_TRAINING_STATE_BRS_GET_DONE,
ML_BANGDB_TRAINING_STATE_REFORMAT_DONE,
ML_BANGDB_TRAINING_STATE_SCALE_TUNING_DONE,
ML_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
// training done
ML_BANGDB_TRAINING_STATE_TRAINING_DONE, // 25
ML_BANGDB_TRAINING_STATE_DEPRICATED
The above applies to ML-related model status.
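Since trainModel only schedules the work, a caller typically polls until train_state leaves the intermediate range. The sketch below shows one way to do that under stated assumptions: train_state is taken to arrive as the integer value of the enum, the ranges are counted from the listing above (10 to 19 under the error comment, 20 to 24 intermediate, 25 done, 26 deprecated), and `fetch_state` is a hypothetical callback that would wrap getModelStatus() and extract train_state.

```cpp
#include <cassert>
#include <functional>

// Coarse phases derived from the ML_BANGDB_TRAINING_STATE listing above.
enum class TrainPhase { Error, InProgress, Done, Deprecated, Unknown };

// Maps a numeric train_state to a phase, counting enum values from
// ML_BANGDB_TRAINING_STATE_INVALID_INPUT = 10.
TrainPhase classify_ml_state(int state) {
    if (state >= 10 && state <= 19) return TrainPhase::Error;       // listed under "//error"
    if (state >= 20 && state <= 24) return TrainPhase::InProgress;  // "//intermediate states"
    if (state == 25) return TrainPhase::Done;                       // TRAINING_DONE
    if (state == 26) return TrainPhase::Deprecated;                 // DEPRICATED
    return TrainPhase::Unknown;
}

// Polls fetch_state until a non-intermediate phase is seen or max_polls
// is exhausted. A real caller would sleep between polls.
TrainPhase wait_for_training(const std::function<int()> &fetch_state, int max_polls) {
    for (int i = 0; i < max_polls; ++i) {
        const TrainPhase phase = classify_ml_state(fetch_state());
        if (phase != TrainPhase::InProgress) return phase;
    }
    return TrainPhase::InProgress;  // still running when polling stopped
}
```

Bounding the loop with max_polls keeps a stuck training job from blocking the caller forever; in that case delTrainRequest is the documented way to clear the request.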
For IE (Information Extraction) related model status, use the following. IE_BANGDB_TRAINING_STATE is an enum with the values:
// error
IE_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
IE_BANGDB_TRAINING_STATE_NOT_PRSENT,
IE_BANGDB_TRAINING_STATE_ERROR_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_HELPER_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_FEATURE_EX,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_HELP_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_LIMBO,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN, // 20
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_REL_LIST_BRS,
IE_FILE_TYPE_ERROR_VAL_TRAINDATA,
IE_FILE_TYPE_ERROR_VAL_TESTDATA,
IE_FILE_TYPE_ERROR_VAL_CLASSDATA,
IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA,
// intermediate states
IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_GET_DONE,
IE_BANGDB_TRAINING_STATE_HELPER_DONE, // 30
IE_BANGDB_TRAINING_STATE_PRE_NER_DONE,
IE_BANGDB_TRAINING_STATE_NER_DONE,
IE_BANGDB_TRAINING_STATE_PRE_REL_DONE,
IE_BANGDB_TRAINING_STATE_REL_DONE,
IE_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING,
// training done
IE_BANGDB_TRAINING_HELP_DONE, // 37
IE_BANGDB_TRAINING_STATE_TRAINING_DONE, // 38
IE_BANGDB_TRAINING_STATE_DEPRICATED
Please see more on this at bangdb common. It returns NULL for error, or errcode as -1; otherwise it returns the errcode for success. The user should free the memory using delete[].
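The inline markers in the IE listing (20, 30, 37, 38) follow from the enum starting at 10 and counting upward. The sketch below checks that at compile time on a local copy of the listing; it is an illustration only, the authoritative enum is the one in the BangDB headers, and `ie_training_finished` is a hypothetical helper that treats the two entries under the training-done comment as finished.

```cpp
#include <cassert>

// Local copy of the IE listing above, kept in the same order.
enum IE_BANGDB_TRAINING_STATE {
    // error
    IE_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
    IE_BANGDB_TRAINING_STATE_NOT_PRSENT,
    IE_BANGDB_TRAINING_STATE_ERROR_BRS,
    IE_BANGDB_TRAINING_STATE_ERROR_HELPER_FILES,
    IE_BANGDB_TRAINING_STATE_ERROR_BRS_FEATURE_EX,
    IE_BANGDB_TRAINING_STATE_ERROR_BRS_HELP_FILES,
    IE_BANGDB_TRAINING_STATE_ERROR_PRE_NER_TRAIN,
    IE_BANGDB_TRAINING_STATE_LIMBO,
    IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN,
    IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN_BRS,
    IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN,
    IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN,
    IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN_BRS,
    IE_BANGDB_TRAINING_STATE_ERROR_REL_LIST_BRS,
    IE_FILE_TYPE_ERROR_VAL_TRAINDATA,
    IE_FILE_TYPE_ERROR_VAL_TESTDATA,
    IE_FILE_TYPE_ERROR_VAL_CLASSDATA,
    IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA,
    // intermediate states
    IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
    IE_BANGDB_TRAINING_STATE_BRS_GET_DONE,
    IE_BANGDB_TRAINING_STATE_HELPER_DONE,
    IE_BANGDB_TRAINING_STATE_PRE_NER_DONE,
    IE_BANGDB_TRAINING_STATE_NER_DONE,
    IE_BANGDB_TRAINING_STATE_PRE_REL_DONE,
    IE_BANGDB_TRAINING_STATE_REL_DONE,
    IE_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
    IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING,
    // training done
    IE_BANGDB_TRAINING_HELP_DONE,
    IE_BANGDB_TRAINING_STATE_TRAINING_DONE,
    IE_BANGDB_TRAINING_STATE_DEPRICATED
};

// The numeric markers in the listing match the implicit numbering.
static_assert(IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN == 20, "marker 20");
static_assert(IE_BANGDB_TRAINING_STATE_HELPER_DONE == 30, "marker 30");
static_assert(IE_BANGDB_TRAINING_HELP_DONE == 37, "marker 37");
static_assert(IE_BANGDB_TRAINING_STATE_TRAINING_DONE == 38, "marker 38");

// Hypothetical helper: true for the entries listed under "//training done".
bool ie_training_finished(int state) {
    return state == IE_BANGDB_TRAINING_HELP_DONE ||
           state == IE_BANGDB_TRAINING_STATE_TRAINING_DONE;
}
```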
To delete the model
int delModel(const char *req)
This deletes the model identified by the req parameter:
req = {"schema_name":, "model_name":}
It returns -1 for error.
To delete a training request
int delTrainRequest(const char *req)
This deletes the training request. It is helpful when training got stuck for some reason and the status was not updated properly. It returns -1 for error.
To predict for particular data or a particular event.
char *predict(const char *req)
Here is how req looks:
{schema-name, attr_type: NUM, data_type:event, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}
Here, attr_type is an enum with the following values:
ML_BANGDB_ATTR_TYPE_INVALID = 0,
ML_BANGDB_ATTR_TYPE_NUM,
ML_BANGDB_ATTR_TYPE_STR,
ML_BANGDB_ATTR_TYPE_HYBRID,
data_type is an enum with the following values:
ML_PREDICT_DATA_TYPE_INVALID = 0,
ML_PREDICT_DATA_TYPE_FILE,
ML_PREDICT_DATA_TYPE_EVENT
re_format is also an enum, with the following values:
ML_BANGDB_ML_DATA_FORMAT_LIBSVM = 0,
ML_BANGDB_ML_DATA_FORMAT_CSV,
ML_BANGDB_ML_DATA_FORMAT_ARFF,
ML_BANGDB_ML_DATA_FORMAT_JSON,
ML_BANGDB_ML_DATA_FORMAT_INVALID
Please see more on this at bangdb common. It returns NULL for error, or errcode as -1; otherwise it returns the errcode. The user should free the memory using delete[].
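The data field in the example request, "1 1:1.2 2:3.2 3:1.1", has the shape of a LIBSVM text record: a label followed by 1-based index:value pairs. The sketch below builds such a string from a feature vector. `to_libsvm` is a hypothetical helper, not part of BangDBMLHelper, and reading the leading 1 as the label is an assumption based on the LIBSVM convention.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats one sample as "<label> 1:<v1> 2:<v2> ...".
// Every feature is written, including zeros; LIBSVM files often omit zeros,
// which this sketch does not attempt.
std::string to_libsvm(int label, const std::vector<double> &features) {
    std::ostringstream out;
    out << label;
    for (std::size_t i = 0; i < features.size(); ++i) {
        out << ' ' << (i + 1) << ':' << features[i];  // indices are 1-based
    }
    return out.str();
}
```

The resulting string would go into the data field of the predict request when data_type is event.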
To get the training requests of all models for a particular schema
ResultSet *getTrainingRequests(const char *schema)
It returns NULL for error.
To get the training request for a particular model
char *getRequest(const char *req)
req : {"schema_name": ,"model_name": }
It returns NULL for error, or errcode as -1; otherwise it returns the errcode. The user should free the memory using delete[].
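Several calls on this page return a char * that the caller must release with delete[]. One way to avoid leaks is to hand the pointer to a std::unique_ptr<char[]> immediately. The sketch below shows the pattern with a stand-in function: `fake_api_call` and its response text are invented for illustration, and `take_response` is a hypothetical wrapper.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for a call such as getRequest(): returns a buffer
// allocated with new[], or nullptr on error. The response text is made up.
char *fake_api_call(const char *req) {
    if (req == nullptr) return nullptr;
    const char *resp = "{\"errcode\":0}";
    char *out = new char[std::strlen(resp) + 1];
    std::strcpy(out, resp);
    return out;
}

// Takes ownership of a returned buffer, copies it into a std::string,
// and frees it with delete[] when owner goes out of scope.
std::string take_response(char *raw, bool &ok) {
    std::unique_ptr<char[]> owner(raw);  // unique_ptr<char[]> calls delete[]
    ok = (raw != nullptr);
    return ok ? std::string(owner.get()) : std::string();
}
```

The array form of unique_ptr matters here: std::unique_ptr<char> would call delete rather than delete[], which does not match how the buffer is documented to be freed.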
To set the status for a particular training request
int setModelStatus(const char *status)
status = {"schema_name": ,"model_name": ,"status": }
It returns -1 for error.
To get the prediction status
char *getModelPredStatus(const char *req)
req = {"schema-name":, "model_name": }
It returns NULL for error, or errcode as -1; otherwise it returns the errcode. The user should free the memory using delete[].
To delete a prediction request
int delPredRequest(const char *req)
req = {"schema-name":, "model_name":, "file_name":}
It returns 0 for success and -1 for error.
To upload any ML-related file
long uploadFile(const char *bucket_info, const char *key, const char *fpath, InsertOptions iop)
key is the id for the file, and fpath is the path to the file, including the file name.
To download a file from a given bucket
long downloadFile(const char *bucket_info, const char *key, const char *fname, const char *fpath)
It returns -1 for error.
To get a binary object from the given bucket
long getObject(const char *bucket_info, const char *key, const char **data, long *datlen)
It gets the object (binary or otherwise) for the given bucket and key. It fills data with the object and sets datlen to the length or size of the object. It returns -1 for error.
To delete a file from a bucket
int delFile(const char *bucket_info, const char *key)
It returns -1 for error.
To delete a bucket
int delBucket(const char *bucket_info)
It returns -1 for error.
To count the number of buckets
long countBuckets()
It returns -1 for error or the count for success.
To get the number of slices for the given file
int countSlices(const char *bucket_info, const char *key)
BRS (bangdb resource server) stores large files and objects in chunks, so this function counts how many slices there are for the given file (key). It returns -1 for error or the count for success.
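To make the idea of slices concrete, the sketch below computes how many fixed-size chunks a file of a given size would need. This is only arithmetic under an assumption: the page does not state the slice size BRS uses or whether slices are fixed-size at all, so `slice_bytes` is a hypothetical parameter and `slices_needed` is not a BangDB function.

```cpp
#include <cassert>

// Number of fixed-size slices needed to hold file_bytes bytes.
// Returns -1 for invalid input, mirroring the error convention used on this page.
long slices_needed(long file_bytes, long slice_bytes) {
    if (file_bytes < 0 || slice_bytes <= 0) return -1;
    // Whole slices, plus one more if a partial slice remains.
    return file_bytes / slice_bytes + (file_bytes % slice_bytes != 0 ? 1 : 0);
}
```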
To count the objects in a given bucket
long countObjects(const char *bucket_info)
It returns -1 for error.
To get details of all the objects in a given bucket
char *countObjectsDetails(const char *bucket_info)
It returns NULL for error, else the details. The user should free the memory using delete[].
To count the number of models for a schema
long countModels(const char *schema)
It returns -1 for error, else the count.
To get the list of objects for a given bucket
char *listObjects(const char *bucket_info, const char *key = NULL, int list_size_mb = 0)
This returns a JSON string with the list of objects in a given bucket, for a given key or for all keys. It returns NULL for error, else the object list. The user should free the memory of the returned data using delete[].
To get the list of buckets present
char *listBuckets(const char *user_info)
This returns the list of all buckets for the user given by user_info, which looks like the following:
{"access_key":"akey", "secret_key":"skey"}
It returns NULL for error, else the bucket list. The user should free the memory of the returned data using delete[].
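Most calls here take a small JSON string such as user_info. When the values come from variables, it helps to build the string in one place and escape the values. The sketch below does that for the user_info shape shown above; `json_escape` and `make_user_info` are hypothetical helpers, and the escaping covers only double quotes and backslashes.

```cpp
#include <cassert>
#include <string>

// Escapes double quotes and backslashes so a value can sit inside a JSON
// string literal. Control characters are not handled in this sketch.
std::string json_escape(const std::string &in) {
    std::string out;
    for (char c : in) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Builds the user_info string in the shape shown above.
std::string make_user_info(const std::string &access_key,
                           const std::string &secret_key) {
    return "{\"access_key\":\"" + json_escape(access_key) +
           "\", \"secret_key\":\"" + json_escape(secret_key) + "\"}";
}
```

The result of make_user_info("akey", "skey").c_str() could then be passed wherever a user_info argument is expected.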
To get data from a stream to train a model
long uploadStreamDataForTrain(const char *req)
It returns -1 for error.
To close the BangDB ML helper
void closeBangDBMLHelper()
To delete the ML helper object
virtual ~BangDBMLHelper()

Java

To get an instance of the ML helper

public static synchronized BangDBMLHelper getInstance(String[] train_pred_brs_info)
train_pred_brs_info contains the port and ip for each of the following, in this order: brs, pred, train. The overall length of the array should be 6.
To get the details of the object as a string
public String toString()
Returns the details of the MLHelper object as a string.
To create a bucket to store all intermediate training and testing files.
public int createBucket(String bucket_info)
All intermediate files, models, and training/testing related files are stored within BRS (bangdb resource server) in some bucket. This creates a bucket as defined by bucket_info, which looks like the following:
{access_key:, secret_key:, bucket_name:}
It returns -1 for error.
To create a new bucket if it doesn't exist, otherwise to update the bucket name to this name
public void setBucket(String bucket_info)
To upload training or prediction files
public long uploadFile(String key, String path, InsertOptions flag)
key is the id for the file, and path is the path to the file, including the file name. InsertOptions is an enum with the values:
INSERT_UNIQUE, // if non-existing then insert else return
UPDATE_EXISTING, // if existing then update else return
INSERT_UPDATE, // insert if non-existing else update
DELETE_EXISTING, // delete if existing
UPDATE_EXISTING_INPLACE, // only for inplace update
INSERT_UPDATE_INPLACE; // only for inplace update
To upload a file to the given bucket
public long uploadFile(String bucketInfo, String key, String path, InsertOptions flag)
InsertOptions is an enum with the following values:
INSERT_UNIQUE, // if non-existing then insert else return
UPDATE_EXISTING, // if existing then update else return
INSERT_UPDATE, // insert if non-existing else update
DELETE_EXISTING, // delete if existing
UPDATE_EXISTING_INPLACE, // only for inplace update
INSERT_UPDATE_INPLACE, // only for inplace update
Please see more on this at bangdb common. It returns -1 for error, else 0 or more than 0.
To train a model, call the trainModel API. It returns immediately and, if successful, schedules training of the model. The user should then call getModelStatus() periodically until it returns an end status.
public int trainModel(String req)
It takes a training request and returns the status of the training request. The training request looks like the following:
{
  "schema-name": "id",
  "algo_type": "SVM",
  "algo_param": {
    "svm_type": 1,
    "kernel": 2,
    "degree": 3,
    "gamma": 0.2,
    "cost": 1.1,
    "cache_size": 50,
    "probability": 0,
    "termination_criteria": 0.001,
    "nu": 0.5,
    "coef0": 0.1
  },
  "attr_list": [
    { "name": "a1", "position": 1 },
    { "name": "a2", "position": 2 }
  ],
  "training_details": {
    "training_source": "infile",
    "training_source_type": "FILE",
    "file_size_mb": 110,
    "train_speed": 1
  },
  "scale": "Y/N",
  "tune_param": "Y/N",
  "attr_type": "NUM/STR",
  "re_format": "JSON",
  "custom_format": {
    "name": "ts_rollup",
    "fields": { "ts": "ts", "quantity": "qty", "entityid": "eid" },
    "aggr_type": 2,
    "gran": 1
  },
  "model_name": "my_model1",
  "udf": { "name": "udf_name", "udf_logic": 1, "bucket_name": "udf_bucket" }
}
To get data from a stream to train a model
public long uploadStreamDataForTrain(String req)
It returns -1 for error.
To get the status of the model once a training request has been fired
public String getModelStatus(String req)
The req input parameter looks like the following:
req = {"schema-name":, "model_name": }
The return value looks like the following:
{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state":}
train_state tells the status of the model. Its values are as follows. ML_BANGDB_TRAINING_STATE is an enum with the values:
// error
ML_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
ML_BANGDB_TRAINING_STATE_NOT_PRSENT,
ML_BANGDB_TRAINING_STATE_ERROR_PARSE,
ML_BANGDB_TRAINING_STATE_ERROR_FORMAT,
ML_BANGDB_TRAINING_STATE_ERROR_BRS,
ML_BANGDB_TRAINING_STATE_ERROR_TUNE,
ML_BANGDB_TRAINING_STATE_ERROR_TRAIN,
ML_FILE_TYPE_ERROR_VAL_TESTDATA,
ML_FILE_TYPE_ERROR_VAL_TRAINDATA,
ML_BANGDB_TRAINING_STATE_LIMBO,
// intermediate states
ML_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
ML_BANGDB_TRAINING_STATE_BRS_GET_DONE,
ML_BANGDB_TRAINING_STATE_REFORMAT_DONE,
ML_BANGDB_TRAINING_STATE_SCALE_TUNING_DONE,
ML_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
// training done
ML_BANGDB_TRAINING_STATE_TRAINING_DONE, // 25
ML_BANGDB_TRAINING_STATE_DEPRICATED
The above applies to ML-related model status.
For IE (Information Extraction) related model status, use the following. IE_BANGDB_TRAINING_STATE is an enum with the values:
// error
IE_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
IE_BANGDB_TRAINING_STATE_NOT_PRSENT,
IE_BANGDB_TRAINING_STATE_ERROR_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_HELPER_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_FEATURE_EX,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_HELP_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_LIMBO,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN, // 20
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_REL_LIST_BRS,
IE_FILE_TYPE_ERROR_VAL_TRAINDATA,
IE_FILE_TYPE_ERROR_VAL_TESTDATA,
IE_FILE_TYPE_ERROR_VAL_CLASSDATA,
IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA,
// intermediate states
IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_GET_DONE,
IE_BANGDB_TRAINING_STATE_HELPER_DONE, // 30
IE_BANGDB_TRAINING_STATE_PRE_NER_DONE,
IE_BANGDB_TRAINING_STATE_NER_DONE,
IE_BANGDB_TRAINING_STATE_PRE_REL_DONE,
IE_BANGDB_TRAINING_STATE_REL_DONE,
IE_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING,
// training done
IE_BANGDB_TRAINING_HELP_DONE, // 37
IE_BANGDB_TRAINING_STATE_TRAINING_DONE, // 38
IE_BANGDB_TRAINING_STATE_DEPRICATED,
To set the status of a model
public int setModelStatus(String req)
This sets the status for a particular training request. The req is as follows:
req = {"schema-name":, "model_name":, "status": }
Upon success it returns 0, else -1 for error.
To delete a model
public int delModel(String req)
This deletes the model identified by the req parameter. Here,
req = {"schema_name": ,"model_name": }
To delete the training request
public int delTrainRequest(String req)
This deletes the training request. It is helpful when training got stuck for some reason and the status was not updated properly. Here is how req looks:
req = {"schema-name":, "model_name": }
To predict for particular data or a particular event
public String predict(String req)
Here, the req JSON looks like:
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}
Here, attr_type is an enum with the following values:
ML_BANGDB_ATTR_TYPE_INVALID = 0,
ML_BANGDB_ATTR_TYPE_NUM,
ML_BANGDB_ATTR_TYPE_STR,
ML_BANGDB_ATTR_TYPE_HYBRID,
data_type is an enum with the following values:
ML_PREDICT_DATA_TYPE_INVALID = 0,
ML_PREDICT_DATA_TYPE_FILE,
ML_PREDICT_DATA_TYPE_EVENT
re_format is also an enum, with the following values:
ML_BANGDB_ML_DATA_FORMAT_LIBSVM = 0,
ML_BANGDB_ML_DATA_FORMAT_CSV,
ML_BANGDB_ML_DATA_FORMAT_ARFF,
ML_BANGDB_ML_DATA_FORMAT_JSON,
ML_BANGDB_ML_DATA_FORMAT_INVALID
Please see more on this at bangdb common.
To predict for files only
public int predict_async(String req)
To get the status of a prediction request
public String getModelPredStatus(String req)
Given a request, get the prediction status. The req is as follows:
req = {"schema-name":, "model_name": }
It returns null for error, or errcode as -1; otherwise it returns the errcode.
To delete a prediction request
public int delPredRequest(String req)
Deletes the request. The input param req is as follows:
req = {"schema-name":, "model_name":, "file_name":}
It returns 0 for success and -1 for error.
To get the list of all the training requests
public ResultSet getTrainRequests(String req)
This returns all the training requests made so far for a schema. The prev_rs should be null for the first call; for subsequent calls, just pass the previous rs. Upon success it returns 0, else -1 for error.
To get a training request from the ML housekeeping
public String getRequestDetail(String req)
It returns the response with status, or null for error or if req is not found.
To get the list of buckets for a user
public String listBuckets(String req)
This returns the list of all buckets for the user given by req, which looks like the following:
{"access_key":"akey", "secret_key":"skey"}
It may also return null in case of error.
To get list of all buckets
public String listAllBuckets(String req)
To get the list of objects in a given bucket
public String listObjects(String req, String skey, int listSizeMB)
This returns a JSON string with the list of objects in a given bucket, for a given key or for all keys (if skey is null). It may also return null for error. listSizeMB defines the max size of the list; by default it returns 2MB of data or less.
To count the number of models for a given schema
public long getModelCount(String req)
This counts the models for a given schema, else returns -1 for error.
For admin settings
public int reinitMDM(String req)
To check whether the BRS is local or it is a distributed system
public boolean isBRSLocal()
Returns whether the BRS is local. This is useful for distributed mode or server.
To download a file from BRS
public long downloadFile(String bucketInfo, String key, String fname, String fpath)
key is the name/id of the file to be downloaded, bucketInfo describes the bucket from which the file is downloaded, and fpath is the location on the local system where the file is saved under the name fname. It returns 0 for success, else -1 for error.
To get an object from a particular bucket
public byte[] getObject(String bucketInfo, String key)
It gets the object (binary or otherwise) for the given bucket and key. It returns 0 for success, else -1 for error.
To get the number of buckets
public long countBuckets()
This returns the number of buckets, else -1 for error.
To get the count of objects in a particular bucket
public long countObjects(String bucket_info)
This counts the number of objects in the given bucket, else returns -1 for error.
To get details of all the objects in a particular bucket
public String countObjectsDetails(String bucket_info)
This gives the details of all the objects in the given bucket (bucket_info), else returns null for error.
To count the number of slices for a given key
public int countSlices(String bucket_info, String key)
BRS stores large files and objects in chunks, so this function counts how many slices there are for the given file (key). It returns the count of slices, else -1 for error.
To delete a file from a particular bucket
public int delFile(String bucket_info, String key)
It returns 0 for success, else -1 for error.
To delete or drop a bucket from BRS
public int delBucket(String bucket_info)
To close the BangDB ML helper
public synchronized void closeMLHelper()