BangDBMLHelper – Embedded Java – BangDB = NoSQL + AI + Stream


public static synchronized BangDBMLHelper getInstance(BangDBDatabase bdb, long mem_budget)
Call this API to get an instance of the ML helper. It takes a BangDBDatabase as a required argument and mem_budget as an optional parameter. mem_budget defines the amount of memory allocated for ML-related activities; BangDB always respects this budget. This matters when the DB and ML run on the same box or in embedded mode, when multiple users share the instance and all of them must be served, or when an ML memory overflow must not cause problems for other users. Upon success it returns a reference to the ML helper, else NULL.
public String toString()
Returns the detail of the object as string
public int createBucket(String bucket_info)
All intermediate files, models, and training/testing files are stored within BRS (BangDB Resource Server) in some bucket. This API creates a bucket as defined by bucket_info, which looks like the following:
{access_key:, secret_key:, bucket_name:}
public void setBucket(String bucket_info)
This is similar to createBucket, but if a bucket with that name already exists, it switches to that bucket instead.
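As an illustration of the bucket_info shape, the sketch below assembles the JSON string by hand. The class name BucketInfo and its helper methods are hypothetical and not part of the BangDB API; the key and bucket values are the placeholder names used elsewhere on this page.

```java
final class BucketInfo {
    // Wraps a value in double quotes, escaping backslashes and embedded quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds {"access_key":"...", "secret_key":"...", "bucket_name":"..."}.
    static String build(String accessKey, String secretKey, String bucketName) {
        return "{" + quote("access_key") + ":" + quote(accessKey) + ", "
                + quote("secret_key") + ":" + quote(secretKey) + ", "
                + quote("bucket_name") + ":" + quote(bucketName) + "}";
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        String bucketInfo = build("brs_access_key", "brs_secret_key", "ml_bucket_info");
        System.out.println(bucketInfo);
    }
}
```

The resulting string is what would be passed as the bucket_info argument of createBucket or setBucket on an existing helper instance.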
public long uploadFile(String key, String path, InsertOptions flag)
This uploads any ML-related file that we wish to use later for training, testing, or prediction. key is the ID for the file and path is the path to the file, including the file name. Note that it uploads to the default bucket; to upload to a particular bucket, use the overload described below.
It returns 0 for success, else -1 for error. Please see the AI section for more information.
public long uploadFile(String bucketInfo, String key, String path, InsertOptions flag)
Same as above, except that it uploads the file to the bucket given by bucketInfo. The API above puts the file in the default bucket.
public String trainModel(String req)
This trains a model. It takes a training request and returns the status of that request. The training request looks like the following:

{
  "schema-name": "id",
  "algo_type": "SVM",
  "algo_param": {
    "svm_type": 1,
    "kernel": 2,
    "degree": 3,
    "gamma": 0.2,
    "cost": 1.1,
    "cache_size": 50,
    "probability": 0,
    "termination_criteria": 0.001,
    "nu": 0.5,
    "coef0": 0.1
  },
  "attr_list": [
    { "name": "a1", "position": 1 },
    { "name": "a2", "position": 2 }
  ],
  "training_details": {
    "training_source": "infile",
    "training_source_type": "FILE",
    "file_size_mb": 110,
    "train_speed": 1
  },
  "scale": "Y/N",
  "tune_param": "Y/N",
  "attr_type": "NUM/STR",
  "re_format": "JSON",
  "custom_format": {
    "name": "ts_rollup",
    "fields": { "ts": "ts", "quantity": "qty", "entityid": "eid" },
    "aggr_type": 2,
    "gran": 1
  },
  "model_name": "my_model1",
  "udf": { "name": "udf_name", "udf_logic": 1, "bucket_name": "udf_bucket" }
}

Please see AI section for more information.
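The attr_list entry of the training request pairs each attribute name with a position. A minimal sketch of generating that array from a list of column names follows. It assumes positions are consecutive and 1-based, as in the sample request, and that names need no JSON escaping. The class AttrList is hypothetical and not part of the BangDB API.

```java
import java.util.Arrays;
import java.util.List;

final class AttrList {
    // Builds [{"name": "a1", "position": 1}, ...] with 1-based positions.
    // Assumption: attribute names contain no characters that need escaping.
    static String build(List<String> names) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("{\"name\": \"").append(names.get(i))
              .append("\", \"position\": ").append(i + 1).append("}");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("a1", "a2")));
    }
}
```

The returned fragment can be spliced into the larger request string as the value of "attr_list".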
public long uploadStreamDataForTrain(String req)
public String getModelStatus(String req)
This gets the status of a model once a training request has been fired. The req input parameter looks like the following:

req = {"schema-name":, "model_name": }
The return value looks like the following:

{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state":}

train_state gives the status of the model. Its possible values are as follows:

enum ML_BANGDB_TRAINING_STATE
{
//error
ML_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
ML_BANGDB_TRAINING_STATE_NOT_PRSENT,
ML_BANGDB_TRAINING_STATE_ERROR_PARSE,
ML_BANGDB_TRAINING_STATE_ERROR_FORMAT,
ML_BANGDB_TRAINING_STATE_ERROR_BRS,
ML_BANGDB_TRAINING_STATE_ERROR_TUNE,
ML_BANGDB_TRAINING_STATE_ERROR_TRAIN,
ML_FILE_TYPE_ERROR_VAL_TESTDATA,
ML_FILE_TYPE_ERROR_VAL_TRAINDATA,
ML_BANGDB_TRAINING_STATE_LIMBO,
//intermediate states
ML_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
ML_BANGDB_TRAINING_STATE_BRS_GET_DONE,
ML_BANGDB_TRAINING_STATE_REFORMAT_DONE,
ML_BANGDB_TRAINING_STATE_SCALE_TUNING_DONE,
ML_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
//training done
ML_BANGDB_TRAINING_STATE_TRAINING_DONE, //25
ML_BANGDB_TRAINING_STATE_DEPRICATED,
};

The above applies to ML model status.
For IE (Information Extraction) model status, use the following:

enum IE_BANGDB_TRAINING_STATE
{
//error
IE_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
IE_BANGDB_TRAINING_STATE_NOT_PRSENT,
IE_BANGDB_TRAINING_STATE_ERROR_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_HELPER_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_FEATURE_EX,
IE_BANGDB_TRAINING_STATE_ERROR_BRS_HELP_FILES,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_LIMBO,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_NER_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_PRE_REL_TRAIN, //20
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN,
IE_BANGDB_TRAINING_STATE_ERROR_REL_TRAIN_BRS,
IE_BANGDB_TRAINING_STATE_ERROR_REL_LIST_BRS,
IE_FILE_TYPE_ERROR_VAL_TRAINDATA,
IE_FILE_TYPE_ERROR_VAL_TESTDATA,
IE_FILE_TYPE_ERROR_VAL_CLASSDATA,
IE_FILE_TYPE_ERROR_VAL_TOTALEXDATA,
//intermediate states
IE_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_GET_DONE,
IE_BANGDB_TRAINING_STATE_HELPER_DONE, //30
IE_BANGDB_TRAINING_STATE_PRE_NER_DONE,
IE_BANGDB_TRAINING_STATE_NER_DONE,
IE_BANGDB_TRAINING_STATE_PRE_REL_DONE,
IE_BANGDB_TRAINING_STATE_REL_DONE,
IE_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
IE_BANGDB_TRAINING_STATE_BRS_RELLIST_UPLOAD_PENDING,
//training done
IE_BANGDB_TRAINING_HELP_DONE, //37
IE_BANGDB_TRAINING_STATE_TRAINING_DONE, //38
IE_BANGDB_TRAINING_STATE_DEPRICATED,
};

Please see AI section for more information.
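Because both enums start at 10 and increase by one, the numeric train_state can be bucketed by range. The sketch below derives those ranges from the enum listings above (ML: 10 to 19 error, 20 to 24 intermediate, 25 done, 26 deprecated; IE: 10 to 27 error, 28 to 36 intermediate, 37 to 38 done, 39 deprecated). It assumes train_state appears in the response as a bare or quoted integer, which this page does not state. The class TrainState, the Phase enum, and the sample response string are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrainState {
    enum Phase { ERROR, IN_PROGRESS, DONE, DEPRECATED, UNKNOWN }

    // Assumption: train_state is serialized as a bare or quoted integer.
    private static final Pattern STATE =
            Pattern.compile("\"train_state\"\\s*:\\s*\"?(\\d{1,9})\"?");

    // Extracts train_state from a status response; returns -1 if it is absent.
    static int parse(String statusJson) {
        if (statusJson == null) {
            return -1;
        }
        Matcher m = STATE.matcher(statusJson);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Ranges follow ML_BANGDB_TRAINING_STATE as listed above.
    static Phase classifyMl(int s) {
        if (s >= 10 && s <= 19) return Phase.ERROR;
        if (s >= 20 && s <= 24) return Phase.IN_PROGRESS;
        if (s == 25) return Phase.DONE;
        if (s == 26) return Phase.DEPRECATED;
        return Phase.UNKNOWN;
    }

    // Ranges follow IE_BANGDB_TRAINING_STATE as listed above.
    static Phase classifyIe(int s) {
        if (s >= 10 && s <= 27) return Phase.ERROR;
        if (s >= 28 && s <= 36) return Phase.IN_PROGRESS;
        if (s == 37 || s == 38) return Phase.DONE;
        if (s == 39) return Phase.DEPRECATED;
        return Phase.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hand-written sample response, not captured from a real run.
        String response = "{\"schema-name\":\"id\", \"model_name\":\"my_model1\", \"train_state\":25}";
        int state = parse(response);
        System.out.println(state + " -> " + classifyMl(state));
    }
}
```

A caller polling getModelStatus could stop as soon as the phase is anything other than IN_PROGRESS.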
public int setModelStatus(String req)
This sets the status for a particular training request. The status is as follows:
status = {"schema-name":, "model_name":, "status": }
Upon success it returns 0, else -1 for error.
public int delModel(String req)
This deletes the model identified by the req parameter. req looks like the following:
req = {"schema-name":, "model_name": }
public int delTrainRequest(String req)
This deletes a training request. It is helpful when training got stuck for some reason and the status was not updated properly. req looks like the following:
req = {"schema-name":, "model_name": }
public String predict(String req)
The predict API predicts for a particular data item or event. It takes req as a parameter, plus a default parameter that describes the sorted position of the different features; the latter is not required most of the time and does not appear in the Java signature above. The request looks like the following:
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}
{schema-name, attr_type: NUM, data_type:FILE, re_arrange:N, re_format:N, model_name: model_name, data:inputfile}
{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:JSON, model_name: model_name, data:{k1:v1, k2:v2, k3:v3}}
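The data value in the first request, "1 1:1.2 2:3.2 3:1.1", looks like a sparse row: a leading value followed by index:value pairs with 1-based indices. The sketch below formats a dense feature array into that shape. Reading the leading value as a label is an assumption, not something this page states. The class SparseRow is hypothetical and not part of the BangDB API.

```java
final class SparseRow {
    // Formats "label 1:v1 2:v2 ..." using 1-based feature indices.
    // Assumption: the leading token of the sample data string is a label.
    static String format(int label, double[] features) {
        StringBuilder sb = new StringBuilder().append(label);
        for (int i = 0; i < features.length; i++) {
            sb.append(' ').append(i + 1).append(':').append(features[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reproduces the sample data string from the first request above.
        System.out.println(format(1, new double[] {1.2, 3.2, 1.1}));
    }
}
```

The formatted string would then be placed in the data field of an event-type request with attr_type NUM.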
public int predict_async(String req)
Not applicable in embedded mode.
public String getModelPredStatus(String req)
Given a request, this gets the prediction status. req is as follows:
req = {"schema-name":, "model_name": }
retval = {"schema-name":, "model_name":, "pred_req_state":, "file_name":}
public int delPredRequest(String req)
Deletes the prediction request. The input parameter req is as follows:
req = {"schema-name":, "model_name":, "file_name":}
It returns 0 for success and -1 for error.
public ResultSet getTrainRequests(String req, String levk)
This returns all the training requests made so far for an account (or schema). prev_rs should be NULL for the first call; for subsequent calls, pass the previous result set.
Upon success it returns 0, else -1 for error.
public String getRequestDetail(String req)
This returns the (training) request from the ML housekeeping. The request is as follows:
req = {"schema-name":, "model_name": }
It returns the response with status, or NULL on error or if the request is not found.
public String listBuckets(String req)
This returns the list of all buckets for the user given by user_info, which looks like the following:
{"access_key":"akey", "secret_key":"skey"}
It may also return NULL in case of error.
public String listAllBuckets(String req)
public String listObjects(String req, String skey, int listSizeMB)
This returns a JSON string with the list of objects in a given bucket for a given key, or for all keys if skey is NULL. It may also return NULL on error. listSizeMB defines the maximum size of the list; by default it returns 2 MB of data or less.
public long getModelCount(String req)
This returns the number of models for a given account, else -1 for error.
public int reinitMDM(String req)
Only for admin and server use; not applicable in embedded mode.
public boolean isBRSLocal()
Returns whether BRS is local, which is useful in distributed or server mode. Not relevant in embedded mode, where it is always true.
public long downloadFile(String bucketInfo, String key, String fname, String fpath)
It downloads the file from the given bucket and key, renames it to fname, and stores it at fpath.
It returns 0 for success, else -1 for error.
public byte[] getObject(String bucketInfo, String key)
It gets the object (binary or otherwise) from the given bucket and key. It fills data with the object and sets datlen to the length (size) of the object.
It returns 0 for success, else -1 for error.
public long countBuckets()
This returns the number of buckets, else -1 for error.
public long countObjects(String bucket_info)
This returns the number of objects in the given bucket, else -1 for error.
public String countObjectsDetails(String bucket_info)
This gives the details of all the objects in the given bucket (bucket_info), else returns NULL for error. Note that it may also set an error in the returned JSON value.
public int countSlices(String bucket_info, String key)
BRS (BangDB Resource Server) stores large files and objects in chunks, so this function counts how many slices exist for the given file (key).
It returns the count of slices, else -1 for error.
public int delFile(String bucket_info, String key)
It deletes the file specified by key and bucket_info. Bucket info is as follows:
bucketInfo = {"bucket_name":"ml_bucket_info", "access_key":"brs_access_key", "secret_key":"brs_secret_key"}
It returns 0 for success, else -1 for error.
public int delBucket(String bucket_info)
It deletes the bucket as specified. Bucket info looks like the following:
bucketInfo = {"bucket_name":"ml_bucket_info", "access_key":"brs_access_key", "secret_key":"brs_secret_key"}
It returns 0 for success, else -1 for error.
public synchronized void closeMLHelper(boolean force)
This closes the BangDB ML helper. A reference count is maintained within the ML helper, so if force is false and there are open references, the helper is not closed. If force is true or the number of references is 0, it is closed. It returns 0 for success, else -1 for error.
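The close rule described above can be pictured with a toy model. This is not BangDB's implementation, only an illustration of "close when forced or when no references remain". The class RefCountedHandle and its acquire/release methods are hypothetical; how the real helper increments its reference count is not stated on this page.

```java
final class RefCountedHandle {
    private int refs = 0;
    private boolean closed = false;

    // Hypothetical: stands in for whatever hands out a reference.
    synchronized void acquire() {
        refs++;
    }

    // Hypothetical: stands in for a holder giving its reference back.
    synchronized void release() {
        if (refs > 0) {
            refs--;
        }
    }

    // Closes only if forced or if no references remain.
    // Returns true when the handle is closed after the call.
    synchronized boolean close(boolean force) {
        if (force || refs == 0) {
            closed = true;
            refs = 0;
        }
        return closed;
    }

    public static void main(String[] args) {
        RefCountedHandle handle = new RefCountedHandle();
        handle.acquire();
        System.out.println(handle.close(false)); // one open reference, stays open
        System.out.println(handle.close(true));  // forced, closes regardless
    }
}
```

In this model, a non-forced close is a safe no-op while other holders remain, which is why force=false is the gentler choice when the helper is shared.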