catr – Computing attribute – BangDB = NoSQL + AI + Stream



Computed attributes


This is useful for computing an extra set of attributes from the event stream as we ingest data from the source. For example, if we receive attributes a, b, c, etc. and we wish to compute a new attribute (say a3) from them based on some logic, we define that logic here. This is how it looks:

"catr":[
  {"name":"m", "type":9, "opnm":"MUL", "stat": 3, "iatr":["b", "c", "d"]},
  {"name":"n", "type":11, "stat":1, "opnm":"comp_int", "iatr":["g", "h"]},
  {"name":"o", "type":5, "opnm":"string_add", "iatr":["a", "b"]},
  {"name":"p", "type":5, "opid":3, "opnm":"myudf3", "iatr":["c", "b"]},
  {"name":"mexp", "type":9, "opnm":"MATH_EXP", "iatr":["((($g+$h)*2)+($g*$h))"]},
  {"name":"x", "type":11, "opnm":"PRED", "model":"mymodel1", "algo":"SVM", "attr_type":"HYB", "iatr":["a", "b", "c"]}
]
Let's look at each one in turn:

{"name":"m", "type":9, "opnm":"MUL", "stat": 3, "iatr":["b", "c", "d"]}
This says: compute a new attribute m of type 9 (long) from (b, c, d) using "opnm": "MUL" (multiply), and enable "stat" as well (type 3, running stats).
Attribute types:
5 - string (char *)
9 - long (can be used for int and short as well)
11 - double

"iatr" lists the input attributes, and "opid" tells which operation to use.
Note: Everywhere in a stream, "iatr" refers to the set of attributes coming from "this" stream, i.e. the stream for which it is defined.
There are a few default operations that can be used, or the user may upload a UDF (user defined function, explained separately in the UDF section) and use that. The following default operations are available within the DB.
When we wish to use "opid" instead of "opnm", we may use the following enum:

enum BANGDB_DEFAULT_UDF {
    // following are for computations of value of different attribute
    BANGDB_DEFAULT_UDF_COPY = 1,
    BANGDB_DEFAULT_UDF_ADD,
    BANGDB_DEFAULT_UDF_MUL,
    BANGDB_DEFAULT_UDF_DIV,
    BANGDB_DEFAULT_UDF_PERCENT,
    BANGDB_DEFAULT_UDF_SUB,
    BANGDB_DEFAULT_UDF_UPPER,      // for string, it's upper case, for double it's ceiling, long doesn't care
    BANGDB_DEFAULT_UDF_LOWER,      // for string, it's lower case, for double it's floor, long doesn't care
    BANGDB_DEFAULT_UDF_COPY_VAL,
    BANGDB_DEFAULT_UDF_LOG_BASE_E,
    BANGDB_DEFAULT_UDF_LOG_BASE_2,
    BANGDB_DEFAULT_UDF_LOG_BASE_10,
    BANGDB_DEFAULT_UDF_MATH_EXP,
    BANGDB_DEFAULT_UDF_INVALID = 1024
};

In "opnm", we would instead use one of the following names:

"COPY"      // simply copy the attribute value
"ADD"       // add the attribute values
"MUL"       // multiply the attribute values
"DIV"       // divide the left attribute by the right one
"PERCENT"   // compute percentage, left of right
"SUB"       // subtract the right one from the left one
"UPPER"     // convert the attribute value to upper case
"LOWER"     // convert the attribute value to lower case
"COPY_VAL"  // copy a value that is provided directly, not the value of an attribute
"LOG_E"     // log to the base e (ln)
"LOG_2"     // log to the base 2
"LOG_10"    // log to the base 10
"MATH_EXP"  // math expression, involving attributes and fixed values
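As a quick cross-check of which "opid" number goes with which "opnm", the enum can be mirrored in a few lines of Python. This is only an illustrative sketch: the class `DefaultUdf`, the table `OPNM_TO_OPID` and the helper `opid_for` are hypothetical names, not part of BangDB. The numeric values follow from ordinary C enum auto-increment starting at COPY = 1, and the one-to-one pairing of names with enum members is assumed from the matching order of the two lists.

```python
from enum import IntEnum


class DefaultUdf(IntEnum):
    """Illustrative mirror of BANGDB_DEFAULT_UDF (values auto-increment from 1)."""
    COPY = 1
    ADD = 2
    MUL = 3
    DIV = 4
    PERCENT = 5
    SUB = 6
    UPPER = 7
    LOWER = 8
    COPY_VAL = 9
    LOG_BASE_E = 10
    LOG_BASE_2 = 11
    LOG_BASE_10 = 12
    MATH_EXP = 13
    INVALID = 1024


# "opnm" strings from the list above, paired with enum members in the same order
# (the pairing itself is an assumption of this sketch).
OPNM_TO_OPID = {
    "COPY": DefaultUdf.COPY,
    "ADD": DefaultUdf.ADD,
    "MUL": DefaultUdf.MUL,
    "DIV": DefaultUdf.DIV,
    "PERCENT": DefaultUdf.PERCENT,
    "SUB": DefaultUdf.SUB,
    "UPPER": DefaultUdf.UPPER,
    "LOWER": DefaultUdf.LOWER,
    "COPY_VAL": DefaultUdf.COPY_VAL,
    "LOG_E": DefaultUdf.LOG_BASE_E,
    "LOG_2": DefaultUdf.LOG_BASE_2,
    "LOG_10": DefaultUdf.LOG_BASE_10,
    "MATH_EXP": DefaultUdf.MATH_EXP,
}


def opid_for(opnm: str) -> int:
    """Hypothetical helper: opid of a default operation, or INVALID for any other name."""
    return int(OPNM_TO_OPID.get(opnm, DefaultUdf.INVALID))


print(opid_for("MUL"))       # 3
print(opid_for("MATH_EXP"))  # 13
```

Returning INVALID for a name such as a user's UDF is a choice made for this sketch only; it does not describe how the DB resolves UDF names.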
Another one

{"name":"n", "type":11, "stat":1, "opnm":"comp_int", "iatr":["g", "h"]}
It says: compute attribute n of type 11 (double) from input attributes (g, h) using a UDF named comp_int (implemented and uploaded by the user), and enable "stat": 1 (counting).

{"name":"o", "type":5, "opnm":"string_add", "iatr":["a", "b"]}
It computes an attribute o of type 5 (string) from input attributes (a, b) using the UDF string_add.

{"name":"p", "type":5, "opid":3, "opnm":"myudf3", "iatr":["c", "b"]}
This is along similar lines, but it has both "opid" and "opnm"; in such a case it uses "opid" 3.
Note: When both "opid" and "opnm" are given, "opid" is used.
{"name":"mexp", "type":9, "opid":13, "iatr":["((($g+$h)*2)+($g*$h))"]}
This one is a bit different: it computes attribute "mexp" of type 9 (long) using "opid": 13 (BANGDB_DEFAULT_UDF_MATH_EXP), which says to evaluate the math expression given in "iatr" over the input attributes. Here it adds the values of g and h, multiplies the sum by 2, and then adds the product of g and h.
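To make the arithmetic concrete, here is a small Python sketch that substitutes event values for the $-prefixed names and evaluates the result. It is not how BangDB evaluates expressions internally. The function `eval_math_exp` is hypothetical, and the supported operator set (+, -, *, /) is an assumption of this sketch.

```python
import ast
import operator
import re

# Binary operators this sketch accepts (an assumption, not BangDB's grammar).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_math_exp(expr, event):
    """Evaluate a MATH_EXP-style string against the attribute values in `event`."""
    # Replace each $name with the numeric value of that attribute.
    substituted = re.sub(r"\$(\w+)", lambda m: repr(event[m.group(1)]), expr)
    tree = ast.parse(substituted, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported element in expression")

    return walk(tree)


# g = 3, h = 4: ((3 + 4) * 2) + (3 * 4) = 14 + 12 = 26
print(eval_math_exp("((($g+$h)*2)+($g*$h))", {"g": 3, "h": 4}))  # 26
```

Walking the parsed tree instead of calling `eval` keeps the sketch limited to plain arithmetic.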

This is simple, but it is valuable: we can create new attributes and associate them with the stream before any further processing.
Note: This is the first processing step performed when an event is ingested, after "refr" is done.

Now, let's look at how to do prediction on a stream.

{"name":"x", "type":11, "opnm":"PRED", "model":"mymodel1", "algo":"SVM", "attr_type":"HYB", "iatr":["a", "b", "c"]}
Let's say that when we ingest events into a stream, we wish to take a set of attributes from the event, run a pre-trained model on them, and store the prediction output in the stream itself as an attribute.
Here, we use "catr" with pretty much the structure already defined, except for a few additions:

"opnm": here we use "PRED"
"model": name of the model that we have trained
"algo": name of the algorithm with which we trained the model
"attr_type": type of the attributes; if all are numerical use "NUM", if all are strings use "STR", and if they are mixed use "HYB"
"iatr": all the attributes that should participate in the prediction. Order is important here. However, if the order is not clear, just put what makes sense at that point; later, during prediction, the DB corrects this and uses the right order (by learning from the trained model how it was trained)

The rest is the same: we can use "stat" on the attribute, and the attribute can participate in other "catr" computations, etc.
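The "attr_type" rule for "PRED" entries (all numerical, all strings, or mixed) can be sketched as a small Python helper. The function `attr_type_for` and the per-attribute type codes in `input_types` are hypothetical and used only for illustration; the type codes 5, 9 and 11 are the ones listed earlier.

```python
NUMERIC_TYPES = {9, 11}  # long, double
STRING_TYPES = {5}       # string


def attr_type_for(type_codes):
    """Return "NUM", "STR" or "HYB" for a collection of attribute type codes."""
    codes = set(type_codes)
    if not codes or not codes <= (NUMERIC_TYPES | STRING_TYPES):
        raise ValueError(f"unsupported attribute type codes: {sorted(codes)}")
    if codes <= NUMERIC_TYPES:
        return "NUM"
    if codes <= STRING_TYPES:
        return "STR"
    return "HYB"


# Hypothetical types for the input attributes a, b, c of the "x" entry.
input_types = {"a": 5, "b": 9, "c": 5}

pred_catr = {
    "name": "x",
    "type": 11,
    "opnm": "PRED",
    "model": "mymodel1",
    "algo": "SVM",
    "attr_type": attr_type_for(input_types.values()),
    "iatr": list(input_types),
}
print(pred_catr["attr_type"])  # HYB
```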
Note: The order in which the items are defined in the "catr":[] array is important. The DB computes them in the order in which they are defined. Therefore, one "catr" can use an attribute computed by another "catr" only if that other "catr" is defined before it.
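The ordering rule can be illustrated with a toy Python model of the computation loop. This is a sketch of the idea only. The function `compute_catrs`, the `ops` table and the attribute names s and t are hypothetical, and the KeyError raised on a missing input is this sketch's behaviour, not necessarily what the DB does.

```python
import math


def compute_catrs(event, catrs, ops):
    """Apply catr entries in array order; each result becomes visible to later entries."""
    out = dict(event)
    for catr in catrs:
        # Raises KeyError if an input attribute has not been ingested or computed yet.
        inputs = [out[name] for name in catr["iatr"]]
        out[catr["name"]] = ops[catr["opnm"]](inputs)
    return out


ops = {"ADD": sum, "MUL": math.prod}

catrs = [
    {"name": "s", "opnm": "ADD", "iatr": ["g", "h"]},  # s = g + h
    {"name": "t", "opnm": "MUL", "iatr": ["s", "h"]},  # t = s * h, needs s first
]

print(compute_catrs({"g": 3, "h": 4}, catrs, ops))
# {'g': 3, 'h': 4, 's': 7, 't': 28}

try:
    compute_catrs({"g": 3, "h": 4}, list(reversed(catrs)), ops)
except KeyError as missing:
    print("not yet computed:", missing)  # not yet computed: 's'
```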

A few examples

Let's say we have a stream of data with attributes {a, b, c, m}; here is the schema for it:

{
  "schema":"myapp",
  "streams":[
    {
      "name":"product",
      "type":1,
      "swsz":86400,
      "inpt":[],
      "attr":[
        {"name":"a","type":5,"sidx":1,"stat":2,"ridx":1},
        {"name":"b","type":9,"stat":3},
        {"name":"m","type":11,"stat":3},
        {"name":"c","type":5,"kysz":24,"stat":2}
      ]
    }
  ]
}
Now, let's compute several other attributes as required:
1. Lower-case the attribute a, i.e. replace "Sachin" with "sachin"
Here we would like to replace the attribute a with its lower-case form. Therefore both "name" and "iatr" refer to the same attribute.

To replace the attribute, we can set the "fnr" tag. "fnr" can take the following values:
1 = just add this attribute to the stream after it is computed [default]
2 = replace if existing, else just add
3 = add only if it does not exist (missing value case)
Therefore, we can do the following:
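A minimal sketch of such an entry, built in Python and printed as JSON. It assumes that "fnr" is set as a key directly on the "catr" entry and that the default "LOWER" operation is used; it is put together from the rules described above, not taken from a verified BangDB example.

```python
import json

# "name" and "iatr" both point at attribute a; "fnr": 2 asks for replace-if-existing.
lower_a = {
    "name": "a",
    "type": 5,        # string
    "opnm": "LOWER",
    "fnr": 2,         # assumed placement of the "fnr" tag
    "iatr": ["a"],
}

print(json.dumps({"catr": [lower_a]}))
# {"catr": [{"name": "a", "type": 5, "opnm": "LOWER", "fnr": 2, "iatr": ["a"]}]}

# The effect on a single value, in plain Python terms.
print("Sachin".lower())  # sachin
```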
2. Add a fixed value when the attribute is missing; otherwise leave it unchanged
3. Compute a new attribute using a math expression
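As a sketch of what such an entry could look like for the "product" stream, the Python snippet below builds a "MATH_EXP" entry over the numeric attributes b and m. The attribute name "bm" and the expression itself are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical entry: "bm" and the expression are illustrative only.
math_catr = {
    "name": "bm",
    "type": 11,               # double
    "opnm": "MATH_EXP",
    "iatr": ["(($b*2)+$m)"],  # b (long) and m (double) come from the "product" stream
}

print(json.dumps({"catr": [math_catr]}))

# The arithmetic the expression describes, for b = 10 and m = 2.5.
b, m = 10, 2.5
expected = (b * 2) + m
print(expected)  # 22.5
```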