catr – Computing attribute – BangDB = NoSQL + AI + Stream

Computed attributes


Computed attributes let us derive an extra set of attributes from the event stream as data is ingested from the source. For example, if events arrive with attributes a, b, c, etc. and we wish to compute a new attribute a3 from them using some logic, we define that computation here. This is how it looks:

"catr":[
    {"name":"m", "type":9, "opid":3, "stat":3, "iatr":["b", "c", "d"]},
    {"name":"n", "type":11, "stat":1, "opnm":"comp_int", "iatr":["g", "h"]},
    {"name":"o", "type":5, "opnm":"string_add", "iatr":["a", "b"]},
    {"name":"p", "type":5, "opid":3, "opnm":"myudf3", "iatr":["c", "b"]},
    {"name":"mexp", "type":9, "opid":13, "iatr":["((($g+$h)*2)+($g*$h))"]}
]
Let's look at each entry in turn.

{"name":"m", "type":9, "opid":3, "stat": 3, "iatr":["b", "c", "d"]}
It says: compute a new attribute m of type 9 (long) from the input attributes (b, c, d) using "opid": 3 (multiply), and enable "stat" as well (type 3, running stats).
Attribute types:
    5 - string (char *)
    9 - long (can be used for int and short as well)
    11 - double

"iatr" lists the input attributes, and "opid" tells which operation to use.
Note: Everywhere in a stream definition, "iatr" refers to the set of attributes coming from "this" stream, i.e. the stream for which it is defined.
There are a few default operations that can be used, or the user may upload a udf (user defined function, explained separately in the udf section) and use that. The following default operations are available within the db:
enum BANGDB_DEFAULT_UDF {
    // following are for computations of value of different attribute
    BANGDB_DEFAULT_UDF_COPY = 1,
    BANGDB_DEFAULT_UDF_ADD,
    BANGDB_DEFAULT_UDF_MUL,
    BANGDB_DEFAULT_UDF_DIV,
    BANGDB_DEFAULT_UDF_PERCENT,
    // following is for applying rule for refr, join
    BANGDB_DEFAULT_UDF_EQ, // matches two variables for equality, returns 1 or 0
    BANGDB_DEFAULT_UDF_LT,
    BANGDB_DEFAULT_UDF_LTE,
    BANGDB_DEFAULT_UDF_GT,
    BANGDB_DEFAULT_UDF_GTE,
    // for refr/join etc computation, whether to compare refr attr with this event attr or fixed data
    BANGDB_CMP_STREAM_TYPE,
    BANGDB_CMP_FIXED_TYPE,
    BANGDB_CMP_STREAM_TYPE_$,
    BANGDB_CMP_HYBRID_TYPE,
    BANGDB_DEFAULT_UDF_SUB,
    BANGDB_DEFAULT_UDF_INVALID = 1024
};
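Because the enum starts at 1 and each following entry increments by one, every name corresponds to a numeric "opid". The Python sketch below is illustrative only and is not BangDB code: it rebuilds that numbering and then applies a default operation to a sample event. The helper apply_default_op is hypothetical, and treating ADD and MUL as a sum and a product over all "iatr" values is an assumption made for illustration.

```python
from functools import reduce
import operator

# Enumerator names in declaration order, copied from the enum above.
NAMES = [
    "BANGDB_DEFAULT_UDF_COPY",
    "BANGDB_DEFAULT_UDF_ADD",
    "BANGDB_DEFAULT_UDF_MUL",
    "BANGDB_DEFAULT_UDF_DIV",
    "BANGDB_DEFAULT_UDF_PERCENT",
    "BANGDB_DEFAULT_UDF_EQ",
    "BANGDB_DEFAULT_UDF_LT",
    "BANGDB_DEFAULT_UDF_LTE",
    "BANGDB_DEFAULT_UDF_GT",
    "BANGDB_DEFAULT_UDF_GTE",
    "BANGDB_CMP_STREAM_TYPE",
    "BANGDB_CMP_FIXED_TYPE",
    "BANGDB_CMP_STREAM_TYPE_$",
    "BANGDB_CMP_HYBRID_TYPE",
    "BANGDB_DEFAULT_UDF_SUB",
]

# C-style numbering: the first value is 1, each following one is +1.
OPIDS = {name: i for i, name in enumerate(NAMES, start=1)}
OPIDS["BANGDB_DEFAULT_UDF_INVALID"] = 1024  # set explicitly in the enum

# ASSUMPTION (not confirmed by the source): ADD and MUL fold over all inputs.
FOLDS = {
    OPIDS["BANGDB_DEFAULT_UDF_ADD"]: operator.add,
    OPIDS["BANGDB_DEFAULT_UDF_MUL"]: operator.mul,
}


def apply_default_op(opid, iatr, event):
    """Hypothetical helper: combine the event's values for the iatr names."""
    values = [event[name] for name in iatr]
    return reduce(FOLDS[opid], values)


event = {"b": 2, "c": 3, "d": 4}
m = apply_default_op(3, ["b", "c", "d"], event)  # opid 3 is MUL
print(OPIDS["BANGDB_CMP_STREAM_TYPE_$"], m)
```

Under these assumptions, opid 3 resolves to BANGDB_DEFAULT_UDF_MUL and opid 13 to BANGDB_CMP_STREAM_TYPE_$, so an event with b=2, c=3, d=4 would give m=24.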
Another example:

{"name":"n", "type":11, "stat":1, "opnm":"comp_int", "iatr":["g", "h"]}
It says: compute attribute n of type 11 (double) from the input attributes (g, h) using a udf named comp_int (implemented and uploaded by the user), and enable "stat": 1 (counting).

{"name":"o", "type":5, "opnm":"string_add", "iatr":["a", "b"]}
It computes an attribute o of type 5 (string) from the input attributes (a, b) using the udf string_add.

{"name":"p", "type":5, "opid":3, "opnm":"myudf3", "iatr":["c", "b"]}
This is along similar lines, but it has both "opid" and "opnm". In such a case it uses "myudf3" (opnm) first, and if that is not there it falls back to opid 3.
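The fallback rule described here can be sketched as follows. This is a hedged illustration in Python, not BangDB's implementation: resolve_operation and the uploaded_udfs set are hypothetical names standing in for whatever lookup the db actually performs, and "not there" is assumed to mean that the named udf has not been uploaded.

```python
def resolve_operation(entry, uploaded_udfs):
    """Return ("udf", name) or ("default", opid) for one catr entry.

    Hypothetical helper: uploaded_udfs is an assumed set of udf names
    that are available to the stream.
    """
    opnm = entry.get("opnm")
    if opnm is not None and opnm in uploaded_udfs:
        return ("udf", opnm)  # the named udf takes precedence
    if "opid" in entry:
        return ("default", entry["opid"])  # fall back to the default op
    raise ValueError("entry names no usable operation")


p = {"name": "p", "type": 5, "opid": 3, "opnm": "myudf3", "iatr": ["c", "b"]}
print(resolve_operation(p, {"myudf3"}))  # the udf is available
print(resolve_operation(p, set()))       # falls back to opid 3
```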

{"name":"mexp", "type":9, "opid":13, "iatr":["((($g+$h)*2)+($g*$h))"]}
This one is a bit different. Here it computes attribute "mexp" of type 9 (long) using "opid": 13 (BANGDB_CMP_STREAM_TYPE_$), which says to evaluate the math expression defined in "iatr", where $g and $h refer to the input attributes g and h. The expression adds the values of g and h, multiplies the sum by 2, and then adds the product of g and h.
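To make the arithmetic concrete, the Python sketch below evaluates the same expression for a sample event. It is only an illustration of the idea, not how BangDB parses expressions: eval_expr is a hypothetical helper, and it assumes each $name is replaced by the numeric value of that attribute before the arithmetic is evaluated.

```python
import ast
import operator
import re

# Arithmetic operators this sketch understands.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_expr(expr, event):
    """Hypothetical helper: substitute $name values, then evaluate."""
    # Replace each $name with the event's value for that attribute.
    substituted = re.sub(r"\$(\w+)", lambda m: repr(event[m.group(1)]), expr)

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(substituted, mode="eval"))


event = {"g": 3, "h": 4}
mexp = eval_expr("((($g+$h)*2)+($g*$h))", event)
print(mexp)  # ((3+4)*2) + (3*4) = 14 + 12 = 26
```

For g=3 and h=4 the sum is 7, doubled to 14, and the product is 12, so mexp would be 26.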

This is simple, but valuable: we can create new attributes and associate them with the stream before any further processing.
Note: This is the first processing step performed when an event is ingested.