Query in BangDB

Querying BangDB

Filter for scan and data retrieval in BangDB

For scanning data in BangDB, we may use a primary-key based scan, a secondary-key based scan, a text-key (reversed) based scan, or all of these together. This makes data scan a very robust and flexible process. To help users define these queries, BangDB provides the dataQuery type. Using this type is not required; we could simply write the query in JSON form and operate with that.

Scan always returns a resultset, which is an iterable list of key-value pairs that supports certain operations as well. This list is defined by the type resultset.

Scan may also return NULL if an error is encountered, hence the user has to handle NULL as well.

Since a table or stream may contain a large amount of data, scan will not return all of it at once; instead, it keeps returning data as required or as called by the user.

The user may set limits and certain other conditions for filtering. These affect the way data is retrieved and also the amount of data retrieved, both in terms of the number of rows and the size of the data. This is defined by ScanFilter.

ScanFilter is defined as:

    public ScanOperator skeyOp;   // operator for the start key
    public ScanOperator ekeyOp;   // operator for the end key
    public ScanLimitBy limitBy;
    public int limit;
    public int skipCount;
    public int onlyKey;
    public int reserved;

ScanOperator is defined as below:

    GT,   // greater than
    GTE,  // greater than equal to
    LT,   // less than
    LTE,  // less than equal to
    EQ,   // equal to
    NE;   // not equal to

The ScanOperator is always applied to primary keys only and not to secondary keys. For secondary keys, we use dataQuery, which is defined below.

ScanLimitBy is used to limit the size of the data that should be retrieved in a single call:

    LIMIT_RESULT_SIZE,  // limit by size, it takes an integer which is in MB
    LIMIT_RESULT_ROW,   // limit by the number of rows

onlyKey is 0 if we wish to retrieve both key and value, else 1 for only key. [ Note: we overload this for a special scenario for composite index, discussed later ]

reserved is also not used most of the time; however, we may leverage it in some cases, as described later.
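To show how such a filter is typically populated, here is a self-contained sketch. The enum and class below are local stand-ins that mirror the field definitions above for illustration only; in real code you would use the types shipped with the BangDB Java client:

```java
// Local stand-ins mirroring the BangDB definitions above, for illustration only
enum ScanOperator { GT, GTE, LT, LTE, EQ, NE }
enum ScanLimitBy { LIMIT_RESULT_SIZE, LIMIT_RESULT_ROW }

class ScanFilter {
    public ScanOperator skeyOp = ScanOperator.GTE;
    public ScanOperator ekeyOp = ScanOperator.LTE;
    public ScanLimitBy limitBy = ScanLimitBy.LIMIT_RESULT_ROW;
    public int limit = 10;      // at most 10 rows per call by default
    public int skipCount = 0;
    public int onlyKey = 0;     // 0 = key and value, 1 = key only
    public int reserved = 0;
}

public class ScanFilterDemo {
    public static void main(String[] args) {
        // fetch at most 100 rows per call, keys only,
        // strictly between the range keys (exclusive on both ends)
        ScanFilter sf = new ScanFilter();
        sf.skeyOp = ScanOperator.GT;
        sf.ekeyOp = ScanOperator.LT;
        sf.limitBy = ScanLimitBy.LIMIT_RESULT_ROW;
        sf.limit = 100;
        sf.onlyKey = 1;
        System.out.println(sf.limit + " " + sf.onlyKey + " " + sf.skeyOp);
    }
}
```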
Once we call scan, it may return partial data; hence we need to keep calling it as needed until all the data is retrieved. The typical way to call the scan function is as follows:

    ScanFilter sf = new ScanFilter();
    ResultSet rs = null;
    while(true) {
        rs = tbl.scan(rs, pk1, pk2, sf);
        if(rs == null)
            break;
        while(rs.hasNext()) {
            // use rs.getNextkey(), rs.getNextVal()
            rs.moveNext();
        }
    }

This allows the user to retrieve the data. If the user wishes to break out before the data retrieval is done, then the user has to clear the resultset by calling rs.clear();

Scan API

Let's look at the typical scan APIs in BangDB. They have the following signatures.

For non-JSON data, i.e. text or opaque data, applicable or exposed by BangDBTable for NORMAL_TABLE:

    public ResultSet scan(ResultSet prev_rs, String pk_skey, String pk_ekey, ScanFilter sf, Transaction txn)

The same is supported for long and byte[] pk_skey and pk_ekey as well.

For document or JSON data scan, for WIDE_TABLE:

    public ResultSet scanDoc(ResultSet prev_rs, String pk_skey, String pk_ekey, String idx_filter_json, ScanFilter sf)

The same is supported for long and byte[] pk_skey and pk_ekey as well. This one has one extra argument, idx_filter_json, which is used for querying using keys other than primary keys.
For NORMAL_TABLE, or for scan(), it's straightforward since we can only use primary keys there. When we wish to scan the entire table, we may pass null for pk_skey and pk_ekey for non-long types. For long, we may use 0 and LONG_MAX_VAL.

For WIDE_TABLE or non-primary-key based scan, the details are discussed below.

Non primary key based scan

Apart from primary keys, we can use secondary and text (reverse) keys to query data. Creating indexes on these secondary keys boosts performance, but an index is not required for querying data using these secondary, non-primary keys. However, it is highly recommended to strategically create these secondary and reverse indexes for high performance and efficient queries.

Now, let's see what these secondary keys are. Consider a sample event or doc/data:

    {
        "name": "sachin",
        "org": "bangdb",
        "address": {
            "home": {
                "city": "bangalore",
                "state": "ka",
                "pin": 560034
            },
            "office": {
                "city": "bangalore",
                "state": "ka",
                "pin": 560095
            }
        },
        "fav-quote": "The happiness of your life depends on the quality of your thoughts"
    }
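To make the notion of secondary keys concrete: the nested document above can be flattened into dotted key paths (e.g. address.home.city), which are the kind of non-primary keys a secondary-key query targets. A self-contained sketch of that flattening, using plain Maps instead of a JSON library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenDoc {
    // Recursively flatten nested maps into dotted key paths
    static void flatten(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                flatten(path, child, out);
            } else {
                out.put(path, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // a cut-down version of the sample doc above
        Map<String, Object> home = new LinkedHashMap<>();
        home.put("city", "bangalore");
        home.put("pin", 560034);
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("home", home);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "sachin");
        doc.put("address", address);

        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", doc, flat);
        System.out.println(flat.get("address.home.city")); // the secondary-key value
    }
}
```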
As you can see, there are multiple ways to query here; a few examples:

query1 = using "name", e.g. where "name" = "sachin", etc.

query2 = using a nested key such as city, e.g. where city = "bangalore"

query3 = using text match, like "quality, thought" [ Note: we use a reverse index here, searching with a list of tokens ]

and so on...
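The text-match query above relies on a reverse (inverted) index: text is tokenized and each token points back to the documents containing it. BangDB's reverse index is internal to the engine; the following self-contained sketch only illustrates the token-based matching idea:

```java
import java.util.*;

public class ReverseIndexSketch {
    public static void main(String[] args) {
        // docId -> text
        Map<Integer, String> docs = new HashMap<>();
        docs.put(1, "The happiness of your life depends on the quality of your thoughts");
        docs.put(2, "An unexamined life is not worth living");

        // Build the inverted index: token -> set of docIds containing it
        Map<String, Set<Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            for (String tok : e.getValue().toLowerCase().split("\\W+")) {
                index.computeIfAbsent(tok, k -> new TreeSet<>()).add(e.getKey());
            }
        }

        // Search with a list of tokens: docs containing both "quality" and "thoughts"
        Set<Integer> hits = new TreeSet<>(index.getOrDefault("quality", Set.of()));
        hits.retainAll(index.getOrDefault("thoughts", Set.of()));
        System.out.println(hits);
    }
}
```

Only doc 1 contains both tokens, so the search returns [1].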

Further, we may wish to organize a key in a composite manner to suit the use case.

Here in this doc, the primary key may be long, string, or composite, and we can then query using the primary key in interesting ways.

While long, opaque, and string primary keys are fine, the composite key is quite interesting and useful in many scenarios.

Let's say we wish to have the primary key as a composite key with the following arrangement:

city:name or city:org:name etc.

This gives us quite a lot of flexibility to query in different ways:

query4 = find all docs where city could be any city but name is "sachin"; here we may use

*:sachin [ using city:name as key arrangement, '*' matches any city ]
Or name has "sac" as initial characters

*:sac$% [ $% means match everything before these chars but after ':' ]

Or any city and any name as long as org is "bangdb"

*:bangdb:* [ using city:org:name as key arrangement ]
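The composite-key patterns above (*:sachin, *:sac$%, *:bangdb:*) can be read as segment-wise matching on the ':'-separated key. The following self-contained sketch mirrors those semantics as described here; it is an illustration, not BangDB's internal implementation:

```java
public class CompositeKeyMatch {
    // Match a ':'-separated composite key against a pattern where
    // '*' matches a whole segment and a trailing "$%" means prefix match
    static boolean matches(String key, String pattern) {
        String[] ks = key.split(":", -1);
        String[] ps = pattern.split(":", -1);
        if (ks.length != ps.length) return false;
        for (int i = 0; i < ks.length; i++) {
            String p = ps[i];
            if (p.equals("*")) continue;            // any value for this segment
            if (p.endsWith("$%")) {                 // prefix match, e.g. "sac$%"
                if (!ks[i].startsWith(p.substring(0, p.length() - 2))) return false;
            } else if (!ks[i].equals(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("bangalore:sachin", "*:sachin"));          // any city, name sachin
        System.out.println(matches("bangalore:sachin", "*:sac$%"));           // name starts with "sac"
        System.out.println(matches("bangalore:bangdb:sachin", "*:bangdb:*")); // org is bangdb
        System.out.println(matches("bangalore:google:sachin", "*:bangdb:*")); // org is not bangdb
    }
}
```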

query5 = find all docs where a given key is equal to the value of an attribute present in the doc itself, referenced using $

This allows users to scan with data present in the doc itself [ helpful in stream ]
See the next section for example code