Filter for scan and data retrieval in BangDB

For scanning data in BangDB, we may use a primary key based scan, a secondary key based scan, a text (reverse) key based scan, or all of these together. This makes data scan a robust and flexible process. To help users define these queries, we provide the DataQuery type. Using this type is not required; we could simply write the query in JSON form and operate.

Scan returns a ResultSet, which is an iterable list of key and value pairs that also supports certain operations. This list is defined by the ResultSet type.

Scan may return NULL if an error is encountered, so the user has to handle NULL as well. Since a table or stream may contain a large amount of data, scan may not be able to return all of it at once; it keeps returning more data as the user calls it again.

User may also set limits and certain other conditions for filtering. These affect the way data is retrieved and the amount of data retrieved, in terms of either the number of rows or the size of the data. These settings are defined by ScanFilter.

// ScanFilter is defined as
  
public ScanOperator skeyOp;
public ScanOperator ekeyOp;
public ScanLimitBy limitBy;
public int limit;
public int skipCount;
public int onlyKey;
public int reserved;

// ScanOperator is defined as below

  GT, // greater than
  GTE, // greater than or equal to
  LT, // less than
  LTE, // less than or equal to
  EQ, // equal to
  NE; // not equal to

The ScanOperator is always applied to primary keys only and not to secondary keys. For secondary keys, we use DataQuery, which is described below.

ScanLimitBy is used to limit the size of the data that should be retrieved in a single call:

  • LIMIT_RESULT_SIZE: limit by size; it takes an integer, which is in MB
  • LIMIT_RESULT_ROW: limit by the number of rows

onlyKey is 0 if we wish to retrieve both key and value, else 1 for only key.
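As a rough sketch of how these fields fit together, the snippet below fills in a filter that asks for keys only, at most a given number of rows per call. The `ScanFilter`, `ScanOperator` and `ScanLimitBy` types here are stand-ins that only mirror the fields and constants listed above so that the example compiles on its own; they are not the BangDB classes. `ScanFilterSketch`, `keysOnlyPage` and the choice of `GTE`/`LT` for the key range are assumptions made for illustration.

```java
// Stand-in types mirroring the fields and constants listed above.
// They are NOT the BangDB classes; they only make this sketch self-contained.
enum ScanOperator { GT, GTE, LT, LTE, EQ, NE }

enum ScanLimitBy { LIMIT_RESULT_SIZE, LIMIT_RESULT_ROW }

class ScanFilter {
    public ScanOperator skeyOp;
    public ScanOperator ekeyOp;
    public ScanLimitBy limitBy;
    public int limit;
    public int skipCount;
    public int onlyKey;
    public int reserved;
}

class ScanFilterSketch {
    // Hypothetical helper: start key inclusive, end key exclusive,
    // at most maxRows rows per scan call, keys only.
    static ScanFilter keysOnlyPage(int maxRows) {
        ScanFilter sf = new ScanFilter();
        sf.skeyOp = ScanOperator.GTE;              // applied to the primary start key
        sf.ekeyOp = ScanOperator.LT;               // applied to the primary end key
        sf.limitBy = ScanLimitBy.LIMIT_RESULT_ROW; // interpret limit as a row count
        sf.limit = maxRows;
        sf.onlyKey = 1;                            // 1 = keys only, 0 = key and value
        return sf;
    }

    public static void main(String[] args) {
        ScanFilter sf = keysOnlyPage(100);
        System.out.println(sf.limitBy + " " + sf.limit + " onlyKey=" + sf.onlyKey);
    }
}
```

Switching `limitBy` to `LIMIT_RESULT_SIZE` would make `limit` a size in MB instead of a row count, as described above.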

Once we call scan, it may return partial data, so we need to keep calling it as needed to get all the data. A typical way to call the scan function is as follows:

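ScanFilter sf = new ScanFilter();
ResultSet rs = null;
while (true) {
   // pass the previous result set back in so the scan continues from there
   rs = tbl.scan(rs, pk1, pk2, sf);
   if (rs == null) break;   // stop when scan returns null
   while (rs.hasNext()) {
      // use the current key and value
      rs.getNextKey();
      rs.getNextVal();
      rs.moveNext();
   }
}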

This allows the user to retrieve the data. If the user wishes to stop before data retrieval is done, the user has to clear the rs by calling:

rs.clear();
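To make the calling pattern concrete without a running database, here is a minimal in-memory simulation of the loop. `DemoTable`, `DemoResultSet`, `ScanLoopSketch` and the `rowsPerCall` parameter are hypothetical stand-ins written for this sketch, not BangDB classes; the stand-in `scan` simply returns a few rows per call and returns null once nothing is left, which is the behaviour the loop above relies on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a result set: one batch of {key, value} rows.
class DemoResultSet {
    private final List<String[]> rows;
    private int pos = 0;
    final String lastKey; // the next scan call resumes after this key

    DemoResultSet(List<String[]> rows) {
        this.rows = rows;
        this.lastKey = rows.get(rows.size() - 1)[0];
    }
    boolean hasNext() { return pos < rows.size(); }
    String getNextKey() { return rows.get(pos)[0]; }
    String getNextVal() { return rows.get(pos)[1]; }
    void moveNext() { pos++; }
    void clear() { rows.clear(); pos = 0; }
}

// Hypothetical stand-in for a table: a sorted in-memory map.
class DemoTable {
    private final TreeMap<String, String> data = new TreeMap<>();
    void put(String key, String val) { data.put(key, val); }

    // Returns at most rowsPerCall rows from [skey, ekey]; null when nothing is left.
    DemoResultSet scan(DemoResultSet prev, String skey, String ekey, int rowsPerCall) {
        List<String[]> batch = new ArrayList<>();
        for (Map.Entry<String, String> e : data.subMap(skey, true, ekey, true).entrySet()) {
            if (prev != null && e.getKey().compareTo(prev.lastKey) <= 0) continue; // already returned
            batch.add(new String[] { e.getKey(), e.getValue() });
            if (batch.size() == rowsPerCall) break;
        }
        return batch.isEmpty() ? null : new DemoResultSet(batch);
    }
}

class ScanLoopSketch {
    static DemoTable sampleTable() {
        DemoTable tbl = new DemoTable();
        for (int i = 1; i <= 5; i++) tbl.put("k" + i, "v" + i);
        return tbl;
    }

    // Collects keys batch by batch; stops early (and clears rs) after stopAfter keys.
    static List<String> collectKeys(DemoTable tbl, String skey, String ekey,
                                    int rowsPerCall, int stopAfter) {
        List<String> keys = new ArrayList<>();
        DemoResultSet rs = null;
        while (true) {
            rs = tbl.scan(rs, skey, ekey, rowsPerCall);
            if (rs == null) break;
            while (rs.hasNext()) {
                keys.add(rs.getNextKey());
                rs.moveNext();
                if (keys.size() == stopAfter) {
                    rs.clear(); // leaving before the scan is done: clear the result set
                    return keys;
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(collectKeys(sampleTable(), "k1", "k5", 2, Integer.MAX_VALUE));
        System.out.println(collectKeys(sampleTable(), "k1", "k5", 2, 3));
    }
}
```

The outer loop hands the previous result set back to `scan` on every call, and the early return shows where `clear()` belongs when the caller stops before the data is exhausted.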

Scan API

Let's look at the typical scan API in BangDB. It has the following signatures:

For non-json data, i.e. text or opaque data, the following is exposed by BangDBTable.

For NORMAL_TABLE:

public ResultSet scan( ResultSet prev_rs,
  String pk_skey,
  String pk_ekey,
  ScanFilter sf,
  Transaction txn
)

The same is supported for long and byte[] pk_skey and pk_ekey as well.

For document or json data scan:

For WIDE_TABLE:

public ResultSet scanDoc( ResultSet prev_rs,
  String pk_skey,
  String pk_ekey,
  String idx_filter_json,
  ScanFilter sf
)

The same is supported for long and byte[] pk_skey and pk_ekey as well. This one has one extra argument, idx_filter_json, which is used for querying using keys other than primary keys.

For NORMAL_TABLE, i.e. for scan(), it's straightforward, as we can only use primary keys there. When we wish to scan the entire table, we may pass null for pk_skey and pk_ekey for non-long types. For long, we may use 0 and LONG_MAX_VAL.

For WIDE_TABLE, or non-primary key based scan, the details are discussed below.

Non primary key based scan

Apart from primary keys, we can use secondary and text (reverse) keys to query data. Creating indexes on these secondary keys boosts performance, but an index is not required for querying data using these non-primary keys. However, it's highly recommended to strategically create secondary and reverse indexes for high-performance, efficient queries.

Now, let's see what these secondary keys are. Let's consider a sample event or doc/data.

{
  "name":"sachin",
  "org":"bangdb",
  "address":{
     "home":{
        "city":"bangalore",
        "state":"ka",
        "pin":560034
     },
     "office":{
        "city":"bangalore",
        "state":"ka",
        "pin":560095
     }
  },
  "fav-qoute":"The happiness of your life depends on the quality of your thoughts"
}

As you can see, there are multiple ways to query here. A few examples are:

query1 = using "name", e.g. where "name" = "sachin"
query2 = using "address.home.city" = "bangalore"
query3 = using match text, like "quality, thought"
// [ Note: we use the reverse index here and search with a list of tokens ]
// and so on...

Further, we may wish to organize a key in a composite manner to suit the use case. In this doc, the primary key can be long, string or composite, and we can then query using the primary key in interesting ways. While long, opaque and string primary keys are fine, a composite key is quite interesting and useful in many scenarios. Let's say we wish to have a composite primary key with the following arrangement:

city:name 
//or
city:org:name
//etc.

Now we have quite a lot of flexibility in querying in different ways:

query4 = find all docs where city could be any city but name is "sachin";
//here we may use
*:sachin

Or name has "sac" as initial characters

*:sac$% [ $% means match everything before these chars but after ':' ]

Or any city and any name as long as org is "bangdb"

*:bangdb:* [ using city:org:name as key arrangement ]

query5 = find all the docs where home.city is equal to office.city

home.city = $office.city

This allows users to scan with data present in the doc itself (helpful in stream). See the next section for example code.
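The composite key patterns above can be illustrated with a small client-side matcher. This is only one reading of the patterns (`*` matches any segment, a segment ending in `$%` matches by prefix, anything else must match exactly); BangDB evaluates these patterns itself and its exact rules may differ. `CompositeKeySketch`, `compose` and `matches` are hypothetical names introduced for this sketch.

```java
class CompositeKeySketch {
    // Builds a composite key such as city:name or city:org:name.
    static String compose(String... parts) {
        return String.join(":", parts);
    }

    // Assumed semantics, segment by segment:
    //   "*"       matches any value
    //   "abc$%"   matches values starting with "abc"
    //   otherwise the segment must be equal
    static boolean matches(String pattern, String key) {
        String[] p = pattern.split(":");
        String[] k = key.split(":");
        if (p.length != k.length) return false;
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals("*")) continue;
            if (p[i].endsWith("$%")) {
                String prefix = p[i].substring(0, p[i].length() - 2);
                if (!k[i].startsWith(prefix)) return false;
            } else if (!p[i].equals(k[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String cityName = compose("bangalore", "sachin");
        String cityOrgName = compose("bangalore", "bangdb", "sachin");
        System.out.println(matches("*:sachin", cityName));      // any city, name "sachin"
        System.out.println(matches("*:sac$%", cityName));       // any city, name starts with "sac"
        System.out.println(matches("*:bangdb:*", cityOrgName)); // any city and name, org "bangdb"
    }
}
```

The point of the composite arrangement is that one primary key carries several fields, so a single pattern on the primary key can answer what would otherwise need a secondary key lookup.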

DataQuery Type API for client

C++:

Create DataQuery object

DataQuery();

To add a query when filter value is string

void addQuery(
  const char *filterKey, 
  ScanOperator comp_op, 
  const char *filterVal, 
  JoinOperator jOp = JO_AND
);

To add a query when filter value is long

void addQuery(const char *filterKey, ScanOperator comp_op, long filterVal, JoinOperator jOp = JO_AND);

To add a query when filter value is double

void addQuery(const char *filterKey, ScanOperator comp_op, double filterVal, JoinOperator jOp = JO_AND);

To add a query when filter value is a list of words

void addQuery(const char *matchWordList, JoinOperator wordJoin, JoinOperator queryJoin, const char *field);

To set QueryType

void setQueryType(int type);

To get the query

const char *getQuery();

User should delete the returned data using delete[].

To print query

void printQuery();

To delete DataQuery object

virtual ~DataQuery();

Please see Table API for details on Scan API

Java:

To add a query when filter value is string

public void addQuery(String filterKey, ScanOperator cmpOp, String filterVal, JoinType joinOp)

To add a query when filter value is long

public void addQuery(String filterKey, ScanOperator cmpOp, long filterVal, JoinType joinOp)

To add a query when filter value is double

public void addQuery(String filterKey, ScanOperator cmpOp, double filterVal, JoinType joinOp)

To add a query filter

public void addQueryFilter(String filterKey, ScanOperator cmpOp, String filterVal, JoinType joinOp)
public void addQueryFilter(String filterKey, ScanOperator cmpOp, long filterVal, JoinType joinOp)
public void addQueryFilter(String filterKey, ScanOperator cmpOp, double filterVal, JoinType joinOp) 

To get the query

public String getQuery()

Please see Table API for details on Scan API
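As a rough illustration of how the Java addQuery calls might be chained, the sketch below builds a two-condition query on the sample document's fields. `DemoDataQuery` is a hypothetical stand-in that copies two of the addQuery signatures listed above and merely records the conditions; it does not produce the query string that the real getQuery() returns. The `JoinType` values `JO_AND` and `JO_OR` are assumed from the C++ `JO_AND` default, and `describe()` exists only in this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in enums: ScanOperator values come from this page, JoinType values are assumed.
enum ScanOperator { GT, GTE, LT, LTE, EQ, NE }

enum JoinType { JO_AND, JO_OR }

// Hypothetical stand-in for DataQuery; it only records what was added.
class DemoDataQuery {
    private final List<String> conditions = new ArrayList<>();

    public void addQuery(String filterKey, ScanOperator cmpOp, String filterVal, JoinType joinOp) {
        conditions.add(filterKey + " " + cmpOp + " \"" + filterVal + "\" [" + joinOp + "]");
    }

    public void addQuery(String filterKey, ScanOperator cmpOp, long filterVal, JoinType joinOp) {
        conditions.add(filterKey + " " + cmpOp + " " + filterVal + " [" + joinOp + "]");
    }

    // Readable summary for this sketch only; not the format returned by getQuery().
    public String describe() {
        return String.join("; ", conditions);
    }
}

class DataQuerySketch {
    // name = "sachin" and address.home.pin >= 560000
    static DemoDataQuery sachinByPin() {
        DemoDataQuery dq = new DemoDataQuery();
        dq.addQuery("name", ScanOperator.EQ, "sachin", JoinType.JO_AND);
        dq.addQuery("address.home.pin", ScanOperator.GTE, 560000L, JoinType.JO_AND);
        return dq;
    }

    public static void main(String[] args) {
        System.out.println(sachinByPin().describe());
    }
}
```

With the real library, the string returned by getQuery() would presumably be what is passed as idx_filter_json to scanDoc, since that argument is the one used for keys other than the primary key.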