Index – BangDB = NoSQL + AI + Stream


BangDB 2.0

BangDB 2.0 supports several indexing methods so that data can be queried efficiently at run time.

The following indexing techniques can be used, depending upon the use case;

primary key
secondary key
nested key
composite key
geospatial
reverse
Let's discuss all these in detail now.

Primary key

Every value or document in BangDB has a primary key associated with it, explicitly or implicitly (timestamp). Therefore, we need to store the data in such a manner that get() or scan() works efficiently.
When we wish to store the keys in sorted order we should use BTREE as the method; otherwise EXTHASH should be used.
To set the method for arranging the primary key, we have to specify it while creating the table. This is done by setting the right parameter in the table_env object.
The parameter that we set for this is idx_type.
By default it is set to BTREE, but we can set it to any value defined in enum bangdb_index_type;

//types of pk arrangement
enum bangdb_index_type {
	HASH,		// Not supported anymore
	EXTHASH,
	BTREE,
	HEAP,		// Not supported anymore
	INVALID_INDEX_TYPE,
};
As you can see, there are effectively only two options here: BTREE or EXTHASH.
Note: Once set and the table created, this can't be changed for the lifetime of the table.
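To make the flow concrete, here is a minimal self-contained stand-in for table_env (the real class ships with the BangDB SDK and has many more properties; the setter name set_idx_type here is an assumption for illustration only):

```cpp
#include <cassert>

// Mirrors the bangdb_index_type enum shown above
enum bangdb_index_type { HASH, EXTHASH, BTREE, HEAP, INVALID_INDEX_TYPE };

// Minimal illustrative stand-in for BangDB's table_env; only the
// primary-key arrangement property is modeled here
struct table_env_sketch {
    bangdb_index_type idx_type = BTREE;   // BTREE is the default
    void set_idx_type(bangdb_index_type t) { idx_type = t; }
};
```

With the real table_env, we would set idx_type the same way before passing the object to the table-creation call.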

Query
The index on the primary key is implicitly used when we call get() for a key. When we call scan(k1, k2), the db searches using the index as well.
Let's see an actual query using this.

Let's say we have a table "mytable" where the primary key is indexed using BTREE, and we wish to scan the table for all the keys between k1 and k2.
If it's a NORMAL_TABLE;

resultset *rs = NULL;
scan_filter sf;    // to set some simple filter criteria
while(true)
{
       rs = tbl->scan(rs, k1, k2, &sf);
       if(!rs)
             break;
       while(rs->hasNext())
       {
               // use rs key and val
               rs->moveNext();
       }
}

Secondary Key

Secondary keys or indexes are created once the table is created or opened. This can only be done for the WIDE_TABLE type; please see bangdb_table_type for more info. We can do this using the addIndex() method. Again we use table_env to set the properties of the index.
Here is the method to do the same;

int addIndex(const char *idxName, table_env *tenv);
The above is the generic method to add an index; however, for simplicity two helper methods are also provided: one to index a string key and the other a numeric key. We can use these two assuming all the default settings are acceptable, or we can use the generic one which gives us more flexibility.
The helper ones are the following;

//always creates index with the following properties: BTREE, INMEM_PERSIST, QUASI_LEXICOGRAPH, SORT_ASCENDING, log = off
int addIndex_str(const char *idxName, int idx_size, bool allowDuplicates);
int addIndex_num(const char *idxName, bool allowDuplicates);
Now, let's create an index for a string field "name" as follows;
tbl->addIndex_str("name", 32, true);
We just created an index on the field "name", with max size 32, and allowed duplicate entries for it.
Now, when we put a document into the table, the db will index the name field as well;
const char *doc = "{\"name\":\"sachin\", \"org\":\"bangdb\", \"address\":{\"city\":\"Bangalore\",\"state\":\"KA\"}, \"fav_quote\":\"We build too many walls and not enough bridges\"}";
tbl->put_doc(doc);
We should check the error returned by the put_doc API; it is ignored here for brevity.
Here, we did not provide an explicit primary key, therefore the db will add a "_pk" field to the doc and use it as the primary key. This "_pk" is a timestamp. Also, we didn't specify whether we wanted a reverse index to be created or not; we will see this later.
Now let's scan using the index;
const char *query = "{\"query\":[{\"key\":\"name\", \"cmp_op\":4, \"val\":\"sachin\"}]}";
// this query is to find all docs where name = "sachin", 4 is EQ enum value for scan_operator enum.
// we can also create this using dataQuery as follows;
/*
dataQuery dq;
dq.addFilter("name", EQ, "sachin", JO_INVALID);
const char *query = dq.getQuery();
We can keep adding like this to create complex query
*/
resultset *rs = NULL;
scan_filter sf;
while(true)
{
    rs = tbl->scan_doc(rs, NULL, NULL, query, &sf);
    if(!rs)
        break;
    while(rs->hasNext())
    {
        // keys and vals
        rs->moveNext();
    }
}
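For illustration, the filter string above can also be assembled with plain string handling. Here is a self-contained sketch (cmp_op 4 is EQ, as noted above; this helper is hypothetical, not part of the BangDB API):

```cpp
#include <string>

// Build a single equality filter in the query JSON shape used above,
// e.g. {"query":[{"key":"name", "cmp_op":4, "val":"sachin"}]}
// cmp_op 4 is the EQ value of the scan_operator enum
std::string make_eq_query(const std::string &key, const std::string &val) {
    return "{\"query\":[{\"key\":\"" + key +
           "\", \"cmp_op\":4, \"val\":\"" + val + "\"}]}";
}
```

In real code, the dataQuery builder shown above is the more convenient way to produce such strings.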

Nested Key

Taking the same document as in the previous section, let's say we now wish to index city as well so that we can search people by city. To do this we create a nested index like this;

tbl->addIndex_str("address.city", 40, true);
And now we can search using scan_doc again, for both name and city, as follows. Note that joinop is 0, which means AND; see the join_operator enum in the enum section.
Therefore, the query to scan the table where name is "sachin" and city is "Bangalore" looks like the following;
const char *query = "{\"query\":[{\"key\":\"name\", \"cmp_op\":4, \"val\":\"sachin\"},{\"joinop\":0},{\"key\":\"address.city\", \"cmp_op\":4, \"val\":\"Bangalore\"}]}";

resultset *rs = NULL;
scan_filter sf;
while(true)
{
    rs = tbl->scan_doc(rs, NULL, NULL, query, &sf);
    if(!rs)
        break;
    while(rs->hasNext())
    {
        // keys and vals
        rs->moveNext();
    }
}
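The AND query above can likewise be sketched with a hypothetical string-building helper (joinop 0 is AND, per the join_operator enum; this is illustrative only, not a BangDB API):

```cpp
#include <string>

// Compose two equality filters joined with AND (joinop 0), in the
// query JSON shape used above
std::string make_and_eq_query(const std::string &k1, const std::string &v1,
                              const std::string &k2, const std::string &v2) {
    return "{\"query\":[{\"key\":\"" + k1 + "\", \"cmp_op\":4, \"val\":\"" + v1 +
           "\"},{\"joinop\":0},{\"key\":\"" + k2 + "\", \"cmp_op\":4, \"val\":\"" +
           v2 + "\"}]}";
}
```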

Composite Key

A composite index, again, can only be created for the WIDE_TABLE type. To create a composite index on the primary key, we need to create the table with the following settings;

table_env te;
te.set_table_type(WIDE_TABLE);
te.set_key_type(COMPOSITE_KEY);
bangdb_table *tbl = bdb->getTable((char*)"topview", &te);
There are several helper functions to create a composite key before putting the data into the table. Here the data could be a json doc, normal text, or opaque data.
Here is an example of putting data into the table;
	char *k, *v;
	FDT fk, fv;

	k = makeComposite((char*)"1", (char*)"a", NULL);
	v = (char*)"val13";
	fk.data = k;
	fk.length = strlen(k);
	fv.data = v;
	fv.length = strlen(v);
	tbl->put(&fk, &fv, INSERT_UNIQUE);
	fk.free();

        // more data insert

	k = makeComposite((char*)"2", (char*)"a", NULL);
	v = (char*)"val9";
	fk.data = k;
	fk.length = strlen(k);
	fv.data = v;
	fv.length = strlen(v);
	tbl->put(&fk, &fv, INSERT_UNIQUE);
	fk.free();

	k = makeComposite((char*)"1", (char*)"b", NULL);
	v = (char*)"val7";
	fk.data = k;
	fk.length = strlen(k);
	fv.data = v;
	fv.length = strlen(v);
	tbl->put(&fk, &fv, INSERT_UNIQUE);
	fk.free();
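The composite keys used above ("1:a", "2:a", "1:b") and the scan patterns below (such as "1:*") suggest that the parts are joined with a ':' separator. Here is a hypothetical stand-in for makeComposite(), for illustration only (the real helper ships with the BangDB SDK and returns a char* buffer):

```cpp
#include <string>
#include <vector>

// Join composite-key parts with ':' -- e.g. {"1", "a"} -> "1:a"
std::string make_composite_sketch(const std::vector<std::string> &parts) {
    std::string out;
    for (size_t i = 0; i < parts.size(); ++i) {
        if (i) out += ':';
        out += parts[i];
    }
    return out;
}
```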
Now, let's fire some query;
        resultset *rs = NULL;
        // to scan all data
        while(true)
        {
               rs = tbl->scan_doc(rs);
               if(!rs)
                     break;
               while(rs->hasNext())
               {
                       // use rs key and val
                       rs->moveNext();
               }
        }

        rs = NULL;
        char *tval1, *tval2;
        FDT skey, ekey;
        scan_filter sf;
        sf.skey_op = GTE;
        sf.ekey_op = LTE;

        // to match all keys whose first part is "1"
        tval1 = (char*)"1:*";
        tval2 = (char*)"1:*";

        skey.data = tval1;
        skey.length = strlen(tval1);
        ekey.data = tval2;
        ekey.length = strlen(tval2);

        while(true)
        {
               rs = tbl->scan_doc(rs, &skey, &ekey);
               if(!rs)
                     break;
               while(rs->hasNext())
               {
                       // use rs key and val
                       rs->moveNext();
               }
        }
We can also match on the second part of the key using the % character, like the following;
        tval1 = (char*)"1:b$%";
        tval2 = (char*)"3:b";

        // to match all data from 1:b... until 3:b
        skey.data = tval1;
        skey.length = strlen(tval1);
        ekey.data = tval2;
        ekey.length = strlen(tval2);

        while(true)
        {
               rs = tbl->scan_doc(rs, &skey, &ekey);
               if(!rs)
                     break;
               while(rs->hasNext())
               {
                       // use rs key and val
                       rs->moveNext();
               }
        }
Similarly;
        tval1 = (char*)"*:a";
        tval2 = (char*)"*:a";

        // to match all keys whose second part is "a"
        skey.data = tval1;
        skey.length = strlen(tval1);
        ekey.data = tval2;
        ekey.length = strlen(tval2);

        while(true)
        {
               rs = tbl->scan_doc(rs, &skey, &ekey);
               if(!rs)
                     break;
               while(rs->hasNext())
               {
                       // use rs key and val
                       rs->moveNext();
               }
        }

Reverse index

Reverse indexing can also be done only with the WIDE_TABLE type. Therefore, to allow reverse indexing in a table, we need to create or open the table like the following;

	table_env tenv;
	tenv.set_table_type(WIDE_TABLE);
	tenv.set_rev_idx(true);
	bangdb_table *tbl = bdb->getTable("mywidetable", &tenv);
Now, taking the same document as above, let's say we would like to reverse index the content of the field "fav_quote". Then we add the following during put;
const char *rev_idx = "{\"_rev_idx_all\":0, \"_rev_idx_key_list\":[\"fav_quote\"]}";

// note that if _rev_idx_all is set to 1 then the db will reverse index all the fields in the doc; _rev_idx_key_list is not required in that case

tbl->put_doc(doc, NULL, rev_idx, INSERT_UNIQUE);
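The reverse-index spec is just a small JSON object; here is a hypothetical helper (not a BangDB API) that assembles it in the shape shown above:

```cpp
#include <string>
#include <vector>

// Build the reverse-index spec: either index every field
// (_rev_idx_all = 1) or only the listed fields
std::string make_rev_idx_spec(bool all, const std::vector<std::string> &fields) {
    std::string out = "{\"_rev_idx_all\":";
    out += all ? "1" : "0";
    if (!all) {
        out += ", \"_rev_idx_key_list\":[";
        for (size_t i = 0; i < fields.size(); ++i) {
            if (i) out += ",";
            out += "\"" + fields[i] + "\"";
        }
        out += "]";
    }
    out += "}";
    return out;
}
```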
Now we scan with a query like the one above, but add a reverse-index condition;
const char *query = "{\"query\":[{\"key\":\"name\", \"cmp_op\":4, \"val\":\"sachin\"},{\"joinop\":0},{\"key\":\"address.city\", \"cmp_op\":4, \"val\":\"Bangalore\"},{\"joinop\":0},{\"match_words\":\"wall, bridge\", \"joinop\":0, \"field\":\"fav_quote\"}]}"; // now call scan the same way to get the data
Please note, we can use a primary key filter everywhere by passing the keys during scan.
We can also put normal text (not a json doc) and scan it using put_text() and scan_text().
In the case of put_text(), the db will reverse index the input text. For scanning, simply create a list of all the tokens to search for in a char*[] and then call scan_text().
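The tokenization BangDB applies internally is not described here, so the following is only an illustrative sketch of preparing word tokens from free text before handing them to scan_text():

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split free text into lowercase word tokens; an assumed, simplistic
// scheme (alphanumeric runs only) purely for illustration
std::vector<std::string> tokenize_sketch(const std::string &text) {
    std::vector<std::string> tokens;
    std::string cur;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            cur += static_cast<char>(std::tolower(c));
        } else if (!cur.empty()) {
            tokens.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) tokens.push_back(cur);
    return tokens;
}
```

The resulting strings can then be copied into a char*[] for the scan_text() call.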