Index Concept

BangDB lets the user create an index on a property, or on a set of properties, and use it later at query time. An index helps the retrieval process because it stores a reference to the actual data. The benefits of indexing selected properties are well established.

Indexes do, however, add overhead, and the challenge is to manage that overhead while still using resources optimally during persistence and retrieval. BangDB addresses this in several ways in its index implementation. For example, indexes on different data types are handled differently internally, which lets the db perform various operations much more efficiently and avoids many writes and reads at different places.

Index Implementation Overview

BangDB creates a separate file for each index (with the .idx extension), and the file is named after the index. The index file keeps the key/field that is indexed and the offset to the actual data. The concept of a BangDB index is therefore similar to that of other database systems.
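As a rough illustration of the key-to-offset idea (this is not BangDB's actual file format), the sketch below keeps all records in one buffer and a sorted map from key to the record's offset and length. ToyStore, IndexEntry, put and get are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical index entry: where a record lives in the data buffer.
struct IndexEntry {
    std::size_t offset;  // start of the record in the data buffer
    std::size_t length;  // number of bytes the record occupies
};

// Hypothetical in-memory stand-in for a data file plus its index file.
struct ToyStore {
    std::string data;                         // stands in for the data file
    std::map<std::string, IndexEntry> index;  // stands in for the index, sorted by key

    // Append the value to the data buffer and record where it landed.
    void put(const std::string& key, const std::string& value) {
        index[key] = IndexEntry{data.size(), value.size()};
        data += value;
    }

    // Find the offset through the index, then read the record from the buffer.
    bool get(const std::string& key, std::string& out) const {
        auto it = index.find(key);
        if (it == index.end()) return false;
        assert(it->second.offset + it->second.length <= data.size());
        out = data.substr(it->second.offset, it->second.length);
        return true;
    }
};
```

The point of the sketch is that a lookup touches the small sorted index first and only then reads the record at the stored offset.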

When data is stored, BangDB always indexes the primary key, and this primary index is created implicitly. The user may create as many secondary indexes as required, but each one must be created explicitly with the appropriate configuration. The primary key can be indexed using Btree or Hash, whereas a secondary index can currently be created using Btree only.

Types which support Index

BangDB supports indexes and the operations around them for wideTable and wideConnection. Hence the user needs to create a wideTable and create all indexes on it, then get a wideConnection to the table and use operations such as put and scan.

Note that wideTable and wideConnection are supersets of table and connection, so they support all the APIs and use cases that a normal table and connection support. In addition, they add support for indexes, for both unstructured and structured data.

Index on data

BangDB allows the user to add an index on both opaque and structured data. This comes in handy because we often wish to store data that is not structured, yet retrieve it using some key other than the primary key, one that is not explicit in the data. In this case the user needs to specify the index key and value with every put. The wideConnection allows the user to do that.
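The idea of supplying the index key alongside an opaque value can be sketched with standard containers. This is only a model of the concept, not the wideConnection API: OpaqueStore, put and findByIndex are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical store for values the database cannot parse.
struct OpaqueStore {
    std::map<std::string, std::string> primary;         // primary key -> opaque value
    std::multimap<std::string, std::string> secondary;  // index key -> primary key

    // The caller passes the index key explicitly, because it cannot be
    // extracted from the opaque value.
    void put(const std::string& pk, const std::string& value, const std::string& idxKey) {
        primary[pk] = value;
        secondary.emplace(idxKey, pk);
    }

    // Return every value stored under the given index key.
    std::vector<std::string> findByIndex(const std::string& idxKey) const {
        std::vector<std::string> out;
        auto range = secondary.equal_range(idxKey);
        for (auto it = range.first; it != range.second; ++it) {
            auto rec = primary.find(it->second);
            assert(rec != primary.end());  // every index entry points at a stored row
            out.push_back(rec->second);
        }
        return out;
    }
};
```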

For a structured document such as json (the only format currently supported), the user only has to create an index on a particular field in the document. With every put, BangDB then adds the appropriate index entries implicitly.

Sorting

BangDB uses Btree for storing the index. This means that the user can scan over the index to select data. Hash based indexes will also be supported in a coming release.

The user can specify whether the sort order is increasing or decreasing. A scan, however, returns a resultset object that can be traversed in either direction.
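A range scan over a sorted index, with a result that can be walked forwards or backwards, can be sketched as follows. The sketch uses std::multimap as a stand-in for the Btree and sorts in increasing order only. Index, ResultSet and scanRange are hypothetical names and do not reflect BangDB's actual resultset interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Index = std::multimap<std::string, std::string>;  // index key -> primary key

// Hypothetical result set with a cursor that can move in both directions.
struct ResultSet {
    std::vector<std::string> rows;
    std::size_t pos = 0;
    bool hasNext() const { return pos < rows.size(); }
    bool hasPrev() const { return pos > 0; }
    const std::string& next() { assert(hasNext()); return rows[pos++]; }
    const std::string& prev() { assert(hasPrev()); return rows[--pos]; }
};

// Collect the primary keys whose index key lies in [startKey, endKey].
ResultSet scanRange(const Index& idx, const std::string& startKey, const std::string& endKey) {
    ResultSet rs;
    if (endKey < startKey) return rs;  // empty range
    for (auto it = idx.lower_bound(startKey), last = idx.upper_bound(endKey); it != last; ++it)
        rs.rows.push_back(it->second);
    return rs;
}
```

Because the index is kept sorted, the scan only visits the entries between the two bounds rather than the whole table.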

Nested Index

For json data, the user can add an index on nested fields as well. For example:

	 const char *jstr = "{\"fname\":\"Alan\", \"lname\":\"Turing\", \"life\":{\"Nationality\":\"British\", \"education\":{\"school\":\"sherborne school\", \"university\":\"cambridge\"},"
	 		 "\"born\":\"23 june 1912\", \"died\":\"7 june 1954\"}}";
	 
	 //In this case we can add an index for life.education.university. Note that the individual fields are separated by '.'.
	 
	 wtbl->addindex_str("life.education.university", 32, true);
	 
	 //32 is the fixed size of the value held in the index and 'true' means duplicates are allowed.
	 

When searching, we can now scan using this index and specify the values accordingly to get the appropriate resultset.
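How a dotted path such as life.education.university might be resolved to a value and entered into an index can be sketched as below. This is an assumption-laden model: Node is a hypothetical hand-built document tree (not a json parser), and resolve and indexField are illustrative helpers, not BangDB functions.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical document node: a leaf carries a value, an object carries children.
struct Node {
    std::string name;
    std::string value;
    std::vector<Node> children;
};

// Walk the tree one path segment at a time; segments are separated by '.'.
const Node* resolve(const Node& root, const std::string& path) {
    const Node* cur = &root;
    std::istringstream parts(path);
    std::string part;
    while (std::getline(parts, part, '.')) {
        const Node* next = nullptr;
        for (const Node& child : cur->children)
            if (child.name == part) { next = &child; break; }
        if (next == nullptr) return nullptr;  // no such field
        cur = next;
    }
    return cur;
}

// Add an index entry (field value -> primary key) if the path names a leaf.
bool indexField(std::multimap<std::string, std::string>& idx, const Node& doc,
                const std::string& path, const std::string& pk) {
    assert(!path.empty());
    const Node* n = resolve(doc, path);
    if (n == nullptr || !n->children.empty()) return false;
    idx.emplace(n->value, pk);
    return true;
}
```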

Composite Keys

BangDB allows the user to create a composite key from two or more keys (fields in a json doc) and store data against it. This helps in scenarios where the composite key must be unique even though the individual keys are not. The db creates an index on the composite key and allows users to search accordingly. Note that the user needs to set the key_type explicitly to tell the db that composite keys will be used for the table. For example:

	  table_env tenv;
		...
	  tenv.set_key_type(COMPOSITE_KEY);

	  table *tbl = db->gettable(tableName, OPENCREATE, &tenv);

	  //BangDB will use all the parts of the composite key while doing the search so that right results are returned.
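The uniqueness property of a composite key can be illustrated independently of BangDB with a std::map keyed on a pair. CompositeKey and CompositeTable are hypothetical names. The sketch only shows that the combination of parts is unique while each part may repeat.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using CompositeKey = std::pair<std::string, std::string>;  // e.g. (department, name)

// Hypothetical table whose rows are unique per composite key.
struct CompositeTable {
    std::map<CompositeKey, std::string> rows;

    // Returns false when the full composite key already exists.
    bool put(const std::string& part1, const std::string& part2, const std::string& value) {
        assert(!part1.empty() && !part2.empty());  // both parts must be supplied
        return rows.emplace(CompositeKey{part1, part2}, value).second;
    }

    // A lookup compares every part of the key, not just the first.
    const std::string* get(const std::string& part1, const std::string& part2) const {
        auto it = rows.find(CompositeKey{part1, part2});
        return it == rows.end() ? nullptr : &it->second;
    }
};
```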
	  
	  

Unique or Non-unique key

BangDB allows duplicate keys to be stored, for both primary and secondary keys. For a table, the user may set the allow_duplicate property on table_env, or call setAllowDuplicate() after the table is created or opened to enable or disable duplicates. Note that when duplicates are allowed, it is advisable to use scan to retrieve data, as there can be more than one row for a given key.
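Why a scan is the safer way to read when duplicates are allowed can be seen in a small model. DupTable and its members are hypothetical. Only the idea of an allow-duplicate switch comes from the text above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical table with a switch that permits or rejects duplicate keys.
struct DupTable {
    bool allowDuplicate = false;
    std::multimap<std::string, std::string> rows;

    // Rejects a repeated key unless duplicates are allowed.
    bool put(const std::string& key, const std::string& value) {
        if (!allowDuplicate && rows.count(key) > 0) return false;
        rows.emplace(key, value);
        return true;
    }

    // Returns every row stored under the key, not just the first one.
    std::vector<std::string> scan(const std::string& key) const {
        std::vector<std::string> out;
        auto range = rows.equal_range(key);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        assert(out.size() == rows.count(key));
        return out;
    }
};
```

A single-row get would return only one of the matching rows, which is why the scan form is used here.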

ResultSet Operations

Indexes help in retrieving data, but currently the user can scan a table using only one field at a time. For example, say we have created two indexes, one on "name" and the other on "department". Scanning first on name and then on department gives us two resultsets.

	 rs1 = conn->scan(name, skey1, ekey1, &sf);
	 
	 rs2 = conn->scan(department, skey2, ekey2, &sf);
	 
	 //Now we can do various operations on these result sets.

	 //Add two result sets (keys present in both sets are not duplicated)

	 rs1->add(rs2);

	 //Append (simply appends the sets, without checking for duplicates)

	 rs1->append(rs2);

	 //Intersect

	 rs1->intersect(rs2);

	 //This can be done on multiple resultsets as required
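The difference between the three operations can be sketched on plain lists of keys. This is a model of the semantics described in the comments above, not the resultset implementation: Keys, addSets, appendSets and intersectSets are hypothetical names, and unlike the calls above they return a new list instead of modifying the first one.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Keys = std::vector<std::string>;

// add: every key of a, followed by the keys of b that are not already present.
Keys addSets(const Keys& a, const Keys& b) {
    Keys out = a;
    std::set<std::string> seen(a.begin(), a.end());
    for (const auto& k : b)
        if (seen.insert(k).second) out.push_back(k);
    return out;
}

// append: plain concatenation, duplicates are kept.
Keys appendSets(const Keys& a, const Keys& b) {
    Keys out = a;
    out.insert(out.end(), b.begin(), b.end());
    assert(out.size() == a.size() + b.size());
    return out;
}

// intersect: the keys of a that also occur in b, in a's order.
Keys intersectSets(const Keys& a, const Keys& b) {
    std::set<std::string> inB(b.begin(), b.end());
    Keys out;
    for (const auto& k : a)
        if (inB.count(k) > 0) out.push_back(k);
    return out;
}
```

With a = {k1, k2, k3} and b = {k2, k3, k4}, add yields four keys, append yields six, and intersect yields k2 and k3.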