
Geospatial Index


Geoindex queries for tables and streams

BangDB allows a geospatial index to be created on a table or on a stream. Once the index is created, the user can query the data using it, and can augment the query with other constraints as required. For simplicity, it uses a query structure similar to the one used for most other query types.
Therefore we can run queries such as:

1. Find all entities within a distance of X meters from a point p (lat, lon)
2. Find all entities within a box defined by its bottom-left and top-right corners
3. Find all restaurants named "Starbucks coffee" within 1000 meters from my current location
4. Find all book stores near a coffee shop within 2000 meters from my current location, etc.
To implement the geospatial index, BangDB takes the (lat, lon) pair, converts it into a geohash (a string), and stores the geohash in sorted order using a B+tree. Therefore it aligns well with the normal indexing model internally.
There are two ways to work with the geoindex: an explicit way for tables and an implicit way for streams.

GeoIndex for table

Let's say we have a table which stores all the shops in a given city, and we assign a unique (lat, lon) to every shop. In order to run queries based on the geoindex, we first need to create the index. To do so, we use the TableEnv type and set up the following:
```cpp
TableEnv tenv;
tenv.setKeyType(COMPOSITE_KEY);
tenv.setSortMethod(LEXICOGRAPH);
tenv.setAllowDuplicate(true);   // if we wish to allow duplicate index entries

// now create the index on the table; "geohash" is simply the field name
// that will contain the geohash of the (lat, lon)
tbl.addIndex("geohash", &tenv);
```
That's it.

GeoIndex for stream

Here it's even simpler. Let's say a given stream has "lat" and "lon" attributes which contain the latitude and longitude for a given location, and data/events are flowing into the db. We now need the geohash to be computed and indexed for the events in the incoming stream. To do so, we add a computed attribute ("catr") to the stream schema:

{"name":"geohash", "type":5, "sidx":1, "opnm":"GEOHASH", "iatr":["lat", "lon"]}

This allows the db to compute the geohash and index it properly for every single event (record) coming into the stream. See catr for more info.
Now, to query the data, we can use the following JSON structures to set up the geospatial filter:

{"location":{"lat":12.8282,"lon":77.5454},"index":"geohash","distance":1000}

or, using neighbors:

{"location":{"lat":12.8282,"lon":77.5454},"index":"geohash","distance":1000,"neighbors":1}

or, for a bounding box:

{"box":{"left_bot":{"lat":12.8282,"lon":77.5454},"right_top":{"lat":12.9282,"lon":77.9454}},"index":"geohash"}

We can also add "precision" [ range 1 to 12 ], which forces the db to use that precision, and "neighbors":1, which forces the db to use neighboring cells for better accuracy.
This is as simple as that.

To add further filters, we can use dataQuery to add them and then use the "addGeoQuery" function to attach the JSON structure above. The same scan APIs are then used to run the query.
In the same way, we can also query a stream using the scan API.

Query from CLI

Querying from the CLI is rather simple; look at the examples below.

```
// query a stream
select * from schema1.stream1 where shopname = "starbucks coffee" and area = "MG Road" and geofilter {"location":{"lat":12.2345,"lon":77.2123},"index":"geohash","distance":1000}

// or, to simply count, use count(*)
select count(*) from schema1.stream1 ...

// to query a table, we use exactly the same format, except we use the table name instead of schema.stream
select * from mytable where shopname = "starbucks coffee" and area = "MG Road" and geofilter {"location":{"lat":12.2345,"lon":77.2123},"index":"geohash","distance":1000}
```