Graph & Cypher in BangDB - BangDB = NoSQL + AI + Stream

Graph & Cypher in BangDB


Graph and Cypher in BangDB

1. Introduction

Data in a graph table for BangDB is defined as triples. A triple contains subject, object and relationship (predicate) between them. All data is stored as triple within the DB. BangDB does clever arrangements and housekeeping to store the data such that various queries can be written and run efficiently.

The structure of the query is very similar to “Cypher”. BangDB uses Cypher-like queries to process the data. The structure looks like following.

CREATE () -[]-> () - for creating node or triple S=>() -[]-> () - for querying data
The ‘()’ denotes subject or object and ‘[]’ denotes relation (predicate) with ‘->’ defining the direction. The arrangement is always “subject Predicate Object”.
The node has a label associated with it. Every node is written as “label:name”.
There are basically following keywords associated with all the queries.
CREATE - to create a single node, or triple RETURN - selecting attributes for any query AS - selecting columns/attributes with alias COUNT - counting all using COUNT(*) or COUNT(A.col) AVG - average of any attribute MIN - min value MAX - max value STD - standard deviation SUM - sum EXKURT - ex-kurtosis SKEW - skewness
Data is processed from left to right. There could be several triples chained to form a query, like.
S1=>() -[]-> () -[]-> () …
Here in the above example, the first triple will intermediate-output a set of results, these intermediate-output will become input from subsequent processing etc. Therefore, it will keep evaluating from left to right using the intermediate results. The subject for subsequent chained query will be the intermediate result of the previous triple and so on.

In some cases, we would like to keep object of the triple as subject for the subsequent triple, then we can use the structure like following.
S2=>[S1=>() -[]-> ()] -[]-> () …
We will see the examples for these in subsequent sections

We will use bangdb cli to perform these exercises. But before we go there, let’s see how BangDB Cypher is different from the original Cypher

2. Difference between Cypher used by BangDB and Cypher used by Neo4j

Following are basic differences.

A. For all queries (retrieval), the query starts with “=>”, instead of “Match” B. The queries could be nested using the “[]” bracket which becomes subject for subsequent nodes. For example, look at following two queries. S2=>[S1=>(Person:*)-[ACTED_IN]->(Movie:*)]-[LIVES_IN]->(City:Bangalore) S1=>(Person:*)-[ACTED_IN]->(Movie:*)-[DIRECTED_BY]->(Person:*)-[LIVES_IN]->(City:SFO) First one will first fetch all the persons who have acted in any movies, and who lives in city Bangalore Second one will fetch all the persons who live in “SFO” who have directed movies in which some persons have acted in etc… C. Specific methods like; #ALL_PATH, #SHORT_PATH, #ALL_NODE is supported to find all paths between any two nodes, shortest path between two nodes and all nodes in between two nodes respectively D. Here we use “label:name” instead of “name”label” as in Cypher. This is mainly because it’s more intuitive to use labels first and then name. If we wish to take any name, we can put ‘*’ in place of it. In-fact we can put ‘*’ for label or relation as well E. The variable or alias begins with’@’, like (@p Person:Sachin) etc.

3. Creating graph table

We must create and use a graph table before storing and querying data. There could be different tables that we can create within a db to keep different kinds of network data as we like USE GRAPH For example, to create or use table my_graph_table

bangdb> CREATE GRAPH my_graph_table

4. Creating data

‘CREATE’ keyword should be used to create data. Let’s create movie Matrix (single node)

bangdb> CREATE (Movie:Matrix { "title":"The Matrix","released":1997 }) { "msg" : [ "success" ], "errcode" : 0 }
Let’s create triple here, Tom Hanks and Forrest Gump nodes with relation ACTED_IN as predicate
bangdb> CREATE (Person:"Tom Hanks" { "born":1956 })-[ACTED_IN { "roles": ["Forrest"]}]->(Movie:"Forrest Gump" { "released":1994 }) { "msg" : [ "success" ], "errcode" : 0 }
bangdb> CREATE ("Person:Robert Zemeckis" { "born":1951 })-[DIRECTED]->("Movie:Forrest Gump") { "msg" : [ "success" ], "errcode" : 0 }

5. Matching patterns (querying data)

In BangDB we follow a similar to Cypher format for matching data or preferably querying data. However, BangDB doesn’t use ‘MATCH’ keyword/clause for the same instead it uses something like => before the query For ex; S=> To find the data we’ve created so far, we can start looking for all nodes labelled with the Movie label.

bangdb> S=>(Movie:*) { "num_items" : 2, "nodes" : [ "{\"released\":1994,\"label\":\"Movie\",\"name\":\"Forrest Gump\",\"_pk\":\"889565392:607499282203478465\",\"_v\":1}", "{\"title\":\"The Matrix\",\"released\":1997,\"label\":\"Movie\",\"name\":\"Matrix\",\"_pk\":\"889565392:7731255011213178614\",\"_v\":1}" ] }
Note: the _pk and _v is added by the db for its housekeeping reasons

We can also look for a specific Movie, like “Forrest Gump”
bangdb> S=>(@p "Person:Tom Hanks")-[@r ACTED_IN]->(@m Movie:*); RETURN AS Title, r.roles AS Role +---------------+------------------+ | Role | Title | +---------------+------------------+ | ["Forrest"] | Forrest Gump | +---------------+------------------+ We could also see in json format by setting output as JSON { "rows" : [ { "Title" : "Forrest Gump", "Role" : "[\"Forrest\"]" } ] }
Let’s add another film where Tom Hanks acted in. Since Tom Hanks already exists hence, we may just use the name
bangdb> CREATE (Person:"Tom Hanks")-[ACTED_IN {"roles":["Zachry"]}]->(Movie:"Cloud Atlas" {"released":2012}) { "errcode" : 0, "msg" : [ "success" ] }
Now it’s a good time to create another graph and do following.

CREATE (Movie:matrix {"title":"The Matrix", "released":1997}) CREATE (Movie:cloudAtlas {"title":"Cloud Atlas", "released":2012}) CREATE (Movie:forrestGump {"title":"Forrest Gump", "released":1994}) CREATE (Person:keanu {"fullname":"Keanu Reeves", "born":1964}) CREATE (Person:robert {"fullname":"Robert Zemeckis", "born":1951}) CREATE (Person:tom {"fullname":"Tom Hanks", "born":1956}) CREATE (Person:tom)-[ACTED_IN {"roles":["Forrest"]}]->(Movie:forrestGump) CREATE (Person:tom)-[ACTED_IN {"roles":["Zachry"]}]->(Movie:cloudAtlas) CREATE (Person:robert)-[DIRECTED]->(Movie:forrestGump)

6. Filtering results

Select all the people/ persons

bangdb> S=>(@p Person:*);RETURN AS name, p.fullname AS FullName +---------- +-----------------------+ |name | FullName | +----------+------------------------+ |tom | Tom Hanks | +----------+------------------------+ |keanu | Keanu Reeves | +----------+------------------------+ |robert | Robert Zemeckis | +----------+------------------------+ bangdb> S=>(@p Person:*); RETURN COUNT(*) AS People { "count" : 3 }
Select movie title “The Matrix”
bangdb> S=>(Movie:* {title="The Matrix"}) { "num_items" : 1, "nodes" : [ "{\"title\":\"The Matrix\",\"released\":1997,\"label\":\"Movie\",\"name\":\"matrix\",\"_pk\":\"889565392:7007891359330926487\",\"_v\":1}" ] }
BangDB allows conditions to be also defined for filtering data, such that movie released > 2000
bangdb> S=>(@p Person:*)-[@r ACTED_IN]->(@m Movie:* {released> 2000}); RETURN p.fullname AS FullName, r.roles AS Roles, m.title AS Title +-------------+----------------+----------------+ | FullName | Roles | Title | +-------------+----------------+----------------+ | Tom Hanks | ["Zachry"] | Cloud Atlas | +-------------+----------------+----------------+ Or in json { "rows" : [ { "Title" : "Cloud Atlas", "Roles" : "[\"Zachry\"]", "FullName" : "Tom Hanks" } ] }

7. Aggregation

Aggregation in Cypher (for BangDB) just works. Users select aggregate columns along with non-aggregated columns. The non-aggregated columns become groupby keys.

To find out how often an actor and director worked together, run the following

bangdb> S=>(@p Person:*)-[@d DIRECTED]->(@m Movie:*)<-[@a ACTED_IN]-(@p2 Person:*); RETURN AS actor, AS director, COUNT(*) AS collabs +-------+-----------+---------+ | actor | director | collabs | +-------+-----------+---------+ | tom | robert | 1 | +-------+-----------+---------+
To find how many times actors have had appearances
bangdb> S=>(@p Person:*)-[@a ACTED_IN]->(@m Movie:*); RETURN p.fullname AS actor, COUNT(*) AS appearances +---------------+------------------+ | actor | appearances | +---------------+------------------+ | Tom Hanks | 2 | +---------------+------------------+

8. Shortest path

One of the most important tasks with graph is to find if there is a connection between two nodes and if there are more than one then finds the shortest path BangDB supports this with minor modifications in the Cypher construct (but to simplify only) and use this for finding the shortest path, here is how it goes

Let’s add a triple to have some multi-level connection. Let’s add a triple for movie “forrestGump” winning “oscar”

bangdb> CREATE (Movie:forrestGump)-[WON_AWARD]->(Award:oscar) { "msg" : [ "success" ], "errcode" : 0 }
Now, let’s find the shortest path between Tom Hanks and Oscar
bangdb> S=>(Person:tom)-[#SHORT_PATH *]->(Award:oscar) { "nodes" : [ { "label" : "Person", "name" : "tom", "_pk" : "2471870506:978343320853564008" }, { "_pk" : "889565392:3325115057284420805", "name" : "forrestGump", "label" : "Movie" }, { "label" : "Award", "name" : "oscar", "_pk" : "3098302716:362283378203006142" } ], "path_count" : 1, "edges" : [ { "rel" : "ACTED_IN", "_to_node" : "889565392:3325115057284420805", "_from_node" : "2471870506:978343320853564008" }, { "_to_node" : "3098302716:362283378203006142", "_from_node" : "889565392:3325115057284420805", "rel" : "WON_AWARD" } ] }
As you see, the returned result describes the path. If there was more than one path (more than one movie of Tom Hanks winning an Oscar, then it would have returned the shortest path one. TO list all such paths between two nodes, we can use #ALL_PATH as shown in next section

9. All paths

bangdb> S=>(Person:tom)-[#ALL_PATH *]->(Award:oscar) { "paths" : [ { "nodes" : [ { "label" : "Person", "_pk" : "2471870506:978343320853564008", "name" : "tom" }, { "_pk" : "889565392:3325115057284420805", "label" : "Movie", "name" : "forrestGump" }, { "label" : "Award", "_pk" : "3098302716:362283378203006142", "name" : "oscar" } ], "edges" : [ { "_from_node" : "2471870506:978343320853564008", "_to_node" : "889565392:3325115057284420805", "rel" : "ACTED_IN" }, { "_from_node" : "889565392:3325115057284420805", "rel" : "WON_AWARD", "_to_node" : "3098302716:362283378203006142" } ] } ], "path_count" : 1 }

10. Query for no direct relation

First create a graph table

bangdb> CREATE GRAPH g
Let’s create following nodes and triples
CREATE (Person:john {"firstname":"John"}) CREATE (Person:joe {"firstname":"Joe"}) CREATE (Person:steve {"firstname":"Steve"}) CREATE (Person:sara {"firstname":"Sara"}) CREATE (Person:maria {"firstname":"Maria"}) CREATE (Person:john)-[FRIEND]->(Person:joe)-[:FRIEND]->(Person:steve) CREATE (Person:john)-[FRIEND]->(Person:sara)-[:FRIEND]->(Person:maria)
query which finds a user called 'John' and 'John’s' friends (though not his direct friends) before returning both 'John' and any friends-of-friends that are found.
bangdb> S=>(@p Person:john)-[@f FRIEND]->(@p2 Person:*)-[@f2 FRIEND]->(@p3 Person:*); RETURN AS self, AS fof +------+--------+ | self | fof | +------+--------+ | john | steve | +------+--------+ | john | maria | +------+--------+

11. Query with pattern in the value/field

Next up we will add filtering to set more parts in motion: We take a list of user names and find all nodes with names from this list, match their friends and return only those followed users who have a 'name' property starting with 'S'.

bangdb> S=>(@p Person:*)-[@f1 FRIEND]->(@p2 Person:* {name = "s$%"});RETURN, +--------+--------+ | name | name | +--------+--------+ | steve | steve | +--------+--------+ | sara | sara | +--------+--------+

12. Graph in Stream

BangDB streams can also use Graph for the events which are time-series data. To use Graph in the stream, we may simply edit the schema and add set of relations as needed.
Example; Let’s say we have a schema where we receive purchase events for e-commerce and we would like to link user and the product he/she purchases, then we would simply add following;

Here is the short schema definition for example;
{ "schema":"grpahschema", "streams":[ { "name":"order","type":1, "swsz":86400, "inpt":[], "attr":[ {"name":"product","type":"STRING","sidx":1,"kysz":32,"stat":"UCOUNT"}, {"name":"user","type":"STRING","sidx":1,"stat":"UCOUNT"} ], "rels":[{"sub":"user","obj":"product","rel":"BUYS"}] } ] }
that’s it, where the “user” and “product” are attributes in the Order stream. Once defined, db would simply keep creating the relations (and creating nodes as required).

To query, we can use Cypher as described above using usual methods.