Aggregation in Graph

The Graph in BangDB supports following aggregations.

UCOUNT : unique counting, this can be used for numerical or categorical/string attributes

AVG : average for numerical type

MIN : minimum for numerical type

MAX : maximum for numerical type

STD : standard deviation for numerical type

SUM : sum/total for numerical type

EXKURT : kurtosis for numerical type

SKEW : skewness for numerical type

But we often want to aggregate something group by something else. For example, average sales per region, max number of products bought per category, number of unique visitors per page of a website etc. There could also be more than one group by variables, for example, total number of phones purchased in a period of time group by model and color, etc.

The Cypher in BangDB supports these queries in elegant way. We just need to tell what aggregation we wish to do on a particular attribute and then select another attribute in the same query which will be used as group by automatically. Users don’t need to specifically define the group by attributes in the query.

Also, it’s important to note that we can select as many different aggregates as we wish and different set of groupby keys (non-aggregate keys in the RETURN statement) as we need. Now it’s time to look at few queries

Let’s create different graph and add few triples

CREATE GRAPH g2 USE GRAPH g2

CREATE (person:sachin)-[BUYS {"amount":211.45}]->(product:computer) CREATE (person:sachin)-[BUYS {"amount":123.75}]->(product:computer) CREATE (person:manu)-[BUYS {"amount":231.2}]->(product:hardware) CREATE (person:ramesh)-[BUYS {"amount":345}]->(product:grocery) CREATE (person:ramesh)-[BUYS {"amount":165.5}]->(product:electronics)

Let’s count different persons and number of times they bought any product

S=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer, COUNT(s.name) AS NUMT

+-------+--------+ |buyer |NUMT | +-------+--------+ |manu |1 | +-------+--------+ |ramesh |2 | +-------+--------+ |sachin |2 | +-------+--------+

As you see, BangDB used buyer as group by attribute and counted the number of times they purchased any products

Let’s count different persons and number of times they bought different unique products

S=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer, COUNT(s.name) AS NUMT, UCOUNT(s.name) AS unique_prods

+------+----+------------+ |buyer |NUMT|unique_prods| +------+----+------------+ |manu |1 | 1 | +------+----+------------+ |ramesh|2 | 2 | +------+----+------------+ |sachin|2 | 1 | +------+----+------------+

Here, we have two aggregates, with single group by.

Let’s now also see average sales values (price) for each person along with other values that we are retrieving

S=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer, COUNT(s.name) AS NUMT, UCOUNT(s.name) AS unique_prods, AVG(b.amount) AS avg_price

+------+----+------------+----------+ |buyer |NUMT|unique_prods|avg_price | +------+----+------------+----------+ |manu |1 | 1 |231.200000| +------+----+------------+----------+ |ramesh|2 | 2 |255.250000| +------+----+------------+----------+ |sachin|2 | 1 |167.600000| +------+----+------------+----------+

Now, let’s sort the result by avg_price and limit it to 2 rows only

S=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer, COUNT(s.name) AS NUMT, UCOUNT(s.name) AS unique_prods, AVG(b.amount) AS avg_price SORT_DESC avg_price LIMIT 2

+------+----+------------+----------+ |buyer |NUMT|unique_prods|avg_price | +------+----+------------+----------+ |ramesh|2 |2 |255.250000| +------+----+------------+----------+ |manu |1 |1 |231.200000| +------+----+------------+----------+

This is as simple as that. Let’s now see standard deviation, min, max and sum as well

S=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer, COUNT(s.name) AS NUMT, UCOUNT(s.name) AS unique_prods, AVG(b.amount) AS avg_sales, MIN(b.amount) AS MIN_Price MAX(b.amount) AS MAX_PRICE STD(b.amount) AS STD_DEV_AMOUNT SUM(b.amount) AS SUM_TOTAL

+------+----+------------+----------+----------+--------------+----------+----------+ |buyer |NUMT|unique_prods|avg_sales |SUM_TOTAL |STD_DEV_AMOUNT|MAX_PRICE |MIN_Price | +------+----+------------+----------+----------+--------------+----------+----------+ |manu |1 | 1 |231.200000|231.200000|0 |231.200000|231.200000| +------+----+------------+----------+----------+--------------+----------+----------+ |ramesh|2 | 2 |255.250000|510.500000|126.925667 |345 |165.500000| +------+----+------------+----------+----------+--------------+----------+----------+ |sachin|2 | 1 |167.600000|335.200000|62.013265 |211.450000|123.750000| +------+----+------------+----------+----------+--------------+----------+----------+

Now let’s say we wish to see unique results for buyer and product he/she bought

S=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer s.name AS product

+-----------+------+ |product |buyer | +-----------+------+ |hardware |manu | +-----------+------+ |computer |sachin| +-----------+------+ |computer |sachin| +-----------+------+ |grocery |ramesh| +-----------+------+ |electronics|ramesh| +-----------+------+

Here we see that the pair sachin and computer are repeating, which is true since sachin bought computer twice, but we wish to see unique pair and for this we will use UNIQUE keyword

S{UNIQUE}=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer s.name AS product

+-----------+------+ |product |buyer | +-----------+------+ |hardware |manu | +-----------+------+ |computer |sachin| +-----------+------+ |grocery |ramesh| +-----------+------+ |electronics|ramesh| +-----------+------+

It gives us now the desired results.

Now let’s list unique buyers.

S{UNIQUE}=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer

+------+ |buyer | +------+ |manu | +------+ |sachin| +------+ |ramesh| +------+ |ramesh| +------+

Here we were expecting only unique names, but we see “ramesh” repeated twice. This is because the “UNIQUE” keyword enforces uniqueness of (sub, obj) pair and ramesh bought two different products hence they are unique together.

In order to see unique list based on what we wish to RETURN then we should use “UNIQUE_IN_CONTEXT” since we want query to simply return unique values based on what we return (or select)

S{UNIQUE_IN_CONTEXT}=>(@p person:*)-[@b BUYS]->(@s product:*); RETURN p.name AS buyer

+------+ |buyer | +------+ |manu | +------+ |sachin| +------+ |ramesh| +------+