Group By

This as name suggests, gpby (groupby), which is computed continuously and is always ready to be consumed. Every processing with the db for stream is always updated for every single event and this enables us to deal with continuously running run time queries. This is super efficient, fast and clean way for continuous running analysis.

"gpby":[
   {
      "gpat":[
         "a",
         "b"
      ],
      "iatr":"c",
      "stat":1,
      "gran":3600,
      "swsz":86400,
      "kysz":48
   },
   {
      "gpat":[
         "a"
      ],
      "iatr":"d",
      "stat":2,
      "gran":300,
      "swsz":864000,
      "kysz":32
   }
]

Let's look at each one now at a time.

{"gpat":["a", "b"], "iatr":"c", "stat":1, "gran":3600, "swsz":86400, "kysz":48}

This tells the db to groupby (a, b) for input attribute c, set the keysize as 48 (this is to specify max length of the gpby name) for granularity "gran": 3600 with sliding window size of 1 day (86400 sec) and finally "stat" dictates what to do here.

For example: "stat" : 1 means count. This allows db to keep computing gpby for c by (a, b) every hour. so in typical sql language sense, it's select c from table1 groupby a, b.

The composite string is formed as follows; nts:attr1:attr2 Here attr1 and attr2 the attributes that appear in gpat array i.e. groupby attributes. The order is followed as it's defined in the gpat array. nts means nth time stamp tick since begining of epoch. It depends on gran value. Therefore, if gran is 60 sec, the nts is nth minute since epoch

For example: if we replace a with category id, b with page id and c with visitor id, then this gpby scheme tells "count num of visitors groupby category and page in one hour" etc.

{"gpat":["a"], "iatr":"d", "stat":2, "gran":300, "swsz":864000, "kysz":32}

This one is also similar except it tells do unique count (stat:2)

So for example, it could be unique visitor count every day etc.