Published on Oct 28, 2021

Customer message analysis – predictive & streaming

Analyze customer messages and reviews in a real-time, streaming, and predictive manner for a better customer/user experience.

Read time: 5 min or less
Hands-on demo: you can run this use case on your machine in under 30 minutes; just follow the steps below


Resources

To implement the use case "customer message analysis in a predictive and streaming manner", you can use the following resources:

  • Code, files, and data
  • Video of the demo / use case
  • BangDB binaries

Scenario

Users and customers send messages or reviews from their devices, and many such messages stream into the system from different users. We must first be able to ingest these messages in real time. Further, we should be able to process every single message and take corrective action as needed.

The process includes the following steps:

  1. Set up the streams and sliding windows, and ingest data into these streams in a continuous manner
  2. Find the sentiment of each message [ positive, negative ] using IE (information extraction) (NOTE: we can extract as many different sentiments/emotions as we want; the demo deals with only two). We need to train a model for this
  3. Filter messages with negative sentiment and put them in a separate stream for further action/processing
  4. Find a definitive pattern and send events matching the pattern to another stream for further review/action. The pattern is as follows:
    1. any particular product gets a minimum of 3 consecutive negative-sentiment messages
    2. from different users within 1000 sec; find this pattern in a continuous, sliding manner
  5. Store a few triples in the graph store, such as (user, POSTS_REVIEWS, prod) and (prod, HAS_REVIEWS, revid), where revid is the review id and prod is the product
  6. Set running stats for different attributes in the event, such as unique count of users, or min/max/avg/std-dev/sum/kurtosis for the amount spent, etc.
  7. Set up a reverse index for messages so that they can be used for text search by users
  8. Set up secondary indexes for several attributes that could be helpful in queries and also in internal stream joins/filters, etc.
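Step 6 above can be maintained incrementally, without storing the full event history. As a rough illustration (this is not BangDB's internal implementation), here is a Python sketch using Welford's algorithm for the running mean and standard deviation, plus a set for the unique user count:

```python
import math

class RunningStats:
    """Incrementally track count, min, max, sum, mean, and std-dev
    over a stream of numeric values (Welford's algorithm)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                 # running sum of squared deviations
        self.min = float("inf")
        self.max = float("-inf")
        self.sum = 0.0

    def update(self, x):
        self.n += 1
        self.sum += x
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

# unique count of users seen so far, plus stats on amount spent
seen_users = set()
stats = RunningStats()
for event in [{"uid": "sal", "amount": 30.0},
              {"uid": "rose", "amount": 50.0},
              {"uid": "sal", "amount": 40.0}]:
    seen_users.add(event["uid"])
    stats.update(event["amount"])

print(len(seen_users), stats.min, stats.max, stats.mean)  # 2 30.0 50.0 40.0
```

Each update is O(1), which is what makes such stats viable on a high-velocity stream.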

Relevant application areas

  • eCommerce
  • Payment and banking
  • Ridesharing and cabs for hire, e.g., Uber, Ola
  • Home delivery entities (food, etc)

Complexities

There are several challenges here; some of them are:

  1. Volume and velocity. The number of messages could be very high, since several users may send messages every second across geographical areas. Hence real-time data ingestion is critical
  2. The messages could be in English or in other vernacular languages, hence we need to extract sentiment from unstructured data and keep improving or updating the models in real time
  3. Extracting patterns from the streaming set of events in a continuous manner requires CEP (complex event processing) on the streaming data, which is very hard to implement on SQL or regular NoSQL databases
  4. Storing certain triples (subject, object, predicate) in a graph that is continuously updated as events arrive helps link data and/or events
  5. Different database queries along with text search, which require many secondary and reverse indexes
  6. Infrastructure deployment and maintenance if too many silos are used. Further, automation is difficult to achieve in typical deployment models

Benefits of BangDB

  1. Use lightweight, high-performance BangDB agents or another messaging framework to stream data into BangDB. BangDB is a high-performance database with an ingestion speed of over 5K events per second per server, i.e. nearly half a billion events processed per commodity server per day
  2. Integrated stream processing within BangDB lets users start the process with a simple JSON schema definition. No extra silos need to be set up for streaming infrastructure
  3. Integrated AI within BangDB lets users train, deploy, and predict on incoming data without having to set up separate infra and then export data / import models, etc. The entire process can be automated within BangDB
  4. BangDB is a multi-model database; it also allows graph to be integrated with streams, such that the graph is updated with triples as events stream in
  5. BangDB supports many kinds of indexes, including reverse indexes, hence running rich queries along with searches on BangDB is quite simple
  6. Integrated with Grafana for visualization of time-series data

Overview of the solution

  1. We have a stream schema, ecomm_schema. Into these streams we will ingest data from various sources
  2. Ingestion happens as and when data is created. The agent monitors a set of files here, and as we write data into these files, the agent parses it and sends it to the BangDB server. We could also write data directly using the CLI, or using a program that uses the BangDB client, etc.
  3. We have 3 different data sources here:
    • product data – this is rather non-streaming data, but we can still ingest it using the agent
    • order data – as and when an order is placed
    • customer or user reviews/messages. This should be high-volume streaming data
  4. Sample data is provided here, however, you may add more data to run it at a larger scale, etc.

Steps to run the demo on your own

  • Set up BangDB on your machine

Note: if you already have BangDB, you may skip this step

a. Get BangDB: check out https://landingpage.ninja/bang/download or https://github.com/sachin-sinha/BangDB/releases

b. Follow the README file available in the folder and install the DB

Check out or clone this repo to get the files for the use case, and go to customer_reviews dir

Note: It will be good to have several shell terminals for the server, agent, client, and mon folder

> cd customer_reviews

copy the binaries (server, agent, cli) to the folders here (note: this is not required, but it makes the demo simpler to run)

  • copy the bangdb-server-2.0 binary to the server/ folder
  • copy the bangdb-agent-2.0 binary to the agent/ folder
  • copy the bangdb-cli-2.0 binary to the cli/ folder
  • Before running the demo from scratch, you should clean the database, ensure the agent.conf file is reset, and reset the files being monitored by the agent. To do this, run the reset.sh file in the base folder. Also, please ensure the file and dir attributes in agent/agent.conf point to the right file and folder respectively

 

  • Run the server, agent, and client

Run the server

> cd server

> ./bangdb-server-2.0 -c hybrid -w 18080

> cd ..

Note: we are running BangDB in hybrid listening mode (both TCP and HTTP); the HTTP port is 18080. This comes in handy for ingesting data using agents, interacting via the CLI, etc., and at the same time visualizing with Grafana

Run the agent

> cd agent

> ./bangdb-agent-2.0

> cd ..

Run the client

> cd cli

> ./bangdb-cli-2.0

you will see something like this on the prompt

server [ 127.0.0.1 : 10101 ] is master with repl = OFF

 __     _    _   _   ____   ___    ___
|   \  / \  | \ | | | ___\ |   \  |   \
|   / /   \ |  \| | | | __ |    | |   /
|   \/ ___ \| | \ | | |__|||    | |   \
|___/_/   \_|_| |_| |_____||___/  |___/

command line tool for db+stream+ai+graph

please type 'help' to get more info, 'quit' or 'exit' to return

bangdb> 

  • Register the schema (set of streams)

Let’s first register the stream schema into which we will be receiving the data

bangdb> register schema ecomm_schema.txt
success

Now let's ingest some data into the product stream.

NOTE: to help users ingest some events, there is a simple script "sending.sh" which takes the following arguments:

bash sending.sh <fromfile> <tofile> <numrec> <stopsec>

This sends events from <fromfile> to <tofile> at a rate of <numrec> events every <stopsec> seconds.

In a real scenario, the application or program would write these events into some log file and the agent would keep sending data to the server. For demo purposes, we simulate the application/program using sending.sh
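The repo ships its own script; purely as an illustration of what such a simulator does, here is a sketch with the same argument shape (an assumption on my part: batching plus sleep, not the actual sending.sh):

```shell
#!/usr/bin/env bash
# Sketch of a sending.sh-style simulator (illustrative, not the repo's script).
# Usage: send_lines <fromfile> <tofile> <numrec> <stopsec>
send_lines() {
    local fromfile=$1 tofile=$2 numrec=${3:-1} stopsec=${4:-1}
    local batch=()
    while IFS= read -r line; do
        batch+=("$line")
        if [ "${#batch[@]}" -ge "$numrec" ]; then
            # append the batch to the monitored file, then pause
            printf '%s\n' "${batch[@]}" >> "$tofile"
            batch=()
            sleep "$stopsec"
        fi
    done < "$fromfile"
    # flush a trailing partial batch, if any
    if [ "${#batch[@]}" -gt 0 ]; then
        printf '%s\n' "${batch[@]}" >> "$tofile"
    fi
}

# Example: bash sending.sh ../data/ecomm_product.txt prod.txt 1 1
send_lines "${1:-/dev/null}" "${2:-/dev/stdout}" "${3:-1}" "${4:-1}"
```

Appending to the monitored file is the key point: the agent tails it and forwards each new line to the server.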

  • Send product data to the server

> cd mon/

> bash sending.sh ../data/ecomm_product.txt prod.txt 1 1

Note: you can send 1000’s of events per second by using

> bash sending.sh ../data/ecomm_product.txt prod.txt 1000 1

  • Send order data to the server

bash sending.sh ../data/ecomm_order.txt order.txt 1 1

Now come back to the CLI shell terminal and train a model for sentiment analysis.

BangDB will keep using this sentiment model to add the "sentiment" attribute to every event as it arrives

  • Train sentiment model

When you train a model from the CLI, here is what you will see. You may follow along as shown here, or simply answer the questions the CLI keeps asking.

NOTE: The sentiment model requires a knowledge base (KB) for the context. It's always a good idea to train a KB for the context/area in which we work; for better accuracy and performance, we should ideally train one. For demo purposes, however, we have a sample KB (trained on minimal data) which can be used but is not sufficient. If you want a proper KB for sentiment analysis of customer reviews/comments/messages, please send me a mail (sachin@bangdb.com) and I will forward the link to you. For production, we must use a properly trained KB file

bangdb> train model user_sentiment
what's the name of the schema for which you wish to train the model?: ecomm
do you wish to read earlier saved ml schema for editing/adding? [ yes |  no ]: 


	BangDB supports following algorithm, pls select from these
	Classification (1) | Regression (2) | Lin-regression/Classification (3) | Kmeans (4) | Custom (5)
	| IE - ontology (6) | IE - NER (7) | IE - Sentiment (8) | IE - KB (9) | TS - Forecast (10) 
	| DL - resnet (11) | DL - lenet (12) | DL - face detection (13) | DL - shape detection (14) | SL - object detection (15)

what's the algo would you like to use (or Enter for default (1)): 8
what's the input (training data) source? [ local file (1) | file on BRS (2) | stream (3) ] (press enter for default (1)): 1
enter the training file name for upload (along with full path): ../data/review_train.txt


	we need to do the mapping so it can be used on streams later
	This means we need to provide attr name and its position in the training file

need to add mapping for [ 2 ] attributes as we have so many dimensions
enable attr name: sentiment
enable attr position: 0
enable attr name: msg
enable attr position: 1


we also need to provide the labels for which the model will be trained
    enter the label name: positive
do you wish to add more labels? [ yes |  no ]: yes
    enter the label name: negative
do you wish to add more labels? [ yes |  no ]: 
    enter the name of the KB model file (full path)(for ex; /mydir/total_word_feature_extractor.dat): total_word_feature_extractor.dat
    Do you wish to upload the file? [ yes |  no ]: yes
training request : 
{
   "training_details" : {
      "train_action" : 0,
      "training_source_type" : 1,
      "training_source" : "review_train.txt",
      "file_size_mb" : 1
   },
   "model_name" : "user_sentiment",
   "algo_type" : "IE_SENT",
   "labels" : [
      "positive",
      "negative"
   ],
   "schema-name" : "ecomm",
   "total_feature_ex" : "total_word_feature_extractor.dat",
   "attr_list" : [
      {
	 "position" : 0,
	 "name" : "sentiment"
      },
      {
	 "name" : "msg",
	 "position" : 1
      }
   ]
}
do you wish to start training now? [ yes |  no ]: yes
model [ user_sentiment ] scheduled successfully for training
you may check the train status by using 'show train status' command
do you wish to save the schema (locally) for later reference? [ yes |  no ]: 

Now you can see the status of the model training

bangdb> show models
+--------------------+--------------+-------+------------+-----------+------------------------+------------------------+
|key                 |model name    |   algo|train status|schema name|train start time        |train end time          |
+--------------------+--------------+-------+------------+-----------+------------------------+------------------------+
|ecomm:user_sentiment|user_sentiment|IE_SENT|passed      |ecomm      |Sat Oct 16 00:07:11 2021|Sat Oct 16 00:07:12 2021|
+--------------------+--------------+-------+------------+-----------+------------------------+------------------------+

Now let’s ingest the customer reviews and see the output

> cd mon

> bash sending.sh ../data/user_msg.txt reviews.txt 1 1

Come back to the CLI terminal and select a few events from the stream "reviews" in the "ecomm" schema

bangdb> select * from ecomm.reviews
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|key             |val                                                                                                                             |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329532924119|{"uid":"sal","prod":"ipad","msg":"finally the order arrived but i am returning it due to delay","tag":"return","revid":"rev13","|
|                |_pk":1634329532924119,"sentiment":"negative","_v":1}                                                                            |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329531921928|{"uid":"raman","prod":"guitar","msg":"finally order is placed, delivery date is still ok do it's fine","tag":"order","revid":"re|
|                |v12","_pk":1634329531921928,"sentiment":"positive","_v":1}                                                                      |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329530919064|{"uid":"sal","prod":"iphone","msg":"just ordered for p3 and i got a call that the delivery is delayed","tag":"order","revid":"re|
|                |v11","_pk":1634329530919064,"sentiment":"positive","_v":1}                                                                      |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329529916681|{"uid":"raman","prod":"guitar","msg":"the product is in cart, i want to order but it's not going","tag":"cart","revid":"rev10","|
|                |_pk":1634329529916681,"sentiment":"negative","_v":1}                                                                            |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329528914003|{"uid":"mike","prod":"football","msg":"how amazing to get the packet before time, great work xyz","tag":"order","revid":"rev9","|
|                |_pk":1634329528914003,"sentiment":"positive","_v":1}                                                                            |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329527911595|{"uid":"sal","prod":"ipad","msg":"not sure why the product is not yet delivered, it said it will be done 3 days ago","tag":"orde|
|                |r","revid":"rev8","_pk":1634329527911595,"sentiment":"negative","_v":1}                                                         |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329526909432|{"uid":"rose","prod":"guitar","msg":"not sure if this site works or not, frustating","tag":"order","revid":"rev7","_pk":16343295|
|                |26909432,"sentiment":"negative","_v":1}                                                                                         |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329525906102|{"uid":"hema","prod":"p3","msg":"the tabla got set very smoothly, thanks for the quality service","tag":"order","revid":"rev6","|
|                |_pk":1634329525906102,"sentiment":"positive","_v":1}                                                                            |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329524902468|{"uid":"hema","prod":"tabla","msg":"i received the product, it looks awesome","tag":"order","revid":"rev5","_pk":163432952490246|
|                |8,"sentiment":"positive","_v":1}                                                                                                |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329523899985|{"uid":"rose","prod":"guitar","msg":"order placed, money debited but status is still pending","tag":"order","revid":"rev4","_pk"|
|                |:1634329523899985,"sentiment":"negative","_v":1}                                                                                |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
total rows retrieved = 10 (10)
more data to come, continue .... [y/n]: 

As you can see, the attribute "sentiment" has been added, with the value predicted by the model user_sentiment

Now let's check the events in the filter stream. We see that all negative events are also available in the stream negative_reviews

bangdb> select * from ecomm.negative_reviews
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|key             |val                                                                                                                             |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329532924119|{"uid":"sal","prod":"ipad","msg":"finally the order arrived but i am returning it due to delay","tag":"return","revid":"rev13","|
|                |_pk":1634329532924119,"_v":1}                                                                                                   |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329529916681|{"uid":"raman","prod":"guitar","msg":"the product is in cart, i want to order but it's not going","tag":"cart","revid":"rev10","|
|                |_pk":1634329529916681,"_v":1}                                                                                                   |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329527911595|{"uid":"sal","prod":"ipad","msg":"not sure why the product is not yet delivered, it said it will be done 3 days ago","tag":"orde|
|                |r","revid":"rev8","_pk":1634329527911595,"_v":1}                                                                                |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329526909432|{"uid":"rose","prod":"guitar","msg":"not sure if this site works or not, frustating","tag":"order","revid":"rev7","_pk":16343295|
|                |26909432,"_v":1}                                                                                                                |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329523899985|{"uid":"rose","prod":"guitar","msg":"order placed, money debited but status is still pending","tag":"order","revid":"rev4","_pk"|
|                |:1634329523899985,"_v":1}                                                                                                       |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329522897451|{"uid":"sal","prod":"ipad","msg":"even after contacting customer care, we have no update yet","tag":"order","revid":"rev3","_pk"|
|                |:1634329522897451,"_v":1}                                                                                                       |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329521895545|{"uid":"sal","prod":"ipad","msg":"the order 2 was placed 4 days ago, still there is no response, i am still waiting for any conf|
|                |irmation","tag":"order","revid":"rev2","_pk":1634329521895545,"_v":1}                                                           |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329520891590|{"uid":"sachin","prod":"cello","msg":"even after calling 20 times, the customer care is not responding at all","tag":"order","re|
|                |vid":"rev1","_pk":1634329520891590,"_v":1}                                                                                      |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
total rows retrieved = 8 (8)

As you can see, the events were automatically collected in this stream. We can further set up notifications, which let the server take actions / send notifications in an automated manner
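Conceptually, the filter stream is just a predicate applied to each arriving event, with an optional notification hook. A minimal Python sketch of that routing (illustrative only, not BangDB internals):

```python
reviews = []            # main stream
negative_reviews = []   # filter stream

def on_event(event, notify=None):
    """Ingest an event; copy negatives to the filter stream and
    optionally fire a notification callback."""
    reviews.append(event)
    if event.get("sentiment") == "negative":
        negative_reviews.append(event)
        if notify:
            notify(event)

alerts = []
for e in [{"revid": "rev1", "sentiment": "negative"},
          {"revid": "rev2", "sentiment": "positive"},
          {"revid": "rev3", "sentiment": "negative"}]:
    on_event(e, notify=alerts.append)

print(len(reviews), len(negative_reviews), len(alerts))  # 3 2 2
```

In BangDB the predicate and target stream are declared in the stream schema, so no such hand-written routing code is needed.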

However, notice that we don't yet have any event in the negative_reviews_pattern stream. This is because we haven't sent events that could have formed the pattern. As a reminder, the pattern is defined as "at least 3 consecutive negative events for the same product, but from different users, within 1000 sec". We would like to extract these patterns continuously and store the matching events in the negative_reviews_pattern stream

Let's now add a few more negative events (as you may have noted, the last ingested event was already predicted as negative, so consecutive negative events for the same product from different users should trigger the pattern)

bangdb> insert into ecomm.reviews values null {"uid":"alan","prod":"ipad","msg":"finally the order arrived but i am returning it due to delay","tag":"return","revid":"rev14"}
success

bangdb> insert into ecomm.reviews values null {"uid":"john","prod":"ipad","msg":"frustating that product is not delievered yet","tag":"return","revid":"rev15"}
success

bangdb> insert into ecomm.reviews values null {"uid":"johny","prod":"ipad","msg":"frustating and disappointing that product is not delievered yet","tag":"return","revid":"rev16"}
success

Now select from the pattern stream

bangdb> select * from ecomm.negative_reviews_pattern
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|key             |val                                                                                                                             |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
|1634329705652574|{"uid":"john","prod":"ipad","sentiment":"negative","revid":"rev15","_pk":1634329705652574,"uid":"sal","prod":"ipad","_jpk1":1634|
|                |329669688454,"_v":1}                                                                                                            |
+----------------+--------------------------------------------------------------------------------------------------------------------------------+
total rows retrieved = 1 (1)

As you can see, it has two event ids (since we select both, as per the schema definition – see ecomm_schema.txt): one where the pattern started and one where it completed.

You can play with this and see how it works. If another negative event arrives for the product and forms the pattern, it will be collected; if the run is broken, then the next time the pattern is seen, the server will again send the matching event to the stream, and so on.
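To make the pattern semantics concrete, here is a plain-Python sketch of the same CEP rule (an illustration only; in BangDB the pattern is declared in the stream schema, not hand-coded):

```python
from collections import defaultdict

WINDOW_SEC = 1000   # sliding window, per the schema's pattern definition
MIN_RUN = 3         # minimum consecutive negatives from distinct users

runs = defaultdict(list)   # per-product run of (timestamp, uid) negatives
pattern_stream = []        # stand-in for negative_reviews_pattern

def on_review(ts, uid, prod, sentiment):
    """Track consecutive negative reviews per product; emit to the
    pattern stream when MIN_RUN negatives from distinct users land
    within WINDOW_SEC."""
    run = runs[prod]
    if sentiment != "negative":
        run.clear()        # a positive review breaks the run
        return
    run.append((ts, uid))
    # keep only events inside the sliding window
    runs[prod] = run = [(t, u) for (t, u) in run if ts - t <= WINDOW_SEC]
    if len(run) >= MIN_RUN and len({u for _, u in run}) >= MIN_RUN:
        pattern_stream.append({"prod": prod, "start": run[0][0], "end": ts})
        run.clear()        # start looking for the next occurrence

on_review(100, "sal",  "ipad", "negative")
on_review(200, "alan", "ipad", "negative")
on_review(300, "john", "ipad", "negative")   # third distinct user -> pattern
print(pattern_stream)   # [{'prod': 'ipad', 'start': 100, 'end': 300}]
```

A positive review, a repeated user, or an expired window all prevent the match, mirroring the "consecutive, different users, within 1000 sec" conditions above.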

Now let's look at the triples stored by the server in a graph structure. We will run Cypher queries

bangdb> USE GRAPH ecomm_graph
USE GRAPH ecomm_graph successful

bangdb> S=>(@u uid:*)-[POSTS_REVIEWS]->(@p prod:guitar)
+---------+-------------+-----------+
|sub      |pred         |        obj|
+---------+-------------+-----------+
|uid:raman|POSTS_REVIEWS|prod:guitar|
+---------+-------------+-----------+
|uid:raman|POSTS_REVIEWS|prod:guitar|
+---------+-------------+-----------+
|uid:rose |POSTS_REVIEWS|prod:guitar|
+---------+-------------+-----------+
|uid:rose |POSTS_REVIEWS|prod:guitar|
+---------+-------------+-----------+

bangdb>  S1=>(@u uid:hema)-[POSTS_REVIEWS]->(@p prod:*)-[HAS_REVIEWS]->(@r revid:*)
+----------+-----------+----------+
|sub       |pred       |       obj|
+----------+-----------+----------+
|prod:tabla|HAS_REVIEWS|revid:rev5|
+----------+-----------+----------+
|prod:p3   |HAS_REVIEWS|revid:rev6|
+----------+-----------+----------+

And so on. Please see the documentation for more info on stream, graph, ML, etc. You can also get help from the CLI; for example, for help on graph type "help graph", for ML type "help ml", etc.

Further, you can run the use case at higher volume and speed. You can train more models, add more triples, etc., as required.

Get started with BangDB
