REAN model to achieve higher conversions through hyper-personalization and recommendations

BangDB implements the REAN model to enable conversion through personalization. It ingests and processes various aspects of users or customers and their activities to optimize the conversion rate. With the REAN model, e-commerce companies can ingest and analyze clickstream and pixel data in real-time to understand customers and visitors in much more advanced ways. This enables organizations to personalize content and recommend products and solutions in a 1-on-1 manner, which leads to much higher conversions and revenue.

Introduction

The goal of this document is to lay out the basic building-block indices and KPIs using which we can target customers a lot better. But this is not the end; in fact, it just begins the journey to a “1-on-1 personalization and recommendation” capability for the organization, where the underlying goal is to provide a much-improved customer experience and offer higher value. Once we have what’s in this document, we then need to use stream processing and event ingestion for ETL, data enrichment, and CEP (complex event processing). Further, we will need to put a Graph structure in place to operate in a highly “context-aware” environment for personalization and recommendation. Let’s first look into the basic part of the bigger recommendation system here; in the next blog we will go into stream processing and Graph.

The REAN model is defined as follows:

  • Reach
  • Engage
  • Activate
  • Nurture

REAN Model Does Two Things Very Well

  • Firstly, it gives you a very clear indication of the measurement challenges you might have when breaking your strategy down into its component parts.
  • Secondly, it can be used to help you define a measurement strategy. You could develop KPIs around each activity (Reach, Engage, Activate, and Nurture) and then combine the metrics in a matrix against each other.

At a high level, this is how the REAN model’s different components are defined.

REACH

  • Traffic sources
    • Search Engines
    • Ads, Campaign
    • Email, Newsletter
    • Internal links
    • Partner sites
    • YouTube, Video, Video banners
    • Blog
    • PR
  • SEM
  • SEO
  • Brand Awareness – traffic coming from logos, company names, product names
  • Seeding – from opinion leaders, reviews, articles

ENGAGE

  • Shopping carts
  • Self-service processes – e.g., SaaS product sign-up
  • Any creatives
  • User segmentation based on behavior
    • Session length and depth      
    • Separate users based on their likes and dislikes [ based on session length and depth]
  • Click depth – average click depth and corresponding user’s segment
  • Duration – length of time spent on a website
  • Offline engagements – relevant for offline stores, etc.

ACTIVATION

  • Can also be interpreted as conversion
  • Purchases
  • Downloads of software or documents
  • Activation is typically what reach, engagement, and nurture KPIs are measured against.
    • B2B, B2C, Media, Customer service, Branding

NURTURE

  • CRM
  • Follow-ups – emails, community
  • Most importantly
    • Personalization & customization
    • How we interact with users and customers when they return
    • Cookie, user management, etc
    • Recommendations
  • Recency – Measure of time elapsed between your visitor’s last visit and his/her current one

INDEXES

  • Click depth index [# of sessions with more than n page views / # of total sessions]
  • Recency index [# of sessions with more than n page views in the last m weeks / # of total sessions]
  • Duration index [# of sessions longer than n minutes / # of total sessions]
  • Brand index [# of sessions originated directly (no referrer URL) / # of total sessions]
  • Feedback index
  • Interaction index [# of sessions where the visitor completed any tracked activity / # of total sessions]
  • Loyalty index = ∑(Ci + Ri + Di + Bi + Fi + Ii) per visitor (the sum of the click depth, recency, duration, brand, feedback, and interaction indices), then select the top k visitors
  • Subscription index [# of visitors who subscribe to content / # of total visitors]
  • Content page view index [count of content page views / total page views]
  • Internal banner index [banner clicks / total clicks]
  • Content consumption index [# of page views for a given content / # of total page views]
  • System performance index [# of views from a given system / # of total page views]
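
To make these definitions concrete, here is a minimal Python sketch that computes the per-visitor indices above from raw session records. The session fields (visitor_id, page_views, duration_min, referrer, weeks_ago, tracked_activity, gave_feedback) are illustrative assumptions, not a BangDB schema; in practice these values would come from the ingested clickstream.

```python
from collections import defaultdict

def compute_indices(sessions, n_views=3, n_min=5, m_weeks=4):
    """Per-visitor REAN indices; session field names are assumptions."""
    counts = defaultdict(lambda: {"total": 0, "click": 0, "recency": 0,
                                  "duration": 0, "brand": 0,
                                  "feedback": 0, "interaction": 0})
    for s in sessions:
        c = counts[s["visitor_id"]]
        c["total"] += 1
        if s["page_views"] > n_views:
            c["click"] += 1
            if s["weeks_ago"] <= m_weeks:
                c["recency"] += 1
        if s["duration_min"] > n_min:
            c["duration"] += 1
        if s["referrer"] is None:            # direct traffic, no referrer URL
            c["brand"] += 1
        if s["gave_feedback"]:
            c["feedback"] += 1
        if s["tracked_activity"]:
            c["interaction"] += 1

    indices = {}
    for visitor, c in counts.items():
        t = c["total"]
        idx = {k: c[k] / t for k in ("click", "recency", "duration",
                                     "brand", "feedback", "interaction")}
        idx["loyalty"] = sum(idx.values())   # Ci + Ri + Di + Bi + Fi + Ii
        indices[visitor] = idx
    return indices

def top_k_loyal(indices, k=10):
    """Select the top k visitors by loyalty index."""
    return sorted(indices, key=lambda v: indices[v]["loyalty"], reverse=True)[:k]
```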


| Metric | Visitor acquisition | Conversion to opportunity | Conversion to sale | Customer retention & growth |
| --- | --- | --- | --- | --- |
| Tracking metrics | Unique visitors, new visitors | Opportunity volume | Sales volume | E-mail list quality, transaction churn rate |
| Performance drivers (diagnostic) | Bounce rate, conversion rate for new visits | Macro-conversion rate to opportunity, micro-conversion rate | Conversion rate to sale, email conversion rate | Active customers % (site & email active), repeat conversion rate |
| Customer-centric KPIs | Cost per click and per sale, brand awareness | Cost per opportunity or lead | Cost per sale | Lifetime value, customer loyalty index |
| Business value KPIs | Audience share | Total orders | Total sales | Retained sales growth and volume |
| Strategy | Online and offline targeting and reach strategy | Lead generation strategy | Online sales generation strategy | Retention and customer growth strategy |
| Tactics | Continuous campaigns, ads, communications | Personalization & customization | Targeting | Targeting, churn rate, etc. |

KPI

Web analytics metrics fall into the following types:

  • Count
  • Ratios
  • KPIs – either count or ratio
  • Dimension – segments
  • Aggregates
  • Etc.

Business questions that we need to answer through KPIs

  1. What is the best source of traffic in terms of volumes and sales?
  2. Where are the visitors coming from? [ top k places]
  3. Which channel is most productive? [ top k channels]
  4. Which channels are overlapping?
  5. Which landing page converts best? [top k landing pages]
  6. Do registered users buy more?
  7. Most searched pages [ top pages]
  8. How many downloads?
  9. What’s the value of download for different items [ top downloads by value]?
  10. What is the average response time for lead follow-up?
  11. What are the internal search statistics?
  12. How engaged are our visitors?
  13. What are the Top paths to our website?
  14. How are visitors finding the site?
  15. What is the cost per conversion (per campaign)?
  16. Where are users located? [users by location]
  17. How many people don’t get through a shopping cart?
  18. What are the search keywords?

Page bounce rate – share of visitors who left the site directly from a landing page

  • Computed hourly
  • Alert on a 15% deviation

Page time index – time spent on a page / total time spent on the site

  • Computed hourly
  • Alert on a 15% deviation
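
As a small illustration, here is a Python sketch of the hourly deviation check described above. The 15% threshold and the hour-over-hour comparison come from the bullets; the function names and sample values are illustrative, not BangDB notification syntax.

```python
def page_time_index(page_seconds, site_seconds):
    """Page time index = time spent on the page / total time spent on the site."""
    return page_seconds / site_seconds if site_seconds else 0.0

def deviates(current, previous, threshold=0.15):
    """True when the hourly value deviates more than 15% from the previous hour."""
    return previous != 0 and abs(current - previous) / previous > threshold

# Example: compare this hour's bounce rate against the previous hour's.
prev_bounce, curr_bounce = 0.42, 0.51
if deviates(curr_bounce, prev_bounce):
    print("ALERT: bounce rate deviated more than 15% hour-over-hour")
```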

Segmentations

  • By paid traffic [ Reach] – campaign, ads, banners, etc.
  • By unpaid traffic [ Engage]
  • By location [ Engage]
  • By search phrase or keyword [ Engage]
  • By site pages or behaviors [ Engage]
  • By system vars [ device, browser, etc.] [ Engage ]
  • By conversion
  • By loyalty – repeat visitors, registered, recency, etc.

Attributes for basic segmentation

  • Visits
  • % Add to cart
  • Conversion rate [# of confirmed conversions / # of total visits]
  • Engagement conversion rate [# of confirmed conversions / # of total engaged visitors]
  • Marketing cost
  • Cost per visit (CPV)
  • Visitor volume ratio [# of visitors from a source / # of total visitors]
  • Video engagement rate [# of times the video was played / # of visitors to the page]
  • Cost per add to cart
  • Customers
  • Average cart value
  • Shopping cart abandonment rate
  • Page time index
  • Visitor engagement index
  • Content page view index [count of content page views / total page views]
  • Internal banner index [banner clicks / total clicks]
  • Content consumption index [# of page views for a given content / # of total page views]
  • System performance index [# of views from a given system / # of total page views]
  • Cost per acquisition [cost of referring source / # of conversions]
  • Sales
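
The following Python sketch shows how a few of the attributes above could be computed per traffic source for basic segmentation. The visit fields and the spend_by_source mapping are assumptions for illustration only.

```python
from collections import defaultdict

def segment_by_source(visits, spend_by_source):
    """Aggregate a few basic segmentation KPIs per traffic source.

    Each visit is assumed to carry: source, converted (bool),
    added_to_cart (bool), cart_value (float). Field names are illustrative.
    """
    agg = defaultdict(lambda: {"visits": 0, "conversions": 0,
                               "carts": 0, "cart_value": 0.0})
    for v in visits:
        a = agg[v["source"]]
        a["visits"] += 1
        a["conversions"] += v["converted"]
        a["carts"] += v["added_to_cart"]
        a["cart_value"] += v["cart_value"]

    report = {}
    for source, a in agg.items():
        cost = spend_by_source.get(source, 0.0)
        report[source] = {
            "conversion_rate": a["conversions"] / a["visits"],
            "pct_add_to_cart": a["carts"] / a["visits"],
            "avg_cart_value": a["cart_value"] / a["carts"] if a["carts"] else 0.0,
            "cost_per_visit": cost / a["visits"],
            "cost_per_acquisition": cost / a["conversions"] if a["conversions"] else None,
        }
    return report
```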

Attributes for Behavior segmentation KPIs

  • Page views per session
  • Avg session time

Nurture rate

  • Repeat visitor index [# of repeat visitors / # of visits]
  • Email performance index

BangDB is designed to ingest and process high-speed data to extract intelligence and apply it to ongoing operations for better operational value. BangDB comes with stream processing, AI, and Graph processing, with unstructured data analysis in real-time. Download BangDB free of cost and start building applications.

Relevant Readings

How to mitigate security risk using BangDB

Security risk is everywhere, and it has been growing rapidly even as we try to mitigate it. Fraudsters are always a step ahead of the curve and come up with new ideas for attacks while we are busy handling the older ones. Mitigating security risk requires all of us to innovate faster and prepare in a much more advanced and modern manner. Most of the time, due to the enormity of the challenges here, we keep solving older problems and forget to prepare for potential upcoming attacks. A clear definition of the problem is often unavailable, the tools in the market are sparse and siloed, and while the concepts exist, implementations are limited. The core of the solution lies in the ability to scan every single event in context and use modern methods not only for forensics but to be predictive and avoid the repercussions.

The cybersecurity threats have changed in three crucial ways in the recent past:

  • MOTIVE: In the past, viruses were introduced by curious programmers. Today, cyberattacks are the result of well-executed plans by trained militaries in support of cyber warfare.
  • SPEED: The potential rate at which an attack spreads has also increased; it can affect computers all over the globe in seconds.
  • IMPACT: The potential impact has increased manifold due to the wide penetration of the internet.

Challenges

Continuous and relentless: Threats may come from any place, any system, and the most unlikely of places. Therefore, we must capture and analyze all data (and not just samples). Hence, continuous stream processing, where every event is analyzed with as low latency as possible, is critical. Most of the tools in the market are batch-processing tools; they miss patterns at the boundaries of batches and are therefore not suitable for such use cases.

Non-atomic in nature: Threats are not necessarily atomic; they may arrive in small packets over a period from many different sources. Therefore, by just looking at a single packet or event we can’t perceive the threat. We must analyze these arriving data packets in a state-managed system with a continuously moving window that can see patterns over a period. We also need to link data points to capture the essence and context.

Unpredictability: A few threats may have known or constant signatures, which we could identify using computations in a deterministic and absolute manner. However, several threats are extremely hard to capture this way, as they are designed to evade the regular, known, or anticipated structure. Therefore, we must use AI to predict some of these scenarios continuously on stream data, for example as sketched below.
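
As a simple illustration of continuous, per-event analysis (a generic sketch, not BangDB’s actual implementation), here is a streaming anomaly detector in Python that maintains a running mean and variance with Welford’s online algorithm and flags events that deviate strongly from the history seen so far:

```python
import math

class StreamingAnomalyDetector:
    """Flag events whose value deviates strongly from the running distribution.

    Uses Welford's online algorithm, so the statistics update per event rather
    than per batch -- one reason stream processing suits threat detection.
    """
    def __init__(self, z_threshold=4.0, warmup=100):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.z_threshold, self.warmup = z_threshold, warmup

    def observe(self, x):
        # Test against the statistics seen so far, before the new event
        # is folded in, so an outlier cannot mask itself.
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = std > 0 and abs(x - self.mean) / std > self.z_threshold
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

# e.g. feed it bytes-per-connection from the event stream (placeholder values)
detector = StreamingAnomalyDetector(z_threshold=3.0, warmup=3)
for event_bytes in [1200, 980, 1100, 1150, 50_000]:
    if detector.observe(event_bytes):
        print("ALERT: anomalous transfer size", event_bytes)
```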

High-speed processing: The speed of data is so high in some cases that existing tools in the market would sample and then process. We know that this is too open and risky; we must capture and process all data. This means we need a system with very high read and write performance. A high-throughput data store is desired in such cases.

Linear scale: Data volume will be high, as we need to process and store all data, so a large, scalable system is needed. We need linear scale to ensure that data ingestion and processing work uninterrupted while the system scales.

What does BangDB do? – It enables a predictive instead of a forensic approach, at high speed and in a scalable manner, to mitigate security risk

BangDB ingests and processes security telemetry at extremely high speed to make it easily accessible for advanced computation and analytics. It further leverages the following to achieve a predictive rather than forensic approach:

  • Advanced statistical and data science models for high-speed anomaly detection
  • Real-time ingestion and stream processing to enable continuous threat analysis
  • Machine learning models integrated with stream for predictive threat detection
  • Graph with stream for interlinking of data points for richer pattern detection
  • Handling of all kinds of data (text, images, videos) in a single place, in a joined manner

What are the typical steps that BangDB takes to tackle this?

STEP1: Advanced Threat Detection

Leverage BangDB to combine and contextualize incidents from multiple disparate big-data sources for continuous, near real-time streaming detection, capturing incidents that are often missed by batch-based technologies.

STEP2: Link data

Enrich data with a Graph model to capture the “context” rather than just isolated events, which do not provide enough information. Further, integrate Graph with stream processing such that the linking of data and the capturing of context are automated and continuous.

STEP3: Complex event processing

Find anomalies and patterns using complex event processing (CEP). This allows users to define patterns that are too complex in nature to run on a typical RDBMS or other databases. The patterns identified here are absolute in nature and carry 100% confidence. A minimal illustration of the idea follows.
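
The sketch below conveys the flavor of such a rule in plain Python: fire when a given sequence of event types from the same source occurs, in order, within a sliding time window. The pattern, event names, and class are hypothetical illustrations, not BangDB’s CEP syntax.

```python
import time

class SequenceWithinWindow:
    """Minimal CEP-style rule: match an ordered sequence of event types
    from the same source within `window_sec` seconds.
    """
    def __init__(self, pattern, window_sec=60):
        self.pattern = pattern            # e.g. ["port_scan", "login_fail", "login_ok"]
        self.window_sec = window_sec
        self.progress = {}                # source -> (next_index, first_match_ts)

    def on_event(self, source, event_type, ts=None):
        ts = ts if ts is not None else time.time()
        idx, start = self.progress.get(source, (0, ts))
        if ts - start > self.window_sec:  # window expired, start over
            idx, start = 0, ts
        if event_type == self.pattern[idx]:
            idx += 1
            if idx == 1:
                start = ts                # window opens at the first match
            if idx == len(self.pattern):
                self.progress.pop(source, None)
                return True               # full pattern matched: absolute, 100% confidence
        self.progress[source] = (idx, start)
        return False

rule = SequenceWithinWindow(["port_scan", "login_fail", "login_ok"], window_sec=60)
for src, ev in [("10.0.0.5", "port_scan"), ("10.0.0.5", "login_fail"),
                ("10.0.0.5", "login_ok")]:
    if rule.on_event(src, ev):
        print("CEP match: suspicious sequence from", src)
```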

STEP4: Predict and don’t depend on forensics as much as possible

Artificial intelligence enables the identification of never-before-seen threats, malware, and infiltration techniques. Using AI, build a comprehensive security score leveraging behavioral modeling and stochastic anomaly detection. Kill-chain incidents are prioritized based on potential impact, key users, and critical assets.

STEP5: Take automated action

When an anomaly or pattern is detected, take action in an automated manner. This means timely action, which can save time and resources and, in many cases, avoid the situation altogether.

STEP6: Track threat propagation

Leverage BangDB’s ability to ingest and analyze immense amounts of data to track threats and their propagation across time and space through a near real-time relational-graph view of the entire network.
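
To illustrate the idea of tracking propagation over a linked (graph) view, here is a small Python sketch that walks a host-connection graph outward from a compromised node. The graph structure and host names are hypothetical, standing in for the relational-graph view described above.

```python
from collections import deque

def propagation_paths(graph, source, max_hops=4):
    """BFS over a connection graph to trace how a threat could spread
    from a compromised node. `graph` maps node -> set of neighbors
    (e.g. hosts connected by observed network flows).
    """
    seen = {source: 0}                     # node -> hop distance from source
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue                       # stop expanding beyond the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen

network = {"hostA": {"hostB", "hostC"}, "hostB": {"hostD"},
           "hostC": set(), "hostD": set()}
print(propagation_paths(network, "hostA"))
# {'hostA': 0, 'hostB': 1, 'hostC': 1, 'hostD': 2}
```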

STEP7: Visual threat hunting

The Security Intelligence platform provides sophisticated threat-hunting tools that allow SOC staff to effectively hunt, validate, and remediate potential threat incidents surfaced by the product. Analysts can self-assemble new threat-hunting workflows using building-block modules for ingestion, enrichment, and analytics on a security playbook interface.

Conclusion

In the end, no amount of effort and tools can make us completely insulated from these security threats; there is no complete immunity we can develop against such things. However, we can at best be prepared for such attacks, try to avoid them, and mitigate security risk as much as possible. And in the case of some attacks, we can try to minimize the damage. An additional set of tools never hurts; it can add more value and improve the situation. Hence, it is recommended to try BangDB to make the castle a bit more impregnable.

Download BangDB for free and get started. BangDB is one of the fastest databases in the market; it performs 2X+ better compared to some of the most popular databases like MongoDB, Couchbase, and Redis.

Please see more related articles on the following:

Architecture of modern stream processing platform for real-time data analytics

Humans to Machine – Shift of data source

There is a rapid shift happening at the data level as we speak. More and more data is being created by devices; it is fast-moving and contains very high-value but perishable insights. This shift demands a new architecture of stream processing platform for real-time data analytics. Data has been growing exponentially: we have more data streaming through the wire than we can keep on disk, from both value and volume perspectives. These data are being created by everything we deal with on a daily basis. When humans were the dominant creators of data, we naturally had less data to deal with, and at the same time its value persisted for a longer period. This in fact holds true now as well where humans are the sole creators of the data.

However, humans are no longer the dominant creators of data. Machines, sensors, devices, etc. took over a long time back. Data is now predominantly created by machines at humongous speed, so much so that 90% of all data since the dawn of civilization was created in the last two years. These data tend to have a limited shelf life as far as value is concerned: the value of data decreases rapidly with time. If the data is not processed as soon as possible, it may not be very useful for ongoing business and operations. Naturally, we need a different thought process and approach to deal with these data.

Why stream analytics is the key to future analytics

Since more of these data are streaming in from all different sources, combining and analyzing them could create huge value for users or businesses. At the same time, given the perishable nature of the data, it is imperative that these data be analyzed and used as soon as they are created.

The value of data is maximum when it’s created; since streaming data is perishable in nature, insights need to be extracted immediately

More and more use cases are being generated that need to be tackled to push the boundaries and achieve newer goals. These use cases demand the collection of data from different sources, joining across different layers, and correlation and processing across different domains, all in real-time. The future of analysis is less about understanding “what happened” and more about “what’s happening or what may happen”.

A few example use cases in this context

E-commerce

Let’s analyze some of the use cases. Consider an e-commerce platform that is integrated with a real-time stream processing platform. Using this integrated streaming analysis, it could combine and process different data in real-time to figure out the intent or behavior of the user and present a personalized offer or content. This could increase the conversion rate significantly and reduce the erosion of customer engagement. It could also enable better campaign management, yielding better results for the same spend.

Data Center

Think of a small or mid-size data center (DC), which typically has many kinds of devices and machines, each generating volumes of data every moment. DCs typically use many different static tools for different kinds of data in different silos. These tools not only prevent the DC from having a single view of the entire data center but also work merely as BI tools. Because of this, issue identification in a predictive or real-time manner doesn’t happen, and as a result firefighting becomes the norm of the day. With a converged, integrated stream processing platform, a DC could have a single view of the entire operation along with real-time monitoring of events and data, to ensure issues are caught before they create bigger problems. A security breach could be seen or predicted much earlier, before the damage is done. Better resource planning and provisioning could be done by analyzing bandwidth usage and forecasting in near real-time.

IOT

The entire IoT is based on the premise that everything can generate data and interact with other things in real-time to achieve larger goals. This requires a real-time streaming analytics framework to be in place to ingest all sorts of unstructured data from disparate sources, monitor them in real-time, and take actions as required after identifying either known patterns or anomalies.

AI and Predictive Analysis

AI and predictive analytics mean that data is being collected and processed in real-time; otherwise the impact of AI could only be in understanding what happened. And with the growth of data and its types, it is prudent not to rely solely on what has been learned in hindsight. The demand will be for reacting to new things as they are seen or felt. Also, we have learned from experience that a model trained on older data often struggles to deal with newer data with acceptable accuracy. Therefore, here too the real-time stream processing platform becomes a required part rather than a nice-to-have piece.

Limitations with existing tools or platforms

There are two broad categories in which we can slot the options available in the market. One is the appliance model, and the other is a series of open-source tools that need to be assembled to create a platform. While the former costs several millions of dollars upfront, the latter requires dozens of consultants for several months to stitch together a platform. Time to market, cost, ease of use, and lack of unified options are a few major drawbacks. However, there are bigger issues to be addressed by either of these options when it comes to stream processing, and here we require a new approach to solve the problems. We can’t apply older tools to newer, future-looking problems; otherwise it will remain patchwork and will not scale to the needs of the hour.

Challenges with Stream Processing

Here are the basic high-level challenges when it comes to dealing with the stream of data and processing them in real-time.

  • Deal with the high volume of unstructured data in an extremely low latency manner
  • Avoiding multiple copies of the data across different layers and over a network as well
  • Optimal flow of the data through the system
  • Partitioning the application across resources
  • Stream processing data in real-time
  • Data storage in a unique and most suitable manner
  • Processing data and taking actions before it’s persisted
  • Remain predictive rather than only forensic or BI tool
  • Ease of use – we should not code for months before seeing the results
  • Time to market – off the shelf such that the app can go to market in a short time
  • Deployment model – hybrid. From within device to LAN to Cloud – all interconnected

Most of the options in the market suffer from these bottlenecks. Let’s take a few examples.

Spark

Spark follows the map-reduce model philosophically, although in a much more efficient manner. However, it still deals with batches: Spark processes micro-batches of a given size with a given batch interval. It has several problems when it comes to aligning with stream processing; in fact, its approach is the antithesis of stream processing.

  • Micro or macro doesn’t matter as long as it’s a batch; depending on the speed of the data, a macro-batch and a micro-batch may contain an equal number of events. The concept of the batch doesn’t align with stream processing, where processing every single event is important
  • Processing starts when a batch is full. This fails the premise of processing events as they come
  • Stream processing happens within a moving or sliding window. Windowing with batches is not possible
  • When the batch processing time exceeds the batch interval, the backlog only grows; this, coupled with persisting data sets, only aggravates the situation

Kafka + Spark + Cassandra

This model typically uses 5 or more distributed verticals, each containing many different nodes. This increases network hops and data copies to a great extent, which eventually increases latency. Scaling such a system is not trivial, as we have different dynamic requirements at different levels. Further, the cost of adding new processing logic is significantly higher than with a simple BI tool, where things could be handled using a dashboard. Finally, it requires a large team and many resources, which increases the cost. This can hardly be deployed for a scenario where sub-second latency is desired.

Kinesis

AWS Kinesis is at best equivalent to Kafka: a distributed, partitioned messaging layer. Users still must assemble the processing, storage, visualization, etc. layers themselves.

Why BangDB

We need a platform that is designed and implemented in a homogeneous manner, towards the single goal of processing stream data in real-time, which avoids all the above pitfalls, remains immune to future requirements, and scales well for higher load and volume of data.

BangDB has tried to address most of the above-mentioned problems by designing and building the entire stack from the ground up. Here is a brief introduction to the BangDB platform in light of the known issues identified.

Deal with a high volume of unstructured data

BangDB is built on a high-performance NoSQL database that scales well for a large amount of data. It also follows the convergence model to scale linearly; scaling a “single thing” vs “many things” addresses the problem to a large extent. BangDB also follows the FSM (finite state machine) model and implements SEDA (staged event-driven architecture) to cushion against sudden surges in data.

BangDB follows a true convergence model for higher performance, ease of management, and massive linear scale

Avoiding multiple copies of the data across different layers

BangDB removes all silos. Silos not only add latency but also force data to be copied across different verticals.

Optimal flow of the data through the system

BangDB processes the data before it reaches the disk, which is the opposite of most systems in the market. Further, BangDB avoids post-processing as much as possible, to an almost negligible level. All of this happens when data reaches a node; therefore there are no network packet hops for the data.

Partitioning the application across resources

Convergence allows BangDB to partition the application, the data, and all other resources in a single-dimensional manner. This enables the partitioning of one space rather than a set of different spaces. Therefore, it naturally enforces optimal use of added capacity and resources, which is otherwise difficult to predict and provision.

Processing streaming data in real-time

BangDB processes every single event rather than micro or macro batches. Hence, data is updated in real-time, patterns are identified in real-time, and insights are served to the application in real-time. Most streaming real-time use cases emphasize the need to process data in a sliding window; BangDB provides a configurable sliding window within which most of the processing happens.

True stream processing with continuous sliding window. Most of the operations happen within the sliding window

BangDB follows the reverse of MapReduce to achieve very high read performance. This is done by avoiding all sorts of post-processing of data and keeping the data in the format needed by the user.
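
As an illustration of per-event processing within a continuous sliding window (a generic sketch, not BangDB’s API), consider the following Python snippet: each arriving event immediately refreshes the aggregate, and events that slide out of the window are evicted, so the result is always up to date without waiting for a batch to fill.

```python
from collections import deque

class SlidingWindowAvg:
    """Continuous sliding-window aggregate, updated per event rather than
    per batch: every arriving event immediately refreshes the result.
    """
    def __init__(self, window_sec):
        self.window_sec = window_sec
        self.events = deque()             # (timestamp, value)
        self.total = 0.0

    def add(self, ts, value):
        self.events.append((ts, value))
        self.total += value
        # evict events that have slid out of the window
        while self.events and ts - self.events[0][0] > self.window_sec:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events)   # up-to-date aggregate per event

w = SlidingWindowAvg(window_sec=300)           # 5-minute sliding window
for ts, val in [(0, 10.0), (120, 14.0), (400, 9.0)]:
    print("avg over window:", w.add(ts, val))
```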

Data storage

BangDB stores both raw data and the extracted insights or aggregated data within the system. It is a persistent platform, hence it can store as much data as required. However, most of the time it is critical to process data in real-time and then push it to an offline system for deeper analysis. Therefore, BangDB connects with Hadoop and other long-term storage frameworks as well.

BangDB has an IO layer that uses SSD as an extension of RAM rather than a replacement for File System, thereby allowing out-of-memory computations and data handling without severe degradation in performance

BangDB also uses SSDs in a totally different way to achieve cost-effectiveness and elasticity. SSDs are typically used by others as a replacement for file systems (or HDDs), where the gain is limited and, if not used properly, both lifespan and performance degrade. BangDB’s software treats the SSD as an extension of memory, by which performance can be increased multifold and cost-effectiveness achieved to a great extent.

Remain predictive rather than only forensic or BI tool

BangDB aims to be predictive. It processes and analyzes data in both an absolute and a predictive manner: it uses complex event processing for absolute pattern recognition, and supervised and unsupervised machine learning for pattern and anomaly detection. The BangDB platform provides simple ways to upload and train models. It also integrates with “R” for data science requirements.

Ease of use

Both the appliance and open-source models provide a technology platform where new analytic applications or processing code must be developed and deployed on the production system. This requires typical test-to-production DevOps and release management. BangDB instead provides an integrated dashboard that makes the platform totally extensible: users can perform all actions using the dashboard without ever developing code or an application. Further, BangDB has developed pre-baked apps in different domains and uploaded them to its AppStore, so users can simply take those apps, configure them, and start dealing with real-time insights.

Time to market

The BangDB platform is hosted on the cloud as a SaaS model, along with an AppStore with several solutions. This allows users to start within an hour or even less. There is no stitching time, deployment time, or even development time; everything is ready to go with a set of clicks.

Deployment model

BangDB can be deployed within a device for state-based computations, including CEP and ML processing. Further, BangDB can run in a LAN and in the cloud too, and all of these can be interconnected for supercharged orchestration. BangDB has a subscription model, and users can start within a minute using BangDB SaaS and then grow as needed. Get started with BangDB by simply downloading it.

Further Reading: Check out two other blogs on similar topics, which might be useful.