Data Clustering Analysis
Credit Card Data Clustering Analysis using Kmeans on BangDB
This case requires developing a customer segmentation to define marketing strategy. The Dataset summarizes the usage behavior of about 14360 active credit card holders during the last 6 months. The file is at a customer level with 18 behavioural variables.
This is taken from Kaggle, here is the problem statement in detail
Build a model to perform customer segmentation.
Get help doc
Download the file and follow the help doc to complete the task
The dataset contains 14360 events and 18 attributes. The dataset consists of both continuous as well as categorical attributes. The columns are CUSTID,BALANCE,BALANCEFREQUENCY, PURCHASES, ONEOFFPURCHASES, INSTALLMENTSPURCHASES, CASHADVANCE, PURCHASESFREQUENCY, ONEOFFPURCHASESFREQUENCY, ONEOFFPURCHASESFREQUENCY,CASHADVANCE, PURCHASESFREQUENCY, ONEOFFPURCHASESFREQUENCY, PURCHASESINSTALLMENTSFREQUENCY, CASHADVANCEFREQUENCY, CASHADVANCETR, PURCHASESTRX, CREDITLIMIT, PAYMENTS, MINIMUM_PAYMENTS, PRCFULLPAYMENT, TENURE.
We have to do customer segmentation for the given problem. To do this we are selecting the kernel k-means algorithm( KKmeans). In kernel k-means before clustering, points are mapped to a higher-dimensional feature space using a nonlinear function, and then kernel k-means partitions the points by linear separators in the new space.
There are three ways to train a model on bangdb. One is to directly register the meta_data for training ( we call it json request which contains all the details about the model ), the second is to create mage_data for training by following the workflow on CLI and the third is by using api’s.. Here we will be training the model using all methods.
On bangdb, we have the option of training a model from a file( file format can be CSV or JSON) containing training data or from streaming data. Here we are training the model from a CSV file.
using Cli workflow