







Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
A comprehensive analysis of a telecom customer churn prediction problem using artificial intelligence techniques. The dataset provided by ibm includes customer demographic, service usage, and account information. The main objective is to analyze the behavior of churning customers and develop strategies to increase customer retention. Various steps, including data exploration, feature engineering, model selection (decision tree classifier), and evaluation using metrics like confusion matrix, roc curve, and precision-recall curve. The analysis provides insights into the key factors influencing customer churn, such as monthly charges, tenure, and service usage. The findings can help the telecom company's retention team identify high-risk churn customers and implement targeted retention plans to improve customer loyalty and reduce attrition.
Typology: Study Guides, Projects, Research
1 / 13
This page cannot be seen from the preview
Don't miss anything!








Here, IBM provided customer data for Telecom industry to predict churn customer based on demographic, usage and account based information. Main objective here is to analyze churn customers’ behavior and develop strategies to increase customer retention. Assumption — Here, data source has not provided any information related to time; So I have assumed that all the records are specific to the particular month. Dataset has information related to,
Gender - Male / Female Age range - In terms of Partner, Dependent and Senior Citizen
Phone service - If customer has Phone service, then services related to Phone like; Multiline Phone service Internet Service - If customer has Internet service, then services related to Internet like; Online security Online backup Device protection Tech support Streaming TV Streaming Movies
Tenure - How long customer is with the company? Contract type - What kind of contract they have with a company? Like Monthly bases On going bases - If on going bases, then One month contract or Two year contract Paperless billing - Customer is paperless billion option or not? Payment method - What kind of payment method customer has? Mailed check
Customer ID contains no specific information about any customer hence we need to drop this column from our dataset Now, let us have a look at our target variable. Here the target variable is “Churn” i.e. if a customer is going to stop using the services of Telco or not. We can see the proportion of the people opting out of the Telco services (churners) through the following code. It builds a simple pie chart to show the distribution of two classes in the dataset.
We can see through the pie chart, that 26.5% of customers present in the dataset are actually the churners i.e the people who opt out of the Telco services. The two target variables have a class imbalance in this case. No is present 73.5% of the times whereas yes is present only 26.5% of the times. In order to have a good machine learning model, we need to have a balance of the classes say 50% of both the class or in general, equal representation of the classes in the target variable. There are techniques to manage the imbalanced classes like SMOTE but this is beyond the scope of this article for now. Decision trees can handle some imbalance present in the data better than other algorithms hence, we will not be using any technique to obtain balanced classes. Finally, we need to convert yes/no of the target variable to 0 or 1. Machine learning models do not understand categorical variables. They need numbers to implement the underlying algorithms. Hence, our task in the data processing stage is to convert all the categorical variables to numbers.
Let us analyse the number of months variable in order to understand the churn of the customers. We are plotting a distribution chart using the seaborn library in python through the following code! We can clearly observe that most numbers of customers leave before 20 months. Once a customer has stayed with Telco for more than 20 months, it is highly unlikely that he is going to stop using the services provided by the Telco
Converting telephone line variable into dummy values
In this section, we will analyse the total and monthly charges column to understand their effect on the churning of the customers. Below code outputs a box plot based on the monthly charges between the customers who have left using the services and those who have not. The high monthly charge is one cause for customers opting out of the services. The box plot also suggests that that loyal customers are those who have a low monthly charge.
Before we move on to the evaluation of our algorithm, let us quickly test the accuracy of our model by making some predictions. Below code makes some predictions on the test data. We are using cross-validation here in order to avoid bias while testing the model for accuracy.
One great aspect of rule-based learning is that we can actually visualise how important a given feature is towards predicting our target variable. Below code forms a chart showing the relative importance of the features present in the data towards the prediction of our target churn class.
In order to evaluate our classification model, we are using the confusion matrix. A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. You can read more about the confusion matrix here. Following code generates the ROC curve for the predictions performed by our decision tree classifier. More the area under the ROC curve, better the prediction capability of the decision tree classifier.
I have tried to divide customer churn prediction problem into steps like exploration, profiling, clustering, model selection & evaluation and retention plans. Based on this analysis, we can help retention team to analyze high risk churn customers before they leave the company.