Telecom Customer Churn Prediction using Artificial Intelligence - Prof. Ch, Study Guides, Projects, Research of Artificial Intelligence

A comprehensive analysis of a telecom customer churn prediction problem using artificial intelligence techniques. The dataset provided by ibm includes customer demographic, service usage, and account information. The main objective is to analyze the behavior of churning customers and develop strategies to increase customer retention. Various steps, including data exploration, feature engineering, model selection (decision tree classifier), and evaluation using metrics like confusion matrix, roc curve, and precision-recall curve. The analysis provides insights into the key factors influencing customer churn, such as monthly charges, tenure, and service usage. The findings can help the telecom company's retention team identify high-risk churn customers and implement targeted retention plans to improve customer loyalty and reduce attrition.

Typology: Study Guides, Projects, Research

2023/2024

Uploaded on 06/01/2024

khadija-yousaf
khadija-yousaf 🇵🇰

1 document

1 / 13

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
ARTIFICIAL INTELLIGENCE
COMPLEX ENGINEERING PROBLEM
Submitted To : Dr Umar Ch
Submitted By :
2020-CPE-03 Khadija Yousaf
2020-CEP-14 Usaid Ullah
2020-CPE-17 Arooj Fatima
2020-CPE-24 Hamid Majeed
pf3
pf4
pf5
pf8
pf9
pfa
pfd

Partial preview of the text

Download Telecom Customer Churn Prediction using Artificial Intelligence - Prof. Ch and more Study Guides, Projects, Research Artificial Intelligence in PDF only on Docsity!

ARTIFICIAL INTELLIGENCE

COMPLEX ENGINEERING PROBLEM

Submitted To : Dr Umar Ch

Submitted By :

2020 - CPE- 03 Khadija Yousaf

2020 - CEP-14 Usaid Ullah

2020 - CPE-17 Arooj Fatima

2020 - CPE- 24 Hamid Majeed

Telecom Customer Churn Prediction

1. Introduction

Dataset, Features and Target value

Here, IBM provided customer data for Telecom industry to predict churn customer based on demographic, usage and account based information. Main objective here is to analyze churn customers’ behavior and develop strategies to increase customer retention. Assumption — Here, data source has not provided any information related to time; So I have assumed that all the records are specific to the particular month. Dataset has information related to,

Demographic:

 Gender - Male / Female  Age range - In terms of Partner, Dependent and Senior Citizen

Services:

 Phone service - If customer has Phone service, then services related to Phone like;  Multiline Phone service  Internet Service - If customer has Internet service, then services related to Internet like;  Online security  Online backup  Device protection  Tech support  Streaming TV  Streaming Movies

Account type:

 Tenure - How long customer is with the company?  Contract type - What kind of contract they have with a company? Like  Monthly bases  On going bases - If on going bases, then One month contract or Two year contract  Paperless billing - Customer is paperless billion option or not?  Payment method - What kind of payment method customer has?  Mailed check

Customer ID contains no specific information about any customer hence we need to drop this column from our dataset Now, let us have a look at our target variable. Here the target variable is “Churn” i.e. if a customer is going to stop using the services of Telco or not. We can see the proportion of the people opting out of the Telco services (churners) through the following code. It builds a simple pie chart to show the distribution of two classes in the dataset.

We can see through the pie chart, that 26.5% of customers present in the dataset are actually the churners i.e the people who opt out of the Telco services. The two target variables have a class imbalance in this case. No is present 73.5% of the times whereas yes is present only 26.5% of the times. In order to have a good machine learning model, we need to have a balance of the classes say 50% of both the class or in general, equal representation of the classes in the target variable. There are techniques to manage the imbalanced classes like SMOTE but this is beyond the scope of this article for now. Decision trees can handle some imbalance present in the data better than other algorithms hence, we will not be using any technique to obtain balanced classes. Finally, we need to convert yes/no of the target variable to 0 or 1. Machine learning models do not understand categorical variables. They need numbers to implement the underlying algorithms. Hence, our task in the data processing stage is to convert all the categorical variables to numbers.

Analysing the number of months a person has stayed in the company

Let us analyse the number of months variable in order to understand the churn of the customers. We are plotting a distribution chart using the seaborn library in python through the following code! We can clearly observe that most numbers of customers leave before 20 months. Once a customer has stayed with Telco for more than 20 months, it is highly unlikely that he is going to stop using the services provided by the Telco

Handling the telephone lines variable

Converting telephone line variable into dummy values

Preprocessing OnlineSecurity OnlineBackup DeviceProtection TechSupport

StreamingTV StreamingMovie

Analysing monthly and total charges

In this section, we will analyse the total and monthly charges column to understand their effect on the churning of the customers. Below code outputs a box plot based on the monthly charges between the customers who have left using the services and those who have not. The high monthly charge is one cause for customers opting out of the services. The box plot also suggests that that loyal customers are those who have a low monthly charge.

Before we move on to the evaluation of our algorithm, let us quickly test the accuracy of our model by making some predictions. Below code makes some predictions on the test data. We are using cross-validation here in order to avoid bias while testing the model for accuracy.

Feature Importance

One great aspect of rule-based learning is that we can actually visualise how important a given feature is towards predicting our target variable. Below code forms a chart showing the relative importance of the features present in the data towards the prediction of our target churn class.

Evaluation

In order to evaluate our classification model, we are using the confusion matrix. A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. You can read more about the confusion matrix here. Following code generates the ROC curve for the predictions performed by our decision tree classifier. More the area under the ROC curve, better the prediction capability of the decision tree classifier.

Conclusion

I have tried to divide customer churn prediction problem into steps like exploration, profiling, clustering, model selection & evaluation and retention plans. Based on this analysis, we can help retention team to analyze high risk churn customers before they leave the company.