Building Intelligent Systems: An Educational Machine Learning Project with Practical Applications
Keywords: Customer churn, telecom, logistic regression, random forest, machine learning, Streamlit, Telegram bot, data preprocessing, education
Abstract
This paper describes an educational data science project that uses machine learning to predict customer churn at a telecom company. The company wants to know why customers leave and which customers are at risk of churning. We analyzed, cleaned, and prepared a realistic, real-world-style dataset of 7,043 customers described by 21 features for modeling. The target variable is Churn (Yes/No).
During data preparation, missing values in the numerical variables tenure, MonthlyCharges, and TotalCharges were handled by converting the columns to a numeric type and imputing the median. Outliers and anomalies, such as negative tenure values and TotalCharges values above the 99th percentile, were corrected and capped. Categorical attributes were converted to numbers with label encoding, and non-informative identifier columns such as customerID were excluded from the model.
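The preparation steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the column names are taken from the text, negative tenure values are assumed to be clipped to 0 (the text only says they were treated), and the `preprocess` function name is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning steps described in the text (hypothetical helper)."""
    df = df.copy()

    # Drop the non-informative identifier column if it is present.
    df = df.drop(columns=["customerID"], errors="ignore")

    numeric_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
    for col in numeric_cols:
        # Coerce to numbers; unparseable entries such as blank strings become NaN.
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Assumption: negative tenure is treated as invalid and clipped to 0.
    df["tenure"] = df["tenure"].clip(lower=0)

    # Cap very large TotalCharges values at the 99th percentile (NaNs are ignored).
    cap = df["TotalCharges"].quantile(0.99)
    df["TotalCharges"] = df["TotalCharges"].clip(upper=cap)

    # Median imputation for any remaining missing numeric values.
    for col in numeric_cols:
        df[col] = df[col].fillna(df[col].median())

    # Label-encode every non-numeric column, including the Churn target (No=0, Yes=1).
    for col in list(df.columns):
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = LabelEncoder().fit_transform(df[col].astype(str))

    return df
```

LabelEncoder assigns integer codes in sorted order of the category values, which matches the label encoding named in the text; one-hot encoding is a common alternative when the codes should not imply an ordering.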
Two classification models were built and evaluated with the scikit-learn library: Logistic Regression and a Random Forest Classifier. The data was split into training and test sets in an 80/20 ratio. Logistic Regression reached about 80.4% accuracy on the test set and Random Forest about 79.1%, so Logistic Regression was selected as the best-performing model by accuracy. A confusion matrix was used to examine how many customers were correctly classified as “Stay” or “Leave” and how many were misclassified.
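The training and comparison procedure can be illustrated with scikit-learn. Because the dataset itself is not reproduced here, the sketch below substitutes synthetic data from `make_classification` (an assumption, with 19 predictors on the further assumption that the 21 features include customerID and Churn), so its accuracies will not match the reported 80.4% and 79.1%. The hyperparameters, random seeds, and stratified split are illustrative choices rather than the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed churn table (assumption: 19 predictors).
X, y = make_classification(n_samples=7043, n_features=19, n_informative=8, random_state=42)

# 80/20 train/test split; stratify keeps the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)

    # Rows are actual classes, columns are predicted classes (0 = Stay, 1 = Leave).
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    print(f"{name}: accuracy = {results[name]:.3f}")
    print(f"  correct Stay = {tn}, correct Leave = {tp}, "
          f"Stay predicted as Leave = {fp}, Leave predicted as Stay = {fn}")

# Select the model with the highest test accuracy, as described in the text.
best = max(results, key=results.get)
print("Selected model:", best)
```

Reading the confusion matrix this way separates correct “Stay” and “Leave” predictions from the two kinds of error, a distinction that a single accuracy figure hides.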
To make the project practical and easy to use, two user interfaces were implemented: a Streamlit web application and a Telegram bot written with python-telegram-bot. Both interfaces let users enter customer information and receive a churn prediction together with a risk level and business recommendations. The resulting system illustrates how machine learning, software engineering, and UI design can be combined into an end-to-end solution useful for both strategic and instructional purposes.
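Both interfaces need the same step of turning a predicted churn probability into a label, a risk level, and a recommendation. A minimal, framework-independent sketch is shown below; the thresholds, risk tiers, recommendation texts, and the `advise` helper are all hypothetical, since the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class ChurnAdvice:
    probability: float
    label: str           # "Stay" or "Leave"
    risk_level: str      # "Low", "Medium" or "High"
    recommendation: str


# Hypothetical cut-offs; the paper does not state the thresholds it uses.
MEDIUM_RISK = 0.3
HIGH_RISK = 0.6

# Hypothetical recommendation texts, one per risk tier.
RECOMMENDATIONS = {
    "Low": "No action needed; keep standard engagement.",
    "Medium": "Monitor the account and consider a targeted offer.",
    "High": "Contact the customer proactively with a retention offer.",
}


def advise(probability: float, decision_threshold: float = 0.5) -> ChurnAdvice:
    """Map a churn probability to a label, a risk tier and a recommendation."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    label = "Leave" if probability >= decision_threshold else "Stay"
    if probability >= HIGH_RISK:
        risk = "High"
    elif probability >= MEDIUM_RISK:
        risk = "Medium"
    else:
        risk = "Low"
    return ChurnAdvice(probability, label, risk, RECOMMENDATIONS[risk])
```

In the Streamlit app or the Telegram bot, the probability could come from the selected model's `predict_proba` output for the entered customer (column 1 being the churn class when Churn is encoded as 1), and the returned fields could then be displayed on the page or sent as a reply. Keeping this logic in one plain function lets both interfaces share it.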


