Telco_Customer_churn_Data

doi:10.82556/b0ch-cn44

Published April 28, 2025 | Version v1

Dataset Open

Naz, Erum¹

1. TU Wien

Context and Methodology

The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).

The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.

The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.

Technical Details

The dataset has a tabular structure and was initially stored in CSV format. It contains:

Rows: 7,043 customer records
Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
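The stated shape can be checked after exporting the table back to CSV. The following is a minimal sketch using only the standard library. The `summarize` helper and the sample column names (`gender`, `tenure`, `Contract`) are illustrative assumptions; only the target name `Churn` and the expected counts come from the description above.

```python
import csv
import io

EXPECTED_ROWS = 7043     # row count stated in this record
EXPECTED_COLUMNS = 21    # column count stated in this record


def summarize(csv_file, target="Churn"):
    """Return (row_count, column_count, sorted distinct target values)."""
    reader = csv.DictReader(csv_file)
    labels = set()
    n_rows = 0
    for row in reader:
        n_rows += 1
        labels.add(row[target])
    return n_rows, len(reader.fieldnames), sorted(labels)


# Tiny in-memory stand-in with made-up values, only to show the call.
sample = io.StringIO(
    "gender,tenure,Contract,Churn\n"
    "Female,1,Month-to-month,Yes\n"
    "Male,34,One year,No\n"
)
print(summarize(sample))
```

For a real export, the first two returned values would be compared against `EXPECTED_ROWS` and `EXPECTED_COLUMNS`, and the third against `["No", "Yes"]`.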

Naming Convention:

The table in the database is named telco_customer_churn_data.

Software Requirements:

To open and work with the dataset, any standard database client or programming language that supports MariaDB connections can be used (e.g., Python).
For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
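As an illustration of the access path described above, the sketch below builds a SQLAlchemy connection URL for MariaDB and reads the table into pandas. It is not the project's actual loading code. The host, port, credentials, database name, and the choice of the `pymysql` driver are assumptions; only the table name `telco_customer_churn_data` is taken from this record.

```python
from urllib.parse import quote_plus


def build_mariadb_url(user, password, host, port, database, driver="pymysql"):
    """Build a SQLAlchemy URL for a MariaDB server.

    All argument values are placeholders: the real host, port, and
    credentials belong to the DBRepo instance and are not given here.
    """
    # Percent-encode credentials so characters such as '@' or '/' stay valid.
    return (
        f"mariadb+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def load_table(url, table="telco_customer_churn_data"):
    """Read the whole table into a pandas DataFrame.

    Requires sqlalchemy, pandas, and the chosen driver to be installed.
    """
    # Imported lazily so the URL helper works without the optional packages.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    return pd.read_sql_table(table, engine)
```

A call would look like `load_table(build_mariadb_url("user", "secret", "db.example.org", 3306, "churn_db"))`, with every value replaced by the details of the actual database.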

Additional Resources:

Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn

Further Details

When reusing the dataset, users should be aware of the following:

Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).
Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
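To make the binary framing concrete, the sketch below maps the Yes/No target to 1/0 and tallies a confusion matrix in plain Python. The helper names and the label lists are made up for illustration; they are not taken from `prediction_results.csv`, whose column layout is not described in this record.

```python
def encode_churn(label):
    """Map the Yes/No target to 1/0, rejecting anything else."""
    mapping = {"Yes": 1, "No": 0}
    try:
        return mapping[label.strip()]
    except KeyError:
        raise ValueError(f"unexpected Churn value: {label!r}") from None


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Made-up labels for illustration only.
actual = [encode_churn(v) for v in ["Yes", "No", "No", "Yes", "No"]]
predicted = [encode_churn(v) for v in ["Yes", "No", "Yes", "No", "No"]]

tn, fp, fn, tp = confusion_counts(actual, predicted)
# Guard against division by zero when a class is never predicted or present.
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(tn, fp, fn, tp, precision, recall)
```

Precision and recall for the churn class are usually more informative than accuracy alone when the two classes are not equally frequent.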

Files

Files (270.7 KiB)

Name	MD5	Size
confusion_matrix.png	29db9b955e6fe7ad83d063f2d908eacc	20.7 KiB
prediction_results.csv	ffde531d6e44a6df1d2d49fe67fefee4	130.7 KiB
trained_churn_model.pkl	8821c30ce8788ec44ac0e87e0b6581b8	119.3 KiB

Additional details

Submitted: 2025-04-28
