Published April 28, 2025 | Version v1
Dataset Open

Telco_Customer_churn_Data

Creators

  • 1. TU Wien

Description

Context and Methodology

The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).

The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.

The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.

Technical Details

The dataset has a tabular structure and was initially stored in CSV format. It contains:

  • Rows: 7,043 customer records

  • Columns: 21 in total, covering customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
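As a quick sanity check of the structure described above, a CSV export can be compared against the stated row count, column count, and target labels. This is a minimal sketch using only the Python standard library. The function names and the idea of reading from an open file handle are illustrative assumptions; only the figures 7,043 and 21 and the column name Churn come from this record.

```python
import csv
from collections import Counter

# Figures stated in this record's description.
EXPECTED_ROWS = 7043
EXPECTED_COLUMNS = 21
TARGET = "Churn"


def summarize_churn_csv(handle):
    """Count rows, columns, and target labels in an open CSV file handle."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    if TARGET not in columns:
        raise ValueError(f"expected a {TARGET!r} column, found {columns}")
    labels = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        labels[row[TARGET]] += 1
    return {"rows": n_rows, "columns": len(columns), "labels": dict(labels)}


def matches_description(summary):
    """Return True if the summary agrees with the documented structure."""
    return (
        summary["rows"] == EXPECTED_ROWS
        and summary["columns"] == EXPECTED_COLUMNS
        and set(summary["labels"]) <= {"Yes", "No"}
    )
```

With a local copy of the CSV (the path is up to the user), the check could be run as `with open(path, newline="") as fh: print(matches_description(summarize_churn_csv(fh)))`.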

Naming Convention:

  • The table in the database is named telco_customer_churn_data.
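Reading this table through a SQLAlchemy connection to MariaDB, as mentioned under Context and Methodology, might look roughly like the sketch below. The host, database name, credentials, and the choice of the `mysql+pymysql` driver are assumptions and must be replaced with the values of the actual DBRepo deployment; the helper names are hypothetical. Only the table name is taken from this record.

```python
from urllib.parse import quote_plus

# Table name as given in this record.
TABLE_NAME = "telco_customer_churn_data"


def build_db_url(user, password, host, database, port=3306,
                 driver="mysql+pymysql"):
    """Build a SQLAlchemy-style database URL.

    The driver is an assumption; any MariaDB-compatible driver installed
    locally can be substituted. Credentials are URL-escaped so that
    characters such as '@' do not break the URL.
    """
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def load_churn_table(db_url, table_name=TABLE_NAME):
    """Load the whole table into a pandas DataFrame."""
    # Imported here so the URL helper above works without these packages.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(db_url)
    try:
        return pd.read_sql_table(table_name, con=engine)
    finally:
        engine.dispose()
```

A call such as `load_churn_table(build_db_url(user, password, host, database))` would then return the records as a DataFrame, provided the connection details are valid.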

Software Requirements:

  • The dataset can be opened with any standard database client or any programming language that supports MariaDB connections (e.g., Python).

  • For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
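To illustrate how the libraries listed above fit together, the following sketch trains a simple classifier on a DataFrame with a Churn column and leaves the fitted pipeline ready to be saved with joblib. It is not the project's actual model: logistic regression, the dtype-based split into numeric and categorical columns, and the function name are all assumptions made for illustration. Any identifier column should be dropped before calling it, since one-hot encoding a unique ID adds no signal.

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def train_churn_model(df: pd.DataFrame, target: str = "Churn") -> Pipeline:
    """Fit a preprocessing + logistic regression pipeline (illustrative)."""
    # Encode the target as 1 (churn) / 0 (no churn).
    y = (df[target] == "Yes").astype(int)
    X = df.drop(columns=[target])

    # Split features by dtype: scale numbers, one-hot encode everything else.
    numeric = X.select_dtypes(include="number").columns.tolist()
    categorical = [c for c in X.columns if c not in numeric]

    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X, y)
    return model
```

The fitted pipeline can be persisted with `joblib.dump(model, "churn_model.joblib")` and restored with `joblib.load("churn_model.joblib")`; the file name here is only an example. For a real evaluation, the data should first be split into training and test sets.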

Additional Resources:

Further Details

When reusing the dataset, users should be aware:

  • Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

  • Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).

  • Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
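For the binary churn vs. no churn task named above, predictions are commonly summarized with a confusion matrix and the metrics derived from it. The sketch below computes these in plain Python, treating "Yes" as the positive class. The function names are hypothetical, and this is not a description of how the confusion_matrix.png file in this record was produced.

```python
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def precision_recall_f1(counts):
    """Derive precision, recall, and F1 from confusion counts.

    Each metric falls back to 0.0 when its denominator is zero.
    """
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Because churners are usually the minority class in data of this kind, recall and precision on the "Yes" class tend to be more informative than overall accuracy.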

Files

confusion_matrix.png

Files (270.7 KiB)

Checksum                              Size
md5:29db9b955e6fe7ad83d063f2d908eacc  20.7 KiB
md5:ffde531d6e44a6df1d2d49fe67fefee4  130.7 KiB
md5:8821c30ce8788ec44ac0e87e0b6581b8  119.3 KiB

Additional details

Dates

Submitted
2025-04-28