DMP: Predicting Road Accident Severity in Great Britain
Description
Context and Methodology
This dataset was created as part of a Data Stewardship course project at TU Wien. The research domain is road safety and supervised machine learning. The project applies three classification models (Decision Tree, Random Forest, and Gradient Boosting) to predict the severity of road traffic collisions in Great Britain (Fatal, Serious, or Slight) using police-reported data from 2019 to 2023.
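As a rough illustration of the three-model comparison described above, the following sketch trains the same classifier families on synthetic data. The synthetic features, the hyperparameters, and the choice of macro-averaged F1 as the metric are assumptions made for this example; they are not taken from the project code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the collision records (assumption).
X, y = make_classification(
    n_samples=1500, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named in the description; settings are illustrative.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Macro F1 weights every class equally, which suits rare classes (assumed metric).
    scores[name] = f1_score(y_test, model.predict(X_test), average="macro")
    print(f"{name}: macro F1 = {scores[name]:.3f}")
```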
The input data is the Road Safety Data – Collisions (Last 5 Years), published by the UK Department for Transport under the Open Government Licence v3.0, collected through the STATS19 police reporting system. The experiment addresses severe class imbalance (Fatal: 0.9%) using SMOTE oversampling applied exclusively to the training set.
Technical Details
The project follows a consistent folder structure:
data/raw/: original unmodified dataset (CSV)
data/processed/: cleaned and split CSV files (train, validation, test, train_resampled)
outputs/: generated PNG charts (prefixed with two-digit numbers indicating pipeline order)
src/: Python source code
README.md: full project documentation
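A small helper along these lines could be used to check that a local copy follows the layout listed above. It is only a sketch: the function names `missing_entries` and `ordered_outputs` are hypothetical and not part of the project's source code.

```python
from pathlib import Path

# Folder and file names taken from the structure listed above.
EXPECTED_DIRS = ["data/raw", "data/processed", "outputs", "src"]
EXPECTED_FILES = ["README.md"]


def missing_entries(root):
    """Return the expected folders and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


def ordered_outputs(root):
    """Return PNG chart names in outputs/, sorted by filename.

    Because the charts carry two-digit prefixes, a plain filename sort
    reproduces the pipeline order.
    """
    return sorted(p.name for p in (Path(root) / "outputs").glob("*.png"))


if __name__ == "__main__":
    # Check the current working directory; an empty list means nothing is missing.
    print(missing_entries("."))
```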
All data files are in CSV format, openable with any spreadsheet tool (e.g., Microsoft Excel) or Python (pandas). Output visualisations are PNG files. The experiment code requires Python 3 with the following free, open-source libraries: pandas, scikit-learn, imbalanced-learn, matplotlib, and numpy.