Published April 9, 2026 | Version v1
Dataset Open

DMP: Predicting Road Accident Severity in Great Britain

  • TU Wien

Description

Context and Methodology

This dataset was created as part of a Data Stewardship course project at TU Wien. The research domain is road safety and supervised machine learning. The project applies three classification models (Decision Tree, Random Forest, and Gradient Boosting) to predict the severity of road traffic collisions in Great Britain (Fatal, Serious, or Slight) using police-reported data from 2019 to 2023.
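As a rough illustration of the three-model comparison described above, the sketch below fits a Decision Tree, a Random Forest, and a Gradient Boosting classifier with scikit-learn. It runs on synthetic data from make_classification, not on the STATS19 records. The hyperparameters, the class proportions, and the macro F1 metric are assumptions chosen for the example, not settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the collision features (assumption: the real
# project uses STATS19 columns instead). Three imbalanced classes play
# the role of Slight / Serious / Fatal; the proportions are illustrative.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=6,
    n_classes=3,
    weights=[0.80, 0.17, 0.03],
    random_state=0,
)

# A stratified split keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters are placeholders, not the project's settings.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=8, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Macro F1 weights every class equally, so the rare class counts too.
    scores[name] = f1_score(y_test, model.predict(X_test), average="macro")
    print(f"{name}: macro F1 = {scores[name]:.3f}")
```

Macro-averaged F1 is used here only because plain accuracy can look high on imbalanced data while the rare class is ignored; the project itself may report different metrics.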

The input data is the Road Safety Data – Collisions (Last 5 Years), published by the UK Department for Transport under the Open Government Licence v3.0, collected through the STATS19 police reporting system. The experiment addresses severe class imbalance (Fatal: 0.9%) using SMOTE oversampling applied exclusively to the training set.

Technical Details

The project follows a consistent folder structure:

  • data/raw/: original unmodified dataset (CSV)
  • data/processed/: cleaned and split CSV files (train, validation, test, train_resampled)
  • outputs/: generated PNG charts (prefixed with two-digit numbers indicating pipeline order)
  • src/: Python source code
  • README.md: full project documentation

All data files are in CSV format, openable with any spreadsheet tool (e.g., Microsoft Excel) or Python (pandas). Output visualisations are PNG files. The experiment code requires Python 3 with the following free, open-source libraries: pandas, scikit-learn, imbalanced-learn, matplotlib, and numpy.
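The folder conventions above can be expressed as two small helpers, shown here as a sketch using only the Python standard library. The exact file names (for example train.csv or 01_class_distribution.png) are hypothetical; the record states only the split names and the two-digit prefix convention.

```python
import re
from pathlib import Path

# Split names as listed for data/processed/; the ".csv" file names built
# from them are an assumption about how the files are actually named.
SPLITS = ("train", "validation", "test", "train_resampled")


def processed_paths(root):
    """Map each split name to its expected CSV path under data/processed/."""
    base = Path(root) / "data" / "processed"
    return {split: base / f"{split}.csv" for split in SPLITS}


def charts_in_pipeline_order(filenames):
    """Return PNG names with a two-digit prefix, sorted by that prefix."""
    pattern = re.compile(r"^(\d{2})\D.*\.png$")
    numbered = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


# Hypothetical file names, used only to demonstrate the helpers.
paths = processed_paths("project")
charts = charts_in_pipeline_order(
    ["02_confusion_matrix.png", "01_class_distribution.png", "notes.txt"]
)
print(paths["train"])
print(charts)
```

Each path returned by processed_paths can be passed directly to pandas.read_csv once the files exist at those locations.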

Files

DMP Predicting Road Accident Severity in Great Britain.pdf

Files (801.3 KiB)

Name: DMP Predicting Road Accident Severity in Great Britain.pdf
md5: 2dcc238cc74038586b43e067d868ba4c
Size: 801.3 KiB