Published April 14, 2026 | Version v2
Report Open
DMP: Vienna Traffic Peak Volume Prediction
Description
Context and methodology
- This dataset supports an analysis of traffic congestion patterns in Vienna. It tests whether station-level vehicle count data can be used to classify traffic intensity into three categories: Low, Medium, and High. The raw data was downloaded from data.gv.at, Austria's open data portal, which hosts the City of Vienna's open datasets; none of it was collected manually. Every other file in the deposit was produced by a Python script that cleans the data, trains two classifiers, and saves the results.
Technical details
- There are nine files in total. D1 is the original CSV from data.gv.at, which uses semicolon delimiters and latin1 encoding and has 45,672 rows. D2 is the processed version of that file saved as a standard CSV. D3 contains the predictions made on the test set. D4, D5, and D6 are PNG images showing the traffic volume histogram, the confusion matrix, and the model comparison chart. D7 is the Python script. D8 is the README and D9 is the DMP.
- Two classification models were trained with scikit-learn: a Decision Tree with a maximum depth of 8 and a Random Forest with 100 estimators. The target variable was derived from the TVMAX column, which was binned into three congestion classes (Low, Medium, High) using its 33rd and 66th percentiles as cut-off points. The training features were ZNR, RINAME, and FZTYP, all label-encoded before being passed to the models. The dataset was split into training (70%), validation (15%), and test (15%) sets. The model with the higher validation accuracy was selected and then evaluated once on the test set.
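The steps described above can be sketched roughly as follows. This is an illustrative outline and not the deposited script (D7): the function names, the random seed, the handling of values that fall exactly on a percentile cut-off, and the two-stage use of `train_test_split` to obtain the 70/15/15 split are all assumptions. Only the column names, delimiter, encoding, and model settings come from the description.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["ZNR", "RINAME", "FZTYP"]  # feature columns named in the description
CLASS_NAMES = np.array(["Low", "Medium", "High"])


def load_raw(path):
    """Read the raw export: semicolon-delimited, latin1-encoded."""
    return pd.read_csv(path, sep=";", encoding="latin1")


def bin_tvmax(tvmax):
    """Map TVMAX values to Low/Medium/High via the 33rd and 66th percentiles."""
    values = np.asarray(tvmax, dtype=float)  # assumes no missing values
    q33, q66 = np.percentile(values, [33, 66])
    # Assumption: a value equal to a cut-off is placed in the lower class.
    idx = np.digitize(values, [q33, q66], right=True)
    return CLASS_NAMES[idx]


def run_pipeline(df, seed=42):
    """Encode features, split 70/15/15, and select a model on validation accuracy."""
    # Label-encode each feature column independently.
    X = pd.DataFrame(
        {col: LabelEncoder().fit_transform(df[col].astype(str)) for col in FEATURES}
    )
    y = bin_tvmax(df["TVMAX"])

    # First hold out 30%, then halve it into validation and test (15% each).
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, random_state=seed
    )

    models = {
        "decision_tree": DecisionTreeClassifier(max_depth=8, random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    # Fit both models on the training set and score them on the validation set.
    val_scores = {
        name: model.fit(X_train, y_train).score(X_val, y_val)
        for name, model in models.items()
    }
    best = max(val_scores, key=val_scores.get)

    # The selected model is scored on the test set exactly once.
    return {
        "best": best,
        "val_scores": val_scores,
        "test_accuracy": models[best].score(X_test, y_test),
        "sizes": (len(X_train), len(X_val), len(X_test)),
    }
```

Calling `run_pipeline(load_raw(path))` with `path` pointing at the raw CSV (D1) would return the name of the selected model together with its validation and test accuracy. Keeping the test set out of model selection is what makes the final test accuracy an unbiased estimate.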