Published April 23, 2025 | Version v1
Model Open

BEST Office TOF Sensor Activity

  • 1. TU Wien

Description

Abstract

This experiment trains a machine learning model to predict door activity levels from time-based features. A time-of-flight (TOF) sensor mounted near a doorway collected distance measurements, which were filtered, aggregated, and labelled as activity counts in 10-minute bins. These counts served as the training targets for a gradient boosting model. The model's predictions were evaluated against a holdout test set and visualised in a time series plot. The goal of the project was to demonstrate basic principles of reproducible modelling and data publication, including FAIR metadata.


Context and methodology

This upload is part of a coursework project in the Data Stewardship course, TU Wien, 2025 summer semester. The model and results were generated for the assignment.

The dataset consists of distance measurements captured by a TOF sensor installed near a door in the BEST Office (Room Code ACEG38). The data was cleaned, filtered, and aggregated into 10-minute activity intervals. The machine learning pipeline was implemented in Python and includes preprocessing, model training, and evaluation stages. A gradient-boosted regression model was trained to predict activity counts from time-based features (hour, minute, day of week, and a weekend flag).
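The aggregation and feature derivation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing code: the column names `timestamp` and `distance_mm`, the 1000 mm threshold, and the rule that every reading closer than the threshold counts as one activity event are all assumptions made for the example.

```python
import pandas as pd


def aggregate_activity(readings: pd.DataFrame, threshold_mm: float = 1000.0) -> pd.DataFrame:
    """Aggregate raw TOF readings into 10-minute activity counts with time features.

    Assumptions (hypothetical, not taken from the published pipeline):
    - `readings` has a datetime column `timestamp` and a numeric column `distance_mm`
    - a reading closer than `threshold_mm` counts as one activity event
    """
    indexed = readings.set_index("timestamp").sort_index()

    # Mark each reading as an event (1) or not (0), then sum per 10-minute bin.
    is_event = (indexed["distance_mm"] < threshold_mm).astype(int)
    bins = is_event.resample("10min").sum().rename("activity_count").to_frame()

    # Time-based features named in the description: hour, minute, day of week, weekend.
    idx = bins.index
    bins["hour"] = idx.hour
    bins["minute"] = idx.minute
    bins["day_of_week"] = idx.dayofweek  # Monday = 0, Sunday = 6
    bins["is_weekend"] = (idx.dayofweek >= 5).astype(int)
    return bins
```

Summing a 0/1 event column per bin, rather than counting only the filtered rows, keeps bins with no activity in the output with a count of zero, which matters when the counts are used as regression targets.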

Technical details

  • Language and environment: Python 3.12, using scikit-learn, pandas, matplotlib

  • Model type: HistGradientBoostingRegressor from scikit-learn, trained using Poisson loss

  • Output files:

    • output_model.pkl: The trained machine learning model (serialised using joblib)

    • test_predictions_plot.png: A visual comparison of predicted vs. actual activity counts for the test set

The plot illustrates model performance using the test partition of the dataset, showing how well the predicted activity matches actual observations over time.

Provenance and metadata

  • Creator: Raphael-Hafis Kretschmer, TU Wien

  • Year: 2025

  • Language: Python

  • Dependencies: scikit-learn, matplotlib, pandas

  • Format: .pkl, .png

  • Model metadata is described using elements of the FAIR4ML schema (e.g. software environment, training parameters, target variable).

  • All code used to train and evaluate the model is version-controlled and publicly available on GitHub.

Files

codemeta.json

Files (417.3 KiB)

Checksum (Size)

  • md5:22bc409bab1f749798d64bd434384ed6 (1.3 KiB)

  • md5:edbaf18fc8e62df37255dc35b18c8e93 (348.6 KiB)

  • md5:0972600c881570dede26f4a122f86bee (1.6 KiB)

  • md5:9680cfbd55bc61b9d38b822dc9eb17b5 (65.8 KiB)

Additional details