Lego Price Regression

doi:10.70124/f4hgt-3mv92

Published April 25, 2025 | Version 1.0

Model Open

Waser, Konstantin¹

1. TU Wien

Context and methodology

Research Domain:

This dataset was created as part of a machine-learning experiment in the domain of collectible toy valuation, in particular modelling the secondary (resale) price of LEGO® sets. It sits at the intersection of data science, retail analytics and cultural heritage (collector markets).

Purpose:

The goal is to predict a rounded, integer resale price for LEGO sets from a handful of easily available attributes (theme, subtheme, production year, piece count, and original MSRP). Framing this as a regression problem makes it possible to build and evaluate models that help collectors, resellers or analytics platforms estimate fair market values.

Creation of the dataset:

The raw data, already split into training, test and validation sets, was fetched via the DBRepo3 API. The categorical columns (theme, theme_group, subtheme) were label-encoded.
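The label-encoding step can be sketched as follows. This is a minimal illustration only: it assumes scikit-learn's LabelEncoder was applied column by column, which the record does not confirm, and the rows in the toy DataFrame are invented rather than taken from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows standing in for one split; the values are invented for illustration.
df = pd.DataFrame({
    "theme": ["City", "Star Wars", "City"],
    "theme_group": ["Modern day", "Licensed", "Modern day"],
    "subtheme": ["Police", "Episode IV", "Fire"],
    "pieces": [120, 540, 300],
})

# The three categorical columns named in the record.
categorical_columns = ["theme", "theme_group", "subtheme"]

# Keep one fitted encoder per column so the integer codes can be mapped back
# to the original labels later (assumption: one encoder per column).
encoders = {}
for column in categorical_columns:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

print(df)
```

LabelEncoder assigns codes in sorted label order, so the same fitted encoders would need to be reused on the test and validation splits to keep the codes consistent across splits.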

Technical details

The dataset contains the columns theme, theme_group, subtheme, age, pieces, msrp_int, price_int and id. No special folder hierarchy or additional naming conventions are used. Working with the dataset requires only a standard Python environment (version 3.8 or higher) with pandas and NumPy for data manipulation, scikit-learn for preprocessing and modeling, matplotlib for plotting, and the DBRepo3 REST client to fetch and store the splits. Supplementary materials include a Jupyter notebook on GitHub that demonstrates all steps: modeling, evaluation, and artifact serialization. The DBRepo3 persistent identifiers for the three splits are also provided: “2401ab5e-693b-4235-b14e-0f3eb53ec773” for training, “723c26fe-f89d-475c-a69e-83336b32c7bf” for test, and “1e5040fc-8d05-4480-8176-975ad338f4d3” for validation.
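The modeling and serialization steps might look roughly like the sketch below. Several points are assumptions rather than facts from this record: that the model is a scikit-learn RandomForestRegressor serialized with pickle (suggested only by the file name rf_price_predictor.pkl), that every column except id and price_int is used as a feature, and the hyperparameters. The training rows are synthetic placeholders, not the real data.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic placeholder rows using the column names listed in the record.
rng = np.random.default_rng(0)
n = 50
features = ["theme", "theme_group", "subtheme", "age", "pieces", "msrp_int"]
train = pd.DataFrame({
    "theme": rng.integers(0, 5, n),
    "theme_group": rng.integers(0, 3, n),
    "subtheme": rng.integers(0, 8, n),
    "age": rng.integers(1, 20, n),
    "pieces": rng.integers(50, 2000, n),
    "msrp_int": rng.integers(10, 300, n),
})
# Invented target: the real price_int values come from the dataset.
train["price_int"] = (train["msrp_int"] * 1.5).round().astype(int)

# Assumed model type and hyperparameters, chosen only for illustration.
model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(train[features], train["price_int"])

# Round the continuous regression output to the integer price the record targets.
predicted = np.rint(model.predict(train[features])).astype(int)

# Round-trip through pickle in memory, mirroring how a .pkl artifact is written and read.
payload = pickle.dumps(model)
restored = pickle.loads(payload)
restored_predicted = np.rint(restored.predict(train[features])).astype(int)
```

When loading a published .pkl file instead, the installed scikit-learn version should match the one used for training, and pickle files should only be loaded from sources you trust.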

Files

Files (10.0 MiB)

Name	Checksum	Size
rf_price_predictor.pkl	md5:33e1048973ea78488d888c303ceb5659	9.9 MiB
val_true_vs_pred.png	md5:5c54dc50debc9e374d997b0ebdfee078	84.5 KiB

Additional details

DOI: 10.70124/f4hgt-3mv92
