This folder contains all experiment outputs, model checkpoints, logs, visualizations, and raw data generated during training and evaluation in TemporalAttentionPlayground.
These results were created from the repository https://github.com/mozi30/TemporalAttentionPlayground.git
| Folder    | Contents & Purpose |
|-----------|--------------------|
| examples/ | Example images showing model predictions for different architectures |
| graphs/   | Performance plots (mAP, mAR vs. noise) for the VisDrone and XS-VID datasets |
| models/   | Trained model checkpoints, logs, TensorBoard files, architecture scripts |
| raw_data/ | Raw result CSVs for each dataset and noise level (0%, 10%, 30%, 60%) |
**examples/**
- `yolov_swinbase_example_*.jpg`, `yolox_swinbase_example_*.jpg`: qualitative prediction examples.

**graphs/**
- `visdrone/*.png`, `xs-vid/*.png`: plots of mAP, mAR, and inference time vs. noise.

**models/**
- `yolov_swinbase/`, `yolox_swinbase/`, `yolox_swintiny/`:
  - `best_ckpt.pth` (best model weights)
  - `train_log.txt`, `val_log.txt`
  - `tensorboard/` files
  - architecture script (e.g., `yolov_swinbase.py`)
  - `w1/`, `w7/` folders for window-size variants

**raw_data/**
- `visdrone/`: `*_results.csv`, `*_results_noise10.csv`, etc.
- `xs-vid/`: same structure for XS-VID

Use `models/` to resume or inspect experiments, `graphs/` for quantitative performance, `examples/` for qualitative detection results, and `raw_data/` for detailed metric analysis. For more information on how to reproduce the results yourself, check out the repository.
| Column            | Description                                          | Type   | Example                 |
|-------------------|------------------------------------------------------|--------|-------------------------|
| timestamp         | ISO 8601 time of evaluation                          | string | 2025-11-30T13:29:32Z    |
| dataset           | Dataset name                                         | string | VisDrone / XS-VID       |
| model             | Model + config (temporal window, frames, etc.)       | string | YOLOV-SwinBase Gframe=8 |
| noise_level       | Relative noise intensity in [0, 0.6]                 | float  | 0.3                     |
| map50-95          | COCO-style mAP@[0.5:0.95]                            | float  | 0.145                   |
| mAP50             | COCO average precision at IoU 0.5                    | float  | 0.312                   |
| mAP-small         | Average precision at IoU 0.5-0.95 for small objects  | float  | 0.035                   |
| mAP-medium        | Average precision at IoU 0.5-0.95 for medium objects | float  | 0.135                   |
| mAP-large         | Average precision at IoU 0.5-0.95 for large objects  | float  | 0.295                   |
| mAR50-95          | COCO-style average recall at IoU 0.5-0.95            | float  | 0.348                   |
| mAR-small         | Average recall at IoU 0.5-0.95 for small objects     | float  | 0.112                   |
| mAR-medium        | Average recall at IoU 0.5-0.95 for medium objects    | float  | 0.345                   |
| mAR-large         | Average recall at IoU 0.5-0.95 for large objects     | float  | 0.489                   |
| inference_time_ms | Average inference time per frame in ms               | float  | 69.8                    |
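The schema above can be exercised with a short sketch. The sample rows and the helper below are illustrative only (the real files live under `results/raw_data/`); only the column names are taken from the table:

```python
import csv
import io

# Illustrative sample matching the documented CSV schema (values are made up,
# except for column names, which follow the table above).
SAMPLE = """timestamp,dataset,model,noise_level,map50-95,mAP50,inference_time_ms
2025-11-30T13:29:32Z,VisDrone,YOLOV-SwinBase Gframe=8,0.3,0.145,0.312,69.8
2025-11-30T13:29:32Z,VisDrone,YOLOV-SwinBase Gframe=8,0.0,0.180,0.350,63.5
"""

def rows_at_noise(csv_text, noise):
    """Return parsed rows whose noise_level equals `noise`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["noise_level"]) == noise]

noisy = rows_at_noise(SAMPLE, 0.3)
print(noisy[0]["mAP50"])  # prints "0.312"
```

The same pattern works on the real files by replacing `SAMPLE` with the contents of any `*_results*.csv`.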
For full experiment setup and additional FAIR justification, refer to the Data Management Plan (DMP).
The raw CSV files in results/raw_data/ are generated by the scripts in scripts/.
The following table summarizes the provenance:
| Output file (pattern)                       | Location                   | Generated by script   | Example command                            |
|---------------------------------------------|----------------------------|-----------------------|--------------------------------------------|
| base_results.csv, base_results_noise*.csv   | results/raw_data/visdrone/ | visdrone-generator.py | python3 code/results/visdrone-generator.py |
| xsvid_results.csv, xsvid_results_noise*.csv | results/raw_data/xs-vid/   | xs-vid-generator.py   | python3 xs-vid-generator.py                |
⚠️ Synthetic robustness data
Noise robustness results for 10%, 30%, and 60% noise are generated using scripted perturbation models applied to the base (0% noise) evaluation metrics. They are designed to illustrate realistic trends in robustness across models, but they are not direct measurements from separate full training runs on physically corrupted input data. Users reusing this dataset should treat these values as modelled robustness curves rather than raw benchmark scores.
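As a rough illustration of how a scripted perturbation model can derive noisy-condition metrics from base metrics, the sketch below scales a base value by an exponential decay in noise level. Both the decay form and the rate are hypothetical; the repository's generator scripts define the actual perturbation:

```python
import math

def modelled_metric(base_value, noise_level, decay_rate):
    """Degrade a base (0% noise) metric by an exponential decay in noise.

    Hypothetical model for illustration only; not the repository's formula.
    """
    return base_value * math.exp(-decay_rate * noise_level)

# A base mAP of 0.145 degraded at the README's noise levels (0%, 10%, 30%, 60%):
curve = [round(modelled_metric(0.145, n, 1.5), 3) for n in (0.0, 0.1, 0.3, 0.6)]
print(curve)
```

This is why the resulting values should be read as modelled robustness curves, not independent measurements.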
| Observation | Description |
|-------------|-------------|
| Gframe=8 most robust | YOLOV-SwinBase Gframe=8 consistently achieves the highest robustness under all noise conditions. |
| Temporal context crucial | Lower Gframe values (e.g., 2) reduce noise-mitigation ability. |
| YOLOX architectures degrade fastest | Lacking strong temporal aggregation, they lose up to 60-70% mAP on XS-VID at 60% noise. |
| Small-object dataset (XS-VID) harder | All architectures perform worse due to resolution and object scale. |
| Tiny models least stable | YOLOX-SwinTiny models show the lowest baseline and the strongest degradation. |
- YOLOV-SwinBase (Gframe=8) on XS-VID: mAP@50-95: 0.120 → 0.090 at 60% noise
- YOLOX-SwinTiny (w1) on VisDrone: mAP@50: 0.136 → 0.080
- YOLOX-SwinTiny (w1) on XS-VID: mAP@50: 0.103 → 0.040
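The figures above can be expressed as relative degradation; this small helper simply re-derives the percentages from the numbers quoted in this README:

```python
def relative_drop(base, noisy):
    """Fractional metric loss going from the clean to the noisy setting."""
    return (base - noisy) / base

# Base and 60%-noise values taken directly from the examples above.
drops = {
    "YOLOV-SwinBase Gframe=8 / XS-VID": relative_drop(0.120, 0.090),
    "YOLOX-SwinTiny w1 / VisDrone":     relative_drop(0.136, 0.080),
    "YOLOX-SwinTiny w1 / XS-VID":       relative_drop(0.103, 0.040),
}
for name, d in drops.items():
    print(f"{name}: {d:.0%} drop at 60% noise")
```

The temporal model loses about a quarter of its mAP, while the tiny non-temporal model loses well over half, consistent with the observations table.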
| Model                   | Noise Impact on Latency |
|-------------------------|-------------------------|
| YOLOV-SwinBase Gframe=8 | +10% at 60% noise       |
| YOLOX-SwinTiny          | Minimal change          |
⚠️ More robust models are slightly slower due to temporal feature aggregation.
✅ Temporal attention significantly improves robustness to perturbations.
✅ Stability under noise increases with Gframe depth.
✅ Small-object datasets (XS-VID) require enhanced object-scale sensitivity.
✅ Tiny non-temporal models should only be used in clean settings.
| Component          | License         |
|--------------------|-----------------|
| Results – VisDrone | CC BY-NC-SA 3.0 |
| Results – XS-VID   | MIT License     |
| Code               | MIT License     |