Published April 28, 2025 | Version v1
Computational Notebook Open

Illicit antiques trade network knowledge graph

Creators

  • 1. ROR icon TU Wien

Contributors

  • 1. ROR icon TU Wien

Description

1. Dataset Description

Research Domain

This dataset was developed within the domain of digital humanities and criminology, specifically the study of illicit antiquities trafficking networks. It was created as part of the research outlined in the article: "Relationship Prediction in a Knowledge Graph Embedding Model of the Illicit Antiquities Trade" published in Advances in Archaeological Practice (Cambridge University Press).

Purpose

The dataset serves the purpose of transforming historical information about the Italian illicit antiquities trade into a machine learning-friendly knowledge graph format. and evaluating the performance of several embedding models on this dataset, as a way of exemplifying the usefulness of Knowledge Graphs in this domain.

This enables:

  • Relationship prediction through knowledge graph embeddings.
  • Exploration of networked criminal behavior through computational methods.
  • Development of "digital tools" for combating illicit antiquities trading, responding to calls from funding bodies like the EU Horizon Europe scheme.

By producing this Notebook, I aim to support researchers and practitioners in better understanding and predicting the structure of hidden criminal networks.

Data Sources

The dataset is based on an 1995 "organigram" seized by the Carabinieri, depicting the network of antiquities traffickers. Supplementary information from the Trafficking Culture Project Encyclopedia (as of May 2022) has been added by the authors. The data reflects a bounded body of knowledge as curated by the Trafficking Culture Project and is not an exhaustive or final catalog of the trade.

Method of Dataset Creation

The original dataset was created through the following steps:

  • Extraction of network data from historical sources and digital encyclopedic entries.
  • Conversion into a knowledge graph where nodes represent individuals/entities and edges represent relationships.

Further it has been processed as such:

  • Splitting the graph into training, validation, and testing datasets (80%/10%/10% split). A fixed random seed (42) was used throughout to ensure reproducibility.
  • Mapping entity and relation IDs back to their labels for human interpretability.
  • Used  for training multiple knowledge graph embedding models (TransE, ComplEx, ConvE, DistMult) using PyKEEN pipelines.

The best ranking model was further used for UMAP dimensionality reduction and Cosine similarity search between entities

The knowledge Graph was enriched with semi-automatically generated Labels using the spaCy library.

 

2. Technical Details


Dataset Structure

The original dataset is organized into a single .CSV file with columns: head — the subject entity, relationship — the predicate (relation type), tail — the object entity. Each row represents a triple in the knowledge graph.

An additional column, split, indicates membership in "train," "validation," or "test" subsets.

 

Naming Conventions

Columns are labeled with descriptive names.

Entity and relation labels have been mapped back from their original integer IDs to human-readable names.

Dataset splits are explicitly labeled (train, validation, test).

Measurement results and trained models are saved under suggestive names in  dedicated folders

Images and Figures are saved as .JPG in a dedicated folder. These include

  • A histogram visualizes the distribution of different relationship types across the dataset.
  • A spring layout graph, where edges between entities are color-coded based on their relationship type.
  • Bar plots vizualising how the embedding models scored in certain relevant metrics

Required Software

The dataset is provided as a Pandas DataFrame or can be exported as a CSV.

Requires Python libraries such as:

  • pandas
  • numpy
  • torch (for tensor manipulations)
  • spaCy
  • PyKEEN library for knowledge graph embedding modeling. The latest stable version of Python that supports PyKEEN is 3.9 (according to PyKEEN documentation)

Additional Resources

Original Research Article: https://doi.org/10.1017/aap.2023.1

Trafficking Culture Encyclopedia: https://traffickingculture.org/encyclopedia/

PyKEEN Documentation: https://pykeen.readthedocs.io/en/stable

 

3. Further Details


Data Limitations:

The source material is historically bounded and non-exhaustive. There may be bias based on the original focus areas of the Trafficking Culture Project. While based on public historical documents and research, users should exercise caution if trying to infer sensitive or personally identifiable information.

Licensing and Attribution:

Appropriate credit should be given to the original authors and the Trafficking Culture Project, as the data is under ... license.

Recommended Uses:

  • Academic research into illicit trade networks.
  • Development and testing of knowledge graph embedding models.
  • Exploratory work on machine learning approaches to organized crime network analysis.

Known Issues:

Some relationships in the original data are inferred or reconstructed based on available evidence, rather than direct observation.

Model performance may vary based on sparsity and the nature of the network structure.

Active and passive voice do not reflect in the directionality of the relationships.

Files

generated_labels.csv

Files (14.5 MiB)

Name Size
md5:e8c302f066b4c92a69895db1a3fe7022
888.9 KiB Download
md5:a96ec78c6907dd96333c68cf4f874ba8
93.3 KiB Download
md5:628caa7ef3cc2546580409d6a1dc4037
13.0 KiB Preview Download
md5:66d076399b0da20f703916a2b1b62278
63.2 KiB Preview Download
md5:5abfacd2892fd36404d835bc2178e115
27.7 KiB Preview Download
md5:e60f4ddd353b068b2450a843c19a18cf
3.6 MiB Preview Download
md5:df605191cc80c8d8b850b13647fcd70e
270.2 KiB Preview Download
md5:f1d7eeac8df67c3e9f408db51738ba1f
111.2 KiB Download
md5:db75a4378f576d167e1ebd33f82253af
875.2 KiB Download
md5:79756aac9b2cb8c432dc8808fcad870a
8.4 MiB Download
md5:182f39fe94fdb049a2b96e9c3f759421
111.8 KiB Download
md5:c67837c2431abf384025749cf3a2341b
3.6 KiB Preview Download
md5:1fee653cf9b2b526c2978ad0548eaaa8
3.1 KiB Preview Download
md5:65d7942202097ff29936994399afb8d1
12.0 KiB Preview Download
md5:38202dbf4ed7cfe6785f7f2da85a5a86
12.3 KiB Preview Download
md5:28dbe01aa419b00d9536824b9f1db185
12.3 KiB Preview Download
md5:480a8dbefcfca1816052c81c26b67463
11.9 KiB Preview Download
md5:66fbb53ec98362d5607cdaabb95a7c9b
28.7 KiB Preview Download
md5:43774cdd7346f9b82d6ac890341dc95d
337 Bytes Preview Download
md5:73cc454da3e1d78125190ef41723219e
3.9 KiB Download
md5:23a14d3df04ed5cc62196ab3e5c257db
84.0 KiB Preview Download

Additional details

Related works

Cites
Journal Article: 10.1017/aap.2023.1 (DOI)
Is source of
Dataset: 10.82556/xfvk-2712 (DOI)