Analysis of programming language popularity and job market demand
Description
Context and Methodology
This dataset was created as part of the research project "Analysis of Programming Language Popularity Trends (2021-2025)" at TU Wien. The purpose of this dataset is to correlate professional developer usage (derived from the Stack Overflow Annual Developer Survey) with general interest trends (derived from the PYPL Index) to provide a comprehensive view of programming language adoption.
Methodology:
The data was generated using a custom Python ETL (Extract, Transform, Load) pipeline.
Input 1: Raw survey responses for 2021–2025 were downloaded from the official Stack Overflow survey archive: https://survey.stackoverflow.co/
Input 2: Historical search trend data was scraped from the PYPL website's source files: https://pypl.github.io/PYPL.html
Processing: Both datasets were cleaned, harmonized (language names standardized across the two sources), and merged by time period and language using the Pandas library.
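The processing step can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual pipeline code. It assumes the survey export stores languages as a semicolon-separated column named `LanguageHaveWorkedWith` (as in recent Stack Overflow survey releases), that the PYPL data has already been parsed into a table with `Year`, `Language` and `Search_Share_Percent` columns, and that monthly search shares are averaged per year to line up with the annual survey. The alias map `LANGUAGE_ALIASES` and the functions `harmonize`, `survey_usage` and `merge_sources` are hypothetical names introduced here.

```python
import pandas as pd

# Hypothetical alias map for harmonizing language names across sources;
# the real pipeline's mapping is not documented here.
LANGUAGE_ALIASES = {
    "Bash/Shell (all shells)": "Shell",
    "Bash/Shell": "Shell",
    "Golang": "Go",
}


def harmonize(name: str) -> str:
    """Trim whitespace and map known aliases to one canonical name."""
    name = name.strip()
    return LANGUAGE_ALIASES.get(name, name)


def survey_usage(responses: pd.DataFrame, year: int,
                 column: str = "LanguageHaveWorkedWith") -> pd.DataFrame:
    """Percent of respondents who answered `column` and named each language."""
    answered = responses[column].dropna()
    exploded = answered.str.split(";").explode().map(harmonize)
    # Count each respondent at most once per language, since two aliases
    # can collapse to the same canonical name.
    pairs = exploded.reset_index().drop_duplicates()
    counts = pairs[column].value_counts()
    usage = (counts / len(answered) * 100).rename("Survey_Usage_Percent")
    out = usage.rename_axis("Language").reset_index()
    out["Year"] = year
    return out


def merge_sources(survey: pd.DataFrame, search: pd.DataFrame) -> pd.DataFrame:
    """Join yearly survey usage with search share averaged over each year (assumed aggregation)."""
    search = search.assign(Language=search["Language"].map(harmonize))
    yearly_search = (
        search.groupby(["Year", "Language"], as_index=False)["Search_Share_Percent"].mean()
    )
    # Outer join keeps languages that appear in only one source (NaN elsewhere).
    return survey.merge(yearly_search, on=["Year", "Language"], how="outer")
```

The outer join is a deliberate choice in this sketch: a language listed in the survey but absent from PYPL (or vice versa) stays visible as a missing value instead of being silently dropped. Whether the published CSV keeps monthly rows or yearly averages is not stated beyond the "Year/Month" note, so the averaging step may differ from the real pipeline.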
Technical Details
File Format: The main output is a structured CSV file (Processed_Programming_Language_Popularity_Dataset.csv).
Structure: Each row represents one programming language in one time period (Year/Month); columns hold the popularity metrics (e.g., "Survey_Usage_Percent", "Search_Share_Percent").
Software: The file is a standard CSV and can be opened with any spreadsheet software (Excel, LibreOffice) or a text editor.
Source Code: The Python scripts used to generate this dataset are available in the associated GitHub repository (linked in the metadata).
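As one example of working with the CSV, the sketch below computes, for each year, the Spearman rank correlation between survey usage and search share across languages. It is an illustration under assumptions, not part of the published scripts: the metric column names come from the examples in the description above, while the `Year` and `Language` column names and the function name `yearly_rank_agreement` are hypothetical and may need adjusting to the actual file.

```python
import pandas as pd


def yearly_rank_agreement(df: pd.DataFrame) -> pd.Series:
    """Spearman rank correlation between survey usage and search share, per year.

    Assumes columns Year, Language, Survey_Usage_Percent and Search_Share_Percent;
    monthly rows, if present, are averaged to one value per language and year.
    """
    metrics = ["Survey_Usage_Percent", "Search_Share_Percent"]
    yearly = (
        df.groupby(["Year", "Language"], as_index=False)[metrics].mean()
        .dropna(subset=metrics)  # keep only languages covered by both sources
    )

    def spearman(group: pd.DataFrame) -> float:
        # Spearman's rho equals Pearson's r computed on ranks (ties get average ranks).
        return group[metrics[0]].rank().corr(group[metrics[1]].rank())

    return pd.Series(
        {year: spearman(group) for year, group in yearly.groupby("Year")},
        name="spearman_rho",
    )
```

With the published file, a call such as `yearly_rank_agreement(pd.read_csv("Processed_Programming_Language_Popularity_Dataset.csv"))` would return one coefficient per year, provided the column names match the assumptions above. Ranking before correlating avoids a dependency on SciPy and compares the two sources by ordering rather than by absolute percentages, which are measured on different scales.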
Rights and Licensing
License: This dataset is licensed under the Open Database License (ODbL 1.0).
Attribution: When reusing this data, please cite this dataset and the original sources: Stack Exchange Inc. (Stack Overflow Annual Developer Survey) and Pierre Carbonnelle (PYPL Index).
Files
Processed_Programming_Language_Popularity_Dataset.csv
Additional details
Related works
- Is supplemented by software: https://github.com/Rojta26/DMP_Project