Published November 23, 2024 | Version 1.0.0
Dataset Open

Manipulating data using R

  • 1. TU Wien

Description

Data created during a Computer Statistics assignment

Context and methodology

  • This dataset is used for a project in the "Introduction to Research Data Management" course, 2024 winter semester. It was originally created for a homework assignment in the "Computer Statistics" course, 2023 winter semester.
  • The dataset consists of the following: R Markdown code (with comments) that is compiled and executed to generate the 2 datasets created in the project; a .pdf file generated by compiling and executing that R code in RStudio; a .txt file generated as part of one of the exercises in the assignment, also by compiling and executing the R code.
  • The code was written by Vseslav Levchenko in R, using RStudio.

Technical details

  • The code was written in RStudio, which is recommended for working with R but not strictly necessary. The R language itself, however, must be installed. For the other files, standard software such as Microsoft Excel and any PDF reader is sufficient.
  • The code also contains necessary comments, and a .pdf file with the assignment's tasks is provided separately.

Abstract (English)

General experiment description

The experiment involves downloading 6 datasets from TUWEL (in .csv and .rds formats) and manipulating, analyzing and outputting the data in various ways using the R programming language. There are 5 tasks, each with at least 3 subtasks. These include reading the raw data, visualizing parts of it using graphs, charts and tables, comparing some datasets, and creating a new dataset and writing it to a .txt file. All research questions are to be answered using R code, commented where necessary. All of the code, comments and data visualizations have to be presented in a .pdf file (generated by running the code in RStudio) with a table of contents.
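The read, combine and export steps described above can be sketched in base R roughly as follows. This is only an illustration, not the assignment's code: the file names, column names and values are hypothetical, and the input files are created in a temporary directory so that the snippet is self-contained.

```r
# Hypothetical stand-ins for the downloaded files, written to a temp directory.
tmp <- tempdir()
csv_path <- file.path(tmp, "movies.csv")        # assumed file name
rds_path <- file.path(tmp, "ratings.rds")       # assumed file name
txt_path <- file.path(tmp, "movies_rated.txt")  # assumed file name

write.csv(
  data.frame(movie_id = 1:3, title = c("A", "B", "C")),
  csv_path,
  row.names = FALSE
)
saveRDS(
  data.frame(movie_id = c(1, 1, 2, 3), rating = c(4, 5, 3, 2)),
  rds_path
)

# Read the raw data from both formats.
movies  <- read.csv(csv_path, stringsAsFactors = FALSE)
ratings <- readRDS(rds_path)

# Build a new dataset: mean rating per movie, joined to the movie list.
mean_ratings <- aggregate(rating ~ movie_id, data = ratings, FUN = mean)
movies_rated <- merge(movies, mean_ratings, by = "movie_id")

# Write the new dataset to a tab-separated .txt file.
write.table(movies_rated, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
```

The resulting text file can be read back with `read.delim(txt_path)` to check that the export round-trips.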

The experiment aims to answer specific questions about the data presented (e.g. visualizing the Sustainable Development Index of various countries; visualizing, analyzing and making a new dataset from given datasets of movies shown in cinemas and their ratings), as well as to explore some mathematical concepts used primarily in statistics, such as the Mahalanobis distance, the Euclidean distance, the Andrews curve, etc.
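For readers unfamiliar with the concepts named above, the following is a minimal sketch of how they can be computed in base R. It uses the built-in `iris` measurements purely as example data (an assumption; these are not the assignment's datasets), and `andrews()` is a hypothetical helper written for this illustration. The Andrews curve of an observation x is taken here as f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., for t in [-pi, pi].

```r
# Example data only: four numeric columns of the built-in iris dataset.
X <- as.matrix(iris[, 1:4])
center <- colMeans(X)
S <- cov(X)

# stats::mahalanobis() returns SQUARED distances, so take the square root.
d_mahal <- sqrt(mahalanobis(X, center, S))

# Euclidean distance of each observation from the same centre.
d_eucl <- sqrt(rowSums(sweep(X, 2, center)^2))

# Hypothetical helper: evaluate the Andrews curve of one observation x at times t.
andrews <- function(x, t) {
  x <- as.numeric(x)
  value <- rep(x[1] / sqrt(2), length(t))
  for (i in seq_along(x)[-1]) {
    k <- i %/% 2  # harmonic index: 1, 1, 2, 2, ...
    basis <- if (i %% 2 == 0) sin(k * t) else cos(k * t)
    value <- value + x[i] * basis
  }
  value
}

t_grid <- seq(-pi, pi, length.out = 100)
curve_first <- andrews(X[1, ], t_grid)
```

Plotting `curve_first` against `t_grid` (for example with `plot(t_grid, curve_first, type = "l")`) draws the curve for the first observation; repeating this for every row gives the usual Andrews plot.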

Files

Homework3.pdf

Files (3.2 MiB)

MD5 checksum                            Size
md5:5b44ec33e11b7feb28f3bf5964dabe07    2.1 MiB
md5:371f958d8776fdd1f87b7d208881ed47    10.7 KiB
md5:068c7f99801c836f1b52eec44f8eb4f3    1.0 MiB
md5:2a964ba84f79d09e888e91cb6c7be3a1    1.2 KiB

Additional details

Dates

Created
2023-10-30
When the code was originally written and executed