Data Science: Statistical Programming with R
Students learn to operate R, manipulate and visualise data, work with (generalized) linear models, conduct simulation studies (e.g. bootstrap), and present results of data analyses in publication-ready tables. We will work with the RStudio environment and Quarto. The focus will be on the tidyverse collection of packages.
R is a very popular and powerful platform for data manipulation, visualization, and analysis and has a number of advantages over other statistical software packages. A wide community of users contributes to R, resulting in broad coverage of statistical procedures, including many that are not available in any other statistical programme.
Target Audience
Applied researchers and (master's) students who already use statistical software and would like to learn to use, or improve their usage of, the R environment. An understanding of basic statistical theory, such as t-tests, hypothesis testing, and regression, is required. Participants from a variety of fields, including sociology, psychology, education, human development, marketing, business, biology, medicine, political science, and communication sciences, will benefit from this course. A maximum of 80 participants will be admitted to this course, and places will be allocated on a first-come, first-served basis.
For an overview of all summer school courses offered by the Department of Methodology and Statistics, please click here.
Participants who are also interested in more advanced data analysis techniques (machine learning) are recommended to follow the subsequent course Data Science: Data Analysis (course code S31, 15-19 July 2024).
Aim of the course
The course teaches students the skills needed to understand how R works and how to use R for a variety of statistical analyses. The following skills and learning goals are covered in this course:
- Work with the R environment (RStudio) and the online resources;
- Perform reproducible data analyses with Quarto and RStudio;
- Master data manipulation (cleaning, transformation, recoding) with tidyverse;
- Summarize data in publication-ready tables;
- Make high-quality plots with ggplot2;
- Work with pipelines (tidyverse);
- Fit and interpret (generalized) linear models;
- Perform a bootstrap;
- Learn the basics of a Shiny app.