Data Science: Solving Missing Data Problems in R

  • Certificate
  • 840
  • Classes in English
  • Location: Utrecht
  • Start: 20 July 2026
  • Duration: 4 days
  • ECTS: 1.5 EC

Missing data often disrupt real-world machine learning and AI applications. Through a mix of lectures, hands-on labs, and practical case studies, you will learn to implement flexible, scalable, and statistically sound solutions using R, with emphasis on real-world applicability. Led by experts in the field, including developers of the widely used mice package, the course explores contemporary approaches for generating imputations, synthesizing data, integrating imputation into AI workflows, and diagnosing the impact of missing data in statistical models and predictive modeling pipelines.

Most researchers need to deal with incomplete data. Missing data complicate statistical analysis. Simply removing the incomplete cases is not a good strategy and can bias the results. Multiple imputation is a general and statistically valid technique for analysing incomplete data, and it has rapidly become the standard in social and behavioural science research.

This course explains modern and flexible imputation and data synthesis techniques that preserve salient data features. The course enhances participants’ knowledge of imputation principles and provides flexible hands-on solutions to incomplete data problems. It discusses principles of missing data theory, outlines a step-by-step approach toward creating high-quality imputations, and provides guidelines on how to report the results. The course uses the authors’ mice package in R to illustrate practical solutions to real data problems. The concepts and applications of the illustrated methodology are equally applicable in other programming languages.

The course materials will follow the book “Flexible Imputation of Missing Data” by Stef van Buuren (2nd edition, Chapman & Hall, 2018) as well as a collection of papers and vignettes by the course team. The book can be read online for free at https://stefvanbuuren.name/fimd/.

Format of the course

We alternate short lectures with hands-on practical sessions and plenary discussion of the practicals. This ensures that we form an interactive group of participants who learn the theory and practice of multiple imputation in bite-size blocks. Each block builds on the previous one. We invite participants to share their own experiences and challenges during these blocks so that we can foster a collaborative learning environment.

Prerequisites

Participants should have a basic knowledge of scripting and programming in R. Participants who have limited experience with R need to have followed a relevant R course beforehand, such as:

  • Winter School Introduction to R (S002)
  • Summer School Data Science: Statistical Programming in R (S24)
  • or any similar level course elsewhere.

The theory and practice discussed in this course require that participants are familiar with basic statistical concepts and techniques, such as linear modeling, prediction, least squares estimation and hypothesis testing. Participants are requested to bring their own laptop for lab meetings.

Data Science specialisation

This course can be taken separately, but is also part of a series of seven courses in the Summer School Data Science specialisation taught by UU’s department of Methodology & Statistics:

  • Data Science: Programming with Python (Course code S17, 06-10 July 2026)
  • Data Science: Statistical Programming with R (Course code S24, 06-10 July 2026)
  • Data Science: Network Science (Course code S37, 13-17 July 2026)
  • Data Science: Applied Text Mining and Natural Language Processing (Course code S42, 13-17 July 2026)
  • Data Science: Introduction to Machine Learning and Data Analysis in R (Course code S31, 20-24 July 2026)
  • Data Science: Solving Missing Data Problems in R (this course)
  • Data Science: Machine Learning with Python (Course code S70, 2027)

Participants who complete three of the seven courses in the Summer School Data Science specialisation within five years can obtain a certificate.
