National Health Data Science Sandbox for Training and Research

Anders Krogh, University of Copenhagen
Grant amount: DKK 17,764,483

Danish health data, generated as part of medical research and routine care (e.g. in hospitals), holds great promise for improving health care through the application of modern data science methods. The introduction of novel molecular and digital technologies generates increasingly large amounts of highly diverse health data that require specialized skills to integrate and analyze at scale. Training and education must evolve alongside these novel technologies, and there is thus an increased demand for research and training in health data science.

The personally sensitive nature of health data does, however, present a major obstacle to training and to the development of skills and applications in the field. It is simply difficult to learn how to work with data without having access to representative data in a training context.

The aim of this national collaborative project is therefore to build a shared national sandbox environment that will facilitate training and research. The sandbox will provide non-sensitive health data that resemble the real data, on a secure computational platform which has the required tools and facilities. The sandbox will be used to teach students realistic applications of data science via hands-on training and can furthermore be used by researchers for testing ideas and prototyping new methods.

The national sandbox will contain only non-sensitive data: public (anonymous) data as well as anonymized and simulated/synthetic data, which can be used without risk of leaking personal data. It will be set up so that it is easy to move from the sandbox to an environment with real data, provided that the necessary research permissions are obtained.

The new platform will be hosted on the “Computerome 2” supercomputer and be accessible free of charge for researchers and students at the Danish universities. It will be built and supported by a consortium of data scientists from the University of Copenhagen, Aarhus University, Aalborg University, the University of Southern Denmark and the Technical University of Denmark, with oversight by the deans of the universities' health faculties to ensure alignment with relevant national educational programs.

The new health data sandbox will strengthen research and training in health data science across Denmark, as well as promote Denmark’s international position in research, development, validation, implementation and education in digital healthcare technologies and computational sciences.

Professor Anders Krogh, University of Copenhagen, who is heading the infrastructure, presents the concept in a video.

Project participants
Professor Anders Krogh
University of Copenhagen, Center for Health Data Science

Professor Søren Brunak
University of Copenhagen, NNF Center for Protein Research

Associate Professor Sisse R. Ostrowski
University of Copenhagen, Department of Clinical Medicine

Professor Ole Nørregaard Jensen
University of Southern Denmark, Department of Biochemistry and Molecular Biology

Professor Claudio Pica
University of Southern Denmark, Department of Mathematics and Computer Science

Professor Anders Børglum
Aarhus University, Department of Biomedicine

Professor Mikkel Heide Schierup
Aarhus University, Bioinformatics Research Centre

Professor Martin Bøgsted
Aalborg University, Department of Clinical Medicine

Director Peter Løngreen
Computerome, Technical University of Denmark