National Health Data Science Sandbox for Training and Research
Danish health data, generated as part of medical research and routine care (e.g. in hospitals), holds great promise for improving health care through the application of modern data science methods. The introduction of novel molecular and digital technologies generates increasingly large amounts of highly diverse health data that require specialized skills to integrate and analyze at scale. Training and education must evolve alongside these novel technologies, and there is thus an increased demand for research and training in health data science.
The person-sensitive nature of health data does, however, present a major obstacle for training and development of skills and applications in the field. It is simply difficult to learn how to work with data without having access to representative data in a training context.
The aim of this national collaborative project is therefore to build a shared national sandbox environment that will facilitate training and research. The sandbox will provide non-sensitive health data that resemble the real data, on a secure computational platform which has the required tools and facilities. The sandbox will be used to teach students realistic applications of data science via hands-on training and can furthermore be used by researchers for testing ideas and prototyping new methods.
The national sandbox will contain only non-sensitive data: public (anonymous) data as well as anonymized and simulated/synthetic data, which can be used without risk of leaking personal data. It will be set up such that it will be easy to move from the sandbox to an environment with real data provided that the necessary research permissions are obtained.
The new platform will be hosted on the “Computerome 2” supercomputer and be accessible free of charge for researchers and students at the Danish universities. It will be built and supported by a consortium of data scientists from the Universities of Copenhagen, Aarhus, Aalborg and Southern Denmark and the Technical University of Denmark, with oversight by the deans of the universities' health faculties to ensure alignment with relevant national educational programs.
The new health data sandbox will strengthen research and training in health data science across Denmark, as well as promote Denmark’s international position in research, development, validation, implementation and education in digital healthcare technologies and computational sciences.
Professor Anders Krogh, University of Copenhagen, who is heading the infrastructure, presents the concept in a video.