Synthetic Health and Research Data (SHARED)

Henning Langberg, University of Copenhagen
Grant amount: DKK 7,500,000

The purpose of the project is to develop and demonstrate a mathematical method for generating synthetic health data from original health data. The synthetic data are created by running an original data set through a mathematical program that adds noise to the data set, so that the synthetic data cannot be attributed to specific individuals while retaining the dispersion and relationships between variables that make them statistically valid. This enables data to be shared without compromising data security.
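The noise-adding step described above can be illustrated with a minimal sketch. This is not the SHARED method, which is not specified here; it assumes plain additive Gaussian noise, and the function name `synthesize` and the `noise_scale` parameter are hypothetical. It shows only the general idea that individual values change while summary statistics stay close to the original.

```python
import random
import statistics


def synthesize(values, noise_scale=0.2, seed=0):
    """Return a noisy copy of a numeric column.

    Hypothetical illustration: each value is perturbed with zero-mean
    Gaussian noise whose standard deviation is `noise_scale` times the
    standard deviation of the original column.
    """
    rng = random.Random(seed)
    spread = statistics.pstdev(values)
    return [v + rng.gauss(0.0, noise_scale * spread) for v in values]


# Made-up measurements for demonstration only (not real health data).
data_rng = random.Random(42)
original = [data_rng.gauss(120.0, 15.0) for _ in range(1000)]
synthetic = synthesize(original)

# Individual values differ, but the mean and spread remain similar.
print(round(statistics.mean(original), 1), round(statistics.mean(synthetic), 1))
print(round(statistics.pstdev(original), 1), round(statistics.pstdev(synthetic), 1))
```

Simple additive noise of this kind does not by itself guarantee that records cannot be re-identified, and it slightly inflates the variance; a real method would need to address both points.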

Denmark and the other Nordic countries have some of the best and most complete health data in the world. These data have considerable potential to enable the healthcare sector to detect diseases early, improve diagnosis and create individually tailored treatment. However, this potential cannot easily be realised, because the compiled health data are very difficult to share and therefore difficult to use for purposes such as research across research areas and national borders.

There is good reason to restrict the sharing of health data, because they are inherently personal and therefore sensitive. However, the inability to share data makes it harder for the healthcare sector to find new treatment options by analysing large quantities of health data, for example in international collaborations. If successful, the SHARED method will give researchers in the healthcare sector better and faster access to health data, allow them to perform systematic data exploration and develop new algorithms, and accelerate innovation and the growth of knowledge about multifactorial diseases.

Project participants
Henning Langberg, Professor
University of Copenhagen, Department of Public Health

Janna Saija Sareela, Professor
Helsinki University Central Hospital, Institute for Molecular Medicine Finland

Arho Virkki, Professor
University of Turku, Department of Medical Mathematics