Medical data safety via federated machine learning

In recent years, high-throughput technologies have produced enormous amounts of biomedical data. However, biomedical data tend to be “wide” rather than “big”, i.e., they typically contain a huge number of features for relatively few samples. To increase the robustness and predictive power of machine learning models trained on biomedical data, it is hence desirable to increase sample sizes by jointly analyzing data sets available at several sites (e.g., hospitals). However, biomedical data are often extremely sensitive, so pooling them raises important privacy concerns. Consequently, federated learning (FL) has emerged as a privacy-aware alternative to centralized analyses on pooled data. In FL, sensitive data remain at the local sites and only model parameters are shared with a global aggregator. In my talk, I will introduce the promises and challenges of using FL for biomedical applications, using genome-wide association studies (GWAS) as an example. On the one hand, I will present recently developed tools that enable privacy-aware, federated GWAS in a user-friendly fashion while reaching high accuracy and practical runtimes. On the other hand, I will use the GWAS example to draw attention to a limitation of FL: although remaining privacy risks in FL can often be mitigated via algorithmic or cryptographic techniques once they have been detected, there is no mathematical framework that allows one to identify all possible sources of data leakage at the time of deployment. This suggests that the conflict between privacy preservation and the utility gained from pooled analyses of biomedical data is unlikely to be fully resolvable by purely technological means in the foreseeable future.
Instead, it might be necessary to develop risk-aware patient consent models, in which the remaining risks of state-of-the-art privacy-aware machine learning technologies are made transparent and privacy preservation is no longer treated as a sine qua non but rather as an important good to be weighed against others.
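To make the basic federated setup more concrete, here is a minimal sketch, assuming a GWAS-style linear association test of one SNP (coded as 0/1/2 minor-allele dosage) plus one covariate at three simulated sites. It is not the implementation of any of the tools discussed in the talk; the helper names `local_statistics` and `federated_linear_regression` are hypothetical, and the data are randomly generated for illustration. Each site shares only aggregated statistics (X_k^T X_k, X_k^T y_k, y_k^T y_k and its sample size n_k), and the aggregator solves (sum of X_k^T X_k) beta = (sum of X_k^T y_k). For linear regression this reproduces the pooled result exactly.

```python
import numpy as np


def local_statistics(X, y):
    """Run at each site (hypothetical helper): returns aggregates only, never raw rows."""
    Xc = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    return Xc.T @ Xc, Xc.T @ y, float(y @ y), len(y)


def federated_linear_regression(site_stats):
    """Run at the aggregator (hypothetical helper): sums the local statistics and
    fits y ~ intercept + X as if the data had been pooled."""
    XtX = sum(s[0] for s in site_stats)
    Xty = sum(s[1] for s in site_stats)
    yty = sum(s[2] for s in site_stats)
    n = sum(s[3] for s in site_stats)
    beta = np.linalg.solve(XtX, Xty)
    rss = yty - beta @ Xty                     # residual sum of squares
    sigma2 = rss / (n - XtX.shape[0])          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtX)))
    return beta, se, beta / se                 # coefficients, standard errors, t-statistics


# Simulated example: three sites of different sizes (made-up data).
rng = np.random.default_rng(0)
sites = []
for n in (120, 80, 200):
    snp = rng.integers(0, 3, size=n)           # genotype dosage: 0, 1 or 2 minor alleles
    age = rng.normal(50, 10, size=n)           # one covariate
    y = 0.5 * snp + 0.02 * age + rng.normal(size=n)
    sites.append((np.column_stack([snp, age]), y))

beta_fed, se_fed, t_fed = federated_linear_regression(
    [local_statistics(X, y) for X, y in sites])

# Reference: ordinary least squares on the pooled data.
X_all = np.column_stack([np.ones(400), np.vstack([X for X, _ in sites])])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(np.allclose(beta_fed, beta_pooled))      # True
```

Summing the local statistics, rather than combining per-site estimates, is what makes the federated result coincide with the pooled one. The sketch also hints at where leakage can hide: in this plain form the aggregator sees every site's individual statistics, so in practice they would need additional protection (e.g., masking or secure aggregation), which covers this particular channel but not necessarily others.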


Author and article information

Conference

Publisher: ScienceOpen

Publication date (Electronic preprint): 3 August 2022

Affiliations

[1] University of Southern Denmark, Odense, Denmark

[2] Technical University of Munich, Garching, Germany

[3] University of Hamburg, Hamburg, Germany

[4] Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

[5] Technische Universität Braunschweig, Braunschweig, Germany

Author notes

[*] Email: david.b.blumenthal@fau.de

Author information

David Benjamin Blumenthal https://orcid.org/0000-0001-8651-750X

Article

DOI: 10.14293/S2199-1006.1.SOR-.PPPY1JE1.v1

SO-VID: abd398b3-35e9-41e3-806a-690139bad367

License:

This work has been published open access under the Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

Conference name: RExPO22

Conference location: Maastricht, Netherlands

Conference date: 2–3 September 2022

History

Date received: 3 August 2022

Data availability: Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Keywords: Data safety in biomedicine, Genome-wide association studies, Federated learning

References

Nasirigerdeh Reza, Torkzadehmahani Reihaneh, Matschinske Julian, Frisch Tobias, List Markus, Späth Julian, Weiss Stefan, Völker Uwe, Pitkänen Esa, Heider Dominik, Wenke Nina Kerstin, Kaissis Georgios, Rueckert Daniel, Kacprowski Tim, Baumbach Jan. sPLINK: a hybrid federated tool as a robust alternative to meta-analysis in genome-wide association studies. Genome Biology. Vol. 23(1). 2022. Springer Science and Business Media LLC.
Hartebrodt Anne, Nasirigerdeh Reza, Blumenthal David B., Röttger Richard. Federated Principal Component Analysis for Genome-Wide Association Studies. 2021 IEEE International Conference on Data Mining (ICDM). 2021. IEEE.
Hartebrodt Anne, Röttger Richard, Blumenthal David B. Federated singular value decomposition for high dimensional data. 2022. arXiv.
Nasirigerdeh Reza, Torkzadehmahani Reihaneh, Baumbach Jan, Blumenthal David B. On the Privacy of Federated Pipelines. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. ACM.
