In recent years, high-throughput technologies have produced enormous amounts of biomedical data. However, biomedical data tend to be “wide” rather than “big”, i.e., they typically contain a huge number of features for relatively few samples. To increase the robustness and predictive power of machine learning models trained on biomedical data, it is hence desirable to increase sample sizes by jointly analyzing data sets available at several sites (e.g., hospitals). However, biomedical data are often extremely sensitive, so pooling them raises important privacy concerns. Consequently, federated learning (FL) has emerged as a privacy-aware alternative to centralized analyses on pooled data. In FL, sensitive data remain at the local sites and only model parameters are shared with a global aggregator. In my talk, I will introduce the promises and challenges of using FL for biomedical applications, using genome-wide association studies (GWAS) as an example. On the one hand, I will present recently developed tools that enable privacy-aware, federated GWAS in a user-friendly fashion while achieving high accuracy and practical runtimes. On the other hand, I will use the GWAS example to draw attention to a limitation of FL: Although remaining privacy risks in FL can often be mitigated via algorithmic or cryptographic techniques once they have been detected, there are no mathematical frameworks that allow one to survey all possible sources of data leakage at the time of deployment. This suggests that it is unlikely that the conflict between privacy preservation and the utility gained from pooled analyses of biomedical data will be fully resolvable via purely technological means in the foreseeable future.
Instead, it might be necessary to develop risk-aware patient consent models, in which the remaining risks of state-of-the-art privacy-aware machine learning technologies are made transparent and privacy preservation is no longer treated as a sine qua non but rather as an important good to be weighed against others.
Nasirigerdeh Reza, Torkzadehmahani Reihaneh, Matschinske Julian, Frisch Tobias, List Markus, Späth Julian, Weiss Stefan, Völker Uwe, Pitkänen Esa, Heider Dominik, Wenke Nina Kerstin, Kaissis Georgios, Rueckert Daniel, Kacprowski Tim, Baumbach Jan. sPLINK: a hybrid federated tool as a robust alternative to meta-analysis in genome-wide association studies. Genome Biology. Vol. 23(1). 2022. Springer Science and Business Media LLC.
Hartebrodt Anne, Nasirigerdeh Reza, Blumenthal David B., Röttger Richard. Federated Principal Component Analysis for Genome-Wide Association Studies. 2021 IEEE International Conference on Data Mining (ICDM). 2021. IEEE.
Hartebrodt Anne, Röttger Richard, Blumenthal David B. Federated singular value decomposition for high dimensional data. 2022. arXiv.
Nasirigerdeh Reza, Torkzadehmahani Reihaneh, Baumbach Jan, Blumenthal David B. On the Privacy of Federated Pipelines. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. ACM.