The good, the bad, and the ugly: Drug response prediction using machine learning

Drug response prediction from molecular readouts (e.g., transcriptomics) is an active area of research because of its potentially pivotal role in personalized medicine. In theory, it would allow treating physicians to tailor treatment recommendations to individual patients. Machine learning has emerged as a promising method for predicting complex outcomes and has therefore been applied to drug response prediction. We have previously shown that state-of-the-art models suffer from various problems. Here, we provide an update on that work and ask whether drug response prediction using machine learning is currently possible and which issues remain to be solved to enable model development and generalization. To this end, we cover various aspects of the machine learning life cycle, from available training data to generalization. Briefly, we show that currently available data are substantially heterogeneous and often enriched in specific cell lines and drugs. This calls into question the use of independent hold-out data for assessing model performance. To circumvent the high noise levels and outliers in the training data, which result from high-throughput screening efforts, we developed a foundational model using self-supervised learning that can be used to model drug response with an embedding that is aware of the full concentration range rather than relying on simplified summary statistics. Last, we present our efforts to develop a ready-to-use pipeline for model training, testing, and evaluation that comes with several large, curated, and normalized training datasets. We also discuss whether the available data and the complexity of the problem allow drug response prediction models to be trained, showcased on preliminary data where several dimensionality reduction methods were used to mitigate the p>n challenge.
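As a minimal illustration of why an embedding that sees the full concentration range can carry more information than summary statistics, the sketch below compares two synthetic dose-response curves. It is not the authors' model. The abstract does not name the summary statistics it has in mind, so IC50 and area under the curve (AUC) are assumed here as common examples. The Hill-type curve, its parameter values, and the function names `hill_viability` and `normalized_auc` are likewise hypothetical.

```python
def hill_viability(conc, ic50, slope, top=1.0, bottom=0.0):
    """Hill-type (four-parameter logistic) viability at one concentration."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** slope)


def normalized_auc(log_concs, viabilities):
    """Trapezoidal area under the curve on a log10 axis, divided by the axis span."""
    area = sum(
        (log_concs[i + 1] - log_concs[i]) * (viabilities[i] + viabilities[i + 1]) / 2.0
        for i in range(len(log_concs) - 1)
    )
    return area / (log_concs[-1] - log_concs[0])


# Hypothetical concentration grid: 10^-3 to 10^3 (arbitrary units), one point per decade.
log_concs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
concs = [10.0 ** x for x in log_concs]

# Two synthetic curves that share the same IC50 but differ in steepness.
steep = [hill_viability(c, ic50=1.0, slope=4.0) for c in concs]
shallow = [hill_viability(c, ic50=1.0, slope=0.5) for c in concs]

auc_steep = normalized_auc(log_concs, steep)
auc_shallow = normalized_auc(log_concs, shallow)

# Largest point-wise difference between the two curves across the grid.
max_gap = max(abs(a - b) for a, b in zip(steep, shallow))

print(f"AUC steep={auc_steep:.3f}, AUC shallow={auc_shallow:.3f}, max gap={max_gap:.3f}")
```

On this symmetric grid the two curves share both the assumed IC50 and the normalized AUC, yet their viabilities differ clearly at intermediate concentrations. A representation built from the whole curve could in principle capture that difference.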

Author and article information

Conference

Publisher: REPO4EU

Publication date (Electronic, pub): 8 May 2024

Affiliations

[1] Technical University of Munich (https://ror.org/02kkvpp62)

Author information

Mario Picciani https://orcid.org/0000-0003-0428-1703

Judith Bernett https://orcid.org/0000-0001-5812-8013

Markus List https://orcid.org/0000-0002-0941-4168

Mathias Wilhelm https://orcid.org/0000-0002-9224-3258

Article

DOI: 10.58647/REXPO.24004

SO-VID: 7e09dbc2-8e8e-4099-8da2-0a15d209c03d

License:

Published under Creative Commons Attribution 4.0 International (CC BY 4.0). Users are allowed to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material for any purpose, even commercially), as long as the authors and the publisher are explicitly identified and properly acknowledged as the original source.

Conference name: RExPO24

Conference number: 3

Conference location: Munich, Germany

Conference date: 3-5 July 2024
