Drug response prediction using molecular readouts (e.g., transcriptomics) is an active area of research because of its potentially pivotal role in personalized medicine. In principle, it would allow treating physicians to tailor recommendations to individual patients. Machine learning has emerged as a promising approach for predicting complex outcomes and has therefore been applied to drug response prediction. We have previously shown that state-of-the-art models suffer from various problems. Here, we provide an update on this work and address whether drug response prediction using machine learning is currently feasible and which issues remain to be solved to enable model development and generalization. To this end, we cover various aspects of the machine learning life cycle, from available training data to generalization. Briefly, we show that currently available data show substantial heterogeneity and are often enriched in specific cell lines and drugs. This calls into question the use of independent hold-out data for assessing model performance. To circumvent the high noise levels and outliers in the training data, a consequence of high-throughput screening efforts, we developed a foundation model using self-supervised learning that can be used for drug response prediction with an embedding that is aware of the full concentration range rather than relying on simplified summary statistics. Last, we present our efforts to develop a ready-to-use pipeline for model training, testing, and evaluation that comes with several large, curated, and normalized training datasets. Using preliminary results in which several dimensionality reduction methods were applied to mitigate the p > n problem (more features than samples), we discuss whether the available data and the complexity of the problem allow drug response prediction models to be trained.
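To illustrate why a single summary statistic can discard information carried by the full concentration range, the sketch below computes one commonly used statistic, the area under the dose-response curve (AUC) normalized by the log-concentration range, for two hypothetical curves. The function name `normalized_auc`, the five-point dilution series, and the viability values are illustrative assumptions, not data or code from this work. A sigmoidal response and a partial response with a plateau end up with the same AUC even though their shapes differ.

```python
from typing import Sequence


def normalized_auc(log_conc: Sequence[float], viability: Sequence[float]) -> float:
    """Trapezoidal area under a dose-response curve, divided by the log-concentration range.

    Assumes log_conc is sorted ascending and viability is relative to an
    untreated control (1.0 = no effect, 0.0 = complete killing).
    """
    if len(log_conc) != len(viability) or len(log_conc) < 2:
        raise ValueError("need at least two paired (concentration, viability) points")
    area = 0.0
    for i in range(1, len(log_conc)):
        width = log_conc[i] - log_conc[i - 1]
        # Trapezoid between neighbouring concentrations
        area += 0.5 * width * (viability[i] + viability[i - 1])
    return area / (log_conc[-1] - log_conc[0])


# Hypothetical five-point dilution series in log10 molar (illustrative only)
log_conc = [-9.0, -8.0, -7.0, -6.0, -5.0]
steep = [1.0, 0.9, 0.5, 0.1, 0.0]    # sigmoidal response
plateau = [1.0, 0.5, 0.5, 0.5, 0.0]  # partial response with a plateau

auc_steep = normalized_auc(log_conc, steep)
auc_plateau = normalized_auc(log_conc, plateau)
print(f"steep AUC = {auc_steep:.3f}, plateau AUC = {auc_plateau:.3f}")
# prints: steep AUC = 0.500, plateau AUC = 0.500
```

Because both curves collapse to the same number, a model trained only on this statistic cannot distinguish them, whereas a representation of the whole curve can.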