

      Is Open Access

      Machine Learning and Artificial Intelligence in Drug Repurposing—Challenges and Perspectives

      review-article

            Abstract

            Artificial intelligence (AI) and machine learning (ML) techniques play an increasingly crucial role in the field of drug repurposing. As the number of computational tools grows, it is essential not only to understand and carefully select the method itself but also to consider the input data used for building predictive models. This review examines current computational methods that leverage AI and ML to drive and accelerate compound and drug target selection, addresses existing challenges, and offers perspectives. While there is no doubt that AI- and ML-based tools are transforming traditional approaches, especially with recent advancements in graph-based methods, they present novel challenges that require expert oversight and intervention. The growing complexity of OMICs data further emphasizes the importance of data standardization and quality.

            Main article text

            INTRODUCTION

            Over the past decade, artificial intelligence (AI) and machine learning (ML, a subfield of AI) tools have revolutionized various domains, from creating smart enhanced content in education 1 to strengthening e-mail security through improved spam detection 2 and even helping farmers optimize their irrigation methods. 3

            In the healthcare sector, AI has made substantial contributions throughout academia and industry. Applications range from drug discovery and development 4, 5 to clinical trial optimization 6, 7 and drug manufacturing. 8 In 2021, the Food and Drug Administration (FDA) reported over 100 submissions of drug and biological applications using AI and ML components. 9 In 2023, the first drug fully generated by AI reached phase II clinical trials in humans, 10 potentially paving the way for a whole new generation of drugs.

            While AI can help speed up the drug development pipeline in many ways, bringing new drugs to market remains costly, time-consuming, and prone to a high failure rate. 11, 12 Repurposing existing drugs for new therapeutic indications aims to reduce development cost and time, and repurposed candidates also show a much lower rate of clinical trial failure, especially in the early stages of clinical study (whereas phase III trials remain essentially unchanged in terms of time and costs). Drug repurposing thus has the potential to dramatically shorten the time it takes for safe therapies to reach patients in need of treatment. 13

            At the level of accurate compound or target selection, drug repurposing is empowered both by the continuous development and improvement of high-throughput multi-omics technologies and by rapid advances in computational methods. 14 More recently, AI/ML applications have played a crucial role in how drugs are repurposed. AI-based tools leveraging deep learning and neural networks are particularly promising for, e.g., screening compounds, predicting new drug–target combinations, and assessing adverse drug reactions, which can ultimately speed up the selection of existing targets and drugs for new applications. 15–17

            Although AI-based methods offer new and exciting opportunities in the field of drug repurposing, and drug development in general, they come with new and unique challenges.

            In this review, we explore different AI/ML-based methods and their approach to complex problems related to drug repurposing, and we also highlight current obstacles, challenges, and perspectives within this field.

            MACHINE LEARNING-BASED METHODS TO REPURPOSE DRUGS

            Ziaurrehman Tanoli
            Background and Significance of Machine Learning in Drug Repurposing

            Drug repurposing can be executed through either target-based or phenotype-based methodologies. Both approaches, however, require experimental validation in clinical settings, which entails substantial financial investment.

            ChEMBL version 33 18 contains bioactivity data for 3492 approved small molecules, describing their interactions with specific targets. The Open Targets platform 19 stands out as one of the most comprehensive resources, cataloging associations between genes and over 21,000 diseases and symptoms. Given the vast number of potential drug–disease pairings (up to roughly 73 million, since 3492 × 21,000 ≈ 7.3 × 10^7), conducting experimental tests on animals or in clinical trials becomes a formidable and costly challenge. 20 Consequently, ML-based prediction methods offer an alternative, enabling the timely identification of prioritized drug–disease associations that can subsequently undergo validation in clinical trials.

            Existing Tools and Machine Learning-Based Methods for Drug Repurposing

            The drug discovery paradigm comprises several integral phases, including hit identification, assessment of drug safety, dose optimization, and drug repurposing, the latter focusing on the identification of new disease associations for already approved drugs. This discussion will concentrate specifically on tools and methodologies pertinent to drug repurposing.

            In recent years, the field of drug repurposing has witnessed the development of numerous computational resources dedicated to supporting these endeavors. Notable examples include DrugRepo, 21 Drug Repurposing Hub, 22 repoDB, 23 and RepurposeDB, 24 along with an array of other web-based databases, collectively contributing directly or indirectly to the drug repurposing landscape. 25

            In tandem with web-based resources, various prediction methods for drug repurposing have been introduced. PREDICT, 26 a widely utilized computational approach, leverages drug–drug and disease–disease networks, specifically focusing on 593 drugs and 313 diseases. Another innovative study presented a bipartite graph-based methodology aimed at uncovering novel drug indications by exploring their relationships with similar drugs. 27 The CMAF method employs matrix factorization to predict drug–disease associations based on the networks of drug and disease similarities. 28
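To make the matrix-factorization idea concrete, the sketch below factorizes a toy binary drug–disease matrix into low-rank drug and disease factors, with graph-Laplacian penalties that pull similar drugs (and similar diseases) toward similar factors. It is a generic similarity-regularized factorization written for illustration, not the published CMAF algorithm; the toy matrices, rank, learning rate, and penalty weights are all assumptions.

```python
import numpy as np

def laplacian(sim):
    """Graph Laplacian L = D - S of a symmetric similarity matrix."""
    return np.diag(sim.sum(axis=1)) - sim

def factorize(R, S_drug, S_dis, rank=2, lam=0.1, mu=0.05, lr=0.01, epochs=2000, seed=0):
    """Gradient descent on
    ||R - U V^T||^2 + lam * (||U||^2 + ||V||^2) + mu * (tr(U^T L_d U) + tr(V^T L_s V)).
    Returns the reconstructed score for every drug-disease pair."""
    rng = np.random.default_rng(seed)
    n_drugs, n_dis = R.shape
    U = 0.1 * rng.standard_normal((n_drugs, rank))  # drug factors
    V = 0.1 * rng.standard_normal((n_dis, rank))    # disease factors
    L_d, L_s = laplacian(S_drug), laplacian(S_dis)
    for _ in range(epochs):
        E = U @ V.T - R  # reconstruction error on all pairs
        grad_U = 2 * (E @ V + lam * U + mu * L_d @ U)
        grad_V = 2 * (E.T @ U + lam * V + mu * L_s @ V)
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T

# Toy data: 4 drugs x 3 diseases; 1 = known indication, 0 = unknown.
R = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
# Hypothetical similarities: drugs 0/1 and 2/3 are alike; diseases 0/1 are alike.
S_drug = np.array([[0.0, 0.9, 0.0, 0.0],
                   [0.9, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.9],
                   [0.0, 0.0, 0.9, 0.0]])
S_dis = np.array([[0.0, 0.8, 0.0],
                  [0.8, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
scores = factorize(R, S_drug, S_dis)
```

In this toy setting, drug 1 receives a clearly higher score for disease 1 (a link held by its similar neighbour, drug 0) than for disease 2; unobserved pairs with high reconstructed scores are the candidate repurposing hypotheses.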

            It is crucial to note that these methods exhibit limitations, primarily in terms of coverage, as they are restricted to a limited set of drugs and disease indication pairs. Notably, their predictive capacity for disease associations related to drug combinations remains insufficient. Furthermore, the absence of interactive web applications for these methodologies restricts their practical utility and broader accessibility within the scientific community.

            RepurposeDrugs: A Novel Machine Learning-Based Algorithm to Repurpose Drugs and Combinations for Hundreds of Diseases

            To overcome existing limitations and enhance drug repurposing capabilities for researchers lacking programming skills, we introduce RepurposeDrugs ( https://repurposedrugs.org/), an ML-based web portal offering a versatile approach to uncovering novel relationships between drugs and diseases, encompassing both single-drug and combination therapies. We carefully curated our drug–disease association dataset from the clinical trials database ( https://clinicaltrials.gov/). These associations were categorized as approved (positive class) or failed (negative class) based on the reported status of each clinical trial. Failed drug–indication pairs denote instances where at least two trials were terminated or withdrawn for a disease indication different from the drug’s original approval.
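The labeling rule above can be written as a small function. The sketch assumes a simplified, hypothetical trial record (drug, disease, status, and a flag for whether the disease is the drug's original indication) and a precomputed set of approved pairs; real ClinicalTrials.gov records use a different, richer schema, and all names here are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class TrialRecord:
    """Hypothetical, simplified trial record (not the ClinicalTrials.gov schema)."""
    drug: str
    disease: str
    status: str                # e.g. "Completed", "Terminated", "Withdrawn"
    original_indication: bool  # True if disease is the drug's originally approved indication

FAILED_STATUSES = {"Terminated", "Withdrawn"}

def label_pairs(trials, approved_pairs, min_failed=2):
    """Return {(drug, disease): label}: 1 for approved pairs, 0 for pairs with at
    least `min_failed` terminated/withdrawn trials outside the original indication."""
    failed_counts = Counter(
        (t.drug, t.disease)
        for t in trials
        if t.status in FAILED_STATUSES and not t.original_indication
    )
    labels = {pair: 1 for pair in approved_pairs}
    for pair, n_failed in failed_counts.items():
        if n_failed >= min_failed and pair not in labels:
            labels[pair] = 0
    return labels

trials = [
    TrialRecord("drugA", "disease2", "Terminated", False),
    TrialRecord("drugA", "disease2", "Withdrawn", False),
    TrialRecord("drugB", "disease3", "Terminated", False),  # one failure only: left unlabeled
]
labels = label_pairs(trials, approved_pairs={("drugA", "disease1")})
```

Pairs that meet neither criterion stay unlabeled rather than being treated as negatives, which keeps the negative class restricted to repeated, documented failures.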

            We fine-tuned two XGBoost-based regression models, one for single drugs and another for combinations. The positive dataset for the single-drug XGBoost model comprised 382 approved drugs and 190 diseases, while the negative dataset contained 409 drugs and 175 diseases. Similarly, for the drug combination model, the positive dataset included 65 approved combinations for 55 diseases, and the negative dataset comprised 62 drug combinations across 39 diseases.

            These models were trained using eight distinct descriptor sets, encompassing two-dimensional (2D), three-dimensional (3D), and graph neural network (GNN)-based fingerprints of drugs, in addition to Lipinski’s rule of five (RO5) descriptors. Various descriptor types were incorporated to enhance the accuracy of the prediction model. The 2D fingerprints encoded structural information with descriptors such as ECFP4 (1024 bits) and ECFP6 (1024 bits), 29 MACCS (166 bits), 30 Klekota-Roth (4860 bits), 31 PubChem (881 bits), 32 and E-State (79 bits). 33 The 3D fingerprints (E3FP) provided spatial arrangement insights with a length of 4096 bits, 34 while GNN-based fingerprints (3DInfoMax) 35 captured molecular graph structures (256 bits). The fingerprint for each drug combination was determined through a logical OR operation on each bit of the individual drug fingerprints, effectively capturing features present in either drug of the combination.
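The logical OR used to build combination fingerprints is simple once each drug's fingerprint is available as a bit vector. The sketch below uses short toy bit vectors rather than real ECFP or MACCS fingerprints (which would normally be computed with a cheminformatics toolkit); the helper name `combination_fingerprint` is hypothetical.

```python
from typing import List, Sequence

def combination_fingerprint(fps: Sequence[Sequence[int]]) -> List[int]:
    """Bitwise OR across the fingerprints of all drugs in a combination:
    a bit is set if the feature is present in at least one drug."""
    lengths = {len(fp) for fp in fps}
    if len(lengths) != 1:
        raise ValueError("need at least one fingerprint, all of the same length")
    return [int(any(bits)) for bits in zip(*fps)]

drug_a = [1, 0, 0, 1, 0, 0, 1, 0]
drug_b = [0, 0, 1, 1, 0, 0, 0, 1]
combo = combination_fingerprint([drug_a, drug_b])
```

The same function handles combinations of more than two drugs; with fingerprints stored as NumPy arrays, `np.bitwise_or.reduce` would do the same job.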

            Optimal XGBoost parameters were determined via Bayesian optimization and 10-fold cross-validation, minimizing overfitting while maximizing performance, with root mean square error (RMSE) as the primary performance metric. Note that cross-validation in this context refers to out-of-fold predictions, obtained by aggregating the predictions made on each validation set during the 10-fold cross-validation process. In cross-validation, the XGBoost prediction algorithm within RepurposeDrugs achieved a significant correlation of 0.75 for single drug–disease associations and 0.56 for drug combinations. The lower correlation for drug combinations is attributed in part to the limited dataset size, and additional trial data are expected to improve the model’s predictive capabilities in the future.
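The out-of-fold scheme can be sketched as follows: every sample receives a prediction from a model trained on the other nine folds, and RMSE and correlation are computed once over the pooled out-of-fold predictions. To stay self-contained, the sketch uses closed-form ridge regression on synthetic data as a stand-in for XGBoost; the data, the ridge penalty, and the function names are assumptions.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def ridge_predict(w, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ w

def out_of_fold_predictions(X, y, n_splits=10, seed=0):
    """One prediction per sample, each from a model that never saw that sample."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    oof = np.empty(len(y))
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        w = ridge_fit(X[train_mask], y[train_mask])
        oof[val_idx] = ridge_predict(w, X[val_idx])
    return oof

# Synthetic example: 200 samples with 16 binary "fingerprint" features.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(200, 16)).astype(float)
y = X[:, :4].sum(axis=1) / 4 + 0.1 * rng.standard_normal(200)

oof = out_of_fold_predictions(X, y)
rmse = np.sqrt(np.mean((oof - y) ** 2))
pearson_r = np.corrcoef(oof, y)[0, 1]
print(f"RMSE={rmse:.3f}  r={pearson_r:.3f}")
```

Computing the metrics on the pooled out-of-fold vector, rather than averaging per-fold scores, matches the description above; in a hyperparameter search, this whole procedure would be repeated for each candidate configuration.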

            Moreover, RepurposeDrugs incorporates a conformal prediction module to filter out low-confidence predictions by setting a default confidence threshold of 0.8, focusing on the most promising drug–disease associations.
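One common way to implement confidence-based filtering is split (inductive) conformal regression: absolute residuals on a held-out calibration set define an interval half-width that, under exchangeability, covers the true value with probability at least 0.8. The sketch computes that half-width and keeps only predictions whose entire 80% interval lies above a cutoff. The filtering criterion, the cutoff of 0.5, and the toy numbers are assumptions for illustration; the text does not specify how the RepurposeDrugs module defines a low-confidence prediction.

```python
import math

def conformal_half_width(cal_true, cal_pred, confidence=0.8):
    """Split-conformal interval half-width from calibration residuals.

    Uses the ceil((n + 1) * confidence)-th smallest absolute residual,
    which gives marginal coverage >= confidence under exchangeability."""
    residuals = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(residuals)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return math.inf  # too few calibration points for this confidence level
    return residuals[k - 1]

def filter_confident(predictions, half_width, cutoff=0.5):
    """Hypothetical criterion: keep pairs whose interval lower bound exceeds cutoff."""
    return {pair: s for pair, s in predictions.items() if s - half_width > cutoff}

cal_true = [0.9, 0.1, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 1.0]
cal_pred = [0.8, 0.2, 0.5, 0.3, 0.7, 0.4, 0.6, 0.5, 0.9]
q = conformal_half_width(cal_true, cal_pred)  # 80% half-width
preds = {("drugA", "disease1"): 0.85, ("drugB", "disease2"): 0.55}
kept = filter_confident(preds, q)
```

Raising the confidence level widens the interval and therefore keeps fewer, more reliable predictions, which is the trade-off a default threshold such as 0.8 controls.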

            We have also validated RepurposeDrugs for predicting the outcomes of phase I, phase II, and phase III trials. Information on completed trials for various drug–disease associations was extracted from https://clinicaltrials.gov/. For phase I compounds, RepurposeDrugs had a mean predicted approval likelihood of 29%, reflecting the exploratory nature of this phase. The mean predicted approval likelihood increased to 38% for phase II trials, aligning with their focus on efficacy. Most notably, phase III trials showed a marked increase to a mean predicted likelihood of 63%, corresponding to their advanced stage and proximity to market approval. These results indicate our model’s capability to estimate the likelihood of drug approval at different stages of clinical trials.

            Anticipated to be a valuable resource for the drug repurposing community, RepurposeDrugs is poised to facilitate innovative repurposing strategies and leverage existing data for predictive analysis. By predicting the likelihood of drug approval for specific indications in advance, it could help establish clinical trial priorities and thereby significantly reduce the costs associated with running clinical trials.

            DRUG TARGET PREDICTION AND REPOSITIONING USING MACHINE LEARNING INTEGRATION OF NETWORK-BASED, PATHWAY ENRICHMENT-BASED, AND DISEASE ENRICHMENT-BASED ANALYSES

            Ezequiel Anokian

            ML and AI methods have gained popularity for their ability to significantly expedite the drug development process and mitigate risks.

            The Clarivate ML-driven pipeline for drug repurposing uses an input as simple as a disease term or a set of disease-related genes to detect novel connections between diseases and drugs. This largely automated system employs a compendium of algorithms: molecular network analysis, molecular pathway assessment, and disease similarity. The overall ML pipeline assesses whether a gene plays a role in the disease of interest, producing scores that are then integrated using ML (partial least squares and recursive feature elimination 36 ) to generate a final ranking.

            ML-based approaches to drug repurposing vary widely in how they work and in the input data they require. 13, 37 For example, ligand similarity and molecular docking methods operate with target and ligand 3D structures to predict features such as binding affinity (BA) and stability. Other strategies for identifying suitable drug candidates rely on the “reverse transcriptional signature” principle, where compounds with gene expression profiles opposite to that of the disease of interest (i.e., compounds that may revert the expression profile toward normal levels) are prioritized.
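The reverse-signature principle can be illustrated by scoring each compound with the correlation between its differential expression profile and the disease profile, so that the most negative correlations rank first. This correlation-based score is a deliberately simple stand-in chosen for illustration (other methods use enrichment-based connectivity scores), and the gene values and compound names are invented.

```python
import numpy as np

def reversal_ranking(disease_sig, compound_sigs):
    """Rank compounds by Pearson correlation with the disease signature,
    most negative (strongest expected reversal) first."""
    scores = {
        name: float(np.corrcoef(disease_sig, sig)[0, 1])
        for name, sig in compound_sigs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Toy log2 fold changes for the same five genes.
disease = np.array([2.0, 1.5, -1.0, -2.0, 0.5])
compounds = {
    "compound_X": np.array([-1.8, -1.2, 0.9, 1.7, -0.3]),  # roughly opposite to disease
    "compound_Y": np.array([1.9, 1.4, -0.8, -2.1, 0.4]),   # mimics the disease
    "compound_Z": np.array([0.1, -0.2, 0.3, 0.0, -0.1]),   # weak, unrelated effect
}
ranking = reversal_ranking(disease, compounds)
```

Compounds that mimic the disease end up at the bottom of the list; in practice signatures span thousands of genes, and rank-based statistics are often preferred because they are less sensitive to outlier fold changes.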

            The success of AI-based drug repurposing tools depends on the quality and consistency of the data they use. Often, these models have been trained on data from public sources, such as DrugBank, PubChem, BindingDB, ICD-9/10, MeSH, UniProtKB, and PDB. 38

            Nevertheless, computational drug repurposing comes with significant challenges. For example, identification of suitable candidates demands a profound understanding of the molecular mechanisms underpinning diseases. Furthermore, the absence of a clear regulatory framework tailored to the distinctive challenges of drug repurposing has hampered its widespread adoption. 13

            Among its main advantages, AI-based drug repurposing plays a significant role in the burgeoning field of personalized medicine, which aims to accommodate the diversity in disease manifestations, genetic variations, and treatment responses among patients. This not only expedites the development of targeted therapies but also empowers healthcare providers to select treatments with a higher probability of success, thereby minimizing trial-and-error approaches.

            Computational drug repurposing also aligns with the growing awareness of the environmental impact of pharmaceutical production. Repurposing existing drugs reduces the need for resource-intensive manufacturing processes, contributing to a more environmentally responsible pharmaceutical industry.

            Real-world data (RWD) have emerged in recent years as an alternative to traditional clinical trial data. 39 Comprising longitudinal information collected from electronic health records, insurance claims, and patient registries, RWD provide a comprehensive view of a drug’s performance in real-world settings. Integrating such large datasets into model training and/or fine-tuning not only expedites the research process but also captures the nuances of patient heterogeneity and comorbidities often excluded from controlled clinical trials.

            In conclusion, AI-driven drug repurposing is an ever-evolving field that is accelerating the overall drug discovery process at an unprecedented pace. This evolving landscape also demands new business models and a collaborative strategy involving academia, industry, and regulatory bodies to establish standardized protocols for evaluating repurposed drugs, ensuring both efficacy and safety.

            By overcoming the obstacles, fostering collaboration, and adapting regulatory frameworks, AI-based drug repurposing could revolutionize the landscape of medicine, delivering timely and cost-effective solutions to pressing healthcare challenges.

            ENHANCING DRUG REPURPOSING THROUGH GRAPH NEURAL NETWORKS AND LINK PREDICTION

            Lucía Prieto Santamaría

            The increasing availability of heterogeneous biomedical data obtained from improved multi-omics techniques has allowed science to redefine the way we conceive of diseases, which are now seen as more holistic and interrelated entities. 40 More importantly, it has opened up new horizons in the search for treatments. One such avenue is drug repurposing, which aims to find new uses for existing drugs.

            There are multiple ways of advancing drug repurposing, and some of the most relevant strategies derive from computational approaches. 38 In particular, representing this heterogeneous information as graphs (also called networks) allows data structured as nodes connected by edges (also called links) to be exploited and described in a very expressive manner (i.e., graphs can capture the implicit semantics underpinning a given structure). 41 In biomedical problems, nodes can represent diseases, symptoms, genes, proteins, genetic variants, non-coding RNAs, biological pathways, drugs, and so on, and the links depict the connections between these different types of nodes. The advantages of arranging information in this network structure are many, and its use in tackling medical challenges has been framed under the so-called network medicine. 42, 43

            One of the major technical advancements in learning with graph structures lies in the new AI-based field of deep learning on graphs and its major formalism: GNNs. In contrast to developing sophisticated feature extractors by hand (i.e., feature engineering), deep learning addresses the learning problem by jointly learning representations of the raw input data and a predictive model for the task under study. This is usually achieved by stacking multiple layers of differentiable non-linear transformations and training the model end-to-end with gradient descent techniques. The resulting models are often called deep neural networks. Traditionally, deep learning has been applied to data with a Euclidean structure, such as images arranged on regular grids. However, graph data have an underlying structure that lies in a non-Euclidean space. Extending deep neural models to such non-Euclidean domains, and specifically to graphs, 44 is an active and emerging research area. 45, 46 The GNN 47, 48 formalism is a general framework for defining deep neural networks on graph data. The key idea is to generate representations (also called embeddings) for nodes or edges that depend on both the structure of the graph and any feature information associated with them.

            Regarding drug repurposing, one of the most straightforward ways to use both network-structured information and GNNs is to address the link prediction task for the disease–drug link type, as represented in Figure 1 . That is, GNN pipelines embed the information in the network (representing each node as a feature vector) and then decode these embedding vectors to predict new links of the disease–drug type. Following this idea, various models have been developed. 15, 49, 50 In them, integrating heterogeneous biomedical data organized as a graph has proven effective when combined with GNNs to address the drug repurposing challenge. This approach has also been employed to predict treatments for diseases with unclear mechanisms. 51
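The encoder–decoder idea can be sketched in a few lines of NumPy: a GCN-style graph convolution layer mixes each node's features with those of its neighbours, and a dot-product decoder scores candidate disease–drug pairs. The toy graph, random input features, and untrained weights are assumptions; a real pipeline would learn the weights by optimizing a link-prediction loss on known disease–drug edges, typically with a library such as PyTorch Geometric or DGL.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN-style layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

def score_links(Z, pairs):
    """Dot-product decoder: sigmoid(z_u . z_v) for each candidate pair."""
    return {(u, v): float(1.0 / (1.0 + np.exp(-(Z[u] @ Z[v])))) for u, v in pairs}

# Toy heterogeneous graph: nodes 0-1 are drugs, 2-3 diseases, 4-5 genes.
edges = [(0, 4), (1, 5), (2, 4), (3, 5), (0, 2)]  # drug-gene, disease-gene, drug-disease
n_nodes = 6
A = np.zeros((n_nodes, n_nodes))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

rng = np.random.default_rng(0)
X = rng.standard_normal((n_nodes, 8))      # input node features (random here)
W1 = 0.5 * rng.standard_normal((8, 16))    # untrained, illustrative weights
W2 = 0.5 * rng.standard_normal((16, 4))
Z = gcn_layer(A, gcn_layer(A, X, W1), W2)  # two-layer encoder -> node embeddings

candidates = [(1, 2), (1, 3)]              # unobserved drug-disease pairs
scores = score_links(Z, candidates)
```

Training would adjust `W1` and `W2` so that known disease–drug edges score higher than sampled non-edges; the ranked scores for unobserved pairs then serve as repurposing hypotheses.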

            Figure 1.

            Representation of the disease–drug link prediction task in the context of drug repurposing. Each node depicts a biomedical entity: pink nodes represent drugs and gray nodes represent diseases. The rest of the node types include other relevant biomedical entities, such as symptoms and genes. The link prediction task in this case targets the disease–drug edge type. By means of graph neural networks (GNNs), the structural information of each node is first embedded in a vector representation, and then a decoding function prioritizes the potential disease–drug links.

            Nonetheless, there are several limitations in the field and challenges to be addressed. The predictions made by GNN-based models, like those of any in silico repurposing model, require expert input and interpretation, as well as experimental confirmation. They serve as a means of pinpointing repurposing opportunities that may warrant further examination. The complexity of biomedical data and the need to represent it meaningfully remain significant open research questions. Integrating diverse information sources and refining models to handle this complexity are ongoing tasks. Additionally, the need for large and diverse datasets to train robust models, as well as issues of data quality and bias in patient-centric information, must be emphasized.

            In general, the application of AI in drug repurposing offers great potential to reshape traditional approaches, leading to faster identification of repurposing candidates. Specifically, employing sophisticated GNN architectures that integrate both the biomedical network structure and the particular node features can be key to building multimodal systems that open up better opportunities. In this sense, future research will involve refining these models to incorporate even more diverse and heterogeneous data sources, which should eventually enhance explainability and interpretability. In this direction, complementing AI-based pipelines with more classical data-driven methodologies (such as those provided by network biology in general, and network medicine in particular) could lead to better and more understandable predictions.

            DRUG RESPONSE PREDICTION WITH MACHINE LEARNING: A CRITICAL ASSESSMENT OF CURRENT APPROACHES

            Judith Bernett, Markus List

            Park divides computational repurposing into knowledge-based, phenotype-based, and signature-based approaches. 52 In knowledge-based approaches, information about drugs, their potential targets, and disease–gene or disease–protein associations is integrated into databases or knowledge graphs such as NeDRex, 53 from which new indications for existing drugs can be predicted. Phenotype-based repurposing aims to benefit from electronic health records through natural language processing. Finally, signature-based methods rely on available molecular profiling or OMICs data. Here, two distinct challenges can be addressed. First, OMICs data can help pinpoint disease mechanisms by implicating disease genes or proteins or, at a higher level, pathways and disease modules harboring potential drug targets. Second, drug response prediction is an active research field fueled by the availability of large-scale drug response screens coupled with one or several layers of OMICs data. Such datasets represent a treasure trove for ML enthusiasts and are easily accessible through projects such as Connectivity Map, 54 NCI-60, 55 GDSC, 56 or CCLE. 57 The goal here is not only to reveal drug–target relationships but also to understand and predict how drug response depends on the cellular context.

            Currently, the notion of cellular context is typically limited to cell lines and sometimes extends to patient-derived xenografts. 58 Still, the general principle can be extended to personalized medicine, where physicians may one day tailor treatments to individual patients to improve therapeutic outcomes. Due to easy access to these data, many ML methods have emerged for predicting drug responses. 59 However, this burgeoning field faces substantial challenges that remain inadequately addressed.

            Most current methods address monotherapy drug response prediction (reviewed in detail by Firoozbakht et al. 60 ), and the field follows the general trend toward increasingly complex models. The hope is that deep learning strategies will learn latent data structures and capture complex non-linear associations that classical ML methods fail to identify. It should be noted, though, that the datasets commonly used here have not grown much in recent years and are often too small to train models with millions of parameters, making a breakthrough like that of AlphaFold 61 unlikely. Surprisingly, we still see that prediction accuracy, typically measured as the correlation between predicted and observed drug response on a hold-out dataset, has continuously improved over recent years and has now reached almost optimal performance.

            This leads to the impression that drug response prediction is a solved problem, but this is far from the case. First, most published methods lack reproducible or even available implementations, making it difficult to objectively assess their performance. Second, existing methods rarely compare themselves against simpler baseline models, making the superiority of highly complex models questionable. Third, even a simple mean predictor, which just reports the mean drug response in the training data, can achieve surprisingly good performance, raising the questions of whether current methods actually learn cell-line-specific drug response as advertised and whether current performance measures are adequate. 62 Indeed, the current practice of reporting the correlation between predicted and observed drug response across all drugs can be shown to be subject to Simpson’s paradox ( Figure 2 ) because of drug-specific average responses. Fourth, current data-splitting strategies allow models to overfit and to learn shortcuts. The standard practice of random splits, in which the same drugs and cell lines can appear across training, test, and validation data, encourages data leakage. 63 Fifth, frequently used measures to quantify and summarize drug response, such as IC50, EC50, and Area Under the Curve (AUC) values, do not adequately capture drug response dynamics. While the field has started to develop mitigation strategies such as the recently published CurveCurator, 64 existing methods do not generalize to unseen drug response data in the same cell lines, demonstrating that true generalization of findings across datasets and translation to patient care are far from reality.
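The leakage caused by random splits can be avoided by splitting on a grouping variable, for example holding out entire drugs so that no test drug is ever seen during training (a leave-drug-out split). The sketch below applies such a split to a toy list of (cell line, drug, response) records; the records, the placeholder responses, and the 25% test fraction are assumptions, and the same pattern applies to cell lines or to both at once.

```python
import random

def leave_drugs_out_split(records, test_fraction=0.25, seed=0):
    """Split (cell_line, drug, response) records so that every drug
    appears in exactly one of the two sets."""
    drugs = sorted({drug for _, drug, _ in records})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_test = max(1, round(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [r for r in records if r[1] not in test_drugs]
    test = [r for r in records if r[1] in test_drugs]
    return train, test

records = [
    (cell, drug, 0.0)  # placeholder response values
    for cell in ["CL1", "CL2", "CL3"]
    for drug in ["D1", "D2", "D3", "D4"]
]
train, test = leave_drugs_out_split(records)
train_drugs = {d for _, d, _ in train}
test_drugs = {d for _, d, _ in test}
```

Evaluating on held-out drugs (or held-out cell lines) measures generalization to genuinely new entities, which a random split over all (cell line, drug) pairs cannot do.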

            Figure 2.

            Visualization of the issue of evaluating model performance across all drugs. While overall (black line) predicted IC50s appear to correlate almost perfectly with actual IC50s, evaluation per drug reveals that predictions concentrate mostly around the mean responses, and the apparent correlation disappears (Simpson’s paradox).
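The effect in Figure 2 is easy to reproduce with synthetic data. In the sketch below, each drug has its own average response and the "model" is a per-drug mean predictor that carries no cell-line information; pooling all drugs still yields a high correlation, whereas the per-drug correlations are close to zero. All numbers are simulated and chosen only to illustrate the evaluation pitfall.

```python
import numpy as np

rng = np.random.default_rng(0)
n_drugs, n_cells = 20, 50
drug_means = rng.uniform(-3, 3, n_drugs)  # drug-specific average log IC50

observed, predicted, drug_ids = [], [], []
for d in range(n_drugs):
    # Cell-line-specific variation around the drug mean.
    obs = drug_means[d] + 0.5 * rng.standard_normal(n_cells)
    # Mean predictor plus tiny noise (avoids zero variance within a drug);
    # the noise is independent of the observed cell-line variation.
    pred = drug_means[d] + 0.1 * rng.standard_normal(n_cells)
    observed.extend(obs)
    predicted.extend(pred)
    drug_ids.extend([d] * n_cells)

observed, predicted, drug_ids = map(np.array, (observed, predicted, drug_ids))

pooled_r = np.corrcoef(observed, predicted)[0, 1]
per_drug_r = [
    np.corrcoef(observed[drug_ids == d], predicted[drug_ids == d])[0, 1]
    for d in range(n_drugs)
]
print(f"pooled r = {pooled_r:.2f}, mean per-drug r = {np.mean(per_drug_r):.2f}")
```

Reporting per-drug (and per-cell-line) correlations alongside the pooled value exposes models whose apparent accuracy comes only from knowing each drug's average potency.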

            Taken together, these issues create over-optimism about our current ability to predict drug response and reveal a large research gap that must be closed before drug response prediction can become a routine aspect of patient care. Toward this goal, we advocate for better data standardization, more rigorous method evaluation, and benchmarking. Moreover, most published drug response prediction methods rely exclusively on transcriptomics measurements, neglecting the multimodal nature of drug response. We envision that additional OMICs data, such as proteomics and phospho-proteomics data, could offer unique insights into the response of drug targets and allow for building more robust methods. Furthermore, single-cell screening techniques such as Perturb-seq 65 are essential to provide deep learning models with the sample numbers necessary to learn complex associations. Finally, future experimental and computational work should focus on predicting the response to drug combinations, since combinations allow for synergistic effects at lower dosages and with fewer side effects. This is already the standard in cancer patient care but is not yet reflected in computational modeling. A prerequisite for this is to improve the robustness and generalizability of existing approaches. These concerted efforts pave the way for more accurate and clinically relevant drug response predictions, ultimately enhancing patient care and treatment outcomes.

            DATA-DRIVEN INDICATION DISCOVERY—EXPANDING THE POTENTIAL OF MEDICINES

            Adrian Freeman

            Within the pharmaceutical industry, we routinely divide ourselves into therapeutic areas and focus on projects related to a specific mechanism or target. As part of the drug discovery process, a target is then used to drive the generation of a molecule, with many modalities now available as options to modulate the disease. This approach has not changed dramatically for several years, and although the industry has become more efficient at driving molecules through the pipeline and success rates have increased, 66 there is still a need to deliver novel mechanisms of action in an expedited way for patient benefit.

            The rapid and ever-increasing development of OMICs technologies is enabling data-driven approaches to be implemented within the pharma industry at multiple stages of the drug discovery pipeline, identifying novel ways to progress ideas. Our approach takes a disease-agnostic view, moving away from the individual disease-centric perspective to consider mechanistic modules that form part of many diseases. This leads to considerations for gene/transcriptional network analysis across diseases, which may lead to combinatorial or temporal use of drugs within diseases.

            One approach being taken by different groups is the use of modified connectivity mapping approaches, 67, 68 where the transcriptomic signature of each compound is compared computationally with the transcriptomic signatures of human diseases. For example, differential gene expression signatures for compounds are generated by performing RNA sequencing on cell lines after exposure to two concentrations of each compound. This analysis is performed blinded to compound identity and chemistry, and it generates a genome-wide pattern of mRNA changes in compound-treated versus cell-matched vehicle-treated samples. A library of disease transcriptomic signatures, also at the level of differential gene expression, is then used for comparison. To identify novel clinical indications for each compound, a connectivity score is calculated by comparing each disease signature to each compound signature. The connectivity score summarizes the transcriptomic relationship between each compound and disease, such that a strongly negative score indicates that the compound may induce transcriptomic changes that revert or “normalize” the disease signature.
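A connectivity score of this kind can be sketched with the Kolmogorov–Smirnov-style enrichment statistic used in the original Connectivity Map: the disease's up- and down-regulated genes are located in the compound's gene list (ranked from most up- to most down-regulated by the compound), and the score is negative when disease up-genes fall toward the bottom and down-genes toward the top. The implementation below is a simplified, unnormalized version for illustration; the modified connectivity approaches referred to above may differ, and the gene names and rankings are invented.

```python
def ks_score(tag_positions, n):
    """KS-style statistic for a tag set within a ranked list of n genes
    (1-based positions): positive if tags sit near the top, negative near the bottom."""
    v = sorted(tag_positions)
    t = len(v)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b

def connectivity_score(ranked_genes, up_genes, down_genes):
    """Positive if the compound mimics the disease signature, negative if it
    tends to reverse it, zero if the up and down statistics share a sign."""
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    n = len(ranked_genes)
    ks_up = ks_score([position[g] for g in up_genes], n)
    ks_down = ks_score([position[g] for g in down_genes], n)
    if ks_up * ks_down >= 0:
        return 0.0
    return ks_up - ks_down

others = [f"G{i}" for i in range(6)]
disease_up, disease_down = ["UP1", "UP2"], ["DN1", "DN2"]
# Compound A pushes the disease's up-genes to the bottom and down-genes to the top.
ranked_a = ["DN1", "DN2"] + others + ["UP1", "UP2"]
ranked_b = list(reversed(ranked_a))  # compound B mimics the disease instead
score_a = connectivity_score(ranked_a, disease_up, disease_down)
score_b = connectivity_score(ranked_b, disease_up, disease_down)
```

Here compound A receives a strongly negative score (a reversal candidate) and compound B a positive one; in practice the raw scores are usually normalized across all compounds before ranking.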

            Outputs from data-driven approaches are routinely employed and have had many successes in repurposing, but the landscape is now changing. With further enhancements from graph- and network-based approaches, these methods are increasingly used across industrial and academic settings to move from repositioning molecules to positioning them. Understanding the output from these models is crucial, and validation is critical. Core components need to be considered, such as data quality, code validation, and comprehensive metadata, among many others. Network analysis of many datasets has highlighted that the direct gene-to-protein relationship is not always the key driver of the disease phenotype, as evidenced by the lack of efficacy of certain medicines in clinical trials. Data-driven and network approaches are bringing forward the idea of interacting modules within the cell machinery, ranging from transcriptional regulation to protein–protein interactions within the cytoplasm. In addition, many of these modules interact and drive different phenotypes in different diseases. In the future, this should allow therapies to be considered across diseases in an expedited manner.

            CONCLUSION

            In the field of drug discovery, AI- and ML-based methods have become transformative tools for accelerating compound and drug target selection, particularly in the context of drug repurposing. Their impact is especially relevant for precision medicine, where tailored treatments are essential, and for repurposing drugs to treat rare diseases.

            Network-structured information and GNNs provide a powerful framework for predicting new drug–disease links by integrating highly heterogeneous biomedical data and complex relationships and organizing them as a graph.

            Leveraging RWD allows real-time assessment of a drug’s performance, which benefits patient stratification strategies and improves patient recruitment and monitoring accuracy. The increasing amount and quality of RWD can contribute dramatically to improving clinical trial success rates.

            Incorporating additional layers of OMICs data, such as proteomics, phosphoproteomics, or single-cell data, can offer the opportunity to build more comprehensive and robust predictive models.

            Challenges persist in ensuring the data coverage and quality needed to build and train high-quality, robust statistical models.

            As the amount, complexity, and heterogeneity of input data keep growing, operating procedures need to be continually reviewed and assessed. Data security and protection also remain critical concerns.

            While we increasingly rely on computational models to predict drug repurposing opportunities, experts are still needed to select input data and interpret results, and experimental validation remains essential to complement the ever-increasing power of algorithms.

            The rapid development of new methods underscores the importance of data quality, standardization, and rigorous and extensive benchmarking.

            Despite the evolution of drug response prediction techniques, a gap persists in establishing standardized methods and accurate evaluation criteria.

            Beyond data-centric challenges, the ethical and regulatory challenges associated with AI- and ML-based tools remain critical and demand careful attention.

            As this field evolves, we anticipate further breakthroughs and a broader scope of AI adoption in healthcare, constantly reshaping, rethinking, and reevaluating traditional approaches.

            ACKNOWLEDGEMENTS

            The authors were invited to write this mini review as a follow-up to session 6 of the RexPo23 conference. We thank the organizers and the REPO4EU consortium for this opportunity.

            Lucía Prieto-Santamaría’s work is funded through the project “Data-driven drug repositioning applying graph neural networks (3DR-GNN)”, which is being developed under grant “PID2021-122659OB-I00” from the Spanish Ministerio de Ciencia e Innovación.

            Markus List and Judith Bernett’s project is funded by the European Union under grant agreement No. 101057619. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or European Health and Digital Executive Agency (HADEA). Neither the European Union nor the granting authority can be held responsible for them. This work was also partly supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract No. 22.00115. This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the CompLS funding concept (031L0305A).

            REFERENCES

            1. Yuskovych-Zhukovska V, Poplavska T, Diachenko O, Mishenina T, Topolnyk Y, Gurevych R. Application of artificial intelligence in education. Problems and opportunities for sustainable development. Broad Res Artif Intell Neurosci. 2022. Vol. 13:339–356.

            2. Jáñez-Martino F, Alaiz-Rodríguez R, González-Castro V, Fidalgo E, Alegre E. A review of spam email detection: Analysis of spammer strategies and the dataset shift problem. Artif Intell Rev. 2023. Vol. 56:1145–1173.

            3. Talaviya T, Shah D, Patel N, Yagnik H, Shah M. Implementation of artificial intelligence in agriculture for optimisation of irrigation and application of pesticides and herbicides. Artif Intell Agric. 2020. Vol. 4:58–73.

            4. Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade RK. Artificial intelligence in drug discovery and development. Drug Discov Today. 2021. Vol. 26:80–93.

            5. Blanco-González A, Cabezón A, Seco-González A, et al. The role of AI in drug discovery: Challenges, opportunities, and strategies. Pharmaceuticals (Basel). 2023. Vol. 16:891.

            6. Harrer S, Shah P, Antony B, Hu J. Artificial intelligence for clinical trial design. Trends Pharmacol Sci. 2019. Vol. 40:577–591.

            7. Zhang B, Zhang L, Chen Q, Jin Z, Liu S, Zhang S. Harnessing artificial intelligence to improve clinical trial design. Commun Med (Lond). 2023. Vol. 3:191.

            8. Vora LK, Gholap AD, Jetha K, Thakur RRS, Solanki HK, Chavda VP. Artificial intelligence in pharmaceutical technology and drug delivery design. Pharmaceutics. 2023. Vol. 15:1916.

            9. FDA. Artificial intelligence and machine learning (AI/ML) for drug development. 2024. https://www.fda.gov/science-research/science-and-research-special-topics/artificial-intelligence-and-machine-learning-aiml-drug-development (accessed 18 March 2024).

            10. Field H. The first fully A.I.-generated drug enters clinical trials in human patients. 2023. https://www.cnbc.com/2023/06/29/ai-generated-drug-begins-clinical-trials-in-human-patients.html (accessed 29 June 2024).

            11. Morgan S, Grootendorst P, Lexchin J, Cunningham C, Greyson D. The cost of drug development: A systematic review. Health Policy. 2011. Vol. 100:4–17.

            12. Mullard A. 2018 FDA drug approvals. Nat Rev Drug Discov. 2019. Vol. 18:85–89.

            13. Pushpakom S, Iorio F, Eyers PA, et al. Drug repurposing: Progress, challenges and recommendations. Nat Rev Drug Discov. 2019. Vol. 18:41–58.

            14. Cong Y, Endo T. Multi-omics and artificial intelligence-guided drug repositioning: Prospects, challenges, and lessons learned from COVID-19. OMICS. 2022. Vol. 26:361–371.

            15. Ayuso-Muñoz A, Prieto-Santamaría L, Ugarte-Carro E, Serrano E, Rodríguez-González A. Uncovering hidden therapeutic indications through drug repurposing with graph neural networks and heterogeneous data. Artif Intell Med. 2023. Vol. 145:102687.

            16. Issa NT, Stathias V, Schürer S, Dakshanamurthy S. Machine and deep learning approaches for cancer drug repurposing. Semin Cancer Biol. 2021. Vol. 68:132–142.

            17. Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: Machine intelligence approach for drug discovery. Mol Divers. 2021. Vol. 25:1315–1360.

            18. Zdrazil B, Felix E, Hunter F, et al. The ChEMBL database in 2023: A drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024. Vol. 52:D1180–D1192.

            19. Ochoa D, Hercules A, Carmona M, et al. The next-generation Open Targets Platform: Reimagined, redesigned, rebuilt. Nucleic Acids Res. 2023. Vol. 51:D1353–D1359.

            20. Sadybekov AV, Katritch V. Computational approaches streamlining drug discovery. Nature. 2023. Vol. 616:673–685.

            21. Wang Y, Aldahdooh J, Hu Y, et al. DrugRepo: A novel approach to repurposing drugs based on chemical and genomic features. Sci Rep. 2022. Vol. 12:21116.

            22. Corsello SM, Bittker JA, Liu Z, et al. The Drug Repurposing Hub: A next-generation drug library and information resource. Nat Med. 2017. Vol. 23:405–408.

            23. Brown AS, Patel CJ. A standard database for drug repositioning. Sci Data. 2017. Vol. 4:170029.

            24. Shameer K, Glicksberg BS, Hodos R, et al. Systematic analyses of drugs and disease indications in RepurposeDB reveal pharmacological, biological and epidemiological factors influencing drug repositioning. Brief Bioinform. 2018. Vol. 19:656–678.

            25. Tanoli Z, Seemab U, Scherer A, Wennerberg K, Tang J, Vähä-Koskela M. Exploration of databases and methods supporting drug repurposing: A comprehensive survey. Brief Bioinform. 2021. Vol. 22:1656–1678.

            26. Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: A method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011. Vol. 7:496.

            27. Li J, Lu Z. A new method for computational drug repositioning using drug pairwise similarity. Proceedings (IEEE Int Conf Bioinformatics Biomed). 2012. Vol. 2012:1–4.

            28. Wang J, Wang W, Yan C, Luo J, Zhang G. Predicting drug-disease association based on ensemble strategy. Front Genet. 2021. Vol. 12:666575.

            29. Le T, Winter R, Noé F, Clevert D-A. Neuraldecipher - reverse-engineering extended-connectivity fingerprints (ECFPs) to their molecular structures. Chem Sci. 2020. Vol. 11:10378–10389.

            30. Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci. 2002. Vol. 42:1273–1280.

            31. Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today. 2022. Vol. 27:103356.

            32. Fernández-de Gortari E, García-Jacas CR, Martinez-Mayorga K, Medina-Franco JL. Database fingerprint (DFP): An approach to represent molecular databases. J Cheminform. 2017. Vol. 9:9.

            33. Hall LH, Kier LB. Electrotopological state indices for atom types: A novel combination of electronic, topological, and valence state information. J Chem Inf Comput Sci. 1995. Vol. 35:1039–1045.

            34. Axen SD, Huang X-P, Cáceres EL, Gendelev L, Roth BL, Keiser MJ. A simple representation of three-dimensional molecular structure. J Med Chem. 2017. Vol. 60:7393–7409.

            35. Stärk H, Beaini D, Corso G, et al. 3D Infomax improves GNNs for molecular property prediction. arXiv:2110.04126. 2022.

            36. You W, Yang Z, Ji G. PLS-based recursive feature elimination for high-dimensional small sample. Knowl-Based Syst. 2014. Vol. 55:15–28.

            37. March-Vila E, Pinzi L, Sturm N, et al. On the integration of in silico drug design methods for drug repurposing. Front Pharmacol. 2017. Vol. 8:298.

            38. Luo H, Li M, Yang M, Wu F-X, Li Y, Wang J. Biomedical data and computational models for drug repositioning: A comprehensive review. Brief Bioinform. 2021. Vol. 22:1604–1619.

            39. Park K. The use of real-world data in drug repurposing. Transl Clin Pharmacol. 2021. Vol. 29:117–124.

            40. Menche J, Sharma A, Kitsak M, et al. Uncovering disease-disease relationships through the incomplete human interactome. Science. 2015. Vol. 347:1257601.

            41. Hamilton WL. Graph representation learning. Synthesis lectures on artificial intelligence and machine learning. Vol. 14. Cham: Springer. 2020. p. 1–159.

            42. Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: A network-based approach to human disease. Nat Rev Genet. 2011. Vol. 12:56–68.

            43. Chan SY, Loscalzo J. The emerging paradigm of network medicine in the study of human disease. Circ Res. 2012. Vol. 111:359–374.

            44. Zhang Z, Cui P, Zhu W. Deep learning on graphs: A survey. IEEE Trans Knowl Data Eng. 2022. Vol. 34:249–270.

            45. Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Process Mag. 2017. Vol. 34:18–42.

            46. Bronstein MM, Bruna J, Cohen T, Veličković P. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv:2104.13478. 2021. Cited 4 November 2021.

            47. Gori M, Monfardini G, Scarselli F. A new model for learning in graph domains. In: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks; Montreal, QC, Canada. IEEE. 2005. Vol. 2. p. 729–734.

            48. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Trans Neural Netw. 2009. Vol. 20:61–80.

            49. Ayuso-Muñoz A, Prieto-Santamaría L, Álvarez-Pérez A, Otero-Carrasco B, Serrano E, Rodríguez-González A. Enhancing drug repurposing on graphs by integrating drug molecular structure as feature. In: Proceedings of the 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS); L’Aquila, Italy. IEEE. 2023. p. 192–197.

            50. Muñoz AA, Carro EU, Santamaría LP, et al. REDIRECTION: Generating drug repurposing hypotheses using link prediction with DISNET data. In: Proceedings of the 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS); Shenzhen, China. IEEE. 2022. p. 7–12.

            51. Huang K, Chandak P, Wang Q, et al. Zero-shot drug repurposing with geometric deep learning and clinician centered design. medRxiv. 2023. 2023.03.19.23287458.

            52. Park K. A review of computational drug repurposing. Transl Clin Pharmacol. 2019. Vol. 27:59–63.

            53. Sadegh S, Skelton J, Anastasi E, et al. Network medicine for disease module identification and drug repurposing with the NeDRex platform. Nat Commun. 2021. Vol. 12:6848.

            54. Lamb J, Crawford ED, Peck D, et al. The Connectivity Map: Using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006. Vol. 313:1929–1935.

            55. Shoemaker RH. The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer. 2006. Vol. 6:813–823.

            56. Yang W, Soares J, Greninger P, et al. Genomics of drug sensitivity in cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013. Vol. 41:D955–D961.

            57. Ghandi M, Huang FW, Jané-Valbuena J, et al. Next-generation characterization of the cancer cell line encyclopedia. Nature. 2019. Vol. 569:503–508.

            58. Gao H, Korn JM, Ferretti S, et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat Med. 2015. Vol. 21:1318–1325.

            59. Adam G, Rampášek L, Safikhani Z, Smirnov P, Haibe-Kains B, Goldenberg A. Machine learning approaches to drug response prediction: Challenges and recent progress. NPJ Precis Oncol. 2020. Vol. 4:19.

            60. Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform. 2022. Vol. 23:bbab408.

            61. Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021. Vol. 596:583–589.

            62. Li Y, Hostallero DE, Emad A. Interpretable deep learning architectures for improving drug response prediction performance: Myth or reality? Bioinformatics. 2023. Vol. 39:btad390.

            63. Kapoor S, Narayanan A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns (N Y). 2023. Vol. 4:100804.

            64. Bayer FP, Gander M, Kuster B, The M. CurveCurator: A recalibrated F-statistic to assess, classify, and explore significance of dose-response curves. Nat Commun. 2023. Vol. 14:7902.

            65. Van de Sande B, Lee JS, Mutasa-Gottgens E, et al. Applications of single-cell RNA sequencing in drug discovery and development. Nat Rev Drug Discov. 2023. Vol. 22:496–520.

            66. Peek CJ, Glasgow RE, Stange KC, Klesges LM, Purcell EP, Kessler RS. The 5 R’s: An emerging bold standard for conducting relevant research in a changing world. Ann Fam Med. 2014. Vol. 12:447–455.

            67. Ahangari F, Becker C, Foster DG, et al. Saracatinib, a selective Src kinase inhibitor, blocks fibrotic responses in preclinical models of pulmonary fibrosis. Am J Respir Crit Care Med. 2022. Vol. 206:1463–1479.

            68. Dudley JT, Sirota M, Shenoy M, et al. Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci Transl Med. 2011. Vol. 3:96ra76.

            Author and article information

            Journal
            Drug Repurposing
            ScienceOpen (Berlin)
            ISSN 2941-2528
            Published 3 July 2024
            Volume 1, Issue 1: e20240004
            Affiliations
            [1] Discovery and Translational Science (DTS), Clarivate Analytics, Barcelona, Spain
            [2] Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany (https://ror.org/02kkvpp62)
            [3] Discovery Sciences, Research and Early Development, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
            [4] Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Spain (https://ror.org/03n6nwv02)
            [5] Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Spain (https://ror.org/03n6nwv02)
            [6] Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland (https://ror.org/040af2s02)
            [7] BioICAWtech, Helsinki, Finland
            Author information
            https://orcid.org/0000-0003-0694-1867
            https://orcid.org/0000-0001-5812-8013
            https://orcid.org/0009-0007-0413-5657
            https://orcid.org/0000-0002-0941-4168
            https://orcid.org/0000-0003-1545-3515
            https://orcid.org/0000-0003-2435-9862
            https://orcid.org/0000-0001-5159-2518
            Article
            DOI: 10.58647/DRUGREPO.24.1.0004
            © 2024 The Author(s).

            This work has been published open access under Creative Commons Attribution License (CC BY) 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

            History: 12 March 2024; 28 May 2024
            Page count
            Figures: 2, References: 68, Pages: 9

            Keywords: machine learning, neural networks, artificial intelligence, drug repurposing
