Abstract
The rising cost of drug discovery, coupled with stagnation in the approval of novel treatments, highlights the urgent need for innovative strategies such as drug re-purposing. Pharmaceutical companies invest roughly 10-15 years and $2.6 billion to bring a single FDA-approved drug to market. The COVID-19 pandemic further underscored the need to quickly identify existing drugs with potential efficacy against a fast-spreading virus. In this study, we perform a comparative analysis of several Graph Neural Networks (GNNs) and recommendation system models for drug re-purposing. We construct an integrated graph that combines Protein-Protein Interaction networks, Drug-Target Protein graphs, Disease-Protein associations, and Drug-Disease links. Over this graph, we frame drug re-purposing as link prediction on drug-disease pairs, using both node-agnostic and heterogeneous graph techniques. We implement a Heterogeneous Graph Transformer (HGT) model that processes three node types (drugs, diseases, proteins) and four edge types. The HGT achieved an AUC-ROC of 0.985 and an F1-score of 0.90, demonstrating its efficacy in predicting drug re-purposing candidates. We also compared several node-agnostic GNN architectures, including Graph Convolutional Networks, Graph Attention Networks, GraphSAGE, and Graph Isomorphism Networks; all performed comparably, with an AUC-ROC of around 0.98. However, when we framed drug re-purposing as a recommendation problem using Matrix Factorization with side information, performance dropped significantly, with the AUC-ROC falling to 0.82. This degradation highlights the importance of incorporating Protein-Protein Interaction networks in the modelling process, as matrix factorization fails to capture the complex network effects critical for drug re-purposing.
Our models ranked 6,158 drugs by their predicted efficacy in treating COVID-19, providing a valuable tool for prioritizing clinical trials and further research. Beyond COVID-19, such an integrated framework could help uncover drug re-purposing candidates for other novel diseases far more efficiently and cost-effectively.
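
As an illustration of the setup summarized above, the following is a minimal sketch of a heterogeneous graph with three node types and four edge types, the enumeration of unobserved drug-disease pairs as link-prediction candidates, and a pairwise AUC-ROC computation. It is not the study's implementation: the node identifiers, the relation names (`interacts_with`, `targets`, `associated_with`, `treats`), and the helper functions `candidate_pairs` and `auc_roc` are hypothetical, and in practice the scores would come from a trained model such as the HGT.

```python
# Toy heterogeneous graph: three node types and four edge types.
# All identifiers and relation names are hypothetical placeholders,
# not data or naming from the study.
NODE_TYPES = ("drug", "disease", "protein")

edges = {
    ("protein", "interacts_with", "protein"): [("P1", "P2"), ("P2", "P3")],
    ("drug", "targets", "protein"): [("D1", "P1"), ("D2", "P3")],
    ("disease", "associated_with", "protein"): [("X1", "P2"), ("X2", "P3")],
    ("drug", "treats", "disease"): [("D1", "X1")],
}


def candidate_pairs(edges, drugs, diseases):
    """Return drug-disease pairs with no known 'treats' edge.

    These are the pairs a link-prediction model would score and rank.
    """
    known = set(edges[("drug", "treats", "disease")])
    return [(d, x) for d in drugs for x in diseases if (d, x) not in known]


def auc_roc(pos_scores, neg_scores):
    """Pairwise AUC-ROC: the probability that a randomly chosen positive
    pair is scored above a randomly chosen negative pair (ties count 0.5).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    pairs = candidate_pairs(edges, drugs=["D1", "D2"], diseases=["X1", "X2"])
    print(pairs)
    # Made-up scores for held-out positive and sampled negative pairs.
    print(auc_roc([0.9, 0.8], [0.1, 0.85]))
```

The quadratic pairwise loop keeps the definition of AUC-ROC explicit; at the scale of thousands of drugs, a rank-based implementation from a standard library would be the more practical choice.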