STEM 1



STEM is a class taught by Dr. Crowthers and consists of two major half-year projects. STEM 1 is an independent research project in which each of us chooses a topic we are interested in, identifies knowledge gaps, and develops an experiment to advance that field.

A Novel Deep Learning Pipeline to Noninvasively Detect Gynecologic Diseases Using miRNA Expression and In Silico Modeling

The goal of this project is to evaluate the role of miRNA as a noninvasive diagnostic tool by designing a machine learning model that classifies microRNA (miRNA) expression profiles from publicly available serum sample datasets to predict the likelihood of the following diseases: endometriosis, ovarian cancer, and breast cancer.
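As a rough illustration of the classification step, the sketch below fits a binary logistic regression (one of the model types named in the abstract) by batch gradient descent. It is a minimal sketch, not the project's implementation: the six samples, the two features, and the function names `train_logistic_regression` and `predict_proba` are hypothetical, and the expression values are assumed to be already normalized (z-scored). A real pipeline would more likely rely on a library such as scikit-learn.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    X: list of samples, each a list of (hypothetical, normalized) miRNA expression values
    y: list of 0/1 labels (1 = disease, 0 = control)
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b

def predict_proba(w, b, x):
    # Estimated probability that sample x belongs to the disease class
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical z-scored levels of two miRNAs: three disease samples, three controls
X = [[1.2, 0.1], [0.9, -0.2], [1.5, 0.3], [-1.1, 0.2], [-0.8, -0.1], [-1.3, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(X, y)
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
print(predictions)  # [1, 1, 1, 0, 0, 0]
```

Because the hypothetical disease samples all have a positive value for the first miRNA and the controls a negative one, the learned weight on that feature ends up positive. Real serum data would need evaluation on held-out samples rather than training-set predictions.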

Abstract

Women’s health faces a growing crisis: 1 out of 10 women will experience a chronic gynecologic disease, yet treatment can be delayed by 4 to 11 years due to social stigma, ambiguous symptoms, complex pathology, a lack of noninvasive and accessible diagnostic tools, and cost-ineffective drug discovery pipelines. MicroRNAs (miRNAs), short non-coding RNA molecules, offer promise as noninvasive biomarkers because they regulate gene expression and are easily detectable in bodily fluids. This study aims to create a deep learning pipeline that uses blood-based miRNA expression to facilitate early disease detection and identify therapeutic targets. Logistic Regression, Random Forest, Deep Neural Network, and Ensemble models were trained on more than 2,000 publicly accessible patient miRNA samples to differentiate diseases, specifically ovarian cancer, breast cancer, endometriosis, and polycystic ovary syndrome. The models achieved greater than 90% accuracy for binary classification and 85% accuracy for multi-disease classification. Feature importance techniques, such as Shapley analysis, were applied to identify key biomarkers, which were then used for pathway analysis and gene clustering to understand the unique and shared pathology behind benign and malignant female conditions. Notably, miR-let-7d emerged as a consistently dysregulated miRNA that impacts the RAS signaling pathway, while miR-1307-3p has been linked to chemoresistance in ovarian cancer and to endometriosis-associated malignancy. These distinguished miRNAs provide insight into biomarkers for early disease detection and potential therapeutic targets. This pipeline can be applied in a clinical setting by extracting miRNAs from patient blood samples and ev Overall, this pipeline aims to improve health outcomes for female patients by creating avenues for accessible and noninvasive serum-based diagnostics, assessing treatment options, expediting personalized drug discovery pipelines, and reducing misdiagnosis and waiting periods.
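The abstract mentions Ensemble models without specifying how they combine their members. One common scheme is soft voting, which averages the class probabilities predicted by each model and picks the class with the highest average; the sketch below assumes that scheme. The probability vectors are hypothetical stand-ins for Logistic Regression, Random Forest, and Deep Neural Network outputs, and `soft_vote` is an illustrative helper name.

```python
def soft_vote(prob_lists, weights=None):
    """Average class-probability vectors from several models (soft voting).

    prob_lists: one probability vector per model, all over the same class order.
    weights: optional per-model weights; equal weights are used if omitted.
    Returns (index of the winning class, averaged probability vector).
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_classes = len(prob_lists[0])
    # Weighted mean probability for each class across models
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: avg[c])
    return winner, avg

# Hypothetical per-model probabilities for one serum sample
classes = ["ovarian cancer", "breast cancer", "endometriosis", "PCOS"]
lr_probs = [0.40, 0.30, 0.20, 0.10]
rf_probs = [0.25, 0.45, 0.20, 0.10]
dnn_probs = [0.50, 0.20, 0.20, 0.10]
winner, avg = soft_vote([lr_probs, rf_probs, dnn_probs])
print(classes[winner])  # ovarian cancer
```

Soft voting lets a confident model outweigh a less confident one, which is why the Random Forest's vote for breast cancer is overridden here; hard (majority) voting on predicted labels is the usual alternative.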


Research Documents


Problem Statement

Women's health faces a growing public health crisis: 1 out of 10 women will experience some type of gynecologic condition, yet a confirmatory diagnosis can take years due to a lack of adequate resources, social stigma, and other factors.

Project Goals

MicroRNAs hold promise as noninvasive and accessible screening candidates due to their role in regulating gene expression and their abundance in bodily fluids. The project goal is to design a machine learning model that predicts diseases from miRNA expression levels, and to extract the significant miRNAs that are unique to or shared among the target diseases in order to model miRNA-mediated pathways and advance current understanding of disease pathology and etiology.
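Once each disease has a set of significant miRNAs (for example, the top features ranked by Shapley value), separating the unique and shared ones is a set operation. This minimal sketch assumes those per-disease sets are already available; the placeholder names `miR-A` through `miR-E` and the helper `shared_and_unique` are hypothetical and do not correspond to the project's results.

```python
def shared_and_unique(top_features):
    """Split per-disease significant-miRNA sets into shared and unique groups.

    top_features: dict mapping disease name -> set of significant miRNA names.
    Returns (miRNAs shared by every disease, dict of miRNAs unique to each disease).
    """
    sets = list(top_features.values())
    shared = set.intersection(*sets) if sets else set()
    unique = {}
    for disease, feats in top_features.items():
        # Union of the significant miRNAs of every other disease
        others = set().union(*(s for d, s in top_features.items() if d != disease))
        unique[disease] = feats - others
    return shared, unique

# Hypothetical top-feature sets (placeholder names, not real findings)
top = {
    "ovarian cancer": {"miR-A", "miR-B", "miR-C"},
    "breast cancer": {"miR-A", "miR-D"},
    "endometriosis": {"miR-A", "miR-C", "miR-E"},
}
shared, unique = shared_and_unique(top)
print(sorted(shared))  # ['miR-A']
print({d: sorted(s) for d, s in unique.items()})
```

miRNAs such as the placeholder `miR-C`, which appear in some but not all sets, fall in neither group; they could be reported separately as partially shared.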

Background

Methods

Results


Figure 1: Confusion matrix of the multi-disease Deep Neural Network. The vertical axis represents the true labels, and the horizontal axis represents the predicted labels. Entries on the main diagonal are correct predictions, where the predicted label matches the true label.
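A confusion matrix like the one in Figure 1 can be tallied directly from true and predicted labels, and overall accuracy is the diagonal total divided by the number of samples. The sketch below assumes string class labels; the three classes and six predictions are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels and columns are predicted labels, as in Figure 1."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy_from_matrix(matrix):
    # Correct predictions lie on the main diagonal
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels and predictions
labels = ["ovarian cancer", "breast cancer", "endometriosis"]
y_true = ["ovarian cancer", "ovarian cancer", "breast cancer",
          "breast cancer", "endometriosis", "endometriosis"]
y_pred = ["ovarian cancer", "breast cancer", "breast cancer",
          "breast cancer", "endometriosis", "ovarian cancer"]
matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(round(accuracy_from_matrix(matrix), 3))  # 0.667
```

Off-diagonal cells show which diseases are confused with each other, which a single accuracy figure hides.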


Figure 2: Receiver Operating Characteristic (ROC) curves illustrating the performance of the binary classification models. Each curve plots the true positive rate against the false positive rate across decision thresholds for classifying one condition against the other outcomes in that dataset.
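For a one-vs-rest ROC curve, samples with the condition of interest are labeled 1 and all other samples 0, and the decision threshold is swept over the model's scores. The area under the curve (AUC) can then be computed with the trapezoidal rule. This is a minimal sketch with hypothetical labels and scores; scikit-learn's `roc_curve` and `roc_auc_score` in `sklearn.metrics` provide equivalent functionality.

```python
def roc_curve(y_true, scores):
    """Return (fpr, tpr) lists for a binary (one-vs-rest) problem.

    y_true: 1 for the condition of interest, 0 for every other sample.
    scores: model scores such as predicted probabilities; higher means more likely positive.
    Thresholds sweep from the highest score down; tied scores are processed together.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every sample at this score before recording a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr

def auc(fpr, tpr):
    # Trapezoidal rule over consecutive ROC points
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

# Hypothetical one-vs-rest labels and predicted probabilities
y = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
fpr, tpr = roc_curve(y, scores)
print(round(auc(fpr, tpr), 4))  # 0.8889
```

The AUC here equals 8/9, which matches the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score.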


Figure 3: Shapley values of the significant miRNA features in the Deep Neural Network and their contribution to the prediction of each class.
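A Shapley value assigns each feature its average marginal contribution to a prediction over all coalitions of the other features. The section does not say which tool produced Figure 3 (the SHAP library is a common choice); the sketch below instead computes exact Shapley values by brute force, assuming a single baseline input stands in for "absent" features. Its cost doubles with every added feature, so it only illustrates the definition on a hypothetical three-feature model and is not how a deep network with many miRNA inputs would be explained in practice. For a multi-class network, the same computation would be repeated on each class's output.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    phi_i = sum over S not containing i of |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S)),
    where v(S) is model(z) and z takes x's values for features in S and the
    baseline's values elsewhere. Runtime grows as 2^n.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def linear_score(z):
    # Hypothetical model: a weighted sum of three miRNA features
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(linear_score, x, baseline)
print([round(v, 6) for v in phi])  # [2.0, -2.0, 2.0]
```

For a linear model each Shapley value reduces to weight times (x minus baseline), and for any model the values sum to model(x) minus model(baseline).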


Figure 4: A bar graph showcasing the pathways of the genes targeted by the significant miRNAs identified by the machine learning models.
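The section does not state how pathways were scored. One common approach, assumed here, is over-representation analysis: a one-sided hypergeometric test asks whether a pathway contains more of the miRNA target genes than would be expected by chance. The function name `hypergeometric_enrichment_p` and the gene counts in the example are hypothetical.

```python
from math import comb

def hypergeometric_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background set
    K: background genes annotated to the pathway
    n: target genes of the significant miRNAs (drawn from the background)
    k: target genes that fall in the pathway
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts: 20,000 background genes, a 200-gene pathway,
# 300 target genes, 15 of which fall in the pathway (about 3 expected by chance).
p_value = hypergeometric_enrichment_p(k=15, n=300, K=200, N=20000)
print(f"p = {p_value:.2e}")
```

When many pathways are tested at once, the resulting p-values would normally be corrected for multiple comparisons before ranking them in a bar graph.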

Analysis

Discussion + Conclusion

References

Poster