STEM I

Dr. Crowthers

Course Description
In STEM I and Science Technical Writing (STW), instructed by Dr. Crowthers, students conduct an independent research project in science, engineering, or mathematics. Read more about my project below!
Gene Expression Meta-Analysis Identifies Novel Cell Type Specific Pathways in Multiple Sclerosis
In my research project, I developed new software and a computational pipeline to analyze disparate gene expression datasets effectively. I then used this software to identify novel biological processes that underlie multiple sclerosis, a severe neurodegenerative and autoimmune disease. The pathways found were supported by the current literature and highlighted cell-type specific effects that were not previously known. I plan to extend this approach to additional data types and diseases.
Abstract
Nearly 3 million people suffer from multiple sclerosis (MS), a severe neurodegenerative disease with life-debilitating symptoms and an economic burden of over 85 billion dollars per year in the U.S. alone. Despite being one of the most common autoimmune and neurodegenerative diseases, the biological causes behind MS are still unknown. Diagnosing and treating MS are imperative, but due to the absence of MS-specific biomarkers and therapeutic targets, tests and drugs are ineffective or nonspecific with poor patient outcomes. Studies that address this problem by profiling the gene expression landscape produce inconsistent results. To address this issue, a robust meta-analysis of gene expression data was undertaken to provide an unbiased cell and tissue type analysis of differentially expressed genes and dysregulated pathways. This analysis elucidates cell-type specific effects and identifies a novel list of potential drug targets and biomarkers. Among these, post-translational modification, ribosomal, and mitochondrial mechanisms were found to be disrupted, implicating a new etiology that significantly expands the current knowledge of this disease. This study represents a major step in both understanding and outlining further targeted research into MS.

Keywords: meta-analysis, multiple sclerosis, RNA sequencing, gene regulation, biological pathways, rank aggregation, gene expression
Graphical Abstract
Research Proposal
Click here to view my supporting documents.
Research Question
What are the cell-type specific pathways that underlie multiple sclerosis?
Hypothesis
By performing a meta-analysis on gene expression data, a robust list of novel biological pathways can be identified in both immune and nervous system cells.
Background
Multiple sclerosis (MS) is a severe neurodegenerative and autoimmune disease that affects nearly 3 million people worldwide (Filippi et al., 2018). With an annual cost of $85 billion in the United States alone, the discovery of specific biomarkers and the development of effective drugs are an incredibly high priority (Bebo et al., 2022). Despite decades of research, the cause and mechanisms underlying MS remain a mystery. Numerous studies have attempted to understand MS by profiling the gene expression landscape, but these produce contradictory and inconsistent results (Elkjaer et al., 2022).

To better understand MS, it is necessary to uncover the cell-type specific processes within the disease. Because MS involves both the nervous and immune systems, many different groups of cells are at play, producing a complex and heterogeneous presentation in patients that makes both diagnosing the disease and choosing the right treatment difficult (Filippi et al., 2018). To obtain robust cell-type specific data, rank aggregation can combine gene expression results from various datasets without being distorted by technical artifacts such as batch effects. Rank aggregation is the process of combining many ranked lists into one, such that the output reaches some form of "truth" or "consensus" between the input lists (Wang et al., 2022).
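As a rough illustration of the idea of rank aggregation, the sketch below combines several ranked gene lists by mean rank (a Borda-style method). This is only one simple aggregation scheme and is not necessarily the one used in this project; the function name `aggregate_ranks`, the placeholder gene names, and the choice to rank missing genes one past the end of a list are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_ranks(ranked_lists):
    """Combine ranked gene lists into one consensus ranking by mean rank.

    Hypothetical helper for illustration; each input list is ordered from
    most to least differentially expressed and contains no duplicates.
    """
    # Collect every gene that appears in at least one list.
    all_genes = set()
    for genes in ranked_lists:
        all_genes.update(genes)

    totals = defaultdict(float)
    for genes in ranked_lists:
        position = {gene: i + 1 for i, gene in enumerate(genes)}
        # Assumption: a gene absent from a list is ranked one past its end.
        missing_rank = len(genes) + 1
        for gene in all_genes:
            totals[gene] += position.get(gene, missing_rank)

    n_lists = len(ranked_lists)
    # Lower mean rank is better; ties are broken alphabetically.
    return sorted(all_genes, key=lambda g: (totals[g] / n_lists, g))


# Three made-up datasets that mostly, but not entirely, agree.
dataset_1 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
dataset_2 = ["GENE_B", "GENE_A", "GENE_D", "GENE_C"]
dataset_3 = ["GENE_A", "GENE_C", "GENE_B"]

consensus = aggregate_ranks([dataset_1, dataset_2, dataset_3])
print(consensus)
```

Because only the order of genes within each list is used, differences in scale between datasets do not enter the calculation, which is the property that makes rank-based methods attractive when batch effects are a concern.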
Graphical Background
Methodology
The methodology of this project is split into four main parts:

A list of gene expression datasets was curated from online databases (GEO, Expression Atlas) and subsequently processed using a series of bioinformatics tools and programs. Because I analyzed RNA-sequencing data, the processing consisted of quality control, trimming and filtering the sequencing reads, aligning the reads to a reference genome, and converting the alignments into a gene count matrix.

Differential gene expression analysis was performed on each of the gene count matrices, and cell-type specific ranked lists of differentially expressed genes were generated for each of the datasets. Differential expression analysis identifies genes that have either much higher or much lower expression levels in diseased cells than in healthy cells.

Rank aggregation was used to combine the various cell-type specific ranked lists into a single “master rank”, a ranked list of genes that more accurately represents the biological reality. This was done for each set of cell-type specific lists, for a total of six “master ranks”.

The master ranks underwent pathway analysis, which identified biological processes that were either over-activated or under-activated in the disease based on the order of the ranked gene list. The pathway analysis used eight databases in total, and the top pathways were subsequently studied for relationships to MS.
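To make the last step more concrete, the sketch below shows one common way a ranked gene list can be scored against a pathway: an unweighted running-sum enrichment score in the style of a Kolmogorov-Smirnov statistic. The text above does not name the pathway analysis tool that was used, so this should be read as a generic illustration rather than the project's implementation; the function name `enrichment_score` and the gene labels are hypothetical, and the ranked list is assumed to contain no duplicate genes.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one pathway.

    Walks down the ranked list, stepping up at pathway members and down at
    all other genes, and returns the largest deviation from zero. A positive
    score means the pathway clusters near the top of the ranking, a negative
    score means it clusters near the bottom.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hits = len(members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0  # Score is undefined; treat as no enrichment.

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hits
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Made-up master rank, ordered from most up-regulated to most down-regulated.
master_rank = ["g1", "g2", "g3", "g4", "g5", "g6"]

top_pathway = {"g1", "g2"}      # members sit at the top of the ranking
bottom_pathway = {"g5", "g6"}   # members sit at the bottom of the ranking

print(enrichment_score(master_rank, top_pathway))
print(enrichment_score(master_rank, bottom_pathway))
```

In practice, a score like this is compared against scores from permuted rankings or gene sets to obtain a p-value, which is then corrected for multiple testing.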
Graphical Methodology
Figures
Figure 1
Pathway analysis results of CD4+ T-cells in Multiple Sclerosis.
CD4+ T-cell pathway analysis
Figure 2
Pathway analysis results of "Normal Appearing White Matter" in Multiple Sclerosis.
NAWM pathway analysis
Figure 3
Pathway analysis results of B-cells in Multiple Sclerosis through the Gene Ontology database.
B-cell GO pathway analysis
Figure 4
Pathway analysis results of B-cells in Multiple Sclerosis through the Reactome database.
B-cell Reactome pathway analysis
Analysis
The differential gene expression analysis and the pathway analysis both calculate statistical significance as part of their procedures. Multiple testing correction is also applied with the Benjamini-Hochberg procedure, which controls the false discovery rate. All pathways considered significant have a calculated p-value less than 0.01.
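For readers unfamiliar with the Benjamini-Hochberg procedure, the following is a minimal sketch of how adjusted p-values can be computed. The p-values are invented for the example, and the function name `benjamini_hochberg` is hypothetical. The 0.01 cutoff comes from the text above; applying it to the adjusted values, as done here, is an assumption, since the text does not say whether the threshold refers to raw or adjusted p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for five pathways.
raw = [0.001, 0.008, 0.039, 0.041, 0.9]
adjusted = benjamini_hochberg(raw)

# Assumption: the 0.01 threshold is applied to the adjusted values.
significant = [p < 0.01 for p in adjusted]
print(adjusted)
print(significant)
```

Each raw p-value is scaled by the number of tests divided by its rank, so a pathway must be more strongly supported to remain significant when many pathways are tested at once.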
Discussion & Conclusion
References
February Fair Poster