In STEM I and Science Technical
Writing (STW), instructed by Dr. Crowthers, students conduct an
independent research project in science, engineering, or mathematics. Read more about my project
Gene Expression Meta-Analysis
Identifies Novel Cell Type Specific Pathways
In my research project, I developed
new software and a computational pipeline to effectively analyze disparate gene expression datasets.
I then used this software to identify novel biological processes that underlie multiple sclerosis, a
severe neurodegenerative and autoimmune disease. The pathways found were supported by the current
literature, and highlighted cell type specific effects that were not previously known.
I plan on expanding this to additional data types and diseases.
Nearly 3 million people suffer from
multiple sclerosis (MS), a severe neurodegenerative disease with debilitating symptoms and an
economic burden of over 85 billion dollars per year in the U.S. alone. Despite being one of the most
common autoimmune and neurodegenerative diseases, the biological causes behind MS are still unknown.
Diagnosing and treating MS are imperative, but because MS-specific biomarkers and therapeutic
targets are lacking, current tests and drugs are ineffective or nonspecific, leading to poor patient outcomes.
Studies that address this problem by profiling the gene expression landscape produce inconsistent
results. To address this issue, a robust meta-analysis of gene expression data was undertaken to
provide an unbiased cell and tissue type analysis of differentially expressed genes and dysregulated
pathways. This analysis elucidates cell-type specific effects and identifies a novel list of
potential drug targets and biomarkers. Among these, post-translational modification, ribosomal, and
mitochondrial mechanisms were found to be disrupted, implicating a new etiology that significantly
expands the current knowledge of this disease. This study represents a major step in both
understanding and outlining further targeted research into MS.
multiple sclerosis, RNA sequencing, gene regulation, biological pathways, rank aggregation, gene
Click here to view my supporting documents.
What are the cell-type specific
pathways that underlie multiple sclerosis?
By performing a meta-analysis on
gene expression data, a robust list of novel biological pathways can be identified in both immune and nervous system cells.
Multiple sclerosis (MS)
is a severe neurodegenerative and autoimmune disease that affects nearly 3 million people worldwide
(Filippi et al., 2018).
With an annual cost of $85 billion in the United States alone, the discovery of specific biomarkers and
the development of effective drugs are a high priority (Bebo et al., 2022). Despite decades
of research, the
causes and mechanisms underlying MS remain a mystery. Numerous studies have attempted to understand MS by
profiling the gene expression landscape, but these produce contradictory and inconsistent results
(Elkjaer et al., 2022).
To better understand MS, it is necessary to uncover the cell-type specific processes within the
disease. Because MS involves both the nervous and immune systems, many different groups of cells
contribute to a complex and heterogeneous presentation in patients, which makes both
diagnosing the disease and choosing the right treatment difficult (Filippi et al., 2018). To obtain
cell-type specific results, rank aggregation can combine gene expression results from various datasets
while avoiding artifacts such as batch effects. Rank aggregation is the process of combining many
ranked lists into one, such that the output reaches some form of "truth" or "consensus" between the
input lists (Wang et al., 2022).
The methodology of this project is split into four main parts:
A list of gene expression datasets was curated from online
databases (GEO, Expression Atlas) and
subsequently processed using a series of
bioinformatics tools and programs. Because I analyzed
RNA-sequencing data, the process consisted of quality control, trimming and filtering the
sequencing reads, aligning the reads to a reference genome, and converting the alignments into a gene count matrix.
Differential gene expression
analysis was performed on each of the gene count matrices, and
cell-type specific ranked lists of differentially expressed genes were generated for each of the
datasets. Differential expression analysis identifies genes that have much higher or much
lower expression levels in diseased cells than in healthy cells.
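As a rough illustration of how a ranked list of differentially expressed genes can be built, the sketch below computes a log2 fold change per gene and orders genes by the size of the change. It is a toy example, not the tool used in this project: the function names, the pseudocount, and the example counts are all assumptions made for illustration. Real differential expression tools also model variability between replicates and report statistical significance.

```python
import math


def log2_fold_changes(disease, healthy, pseudocount=1.0):
    """Return gene -> log2(mean disease count / mean healthy count).

    `disease` and `healthy` map gene names to lists of normalized counts.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    changes = {}
    for gene in disease:
        mean_disease = sum(disease[gene]) / len(disease[gene])
        mean_healthy = sum(healthy[gene]) / len(healthy[gene])
        changes[gene] = math.log2(
            (mean_disease + pseudocount) / (mean_healthy + pseudocount)
        )
    return changes


def rank_by_effect(changes):
    """Order genes from the largest to the smallest absolute fold change."""
    return sorted(changes, key=lambda gene: abs(changes[gene]), reverse=True)


# Hypothetical counts for three made-up genes, two samples per condition.
disease_counts = {"geneA": [7, 7], "geneB": [0, 0], "geneC": [3, 3]}
healthy_counts = {"geneA": [1, 1], "geneB": [7, 7], "geneC": [3, 3]}

changes = log2_fold_changes(disease_counts, healthy_counts)
ranked = rank_by_effect(changes)
```

Here `geneB` is strongly reduced in the disease samples, `geneA` is increased, and `geneC` is unchanged, so the ranked list places them in that order.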
Rank aggregation was used to
combine the various cell-type specific ranked lists into a single
“master rank”, a ranked list of genes that more accurately represents the biological reality. This
was done separately for each cell type, for a total of six “master ranks”.
The master ranks underwent pathway analysis, which
identified biological processes that were either
over-activated or under-activated in the disease based on the order of the ranked gene list. The
pathway analysis used eight databases in total, and the top pathways were subsequently studied for
relationships to MS.
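One common way to score a pathway from the order of a ranked gene list is a running-sum enrichment score. The sketch below is a simplified, unweighted version of that idea and is not necessarily the method used in this project. The function name and the example genes are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return the running-sum deviation furthest from zero.

    Walking down the ranked list, the sum rises when a gene belongs to
    the pathway and falls otherwise. A large positive score means the
    pathway genes cluster near the top of the list; a large negative
    score means they cluster near the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical ranked list and two made-up pathways.
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked_genes, {"g1", "g2"})
bottom_score = enrichment_score(ranked_genes, {"g5", "g6"})
```

In practice, significance is then estimated by comparing the observed score against scores from permuted data, which this sketch leaves out.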
results of CD4+ T-cells in Multiple Sclerosis.
results of "Normal Appearing White Matter" in Multiple Sclerosis.
results of B-cells in Multiple Sclerosis through the Gene Ontology database.
results of B-cells in Multiple Sclerosis through the Reactome database.
Both the differential gene expression analysis and the pathway analysis calculate statistical
significance as part of their procedures. Multiple testing correction is also applied using the Benjamini-Hochberg procedure,
which controls the false discovery rate. All pathways considered significant have a calculated p-value less than 0.01.
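For reference, the Benjamini-Hochberg adjustment can be written in a few lines. This is a minimal sketch of the standard procedure, not the code used in this project. The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by m / rank (m = number of tests, rank =
    its position when sorted ascending), then a running minimum from the
    largest p-value downward keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

The 0.05 cutoff in the example is arbitrary and only shows how adjusted values are compared against a chosen threshold.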
Discussion & Conclusion
February Fair Poster