STEM I is taught by Dr. Crowthers. Students in this course carry out a six-month-long independent research project that involves reading literature, forming hypotheses, designing and conducting experiments, and communicating their results. The project culminates in the school-wide February STEM Fair, where students present their work to judges working in related industries.
Using Immune Footprints in a Novel Deep Learning Model to Detect Human Diseases
U.S. Provisional No. 63/439,655
This project involves the design of a novel disease diagnosis method that applies machine learning to a patient's sequenced antibodies. When a patient has their blood drawn, their antibodies can be digitally catalogued as amino acid sequences. Machine learning models can then analyze these sequences to determine whether any of the patient's antibodies resemble previously known antibodies associated with a disease. This could allow rapid, simultaneous detection of multiple diseases, with the potential to revolutionize the healthcare industry.
Many diseases remain difficult, expensive, or slow to diagnose (Sujena et al., 2022). However, the immune system naturally carries disease “footprints” (National Cancer Institute, 2021) in the form of antibody sequences. With recent advancements in next-generation sequencing and deep learning, these “footprints” can be sequenced and analyzed to provide diagnostic information at a scale never seen before.
To design a deep learning model capable of predicting Chronic Lymphocytic Leukemia (CLL), COVID-19, and other diseases based on antibody sequences in a patient’s peripheral blood cells, which serve as genetic “footprints” in the immune system.
Displayed here are the ensemble model’s predictions vs. truths on an unseen testing dataset. The vertical axis represents true labels, while the horizontal axis represents predicted labels, so each cell of the matrix counts one combination of true and predicted label. The major diagonal (top-left to bottom-right), where true and predicted labels match, contains the correct predictions.
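The layout described above can be illustrated with a minimal sketch of how such a matrix is tallied. The class names, the labels, and the helper functions `confusion_matrix` and `accuracy` are hypothetical and chosen only for illustration; they are not the project's data or code.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the major diagonal, where true and predicted labels match."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up labels for illustration only; these are not the project's results.
classes = ["CLL", "COVID-19", "Healthy"]
y_true = ["CLL", "CLL", "COVID-19", "Healthy", "Healthy", "COVID-19"]
y_pred = ["CLL", "Healthy", "COVID-19", "Healthy", "Healthy", "CLL"]

matrix = confusion_matrix(y_true, y_pred, classes)
for label, row in zip(classes, matrix):
    print(f"{label:>8}: {row}")
print("accuracy:", accuracy(matrix))
```

Off-diagonal cells show which classes are mistaken for which; for example, the cell in the "CLL" row and "Healthy" column counts CLL samples that were predicted as healthy.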
Based on the number of antibodies predicted for a disease, the binomial distribution can be used to formulate a diagnosis confidence level for the patient as a whole.
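One possible reading of this idea is sketched below; the exact formulation used in the project is not stated here, so everything in the sketch is an assumption. It supposes that, for a patient without the disease, each antibody is independently misclassified as disease-associated with a known false-positive rate, so the count of flagged antibodies follows a binomial distribution. The confidence is then taken as one minus the probability of seeing at least that many flagged antibodies by chance. The names `binomial_tail`, `diagnosis_confidence`, and `false_positive_rate`, and the numbers in the usage lines, are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def diagnosis_confidence(flagged, total, false_positive_rate):
    """Confidence that the flagged count is not explained by false positives alone.

    Assumption: under the "no disease" hypothesis, each of the `total`
    antibodies is flagged independently with probability `false_positive_rate`.
    """
    if not 0 <= flagged <= total:
        raise ValueError("flagged must be between 0 and total")
    p_value = binomial_tail(flagged, total, false_positive_rate)
    return 1.0 - p_value


# Illustrative numbers only: 100 antibodies sequenced, 5% assumed false-positive rate.
for flagged in (6, 15):
    confidence = diagnosis_confidence(flagged, total=100, false_positive_rate=0.05)
    print(f"{flagged} flagged of 100 -> confidence {confidence:.4f}")
```

Under these assumptions, a flagged count close to what the false-positive rate alone would produce yields a low confidence, while a count well above it yields a confidence near 1. The independence assumption is a simplification, since antibodies from the same patient may be related.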