STEM

STEM is a class at Mass Academy that’s taught by Dr. Crowthers. During STEM we collaborate with one another to complete projects such as our STEM Fair submissions.

Risk Factor Analysis For Sleep Disorders
STEMWebsiteQRCode
The evidence favored the hypothesis that each of the variables would affect the probability of developing a sleep disorder differently. When risk was calculated using the linear regression equation, different weights were assigned to each variable.
Abstract
This project used machine learning techniques to create a risk prediction system for insomnia and sleep apnea. Different lifestyle variables were investigated to see how each affects someone’s likelihood of developing a sleep disorder. Sleep disorders, including insomnia and sleep apnea, impact a person's ability to rest properly. Insomnia, the most common of these disorders, and sleep apnea, a condition that obstructs breathing, both cause persistent difficulty in sleeping. These sleep disorders have "risk factors" spanning a variety of categories such as age, gender, and exercise levels. Identifying patterns in risk factors helps predict the likelihood of developing a disorder. This project aimed to determine how demographics and lifestyle variables work as risk factors for developing sleep disorders. Data from diagnosed individuals was processed using TensorFlow and Microsoft Excel pattern recognition software, and common risk factors for each disorder were identified. Each disorder was affected differently by each of the risk factors. With a system that determines how likely a patient is to develop a sleep disorder, hospitals can shift resources from diagnosis to preventative strategies. Once someone is diagnosed with these conditions, maintaining a healthy lifestyle can become challenging, so it is best to identify the risk early and take precautionary measures. In the future, this technology could be implemented in hospitals and sleep clinics to provide an efficient method of risk prediction.
Keywords: Risk prediction, Insomnia, Sleep apnea, Machine learning
Graphical Abstract
GraphicalAbstract
Research Proposal
Research Question
How do lifestyle variables and demographics determine someone's likelihood of developing a sleep disorder?
Hypothesis
The expected outcome was that each of the variables would have different effects on the risk of developing each of the disorders. Some variables would be more prevalent than others. Additionally, each variable's sub-categories would have different levels of severity as a risk factor within each sleep disorder.
Background Infographic
BackgroundInfographic
Background
Procedure Infographic
ProcedureInfographic
Procedure
InsomniaRegression
Figure 1: The linear regression output for insomnia, displaying all terms of the equation for the 7 different variables. Notable values include the “Multiple R” value and the “Adjusted R Squared” value. There were a total of 234 patients involved in this study.
SleepApneaRegression
Figure 2: The linear regression output for sleep apnea, displaying all terms of the equation for the 7 different variables. Notable values include the “Multiple R” value and the “Adjusted R Squared” value. There were a total of 296 patients involved in this study.
OriginalInsomniaRegression
Figure 3: The original linear regression for insomnia, based on a different set of variables.
OriginalSleepApneaRegression
Figure 4: The original linear regression for sleep apnea, based on a different set of variables.
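The “Multiple R” and “Adjusted R Squared” values named in the captions for Figures 1 and 2 are related by standard formulas, which the short sketch below illustrates. The sample size (234) and number of variables (7) come from the Figure 1 caption; the R Squared value used here is a made-up placeholder, since the actual value appears only in the figure image.

```python
import math


def adjusted_r_squared(r_squared: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    degrees_of_freedom = n_samples - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / degrees_of_freedom


def multiple_r(r_squared: float) -> float:
    """Multiple R in Excel's regression output is the square root of R^2."""
    return math.sqrt(r_squared)


# ASSUMPTION: 0.50 is a placeholder R^2, not a value from the project.
example_r2 = 0.50
adj = adjusted_r_squared(example_r2, n_samples=234, n_predictors=7)
print(f"Multiple R: {multiple_r(example_r2):.4f}, Adjusted R^2: {adj:.4f}")
```

Adjusted R Squared is always at most R Squared, and the gap grows as more variables are added relative to the number of patients, which is why it is the more cautious value to report for a 7-variable model.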
Analysis
The outcome of this project was a machine learning model that could predict someone’s chances of developing a sleep disorder based on their characteristics and lifestyle variables. When asked to calculate the chance for a new subject, the model assessed that subject’s answers for each of the columns and, based on these, output an answer that could be interpreted as a percent. However, this model did not have a high accuracy, so a multi-variable linear regression test was conducted to determine the coefficients for each variable. The coefficients related directly to the weight of each variable and gave insight into how each individual variable affected the chances. The coefficient for Daily Steps in insomnia was 3.18281E-05, while the coefficient for Age was 0.056569301. This indicates that Age was more directly linked to developing a sleep disorder than Daily Steps.
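As a rough illustration of how such coefficients act as weights, the sketch below multiplies each coefficient by a subject's value for that variable. Only the two insomnia coefficients quoted above are used; the other five coefficients and the intercept are not given in the text, so the result is a partial score rather than the project's full equation. The subject's values and the function names are hypothetical.

```python
# Coefficients quoted in the Analysis for the insomnia regression.
INSOMNIA_COEFFICIENTS = {
    "Age": 0.056569301,
    "Daily Steps": 3.18281e-05,
}


def contributions(coefficients, subject):
    """Return coefficient * value for every variable listed in `coefficients`."""
    return {name: coefficients[name] * subject[name] for name in coefficients}


def linear_score(coefficients, subject, intercept=0.0):
    """Sum the weighted contributions; the intercept defaults to 0 (ASSUMPTION)."""
    return intercept + sum(contributions(coefficients, subject).values())


# ASSUMPTION: a made-up subject, not a patient from the dataset.
subject = {"Age": 40, "Daily Steps": 8000}
parts = contributions(INSOMNIA_COEFFICIENTS, subject)
score = linear_score(INSOMNIA_COEFFICIENTS, subject)

for name, value in parts.items():
    print(f"{name}: {value:.4f}")
print(f"Partial score: {score:.4f}")
```

Looking at the per-variable contributions, rather than the raw coefficients alone, also takes into account that the variables are measured on very different scales (years versus steps per day).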
Discussion/Conclusion Applications
The evidence favored the hypothesis that each of the variables would affect the probability of developing a sleep disorder differently. While some variables were more prominent as risk factors in one disorder, they proved to be less impactful for the other. The results showed a spike in the probability of sleep apnea for those with a high heart rate, while this variable was not as closely linked with insomnia. Furthermore, within certain variables, some manifestations had greater effects on the chance of developing a sleep disorder than others. Older age increased the chances of developing insomnia far more than younger age, as shown by the difference in their coefficients. This is relevant to the topic of prediction because differentiating between the variables and their contributions to the possibility of a sleep disorder can isolate problem areas. This provides an idea of what sleep specialists should focus on when evaluating a patient’s risk of either insomnia or sleep apnea. The accuracy of the TensorFlow model was 68% due to the structure of the model along with the dataset that was analyzed. This was considered an unsuccessful model because 68% is considered low in the risk prediction field. A potential limitation of the project was the type of data that was used. The dataset chosen for this study did not contain a wide variety of answers for each variable, which led to a skewed analysis. The results of this project agreed with the results of the project conducted by Dr. Alexander Huang on the risk prediction of insomnia. While the variables in his project were not all the same as in this one, the variables that did overlap had generally the same weight, along with similar common manifestations (Huang, 2023). In the future, the TensorFlow model can be improved to increase its accuracy and risk prediction abilities.
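For context on the 68% figure, the sketch below shows one common way a classification accuracy is computed: predicted probabilities are turned into yes/no labels with a threshold, and accuracy is the fraction of subjects labeled correctly. The text does not describe how accuracy was measured in this project, so the 0.5 threshold, the function names, and the five sample values are all assumptions made for illustration.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities into 1 (disorder) or 0 (no disorder)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# ASSUMPTION: made-up model outputs and diagnoses, not data from the project.
sample_probabilities = [0.9, 0.2, 0.6, 0.4, 0.7]
sample_actual = [1, 0, 0, 0, 1]

sample_labels = to_labels(sample_probabilities)
sample_accuracy = accuracy(sample_labels, sample_actual)
print(f"Accuracy: {sample_accuracy:.0%}")
```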
After determining the risk factors for each of the disorders, another model can be created in which a user can easily find out their level of risk. The model could be in a survey format where the user answers a few questions, and the application determines their chance of getting insomnia, narcolepsy, and sleep apnea. This application can then be implemented in hospitals and sleep clinics as a method of checking up on the patient’s well-being. The survey would be especially useful if someone goes to a sleep clinic because they are concerned about potentially developing a sleep disorder. A professional can easily tell a person their chances of getting one so that they can work towards lowering those odds.
The focus of this project was to identify risk factors for the sleep disorders of insomnia and sleep apnea to then utilize in a risk prediction model. This was done by gathering data from an online database of variables and demographics of those suffering from each of the sleep disorders. Afterward, the data was processed through a multi-variable linear regression model that analyzed the frequency of each of the variables and determined their weight in the probability of a diagnosis. The model yielded different weights for each variable, and these weights also differed between the two disorders. In a fast-paced world of development and efficiency, sleep disorders pose an impending threat. Risk prediction mitigates this threat and helps people return to innovation.
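The multi-variable linear regression step summarized above can be sketched in a few lines of Python. The project ran its regression in Microsoft Excel; the version below is a generic ordinary least squares fit using the normal equations, written only to show where the per-variable weights come from. The function name and the toy data are hypothetical.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows:    list of feature lists, one per subject
    targets: list of outcomes, one per subject
    Returns [intercept, coef_1, ..., coef_k].
    """
    # Prepend a column of ones so the first fitted value is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# ASSUMPTION: toy data generated from y = 1 + 2*x1 + 3*x2, not project data.
toy_rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
toy_targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in toy_rows]

intercept, weight_x1, weight_x2 = fit_linear_regression(toy_rows, toy_targets)
print(f"intercept={intercept:.3f}, x1={weight_x1:.3f}, x2={weight_x2:.3f}")
```

Because the toy outcomes were generated from known weights, the fit should recover them closely, which makes the sketch easy to check before trying it on real survey answers.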
References
February Fair Poster