A single night of disrupted sleep, often dismissed as a mere precursor to daytime grogginess, may hold profound implications for long-term health. Researchers at Stanford Medicine, in collaboration with academic partners, have developed an artificial intelligence system that analyzes the physiological data captured during a single night of sleep to estimate an individual’s risk of developing more than 100 medical conditions. The innovation draws on the rich, often underused information embedded in comprehensive sleep studies.
The system, called SleepFM, was developed by analyzing nearly 600,000 hours of sleep recordings from 65,000 participants who had undergone polysomnography. Polysomnography, the gold standard for evaluating sleep, uses an array of sensors to monitor physiological signals throughout the night: electroencephalographic (EEG) activity charting brainwave patterns, electrocardiographic (ECG) readings capturing cardiac rhythms, respiratory airflow and effort, eye movements, and electromyographic (EMG) data reflecting muscle tone and movement.
Historically, polysomnography has been used primarily to diagnose sleep disorders such as insomnia, sleep apnea, and narcolepsy. Yet researchers have increasingly recognized that the physiological data generated during these overnight assessments is a largely untapped reservoir of information about general health. As Emmanuel Mignot, the Craig Reynolds Professor in Sleep Medicine and co-senior author of the study, explains, "We record an amazing number of signals when we study sleep. It’s a kind of general physiology that we study for eight hours in a subject who’s completely captive. It’s very data rich." In conventional clinical practice, only a fraction of this data is extracted and analyzed, mainly metrics relevant to sleep architecture and common sleep disturbances.
The advent of sophisticated artificial intelligence, particularly in the realm of machine learning and deep learning, has now empowered researchers to delve into these complex and voluminous datasets with unprecedented depth and accuracy. This novel research marks a significant milestone as the first instance of applying AI to sleep data on such a monumental scale. James Zou, PhD, an associate professor of biomedical data science and another co-senior author, highlights the relative scarcity of AI research focused on sleep: "From an AI perspective, sleep is relatively understudied. There’s a lot of other AI work that’s looking at pathology or cardiology, but relatively little looking at sleep, despite sleep being such an important part of life."
To unlock the latent insights within this extensive sleep data, the research team embarked on the creation of a foundational model, a sophisticated type of AI designed to assimilate broad patterns from exceptionally large and diverse datasets. This acquired knowledge can then be flexibly adapted and applied to a multitude of specific tasks, analogous to how large language models like ChatGPT are trained on vast textual corpora to perform various language-based functions. SleepFM, however, is trained not on text, but on intricate biological signals.
SleepFM was trained on approximately 585,000 hours of polysomnography data collected from patients evaluated at specialized sleep clinics. Each recording was segmented into discrete five-second intervals. These segments function as "tokens," units of information analogous to the words used to train text-based AI systems, allowing the model to learn the sequential and contextual relationships within the physiological data. "SleepFM is essentially learning the language of sleep," Zou said.
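As a rough illustration, chopping a multichannel recording into fixed-length tokens amounts to simple array reshaping. The helper below is a hypothetical sketch, assuming a NumPy array of shape (channels, samples) and a known sampling rate; it is not SleepFM code.

```python
import numpy as np

def segment_into_tokens(signal, sample_rate_hz, window_s=5):
    """Split a multichannel recording (channels x samples) into
    fixed-length windows, analogous to the five-second "tokens"
    described in the article. Illustrative helper, not SleepFM code."""
    samples_per_token = int(window_s * sample_rate_hz)
    n_tokens = signal.shape[1] // samples_per_token
    # Drop any trailing samples that don't fill a whole token
    trimmed = signal[:, : n_tokens * samples_per_token]
    # Result shape: (n_tokens, channels, samples_per_token)
    return trimmed.reshape(signal.shape[0], n_tokens, samples_per_token).swapaxes(0, 1)

# Example: 2 channels sampled at 128 Hz for 60 seconds -> 12 five-second tokens
recording = np.random.randn(2, 128 * 60)
tokens = segment_into_tokens(recording, 128)
print(tokens.shape)  # (12, 2, 640)
```

A token sequence like this can then be fed to a sequence model in the same way word tokens are fed to a language model.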
This sophisticated model operates by integrating multiple concurrent streams of physiological information. This includes not only brainwave activity but also intricate cardiac rhythms, muscle activation patterns, pulse wave characteristics, and the dynamics of respiratory airflow. Crucially, SleepFM learns how these disparate signals dynamically interact and influence one another. To foster a deeper understanding of these interdependencies, the researchers devised an innovative training methodology termed "leave-one-out contrastive learning." This technique involves systematically isolating one specific type of physiological signal and then tasking the AI model with reconstructing that missing data stream solely from the remaining available signals. This process compels the AI to identify and understand the predictive relationships between different physiological modalities. Professor Zou elaborated on this technical advancement: "One of the technical advances that we made in this work is to figure out how to harmonize all these different data modalities so they can come together to learn the same language."
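The leave-one-out idea can be sketched in a few lines: embed each modality separately, hold one out, and ask whether the aggregate of the remaining modalities can pick out the held-out modality's embedding for the same recording from among a batch. The function name, the mean-pooled aggregate, and the InfoNCE-style loss below are illustrative assumptions, not details of the SleepFM implementation.

```python
import numpy as np

def leave_one_out_contrastive_loss(embeddings, held_out):
    """Toy sketch of leave-one-out contrastive learning: the held-out
    modality's embedding should match the aggregate of the remaining
    modalities for the same recording and mismatch other recordings.
    `embeddings` maps modality name -> (batch, dim) array."""
    target = embeddings[held_out]                          # (batch, dim)
    rest = [v for k, v in embeddings.items() if k != held_out]
    anchor = np.mean(rest, axis=0)                         # aggregate of the other modalities
    # Cosine similarity between every anchor/target pair in the batch
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = np.exp(a @ t.T)                               # (batch, batch)
    # InfoNCE-style loss: the matching pair sits on the diagonal
    return -np.mean(np.log(np.diag(logits) / logits.sum(axis=1)))

# Toy batch: three modalities that share an underlying structure
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
mods = {m: shared + 0.01 * rng.normal(size=(8, 16)) for m in ["eeg", "ecg", "emg"]}
loss = leave_one_out_contrastive_loss(mods, "ecg")
print(round(float(loss), 3))
```

Minimizing such a loss forces the model to encode whatever information in the remaining channels is predictive of the held-out one, which is the cross-modal dependency the researchers describe.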
Following its comprehensive training, the researchers strategically adapted SleepFM for a series of specialized diagnostic and predictive tasks. Initially, the model was rigorously tested against established benchmarks for standard sleep assessments, including the accurate identification of distinct sleep stages (such as REM, NREM stages 1-3) and the precise quantification of sleep apnea severity. In these initial evaluations, SleepFM demonstrated performance levels that either matched or surpassed those of the most advanced existing models currently employed in clinical practice.
The team then set its sights on a more ambitious objective: determining whether the physiological signatures captured during sleep could reliably predict future disease. To do so, they linked individuals’ polysomnography records with their subsequent long-term health outcomes, drawing on an extraordinary longitudinal dataset: decades of medical records maintained by a single, pioneering sleep clinic.
The Stanford Sleep Medicine Center, established in 1970 by William Dement, widely regarded as the father of modern sleep medicine, provided the historical data. The largest cohort used to train SleepFM comprised approximately 35,000 patients, ages 2 to 96, whose sleep studies at the clinic between 1999 and 2024 were paired with electronic health records spanning, for some patients, up to 25 years. Mignot, who directed the sleep center for nearly a decade, noted that while the clinic’s paper-based polysomnography records extend even further back in time, digital records were essential for the AI-driven analysis.
By leveraging this unique integrated dataset, SleepFM meticulously analyzed over 1,000 distinct disease categories. The AI identified 130 conditions for which sleep data alone could provide a reasonably accurate prediction of future risk. The most robust predictive performance was observed for a range of serious conditions, including various forms of cancer, complications arising during pregnancy, cardiovascular diseases, and mental health disorders. For these categories, the prediction scores, quantified using the C-index, consistently exceeded 0.8.
The C-index, or concordance index, is a statistical metric used to evaluate the discriminatory power of a predictive model. It quantifies how effectively the model can correctly rank individuals based on their risk of experiencing a specific health event. In essence, it measures the probability that, for any randomly selected pair of individuals, the model will correctly identify the one who is more likely to experience the event sooner. Professor Zou explained its significance: "For all possible pairs of individuals, the model gives a ranking of who’s more likely to experience an event — a heart attack, for instance — earlier. A C-index of 0.8 means that 80% of the time, the model’s prediction is concordant with what actually happened."
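The metric itself is simple to compute. Below is a minimal, naive O(n²) implementation of the concordance index for right-censored follow-up data (an event flag of 0 marks a censored observation); the function name and toy data are illustrative, not taken from the study.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs ranked correctly by predicted risk.
    A pair (i, j) is comparable if i experienced the event before j's
    observed time; it is concordant if i also got the higher risk score."""
    concordant = ties = comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1  # ties conventionally count as half-correct
    return (concordant + 0.5 * ties) / comparable

# Toy data: higher predicted risk should mean an earlier event
times = [2, 5, 7, 10]      # years of follow-up
events = [1, 1, 0, 1]      # 1 = event occurred, 0 = censored
scores = [0.9, 0.6, 0.4, 0.2]
cindex = concordance_index(times, events, scores)
print(cindex)  # 1.0 — every comparable pair is ranked correctly
```

A value of 0.5 corresponds to random ranking, and 1.0 to perfect ranking, which is why the 0.8-plus scores reported for SleepFM are notable.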
SleepFM exhibited particularly high accuracy in predicting specific conditions, achieving a C-index of 0.89 for Parkinson’s disease and prostate cancer, 0.85 for dementia, 0.84 for hypertensive heart disease and all-cause mortality, and 0.87 for breast cancer. "We were pleasantly surprised that for a pretty diverse set of conditions, the model is able to make informative predictions," Professor Zou remarked, underscoring the broad applicability of the AI’s predictive capabilities. He further contextualized these findings by noting that predictive models with C-indices around 0.7, which are considered moderately accurate, are already integrated into current medical practice, such as those assisting in predicting patient responses to specific cancer therapies.
The research team is actively engaged in further refining SleepFM’s predictive accuracy and, perhaps more importantly, in developing methods to enhance the interpretability of the AI’s decision-making process. Future iterations of the system are anticipated to incorporate data from wearable biosensors, thereby expanding the spectrum of physiological signals that can be analyzed and potentially enriching the predictive power of the model. Professor Zou acknowledged the current limitations in direct explanation: "It doesn’t explain that to us in English. But we have developed different interpretation techniques to figure out what the model is looking at when it’s making a specific disease prediction."
Initial explorations into the AI’s reasoning revealed intriguing patterns. While cardiovascular signals proved most influential in predicting heart-related diseases and neurological signals played a more prominent role in forecasting mental health disorders, the most accurate predictions invariably arose from the synergistic integration of all available data modalities. Dr. Mignot emphasized this holistic approach: "The most information we got for predicting disease was by contrasting the different channels. Body constituents that were out of sync — a brain that looks asleep but a heart that looks awake, for example — seemed to spell trouble." This suggests that subtle desynchronizations or disharmonies between different bodily systems during sleep, imperceptible to the human observer, can serve as early indicators of underlying pathology.
The study was co-led by Rahul Thapa, a PhD student in biomedical data science, and Magnus Ruud Kjaer, a PhD student at the Technical University of Denmark. Collaborators included researchers from the Technical University of Denmark, Copenhagen University Hospital – Rigshospitalet, BioSerenity, the University of Copenhagen, and Harvard Medical School. The research was funded by the National Institutes of Health (grant R01HL161253), the Knight-Hennessy Scholars program, and the Chan Zuckerberg Biohub.
