Introduction
Individuals with one psychiatric disorder are at such an elevated risk for a second psychiatric diagnosis1 that comorbidity is the norm rather than the exception2. Critically, this problem extends to physical disorders with multimorbidity (\(\ge 2\) chronic conditions) observed in ~38% of the global population3. The separation of psychiatric from physical diseases reflects a traditional dichotomy distinguishing disorders with characterizable physical changes (e.g., organic) from those without (e.g., functional)4. However, prior research utilizing large-scale national registries indicates that pervasive comorbidity extends beyond this theoretical distinction, with a psychiatric diagnosis also associated with an increased likelihood of developing a physical illness5,6. In isolation, psychiatric conditions already contribute substantially to excess mortality7,8,9, diminished health-related quality of life10, reduced functional status11, and increased primary care burdens12. When comorbid psychiatric and physical illnesses are present, this further exacerbates negative clinical outcomes10,11,12,13. For example, early mortality amongst diabetes cases is 38% higher when comorbid with a diagnosis of major depression13. Given these significant clinical implications, understanding the etiological factors underlying these comorbidities is crucial.
Pervasive comorbidity indicates that there are likely transdiagnostic risk pathways shared beyond single psychiatric–physical disorder pairings. For example, literature investigating patterns of physical comorbidity among patients with depression has found an increased risk for coronary artery disease14. However, it may be that the risk pathways shared between coronary artery disease and depression are shared broadly across other internalizing disorders (e.g., anxiety) that phenotypically1 and genetically15 overlap. However, a hindrance to multivariate applications of modeling broader systems of comorbidity patterns (e.g., structural equation modeling) has historically been the practical limitations of obtaining large samples that encompass a variety of psychiatric and physical outcomes. The current study leverages cutting-edge multivariate genomic methods applied to a wide range of physical and psychiatric datasets to circumvent these pragmatic barriers and evaluate whether genetic overlap across pairs of psychiatric and physical disorders cuts across diagnostic boundaries.
Genomic structural equation modeling (SEM16) extends bivariate frameworks for estimating genome-wide genetic correlations, such as LD-score regression (LDSC17), into a multivariate approach designed to model genome-wide genetic correlations across multiple traits. As genome-wide association study (GWAS) summary statistics need not come from the same participant sample, Genomic SEM offers the unique opportunity to examine genetic overlap across a broad range of rare and potentially mutually exclusive outcomes. Here, we apply Genomic SEM and introduce and validate a new extension, genomic exploratory SEM (E-SEM), to explore the relationships across psychiatric (compulsive, psychotic/thought, neurodevelopmental, internalizing, and substance use) and newly identified latent genomic factors of physical illness. Genomic E-SEM is developed for genomic exploratory factor analyses and enables examining the multivariate genomic architecture of complex traits for which there are no a priori model specifications or when theoretically derived model specifications do not provide a reasonable fit to the observed data. The combination of these exploratory and confirmatory genomic modeling approaches allows us to include a broad range of psychiatric and physical health phenotypes ascertained from different participant samples in the same statistical model, conduct exploratory and confirmatory factor analyses to identify genomic factors indexing shared risk across groupings of traits, and delineate transdiagnostic risk pathways from more circumscribed shared risk pathways that can reflect overlap unique to a pair of disorders. These results provide a critical context for interpreting the consistent observations of psychiatric–physical comorbidity in the extant literature.
Results
Factor structures across physical illness domains
Through a comprehensive analysis of physical illness phenotypes spanning eight domains (neurological, respiratory, circulatory, digestive, endocrine/metabolic, genitourinary, musculoskeletal, and cancer), we identified 73 physical disorders across 1,889,850 case observations. We first estimated the genome-wide genetic correlations (rg’s) using LDSC, which specifically captures the average direction of risk sharing across common variants in the genome. The genetic correlations were used to estimate a single-factor model in each domain in Genomic SEM. When this simple structure did not provide adequate model fit, we utilized Genomic E-SEM to conduct exploratory factor analyses (EFAs). We validate key features of Genomic E-SEM through a series of simulations and demonstrate its flexibility to identify complex factor structures across a wide range of physical outcomes. We further describe the features and validation procedures in the “Method” below.
Using a pre-registered modeling pipeline (https://doi.org/10.17605/OSF.IO/YBA58), we identified a common factor for genitourinary disorders and complex, multifactorial structures for the other seven domains (see Supplementary Figs. 1–8). These models all provided a good fit to the data, evidenced by confirmatory fit indices (CFIs) > 0.9 and standardized root mean square residuals (SRMR) < 0.1. In multifactorial solutions within systems, each latent factor represents subgroups of physical illnesses that exhibit higher levels of genome-wide genetic correlation compared to the other conditions within that domain. For example, in the respiratory system, we identified a factor defined by traits with an atopic component (asthma, nasal polyps) and another factor characterized by high genetic correlation in traits with central pulmonary pathology (pneumonia, chronic airway obstruction). We report model fit indices for the systems-level models in Supplementary Data 1.
Relationships between physical and psychiatric illness latent factors
We took forth the factor structures identified in each physical illness system and modeled them alongside five previously established latent psychiatric factors18 representing the genetic correlation between compulsive, psychotic/thought, neurodevelopmental, internalizing, and substance use factors defined by subclusters of 13 psychiatric disorders (see Fig. 1). To determine the extent to which factor correlations are operating through common pathways or pathways driven by specific pairs of traits we utilized the QFactor heterogeneity index19. A significant QFactor statistic indicates that the inter-factor correlation fails to recapture the pairwise genome-wide genetic correlations across the conditions that define the factors. For example, a QFactor statistic would likely be significant if two physical illnesses on one factor had directionally opposing relationships with the disorders defining a psychiatric factor. QFactor thereby differentiates between broad risk-sharing patterns reflected in the factor correlation and risk-sharing more appropriately modeled at the more circumscribed disorder level.
Single-headed arrows represent regression paths. Curved double-headed arrows represent correlations among the (residual) genetic variance components for each trait. Each u represents residual variances for the psychiatric and physical traits. The three dots next to Px depict a truncated factor structure for physical illness domains where a multifactorial solution was identified. Comp compulsive, Psych psychotic/thought, Neuro neurodevelopmental, Int internalizing, SUD substance-use disorders.
Across all eight physical illness domains, we identified 46 significant inter-factor correlations with latent psychiatric factors with a Bonferroni corrected threshold of p < 4.5 \(\times \,{10}^{-4}\) (0.05/110 inter-factor correlations) that were not QFactor significant at the same threshold. Figure 2 visually depicts the inter-factor genetic correlations across the physical and mental illness domains, with the corresponding numeric values presented in Supplementary Data 2. Overall, our results revealed distinct patterns of relationships depending on the psychiatric domain.
Inter-factor genetic correlations between physical and psychiatric illness factors estimated within each physical illness domain with Genomic SEM. Dashed bars represent traits not surpassing a two-sided Bonferroni-corrected significance threshold of p < 4.5 \(\times {10}^{-4}\) (0.05/110 inter-factor genetic correlations) and orange bars represent significant QFactor results at the same threshold. Error bars depict ±95% confidence intervals. Data S2 contains exact point estimates, along with corresponding standard errors and p values, presented in this figure. Data S7 and S8, respectively, contain sample size and study information for physical and psychiatric disorders analyzed in this plot. Comp compulsive, Psych psychotic/thought, Neuro neurodevelopmental, Int internalizing, SUD substance-use disorders.
For the compulsive factor (defined by anorexia nervosa, obsessive–compulsive disorder, and Tourette’s syndrome), we observed a protective effect for Digestive Factor 1, defined by acute pancreatitis, appendicitis, cholelithiasis/cholecystitis, and peptic ulcers (inter-factor rg: −0.26, SE = 0.07). For the psychotic/thought disorders factor (defined by bipolar disorder and schizophrenia), we observed significant genome-wide genetic correlations with physical illness factors in the digestive, musculoskeletal, and respiratory systems (mean rg: 0.15; range: 0.10 to 0.20). We further observed significant QFactor results with factors in digestive, neurological, endocrine/metabolic, and genitourinary systems, indicating divergent associations across bipolar disorder (BD), schizophrenia (SCZ), and the physical illnesses defining these factors. Bivariate rg’s between schizophrenia, bipolar disorder, and physical disorders revealed that BD is likely driving heterogeneous factor relationships indicated by QFactor, with BD exhibiting greater mean rg’s than SCZ.
In contrast, we observed pervasive and substantial genome-wide genetic correlations between neurodevelopmental, internalizing, substance use disorders, and a range of physical illness systems. Within the neurodevelopmental domain (defined by autism spectrum disorder, attention deficit hyperactivity disorder, posttraumatic stress disorder, major depression, and Tourette’s syndrome), we observed moderate to high inter-factor correlations with factors across all physical illness systems (Mean rg: 0.51; range: 0.25–0.76). We also observe heterogeneous associations with factors in the circulatory, digestive, musculoskeletal, respiratory, neurological, and endocrine/metabolic systems. Inspection of the pairwise genome-wide genetic correlations between physical outcomes and the disorders that define the neurodevelopmental factor indicates that these QFactor results were driven by larger rg’s with attention deficit hyperactivity disorder (ADHD) and posttraumatic stress disorder (PTSD).
For internalizing disorders (defined by posttraumatic stress disorder, major depression, and anxiety disorders), we further observed significant inter-factor correlations with specific factors across all physical illness systems (Mean rg: 0.32; range: 0.13–0.55). In addition, we find heterogeneous (QFactor) relationships between the internalizing factor and factors in the circulatory, digestive, and respiratory systems. Genetic correlations between internalizing and physical illnesses indicate that major depression (MD) and PTSD were driving heterogeneous inter-factor relationships.
In line with neurodevelopmental and internalizing disorders, we observed substantial positive relationships between the substance use disorders (defined by alcohol use disorder, cannabis use disorder, and opioid use disorder) and factors across all physical illness systems (Mean rg: 0.39; range: 0.18–0.91), with one significant QFactor result in the cancer system. Unlike neurodevelopmental and internalizing disorders, the relative absence of QFactor findings across most illness systems for the substance use factor indicates that the shared genetic signal across this group of psychiatric disorders is shared broadly with the genetics underlying physical illness.
Providing broad characterization for the specific psychiatric disorders driving heterogeneous factor relationships, we conducted bivariate analyses between psychiatric and physical outcomes (Supplementary Fig. 9). Two findings of note here are that: (i) compared to all other psychiatric disorders in psychiatric domains with multiple QFactor significant relationships, PTSD, ADHD, and MD had the highest average rg with physical illnesses and (ii) perhaps most surprising was that the average rg between these disorders and physical illnesses was higher than the average pairwise rg between physical disorders and other physical disorders (Fig. 3). This pattern of results collectively indicates that the genetic signal for PTSD, ADHD, and MD indexes an expansive set of risk pathways that transcend psychiatric and physical illness diagnostic boundaries that are likely driving the observed instances of heterogeneous factor relationships (as indexed by QFactor).
Individual dots depict mean pairwise genetic correlations (rgs) between the 13 psychiatric traits and 75 physical traits. The error bars depict one standard deviation above and below each mean. The last two bars depict means and standard deviations (SDs) for pairwise rgs between all unique physical-physical and psychiatric-psychiatric combinations. Data S7 and S8 respectively contain sample size and study information for physical and psychiatric disorders analyzed in this plot. Phys physical, Psych psychiatric.
Factor analysis across all physical traits
In the primary analysis, we utilized physical illness domains based on prior epidemiological work, grouping traits according to the body system where the disease manifests. However, this categorization system assumes that physical outcomes manifesting in the same body system share a common etiology. We evaluated this assumption by conducting a follow-up analysis using a data-driven, system-agnostic model across all 73 curated physical traits. We began by fitting a single-factor model across all physical illness measures and then removed any traits with a standardized factor loading below 0.6. This process yielded a common factor across 21 physical traits spanning six of eight domains of illness, which provided an adequate fit to the observed genome-wide genetic correlations (CFI = 0.91; SRMR = 0.11; see Supplementary Data 3 and Fig. 4A).
A Single-headed arrows represent regression paths and curved double-headed arrows represent residual genetic variance components. The outer circles with arrows pointing to each indicator represent indicator residual variances, with the corresponding value and SE presented in each circle. AHS abnormal heart sounds, CCD cardiac conduction disorders, HFA heart failure, PVD peripheral vascular disease, APA acute pancreatitis, CON constipation, DES diseases of the esophagus, DYS dysphagia, GAD gastritis & duodenitis, GIH gastrointestinal hemorrhage, PCE peptic ulcer excl. esophageal, IDD intervertebral disc disorders, PEA peripheral enthesopathies & allied syndromes, SAD spondylosis & allied disorders, CAO chronic airway obstruction, PNE pneumonia, RAB respiratory abnormalities, NPD nerve, root, and plexus disorders, SLD sleep disorders, HEM hematuria, UTI urinary tract infection. B Manhattan plot for multivariate GWAS of common physical disease factor estimated with Genomic SEM. The chromosomes are displayed on the x-axis, and the −log10 p-values for the association between each SNP and the physical illness factor are displayed on the y-axis. The dashed gray line denotes the two-sided genome-wide significance threshold of 5 × 10−8. C PheWAS plot for common physical disease factor conducted in Penn Medicine Biobank with LDpred2. The x-axis depicts the respective system for each EHR-derived phenotype, with −log10 p-values displayed on the y-axis. All significance testing was two-sided. The triangles display the direction of effect (e.g., upwards triangles for positive and downwards triangles for negative).
Multivariate GWAS and PheWAS of broadly transdiagnostic physical illness factor
We applied Genomic SEM to conduct a multivariate GWAS on the physical illness factor, which estimates the effects of single-nucleotide polymorphisms (SNPs) on the shared liability across the 21 physical disorders defining the factor. We removed genetic loci that were genome-wide significant \((p < \, 5\, \times \,{10}^{-8})\) for QSNP, a heterogeneity metric that is conceptually similar to the QFactor metric described above, that functions to remove SNPs that are unlikely to operate via the factor (e.g., if the SNP is only associated with one of the 21 physical disorders). After removing these QSNP loci, we identified 27 genome-wide significant risk loci (Supplementary Data 4 and Fig. 4B).
We then conducted a phenome-wide association study (PheWAS) in the Penn Medicine BioBank (PMBB; N = 29,883), testing for associations between multivariate GWAS results for the transdiagnostic physical illness factor (expected sample size \(\hat{[N]}\) = 106,118.7) and a wide range of disease phecodes (see Fig. 4C). Providing external validity for our risk loci, we found substantial effects across multiple physical illness systems (e.g., chronic airway obstruction, type 2 diabetes, obesity, hypertension, and ischemic heart disease). Additional significant associations emerged for sleep-related breathing disorders (sleep apnea, shortness of breath), emphysema, GERD/diseases of the esophagus, chronic pain, and viral hepatitis C. PheWAS results revealed no significant associations with psychiatric outcomes. This result may be due to more limited power to predict outcomes, such as psychiatric disorders, not used to define the 21-disorder common physical factor in Genomic SEM. Further, the ratio of cases to controls was much smaller for many of the psychiatric disorders in the PMBB, resulting in a much smaller effective sample size, relative to many of the physical outcomes. We provide a summary of PheWAS results and the number of cases for all associations with a p < 0.05 in Supplementary Data 5.
To investigate this power-based interpretation of null psychiatric associations, we went on to examine better powered 34-indicator common medical factor \(\left(\right.\hat{N}\) = 129,650.2), expanded to include a combination of symptoms and disease proxies (Supplementary Fig. 10a). We find that this updated medical factor significantly predicted psychiatric illnesses (see Supplementary Fig. 10b). However, the lack of associations for the 21-disorder model should still be kept in mind when interpreting genetic correlations reported below.
We then modeled the 21-disorder common physical factor alongside the five psychiatric factors utilized in the primary analysis, using the same Bonferroni corrected threshold \((p \, < \,4.5 \,\times \,{10}^{-4})\) to identify significant inter-factor correlations. Consistent with the large, pervasive genetic correlations observed in the systems-level analysis (see Fig. 5; Supplementary Data 6), we identified substantial inter-factor correlations with substance use (rg = 0.63, SE = 0.04), internalizing (rg = 0.61, SE = 0.03), and neurodevelopmental factors (rg = 0.68, SE = 0.05). In addition, we identified significant QFactor results with the psychotic/thought disorders factor.
Inter-factor genetic correlations between the common disease factor (expected sample size \(\hat{[N]}\) = 106,118.7) and five psychiatric factors estimated with Genomic SEM. All relationships were significant at a Bonferroni-corrected significance threshold of p < 5.6 \(\times {10}^{-4}\) (0.05/110 inter-factor genetic correlations) with orange bars depicting significant QFactor results at the same threshold. Error bars depict ±95% confidence intervals. Data S7 and S8, respectively, contain sample size and study information for physical and psychiatric disorders analyzed in this plot. Comp compulsive (p = 1.48E − 2), Psych psychotic/thought (p = 3.69E − 9), Neuro neurodevelopmental (p = 9.51E − 41), Int internalizing (p = 4.61E − 92), SUD substance-use disorders (p = 1.57E − 45).
Discussion
Prior research consistently indicates that the comorbidity problem extends beyond psychiatry to include physical conditions20,21,22,23,24. However, the extant epidemiological literature has primarily focused on analyzing pairwise relationships between psychiatric and physical illnesses that are both measured and have enough cases to yield reliable comorbidity estimates in single cohorts. This has reduced the viability of multivariate approaches that account for multimorbidity patterns within illness domains while assessing comorbidity patterns across domains. Our findings provide a detailed characterization of broad and specific genetic risk pathways connecting psychiatric and physical illness systems. Leveraging publicly available GWAS summary statistics for 73 physical disorders, we systematically modeled the complex factor structure across eight major domains of physical illness. Through further modeling with established clusters of psychiatric illness, our results revealed substantial genome-wide genetic correlations across specific clusters of psychiatric and physical conditions. In contrast to phenotypic findings that might examine comorbidity with physical conditions in a sample ascertained for a psychiatric disorder, the current estimates reflect genomic data for different participant samples. The fact that we still observed such pervasive genome-wide genetic correlations has important implications for interpreting emergent GWAS signals, etiological models, and our current nosology.
Across eight systems of medical illness, Genomic E-SEM characterized the latent structure across high-dimensional genetic landscapes. Combined with robust applications of factor retention heuristics, Genomic E-SEM uncovered patterns of global genetic signal shared between an array of physical illnesses, with subsequent confirmatory models exceeding conventional fit criteria. Genomic E-SEM offers a user-friendly, scalable pipeline with valid standard errors, allowing for more reliable estimates than naïve exploratory approaches. Nevertheless, it shares the typical exploratory caveats – specification of the number of latent factors and dependence on investigator judgment for factor loading cut-offs, underscoring the necessity for future replication. Our identification of well-fitting latent structures across all eight systems in a single analytic framework reflects Genomic E-SEM’s practical utility for mapping the genomic architecture of complex multimorbidity.
The prior phenotypic literature for psychotic/thought disorders has identified numerous physical comorbidities across multiple illness domains for both schizophrenia25,26 and bipolar disorder27,28 (e.g., respiratory, neurological, and gastrointestinal disorders), with meta-analyses indicating a comorbid physical disease among ~50% of schizophrenia cases29. Genome-wide investigations have corroborated these findings. The extant literature has also linked common-variant liability between psychotic/thought disorders to metabolic and gastrointestinal liability30,31. Our results support prior literature and extend our knowledge of relationship specificity across these disease systems. At the factor level, we find that psychotic/thought disorders (BD, SCZ) exhibit modest genome-wide genetic correlations with factors in digestive, musculoskeletal, and respiratory domains, indicating that liability shared between these disorders further cuts across diagnostic boundaries in specific domains of physical illness. Certain risks, however, appear to be disproportionally driven by single psychotic/thought disorders, suggesting that signal from BD is driving risk-sharing with genitourinary disorders and specific factors in neurological (sleep disorders), endocrine/metabolic (e.g., hypothyroidism, diabetes type 1), and digestive (e.g., irritable bowel syndrome) systems.
For compulsive disorders, the phenotypic literature describes an increased risk for several physical comorbidities in the metabolic, musculoskeletal, and circulatory domains32,33. Conversely, large-scale genomic analyses have found modest negative genome-wide genetic correlations between compulsive disorders and body-mass index, with null genetic correlations for cardiometabolic traits34. In line with prior molecular genetic investigations, we also observe null genetic correlations with cardiometabolic traits, as well as a moderate protective association with a digestive factor defined by acute pancreatitis, appendicitis, cholelithiasis, and peptic ulcers. The discrepancy between observed genome-wide and phenotypic patterns of shared liability may reflect shared risk factors independent from common genetic variation measured in the current analyses (e.g., upstream environmental risk factors or rare genetic variants).
Epidemiological literature has documented extensive physical comorbidities across disorders in substance use, internalizing, and neurodevelopmental domains6,21,24,35,36. We identified genome-wide genetic correlations between the substance use factor and factors in all eight physical illness domains. Our results support prior literature that finds extensive pleiotropy between substance use GWAS and metabolic, inflammatory, and pulmonary traits30. The substance use factor also exhibited far less inter-factor heterogeneity, indicating that the genetic variance shared across multiple substance use disorders is shared substantially with a broad set of physical outcomes. The internalizing factor also exhibited genome-wide genetic correlation with at least one factor in all eight physical illness domains, extending extant literature that finds pervasive genome-wide risk between internalizing disorders and cardio-metabolic37 and respiratory traits38. However, more associations with internalizing disorders appeared to be disorder-specific—for example, specific factors within the respiratory, circulatory, and digestive domains—with PTSD and MD exhibiting the greatest average rg with physical illness traits among internalizing disorders. We observed a similar broad and disorder-specific risk-sharing pattern for the neurodevelopmental factor, with disorders defining this factor previously found to exhibit substantial genome-wide risk-sharing with physical illnesses30. The pattern of bivariate genetic correlations between disorders that loaded on the neurodevelopmental factor and physical illnesses suggests that ADHD and PTSD, which displayed a higher mean rg with physical than psychiatric disorders, are likely driving much of the disorder-specific genetic correlations with physical health outcomes. Although we identify relationships with physical illnesses operating almost entirely via the factor for substance use disorders, the heterogeneity observed within internalizing and neurodevelopmental disorders underscores the importance of employing a nuanced statistical approach capable of examining varying levels of genetic correlation.
The theoretical dichotomy between psychiatric and physical illnesses reflects a historical divide between conditions of functional etiology, characterized by the absence of identifiable physical changes and often assumed to be psychological in origin, and those of organic etiology, marked by identifiable physical changes. Psychiatry inherited the former, while other specialized medical disciplines inherited the latter4. Although this organ- and symptom-based classification remains essential for biological classification and mechanistic insight, it does not exhaust the array of useful nosologies. Our findings provide further context to this historical dichotomy, highlighting an avenue for disease organization based on systems of shared risk from statistically inferred factors. Within physical illness domains, we identified a shared genome-wide risk factor that cuts across six illness systems and exhibits substantial cross-disorder predictive validity in external samples. We observed pervasive genome-wide genetic correlation between neurodevelopmental, internalizing, and substance use factors and physical illness factors in systems and systems-agnostic levels of analysis, with the average genetic correlations between ADHD, MD, PTSD, and physical disorders higher than the mean pairwise relationships within the physical domain. This risk-sharing pattern suggests that the genetic signal for a range of physical outcomes is diffuse and highly overlapping with psychiatric risk pathways. As such, the emergent genetic signal likely reflects genetically distal causal chains that are predominantly not trait-specific. Indeed, the extant literature has found many shared behavioral risk factors for both physical39 and psychiatric40,41 illness (e.g., substance use, nutrition, sedentary behavior). These findings suggest a need for an additional classification system, conceptualizing illnesses as outcomes from systems of interconnected risk, likely unrelated to particular phenotypic presentations. This framework need not imply that psychiatric and physical disorders are identical in clinical expression or management, but sheds light on common etiological factors that can inform prevention strategies targeting liability rather than its diverse downstream manifestations.
Several limitations exist when interpreting our findings, chiefly the restriction to European-only GWAS summary statistics. Our analyses would greatly benefit from including individuals across a broader range of ancestral backgrounds as these data become larger. The current findings specifically index genetic risk sharing using genome-wide genetic correlations estimated from common variants (e.g., SNPs with a minor allele frequency >1%). Future work should reevaluate our observed patterns of risk-sharing in rare genetic variation and at more localized levels of analysis. The importance of analysis at more granular levels is highlighted by recent applications of local analysis of [co]variant association (LAVA42) that find significant local, but not genome-wide correlations, for psychiatric disorders and insulin-resistant conditions (e.g., type 2 diabetes43) and neurodegenerative diseases (e.g., Alzheimer’s44). The physical factor structures at systems-specific and -agnostic levels reported above reflect clusters of shared genetic risk across common physical conditions. It is also important to note that although we observed substantial transdiagnostic predictive validity for the common physical risk factor in the PMBB Phewas, the instrument derived from the 21-disorder model did not significantly predict psychiatric disorders, despite exhibiting substantial genome-wide rgs with psychiatric risk factors. Further, as GWAS data for psychiatric and physical symptoms become publicly available, future work should evaluate the validity of conceptualizing disorders as independent diagnostic units. The absence of significant results should also be interpreted in the context of power differences for the GWAS that defined the psychiatric or physical factors.
The pattern of relationships across groups of psychiatric and physical disorders reported above may reflect some combination of vertical and horizontal pleiotropy. Vertical pleiotropy refers to a causal chain where a genetic variant influences a first trait, which influences a second trait. In contrast, horizontal pleiotropy reflects genetic variants that jointly influence multiple traits. Future work that disentangles these mechanisms is vital for informing the integration of current findings into clinical settings. For example, identifying vertical pleiotropic effects would indicate that treatment of one disorder would have cascading effects on subsequent physical outcomes. If we take substance use disorders as an example, in line with a vertical mechanism, it could be that the use of substances has wide-ranging consequences that increase the risk for virtually all surveyed physical conditions. Alternatively, the observed patterns of genetic correlation may reflect horizontal pleiotropy via upstream processes that may reflect novel treatment targets for multimorbid presentations. Indeed, a horizontal explanation is just as plausible for the substance use factor, wherein some upstream process (e.g., genetically mediated personality characteristics) has far-reaching consequences across a range of psychiatric and physical outcomes. Future work should integrate cross-ancestry fine-mapping with functional genomics and multi-omic approaches to prioritize causal mechanisms, followed by causal mediation analyses to adjudicate vertical vs. horizontal pleiotropy.
Historically, comorbidity research on psychiatric and physical outcomes has confined itself to describing bivariate relationships within a single cohort, reducing the viability of multivariate approaches capable of assessing latent risk profiles. This research has revealed widespread genome-wide genetic correlations within and across psychiatric and physical disorders. Through the application of Genomic SEM and E-SEM, the current study concatenates a wide-ranging set of physical and psychiatric disorders into a single multivariate framework. Our findings uncover broad genetic liabilities spanning multiple physical and psychiatric domains, such as widespread genetic correlations across internalizing, neurodevelopmental, substance use disorders, and a wide range of physical illness traits. This study enriches our knowledge of the multivariate systems of relationships underlying complex multimorbidity, with implications for interpreting emergent genetic signals and considering additional nosological frameworks that capture the systems of interconnected risk between two theoretically dichotomized sets of disorders.
Methods
Trait selection
All research presented in this study complies with the institutional review board at the University of Colorado Boulder. A prior large-scale epidemiological investigation of psychiatric and physical comorbidity utilized diagnostic bins encompassing various physical conditions available in Danish national registries6. Building upon this, we curated an initial list of physical traits by intersecting the diagnostic bins from this prior study with the diagnostic categories described in the UK Biobank (UKB), which are based on the World Health Organization’s International Classification of Diseases and Related Health Problems (ICD) codes (versions 9 and 10). This resulted in nine diagnostic categories (neurological, respiratory, circulatory, digestive, endocrine/metabolic, genitourinary, musculoskeletal, cancer, and hematopoietic) being analyzed in the present study. However, the hematopoietic domain did not contain sufficient traits passing the quality control (QC) filters outlined below. As a result, we excluded hematopoietic disorders from further modeling. To focus on more common physical conditions that frequently coexist in the population, we restricted this list to traits with over 2000 cases in the UKB. The initial list included broad illness definitions (e.g., heart and valve disorders) for each physical trait and their subtypes (e.g., aortic aneurysm). Leveraging Genomic SEM’s capability to integrate traits from multiple samples, we updated this list by incorporating the most recent GWAS summary statistics for each trait from the GWAS Catalog (a comprehensive database of genomic studies - https://www.ebi.ac.uk/gwas/) for either (a) the broad disorder category obtained in the UKB or (b) a specific disorder within the broader category, depending on which had more cases. We utilize the number of cases rather than the total sample size due to the disproportionate number of controls for some traits. Because the UKB may not contain sufficient case numbers for theoretically relevant but statistically rare disorders, we surveyed the extant literature to assess for disorders theoretically relevant in their respective domain or exhibiting previous overlap with psychiatric disorders. We included coronary artery disease, venous thromboembolism, Crohn’s disease, Celiac disease, urinary incontinence, Duparten’s disease, psoriatic arthritis, melanoma, Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis, insomnia, and childhood obesity. Next, we pruned symptom measures (e.g., nonspecific chest pain, joint pain) overlapping with other disorders in their system and ambiguously specified disorders (e.g., other urethra/urinary tract disorders, synovium/tendon/bursa disorders-other) that likely contained symptoms represented by more specifically measured phenotypes. This allowed for more precisely defined factors at systems-level analyses and avoided model fit inflation from multicollinearity. Supplementary Data 7 contains a list of all disorders included in our modeling pipeline.
We utilized a previous psychiatric factor structure15—containing five correlated psychiatric dimensions - with specific indicators (major depression, bipolar disorder, and posttraumatic stress disorder) updated for newer publicly available GWAS. Encompassing compulsive, psychotic/thought, neurodevelopmental, internalizing, and substance use disorders, the psychiatric factors represent groupings of genome-wide genetic liability across 13 major psychiatric disorders, ascertained with internationally recognized diagnostic criteria and participant self-report of symptom presentation. Supplementary Data 8 lists the disorders defining each psychiatric factor, with specific ascertainment methods presented in the corresponding paper of each discovery GWAS.
LD-score regression
We applied a standard set of QC filters to all GWAS summary statistics using the munge function available within Genomic SEM. Munging GWAS summary statistics involves reducing the number of SNPs to HapMap3 SNPs and then pruning by minor allele frequency (MAF) less than 0.01 and an imputation score (INFO) below 0.9 when this information is available in the summary statistics. “Munged” summary statistics were then input for the multivariable version of LDSC implemented in Genomic SEM. LDSC regression models utilized 1000 Genomes Phase 3 European linkage disequilibrium (LD) weights for estimation. These weights exclude the major histocompatibility complex (MHC) as an intricate LD structure in this region that may bias estimates. Following the guidelines from original LDSC developers for producing interpreData genetic covariance estimates, we removed physical traits with SNP-based heritability (h2SNP) Z-statistics < 4.
LDSC produces two meaningful sets of output: the genetic covariance and sampling covariance matrix. The genetic covariance matrix contains SNP-based heritability estimates on the diagonal and genetic covariances on the off-diagonal. These estimates were converted to the liability scale, using the corresponding population prevalence from the original publication and the sum of effective sample sizes across contributing cohorts45. The sampling covariance matrix contains squared standard errors on the diagonal (e.g., the sampling variances) and sampling covariances on the off-diagonal, which reflect sampling dependencies that emerge when participant samples overlap across the included traits. The sampling covariance matrix is estimated directly from the GWAS data using a block jackknife resampling procedure. This matrix allows for appropriate parameter estimates and standard errors for traits with varying and unknown degrees of participant sample overlap and varying levels of statistical power across the included GWAS. SNP-based heritability Z-statistics ranged from 4.14 to 26.40 across included physical disorders.
Genomic SEM
For each physical illness domain, we began by supplying the genomic matrices from LDSC to Genomic SEM to fit a common factor in confirmatory factor analysis (CFA) to identify if this simple factor structure adequately recaptures the observed genetic covariance between the traits within each physical illness domain. All models were fit in Genomic SEM using the diagonally weighted least squares estimator (DWLS), which produces unbiased standard errors when GWAS vary in sample size and prioritizes reducing model misfit for genetic covariance estimates with greater precision. This could include, for example, prioritizing producing a lower factor loading for the best-powered GWAS if it has low genetic correlations with the other traits. If the common factor provided an adequate model fit (as determined by CFI > 0.9 and SRMR < 0.1), then this parsimonious, single-factor model was carried forward for that physical illness domain. If a common factor did not provide adequate fit, we conducted an exploratory factor analysis (EFA) using Genomic E-SEM (described below) to examine multifactorial solutions, a novel extension described and validated via extensive simulations in a separate section below. We determined the number of factors to model in each EFA using the median number of factors selected by the optimal coordinates, acceleration factor46, and Kaiser rule47 when applied to 1000 genetic covariance matrices simulated from the observed genetic covariance and sampling covariance matrices. This novel simulation-based approach to determining the number of genomic factors to model is introduced and validated below. Next, we modeled traits in the EFA with standardized factor loadings exceeding our preregistered cutoff of 0.4 in subsequent confirmatory models.
We assessed the associations between each physical illness and psychiatric disorders by modeling each physical illness domain model alongside five previously established psychiatric factors18. The inter-factor correlations within this model represent the degree to which shared signals captured by the physical and psychiatric factors are across factors. However, these factor correlations may not always sufficiently describe patterns of pairwise genome-wide risk sharing of the disorders across factors. We evaluated factor-level heterogeneity by utilizing the QFactor metric for each physical and psychiatric factor correlation. QFactor is a heterogeneity metric that becomes more significant when the factor correlation cannot recapture the individual associations across disorders. For example, if two disorders loading on one factor have directionally discordant associations with the disorders loading on another, a factor correlation insufficiently describes this divergent pattern of relationships. QFactor is a \({\chi }^{2}\) distributed test statistic with degrees of freedom equal to one less than the cross-factor indicator correlations. We also calculate pairwise associations to follow up on QFactor results. We first calculated the mean bivariate genetic correlation (rg) between each of the 13 psychiatric disorders and all other psychiatric and physical conditions. Next, we calculated the mean rg amongst all psychiatric and physical disorders.
Genomic E-SEM
Overview
Genomic exploratory structural equation modeling (E-SEM) is the first software developed for genomic exploratory factor analysis, providing a robust framework for identifying the genomic factor structure among a set of variables with limited or no a priori structure. This method enables the identification of latent variables from genomic data without predefining the entire factor structure. Moreover, Genomic E-SEM allows for incorporating subsets of traits with a predefined factor structure into exploratory analysis. Genomic E-SEM implements a weighted least squares (WLS) estimator, prioritizing the reduction of misfits in cells of the genetic covariance matrix that are more precisely estimated (e.g., have a smaller SE). Although a WLS estimator will produce reliable model parameter estimates with model identification, the “naïve” standard errors (SEs) and fit statistics generated in stage 2 estimation will be invalid from the full sampling covariance matrix \(({V}_{S})\) 16, not utilized when estimating either estimate. Therefore, robust correction techniques are necessary to obtain valid SEs and model fit statistics. We calculate the correct sampling covariance matrix (\({V}_{\theta }\)) in stage 2 estimation through a sandwich correction48,49:
$${V}_{\theta }={\left({\hat{\Delta }}^{{\prime} }{\Gamma }^{-1}\hat{\Delta }\right)}^{-1}{\hat{\Delta }}^{{\prime} }{\Gamma }^{-1}{V}_{S}{\Gamma }^{-1}\hat{\Delta }{\left({\hat{\Delta }}^{{\prime} }{\Gamma }^{-1}\hat{\Delta }\right)}^{-1}$$
(1)
where \(\hat{\Delta }\) is a matrix of model derivatives at each parameter estimate and \(\Gamma \) is the “naïve” weights matrix estimated in stage 2 before correction.
Simulations
We validated key parameters of Genomic E-SEM using simulated LDSC objects for 10% heriData phenotypes with a univariate LDSC intercept of 1.05 to model slight levels of population stratification. These simulations validated the sandwich-corrected SEs and determined the power to discern different population structures in an EFA conducted with Genomic E-SEM. For each model, we simulated participant samples of 50,000 or 500,000. We validated sandwich-corrected SEs on simulated population sizes of 50,000 and the power to discern different population structures on both sets of simulated population sizes. The first population-generating structure was a two-factor model, where each factor had three indicators with a standardized loading of 0.71. We ran three iterations of this model using inter-factor correlations of 0.3, 0.5, and 0.7, with 100 simulations per iteration (e.g., 300 simulations). The second was a single-factor model using standardized factor loadings across all indicators of either 0.32, 0.5, or 0.71 with 100 simulations per iteration (e.g., 300 simulations). The power to detect the ground-truth structures across the population-generating models is presented in Supplementary Fig. 11, with the validation of sandwich-corrected SEs presented in Supplementary Fig. 12. Across simulations, we find valid sandwich-corrected SEs, with ratios of sandwich-corrected SEs to empirical SEs (calculated as the standard deviation of the parameter estimates across the 100 simulations) close to 1. In addition, we find adequate power to discern population structures, with power scaling in the expected direction. That is, we observed greater power to detect the ground-truth factor structure as factor loadings increased in the case of a single-factor generating model and as inter-factor correlations increased in the case of a two-factor generating model. At a simulated population size of 50,000, power was \(\ge 78\%\) when the standardized inter-factor correlation was 0.5 in a two-factor generating model and when the standardized factor loading was 0.71 (~50% of the variance explained by the factor) in a common-factor generating model. When the simulated population size reached 500,000, power was \(\ge \)99% across all generating structures.
Incorporating sampling variability into factor retention heuristics
As the GWAS used as input to LDSC and Genomic SEM can vary with respect to statistical power, it is important to account for these precision differences for all steps in the modeling pipeline. A first step in EFA is determining the number of exploratory factors to model. Here, we introduce an approach to incorporate sampling variability (i.e., power differences) in applied factor retention heuristics using simulated genetic covariance matrices (s*):
where \(s\) is the vectorized form of the genetic covariance matrix, and \(L\) is the lower triangle Cholesky factor of the sampling covariance matrix \((V)\). We assume:
$${{\rm{z}}}\sim {{\mathscr{N}}}\left(0,{{\rm{I}}}\right)$$
(3)
so that, marginally:
$$s{{\mathscr{*}}}{{\mathscr{\sim }}}{{\mathscr{N}}}\left(s,V\right)$$
(4)
Three different factor retention heuristics (optimal coordinates, acceleration factor, Kaiser rule) can then be applied to each simulated matrix s*, with the median of each heuristic’s distribution used as the final retention criterion. By selecting the median factor count from simulated matrices that incorporate differences in power across GWAS of included traits, we capture the central tendency of sampling variability in our retention criterion rather than relying on a single, potentially unrepresentative estimate.
We validated the utility of incorporating sampling variation in determining the number of factors to model by first generating 100 covariance matrices for computational efficiency from an eight-factor generating model containing realistic variability in parameter estimates, variably powered indicators, and a dense implied correlation structure to obscure factor boundaries. For each of the 100 simulated covariance matrices, we then calculated factor retention heuristics using 100 perturbed covariance structures, with the median heuristic per dataset used to capture central tendency in the perturbed covariance matrices. We find that the applied factor retention heuristics on simulated perturbations of the genetic covariance matrix significantly reduce mean absolute error across all three heuristics (p’s < 0.05; see Supplementary Fig. 13).
Factor analysis across all physical illness traits
The physical illness domains were chosen based on prior epidemiological work on psychiatric and physical comorbidity6. These domains largely reflect physical illness grouped by the body system where the cardinal symptom or disease manifests, with the implicit assumption that these disorders will, in turn, have a shared etiology. Given prior associations with psychiatric illness observed in the phenotypic space using these illness domains, we retained these broad system-based categories for the primary analysis to produce a set of results that are comparable to extant phenotypic literature. In addition, we also conducted a follow-up analysis to explore the factor structure across all curated physical illness traits (e.g., the largest publicly available GWAS; h2SNP Z-statistic > 4) and subsequent relationships with the five psychiatric factors utilized in the primary analysis.
Our data-driven follow-up analysis involved multiple stages. First, we employed the acceleration factor, optimal coordinates, and Kaiser rule to determine the number of factors. Since each index suggested a different number of factors, we chose the most parsimonious factor structure—a common factor solution indicated by the acceleration factor. Next, we conducted a confirmatory factor analysis (CFA) on the 73 curated physical illness traits using Genomic SEM and applied a standardized loading cutoff of 0.6 to decide which traits to include in further modeling. We then fit two subsequent CFAs: (i) to assess the model fit for our data-driven common physical illness factor, and (ii) to evaluate the relationship between the common physical illness factor and the five psychiatric factors used in the primary analysis.
Multivariate GWAS and biological annotation of broadly transdiagnostic physical illness factor
We applied a standard set of QC filters to the individual-level GWAS using the sumstats function in GenomicSEM, which involved filtering SNPs with a minor allele frequency >1% present in the 1000 Genomes Phase 3 reference panel, aligning to the same reference allele, and ensuring that univariate GWAS results were all partially standardized with respect to the variance in the outcome (the physical illness) but not the predictor (the SNP). The userGWAS function in Genomic SEM16 was then used to perform a multivariate GWAS on the physical illness factor (expected sample size50 \(\hat{(N)}\) = 106,118.7) to assess the effects of 6,867,052 SNPs on the shared signal across the 21 physical disorders defining the factor. We utilized FUMA v1.5.251 to identify genomic risk loci with a significance threshold of p < 5 \(\times \,{10}^{-8}\). A genetic locus was defined as genome-wide significant SNPs not in linkage disequilibrium (LD) with other significant SNPs at an R² > 0.1 based on LD structure from the 1000 Genomes Phase 3 European ancestry reference panel. In addition, genomic loci within 250 kilobases (kb) of one another were merged into a single locus, as these proximal SNPs may not have been identified as in LD due to a mismatch between the physical illness GWAS samples and the reference panel. As a quality control step, we also estimated heterogeneity metrics for each SNP (QSNP52), which evaluate if SNP effects do not conform with the identified factor structure (e.g., if a SNP is only associated with one of the physical illnesses that defined the common factor). We pruned SNPs with a QSNP p-value < 5 \(\times \,{10}^{-8}\) and any SNPs within a one-megabase window upstream or downstream of a QSNP-significant SNP.
We then applied MAGMA v1.0853 to conduct gene property analyses, identifying tissue-specific gene expression, and gene set analyses, identifying previously established sets of genes with shared functional properties or associations with external traits. Gene expression data were obtained from GTEx v854 for 54 body tissues, and gene set data were obtained from MSigDB v755. No tissue-specific gene expression or gene sets passed our multiple-testing corrected significance threshold.
Phewas of physical illness factor in PMBB
First, a polygenic score weight file was generated from the summary statistics by using the “auto” setting of LDpred2 and a linkage disequilibrium reference matrix for HapMap3+ variants56. Principal component-normalized scores were then calculated for participants of the Penn Medicine BioBank (PMBB) using pgsc_calc57. The PMBB is a biobank that links to the electronic health records of consenting participants who receive care within the Penn Medicine health system57. Finally, a phenome-wide association study (PheWAS) was performed in individuals genetically similar to a European reference population in the Penn Medicine BioBank (PMBB) using age at enrollment, sex, and the first 10 principal components of genetic ancestry as covariates, testing for associations with disease phecodes.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data sources for each of the GWAS summary statistics are available in Supplementary Data 7 for physical phenotypes and Supplementary Data 8 for psychiatric phenotypes. No new data were collected for this project, and all data utilized in this study were from publicly available sources. GWAS summary statistics for the physical illness factor are available at https://figshare.com/projects/Shared_Genetic_Liability_across_Systems_of_Psychiatric_and_Physical_Illness/223950.
Code availability
The latest versions of Genomic SEM and E-SEM are publicly available software and can be accessed at https://github.com/GenomicSEM/GenomicSEM. The exact version of all software utilized in this study can be accessed at https://zenodo.org/records/17526988.
References
Kessler, R. C., Chiu, W. T., Demler, O. & Walters, E. E. Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch. Gen. Psychiatry 62, 617 (2005).
Caspi, A. & Moffitt, T. E. All for one and one for all: mental disorders in one dimension. Am. J. Psychiatry 175, 831–844 (2018).
Chowdhury, S. R., Chandra Das, D., Sunna, T. C., Beyene, J. & Hossain, A. Global and regional prevalence of multimorbidity in the adult population in community settings: a systematic review and meta-analysis. eClinicalMedicine 57, 101860 (2023).
Kendler, K. S. The dappled nature of causes of psychiatric illness: replacing the organic–functional/hardware–software dichotomy with empirically based pluralism. Mol. Psychiatry 17, 377–388 (2012).
Scott, K. M. et al. Association of mental disorders with subsequent chronic physical conditions: World Mental Health Surveys From 17 Countries. JAMA Psychiatry 73, 150 (2016).
Momen, N. C. et al. Association between mental disorders and subsequent medical conditions. N. Engl. J. Med. 382, 1721–1731 (2020).
Charlson, F. J. et al. Excess mortality from mental, neurological and substance use disorders in the Global Burden of Disease Study 2010. Epidemiol. Psychiatr. Sci. 24, 121–140 (2015).
Harris, C. & Barraclough, B. Excess mortality of mental disorder. Br. J. Psychiatry 173, 11–53 (1998).
Ali, S., Santomauro, D., Ferrari, A. J. & Charlson, F. Excess mortality in severe mental disorders: a systematic review and meta-regression. J. Psychiatr. Res 149, 97–105 (2022).
Saarni, S. I. et al. Impact of psychiatric disorders on health-related quality of life: general population survey. Br. J. Psychiatry 190, 326–332 (2007).
Surtees, P. G., Wainwright, N. W. J., Khaw, K. T. & Day, N. E. Functional health status, chronic medical conditions and disorders of mood. Br. J. Psychiatry 183, 299–303 (2003).
Grandes, G., Montoya, I., Arietaleanizbeaskoa, M. S., Arce, V. & Sanchez, A. on behalf of the MAS group. The burden of mental disorders in primary care. Eur. Psychiatry 26, 428–435 (2011).
Katon, W. et al. Depression and diabetes: a potentially lethal combination. J. Gen. Intern. Med. 23, 1571–1575 (2008).
Rudisch, B. & Nemeroff, C. B. Epidemiology of comorbid coronary artery disease and depression. Biol. Psychiatry 54, 227–240 (2003).
Grotzinger, A. D. et al. Genetic architecture of 11 major psychiatric disorders at biobehavioral, functional genomic and molecular genetic levels of analysis. Nat. Genet 54, 548–559 (2022).
Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Grotzinger, A. D. et al. Transcriptome-wide structural equation modeling of 13 major psychiatric disorders for cross-disorder risk and drug repurposing. JAMA Psychiatry 80, 811 (2023).
Breunig, S. et al. Examining the genetic links between clusters of immune-mediated diseases and psychiatric disorders. Transl Psychiatry. 15, 252 (2025).
De Jonge, P. et al. Associations between DSM-IV mental disorders and diabetes mellitus: a role for impulse control disorders and depression. Diabetologia 57, 699–709 (2014).
Gold, S. M. et al. Comorbid depression in medical diseases. Nat. Rev. Dis. Prim. 6, 69 (2020).
Jameson, N. D. et al. Medical comorbidity of attention-deficit/hyperactivity disorder in US adolescents. J. Child Neurol. 31, 1282–1289 (2016).
Malcomson, F. C. et al. Adherence to the 2018 World Cancer Research Fund (WCRF)/American Institute for Cancer Research (AICR) Cancer Prevention Recommendations and risk of 14 lifestyle-related cancers in the UK Biobank prospective cohort study. BMC Med. 21, 407 (2023).
Merikangas, K. R. et al. Comorbidity of physical and mental disorders in the neurodevelopmental genomics cohort study. Pediatrics. 135, e927–e938 (2015).
Carney, C. P., Jones, L. & Woolson, R. F. Medical comorbidity in women and men with schizophrenia: a population-based controlled study. J. Gen. Intern. Med. 21, 1133–1137 (2006).
Severance, E. G., Prandovszky, E., Castiglione, J. & Yolken, R. H. Gastroenterology issues in schizophrenia: why the gut matters. Curr. Psychiatry Rep. 17, 27 (2015).
Kessing, L. V., Ziersen, S. C., Andersen, P. K. & Vinberg, M. A nation-wide population-based longitudinal study mapping physical diseases in patients with bipolar disorder and their siblings. J. Affect Disord. 282, 18–25 (2021).
Sinha, A. et al. Medical comorbidities in bipolar disorder. Curr. Psychiatry Rep. 20, 36 (2018).
Green, A. I., Canuso, C. M., Brenner, M. J. & Wojcik, J. D. Detection and management of comorbidity in patients with schizophrenia. Psychiatr. Clin. North Am. 26, 115–139 (2003).
Guo, X. et al. Shared genetic architecture and bidirectional clinical risks within the psycho-metabolic nexus. eBioMedicine 111, 105530 (2025).
Tesfaye, M. et al. Shared genetic architecture between irritable bowel syndrome and psychiatric disorders reveals molecular pathways of the gut-brain axis. Genome Med. https://doi.org/10.1186/s13073-023-01212-4 (2023).
Isomura, K. et al. Risk of specific cardiovascular diseases in obsessive-compulsive disorder. J. Psychiatr. Res 135, 189–196 (2021).
Hambleton, A. et al. Psychiatric and medical comorbidities of eating disorders: findings from a rapid review of the literature. J. Eat. Disord. 10, 132 (2022).
Wootton, R. E. et al. Are there causal associations between obsessive-compulsive disorder and cardiometabolic phenotypes? A genetic correlation and bi-directional Mendelian randomization study. medRxiv https://doi.org/10.1101/2025.04.08.25325472 (2025).
Chavira, D. A., Garland, A. F., Daley, S. & Hough, R. The impact of medical comorbidity on mental health and functional health outcomes among children with anxiety disorders. J. Dev. Behav. Pediatr. 29, 394–402 (2008).
Dickey, B., Normand, S. L. T., Weiss, R. D., Drake, R. E. & Azeni, H. Medical morbidity, mental illness, and substance use disorders. Psychiatr. Serv. 53, 861–867 (2002).
Torgersen, K. et al. Shared genetic loci between depression and cardiometabolic traits. Flint J. PLOS Genet. 18, e1010161 (2022).
Zhu, Z. et al. Shared genetics of asthma and mental health disorders: a large-scale genome-wide cross-trait analysis. Eur. Respir. J. 54, 1901507 (2019).
World Health Organisation. The World Health Report: Reducing Risks, Promoting Healthy Life. (Geneva, 2002).
Ba, H., Zhang, L., Peng, H., He, X. & Wang, Y. Causal links between sedentary behavior, physical activity, and psychiatric disorders: a Mendelian randomization study. Ann. Gen. Psychiatry 23, 9 (2024).
Yuan, S., Yao, H. & Larsson, S. C. Associations of cigarette smoking with psychiatric disorders: evidence from a two-sample Mendelian randomization study. Sci. Rep. 10, 13807 (2020).
Werme, J. An integrated framework for local genetic correlation analysis. Nat Genet. 54, 274–282 (2022).
Fanelli, G. Local patterns of genetic sharing between neuropsychiatric and insulin resistance-related conditions. Transl. Psychiatry Published online 2025.
Reynolds, R. H. et al. Local genetic correlations exist among neurodegenerative and neuropsychiatric diseases. Npj Park Dis. 9, 70 (2023).
Grotzinger, A. D., Fuente, J. dela, Privé, F., Nivard, M. G. & Tucker-Drob, E. M. Pervasive downward bias in estimates of liability-scale heritability in genome-wide association study meta-analysis: a simple solution. Biol. Psychiatry 93, 29–36 (2023).
Raîche, G., Walls, T. A., Magis, D., Riopel, M. & Blais, J. G. Non-graphical solutions for Cattell’s scree test. Methodology 9, 23–29 (2013).
Kaiser, H. F. The application of electronic computers to factor analysis. Educ. Psychol. Meas. 20, 141–151 (1960).
Savalei, V. & Bentler, P. M. A two-stage approach to missing data: theory and application to auxiliary variables. Struct. Equ. Model. Multidiscip. J. 16, 477–497 (2009).
Savalei, V. Understanding robust corrections in structural equation modeling. Struct. Equ. Model. Multidiscip. J. 21, 149–160 (2014).
Mallard, T. T. et al. Multivariate GWAS of psychiatric disorders and their cardinal symptoms reveal two dimensions of cross-cutting genetic liabilities. Genetics https://doi.org/10.1101/603134 (2019).
Watanabe, K., Taskesen, E., Van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Foote, I. F. et al. Uncovering the multivariate genetic architecture of frailty with genomic structural equation modelling. Nat Genet. 57, 1848–1859 (2025).
De Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLOS Comput. Biol. 11, e1004219 (2015).
The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Privé, F., Albiñana, C., Arbel, J., Pasaniuc, B. & Vilhjálmsson, B. J. Inferring disease architecture and predictive ability with LDpred2-auto. Am. J. Hum. Genet. 110, 2042–2055 (2023).
Verma, A. et al. The Penn Medicine BioBank: towards a genomics-enabled learning healthcare system to accelerate precision medicine in a diverse population. J. Pers. Med. 12, 1974 (2022).
Acknowledgments
J.M.L. is supported by NIMH Grant T32MH016880. A.D.G. and L.S.S. are supported by NIH Grant R01MH120219. A.D.G. and I.F.F. are supported by NIA Grant R01AG073593. T.T.M. is supported by NIH grant K08MH135343. S.B. is supported by the Shurl and Kay Curci Foundation. We acknowledge the Penn Medicine BioBank (PMBB) for providing data and thank the patient-participants of Penn Medicine who consented to participate in this research program. We would also like to thank the Penn Medicine BioBank team and Regeneron Genetics Center for providing genetic variant data for analysis. The PMBB is approved under IRB protocol# 813913 and supported by the Perelman School of Medicine at the University of Pennsylvania, a gift from the Smilow family, and the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA award number UL1TR001878.
Ethics declarations
Competing interests
S.M.D. receives research support from RenalytixAI and in-kind research support from Novo Nordisk, both outside the scope of the current project. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Weihua Yue, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lawrence, J.M., Foote, I.F., Breunig, S. et al. Shared Genetic Liability across Systems of Psychiatric and Physical Illness. Nat Commun 17, 2993 (2026). https://doi.org/10.1038/s41467-026-69218-1
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41467-026-69218-1




