Causal relationship between metabolic syndrome and gastric cancer: insights from comprehensive analysis and biomarker identification
Highlight box
Key findings
• This study used Mendelian randomization to provide the first genetic evidence that genetic susceptibility to metabolic syndrome (MetS) increased the risk of gastric cancer (GC) by approximately 62%.
• Through integrated analyses combining transcriptomics, weighted gene co-expression network analysis, protein-protein interaction networks, and machine learning, four key feature genes—CCNB1, NUF2, THBS2, and GSTM2—were identified. CCNB1, NUF2, and THBS2 were significantly upregulated in GC tissues, while GSTM2 was markedly downregulated.
• The Stepglm[backward] + XGBoost model achieved the best diagnostic performance with an average area under the curve of 0.93.
• Single-cell RNA sequencing and immune infiltration analyses revealed that these genes are mainly expressed in epithelial cells, fibroblasts, and immune cell subpopulations, participating in cell-cycle regulation, metabolic reprogramming, immune signaling, and extracellular matrix remodeling.
What is known and what is new?
• MetS contributes to the development of several malignancies. Its causal link with GC remained unclear due to confounding factors in observational studies.
• This study provides genetic evidence confirming a causal relationship between MetS and GC. It identifies four novel molecular biomarkers and elucidates the mechanisms through which metabolic abnormalities may reshape the immune microenvironment and promote tumor progression.
What is the implication, and what should change now?
• MetS should be recognized as a modifiable risk factor for GC.
• Enhanced GC screening and metabolic management should be implemented in populations with MetS.
• The identified genes should be utilized as potential diagnostic biomarkers and therapeutic targets.
• Metabolic, immune, and genetic profiling should be integrated to enable personalized prevention and treatment strategies.
• Multicenter prospective studies should be conducted to validate these findings and their clinical applications.
Introduction
According to GLOBOCAN 2022 statistics, gastric cancer (GC) accounted for an estimated 960,784 new cases and 660,175 deaths worldwide, ranking fifth globally in both incidence and mortality (1). Helicobacter pylori (H. pylori) infection, unhealthy lifestyle (such as diet, smoking, and alcohol consumption), and genetic susceptibility have long been recognized as the primary risk factors for GC (2,3). In recent years, increasing attention has been directed toward the impact of metabolic syndrome (MetS) on the incidence and prognosis of GC. MetS is characterized by a constellation of metabolic abnormalities, encompassing central obesity, hypertension, hyperglycemia, and dyslipidemia, and affects approximately 20–25% of the adult population globally (4,5). Substantial evidence indicates that MetS and its components can elevate the risk of various malignancies. For instance, MetS has been associated with an increased risk of breast and colorectal cancers (6-8), while obesity has been strongly linked to a higher risk of upper gastrointestinal tumors such as adenocarcinoma of the gastroesophageal junction (9). These findings suggest that metabolic disorders may create favorable conditions for tumorigenesis and that MetS could therefore promote the development of GC.
Current epidemiological studies examining the relationship between MetS and GC risk have yielded inconsistent results. A meta-analysis involving both European and Asian populations found no significant link between MetS and the overall incidence of GC (10). In contrast, a case-control study reported that individuals meeting the diagnostic criteria for MetS had an estimated 2.5-fold increase in the risk of developing GC compared to those without MetS (11). Discrepancies among study findings may be attributable to differences in population characteristics, diagnostic criteria for MetS, and methodological approaches. Moreover, conventional observational studies are particularly susceptible to residual confounding and reverse causality, given the complex metabolic, lifestyle, and inflammatory factors underlying both MetS and GC. To further elucidate the relationship between MetS and GC, this study employed Mendelian randomization (MR) to assess the causal effect of MetS on GC risk. MR uses genetic variants as instrumental variables (IVs) to evaluate the causal effect of an exposure on disease outcomes. Since alleles are randomly allocated at conception, similar to randomized controlled trials, MR can reduce confounding bias and avoid reverse causality (12).
Metabolic disturbances caused by MetS induce chronic inflammation, hyperinsulinemia, and aberrant cellular signaling, thereby fostering a favorable environment for tumorigenesis (13). For example, elevated circulating insulin levels can bind to insulin-like growth factor 1 (IGF-1) receptors on the surface of tumor stromal cells, activating signaling pathways such as PI3K/AKT and MAPK, ultimately promoting tumor cell proliferation (14). Against this background, investigating the molecular links between metabolic abnormalities and GC progression is of particular importance. More recently, high-throughput gene expression profiling technologies have been widely applied in GC research and have played a critical role in elucidating its molecular characteristics (15,16). However, most existing studies have focused on molecular alterations within the tumor itself, with a lack of systematic analyses addressing the interaction between MetS-related metabolic abnormalities and the molecular features of GC, leaving the underlying mechanisms largely unclear.
Unlike previous studies, this study integrated MR, transcriptomic analyses, and machine learning (ML)-based feature selection, and further leveraged single-cell transcriptomic data to characterize cellular origins and immune relevance, thereby systematically elucidating the molecular links between MetS and GC across causal, molecular, and cellular levels. These findings not only help clarify the potential molecular mechanisms by which metabolic disturbances promote gastric carcinogenesis and progression, but also provide a basis for early diagnosis in populations at high risk for MetS-associated GC, while laying a foundation for targeted prevention and subsequent mechanistic studies. We present this article in accordance with the STROBE-MR and TRIPOD reporting checklists (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2396/rc).
Methods
Study design and data sources
This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study design workflow is illustrated in Figure 1. Summary statistics for the genome-wide association study of MetS were obtained from the CNCR/CTGlab consortium, where MetS was diagnosed based on the International Diabetes Federation criteria (5), and genetic data for GC were sourced from FinnGen R10, comprising 1,423 cases and 314,193 controls. All genetic data used for the MR analysis were derived from populations of European ancestry. Subsequently, we searched the Gene Expression Omnibus (GEO) database using the keywords “metabolic syndrome” or “gastric cancer” to retrieve gene expression datasets related to MetS and GC, applying the following inclusion criteria: (I) datasets derived from series; (II) expression profiling by array in Homo sapiens; and (III) inclusion of both controls and cases, with at least five samples per group. In total, one MetS dataset (GSE98895) and five GC datasets (GSE27342, GSE63089, GSE19826, GSE118916, GSE103236) were included. To enhance model generalizability, GSE27342 and GSE63089 were selected as the training datasets, and batch effects between them were corrected using the “sva” package in R. GSE19826 and GSE118916 served as independent test datasets, while GSE103236 was used as an external validation dataset. Finally, we searched the GEO database using the keywords “gastric cancer” and “single cell” to identify publicly available high-throughput sequencing-based Homo sapiens expression profiling datasets, and selected GSE184198 for single-cell RNA sequencing (scRNA-seq) analysis (Table S1).
MR study
Following the STROBE-MR guidelines, we defined MetS as the exposure and GC as the outcome and conducted an MR study using the “TwoSampleMR” package in R (17). The selected genetic IVs were required to meet the following three core assumptions (18): (I) IVs are strongly associated with the exposure; (II) IVs are independent of any confounders influencing both the exposure and the outcome; and (III) IVs affect the outcome exclusively through the exposure. First, single nucleotide polymorphisms (SNPs) associated with MetS at the genome-wide significance level (P<5×10⁻⁸) were selected as IVs. The PLINK clumping algorithm (window =10,000 kb, r²<0.001) was used to ensure independence among SNPs, and the F-statistic was calculated, with F>10 indicating sufficient instrument strength (19). Steiger filtering was applied to remove SNPs potentially influencing GC directly rather than via MetS. Subsequently, we aligned the effect alleles for MetS and GC, excluded all palindromic sequences, and harmonized the data for MR analysis, with the inverse variance weighted (IVW) employed as the primary approach. The Steiger test was employed to exclude reverse causality. Finally, to ensure robustness of the results, Cochran’s Q test and funnel plots were applied to evaluate heterogeneity among SNPs; MR-Egger was used to assess horizontal pleiotropy; the “MRPRESSO” package identified outlier SNPs and corrected effect estimates; the leave-one-out (LOO) analysis was conducted to evaluate the impact of each individual SNP on the overall causal estimate; and an online tool (https://shiny.cnsgenomics.com/mRnd/) was employed to calculate the statistical power of the MR analysis, with power >80% indicating reliable results (20).
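The instrument-strength check and the IVW estimator described above can be sketched as follows. This is a minimal illustration in Python, not the TwoSampleMR implementation used in the study. The per-SNP approximation F = (beta/SE)², the fixed-effect weighting, and the three toy SNPs are assumptions made for the example and are not data from this study.

```python
import math

def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP F-statistic as the squared z-score of the SNP-exposure effect."""
    return (beta_exposure / se_exposure) ** 2

def ivw_estimate(snps):
    """Fixed-effect inverse variance weighted (IVW) estimate.

    Each SNP is a tuple (beta_exposure, beta_outcome, se_outcome).
    The per-SNP Wald ratio beta_outcome / beta_exposure is weighted by
    beta_exposure**2 / se_outcome**2 (first-order approximation).
    """
    numerator = sum(bx * by / se_y ** 2 for bx, by, se_y in snps)
    denominator = sum(bx ** 2 / se_y ** 2 for bx, by, se_y in snps)
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se

# Hypothetical instruments, for illustration only
toy_snps = [(0.10, 0.05, 0.02), (0.20, 0.10, 0.04), (0.05, 0.02, 0.01)]
beta_ivw, se_ivw = ivw_estimate(toy_snps)
odds_ratio = math.exp(beta_ivw)  # causal estimate on the odds-ratio scale

# An instrument would be kept only if its F-statistic exceeds 10
strong_enough = f_statistic(0.10, 0.01) > 10
```

The estimate is computed on the log-odds scale and exponentiated at the end, which is why results are reported as odds ratios.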
Identification of differentially expressed genes (DEGs) and weighted gene co-expression network analysis (WGCNA)
Prior to analysis, the expression matrix was log2-transformed and normalized to reduce technical bias, and the mean value was calculated for genes with multiple probes. DEGs in GC were identified using the “limma” package in R, applying thresholds of adjusted P<0.05 and |log2FC| >0.585, while visualization was carried out with the “ggplot2” and “pheatmap” packages. Meanwhile, WGCNA was performed using the “WGCNA” package to identify the gene modules most strongly associated with MetS (21). Low-variance genes (standard deviation <0.5) and outlier samples were first removed, after which an appropriate soft threshold (β) was selected to construct an adjacency matrix and transform it into a topological overlap matrix (TOM). Gene modules (>50 genes) were defined through hierarchical clustering based on 1 – TOM, followed by dynamic tree cutting and merging of modules with gene correlation >0.75. Finally, Pearson correlation analysis was applied to assess the associations between module genes and clinical traits.
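The DEG thresholds and the soft-threshold step can be illustrated with a short Python sketch. It assumes an unsigned network, in which adjacency is the absolute correlation raised to the power β; the text does not state the network type. The gene names and values are hypothetical, and the function only applies the stated cut-offs, so it does not reproduce the limma or WGCNA computations.

```python
LOG2FC_CUTOFF = 0.585  # from the text; 2**0.585 is roughly a 1.5-fold change
ADJ_P_CUTOFF = 0.05    # from the text

def classify_gene(log2fc, adj_p):
    """Return 'up', 'down', or 'ns' under the thresholds stated in the text."""
    if adj_p < ADJ_P_CUTOFF and log2fc > LOG2FC_CUTOFF:
        return "up"
    if adj_p < ADJ_P_CUTOFF and log2fc < -LOG2FC_CUTOFF:
        return "down"
    return "ns"

def soft_threshold_adjacency(correlation, beta):
    """Unsigned adjacency (assumed network type): |correlation| ** beta."""
    return abs(correlation) ** beta

fold_change = 2 ** LOG2FC_CUTOFF  # linear fold change implied by the cut-off

# Hypothetical genes: (log2FC, adjusted P)
toy_genes = {"GENE_A": (1.2, 0.001), "GENE_B": (-0.9, 0.01), "GENE_C": (0.3, 0.2)}
calls = {gene: classify_gene(fc, p) for gene, (fc, p) in toy_genes.items()}

# With beta = 6, a moderate correlation of 0.5 is shrunk to 0.5**6
weak_link = soft_threshold_adjacency(-0.5, 6)
```

Raising correlations to a power suppresses weak correlations far more than strong ones, which is what pushes the network toward a scale-free topology.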
Functional enrichment analysis and protein-protein interaction (PPI) network construction
The “venn” package in R was used to obtain intersecting genes between MetS and GC, and the “clusterProfiler” package was employed to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses on these intersecting genes. The enrichment results were visualized using the “circlize”, “ComplexHeatmap,” and “ggplot2” packages. Subsequently, the intersecting genes were used to construct a PPI network via the STRING database (https://cn.string-db.org/) with a minimum confidence score threshold of 0.400 (22). The results were then imported into Cytoscape software, where the “cytoHubba” plugin was employed to perform topological analysis based on node degree, and the top 15 hub genes were identified.
Multi-algorithm ML models construction and performance evaluation
We extracted gene expression data and clinical information from the training and test datasets, and integrated the top 15 hub genes into multiple ML methods for model construction. A total of 12 ML methods were employed, encompassing both feature selection and model construction stages (23). First, feature selection was performed using algorithms with feature-ranking capabilities, including Lasso, Random Forest, Stepglm, and glmBoost. Next, based on the selected features, predictive models were constructed using algorithms such as Ridge, XGBoost, Enet, plsRglm, GBM, Naive Bayes, LDA, and SVM. Stratified 10-fold cross-validation was performed, with hyperparameters optimized internally by the corresponding R packages using training folds only. The overall workflow systematically evaluated the performance of 113 algorithm combinations. Finally, models were ranked according to the average area under the receiver operating characteristic curve (AUC) across training and test datasets, and the optimal model was identified along with candidate genes (24). In addition, confusion matrices were generated, and classification performance and misclassification distribution across different cohorts were evaluated using accuracy, sensitivity, specificity, precision, and F1-score (25).
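The evaluation metrics named above can be written out explicitly. The Python sketch below is illustrative only. The labels, scores, and the 0.5 decision threshold are hypothetical, and the AUC is computed by pairwise ranking rather than with the R packages used in the study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 = case (GC) and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F1-score.

    Assumes both classes are present and at least one positive prediction.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

def auc_by_ranking(y_true, scores):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical predictions for six samples
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = auc_by_ranking(y_true, scores)
```

AUC is threshold-free, whereas the confusion-matrix metrics depend on the chosen cut-off, so the two views complement each other when comparing cohorts.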
Expression analysis, diagnostic evaluation, and nomogram construction of feature genes
The “ggpubr” package in R was employed to assess expression differences of candidate genes between case and control groups across the training, test, and validation datasets. To further validate the results, we performed quantitative reverse transcription polymerase chain reaction (qRT-PCR) in biological triplicate, with relative expression calculated using the 2⁻ΔΔCq method and significance assessed by Student’s t-test (P<0.05). The cell lines used in this study included GES-1, AGS, HGC-27, and MKN-45, all of which were cultured under standard conditions (37 ℃, 5% CO2) in appropriate media. Total RNA was extracted using TransZol Up (TransGen Biotech, Beijing, China) according to the manufacturer’s protocol. The extracted RNA was reverse-transcribed into cDNA using the PrimeScript RT Kit (Takara Bio, Kusatsu, Shiga, Japan). qRT-PCR reactions were carried out with TB Green Premix (Takara) on a Bio-Rad real-time fluorescence quantitative PCR system (Bio-Rad Laboratories, Hercules, CA, USA). Primer sequences are listed in Table S2. Subsequently, the “pROC” package was employed to plot receiver operating characteristic (ROC) curves and calculate the AUC to evaluate the diagnostic performance of feature genes, with those having AUC >0.7 considered optimal for constructing predictive models (26). Finally, the selected feature genes were incorporated as predictors to construct a nomogram model using the “rms” package, and calibration curves were plotted to evaluate the concordance between predicted probabilities and observed outcomes.
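The 2⁻ΔΔCq calculation can be made concrete with a short Python sketch. The Cq values and the use of a single reference gene are hypothetical; the text does not name the reference gene or report raw Cq values.

```python
def relative_expression(cq_target_sample, cq_ref_sample,
                        cq_target_control, cq_ref_control):
    """Relative expression by the 2^(-ddCq) method.

    dCq = Cq(target) - Cq(reference), computed separately for the sample
    and the control; ddCq = dCq(sample) - dCq(control).
    """
    delta_sample = cq_target_sample - cq_ref_sample
    delta_control = cq_target_control - cq_ref_control
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)

# Hypothetical Cq values: a cancer cell line versus the control line
fold = relative_expression(cq_target_sample=22.0, cq_ref_sample=18.0,
                           cq_target_control=25.0, cq_ref_control=18.0)
```

A value above 1 indicates higher expression in the sample than in the control, and a value below 1 indicates lower expression.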
Enrichment analysis of feature genes and construction of regulatory networks
Gene set enrichment analysis (GSEA) was performed based on MSigDB (https://www.gsea-msigdb.org/gsea/msigdb/) to explore the biological functions and signaling pathways associated with the feature genes, and the results were visualized using the “enrichplot” package in R. miRNA-mRNA interactions from the starBase database and transcription factor (TF)-mRNA regulatory relationships from the JASPAR database were retrieved via the NetworkAnalyst platform (https://www.networkanalyst.ca/) to construct a miRNA-mRNA-TF regulatory network, which was subsequently visualized with Cytoscape software (27).
Processing of scRNA-seq data and analysis of feature gene expression
Preprocessing of scRNA-seq data was performed using the “Seurat” package in R. Low-quality cells with detected genes <100 or mitochondrial gene proportion >15% were first removed (28). The data were log-normalized, and the top 1,500 highly variable genes were selected using variance stabilizing transformation. Principal component analysis (PCA) was then conducted for dimensionality reduction, followed by batch effect correction using the Harmony algorithm. Cell clustering was performed with the Louvain algorithm, and the clustering results were visualized using Uniform Manifold Approximation and Projection (UMAP). Heatmaps of the top 10 marker genes (|log2FC| >1, adjusted P<0.05) were generated. Finally, the “SingleR” package was applied for cell cluster annotation, and the “DotPlot”, “FeaturePlot”, and “VlnPlot” functions from the Seurat package were employed to visualize the expression patterns of feature genes across different cell populations.
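The cell-level quality control rule can be expressed as a simple filter. The Python sketch below only illustrates the two stated thresholds. The cell barcodes and values are hypothetical, and the study itself performed this step in Seurat.

```python
MIN_GENES = 100        # from the text: cells with fewer detected genes are removed
MAX_PCT_MITO = 15.0    # from the text: cells above this mitochondrial percentage are removed

def passes_qc(n_genes, pct_mito):
    """Keep a cell unless it has <100 detected genes or >15% mitochondrial reads."""
    return n_genes >= MIN_GENES and pct_mito <= MAX_PCT_MITO

# Hypothetical cells: barcode -> (detected genes, mitochondrial percentage)
toy_cells = {
    "cell_1": (1500, 4.2),   # kept
    "cell_2": (80, 3.0),     # removed: too few detected genes
    "cell_3": (2200, 22.5),  # removed: high mitochondrial fraction
    "cell_4": (100, 15.0),   # kept: exactly at both thresholds
}
kept = [barcode for barcode, (genes, mito) in toy_cells.items()
        if passes_qc(genes, mito)]
```

Low gene counts typically flag empty droplets or debris, and a high mitochondrial fraction typically flags stressed or dying cells, which is why both criteria are applied together.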
Analysis of immune cell infiltration
We used the “CIBERSORT” package in R to evaluate the relative abundance of 22 immune cell subtypes in case and control samples (29). Spearman correlation analysis was performed to calculate the correlation coefficients and corresponding P values between feature genes and immune cells. The “linkET” package was employed to construct a feature gene-immune cell correlation network.
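The Spearman correlation between a feature gene and an immune cell fraction can be sketched as a Pearson correlation on ranks. The Python code below is a minimal illustration with hypothetical values; it omits the P value calculation and is not the code used in the study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical gene expression and immune cell fractions in five samples
gene_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
immune_fraction = [0.10, 0.20, 0.15, 0.40, 0.50]
rho = spearman(gene_expression, immune_fraction)
```

Because it uses ranks, the coefficient captures monotonic rather than strictly linear relationships, which suits compositional estimates such as immune cell fractions.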
Statistical analysis
All statistical analyses were performed using R software (version 4.5.1), and P values <0.05 were considered statistically significant.
Results
Causal relationship between MetS and GC
A total of 164 SNPs were included in the MR analysis, all with F-statistics >10, and Steiger filtering did not identify any SNPs with incorrect causal direction (Figure 2A,2B, Table S3). The IVW method demonstrated a significant association between the genetic liability to MetS and an increased risk of GC [odds ratio (OR): 1.62, 95% confidence interval (CI): 1.12–2.33, P=0.01], which was further supported by the weighted median (OR: 1.89, 95% CI: 1.03–3.47, P=0.04) (Figure 2A). The scatter plot showed highly consistent directions of causal estimates across the five MR methods (Figure 2C). The Steiger test further confirmed the correct causal direction (correct causal direction = TRUE, P<0.05). Meanwhile, Cochran’s Q test (MR-Egger: Q=156.70, P=0.60; IVW: Q=156.85, P=0.62) indicated no heterogeneity, and funnel plots revealed no obvious asymmetry (Figure 2D). The MR-Egger intercept (intercept =−0.0036, P=0.70) suggested no horizontal pleiotropy, and MR-PRESSO analysis (RSSobs =158.35, P=0.65) did not detect any outliers. LOO analysis demonstrated that the overall causal effect was not driven by any single SNP (Figure 2E). Finally, the statistical power was 89%, further supporting the reliability of the results.
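The reported IVW estimate can be checked for internal consistency by recovering the standard error from the confidence interval. The Python sketch below assumes a symmetric Wald interval on the log-odds scale and a two-sided normal test; these are assumptions of the illustration, not statements about how the study computed its P value.

```python
import math
from statistics import NormalDist

def or_ci_to_p(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-odds estimate, standard error, z-score, and two-sided P from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, se, z, p

# IVW result reported in the text: OR 1.62, 95% CI 1.12-2.33
beta, se, z, p = or_ci_to_p(1.62, 1.12, 2.33)

# An OR of 1.62 corresponds to a 62% relative increase in the odds of GC
percent_increase = (1.62 - 1) * 100
```

Under these assumptions the recovered P value rounds to 0.01, in line with the reported figure. Strictly, the 62% figure describes an increase in odds, which approximates an increase in risk when the outcome is rare.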
Identification of DEGs and selection of key WGCNA module
We selected GSE27342 and GSE63089 as training datasets. PCA revealed batch differences prior to merging, which were effectively eliminated after correction (Figure 3A). A total of 1,712 DEGs were identified as associated with GC, including 964 upregulated genes and 748 downregulated genes (Figure 3B,3C, Table S4). Subsequently, WGCNA was performed on the GSE98895 dataset, and a scale-free network was constructed with β=6 and R²=0.9 as the criteria (Figure 3D). Based on the clustering dendrogram combined with the dynamic tree-cutting method, 16 gene modules were identified, among which the MEyellow module was most significantly associated with MetS (r=−0.66, P=3×10⁻⁶), containing a total of 1,314 genes (Figure 3E,3F, Table S5).
Functional enrichment analysis and identification of hub genes
A total of 72 intersecting genes were identified using a Venn diagram (Figure 4A). GO analysis indicated that these genes were predominantly enriched in biological processes, including chromatin remodeling, calcium signaling, cardiac function regulation, and extracellular matrix-receptor interactions (Figure 4B, Table S6). KEGG pathway analysis further revealed that these genes were predominantly enriched in metabolic detoxification, DNA repair, cardiac function, and intercellular signal transduction pathways (Figure 4C, Table S7). Subsequently, a PPI network comprising 29 nodes and 38 edges was constructed, from which 15 hub genes were screened (Figure 4D,4E, Table S8).
Identification of the optimal ML model and selection of feature genes
We constructed 113 ML algorithm combinations, among which the Stepglm[backward] + XGBoost model performed best, achieving an average AUC of 0.93 (Figure 5A). Specifically, the AUC was 0.985 (95% CI: 0.973–0.994) in the training dataset, 0.938 (95% CI: 0.831–1.000) in the test dataset GSE118916, and 0.867 (95% CI: 0.706–0.983) in the test dataset GSE19826 (Figure 5B-5D). Moreover, it maintained high accuracy across the training and both test datasets, with only a few misclassifications (Figure 5E-5G, Table S9). Based on this optimal model, six candidate genes—CCNB1, CDH2, GSTM2, NUF2, SNTA1, and THBS2—were identified (Table S10).
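The average AUC can be reproduced from the three cohort-level values. The Python sketch below assumes an unweighted mean across the training dataset and the two test datasets; the text does not state how the average was weighted.

```python
from statistics import mean

# AUC values reported in the text for the Stepglm[backward] + XGBoost model
auc_by_cohort = {
    "training": 0.985,
    "GSE118916": 0.938,
    "GSE19826": 0.867,
}

# Unweighted mean across cohorts (assumed; weighting is not stated in the text)
average_auc = mean(auc_by_cohort.values())

# Gap between training and the weakest test cohort, a rough indicator of overfitting
largest_gap = auc_by_cohort["training"] - min(auc_by_cohort.values())
```

The unweighted mean rounds to the reported 0.93. The drop from the training AUC to the lowest test AUC is the kind of gap usually inspected when judging how well a model generalizes.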
Expression analysis, diagnostic evaluation, and nomogram construction of feature genes
In the case groups of the training, test, and validation datasets, CCNB1, NUF2, and THBS2 were significantly upregulated, whereas GSTM2 was significantly downregulated. In contrast, SNTA1 exhibited no significant difference in the test datasets (GSE118916: P=0.217; GSE19826: P=0.183), and CDH2 showed no significant difference in the validation dataset (P=0.054); therefore, both were excluded (Figure 6A). The qRT-PCR results were consistent with the analysis, confirming CCNB1, NUF2, THBS2, and GSTM2 as feature genes (Figure 6B, Table S11). Among them, THBS2 (AUC: 0.823–0.922) and NUF2 (AUC: 0.794–0.902) consistently exhibited the best diagnostic performance, while CCNB1 (AUC: 0.744–0.849) and GSTM2 (AUC: 0.739–0.922) also demonstrated discriminative ability, albeit inferior to the former two (Figure 6C). Finally, a nomogram model incorporating the four feature genes was constructed (C-index =0.885, calibration slope =0.967), and the calibration curve demonstrated a high degree of concordance between predicted probabilities and observed outcomes (Figure 6D).
GSEA analysis and construction of the miRNA-mRNA-TF regulatory network
Figure 7A-7D illustrates the top five significantly enriched signaling pathways identified by GSEA for each feature gene in the high- and low-expression groups. Overall, the four feature genes were mainly enriched in critical biological processes, including cell cycle regulation, DNA repair, metabolic reprogramming, immune signaling, and extracellular matrix interactions (Table S12). Meanwhile, a total of 28 miRNAs and 21 TFs were predicted, and a miRNA-mRNA-TF regulatory network centered on the four feature genes was constructed (Figure 7E).
scRNA-seq analysis reveals differential expression of feature genes across distinct cell populations
After stringent quality control, a total of 17,927 cells were identified (Figure 8A). PCA was performed based on the top 1,500 highly variable genes, and the first 20 principal components were statistically significant (P<0.05) (Figure 8B,8C). Following UMAP dimensionality reduction and clustering analysis, 14 cell clusters were identified and annotated into nine cell types based on marker genes (Table S13), including 1,250 epithelial cells, 358 fibroblasts, 249 endothelial cells, 238 neutrophils, 381 monocytes, 114 hematopoietic stem cells, 2,129 B cells, 4,551 CD4+ T cells, and 7,657 CD8+ T cells (Figure 8D-8F, Table S14). Subsequently, the expression patterns of the four feature genes were assessed across distinct cell populations (Figure 8G-8I). CCNB1 and GSTM2 were found to be mainly expressed in epithelial cells. However, in GC-derived cells, CCNB1 was significantly upregulated in CD4+/CD8+ T cells and epithelial cells, whereas GSTM2 was significantly downregulated in epithelial cells (P<0.05). THBS2 was primarily enriched in fibroblasts and exhibited high expression in the GC group (P<0.05). NUF2 showed an overall low expression level but was significantly upregulated in CD4+/CD8+ T cells and significantly downregulated in epithelial cells in the GC group (P<0.05).
Characteristics of immune cell infiltration
As shown in Figure 9A,9B, significant differences in immune profiles were observed between the case and control groups. Compared with the control group, GC tissues exhibited a marked increase in resting NK cells, resting dendritic cells, M0/M1/M2 macrophages, and activated CD4+ memory T cells, while plasma cells, CD8+ T cells, resting CD4+ memory T cells, activated NK cells, monocytes, and resting mast cells were significantly reduced. The immune cell correlation heatmap revealed a distinct immune profile and intercellular interaction pattern in the GC group (Figure 9C). Further analysis of the relationships between the four feature genes and immune cells showed that CCNB1, NUF2, and THBS2 were significantly associated with various pro-inflammatory and antigen-presenting cells (Figure 9D,9E, Table S15). Specifically, CCNB1 and NUF2 exhibited broadly consistent trends across multiple immune cell subtypes. Both were positively correlated with M0/M1 macrophages, activated and resting dendritic cells, and activated CD4+ memory T cells, while being negatively correlated with memory B cells (P<0.05). THBS2 was primarily positively correlated with M0/M1/M2 macrophages, neutrophils, and resting mast cells, while showing negative correlations with resting CD4+ memory T cells, eosinophils, and activated mast cells (P<0.05). In contrast, GSTM2 displayed the opposite trend (P<0.05), suggesting that its downregulation may impair antigen presentation and T-cell activation.
Discussion
Accumulating evidence underscores the critical role of MetS in the development of GC (30-32). However, previous findings remain inconsistent due to the influence of confounding factors. In this study, we provided genetic evidence through MR, showing that genetic susceptibility to MetS increases the risk of GC by approximately 62%. Nevertheless, MR alone cannot elucidate the underlying biological mechanisms. To address this, we integrated bioinformatics, ML, and scRNA-seq into a multi-omics framework to systematically investigate the causal relationship between MetS and GC, as well as the molecular mechanisms involved.
At the molecular level, we identified the top 15 hub genes associated with MetS-related GC through DEGs, WGCNA, and PPI network analyses. By integrating 113 ML algorithm combinations, we determined the optimal model and extracted candidate genes. Validation across multiple GEO datasets and in vitro qRT-PCR ultimately confirmed four feature genes. Among them, CCNB1, NUF2, and THBS2 were significantly upregulated in the case group, while GSTM2 was markedly downregulated. These genes all demonstrated high diagnostic value and were broadly implicated in essential biological processes, including extracellular matrix interactions, metabolic reprogramming, immune signaling, and cell cycle regulation. scRNA-seq analysis showed that these genes exhibited specific expression patterns in epithelial cells, fibroblasts, and immune cells, which were highly consistent with the results of immune infiltration analysis, suggesting that they may contribute not only to intrinsic tumor cell abnormalities but also to GC progression through remodeling of the immune microenvironment. In summary, this multi-omics research framework overcomes the limitations of single techniques, and further provides new systems biology insights into the pathogenesis of MetS-related GC.
CCNB1 is a key regulator of the cell cycle, driving the transition of cells from the G2 phase to mitosis (33). Previous studies have confirmed that CCNB1 is highly expressed in various cancers and is associated with poor prognosis (34-36); it is also significantly upregulated in GC tissues, consistent with our findings (37). Notably, CCNB1 may play a critical role in MetS-related GC. Patients with MetS typically have elevated insulin and IGF-1, which activate downstream PI3K/Akt and Ras/MAPK pathways to promote the expression of cyclins (38-40). In high-glucose and high-insulin environments, intracellular CCNB1 levels increase, promoting excessive proliferation of gastric epithelial cells and creating conditions conducive to tumorigenesis (40-42). Our GSEA results indicated that the high-CCNB1 expression group was significantly enriched in cell proliferation-related pathways, and scRNA-seq analysis further confirmed its high expression in GC epithelial cells. These findings suggest that sustained mitogenic signaling induced by abnormal metabolic states may lead to the activation of CCNB1 and other cell cycle genes in gastric epithelial cells, thereby accelerating tumor initiation and progression. In addition, M1 macrophages enriched in the inflammatory microenvironment can promote tumor cell proliferation and CCNB1 expression by secreting cytokines such as IL-6 and TNF-α, forming a positive feedback loop between immune inflammation and tumor proliferation (43,44). Immune infiltration analysis also revealed significant M1 macrophage infiltration in GC tissues with high CCNB1 expression, corroborating the above mechanism. These findings indicate that CCNB1 may function as a potential predictive biomarker and immunotherapeutic target in MetS-related GC, promoting tumor progression through the “metabolic abnormality-immune inflammation-tumor proliferation” axis.
THBS2 is an extracellular matrix glycoprotein that may link MetS to GC through fibroblasts. MetS is commonly accompanied by tissue fibrosis, exemplified by hepatic fibrosis in non-alcoholic fatty liver disease (NAFLD), in which chronic inflammation and insulin resistance can activate fibroblasts and induce extracellular matrix remodeling (45). Studies have demonstrated that plasma THBS2 levels are positively associated with the severity of NAFLD and MetS, suggesting that metabolic stress may stimulate fibroblasts to secrete matrix proteins such as THBS2 (45,46). In the GC microenvironment, cancer-associated fibroblasts are significantly enriched and secrete THBS2, which binds to tumor cell receptors (e.g., integrins) through exosomes or the extracellular matrix, activating the TGF-β, FAK, and PI3K/AKT pathways to promote epithelial-mesenchymal transition, cell invasion, proliferation, and immune evasion and to regulate angiogenesis (47-49). Our study also confirmed that THBS2 is highly expressed in GC, particularly in fibroblasts, and that the high-expression group is enriched in fibroblast-associated signaling pathways, including extracellular matrix-receptor interaction, focal adhesion, and cytokine-receptor interaction. These findings suggest that metabolic abnormalities may drive fibroblasts to secrete high levels of THBS2, thereby promoting GC progression through microenvironmental remodeling.
NUF2 is another upregulated gene we identified, encoding a key component of the kinetochore complex. Previous studies have shown that NUF2 is highly expressed in various malignancies (50-52), and NUF2 knockdown in GC cells can lead to G2/M phase arrest and suppressed proliferation, consistent with our findings (53). Although no direct evidence currently links NUF2 to MetS, given that it functions as a cell cycle effector similar to CCNB1, the sustained mitogenic signaling and metabolic stress induced by MetS may indirectly drive its aberrant expression. Notably, NUF2 and CCNB1 exhibited highly consistent patterns in GSEA signaling pathways, scRNA-seq, and immune infiltration analyses, suggesting that they may synergistically promote the development of MetS-related GC.
GSTM2 belongs to the phase II detoxification enzyme family and catalyzes the conjugation of glutathione with toxic electrophilic compounds to mitigate oxidative stress and genotoxic damage. Deficiency of detoxification enzymes increases cancer risk. For instance, individuals with GST gene deletions are more prone to gastrointestinal tumors upon exposure to environmental carcinogens, due to reduced detoxification capacity and accumulation of mutagens (54). Studies have confirmed that molecules targeting GSTM2 degradation, such as the E3 ubiquitin ligase SMURF1, can accelerate GC progression, suggesting a potential tumor-suppressive role of GSTM2 (55). Against this background, our study found that GSTM2 was significantly downregulated in GC tissues, particularly within epithelial cells. Considering that MetS patients are chronically exposed to environments of high oxidative stress and lipid peroxidation, low expression of GSTM2 in gastric epithelial cells may exacerbate cellular damage and accelerate malignant transformation. Moreover, we observed that in tumor microenvironments with low GSTM2 expression, antigen-presenting cells and effector T cells were reduced, while immunosuppressive and progenitor-like cells increased, suggesting that GSTM2 downregulation may promote immune evasion by impairing immune activation (56). For the first time, we revealed that GSTM2 downregulation may represent an adaptive change in GC under MetS-related conditions of chronic oxidative stress and metabolic burden, further supporting its significance as a potential protective biomarker.
miRNAs predominantly regulate gene expression at the post-transcriptional level, whereas TFs govern the transcriptional process, and both play critical roles in diverse biological processes and disease development (57). On this basis, we constructed a miRNA-mRNA-TF regulatory network centered on the four feature genes and identified several potential key TFs, including SP1, NFYA, and E2F1, that may contribute to the progression of MetS-related GC. For instance, E2F1 promotes tumor cell proliferation and invasion by driving glycolysis and lipid metabolic reprogramming, leading to lactate accumulation and the establishment of an immunosuppressive microenvironment (58,59). NFYA cooperates with SREBP1 to upregulate glycolytic genes such as HK2 and PFKFB3, enhancing glucose utilization in tumor cells and accelerating the metabolic adaptation and malignant growth of GC (60). Moreover, hyperglycemic conditions can amplify SP1-mediated metabolic reprogramming, thereby accelerating tumor progression (61). Meanwhile, miR-7, miR-124, miR-30b/30e, and miR-182 can promote tumor cell proliferation and migration through the activation of distinct signaling pathways (62-65). Therefore, the feature genes we identified may jointly drive the progression of MetS-related GC through their interactions with these TFs and miRNAs, although the precise molecular mechanisms require further experimental validation.
This study has several limitations that should be acknowledged. First, the integration of MetS-related and GC tissue transcriptomes is inherently indirect, as systemic metabolic alterations may not be directly reflected in tumor-specific gene expression. Accordingly, this study is better suited to identifying molecular features in the context of metabolic dysregulation for risk stratification and diagnostic prediction, while providing clues for potential mechanistic investigations. Second, although MR analyses provided evidence supporting a genetic causal association between MetS and GC, the results may be influenced by the source of IVs, population differences, and tumor heterogeneity. In addition, although we systematically evaluated 113 combinations of ML algorithms to identify the optimal predictive model and validated its performance through stratified cross-validation and independent datasets, potential overfitting cannot be entirely excluded given the high-dimensional features. Third, gene expression analyses were based on publicly available microarray datasets, which are subject to batch effects, probe design, and sample size limitations, potentially leading to the omission of, or bias against, certain key genes. Moreover, the identified feature genes and the constructed model were primarily based on retrospective data and limited experimental validation, and protein expression and post-translational modifications were not systematically evaluated. Therefore, future studies should integrate multicenter clinical cohorts and comprehensive functional experiments to further validate the clinical utility and molecular mechanisms of the identified feature genes.
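To make the overfitting concern concrete, the following is a minimal sketch of how a stratified cross-validated AUC can be computed. It is an illustration only and is not the pipeline used in this study: the function names (`stratified_folds`, `auc`, `cross_validated_auc`) and the `fit`/`predict` callables are hypothetical placeholders for any of the evaluated models.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Labels are assumed to be 0 or 1.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(features, labels, fit, predict, k=5, seed=0):
    """Mean held-out AUC over k stratified folds.

    `fit(X, y)` returns a model and `predict(model, x)` returns a score;
    both are hypothetical stand-ins for a real learner. Each class needs
    at least k samples, otherwise some fold cannot be scored.
    """
    folds = stratified_folds(labels, k, seed)
    fold_aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [predict(model, features[i]) for i in held_out]
        fold_aucs.append(auc(scores, [labels[i] for i in held_out]))
    return sum(fold_aucs) / len(fold_aucs)
```

Because each model is scored only on samples it did not see during fitting, a large gap between training AUC and the cross-validated AUC is one warning sign of overfitting. Independent validation datasets remain necessary, since all folds share the same batch and cohort characteristics.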
Conclusions
In summary, this study first provided genetic evidence supporting an association between genetic susceptibility to MetS and the risk of GC. Building on this, we integrated multi-omics transcriptomic data to systematically screen molecular features of MetS-associated GC, identified four feature genes (CCNB1, NUF2, THBS2, and GSTM2), and constructed a robust predictive model. The analyses also revealed the potential roles of these genes in tumor proliferation, tumor microenvironment remodeling, and immune regulation. These findings may facilitate risk stratification and early diagnosis in populations at high risk for MetS-associated GC, and provide a foundation for targeted preventive strategies and related mechanistic studies.
Acknowledgments
We thank the CNCR/CTGlab, FinnGen, and GEO databases for providing the original data used in this study. We also acknowledge the Medical Innovation Center of The First Affiliated Hospital of Nanchang University for providing the experimental platform.
Footnote
Reporting Checklist: The authors have completed the STROBE-MR and TRIPOD reporting checklists. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2396/rc
Data Sharing Statement: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2396/dss
Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2396/prf
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2396/coif). All authors report that this work was supported by the Ganpo Talent 555 Project of Jiangxi Province. The authors have no other conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
2. Huang RJ, Laszkowska M, In H, et al. Controlling Gastric Cancer in a World of Heterogeneous Risk. Gastroenterology 2023;164:736-51.
3. Choi IJ, Kim CG, Lee JY, et al. Family History of Gastric Cancer and Helicobacter pylori Treatment. N Engl J Med 2020;382:427-36.
4. Saklayen MG. The Global Epidemic of the Metabolic Syndrome. Curr Hypertens Rep 2018;20:12.
5. van Walree ES, Jansen IE, Bell NY, et al. Disentangling Genetic Risks for Metabolic Syndrome. Diabetes 2022;71:2447-57.
6. Deng L, Liu T, Liu CA, et al. The association of metabolic syndrome score trajectory patterns with risk of all cancer types. Cancer 2024;130:2150-9.
7. Zhou L, Gao H, Zhang J, et al. Metabolic syndrome and cancer risk: a two-sample Mendelian randomization study of European ancestry. Int J Surg 2025;111:311-21.
8. Ma Z, Wang S, Liu S, et al. Metabolic syndrome in colorectal cancer liver metastasis: metabolic reprogramming and microenvironment crosstalk. Front Immunol 2025;16:1653442.
9. Olsen CM, Pandeya N, Green AC, et al. Population attributable fractions of adenocarcinoma of the esophagus and gastroesophageal junction. Am J Epidemiol 2011;174:582-90.
10. Mariani M, Sassano M, Boccia S. Metabolic syndrome and gastric cancer risk: a systematic review and meta-analysis. Eur J Cancer Prev 2021;30:239-50.
11. Li F, Du H, Li S, et al. The Association Between Metabolic Syndrome and Gastric Cancer in Chinese. Front Oncol 2018;8:326.
12. Richmond RC, Davey Smith G. Mendelian Randomization: Concepts and Scope. Cold Spring Harb Perspect Med 2022;12:a040501.
13. O’Neill S, O’Driscoll L. Metabolic syndrome: a closer look at the growing epidemic and its associated pathologies. Obes Rev 2015;16:1-12.
14. Gallagher EJ, LeRoith D. Hyperinsulinaemia in cancer. Nat Rev Cancer 2020;20:629-44.
15. Sheng T, Ho SWT, Ooi WF, et al. Integrative epigenomic and high-throughput functional enhancer profiling reveals determinants of enhancer heterogeneity in gastric cancer. Genome Med 2021;13:158.
16. Song WM, Elmas A, Farias R, et al. Multiscale protein networks systematically identify aberrant protein interactions and oncogenic regulators in seven cancer types. J Hematol Oncol 2023;16:120.
17. Skrivankova VW, Richmond RC, Woolf BAR, et al. Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization: The STROBE-MR Statement. JAMA 2021;326:1614-21.
18. Boef AG, Dekkers OM, le Cessie S. Mendelian randomization studies: a review of the approaches used and the quality of reporting. Int J Epidemiol 2015;44:496-511.
19. Pierce BL, Ahsan H, Vanderweele TJ. Power and instrument strength requirements for Mendelian randomization studies using multiple genetic variants. Int J Epidemiol 2011;40:740-52.
20. Brion MJ, Shakhbazov K, Visscher PM. Calculating statistical power in Mendelian randomization studies. Int J Epidemiol 2013;42:1497-501.
21. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559.
22. Antonietti M, Gonzalez DJT, Djulbegovic M, et al. Intrinsic disorder in PRAME and its role in uveal melanoma. Cell Commun Signal 2023;21:222.
23. Chen B, Sun X, Huang H, et al. An integrated machine learning framework for developing and validating a diagnostic model of major depressive disorder based on interstitial cystitis-related genes. J Affect Disord 2024;359:22-32.
24. Xu L, Chen S, Fu W, et al. Environmental toxicant 2,3,7,8-tetrachlorodibenzo-p-dioxin induces non-obstructive azoospermia: New insights from network toxicology, integrated machine learning, and biomolecular modeling. Ecotoxicol Environ Saf 2025;295:118173.
25. Hogue SC, Chen F, Brassard G, et al. Pharmacists’ perceptions of a machine learning model for the identification of atypical medication orders. J Am Med Inform Assoc 2021;28:1712-8.
26. Zou C, Su L, Pan M, et al. Exploration of novel biomarkers in Alzheimer’s disease based on four diagnostic models. Front Aging Neurosci 2023;15:1079433.
27. Chen Y, Tang Z, Tang Z, et al. Identification of core immune-related genes CTSK, C3, and IFITM1 for diagnosing Helicobacter pylori infection-associated gastric cancer through transcriptomic analysis. Int J Biol Macromol 2025;287:138645.
28. Tan H, Zhu F, Yan H, et al. Genetic Associations of Clonal Hematopoiesis With Cardioembolic Stroke: Insights From Genome-Wide Mendelian Randomization, Bulk RNA, Single-Cell RNA Sequencing. CNS Neurosci Ther 2025;31:e70515.
29. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453-7.
30. Huang D, Shin WK, De la Torre K, et al. Association between metabolic syndrome and gastric cancer risk: results from the Health Examinees Study. Gastric Cancer 2023;26:481-92.
31. Wu E, Wei GF, Li Y, et al. Serum urea concentration and risk of 16 site-specific cancers, overall cancer, and cancer mortality in individuals with metabolic syndrome: a cohort study. BMC Med 2024;22:536.
32. Rothwell JA, Jenab M, Karimi M, et al. Metabolic Syndrome and Risk of Gastrointestinal Cancers: An Investigation Using Large-scale Molecular Data. Clin Gastroenterol Hepatol 2022;20:e1338-52.
33. Wang Z, Fan M, Candas D, et al. Cyclin B1/Cdk1 coordinates mitochondrial respiration for cell-cycle G2/M progression. Dev Cell 2014;29:217-32.
34. Jin X, He X, Huang R, et al. SNRPB/CCNB1 axis promotes hepatocellular carcinoma progression and cisplatin resistance through enhancing lipid metabolism reprogramming. J Exp Clin Cancer Res 2025;44:211.
35. Aljohani AI, Toss MS, Green AR, et al. The clinical significance of cyclin B1 (CCNB1) in invasive breast cancer with emphasis on its contribution to lymphovascular invasion development. Breast Cancer Res Treat 2023;198:423-35.
36. Dai P, Xiong L, Wei Y, et al. A pancancer analysis of the oncogenic role of cyclin B1 (CCNB1) in human tumors. Sci Rep 2023;13:16226.
37. Lu XQ, Zhang JQ, Zhang SX, et al. Identification of novel hub genes associated with gastric cancer using integrated bioinformatics analysis. BMC Cancer 2021;21:697.
38. Lu CC, Chu PY, Hsia SM, et al. Insulin induction instigates cell proliferation and metastasis in human colorectal cancer cells. Int J Oncol 2017;50:736-44.
39. Klement RJ, Fink MK. Dietary and pharmacological modification of the insulin/IGF-1 system: exploiting the full repertoire against cancer. Oncogenesis 2016;5:e193.
40. Grabiec K, Gajewska M, Milewska M, et al. The influence of high glucose and high insulin on mechanisms controlling cell cycle progression and arrest in mouse C2C12 myoblasts: the comparison with IGF-I effect. J Endocrinol Invest 2014;37:233-45.
41. Yoshizawa N, Yamaguchi H, Yamamoto M, et al. Gastric carcinogenesis by N-Methyl-N-nitrosourea is enhanced in db/db diabetic mice. Cancer Sci 2009;100:1180-5.
42. Yu J, Hu D, Wang L, et al. Hyperglycemia induces gastric carcinoma proliferation and migration via the Pin1/BRD4 pathway. Cell Death Discov 2022;8:224.
43. Wang XH, Hong X, Zhu L, et al. Tumor necrosis factor alpha promotes the proliferation of human nucleus pulposus cells via nuclear factor-κB, c-Jun N-terminal kinase, and p38 mitogen-activated protein kinase. Exp Biol Med (Maywood) 2015;240:411-7.
44. Wei XM, Lu SC, Li L, et al. Norcantharidin promotes M1 macrophage polarization and suppresses colorectal cancer growth. Acta Pharmacol Sin 2025;46:2820-34.
45. Peiseler M, Schwabe R, Hampe J, et al. Immune mechanisms linking metabolic injury to inflammation and fibrosis in fatty liver disease - novel insights into cellular communication circuits. J Hepatol 2022;77:1136-60.
46. Wu X, Cheung CKY, Ye D, et al. Serum Thrombospondin-2 Levels Are Closely Associated With the Severity of Metabolic Syndrome and Metabolic Associated Fatty Liver Disease. J Clin Endocrinol Metab 2022;107:e3230-40.
47. Li Y, Zheng Y, Huang J, et al. CAF-macrophage crosstalk in tumour microenvironments governs the response to immune checkpoint blockade in gastric cancer peritoneal metastases. Gut 2025;74:350-63.
48. Nan P, Dong X, Bai X, et al. Tumor-stroma TGF-β1-THBS2 feedback circuit drives pancreatic ductal adenocarcinoma progression via integrin α(v)β(3)/CD36-mediated activation of the MAPK pathway. Cancer Lett 2022;528:59-75.
49. Zhou X, Han J, Zuo A, et al. THBS2 + cancer-associated fibroblasts promote EMT leading to oxaliplatin resistance via COL8A1-mediated PI3K/AKT activation in colorectal cancer. Mol Cancer 2024;23:282.
50. Shan J, Jiang W, Chang J, et al. NUF2 Drives Cholangiocarcinoma Progression and Migration via Inhibiting Autophagic Degradation of TFR1. Int J Biol Sci 2023;19:1336-51.
51. Bao L, Gong Y, Che Y, et al. Maintenance of magnesium homeostasis by NUF2 promotes protein synthesis and anaplastic thyroid cancer progression. Cell Death Dis 2024;15:656.
52. Jiang X, Jiang Y, Luo S, et al. Correlation of NUF2 Overexpression with Poorer Patient Survival in Multiple Cancers. Cancer Res Treat 2021;53:944-61.
53. Long B, Zhou H, Xiao L, et al. Targeting NUF2 suppresses gastric cancer progression through G2/M phase arrest and apoptosis induction. Chin Med J (Engl) 2024;137:2437-51.
54. Lao X, Peng Q, Lu Y, et al. Glutathione S-transferase gene GSTM1, gene-gene interaction, and gastric cancer susceptibility: evidence from an updated meta-analysis. Cancer Cell Int 2014;14:127.
55. Ma G, Li J, Lu Y, et al. Suppressing SMURF1 to preserve GSTM2: An approach to reducing gastric cancer aggressiveness in vitro and in vivo. Histol Histopathol 2026;41:505-16.
56. Zhang W, Shi Y, Niu S, et al. Integrated computer analysis and a self-built Chinese cohort study identified GSTM2 as one survival-relevant gene in human colon cancer potentially regulating immune microenvironment. Front Oncol 2022;12:881906.
57. Eslammanesh T, Mirshekari A, Dahmardeh N, et al. Mechanisms of miRNAs (MicroRNAs) and Their Expression in Gastric Cancer. Arch Razi Inst 2025;80:217-24.
58. Jiang X, Chen Z, Zhu J, et al. E2F1 promotes Warburg effect and cancer progression via upregulating ENO2 expression in Ewing sarcoma. Mol Med Rep 2022;26:237.
59. Zhu D, Jiang Y, Cao H, et al. Lactate: A regulator of immune microenvironment and a clinical prognosis indicator in colorectal cancer. Front Immunol 2022;13:876195.
60. Wu S, Zhang H, Gao C, et al. Hyperglycemia Enhances Immunosuppression and Aerobic Glycolysis of Pancreatic Cancer Through Upregulating Bmi1-UPF1-HK2 Pathway. Cell Mol Gastroenterol Hepatol 2022;14:1146-65.
61. Vizcaíno C, Mansilla S, Portugal J. Sp1 transcription factor: A long-standing target in cancer chemotherapy. Pharmacol Ther 2015;152:111-24.
62. Gajda E, Grzanka M, Godlewska M, et al. The Role of miRNA-7 in the Biology of Cancer and Modulation of Drug Resistance. Pharmaceuticals (Basel) 2021;14:149.
63. Hou F, Shi DB, Guo XY, et al. HRCT1, negatively regulated by miR-124-3p, promotes tumor metastasis and the growth of gastric cancer by activating the ERBB2-MAPK pathway. Gastric Cancer 2023;26:250-63.
64. Zhang Q, Liu S, Zhang J, et al. Roles and regulatory mechanisms of miR-30b in cancer, cardiovascular disease, and metabolic disorders (Review). Exp Ther Med 2021;21:44.
65. Zhang X, Ma G, Liu J, et al. MicroRNA-182 promotes proliferation and metastasis by targeting FOXF2 in triple-negative breast cancer. Oncol Lett 2017;14:4805-11.

