Tumor-intrinsic B4GALNT3 expression drives a protective immune microenvironment in endometriosis-associated ovarian cancer
Original Article

Li Luo1, Zirui Zhu2, Weiwei Dai1, Na Cao1, Mingzhu Ye1

1Department of Obstetrics and Gynecology, Zhongshan Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, China; 2School of Medicine, Xiamen University, Xiamen, China

Contributions: (I) Conception and design: L Luo, M Ye; (II) Administrative support: M Ye; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: L Luo, N Cao; (V) Data analysis and interpretation: L Luo, Z Zhu, W Dai, N Cao; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Mingzhu Ye, MD, PhD. Department of Obstetrics and Gynecology, Zhongshan Hospital of Xiamen University, School of Medicine, Xiamen University, No. 201 Hubin South Road, Xiamen 361004, China. Email: mingzhu875702@163.com.

Background: Although endometriosis-associated ovarian cancer (EAOC) is considered a distinct clinical entity, no specific prognostic biomarkers are available to guide its management, which has hindered the development of tailored treatments. We aimed to identify a robust, histotype-aware biomarker for EAOC through an integrative computational approach and to characterize its association with the tumor immune microenvironment.

Methods: A multi-stage bioinformatics approach using multiple independent Gene Expression Omnibus (GEO) cohorts was employed. We extracted consensus differentially expressed genes (DEGs) from three discovery datasets (EAOC vs. non-malignant tissue). These DEGs were further distilled into high-confidence hub genes using two machine learning algorithms. The pan-cancer prognostic potential was assessed via meta-analysis and tested for validity in an independent, EAOC-enriched cohort (GSE65986). The derived immune context was assessed using CIBERSORTx deconvolution in a pure EAOC cohort (GSE226870), while the cellular origin of our candidate was determined using an independent ovarian clear cell carcinoma (OCCC) single-cell RNA sequencing (scRNA-seq) dataset (GSE224334).

Results: From our analysis, we identified 75 consensus DEGs distilled into five hub genes. Among these, B4GALNT3 was the key candidate. While the pan-ovarian cancer meta-analysis showed a non-significant protective trend, we confirmed in our EAOC-enriched validation cohort that high B4GALNT3 expression was significantly associated with improved overall survival [hazard ratio (HR) =0.350, P=0.04]. It showed robust diagnostic potential with an overall area under the curve (AUC) of 0.962 [95% confidence interval (CI): 0.923–0.993] in leave-one-dataset-out cross-validation among discovery datasets. Immune deconvolution revealed that B4GALNT3 expression correlated with an anti-tumor microenvironment composed of increased levels of plasma B cells, memory B cells, and activated dendritic cells, with decreased regulatory T cells and M2 macrophages. Finally, scRNA-seq analysis confirmed that B4GALNT3 was intrinsically highly expressed in malignant and epithelial cells, with low expression in immune lineages.

Conclusions: B4GALNT3 is a novel, subtype-specific protective biomarker in EAOC. Our findings are consistent with a model in which tumor-cell-intrinsic expression of B4GALNT3 shapes a protective immune microenvironment. This work identifies B4GALNT3 as a promising prognostic factor and a candidate for further mechanistic studies and protein-level validation in EAOC.

Keywords: Endometriosis-associated ovarian cancer (EAOC); B4GALNT3; tumor immune microenvironment; single-cell RNA sequencing (scRNA-seq); bioinformatics


Submitted Nov 07, 2025. Accepted for publication Dec 10, 2025. Published online Feb 02, 2026.

doi: 10.21037/tcr-2025-aw-2458


Highlight box

Key findings

• B4GALNT3 was identified through integrated transcriptomic, immune infiltration, and single-cell analyses as a histotype-specific protective biomarker in endometriosis-associated ovarian cancer (EAOC).

What is known and what is new?

• EAOC is recognized as a distinct subtype of ovarian cancer, but its immune characteristics and molecular drivers remain poorly defined.

• This study newly links epithelial B4GALNT3 expression with protective immune infiltration using multi-cohort bioinformatics and single-cell validation.

What is the implication, and what should change now?

• B4GALNT3 may serve as a biomarker for prognosis and immune phenotyping in EAOC. Protein-level and mechanistic studies are needed to clarify its role in epithelial-immune interactions and to assess its translational potential.


Introduction

Ovarian cancer continues to be one of the most lethal gynecological malignancies worldwide, a status attributable largely to its typically late-stage diagnosis and the high incidence of recurrence following standard therapies (1). Within the heterogeneous spectrum of ovarian neoplasms, endometriosis-associated ovarian cancer (EAOC), which predominantly includes endometrioid and clear cell histotypes, has been recognized as a distinct clinical and molecular entity arising from endometriotic lesions (2,3). The etiological linkage between endometriosis and subsequent malignant transformation highlights a unique carcinogenic pathway, yet the precise molecular drivers that govern EAOC initiation and progression remain largely elusive (4). This knowledge gap has hindered the development of tailored diagnostic tools and effective therapeutic interventions, underscoring the need for robust biomarkers capable of accurately predicting patient outcomes and guiding clinical decision-making.

The advent of high-throughput sequencing technologies, coupled with the increasing availability of public databases such as the Gene Expression Omnibus (GEO), has provided an unprecedented opportunity to explore the complex transcriptomic landscapes of various cancers, including EAOC (5,6). The integration of advanced bioinformatics and sophisticated machine learning algorithms offers a powerful framework for systematically sifting through vast amounts of genomic data to identify genes with genuine biological and clinical relevance. By moving beyond simple gene lists to construct functional interaction networks and validate findings across multiple cohorts, it is possible to identify consensus gene signatures that are robust, reproducible, and reflective of the core pathological mechanisms (7). Such computational approaches have proven invaluable in discovering novel biomarkers for diagnosis, prognosis, and therapeutic targeting across numerous malignancies.

Therefore, this study was designed to execute a comprehensive, multi-stage computational analysis aimed at identifying and validating key prognostic biomarkers for EAOC. We hypothesized that by integrating data from multiple independent cohorts and applying a stringent filtering pipeline that combines differential expression analysis, functional enrichment, and machine learning-based feature selection, we could uncover novel genes critically involved in EAOC pathogenesis. Furthermore, considering the pivotal role of the tumor immune microenvironment in cancer progression and therapeutic response, we conducted histotype-aware immune deconvolution to relate candidate biomarkers to the EAOC-specific immune composition, with secondary benchmarking in ovarian clear cell carcinoma (OCCC) and endometrioid ovarian carcinoma (ENOC) (8). The goal of this work was to identify a high-confidence prognostic marker, explore its role in the EAOC immunological context using deconvolution, and validate its cellular origin using independent single-cell transcriptomic data, thereby laying the groundwork for future functional studies and the development of new therapeutic strategies. We present this article in accordance with the STROBE reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2458/rc).


Methods

Data acquisition and preprocessing

This study utilized publicly available transcriptomic data from five independent cohorts obtained from the National Center for Biotechnology Information (NCBI) GEO database. The datasets were organized into discovery and validation cohorts. The discovery cohorts, used for the identification of differentially expressed genes (DEGs), included three sets: GSE226575, GSE157153, and GSE230956. The validation cohorts included two independent sets: GSE65986 was used for subtype-specific survival analysis, and GSE226870 was used for subtype-specific immune infiltration analysis. The characteristics of all datasets are summarized in Table 1. The raw data files for all datasets were downloaded and subjected to standardized preprocessing procedures, which included background correction, log2 transformation, and quantile normalization. Probes were annotated to their corresponding gene symbols, and for genes represented by multiple probes, the average expression value was calculated. Detailed methodological descriptions are provided in Appendix 1. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
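The log2 transformation, quantile normalization, and probe-averaging steps described above can be sketched as follows. This is a minimal illustration on a made-up intensity matrix, not the authors' pipeline: the probe identifiers, gene symbols, the +1 pseudocount, and the helper names `quantile_normalize` and `collapse_probes` are all assumptions, and platform-specific background correction is omitted.

```python
import numpy as np
import pandas as pd


def quantile_normalize(expr: pd.DataFrame) -> pd.DataFrame:
    """Quantile-normalize the columns (samples) of a probes x samples matrix."""
    # Reference distribution: mean of the sorted values across samples.
    reference = np.sort(expr.to_numpy(), axis=0).mean(axis=1)
    # Zero-based rank of each value within its sample (ties broken by order).
    ranks = expr.rank(method="first", axis=0).astype(int).to_numpy() - 1
    return pd.DataFrame(reference[ranks], index=expr.index, columns=expr.columns)


def collapse_probes(expr: pd.DataFrame, probe_to_gene: dict) -> pd.DataFrame:
    """Average probes that map to the same gene; drop unannotated probes."""
    genes = expr.index.map(probe_to_gene)
    mask = genes.notna()
    return expr.loc[mask].groupby(np.asarray(genes[mask])).mean()


# Hypothetical raw intensities: 4 probes x 3 samples.
raw = pd.DataFrame(
    {
        "s1": [120.0, 80.0, 300.0, 15.0],
        "s2": [200.0, 60.0, 500.0, 30.0],
        "s3": [90.0, 110.0, 250.0, 10.0],
    },
    index=["probe_1", "probe_2", "probe_3", "probe_4"],
)
# Hypothetical annotation; probe_4 has no gene symbol and is dropped.
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

log_expr = np.log2(raw + 1.0)             # pseudocount of 1 is an assumption
norm_expr = quantile_normalize(log_expr)  # all samples share one distribution
gene_expr = collapse_probes(norm_expr, probe_to_gene)
print(gene_expr)
```

After normalization every sample has the same set of values, so between-sample differences reflect rank changes only.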

Table 1

Characteristics of the GEO datasets included in this study

| Dataset | Purpose | Platform | Samples (n) | Groups | Sample source | Authors |
|---|---|---|---|---|---|---|
| GSE226575 | Screening | GPL24676 | 9 | EAOC (n=5); EM (n=4) | Tissues | Chen J et al. |
| GSE157153 | Screening | GPL17303 | 66 | EAOC (n=29); EM/non-EAOC (n=37) | Tissues | Shin HY et al. |
| GSE230956 | Screening | GPL18573 | 8 | EAOC (n=4); EM (n=4) | Tissues | Collins KE et al. |
| GSE65986 | Survival validation | GPL570 | 55 | OCCC (n=25); serous (n=16); ENOC (n=14) | Tissues | Uehara Y et al. |
| GSE226870 | Immune validation | GPL16791 | 57 | EAOC-OCCC (n=28); EAOC-ENOC (n=29) | Tissues | Beddows I et al. |
| GSE224334 | Single-cell validation | GPL16791 | 10 | OCCC (n=10) | Tissues | Mori Y et al. |

EAOC, endometriosis-associated ovarian cancer; EM, endometriosis; ENOC, endometrioid ovarian carcinoma; GEO, Gene Expression Omnibus; OCCC, ovarian clear cell carcinoma.

Identification and functional analysis of common DEGs

DEGs were identified for each of the three discovery datasets by comparing EAOC samples (cases) against non-malignant tissues (endometrioma and/or normal, collectively treated as controls) using the limma package in R software. DEG analyses were conducted within each dataset separately without cross-dataset batch correction or pooling, and consensus DEGs were obtained by intersection. A stringent cutoff criterion was established, with genes exhibiting an absolute log2 fold change (log2FC) ≥2 and a false discovery rate (FDR) of <0.05 considered to be significant DEGs. Volcano plots were generated to visualize the overall distribution of these DEGs prior to intersection analysis. Subsequently, a Venn diagram analysis was performed to isolate the subset of DEGs that were commonly dysregulated across all three datasets. This consensus gene set was then subjected to a comprehensive suite of functional and pathway enrichment analyses. Gene ontology (GO), focusing on biological process (BP) and cellular component (CC) terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed. Furthermore, a protein-protein interaction (PPI) network analysis was conducted using the Metascape web portal. Gene set enrichment analysis (GSEA) was performed with fgsea (pre-ranked; 10,000 permutations), and pathways with FDR <0.05 were considered significantly enriched. Genes were ranked based on the t-statistic derived from the limma analysis, and the HALLMARK gene sets (Category H) from the Molecular Signatures Database (MSigDB) were used as the reference gene set collection (9,10).
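The thresholding and intersection logic can be illustrated with a short sketch. It assumes each dataset's limma results have been exported as a mapping from gene symbol to (log2FC, FDR); the gene names, numbers, and function names below are hypothetical. Because the text does not state whether direction of change had to agree across datasets, the sketch intersects on gene identity only.

```python
def significant_degs(stats, lfc_cutoff=2.0, fdr_cutoff=0.05):
    """Return genes with |log2FC| >= lfc_cutoff and FDR < fdr_cutoff.

    `stats` maps gene symbol -> (log2FC, FDR) for one dataset.
    """
    return {
        gene
        for gene, (log2fc, fdr) in stats.items()
        if abs(log2fc) >= lfc_cutoff and fdr < fdr_cutoff
    }


def consensus_degs(per_dataset_stats):
    """Intersect the significant DEG sets of all datasets."""
    deg_sets = [significant_degs(stats) for stats in per_dataset_stats.values()]
    return set.intersection(*deg_sets)


# Hypothetical per-dataset results (not values from the study).
toy_results = {
    "dataset_1": {"GENE_A": (2.5, 0.001), "GENE_B": (-3.0, 0.01), "GENE_C": (1.2, 0.001)},
    "dataset_2": {"GENE_A": (2.1, 0.02), "GENE_B": (-2.2, 0.2), "GENE_C": (2.4, 0.01)},
    "dataset_3": {"GENE_A": (3.0, 0.0005), "GENE_B": (-2.8, 0.03), "GENE_C": (2.0, 0.04)},
}

consensus = consensus_degs(toy_results)
print(sorted(consensus))
```

In the toy data only GENE_A passes both cutoffs in every dataset; GENE_B fails the FDR cutoff in one dataset and GENE_C fails the fold-change cutoff in another.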

Machine learning-based feature selection

To further distill the list of common DEGs and identify the most impactful features, two distinct machine learning algorithms were applied using Python’s scikit-learn library. All preprocessing and model tuning were performed within stratified 5-fold cross-validation (fixed random seed) to minimize information leakage. The least absolute shrinkage and selection operator (LASSO) logistic regression model was utilized for its ability to perform simultaneous regularization and variable selection, with the optimal regularization parameter determined via cross-validation to select a final set of predictive features (11). Additionally, the support vector machine-recursive feature elimination (SVM-RFE) algorithm was implemented. A pipeline consisting of median imputation and standard scaling was constructed, and the recursive feature elimination with cross-validation (RFECV) function was then applied with a LinearSVC estimator, using 5-fold stratified cross-validation and optimizing for the F1 macro score to recursively eliminate features and identify an optimal subset of predictive genes.
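A minimal scikit-learn sketch of the two selectors is given below. It runs on synthetic data generated with `make_classification`, and several details are assumptions rather than the authors' settings: the grid of `C` values, the use of `GridSearchCV` with macro-F1 to tune the L1-penalized logistic regression, the seed, and the placeholder gene names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

SEED = 0  # fixed seed, as in the text; the value itself is an assumption

# Synthetic stand-in for a samples x 75-gene expression matrix.
X, y = make_classification(n_samples=80, n_features=75, n_informative=8,
                           n_redundant=0, random_state=SEED)
genes = np.array([f"gene_{i}" for i in range(X.shape[1])])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

# LASSO (L1-penalized) logistic regression; imputation and scaling sit inside
# the pipeline so they are refit on each training fold during tuning.
lasso_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(penalty="l1", solver="liblinear",
                               max_iter=5000, random_state=SEED)),
])
search = GridSearchCV(lasso_pipe, {"clf__C": np.logspace(-2, 1, 10)},
                      cv=cv, scoring="f1_macro")
search.fit(X, y)
lasso_coef = search.best_estimator_.named_steps["clf"].coef_.ravel()
lasso_genes = set(genes[lasso_coef != 0])  # features with non-zero weight

# SVM-RFE: median imputation and scaling, then RFECV around a LinearSVC.
svm_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("rfecv", RFECV(LinearSVC(dual=False, max_iter=5000), step=1,
                    cv=cv, scoring="f1_macro")),
])
svm_pipe.fit(X, y)
svm_genes = set(genes[svm_pipe.named_steps["rfecv"].support_])

consensus = lasso_genes & svm_genes
print(len(lasso_genes), len(svm_genes), len(consensus))
```

The consensus set corresponds to the overlap of the two selectors; in the study this overlap was further intersected with the KEGG-derived genes.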

Identification and survival validation of hub genes

A final set of candidate hub genes was identified through an intersection analysis of the gene sets generated from the preceding steps, yielding a consensus set of five hub genes. The prognostic significance of the candidate hub genes was rigorously evaluated across three independent sources: The Cancer Genome Atlas (TCGA) cohort, the International Cancer Genome Consortium (ICGC) cohort, and the Kaplan-Meier Plotter (KM-Plotter) cohort. The primary endpoint was overall survival (OS). For TCGA RNA sequencing (RNA-seq) data, STAR counts were normalized using upper-quartile scaling and each gene was modeled as a continuous predictor [per 1-standard deviation (SD) increase] in a separate univariable Cox proportional hazards model. For ICGC and KM-Plotter, effect sizes were obtained from binary contrasts of high vs. low expression. Per gene, study-level effects {log [hazard ratio (HR)] and its standard error} were combined across sources using a DerSimonian-Laird random-effects model, implemented with the metafor package in R. To validate the hypothesis of subtype-specific effects, an additional survival analysis was performed using the independent GSE65986 validation cohort.
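For readers unfamiliar with the pooling step, the DerSimonian-Laird estimator can be written out in a few lines. The sketch below is a plain-Python illustration, not the metafor implementation used in the study; the three study-level hazard ratios and standard errors are hypothetical, and the helper `se_from_ci` assumes a symmetric 95% CI on the log scale.

```python
import math


def se_from_ci(lower, upper, z=1.959964):
    """Standard error of log(HR) recovered from a 95% CI on the HR scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def dersimonian_laird(log_hrs, ses):
    """Pool study-level log(HR) values with a DL random-effects model."""
    w = [1.0 / se ** 2 for se in ses]                  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))  # Cochran Q
    df = len(log_hrs) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0    # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    z = pooled / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal test
    return {
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - 1.959964 * se_pooled),
               math.exp(pooled + 1.959964 * se_pooled)),
        "p": p_value,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical study-level inputs (not the values used in the study).
toy_hrs = [0.60, 0.95, 1.05]
toy_ses = [0.10, 0.10, 0.10]
result = dersimonian_laird([math.log(hr) for hr in toy_hrs], toy_ses)
print(result)
```

When the study-level estimates disagree, tau² becomes positive and widens the pooled interval, which is why a protective point estimate can remain non-significant under high heterogeneity.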

Evaluation of diagnostic performance

For the final identified hub gene(s), a diagnostic receiver operating characteristic (ROC) curve analysis was performed to evaluate their ability to discriminate between tumor (EAOC) and non-malignant tissues. To strictly evaluate the robustness of the hub gene and rule out dataset-specific bias, a leave-one-dataset-out cross-validation was performed. We iteratively trained the logistic regression model on two of the three discovery datasets and validated it on the remaining independent dataset. This process was repeated for all three combinations to assess generalizability. An overall combined ROC curve was generated, and the area under the curve (AUC) with its 95% confidence interval (CI) was calculated to quantify diagnostic performance.
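The leave-one-dataset-out scheme maps directly onto scikit-learn's `LeaveOneGroupOut` splitter, with the dataset label as the group. The sketch below uses synthetic single-gene expression values; the dataset names, sample sizes, and effect size are hypothetical. The text does not state how the 95% CI was obtained, so the percentile bootstrap shown here is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic stand-in: one gene measured in three datasets (hypothetical sizes).
sizes = {"dataset_A": 20, "dataset_B": 40, "dataset_C": 20}
X_parts, y_parts, g_parts = [], [], []
for name, n in sizes.items():
    labels = np.tile([0, 1], n // 2)                    # 0 = control, 1 = case
    expr = rng.normal(loc=np.where(labels == 1, 1.5, 0.0), scale=1.0)
    X_parts.append(expr.reshape(-1, 1))
    y_parts.append(labels)
    g_parts.append(np.repeat(name, n))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)
groups = np.concatenate(g_parts)

# Train on two datasets, score the held-out one, and pool held-out scores.
fold_auc = {}
pooled_scores = np.zeros(len(y))
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores = model.predict_proba(X[test_idx])[:, 1]
    pooled_scores[test_idx] = scores
    fold_auc[str(groups[test_idx][0])] = roc_auc_score(y[test_idx], scores)
overall_auc = roc_auc_score(y, pooled_scores)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in)."""
    boot_rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = boot_rng.integers(0, n, size=n)
        if np.unique(y_true[idx]).size < 2:             # need both classes
            continue
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    low, high = np.percentile(aucs, [2.5, 97.5])
    return float(low), float(high)


ci_low, ci_high = bootstrap_auc_ci(y, pooled_scores)
print(fold_auc, overall_auc, (ci_low, ci_high))
```

Because every score in the pooled curve comes from a model that never saw that sample's dataset, the overall AUC is less prone to dataset-specific optimism than an in-sample estimate.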

Quantification of immune cell infiltration

To investigate the role of the candidate hub gene(s) in the EAOC-specific immune microenvironment, immune cell deconvolution was performed on the EAOC samples of the GSE226870 validation cohort using the CIBERSORTx web portal (12). The relative abundance of 22 immune cell types was estimated using the LM22 leukocyte signature matrix. Following developer recommendations, the analysis was run in absolute mode, with B-mode batch correction enabled to mitigate platform effects, and quantile normalization was disabled for the RNA-seq data. Significance analysis was based on 500 permutations. For downstream analysis, only samples passing pre-specified quality control filters [deconvolution P<0.05, reconstruction correlation ≥0.80, and root-mean-square error (RMSE) ≤0.30] were retained. The association between the expression of the hub gene(s) and the resulting immune cell fractions was assessed using two-sided Spearman’s rank correlation. To account for multiple testing across the 22 cell types, the FDR was controlled using the Benjamini-Hochberg procedure, with q<0.05 considered statistically significant (13,14). In additional histotype-aware analyses, we applied the identical CIBERSORTx pipeline and quality-control criteria to OCCC and ENOC.
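The downstream statistics (quality-control filtering, Spearman correlation, and Benjamini-Hochberg adjustment) can be sketched as below. The CIBERSORTx run itself is not reproduced; the per-sample fit metrics, gene expression values, and cell fractions are synthetic, only three of the 22 cell types are simulated, and the `benjamini_hochberg` helper is an illustrative implementation rather than the authors' code.

```python
import numpy as np
from scipy.stats import spearmanr


def benjamini_hochberg(pvals):
    """Return BH-adjusted q-values in the original order of `pvals`."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q


rng = np.random.default_rng(0)
n_samples = 60

# Synthetic gene expression and cell-type scores (not study data).
gene_expr = rng.normal(size=n_samples)
fractions = {
    "Plasma cells": 0.5 * gene_expr + rng.normal(size=n_samples),
    "Macrophages M2": -0.5 * gene_expr + rng.normal(size=n_samples),
    "T cells CD8": rng.normal(size=n_samples),
}

# Synthetic per-sample deconvolution fit metrics.
decon_p = rng.uniform(0.0, 0.06, size=n_samples)
recon_corr = rng.uniform(0.75, 1.0, size=n_samples)
rmse = rng.uniform(0.10, 0.35, size=n_samples)

# Quality-control filters from the text: P<0.05, correlation >=0.80, RMSE <=0.30.
keep = (decon_p < 0.05) & (recon_corr >= 0.80) & (rmse <= 0.30)

cell_types = list(fractions)
rhos, pvals = [], []
for cell in cell_types:
    rho, p = spearmanr(gene_expr[keep], fractions[cell][keep])  # two-sided
    rhos.append(rho)
    pvals.append(p)
qvals = benjamini_hochberg(pvals)

for cell, rho, p, q in zip(cell_types, rhos, pvals, qvals):
    print(f"{cell}: rho={rho:.3f}, P={p:.3g}, q={q:.3g}")
```

In the actual analysis the adjustment would run over all 22 LM22 cell types, so q-values would be larger than in this three-type toy example for the same raw P values.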

Single-cell RNA-seq (scRNA-seq) data analysis

To validate the cellular origin of B4GALNT3 expression, we analysed an independent, publicly available OCCC scRNA-seq dataset (GSE224334). Given that OCCC is a predominant histotype of EAOC and our bulk-data immune correlations were directionally concordant in the OCCC subgroup, this cohort served as a relevant proxy for determining the cellular origin of B4GALNT3 within the EAOC context. The dataset was processed using the Seurat package in R (3,15). After quality control, cells were clustered and annotated into major cell lineages (including malignant, epithelial, CD4T, CD8T, mono/macro, fibroblasts, etc.) based on canonical marker gene expression. Normalized B4GALNT3 expression levels were then visualized across all cell clusters on t-distributed stochastic neighbor embedding (t-SNE) plots (16,17). The mean expression of B4GALNT3 within each major cell lineage was calculated and compared to determine the primary cellular source of B4GALNT3.

Statistical analysis

All statistical analyses and machine learning tasks were conducted using R software (version 4.5.2) and Python (version 3.11). Differential expression analysis was performed using the limma package based on linear models and empirical Bayes moderated t-statistics. Feature selection was conducted using LASSO regression and SVM-RFE algorithms. Survival outcomes were evaluated using Kaplan-Meier curves with log-rank tests and univariable Cox proportional hazards regression models to estimate HR and 95% CI. Meta-analysis of HRs was performed using a DerSimonian-Laird random-effects model. Correlations between gene expression and immune infiltration were assessed using Spearman’s rank correlation coefficient. Diagnostic performance was quantified using the area under the ROC curve (AUC). The Benjamini-Hochberg procedure was applied to control the FDR for multiple hypothesis testing. Unless otherwise stated, all statistical tests were two-sided, and a P value <0.05 was considered statistically significant.


Results

Systematic identification of hub genes in EAOC

The overall workflow of this study is depicted in Figure 1. Following this pipeline, the initial differential expression analysis of the three GEO discovery datasets revealed a widespread alteration of the transcriptomic landscape in EAOC tissues. The volcano plot shows 7,019 DEGs, of which 2,847 were up-regulated and 4,172 were down-regulated (Figure 2A). To isolate the most robust and consistently dysregulated genes, an intersection analysis yielded a core signature of 75 common DEGs (Figure 2B).

Figure 1 Overall workflow of the study. The flowchart illustrates the multi-stage analytical pipeline, from data acquisition and identification of DEGs to functional enrichment, machine learning-based feature selection, and the identification of five hub genes. DEG, differentially expressed gene; EAOC, endometriosis-associated ovarian cancer; GO, Gene Ontology; GSE, Gene Expression Omnibus Series; GSEA, Gene Set Enrichment Analysis; ICGC, International Cancer Genome Consortium; KEGG, Kyoto Encyclopedia of Genes and Genomes; KM, Kaplan-Meier; LASSO, least absolute shrinkage and selection operator; PPI, protein-protein interaction; RNA-Seq, RNA sequencing; SVM, Support Vector Machine; TCGA, The Cancer Genome Atlas.
Figure 2 Identification of DEGs in EAOC. (A) Volcano plot visualizing DEGs between EAOC and control tissues from the combined GEO datasets. Up-regulated (n=2,847) and down-regulated (n=4,172) genes are shown in red and blue, respectively. (B) Venn diagram showing the overlap of DEGs from three independent GEO datasets (GSE226575, GSE157153, GSE230956), identifying 75 common DEGs. Numbers in panels indicate gene counts and overlaps between datasets. DEG, differentially expressed gene; EAOC, endometriosis-associated ovarian cancer; GEO, Gene Expression Omnibus; GSE, Gene Expression Omnibus Series.

Functional enrichment analyses of common DEGs

To elucidate the collective biological significance of the 75 common DEGs, a series of functional analyses was performed. GO analysis indicated that these genes were primarily involved in BP related to cell division (Figure 3A) and CCs like the mitotic checkpoint complex (Figure 3B). The PPI network analysis revealed a densely interconnected functional network with its core enriched in the mitotic cell cycle (Figure 3C). KEGG pathway-gene correlations (Figure 3D) and GSEA (Figure 3E) also pointed towards cancer-related pathways, such as epithelial-mesenchymal transition (EMT) and TNFα signaling via NF-κB.

Figure 3 Functional enrichment analysis of the 75 common DEGs. (A) GO enrichment results for BP. (B) GO enrichment results for CC. (C) PPI network of the common DEGs, with the main cluster enriched in mitotic cell cycle processes. Node size: gene degree within PPI network. Edge thickness: interaction confidence score. (D) Network depicting the relationship between key KEGG pathways and associated DEGs. (E) GSEA plot showing enrichment of the HALLMARK epithelial-mesenchymal transition and TNFα signaling via NF-κB pathways in EAOC. BP, biological process; CC, cellular component; DEG, differentially expressed gene; EAOC, endometriosis-associated ovarian cancer; FDR, false discovery rate; GO, gene ontology; GSEA, Gene Set Enrichment Analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein-protein interaction.

Hub gene selection via machine learning and integrated analysis

To distill the most robust biomarkers from the 75 common DEGs, a multi-pronged strategy was employed. LASSO regression (Figure 4A) and SVM-RFE algorithms (Figure 4B) identified 10 and 23 optimal predictive genes, respectively. The intersection of these gene sets with genes from the KEGG analysis yielded a final consensus list of five high-confidence hub genes, i.e., B4GALNT3, CLDN4, MARVELD2, OCLN, and SGPP2 (Figure 4C).

Figure 4 Hub gene selection via machine learning and integrated analysis. (A) Feature selection using the LASSO regression model, identifying 10 optimal features. (B) Feature selection using the SVM-RFE model, identifying 23 optimal features. (C) Venn diagram illustrating the intersection of gene sets from LASSO, SVM-RFE, and KEGG analyses to identify five hub genes. λ: regularization parameter in LASSO. F1: harmonic mean of precision and recall. N: number of features retained at minimum deviance. KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, least absolute shrinkage and selection operator; RFECV, recursive feature elimination with cross-validation; SVM-RFE, support vector machine-recursive feature elimination.

Survival analysis and subtype-specific validation of B4GALNT3

We first systematically evaluated the prognostic value of the five candidate hub genes via a meta-analysis of three large-scale pan-ovarian cancer cohorts (Table 2). High expression of CLDN4 and of OCLN was each a significant risk factor for poor prognosis. In contrast, the combined HR for B4GALNT3 indicated a protective trend (combined HR =0.851) that did not reach statistical significance (P=0.26) in the presence of high inter-study heterogeneity (I²=78.3%) (Figure 5A). To test the hypothesis that this effect was subtype-specific, we performed a validation analysis in the independent GSE65986 cohort, which is highly enriched for EAOC-related subtypes. In this cohort, the protective association of B4GALNT3 with OS was statistically significant (HR =0.350, P=0.04) (Figure 5B). Taken together, B4GALNT3 showed lower expression in EAOC than in non-malignant tissue, and among EAOC cases higher expression correlated with better OS. This series of analyses ultimately established B4GALNT3 as our sole primary gene of interest.

Table 2

Meta-analysis of the association between hub gene expression and OS across three cohorts

| Gene | Combined HR | 95% CI | P value | Heterogeneity (I²), % |
|---|---|---|---|---|
| B4GALNT3 | 0.851 | 0.643–1.126 | 0.26 | 78.3 |
| CLDN4 | 1.141 | 1.048–1.243 | 0.002 | 0 |
| MARVELD2 | 1.010 | 0.788–1.295 | 0.94 | 75.1 |
| SGPP2 | 0.818 | 0.599–1.119 | 0.21 | 81.7 |
| OCLN | 1.118 | 1.008–1.240 | 0.04 | 0 |

CI, confidence interval; HR, hazard ratio; OS, overall survival.

Figure 5 Prognostic association and diagnostic performance of B4GALNT3. (A) Forest plot of the meta-analysis for B4GALNT3 expression and OS in pan-ovarian cancer cohorts. Study-specific HRs include a univariable Cox model for TCGA. (B) Kaplan-Meier survival curve analysis of B4GALNT3 in the EAOC-related subtype cohort GSE65986 (HR =0.35, P=0.04). (C) Diagnostic performance of B4GALNT3 evaluated via leave-one-dataset-out cross-validation. The plot displays ROC curves for validation in each independent dataset (GSE226575, AUC =0.950; GSE157153, AUC =0.979; GSE230956, AUC =0.938) and the overall combined performance (AUC =0.962). Error bars: 95% CI of HR. P: log-rank test significance. AUC, area under the curve; CI, confidence interval; EAOC, endometriosis-associated ovarian cancer; GSE, Gene Expression Omnibus Series; HR, hazard ratio; ICGC, International Cancer Genome Consortium; KM, Kaplan-Meier; OS, overall survival; ROC, receiver operating characteristic; TCGA, The Cancer Genome Atlas.

Diagnostic performance of B4GALNT3

Further investigation indicated strong discriminative performance of B4GALNT3 in distinguishing EAOC from non-malignant tissues. Through leave-one-dataset-out cross-validation, B4GALNT3 demonstrated consistent diagnostic ability across independent cohorts, with validation AUCs of 0.950 in GSE226575, 0.979 in GSE157153, and 0.938 in GSE230956. The overall combined AUC was 0.962 (95% CI: 0.923–0.993) (Figure 5C).

Association with the immune microenvironment in EAOC

To interrogate the immunologic context of B4GALNT3, we performed immune cell deconvolution in the independent EAOC cohort (GSE226870). As shown in Figure 6A, B4GALNT3 expression was positively correlated with plasma B cells (rho =0.683, P=0.02), memory B cells (rho =0.595, P=0.01), CD4+ memory resting T cells (rho =0.545, P=0.04), and activated myeloid dendritic cells (rho =0.421, P=0.05). In contrast, inverse associations were observed with M0 macrophages (rho =−0.413, P=0.04), naive B cells (rho =−0.446, P=0.02), regulatory T cells (Tregs; rho =−0.479, P=0.04), and M2 macrophages (rho =−0.557, P=0.01).

Figure 6 Correlation of B4GALNT3 with the immune microenvironment by histotype. (A) Lollipop plot of Spearman correlations (rho) between B4GALNT3 expression and CIBERSORTx-inferred immune cell fractions in the combined EAOC cohort (GSE226870). (B) Lollipop plot of Spearman correlations (rho) in the OCCC subgroup of GSE226870, analysed with the identical pipeline. (C) Lollipop plot of Spearman correlations (rho) in the ENOC subgroup of GSE226870, analysed with the identical pipeline. Asterisks indicate significant correlations after FDR correction (q<0.05); bars to the right of zero denote positive correlations; bars to the left denote negative correlations. EAOC, endometriosis-associated ovarian cancer; ENOC, endometrioid ovarian carcinoma; FDR, false discovery rate; GSE, Gene Expression Omnibus Series; OCCC, ovarian clear cell carcinoma.

A directionally concordant pattern was observed in the OCCC subgroup (Figure 6B), where B4GALNT3 correlated positively with memory B cells (rho =0.608, P=0.03) and activated dendritic cells (rho =0.507, P=0.04), and inversely with M2 macrophages (rho =−0.589, P=0.04) and M0 macrophages (rho =−0.404, P=0.046).

In the ENOC subgroup (Figure 6C), associations were again directionally consistent, and more cell types reached nominal significance than in OCCC: positive correlations with plasma B cells (rho =0.645, P=0.02), memory B cells (rho =0.511, P=0.03), activated dendritic cells (rho =0.464, P=0.02), and CD4+ memory resting T cells (rho =0.426, P=0.02); and negative correlations with Tregs (rho =−0.466, P=0.02), naive B cells (rho =−0.492, P=0.01), and M2 macrophages (rho =−0.506, P=0.01). A borderline inverse association was noted for M0 macrophages (rho =−0.335, P=0.052).

Overall, across EAOC, OCCC, and ENOC, the direction of correlations was consistent, with positive correlations for B-cell and activated dendritic cell fractions and negative correlations for Treg and M2 macrophage fractions. Effect sizes were of broadly similar magnitude across the three analyses (absolute rho approximately 0.34 to 0.68); exact statistics and multiple-testing details are reported in Figure 6A-6C and legends.

Single-cell analysis confirms malignant epithelial origin of B4GALNT3 expression

Our bulk transcriptomic deconvolution revealed a strong correlation between overall B4GALNT3 expression and an anti-tumor immune infiltrate. However, this analysis could not determine the cellular source of B4GALNT3.

To resolve this ambiguity, we analysed an independent OCCC scRNA-seq cohort (GSE224334); OCCC represents a major histological subtype of EAOC. We identified nine major cell lineages, including malignant cells, epithelial cells, T cells (CD4T, CD8T), myeloid cells (mono/macro), and various stromal cells (Figure 7A). Visualization of B4GALNT3 expression on the t-SNE plot demonstrated that its expression was largely confined to the malignant and epithelial cell clusters (Figure 7B).

Figure 7 Single-cell RNA-Seq analysis of B4GALNT3 cellular localization in OCCC. (A) t-SNE plot of all cells from an OCCC scRNA-seq cohort, colored by major cell lineage. (B) The same t-SNE plot colored by the normalized expression level of B4GALNT3, showing high expression localized to the malignant and epithelial clusters. Color gradient (purple-yellow) represents normalized B4GALNT3 expression intensity. (C) Bar plot quantifying the mean expression of B4GALNT3 across each major cell lineage; bar height indicates mean expression per cell lineage. OCCC, ovarian clear cell carcinoma; scRNA-seq, single-cell RNA sequencing; t-SNE, t-distributed Stochastic Neighbor Embedding.

Quantification of mean expression levels across lineages supported this localization. Malignant cells (mean expression =2.33) and epithelial cells (mean expression =1.48) showed higher B4GALNT3 expression than all other cell types. In contrast, immune and stromal lineages, including CD4T/CD8T cells, mono/macro, and fibroblasts, showed lower expression (all mean expression <0.86) (Figure 7C). This result indicates that the protective B4GALNT3 signal observed in bulk tissue analysis most likely originates from the tumor cells themselves, supporting a tumor-cell-intrinsic mechanism.


Discussion

This study employed a systematic, integrative analytical pipeline and identified B4GALNT3 as a subtype-specific protective biomarker in EAOC, linking its expression to an immune microenvironment more permissive to antitumor immunity. Convergent evidence from statistics, machine learning, and pathway analyses, together with immune deconvolution and single-cell findings, supports a model in which a tumor-intrinsic signal exerts cell-extrinsic immunologic effects. These observations warrant orthogonal validation at the protein level.

Our multi-step filtering strategy underscores the value of combining differential expression, network biology, and feature selection. Functional enrichment, particularly the protein-protein interaction network, revealed a highly connected module centered on mitotic cell-cycle programs, consistent with the proliferative phenotype of EAOC against an endometriosis-associated inflammatory background (18). This framework provides biological context for interpreting immune variation and helps explain the stronger B-cell and dendritic-cell activity observed in B4GALNT3-high tumors.

A pivotal step was prognostic validation. In the pan-ovarian meta-analysis, B4GALNT3 showed a protective trend that did not reach statistical significance, suggesting subtype dependence. This hypothesis was then tested in an EAOC-enriched cohort, where the association was independently and significantly confirmed, elevating B4GALNT3 from a candidate to a supported prognostic factor in a defined pathologic context.

Across three pan-ovarian cohorts, high CLDN4 and OCLN consistently correlated with worse outcomes, while B4GALNT3 exhibited a protective trend (combined HR =0.851, P=0.26) that did not meet statistical significance in the presence of substantial heterogeneity (I²=78.3%) (Figure 5A, Table 2). In the independent EAOC-enriched GSE65986 cohort, the protective association reached significance, with higher B4GALNT3 associated with longer OS (HR =0.350, P=0.04; Figure 5B). These results indicate that clinical interpretation should be firmly anchored in the appropriate histologic context. We recognize that histotype mixing, residual confounding, and limited sample size may affect effect estimates and cutoff generalizability; larger, rigorously stratified external EAOC cohorts are needed for replication and calibration.
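
To make the pooled hazard ratio and the heterogeneity statistic concrete, the following is a minimal sketch of inverse-variance pooling with Cochran's Q, I², and a DerSimonian-Laird random-effects estimate. The choice of the DerSimonian-Laird estimator is an assumption made for illustration (this section does not state which estimator was used), and the function name and input values are hypothetical rather than the per-cohort estimates in Table 2.

```python
import math


def pool_hazard_ratios(hrs, ses):
    """Pool per-cohort hazard ratios on the log scale.

    hrs: per-cohort hazard ratios.
    ses: standard errors of the corresponding log hazard ratios.
    Uses the DerSimonian-Laird estimator (an illustrative assumption).
    """
    log_hrs = [math.log(hr) for hr in hrs]
    w_fixed = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w_fixed)

    # Fixed-effect mean is needed to compute Cochran's Q.
    fixed_mean = sum(w * y for w, y in zip(w_fixed, log_hrs)) / sum_w
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(w_fixed, log_hrs))
    df = len(hrs) - 1

    # I^2: share of total variation attributed to between-cohort heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Between-cohort variance (tau^2), truncated at zero.
    c = sum_w - sum(w ** 2 for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_random = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(w * y for w, y in zip(w_random, log_hrs)) / sum(w_random)
    se_pooled = math.sqrt(1.0 / sum(w_random))

    return {
        "hr": math.exp(pooled),
        "ci95": (math.exp(pooled - 1.96 * se_pooled),
                 math.exp(pooled + 1.96 * se_pooled)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical inputs for illustration only (NOT the cohort estimates from Table 2).
toy = pool_hazard_ratios([0.5, 2.0], [0.1, 0.1])
```

Because I² = (Q − df)/Q, an I² of 78.3% across three cohorts corresponds to Q ≈ 9.2 on 2 degrees of freedom, which is why the pooled estimate is best read as an average over genuinely different cohort-level effects.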

At the diagnostic level, the cross-validated overall AUC of 0.962 (95% CI: 0.923–0.993) demonstrates robust discrimination. Importantly, the consistent performance observed across the leave-one-dataset-out folds indicates that the diagnostic utility of B4GALNT3 is generalizable and not driven by dataset-specific artifacts. In clinical practice, it should be considered alongside CA125/HE4 and imaging (19). While this strict cross-validation mitigates algorithmic optimism, establishing precise thresholds for real-world settings will still require prospective validation in larger, multi-center cohorts.
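
The leave-one-dataset-out logic can be sketched as follows. This is a simplified illustration for a single-gene marker, not the authors' implementation: the only quantity learned from the training datasets here is the direction of the marker, and the AUC is then computed on the held-out dataset. All function names and values are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a random case outscores a random control.

    Ties between a case and a control count as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))


def leave_one_dataset_out_auc(samples):
    """samples: list of (dataset_id, expression, label) with label 1 = case, 0 = control.

    For each dataset, learn the marker direction on the remaining datasets,
    then evaluate the AUC on the held-out dataset only.
    """
    results = {}
    for held_out in sorted({dataset for dataset, _, _ in samples}):
        train = [(x, y) for dataset, x, y in samples if dataset != held_out]
        test = [(x, y) for dataset, x, y in samples if dataset == held_out]
        train_auc = auc([x for x, _ in train], [y for _, y in train])
        sign = 1.0 if train_auc >= 0.5 else -1.0
        results[held_out] = auc([sign * x for x, _ in test], [y for _, y in test])
    return results


# Hypothetical toy samples for illustration only (NOT values from the GEO cohorts).
toy_samples = [
    ("A", 5.0, 1), ("A", 4.0, 1), ("A", 1.0, 0), ("A", 2.0, 0),
    ("B", 6.0, 1), ("B", 3.5, 1), ("B", 3.0, 0), ("B", 4.0, 0),
    ("C", 7.0, 1), ("C", 2.0, 0), ("C", 1.0, 0),
]
fold_aucs = leave_one_dataset_out_auc(toy_samples)
```

Reporting the per-fold values alongside the overall AUC shows whether performance depends on any single dataset, which is the property the cross-validation scheme is meant to probe.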

Immune analyses indicate that higher B4GALNT3 associates with strengthened humoral and antigen-presentation axes and attenuation of suppressive programs. Specifically, proportions of plasma cells, memory B cells, and activated dendritic cells increase, whereas regulatory T cells and M2 macrophages decrease, consistent with microenvironmental remodeling in a protective direction (20). Differences across histotypes are biologically meaningful. EAOC and ENOC more clearly show increases in B cells and activated dendritic cells with declines in Tregs and M2 macrophages, aligning with a chronic inflammatory, endometriosis-derived epithelial context and epithelial-stromal crosstalk (21). OCCC follows the same direction with smaller amplitudes, a pattern that may reflect histotype-specific metabolic and stromal programs, recurrent genomic alterations, and study-level factors such as sample size, purity, and platform. Directional concordance across EAOC (Figure 6A), OCCC (Figure 6B), and ENOC (Figure 6C) supports a unified view: B4GALNT3 tracks with microenvironmental features permissive to antitumor immunity, with effect size contingent on histology, underscoring the need for histotype-stratified validation.

Single-cell atlases provide clinical and biological context for these bulk findings. In GSE224334, B4GALNT3 transcripts localize predominantly to malignant and adjacent epithelial cells, with relatively low signal in lymphoid and myeloid lineages (Figure 7A-7C). This distribution indicates a tumor-cell origin and suggests that upregulation shapes the immune milieu through paracrine and glyco-biological mechanisms (22). In practical terms, epithelial territories with high B4GALNT3 can exhibit enhanced B-cell maturation and dendritic-cell activation without requiring expression of B4GALNT3 in those immune cells. Anchoring B4GALNT3 to epithelial compartments also reduces interpretive ambiguity inherent to bulk analyses, which are susceptible to compositional confounding (23).

Mechanistically, the epithelial localization is consistent with the glycobiology of B4GALNT3, which encodes a β4-N-acetylgalactosaminyltransferase that synthesizes the LacdiNAc motif (24,25). Previous glycomic profiling in ovarian cancer has directly linked B4GALNT3 and its family member B4GALNT4 to specific alterations in N-linked glycan structures, providing mass-spectrometry-based evidence for this modification in the ovarian tumor context (26,27). Such remodeling of tumor-surface and secreted glycoproteins can alter lectin-mediated recognition on dendritic cells, influence antigen processing and major histocompatibility complex class II (MHC-II) presentation, and tune co-stimulatory thresholds that govern B-cell help and T-cell activation (28,29). In this setting, higher epithelial B4GALNT3 may facilitate efficient licensing of antigen-presenting cells, improve the quality and persistence of humoral responses, and make Treg and M2 programs more difficult to consolidate and maintain. The single-cell evidence of epithelial enrichment provides a cellular anchor for this model, consistent with tumor-intrinsic glycan editing that secondarily modulates antigen presentation and B-cell maturation within the local microenvironment (30).

These observations carry translational implications. Tumors with high B4GALNT3 and a corresponding immune fingerprint characterized by robust B-cell and dendritic-cell activity may be candidates for approaches that consolidate tertiary lymphoid structures or enhance B-T cooperation (31). Tumors with low B4GALNT3 dominated by suppressive Treg/M2 circuits may be better served by macrophage reprogramming or Treg-modulating strategies (32,33). The single-cell localization also motivates immunohistochemistry and multiplex immunofluorescence assays that co-localize B4GALNT3 with epithelial cytokeratins and quantify spatial relationships with CD20/CD138-positive aggregates and mature dendritic cells, ideally complemented by spatial transcriptomics (34).

It is also important to place these findings within the broader context of the glycosyltransferase family. B4GALNT3 is part of a larger class of enzymes whose members often exhibit distinct prognostic values in gynecologic cancers. For instance, a recent study constructed a transcriptomic prognostic score validated across multiple datasets, in which other glycosylation-related genes, such as B4GALNT1, emerged as significant risk factors (35). This suggests that while the dysregulation of glycosylation machinery is a common feature of ovarian malignancies, specific family members may drive opposing oncologic outcomes, with B4GALNT1 potentially linked to high-risk profiles and B4GALNT3, as shown here, associated with a protective, immune-permissive phenotype in EAOC.

Furthermore, integrating a pan-gynecologic perspective enhances the translational value of these findings. While broad prognostic models in gynecologic cancers, including recent sex-informed frameworks, often identify immune-evasive and fibroblast-rich tumor microenvironments as drivers of poor outcomes, our data suggest that B4GALNT3 marks a distinct, immunologically hot or permissive niche within EAOC. The strong correlation with B cells and dendritic cells observed here contrasts sharply with the stromal-dominant, exclusion signatures typical of high-risk gynecologic tumors. This distinction underscores the necessity of histotype-specific immune stratification, as the specific protective immune context driven by B4GALNT3 in EAOC might be obscured in broader sex-informed analyses that prioritize common stromal risk factors.

The role of B4GALNT3 is context dependent. Reports of oncogenic activity in gastric and colorectal cancer contrast with associations with favorable prognosis and reduced migration/invasion in neuroblastoma (36-38); the WNK1-B4GALNT3 fusion described in papillary thyroid carcinoma further illustrates function under specific genomic configurations (39,40). These cross-cancer observations reinforce the contextual nature of B4GALNT3 biology and are consistent with our subtype-focused interpretation in EAOC.

Several limitations merit consideration. The single-cell evidence derives from a single public dataset with limited sample size, potential platform effects, and lineage annotations inherited from the source study, making ambient RNA and copy-number-related transcriptional inflation difficult to exclude. Bulk deconvolution is sensitive to cellular composition and batch effects, and survival estimates in the EAOC-enriched cohort are constrained by histotype mixing and a limited number of events. The next steps include orthogonal, protein-level validation using multiplex immunohistochemistry to co-localize B4GALNT3 with epithelial markers and to quantify spatial proximity to B-cell and dendritic-cell niches, supported by spatial transcriptomics. Mechanistic studies employing B4GALNT3 perturbation, glycoproteomics, and lectin-binding assays, together with epithelial-antigen-presenting cell co-culture systems, will be essential to test causality and map the specific glycan edits that influence antigen processing and co-stimulatory signaling.


Conclusions

This study suggests that B4GALNT3 is a novel, subtype-specific protective factor in EAOC. Our analysis indicates a strong potential for its use in diagnostics and provides the first evidence that its favorable prognostic impact is likely mediated through beneficial remodeling of the tumor immune microenvironment. These findings highlight B4GALNT3 as a promising candidate for further research and support a hypothesis that warrants protein-level validation and functional perturbation studies.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the STROBE reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2458/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2458/prf

Funding: This work was supported by the Natural Science Foundation of Fujian Province (No. 2022J011342).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2458/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Herrington CS, Oswald AJ, Stillie LJ, et al. Compartment-specific multiomic profiling identifies SRC and GNAS as candidate drivers of epithelial-to-mesenchymal transition in ovarian carcinosarcoma. Br J Cancer 2024;130:327-35. [Crossref] [PubMed]
  2. Wang Y, Nicholes K, Shih IM. The Origin and Pathogenesis of Endometriosis. Annu Rev Pathol 2020;15:71-95. [Crossref] [PubMed]
  3. Steinbuch SC, Lüß AM, Eltrop S, et al. Endometriosis-Associated Ovarian Cancer: From Molecular Pathologies to Clinical Relevance. Int J Mol Sci 2024;25:4306. [Crossref] [PubMed]
  4. Zondervan KT, Becker CM, Koga K, et al. Endometriosis. Nat Rev Dis Primers 2018;4:9. [Crossref] [PubMed]
  5. Clough E, Barrett T, Wilhite SE, et al. NCBI GEO: archive for gene expression and epigenomics data sets: 23-year update. Nucleic Acids Res 2024;52:D138-44. [Crossref] [PubMed]
  6. Ma R, Zheng Y, Wang J, et al. Identification of key genes associated with endometriosis and endometrial cancer by bioinformatics analysis. Front Oncol 2024;14:1387860. [Crossref] [PubMed]
  7. Zhu Z, Zeng Z, Song B, et al. Identification of diagnostic biomarkers and immune cell profiles associated with COPD integrated bioinformatics and machine learning. J Cell Mol Med 2024;28:e70107. [Crossref] [PubMed]
  8. Tiwari A, Trivedi R, Lin SY. Tumor microenvironment: barrier or opportunity towards effective cancer therapy. J Biomed Sci 2022;29:83. [Crossref] [PubMed]
  9. Castanza AS, Recla JM, Eby D, et al. Extending support for mouse data in the Molecular Signatures Database (MSigDB). Nat Methods 2023;20:1619-20. [Crossref] [PubMed]
  10. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;144:646-74. [Crossref] [PubMed]
  11. Mongardi S, Cascianelli S, Masseroli M. Biologically weighted LASSO: enhancing functional interpretability in gene expression data analysis. Bioinformatics 2024;40:btae605. [Crossref] [PubMed]
  12. Newman AM, Steen CB, Liu CL, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 2019;37:773-82. [Crossref] [PubMed]
  13. Yang L, Wang P, Chen J. 2dGBH: Two-dimensional group Benjamini-Hochberg procedure for false discovery rate control in two-way multiple testing of genomic data. Bioinformatics 2024;40:btae035. [Crossref] [PubMed]
  14. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Methodol 1995;57:289-300.
  15. Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multimodal single-cell data. Cell 2021;184:3573-3587.e29. [Crossref] [PubMed]
  16. Molla Desta G, Birhanu AG. Advancements in single-cell RNA sequencing and spatial transcriptomics: transforming biomedical research. Acta Biochim Pol 2025;72:13922. [Crossref] [PubMed]
  17. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9:2579-605.
  18. Leenen S, Hermens M, de Vos van Steenwijk PJ, et al. Immunologic factors involved in the malignant transformation of endometriosis to endometriosis-associated ovarian carcinoma. Cancer Immunol Immunother 2021;70:1821-9. [Crossref] [PubMed]
  19. Xiao Y, Bi M, Guo H, et al. Multi-omics approaches for biomarker discovery in early ovarian cancer diagnosis. EBioMedicine 2022;79:104001. [Crossref] [PubMed]
  20. Kasikova L, Rakova J, Hensler M, et al. Tertiary lymphoid structures and B cells determine clinically relevant T cell phenotypes in ovarian cancer. Nat Commun 2024;15:2528. [Crossref] [PubMed]
  21. Xu H, Zhao J, Lu J, et al. Ovarian endometrioma infiltrating neutrophils orchestrate immunosuppressive microenvironment. J Ovarian Res 2020;13:44. [Crossref] [PubMed]
  22. Jia H, Jiang L, Shen X, et al. Post-translational modifications of cancer immune checkpoints: mechanisms and therapeutic strategies. Mol Cancer 2025;24:193. [Crossref] [PubMed]
  23. Hirz T, Mei S, Sarkar H, et al. Dissecting the immune suppressive human prostate tumor microenvironment via integrated single-cell and spatial transcriptomic analyses. Nat Commun 2023;14:663. [Crossref] [PubMed]
  24. Tokoro Y, Nagae M, Nakano M, et al. LacdiNAc synthase B4GALNT3 has a unique PA14 domain and suppresses N-glycan capping. J Biol Chem 2024;300:107450. [Crossref] [PubMed]
  25. Sato T, Gotoh M, Kiyohara K, et al. Molecular cloning and characterization of a novel human beta 1,4-N-acetylgalactosaminyltransferase, beta 4GalNAc-T3, responsible for the synthesis of N,N'-diacetyllactosediamine, galNAc beta 1-4GlcNAc. J Biol Chem 2003;278:47534-44. [Crossref] [PubMed]
  26. Anugraham M, Jacob F, Nixdorf S, et al. Specific glycosylation of membrane proteins in epithelial ovarian cancer cell lines: glycan structures reflect gene expression and DNA methylation status. Mol Cell Proteomics 2014;13:2213-32. [Crossref] [PubMed]
  27. Anugraham M, Jacob F, Everest-Dass AV, et al. Tissue glycomics distinguish tumour sites in women with advanced serous adenocarcinoma. Mol Oncol 2017;11:1595-615. [Crossref] [PubMed]
  28. Reis e Sousa C, Yamasaki S, Brown GD. Myeloid C-type lectin receptors in innate immune recognition. Immunity 2024;57:700-17. [Crossref] [PubMed]
  29. Zhang Z, Wiencke JK, Kelsey KT, et al. HiTIMED: hierarchical tumor immune microenvironment epigenetic deconvolution for accurate cell type resolution in the tumor microenvironment using tumor-type-specific DNA methylation data. J Transl Med 2022;20:516. [Crossref] [PubMed]
  30. Xu X, Peng Q, Jiang X, et al. Altered glycosylation in cancer: molecular functions and therapeutic potential. Cancer Commun (Lond) 2024;44:1316-36. [Crossref] [PubMed]
  31. Teillaud JL, Houel A, Panouillot M, et al. Tertiary lymphoid structures in anticancer immunity. Nat Rev Cancer 2024;24:629-46. [Crossref] [PubMed]
  32. Rannikko JH, Hollmén M. Clinical landscape of macrophage-reprogramming cancer immunotherapies. Br J Cancer 2024;131:627-40. [Crossref] [PubMed]
  33. Tay C, Tanaka A, Sakaguchi S. Tumor-infiltrating regulatory T cells as targets of cancer immunotherapy. Cancer Cell 2023;41:450-65. [Crossref] [PubMed]
  34. Walsh LA, Quail DF. Decoding the tumor microenvironment with spatial technologies. Nat Immunol 2023;24:1982-93. [Crossref] [PubMed]
  35. Cuello MA, Gómez-Valenzuela F, Wichmann I, et al. A sex-informed transcriptomic prognostic score for gynecologic cancers: Multiplatform validation and spatial characterization. Int J Gynaecol Obstet 2025; Epub ahead of print. [Crossref]
  36. Fernández-Ponce C, Geribaldi-Doldán N, Sánchez-Gomar I, et al. The Role of Glycosyltransferases in Colorectal Cancer. Int J Mol Sci 2021;22:5822. [Crossref] [PubMed]
  37. Indellicato R, Trinchera M. Epigenetic Regulation of Glycosylation in Cancer and Other Diseases. Int J Mol Sci 2021;22:2980. [Crossref] [PubMed]
  38. Horwacik I. The Extracellular Matrix and Neuroblastoma Cell Communication-A Complex Interplay and Its Therapeutic Implications. Cells 2022;11:3172. [Crossref] [PubMed]
  39. Morton LM, Lee OW, Karyadi DM, et al. Genomic characterization of cervical lymph node metastases in papillary thyroid carcinoma following the Chornobyl accident. Nat Commun 2024;15:5053. [Crossref] [PubMed]
  40. Costa V, Esposito R, Ziviello C, et al. New somatic mutations and WNK1-B4GALNT3 gene fusion in papillary thyroid carcinoma. Oncotarget 2015;6:11242-51. [Crossref] [PubMed]
Cite this article as: Luo L, Zhu Z, Dai W, Cao N, Ye M. Tumor-intrinsic B4GALNT3 expression drives a protective immune microenvironment in endometriosis-associated ovarian cancer. Transl Cancer Res 2026;15(2):109. doi: 10.21037/tcr-2025-aw-2458
