Machine learning-driven prognostic model based on sphingolipid-related gene signature in pancreatic cancer: development and validation
Original Article

Machine learning-driven prognostic model based on sphingolipid-related gene signature in pancreatic cancer: development and validation

Qi Zou1,2#, Hailin Jiang2#, Qihui Sun2#, Qian Peng3, Jie He1,2, Keping Xie2 ORCID logo, Fang Wei1,2

1Guangzhou Digestive Disease Center, Guangzhou First People’s Hospital and The Second Affiliated Hospital, South China University of Technology School of Medicine, Guangzhou, China; 2Center for Pancreatic Cancer Research and Department of Immunology, South China University of Technology School of Medicine, Guangzhou, China; 3Department of Dermatology, General Hospital of Southern Theatre Command, Guangzhou, China

Contributions: (I) Conception and design: All authors; (II) Administrative support: F Wei, J He, K Xie; (III) Provision of study materials or patients: F Wei, Q Zou, K Xie; (IV) Collection and assembly of data: Q Zou, H Jiang, Q Sun, Q Peng; (V) Data analysis and interpretation: F Wei, Q Zou, Q Sun, K Xie; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Fang Wei, MD. Guangzhou Digestive Disease Center, Guangzhou First People’s Hospital and The Second Affiliated Hospital, South China University of Technology School of Medicine, Panfu Road, Guangzhou 510180, China; Center for Pancreatic Cancer Research and Department of Immunology, South China University of Technology School of Medicine, Guangzhou, China. Email: eyfangwei@scut.edu.cn; Keping Xie, MD. Center for Pancreatic Cancer Research and Department of Immunology, South China University of Technology School of Medicine, 382 Waihuan Road, Guangzhou 510006, China. Email: scutmedicine@scut.edu.cn.

Background: Pancreatic cancer, a highly malignant tumor with poor prognosis, lacks effective early diagnosis and treatment strategies. Sphingolipids have emerged as key players in tumorigenesis, with certain sphingolipid-related genes linked to patient survival. This study aims to identify prognostic glycosphingolipid (GSL)-related genes and construct a predictive model to improve survival prediction and guide personalized treatment. By providing potential biomarkers, our findings may enhance clinical decision-making and offer new insights into pancreatic cancer diagnosis and therapy.

Methods: This study utilized 150 pancreatic cancer samples from The Cancer Genome Atlas-Pancreatic Adenocarcinoma (TCGA-PAAD) and 69 from GSE62452 [Gene Expression Omnibus (GEO)] for training and validation. Cox univariate regression identified sphingolipid-related genes with prognostic value. Over 100 machine learning algorithms, including Cox models, support vector machines (SVM), and random forests (RF), were applied to construct an optimal survival prediction model for pancreatic ductal adenocarcinoma (PDAC). Model accuracy was evaluated using the concordance index (C-index). Enrichment, immune infiltration, mutation spectrum, and cell communication analyses were performed to explore sphingolipid mechanisms in pancreatic cancer.

Results: Using 10 machine learning algorithms, we developed over 100 models to predict sphingolipid-related survival in pancreatic cancer. A robust prognostic model was constructed, incorporating three GSL-related genes (MET, GBA2, DEFB1), represented by the equation: weighted score = 0.469 * MET + (−0.357) * GBA2 + 0.103 * DEFB1. The model demonstrated strong predictive performance, with a C-index of 0.854 for overall survival in 150 pancreatic cancer patients from the TCGA database and 0.652 in 69 patients from the GEO validation set. Pathway enrichment analysis revealed that high-risk patients were significantly enriched in oncogenic and immune-related pathways. Mutation spectrum analysis indicated a higher mutation load in high-risk patients, with mutations concentrated in common oncogenic pathways. Immune infiltration analysis showed that the risk score positively correlated with immune-suppressive genes but negatively correlated with immune-killing cell infiltration. Cell communication analysis highlighted elevated activity in the macrophage migration inhibitory factor (MIF) pathway within high-risk groups, associated with tumor proliferation and immune escape. In conclusion, this study establishes a sphingolipid-based prognostic model with significant potential for predicting pancreatic cancer outcomes.

Conclusions: The sphingolipid-based model accurately predicts pancreatic cancer survival and suggests sphingolipids promote tumor progression by mediating immune-suppressive microenvironments, aiding prognostic prediction and personalized treatment.

Keywords: Pancreatic ductal adenocarcinoma (PDAC); machine learning; survival prediction; sphingolipids


Submitted Oct 05, 2024. Accepted for publication Feb 25, 2025. Published online May 26, 2025.

doi: 10.21037/tcr-24-1893


Highlight box

Key findings

• Sphingolipid gene expression in pancreatic cancer: sphingolipid-related genes are highly expressed in pancreatic cancer tumor cells and macrophages, indicating their potential roles in tumorigenesis and immune escape.

• Predictive model development: a machine learning-based survival prediction model demonstrated that sphingolipid-related genes effectively predict poor survival in pancreatic cancer patients.

• Prognostic significance: the sphingolipid-related gene model exhibited high accuracy and stability in predicting patient survival times, highlighting its clinical relevance.

• Immune microenvironment and cell communication: high-risk patients showed increased mutation burdens, oncogenic pathway activation, and immunosuppressive microenvironment formation, mediated by unique macrophage migration inhibitory factor (MIF) pathway-driven cell communication patterns.

What is known and what is new?

• Sphingolipids are known to play key roles in cancer biology and tumor development.

• This study investigates sphingolipid-related genes to construct a predictive model for pancreatic cancer survival, exploring their impact on the immune landscape and tumor microenvironment (TME) communication.

What is the implication, and what should change now?

• The findings provide a novel biomarker panel for pancreatic cancer prognosis, potentially improving patient stratification for treatment.

• The study suggests that targeting sphingolipid metabolism could be a therapeutic strategy to modulate the immunosuppressive TME.

• Clinical practice: integrate sphingolipid-related gene data into clinical decision-making for pancreatic cancer.

• Research focus: investigate sphingolipid mechanisms in tumor progression and immune evasion.

• Therapeutic development: explore drugs targeting sphingolipid metabolism and the MIF pathway to improve patient outcomes.


Introduction

Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal malignant tumor with a rising incidence and mortality rate. According to the global cancer statistics for 2022, the annual incidence of pancreatic cancer worldwide is approximately 5 to 6 cases per 100,000 people. With the aging of the population and changes in lifestyle, especially in high-income countries such as North America, Europe, and Australia, the incidence of pancreatic cancer is on the rise (1). Despite its lower incidence compared to other cancers, the mortality rate is roughly equivalent to the incidence rate, and the 5-year survival rate is extremely low, typically less than 10%, making it one of the cancers with the highest mortality rates globally. The high mortality rate and poor prognosis of PDAC are due to the fact that it often has no obvious symptoms in its early stages, leading to more than 80% of patients being diagnosed at a locally advanced or metastatic stage (2,3). Even with various treatments such as surgery, chemotherapy, and radiotherapy, the survival of pancreatic cancer patients remains short, with a 5-year survival rate post-surgery ranging only between 10–25% (4). Therefore, identifying new oncogenes that play a role in the carcinogenesis of PDAC is crucial to improving the overall survival rate of patients.

Glycosphingolipids (GSLs) are a class of biomolecules composed of lipids and sugar moieties derived from sphingosine, playing a significant role in the cell membrane (5). They are not only an essential part of membrane structure but also participate in a variety of biological functions, such as cell signaling, cell proliferation, differentiation, and apoptosis (6). In tumor biology, the metabolic disorder of GSLs, as part of the seventh hallmark of cancer “abnormal energy metabolism”, is usually closely related to the occurrence and development of cancer. Studies have shown that the disruption of GSL synthesis and degradation pathways can lead to the accumulation or reduction of certain specific GSL molecules in pancreatic cancer cells, thereby affecting the biological behavior of cancer cells, such as enhancing their invasiveness, promoting metastasis, and resisting apoptosis (7,8). In addition, GSL synthesis is also involved in the immune escape mechanism of tumors (9,10). A clinical study has shown that the abnormal expression of GSL metabolic enzymes is also considered a marker of poor prognosis in pancreatic cancer (11). Therefore, GSLs and their metabolic pathways may provide new targets for the diagnosis and prognostic assessment of pancreatic cancer.

The continuous expansion of biological data has led researchers to increasingly use machine learning methods in the fields of biology and medicine to construct potential biological information and predictive models (12,13). Numerous gene combinations have been proposed for the prognostic prediction of pancreatic cancer. For example, a study utilized a gene set consisting of ARNTL2 and nine other genes, constructing a prognostic model for pancreatic cancer using least absolute shrinkage and selection operator (Lasso) regression analysis (14). Another gene set including FOXM1, TPX2, and OSBPL3, has been shown to potentially predict the malignant progression of pancreatic cancer (15). These models have demonstrated good predictive performance, with receiver operating characteristic (ROC) values exceeding 0.8. Additionally, due to the clinical need for precise patient stratification, several studies have explored molecular subtypes and prognostic models based on specific gene sets. Although these models generally have lower predictive efficacy compared to whole transcriptome-based prognostic models, they offer stronger clinical relevance due to their greater specificity. For example, Xu et al. constructed a prognostic model for pancreatic cancer patients based on the S100 family of molecules, with a one-year survival prediction ROC value of 0.654 (16). Furthermore, a study has applied Lasso regression analysis to construct prognostic models based on specific genes, with an ROC value of 0.75 (17).

In this study, we aim to describe the characteristics and prognostic indicators of pancreatic cancer based on GSL-related genes, thereby revealing the significance of GSLs in pancreatic cancer prognosis and precision therapy. Our goal is to provide new insights for the diagnosis and treatment of pancreatic cancer. Existing risk prediction models often have several limitations, such as poor predictive performance, limited applicability, single prediction functions, or the use of outdated algorithms. To optimize the model and construct one with the best predictive performance based on a specific gene set, we employ a diverse range of machine learning techniques to identify the most optimal prognostic risk model, with the aim of more accurately predicting patient prognosis. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1893/rc).


Methods

Selection of GSL-related genes

To comprehensively understand the role of GSL-related genes in the survival prediction of pancreatic cancer, this part of the study obtained the “glycosphingolipid” related gene set in the “GeneSet” module of the PID database (https://maayanlab.cloud/Harmonizome/dataset/PID+Pathways). The detailed information is shown in the Table 1.

Table 1

Sphingolipid-related gene sets

Gene set name Gene number
Glycosphingolipid 10
Glycosphingolipids 17
Glycosphingolipid metabolic process 60
Glycosphingolipid catabolic process 32
Glycosphingolipid biosynthetic process 16
Glycosphingolipid binding 8
Abnormality of glycosphingolipid metabolism 8
Gangliosides 94

Data preparation

Single-cell pancreatic cancer datasets (GSE154778 and GSE155698) were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). After integration and batch effect removal using the Harmony package, a total of 26 pancreatic cancer tissue samples were obtained. Transcriptional count data from The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and Genotype-Tissue Expression (GTEx) dataset were retrieved in the UCSC XENA database (https://xena.ucsc.edu/). Transcripts per million (TPM) normalized transcriptional data, along with matched survival data, were retrieved from the extracted data (https://xenabrowser.net/datapages/?cohort=TCGA%20TARGET%20GTEx&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443). Samples with the primary site “Pancreas” were extracted, which included pancreatic cancer and normal control samples. After excluding the TARGET dataset samples, 150 pancreatic tumor tissues from the TCGA dataset were used to construct the prognosis model for the training set. Subsequently, GEO datasets GSE62452 (69 pancreatic cancer samples) was selected for model validation as the validation set. The clinical diagnostic and prognostic information of the samples used in the two datasets can be found in UCSC XENA and previously published articles (18,19).

The training dataset for this study was sourced from the pancreatic cancer data within the TCGA database, with the majority of samples collected between 2008 and 2015. Long-term follow-up and updates on survival data for the patients enrolled are ongoing. The validation datasets, GSE62452, comprise pancreatic cancer samples from the University of Maryland Medical System in Baltimore, Maryland; however, the precise collection dates for these samples were not specified. All samples were definitively diagnosed with pancreatic cancer and were included in this study only if survival data were available and the follow-up period exceeded 30 days. A double-blind procedure was implemented for the subsequent model development and analysis using all samples. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Construction and validation of the sphingolipid-related survival prediction model

The TCGA-Pancreatic Adenocarcinoma (TCGA-PAAD) pancreatic cancer dataset was selected for use as the training set, and the GSE62452 datasets from the GEO database was used as the validation datasets. The data were imported in list format and named “TCGA” and “GSE62452”. The entire model analysis process ensured the reproducibility of the results by setting a random seed to 123 (‘seed = 123’). First, we applied functions such as ‘rfsrc’ and ‘cv.glmnet’ to construct pancreatic cancer prognostic models based on over one hundred algorithm combinations, including ‘random survival forests (RSF)’, ‘RSF+eNet’, and ‘RSF+CoxBoost’. After the models were constructed, we ranked them based on their concordance indices (C-index) and visualized the results using a heatmap generated by the ‘Heatmap’ function. All patients who have been definitively diagnosed with pancreatic cancer can utilize this model for prognostic prediction and survival assessment. C-index >0.70: generally considered a good model, indicating strong predictive ability and the capacity to effectively distinguish between different risk groups; C-index 0.61–0.70: indicates the model has moderate predictive ability; C-index 0.50–0.60: models within this range are usually considered poor, close to random guessing, and may require adjustments or improvements; C-index <0.50: indicates poor predictive ability, potentially worse than random guessing, and may require complete reconstruction or replacement of the model.

Single-cell dataset processing and enrichment analysis

We integrated two single-cell transcriptomic datasets of pancreatic cancer (GSE154778 and GSE155698) using the “Harmony” algorithm. Following standard procedures, including batch effect removal, dimensionality reduction, clustering, and cell annotation, we conducted further analysis of target gene expression. Subsequently, we selected acinar cells, tumor cells, and epithelial-mesenchymal transition (EMT) cells for subsequent enrichment analysis. Subsequently, the presence of target genes (MET, GBA2, DEFB1) within the single-cell data was confirmed, and their expression levels were extracted. A weighted score formula was calculated based on the weighted coefficients of each gene: weighted score = 0.469 * MET + (−0.357) * GBA2 + 0.103 * DEFB1. The calculated weighted scores were added to the meta.data of the single-cell object. Thereafter, the weighted scores were divided into high-expression (Score_HIGH) and low-expression (Score_LOW) groups based on the median, and downstream analyses were conducted based on this grouping. The FindMarkers function was utilized to perform differential gene expression analysis between cells with high and low weighted scores, and a set of significantly differential genes was selected. Volcano plots and violin plots were then generated to display the expression differences of high and low scores across various cell populations. For functional enrichment analysis, gene set enrichment analysis (GSEA), Gene Ontology (GO) enrichment analysis, and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were employed to explore the potential biological functions and pathways of the differential genes. The GO analysis primarily covers three aspects: biological process (BP), cellular component (CC), and molecular function (MF). KEGG pathway enrichment was used to identify key signaling pathways associated with differences in weighted scores. Finally, a bubble plot was created using dotplot and a sunburst graph was drawn using ggraph to display the enrichment results. The gseaplot2 function was also used to visualize the GSEA analysis results, showcasing the degree of enrichment and significance.

Gene mutation spectrum analysis

Utilizing the somatic mutation profiles from TCGA, we proceeded to conduct mutation analysis using the “maftools” R package. We calculated the tumor mutational burden (TMB) for individual patients and compare the high-risk and low-risk cohorts.

Immune infiltration analysis

Based on the “immunedeconv” package, we employed the “TME_deconvolution_all” for multiple tumor microenvironment (TME) deconvolution analyses to assess the differences in the composition of the TME within the TCGA-PAAD dataset. Using the trained model results ‘res’, in conjunction with the deconvolution analysis results ‘devo’, we generated heatmaps related to immune features. The ‘immuno_heatmap’ function was applied, employing the “RSF+StepCox[forward]” model to demonstrate the expression of immune features across different samples in the TCGA dataset.

Cell communication analysis

The R package “CellChat” was utilized to conduct cell communication analysis on the integrated single-cell dataset. A new CellChat object was created from the existing Seurat object, and cell types were added as cell metadata to the CellChat object. Subsequently, CellChat identified differentially expressed ligands and receptors for each cell group, associating each interaction with a probability value, thereby quantifying the communication between two cell groups mediated by these signaling molecules. CellChat was applied to analyze the ligand-receptor pairs of cell communication between cell subgroups, predicting the potential cell communication networks between cells.

Statistical analysis

Statistical analyses were conducted using R software (version 4.1.2; R Foundation for Statistical Computing, Vienna, Austria) and GraphPad PRISM (version 9.3.0; GraphPad Software). Differential expression analysis between cell groups was performed using a two-sided Wilcoxon rank-sum test with Bonferroni correction for false discovery rate (FDR). Group comparisons were analyzed using the Wilcoxon signed-rank test, while survival analysis was assessed through log-rank testing. All graphical representations were generated using specialized R package software. A threshold of P<0.05 was established for determining statistical significance throughout the study.


Results

Sphingolipid-related genes were highly expressed in pancreatic cancer tumor cells and macrophages

Existing studies have shown that sphingolipid metabolism reprogramming frequently occurs in pancreatic cancer, suggesting that sphingolipids may play an important role in the development and progression of this disease. As essential components of cell membranes, sphingolipids are involved in regulating cell growth, differentiation, and apoptosis. When sphingolipid metabolism is disrupted, it can affect the proliferation, migration, and invasion capabilities of tumor cells, thereby promoting tumor development and metastasis. Additionally, metabolites such as sphingosine and ceramide can act as signaling molecules, participating in the regulation of various tumor-related signaling pathways, including those involved in proliferation, apoptosis, and treatment response (20). Furthermore, abnormalities in sphingolipid metabolism can influence the immune escape phenomenon in the TME by modulating the functions of immune cells, ultimately affecting the prognosis of cancer patients (9,10).

To explore the role of sphingolipids in pancreatic cancer, we integrated several sphingolipid gene sets, choosing 143 intersecting genes (unique genes) to investigate their involvement in the occurrence and progression of pancreatic cancer. As shown in Figure 1A,1B, the selected 143 sphingolipid-related genes were primarily enriched in pathways related to sphingolipids, sphingolipid biosynthesis, and membrane lipid metabolism, confirming the enrichment pathways of this gene set. By integrating two single-cell transcriptome datasets of pancreatic cancer, removing batch effects, and performing dimensionality reduction, we obtained a total of 12 cell subpopulations, including macrophages, tumor cells, and natural killer (NK) cells (Figure 1C). All cell subpopulations were annotated using marker genes, such as keratin 19 (KRT19), keratin 8 (KRT8), and epithelial cell adhesion molecule (EPCAM) for epithelial cells, and CD4 molecule (CD4), CD3 delta chain (CD3D), and CD3 epsilon chain (CD3E) for immune cells (Figure 1D). After scoring the sphingolipid-related gene set, it was observed that these genes had higher scores primarily in tumor cells and immune-suppressive cells like macrophages (Figure 1E,1F). This indicates that sphingolipid-related genes play a significant role in pancreatic cancer tumors and immune-suppressive cells, potentially participating in tumorigenesis and immune escape in pancreatic cancer. The critical role of sphingolipid biosynthesis in mediating Kirsten rat sarcoma viral oncogene homolog (KRAS)-driven tumor immune escape provides new insights into the complex interactions between tumor cells and the immune system.

Figure 1 Expression of sphingolipid-related genes in various cell populations of pancreatic cancer. (A) A pie chart displaying the top 10 KEGG pathways enriched by sphingolipid-related genes, arranged by enrichment score. (B) A sunburst chart illustrating the top 4 GO pathways enriched by sphingolipid-related genes, with the 10 genes enriched in each pathway shown in the figure. (C) A UMAP plot showcasing the 12 cell subpopulations obtained from the integrated single-cell transcriptome dataset. (D) A bubble chart displaying the expression of marker genes across different cell subpopulations. (E) A UMAP plot illustrating the distribution of sphingolipid-related gene scores in different cell subpopulations. (F) A violin plot showing the sphingolipid gene set scores across various cell subpopulations. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; NK, natural killer; UMAP, uniform manifold approximation and projection.

Sphingolipid-related gene sets could predict poor survival in pancreatic cancer patients

Subsequently, we identified independent risk factors for pancreatic cancer within the sphingolipid gene set and constructed a prognostic prediction model. After identifying independent prognostic risk factors for pancreatic cancer through univariate Cox regression analysis, a survival prediction model was constructed using machine learning algorithms. As shown in Figure 2A, the C-index of the survival prediction model based on the sphingolipid-related gene set, constructed using various machine learning algorithms, reached above 0.65 in the training set. Additionally, in the Test sets, the C-index reached 0.652, indicating that the sphingolipid-related gene prediction model demonstrates good predictive performance. We then selected the top-performing algorithm, RSF+StepCox[forward], to build the model for subsequent analysis. The specific model is represented as 0.469 × MET − 0.357 × GBA2 + 0.103 × DEFB1. The RSF+StepCox[forward] algorithm’s survival prediction model had a C-index of 0.854 in the TCGA dataset and a C-index of 0.652 in Test (GSE62452 set (Figure 2B). After conductingsurvival analysis based on this model, it was observed that patients in the high-risk factor group across the three datasets exhibited significantly poorer prognosis (P value in the TCGA dataset <0.001; P value in the GSE62452 dataset =0.02; P value in the GSE28735 dataset =0.04) (Figure 2C). After the model was constructed, we plotted the univariate Cox regression meta-analysis forest plot based on this model. As shown in the results of Figure 3A, the hazard ratio (HR) values in the TCGA and GSE62452 datasets were all greater than 2, with significant statistical significance.

Figure 2 Construction of a prediction model for sphingolipid-related features. (A) The C-index performance of the sphingolipid-related feature prediction model across different cohorts, sorted by the average C-index of the Train and Test cohorts. (B) A bar chart displaying the C-indices of the combined RSF+StepCox[forward] model in both the training set and the validation set. (C) Kaplan-Meier survival curves demonstrating the survival of the prediction model based on the RSF+StepCox[forward] method constructed using the “rs_sur” function in different datasets. C-index, concordance index; CI, confidence interval; Lasso, least absolute shrinkage and selection operator; RSF, random survival forests; TCGA, The Cancer Genome Atlas.
Figure 3 Meta-analysis of survival prediction model and key genes. (A) A forest plot showing the univariate Cox regression meta-analysis results of different datasets grouping patients based on the median risk score. (B) A UMAP plot displaying the expression of DEFB1, GBA2, and MET genes used in the model construction within the integrated single-cell dataset. CI, confidence interval; HR, hazard ratio; SE, standard error; TE, treatment effect; TCGA, The Cancer Genome Atlas; UMAP, uniform manifold approximation and projection.

Sphingolipid-related genes were highly expressed in pancreatic cancer and regulated multiple tumor and immune pathways

Following the construction of the sphingolipid-related pancreatic cancer survival prediction model in the previous section, we analyzed the expression of the variables used in the model within pancreatic cancer. As shown in Figure 3B, the GBA2 and MET molecules included in the sphingolipid-related gene survival prediction model were significantly overexpressed in tumor cells and the pancreatic intraepithelial neoplasia (PanIN) subgroup, with the DEFB1 gene mainly overexpressed in PanIN. Subsequently, we extracted tumor epithelial cells from the pancreatic cancer cohort dataset for further analysis. Based on the risk factor score from the survival model, all cells were divided into high and low expression groups. The volcano plot results in Figure 4A showed that genes such as transforming growth factor beta 2 (TGFB2), mucin 16 (MUC16), and Wnt family member 10A (WNT10A) were significantly upregulated in the high-risk factor expression group, while acinar cell-related transcription factors like protease serine 1 (PRSS1) were significantly downregulated, indicating that sphingolipid-related genes may be associated with an increase in the malignancy of pancreatic cancer cells. After GSEA enrichment analysis of the differentially expressed genes, it was observed that pathways related to Th1 and Th2 cell differentiation and primary immune deficiencies were downregulated, suggesting that sphingolipid-related genes may be involved in the regulation of the immune microenvironment (Figure 4B). The GO and KEGG pathway enrichment analysis results also showed that cells with high risk factor scores were significantly enriched in various tumor and immune-related pathways (Figure 4C,4D).

Figure 4 Differential expression and enrichment analysis of cells in high and low sphingolipid prognostic model score groups. (A) A volcano plot displaying the differentially expressed genes obtained using the “FindMarker” function after dividing cells into high and low expression groups based on the median risk factor score. (B) A GSEA plot illustrating the differential expression genes. (C,D) Show the enriched pathways obtained from GO and KEGG enrichment analyses of the differentially expressed genes, respectively. BP, biological process; CC, cellular component; FC, fold change; GO, Gene Ontology; GSEA, gene set enrichment analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular function.

Sphingolipid-related high- and low-risk pancreatic cancer patients exhibited higher mutation burdens and enrichment of malignant oncogenic pathways

KRAS mutations play a pivotal role in tumor immune evasion and systemic pathological remodeling, particularly in PDAC, where the mutation rate exceeds 90% (21). KRAS mutations are involved in multiple mechanisms, including the hijacking of muscle energy, activation of protein degradation pathways, induction of oxidative stress, and remodeling of the immune microenvironment (22). Additionally, a study has indicated that sphingolipid synthesis plays a critical role in mediating KRAS-driven tumor immune evasion, potentially facilitating complex interactions between tumor cells and the immune system (10). Subsequently, we analyzed the mutational spectrum and immune characteristics in patients with high and low risk scores in the sphingolipid-related prognostic model for pancreatic cancer. As shown in Figure 5A,5B, there were more mutations in KRAS, tumor protein p53 (TP53) and cyclin-dependent kinase inhibitor 2A (CDKN2A) in the high-risk score group. In Figure 5C, we observed that patients in the high-risk group have a higher frequency of Frame_Shift_Ins and Nonsense_Mutation, which significantly impact protein function, while the low-risk group has a higher proportion of missense mutations. The total TMB and number of mutations in the high-risk group were also significantly higher than those in the low-risk group (Figure 5D,5E). These results suggest that patients in the high-risk group may exhibit more pronounced genomic instability and are more likely to have genomic alterations that affect protein function. We observed different patterns of mutation co-occurrence and exclusivity between the high- and low-risk groups, indicating that there may be different synergistic or synthetic lethal effects among genes in the two groups of patients, providing new insights into the genetic changes in tumor samples and the mechanisms of cancer development (Figure 5F). After conducting drug-mutation interaction analysis for the two groups of patients, it was found that patients in the high-risk group are more likely to have effects on drug targets such as histone modifications and ion channels, offering potential avenues for clinical treatment (Figure 5G). The results of the mutation pathway enrichment analysis showed that patients in the high-risk group have higher enrichment scores in various oncogenic pathways, including transforming growth factor beta (TGF-β), NOTCH, and rat sarcoma (RAS) (Figure 5H).

Figure 5 Genomic mutational spectrum characteristics of high- and low-risk sphingolipid-related patients. (A) An oncoplot displaying the top 30 mutated genes in high- and low-risk groups, with bar charts above showing the TMB of each sample, stacked bar charts on the side displaying mutation rates of different genes, and bar charts below showing the percentage of various base substitution types. (B) Grouped bar charts presenting differentially mutated genes between high- and low-risk groups, with odds ratios indicated as follows: ***, P<0.001; Fisher’s exact test. (C) Percentage bar charts showing the proportion of different types of mutations in high- and low-risk groups. (D) Violin plots illustrating the TMB of high- and low-risk groups, with a T-test applied. (E) Density plots presenting the distribution density of mutation numbers in high- and low-risk groups, with the horizontal axis representing mutation numbers and the vertical axis representing mutation density. (F) Exclusive and co-occurring gene pairs in high- and low-risk groups are displayed as a triangular matrix, with green indicating a co-occurring trend and brown indicating an exclusiveness trend. (G) Bar charts showing the drug-gene interactions in high- and low-risk groups, with the vertical axis representing drug entries and the horizontal axis representing mutated genes involved in each drug. (H) Oncogenic pathways enriched in high- and low-risk groups. OR, odds ratio; TMB, tumor mutational burden.

Sphingolipid-related genes contributed to the formation of the immunosuppressive microenvironment

TME of PDAC is characterized by a highly desmoplastic and immunosuppressive stroma. Its intricate network of cellular components—including cancer-associated fibroblasts (CAFs), dense extracellular matrix (ECM), and dysregulated immune cells [e.g., regulatory T cells (Tregs), M2-polarized macrophages]—collectively establishes an “immune-cold” barrier, resulting in poor response rates to conventional immunotherapies such as programmed cell death protein 1 (PD-1)/programmed death-ligand 1 (PD-L1) inhibitors (23). A recent preclinical study has demonstrated that reshaping the TME can reactivate antitumor immunity, offering new potential for immunotherapeutic strategies (24). Previous enrichment analyses suggest that sphingolipid-related genes may modulate immune-associated pathways to influence the TME. Building on this evidence, we employed multiple computational algorithms to perform immune infiltration analysis on high- and low-risk score subgroups within the TCGA-PAAD dataset, aiming to further elucidate the relationship between our risk signature and the immunomodulatory landscape of pancreatic cancer. The estimate algorithm analysis revealed that the immune score, stromal score, and microenvironment score were significantly higher in the low-risk group compared to the high-risk group. Additionally, results from xCell and CIBERSORT, among other algorithms, indicated that the infiltration of immune cytotoxic cells such as CD4+ T cells and CD8+ T cells was also higher in the low-risk group (Figure 6A). Correlation analysis and heatmaps demonstrated consistent conclusions (Figure 6B,6C). Apart from the low infiltration of immune cytotoxic cells, high-risk pancreatic cancer patients showed a higher infiltration of immunosuppressive cells such as macrophages. As shown in Figure 6D, box plots indicate that the infiltration levels of B cells and NK cells were lower in the high-risk group compared to the low-risk group patients, while the immune infiltration levels of macrophages M0 and M2 were higher. M0 macrophages are intermediate transitional macrophages that may differentiate into M1 or M2, with the specific direction of differentiation potentially requiring further analysis using single-cell transcriptome datasets. The immune infiltration analysis results suggest that sphingolipid-related molecules may promote the occurrence and development of pancreatic cancer by mediating the generation of an immunosuppressive microenvironment.

Figure 6 Immune infiltration analysis of high and low score groups in the sphingolipid prognostic model. (A) A heatmap displaying the results of immune infiltration analysis performed by algorithms such as xcell, epic and abis on combined pancreatic cancer samples from the training and validation sets; the colors in the heatmap represent immune infiltration scores from different algorithms. (B) A lollipop plot illustrating the correlation between risk scores and various immune cells. (C) A heatmap showing the immune infiltration predicted by TIMER and CIBERSORT algorithms in the TCGA-PAAD dataset for each sample. (D) Box plots presenting the immune infiltration of various immune cells in high- and low-risk groups of patients, with a T-test indicating significance levels, *, P<0.05; **, P<0.01. NK, natural killer; TCGA-PAAD, The Cancer Genome Atlas-Pancreatic Adenocarcinoma.

Sphingolipid-related genes mediated unique cell communication patterns through the macrophage migration inhibitory factor (MIF) pathway

We applied “CellChat” cell communication analysis to further explore the cell communication patterns between high- and low-risk groups. The results shown in Figure 7A indicated that high-risk score cells were primarily distributed in PanIN cells and tumor cell subgroups, and also had a higher distribution in macrophage and fibroblast subgroups. The overall number and intensity of cell communications in high-risk group pancreatic cancer patients were lower than those in the low-risk group, but the number or intensity of cell communications related to tumor cells, PanIN cells, NK cells, and macrophages were higher in the high-risk group (Figure 7B,7C, Figure S1A). Analysis of cell communication pathways revealed that the cell communication signals in high-risk group patients were significantly enriched in tumor growth-related pathways such as claudin (CLDN), endothelin (EDN), and fibroblast growth factor (FGF), with endothelin pathway (EDN pathway)-related cell communications being present only in high-risk group patients (Figure 7D, Figure S1B). The EDN pathway could promote tumor cell proliferation, migration, invasion, and angiogenesis, and also played an important role in immune regulation. This further corroborated the role of sphingolipid-related genes in the malignant biological behaviors of tumors.

Figure 7 Analysis of cell communication patterns in high and low risk patients. (A) A UMAP plot illustrating the spatial distribution of cells in high- and low-risk groups, with red indicating cells from the high-risk group and blue indicating cells from the low-risk group. (B) A faceted bar chart displaying the number and strength of cell communications in patients of high- and low-risk groups. (C) A circle plot summarizing the differential receptor-ligand pair numbers (interactions) and communication strength between high- and low-risk groups. The thicker the line, the stronger the interaction. Lines and points in the same color indicate that the point represents a source cell and the other end is a target cell; red signifies stronger cell communications in the high-risk group, while blue indicates stronger cell communications in the low-risk group. (D) A stacked bar chart showing differential cell communication-related pathways in high- and low-risk groups, with red representing pathways that are more significant in the high-risk group and blue representing pathways that are more significant in the low-risk group. (E) A circle plot demonstrating the number of cell communications between tumor cells, PanIN cells, macrophages, and NK cells. (F) A bubble chart illustrating specific ligand-receptor pairs between tumor cells/PanIN cells and macrophages/NK cells. The color of the bubble represents the proportion of the probability sum of the ligand-receptor in all subgroups to the probability sum of all ligand-receptors in the pathway across all subgroups, with the size of the bubble representing the P value. NK, natural killer; PanIN, pancreatic intraepithelial neoplasia; UMAP, uniform manifold approximation and projection.

Subsequently, we conducted an in-depth analysis of the cell communication patterns among cancer cells, PanIN cells, NK cells, and macrophages, which showed higher numbers and intensities of cell communications in the high-risk group. As shown in Figure 7E, there were more cell communications between PanIN cells and tumor cells, macrophages, and NK cells in the high-risk group patients, suggesting that sphingolipid-related genes may play a more significant role during precancerous lesions. Figure 7F illustrated the cell communication ligand-receptor pairs when tumor cells and PanIN cells acted as ligands and macrophages and NK cells acted as receptors. It could be seen that immune escape-related ligand-receptor pairs such as amyloid precursor protein (APP)-cluster of differentiation 74 (CD74) and MIF-CD74 existed in both high and low expression groups of tumor cells and receptor cells, while cell communications between PanIN cells and macrophages and NK cells were only present in high-risk cells. Analysis of differentially expressed ligand-receptor pairs in high-risk group cells also revealed that PanIN cells showed heightened activity of immune escape-related pairs, including MIF-CD74 and APP-CD74, particularly in interactions with macrophages and NK cells (Figure 8A).

Figure 8 Analysis of MIF-related cell communication patterns in high and low risk patients. (A) A bubble chart illustrating the upregulated and downregulated ligand-receptor pairs in cells of the high-risk group. (B) The upper heatmaps show the likelihood of various cell subsets playing different roles such as signal senders and receivers in the macrophage MIF pathway. The lower heatmaps display the relative strength of signal sending by different cell subsets in high- and low-risk groups. (C) A circle plot showing the number of MIF-related cell communications between tumor cells, PanIN cells, macrophages, and NK cells. (D) A bar chart presenting the contribution of major ligand-receptor pairs in the MIF signaling pathway, with the horizontal axis representing the relative contribution of each ligand-receptor pair to the pathway. (E) A bubble chart demonstrating MIF-related ligand-receptor pairs between tumor cells/PanIN cells and macrophages/NK cells. (F) A bubble chart showing the expression intensity of key genes in the MIF pathway in cells of high- and low-risk groups. L-R, ligand and receptor; MIF, migration inhibitory factor; NK, natural killer; PanIN, pancreatic intraepithelial neoplasia.

Based on the significantly increased cell communication signals of the MIF pathway in PanIN cells of the high-risk group, we speculated that the MIF pathway may be an important target for the action of sphingolipid-related genes in pancreatic cancer. Analysis of the MIF pathway cell communication patterns among various cells indicated that tumor cells, fibroblasts, and PanIN cells mainly act as senders and influencers of MIF signals; macrophages could participate in the most MIF-related cell communications as senders, receivers, and influencers (Figure 8B, Figure S1C,S1D). PanIN cells and macrophages in the high-risk group could send significantly more MIF-related signals (Figure 8B). The number of the MIF signal pathway between PanIN cells and tumor cells shown in Figure 8C also proved this point. Analysis of MIF-related ligand-receptor pairs revealed that the related ligand-receptor pairs present in the high-risk group are mainly MIF-CD74+CXCR4, MIF-CD74+CD44, and MIF-ACKR3 (Figure 8D, Figure S1E). Among them, tumor cells/PanIN cells as signal senders have the main ligand-receptor pairs with macrophages as MIF-CD74+CXCR4, and the most likely ligand-receptor pairs with NK cells are MIF-CD74+CD44 (Figure 8E). Analysis of key genes in the MIF pathway for high- and low-risk group patients showed that the expression of MIF, CD74, and CD44 is higher in the high-risk group (Figure 8F).


Discussion

This study is the first to apply sphingolipid-related genes in the construction of a survival prediction model for pancreatic cancer. We utilized more than 100 ensemble models created by combining 10 machine learning methods to analyze the relationship between sphingolipid metabolism and pancreatic cancer prognosis, aiming to reveal the potential role of sphingolipids as biomarkers in pancreatic cancer outcomes. Through deep learning of clinical data and molecular features, we not only confirmed the importance of sphingolipid metabolism in pancreatic cancer but also further explored its impact on patient prognosis (7).

We employed a combination of various machine learning algorithms, including Lasso, Cox regression, support vector machines (SVM), random forests (RF), and gradient boosting machine (GBM), to systematically analyze the expression data of sphingolipids and clinical prognosis data. These models were trained using a large sample of pancreatic cancer patients’ survival times and outcomes, capturing the complex nonlinear relationships between sphingolipids and pancreatic cancer prognosis. In cross-validation, all models demonstrated high accuracy, indicating a significant association between sphingolipid metabolism and the survival of pancreatic cancer patients. Notably, the RSF+StepCox[forward] model exhibited strong predictive capability, uncovering potential mechanisms of action for sphingolipids in pancreatic cancer. For instance, we found that high expression levels of certain specific gangliosides (MET and DEFB1) were associated with poorer prognosis, providing new clues for further biological mechanism research. Additionally, through pathway enrichment analysis and immune infiltration analysis, we discovered that sphingolipid-related molecules may participate in the occurrence and development of pancreatic cancer by regulating the immune infiltrative microenvironment (16). Our study demonstrated good predictive performance in both the training set and multiple validation sets, but it lacks validation through biological experiments. Additionally, our model is limited to transcriptomic data and has not been integrated with clinical data, genomic information, or methylation data. In future, researchers can further investigate the functions of these genes in the progression of pancreatic cancer and the underlying mechanisms, offering new directions and evidence for future targeted therapies.


Conclusions

In conclusion, our study constructed a pancreatic cancer-related survival time prediction model using sphingolipid-related genes through a combination of over 100 machine learning methods, achieving good predictive performance with various algorithm combinations. Enrichment analysis and immune infiltration analysis revealed that the immune scores, stromal scores, and microenvironment scores in the low-risk factor score group were significantly higher than those in the high-risk group. Furthermore, the immune infiltration of cytotoxic cells, such as CD4+ T cells and CD8+ T cells, was also higher in the low-risk group. Therefore, sphingolipid-related molecules may promote the occurrence and development of pancreatic cancer by mediating the generation of an immunosuppressive microenvironment.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1893/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1893/prf

Funding: The work was partly supported by National Natural Science Foundation of China (No. 82072632), Guangzhou Municipality Bureau of Science and Technology, Guangzhou, China (No. 202102010033), and Natural Science Foundation of Guangdong Province, China (No. 2022A1515012585).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1893/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63. [Crossref] [PubMed]
  2. Stathis A, Moore MJ. Advanced pancreatic carcinoma: current treatment and future challenges. Nat Rev Clin Oncol 2010;7:163-72. [Crossref] [PubMed]
  3. Michael NL, Krell RW. Pancreatic cancer: a haystack of needles. J Gastrointest Oncol 2024;15:2743-4. [Crossref] [PubMed]
  4. Deplanque G, Demartines N. Pancreatic cancer: are more chemotherapy and surgery needed? Lancet 2017;389:985-6. [Crossref] [PubMed]
  5. Gupta G, Surolia A. Glycosphingolipids in microdomain formation and their spatial organization. FEBS Lett 2010;584:1634-41. [Crossref] [PubMed]
  6. Demarco ML, Woods RJ, Prestegard JH, et al. Presentation of membrane-anchored glycosphingolipids determined from molecular dynamics simulations and NMR paramagnetic relaxation rate enhancement. J Am Chem Soc 2010;132:1334-8. [Crossref] [PubMed]
  7. Rivera IG, Ordoñez M, Presa N, et al. Ceramide 1-phosphate regulates cell migration and invasion of human pancreatic cancer cells. Biochem Pharmacol 2016;102:107-19. [Crossref] [PubMed]
  8. Zhang Y, Ji S, Zhang X, et al. Human CPTP promotes growth and metastasis via sphingolipid metabolite ceramide and PI4KA/AKT signaling in pancreatic cancer cells. Int J Biol Sci 2022;18:4963-83. [Crossref] [PubMed]
  9. Jongsma MLM, de Waard AA, Raaben M, et al. The SPPL3-Defined Glycosphingolipid Repertoire Orchestrates HLA Class I-Mediated Immune Responses. Immunity 2021;54:132-150.e9. [Crossref] [PubMed]
  10. Soula M, Unlu G, Welch R, et al. Glycosphingolipid synthesis mediates immune evasion in KRAS-driven cancer. Nature 2024;633:451-8. [Crossref] [PubMed]
  11. Wolrab D, Jirásko R, Cífková E, et al. Lipidomic profiling of human serum enables detection of pancreatic cancer. Nat Commun 2022;13:124. [Crossref] [PubMed]
  12. Greener JG, Kandathil SM, Moffat L, et al. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40-55. [Crossref] [PubMed]
  13. Kourou K, Exarchos TP, Exarchos KP, et al. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 2015;13:8-17. [Crossref] [PubMed]
  14. Wu M, Li X, Zhang T, et al. Identification of a Nine-Gene Signature and Establishment of a Prognostic Nomogram Predicting Overall Survival of Pancreatic Cancer. Front Oncol 2019;9:996. [Crossref] [PubMed]
  15. Chhatriya B, Mukherjee M, Ray S, et al. Transcriptome analysis identifies putative multi-gene signature distinguishing benign and malignant pancreatic head mass. J Transl Med 2020;18:420. [Crossref] [PubMed]
  16. Xu ZJ, Li JA, Cao ZY, et al. Construction of S100 family members prognosis prediction model and analysis of immune microenvironment landscape at single-cell level in pancreatic adenocarcinoma: a tumor marker prognostic study. Int J Surg 2024;110:3591-605. [Crossref] [PubMed]
  17. Chen L, Zhang X, Zhang Q, et al. A necroptosis related prognostic model of pancreatic cancer based on single cell sequencing analysis and transcriptome analysis. Front Immunol 2022;13:1022420. [Crossref] [PubMed]
  18. Yang S, He P, Wang J, et al. A Novel MIF Signaling Pathway Drives the Malignant Character of Pancreatic Cancer by Targeting NR3C2. Cancer Res 2016;76:3838-50. [Crossref] [PubMed]
  19. Cancer Genome Atlas Research Network. Electronic address: andrew_aguirre@dfci.harvard; . Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 2017;32:185-203.e13. [Crossref]
  20. Hannun YA, Obeid LM. Principles of bioactive lipid signalling: lessons from sphingolipids. Nat Rev Mol Cell Biol 2008;9:139-50. [Crossref] [PubMed]
  21. Varghese AM, Perry MA, Chou JF, et al. Clinicogenomic landscape of pancreatic adenocarcinoma identifies KRAS mutant dosage as prognostic of overall survival. Nat Med 2025;31:466-77. [Crossref] [PubMed]
  22. Guo Y, Han S, Yu W, et al. Deciphering molecular crosstalk mechanisms between skeletal muscle atrophy and KRAS-mutant pancreatic cancer: a literature review. Hepatobiliary Surg Nutr 2025;14:78-95. [Crossref] [PubMed]
  23. Galon J, Bruni D. Tumor Immunology and Tumor Evolution: Intertwined Histories. Immunity 2020;52:55-81. [Crossref] [PubMed]
  24. Zhou X, Springfeld C, Roth S, et al. Tumour plasticity and tumour microenvironment interactions as potential immunologic targets for pancreatic cancer treatment. Chin Clin Oncol 2024;13:85. [Crossref] [PubMed]
Cite this article as: Zou Q, Jiang H, Sun Q, Peng Q, He J, Xie K, Wei F. Machine learning-driven prognostic model based on sphingolipid-related gene signature in pancreatic cancer: development and validation. Transl Cancer Res 2025;14(5):2779-2796. doi: 10.21037/tcr-24-1893

Download Citation