Constructing and validating a novel prognostic risk score model for rectal cancer based on four immune-related genes
Original Article

Constructing and validating a novel prognostic risk score model for rectal cancer based on four immune-related genes

Ruyun Cai, Zhonghua Hong, Hezhai Yin, Huilin Chen, Mengting Qin, Yihong Huang

Department of Surgery, Jiaxing Hospital of Traditional Chinese Medicine, Zhejiang Chinese Medical University, Jiaxing, China

Contributions: (I) Conception and design: R Cai; (II) Administrative support: H Yin, H Chen; (III) Provision of study materials or patients: R Cai, Z Hong, H Yin; (IV) Collection and assembly of data: H Chen, M Qin, Y Huang; (V) Data analysis and interpretation: R Cai, Z Hong, M Qin, Y Huang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Hezhai Yin, MD. Department of Surgery, Jiaxing Hospital of Traditional Chinese Medicine, Zhejiang Chinese Medical University, No. 1501 Zhongshan East Road, Jiaxing 314000, China. Email: 873628502@qq.com.

Background: Immunotherapy is playing an increasing role in the treatment of various cancers. However, its application in rectal cancer is very limited as only microsatellite-unstable bowel cancers with defective mismatch repair are found to benefit. The majority of rectal cancers belong to the microsatellite-stable phenotype. Therefore, the aim of this study is to explore immune-related genes within the tumor microenvironment of rectal cancer, with the objective of discovering novel biomarkers and therapeutic targets for rectal cancer, and to establish a new prognostic prediction model for rectal cancer based on these immune-related genes.

Methods: The data in The Cancer Genome Atlas (TCGA) database were processed using the Estimation of Stromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm to obtain differently expressed genes (DEGs). Then the DEGs were analyzed by Gene Ontology (GO), Kyoto Encyclopedia of Gene and Genomes (KEGG), Reactome function enrichment analysis, and protein-protein interaction (PPI) analysis to screen the core genes, which were utilized to compute the risk scores of individual patients. Finally, combining risk scores and clinical characteristics, a new prognostic prediction model was established by univariate and multivariate Cox analyses, and the prognostic model was validated by the Gene Expression Omnibus (GEO) database.

Results: The study finally identified four core genes (CYBB, CCR4, FOXP3, and CD80), and immune cell infiltration analyses of the four core genes showed that their expression levels were positively correlated with the distribution of various immune cells. The 4-gene risk score categorized rectal cancer patients into high-risk and low-risk groups, and the results showed that the low-risk group had a stronger correlation with the immune response and had a better prognosis. A prognostic model was developed by integrating risk scores and clinical characteristics and showed a strong predictive effect.

Conclusions: In patients with rectal cancer, CYBB, CCR4, FOXP3, and CD80 are immune-related core genes, and low expression of each gene is associated with poor clinical prognosis. The risk score obtained on their basis is independent prognostic factors for rectal cancer, suggesting that the four core genes may provide a foundation for the development of new prognostic biomarkers for rectal cancer and the study of immunotherapy.

Keywords: Rectal cancer; tumor microenvironment; immune infiltration; survival analysis; prognostic model


Submitted Aug 25, 2024. Accepted for publication Dec 17, 2024. Published online Feb 26, 2025.

doi: 10.21037/tcr-24-1511


Highlight box

Key findings

• This study identified four immune-related genes in rectal cancer, namely, CYBB, CCR4, FOXP3, and CD80. The risk score derived from these four genes emerged as an independent prognostic factor for rectal cancer outcomes, underscoring its potential significance in clinical management and treatment strategies.

What is known and what is new?

• The incidence and mortality rates of rectal cancer remain alarmingly high, while the application of current immunotherapy options in this malignancy is still limited.

• The risk score based on four genes has been demonstrated to correlate with the prognosis of rectal cancer patients, as well as the degree of immune cell infiltration.

What is the implication, and what should change now?

CYBB, CCR4, FOXP3, and CD80 have emerged as potential novel targets for investigating immunotherapy in rectal cancer.


Introduction

Over the past few decades, the global incidence of new colorectal cancer (CRC) cases has demonstrated a consistent upward trend, making CRC the third most commonly diagnosed cancer worldwide in recent years (1). Rectal cancer, as a core component of CRC, poses a significant threat to global public health due to its notable incidence and mortality rates (2). Distant metastases remain a significant contributor to the mortality of CRC patients. Approximately 22% of CRC patients develop metastases during the course of their disease. When distant metastases occur, the 5-year survival rate drops sharply from 64.4% to 14.2% (3). For locally advanced rectal cancer, existing study has shown that the local recurrence rate of resectable stage II to III tumors is 15% to 65%, while stage III patients still have a recurrence rate of 20% to 30% after total mesorectal excision (4). Current treatment for advanced rectal cancer with distant metastases or locally advanced rectal cancer consists of total mesorectal excision, radiotherapy, and chemotherapy (5). However, the increasing number of chemotherapy-resistant cases has led to an urgent need for research on novel treatment options. Although there has been an increasing number of studies on immunotherapy for CRC in recent years, the existing studies on programmed death 1 (PD-1) and cytotoxic T-lymphocyte antigen 4 (CTLA-4), as well as the clinical trials conducted, have mainly targeted microsatellite-unstable bowel cancer (6). However, the majority of rectal cancers belong to the microsatellite stable type, which is not effective due to the low density of tumor-infiltrating lymphocytes (7,8). Therefore, there is a need for the development of new immune checkpoints that can be used for treatment.

The tumor microenvironment is the cellular environment surrounding the tumor within the organism. It includes various types of cells besides tumor cells, such as immune cells, fibroblasts, and endothelial cells. Additionally, it comprises several non-cellular components such as the extracellular matrix, cytokines, chemokines, and receptors (9). It has a significant impact on tumor growth, metastasis, and chemoresistance (10). Immune cells and stromal cells are the two main types of non-tumor components considered important for the diagnosis and prognostic assessment of tumors (10). Previous research has uncovered that the tumor immune microenvironment, especially CD3 and CD8 T cells, plays a crucial role in the progression of CRC (11). Currently, there are numerous studies on the tumor immune microenvironment (12-14), among which “Immunoscore” is one (15). It assesses prognosis by analyzing the number of two lymphocyte populations (CD3/CD45RO, CD3/CD8, or CD8/CD45RO) in both the central tumor area and the invasive margin (15). Compared to the current Tumor-Node-Metastasis (TNM) staging system for CRC, this score has been proven to have superior classification and prognostic capabilities (16-19). It has been recognized by several cancer immunology societies worldwide (20). Stromal cells have been discovered to facilitate CRC subtype conversion, modulate the body’s anti-tumor immune response, and modify tumor sensitivity to immunotherapy (21). Moreover, the expression of stromal cell-associated genes can more effectively predict the prognosis of different CRC subtypes compared to genes derived from epithelial tumor cells (22).

The Estimation of Stromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm was developed by Yoshihara et al. and utilizes gene expression data to estimate the levels of infiltrating immune and stromal cells in the tumor microenvironment, as well as tumor purity (23). Previous studies have applied this algorithm to analyze various types of cancer, including hepatocellular carcinoma (16,24), cutaneous malignant melanoma (25), and colon cancer (18,26), demonstrating the effectiveness of this big data-based algorithm.

Therefore, this study aimed to investigate immune-related genes within the tumor microenvironment of rectal cancer using the ESTIMATE algorithm, with the objective of identifying and developing novel biomarkers and therapeutic targets for rectal cancer. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1511/rc).


Methods

Data collection and processing

The gene expression data for rectal cancer (n=71) and rectosigmoid junction cancer (n=51) were downloaded from The Cancer Genome Atlas (TCGA) database (https://www.cancer.gov/ccg/research/genome-sequencing/tcga). The data types were specified as high-throughput sequencing (HTSeq)-Counts and HTSeq-fragments per kilobase million (FPKM). Subsequently, the data were integrated and processed to generate the gene expression matrix. Since the transcripts per million (TPM) data type involves normalizing gene length first and then sequencing depth on the raw data, TPM is more suitable for inter-sample comparison compared to HTSeq-FPKM (27,28). Since the gene names in the TCGA database are numbered with Ensembl and annotated with GENCODE v22, the gencode.v22.annotation.gtf.gz file downloaded from the TCGA website was used to convert the gene names to gene symbols. The corresponding clinical data were also downloaded, and cases were screened based on clinical characteristics. Cases with missing data, such as age, gender, TNM stage, survival time, and those with a follow-up time of less than 30 days, were excluded.

Analysis of immune score and stromal score

The gene expression matrix of rectal cancer was analyzed using the “ESTIMATE” package in R to obtain the immune score, stromal score, and ESTIMATE score (23). The “surv_cutpoint” function in the “survminer” package was then applied to calculate the best cutoff values for the three subjects based on survival time (29), and they were divided into high and low scoring groups accordingly. Kaplan-Meier curve and log-rank analysis were used to determine the difference in survival between the high and low score groups.

Identification of differentially expressed genes (DEGs)

To compare the differences in gene expression between the high and low groups in the immune and stromal scores, we used the “edgeR” package to analyze them separately (30). Gene expression fold change >2 [|log2(fold change)| >1] and P value <0.05 were used as the threshold values for screening TPM. Volcano plots were generated using “R” to visualize the statistically significant up- and down-regulated genes. The genes that were commonly expressed with the same trend in both groups were then identified as the final least absolute shrinkage and selection operator (LASSO) using a Venn diagram.

Enrichment analysis of DEGs

The Gene Ontology (GO) analysis was initially conducted on all DEGs, covering three main aspects: biological processes, molecular functions, and cellular components. Secondly, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathways were enriched in the KEGG Orthology Based Annotation System (KOBAS, http://kobas.cbi.pku.edu.cn/) database, and the top ten pathways in the enrichment results were displayed in bubble plots.

Constructing protein interaction networks and screening the most important molecular complexes

The Search Tool for the Retrieval of Interacting Genes (STRING, https://string-db.org/) database is a web-based online analysis tool that enables the construction and refinement of protein-protein interaction (PPI) networks, including direct binding between proteins and upstream-downstream regulatory relationships. We uploaded the identified DEGs to the STRING database for analysis and imported the results into Cytoscape software for reconfiguration, excluding proteins with interaction nodes ≤1. The remaining proteins were sorted based on the number of interaction nodes using the cytoHubba plugin, and the PPI network map was finalized. The Molecular Complex Detection (MCODE) plugin, which is based on a clustering algorithm, was also used to identify the molecular modules with the highest number of nodes and edges in the PPI network.

Screening of core genes

Univariate Cox analysis was conducted on all DEGs using overall survival (OS) (the time from the diagnosis of rectal cancer until death due to any cause) as the endpoint. Genes associated with prognosis were selected based on a threshold of P value <0.05. The screened gene set was intersected with genes in the molecular module, and genes that were common to both were selected for further analysis. To further reduce the bias caused by covariance data, the LASSO regression analysis was performed, and the genes obtained were used as the core genes.

Analysis of core genes

Survival curves were plotted for each core gene to visualize their relationship with survival time. The Tumor Immune Estimation Resource (TIMER) 2.0 database (http://timer.cistrome.org/) was utilized to analyze the correlation between each core gene and the distribution of six different immune cells (B cells, CD4+ T cells, CD8+ T cells, macrophages, myeloid dendritic cells, and neutrophils) in rectal cancer.

Calculation and analysis of risk scores

The core genes underwent multivariate Cox regression analysis to determine the regression coefficients. These coefficients were then used in conjunction with the gene expressions to compute the risk score using the formula: risk score = −0.29021*CYBB − 0.10752*CCR4 − 0.19254*FOXP3 + 0.08065*CD80. Similarly, the optimal cut-off value was determined based on survival time to divide the risk score into high-risk and low-risk groups. Kaplan-Meier curves were plotted to analyze the difference in survival between the two groups. The cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT) can be used as a tool to evaluate the density of various cell types within a cell population by analyzing gene expression levels. The analysis was conducted to examine the variance in immune cell infiltration between the high- and low-risk groups. Finally, the “clusterProfiler” package was used to analyze the gene set enrichment of different risk groups (31).

Construction and validation of prognostic models

Clinical characteristics such as sex, age, and TNM stage were integrated with risk scores in a univariate Cox analysis. Subsequently, statistically significant variables were incorporated into a multivariate Cox analysis to develop a model for predicting prognosis. This model was visually represented using a nomogram. The GSE103479 dataset from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database collected stage II and III CRC cases. We extracted 23 of these rectal cancer cases for validation of the prognostic model. The consistency index (C-index) and receiver operating characteristic (ROC) curve were used to demonstrate the predictive efficacy of the model.

Statistical analysis

The univariate and multivariate Cox analyses, Kaplan-Meier analysis, and LASSO regression analysis conducted in this paper were performed using the R language (version 3.6.2, www.r-project.org). All analyses were considered statistically significant with P value <0.05.

Ethical statement

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).


Results

Characteristics

Based on the case screening criteria, 122 cases of rectal cancer, including recto-basilar junction cancer, were ultimately included in the TCGA database. Age was categorized according to the mean value of 65 years. There were 68 patients aged 65 years or younger and 54 patients above 65 years old, with a male-to-female ratio of approximately 1.18:1. In classifying the T-stage statistics, it was found that most patients were located in T3 stage (71.31%), while relatively few patients were in T1, T2, and T4 stages (4.10%, 18.85%, and 5.74%). In terms of the N stage, 81.97% of the patients were in relatively early stages (N0 versus N1), of which 53.28% had no apparent lymph node metastases. Among the patients, the majority (104 patients, 85.25%) had no distant metastases. Due to the limited number of rectal cancer patients with survival data in the GEO database, we ultimately chose to analyze 23 cases with comprehensive clinical and follow-up information from the GSE103479 dataset. Twenty-three patients were diagnosed with stage II or III rectal cancer. Due to the limited amount of data, there was a significant disparity in age and gender distribution. The distribution of T and N stages was similar to that of the TCGA database, with 78.26% of patients in the T3 stage and 47.83% in the N0 stage. The specific clinical information characteristics of patients in the TCGA database and GSE103479 dataset are shown in Table 1.

Table 1

Clinical characteristics of patients in the TCGA database and GSE103479 dataset

Variables TCGA (n=122) GSE103479 (n=23)
Sex
   Female 56 (45.90) 9 (39.13)
   Male 66 (54.10) 14 (60.87)
Age (years)
   ≤65 68 (55.74) 9 (39.13)
   >65 54 (44.26) 14 (60.87)
T stage
   T1 5 (4.10) 0 (0)
   T2 23 (18.85) 2 (8.70)
   T3 87 (71.31) 18 (78.26)
   T4 7 (5.74) 3 (13.40)
N stage
   N0 65 (53.28) 11 (47.83)
   N1 35 (28.69) 7 (30.43)
   N2 22 (18.03) 5 (21.74)
M stage
   M0 104 (85.25) 23 (100)
   M1 18 (14.75)
Stage
   I 24 (19.67)
   II 41 (33.61) 11 (47.83)
   III 39 (31.97) 12 (52.17)
   IV 18 (14.75)

Data are presented as n (%). TCGA, The Cancer Genome Atlas.

Correlation of immune score and stromal score with survival time

The ESTIMATE calculation was applied to the gene expression matrix obtained from the TCGA database to determine the immune score and the stromal score. The ESTIMATE score is a composite expression of both of these scores. The immune score ranged from 295.24 to 4,239.70, while the stromal score varied from −1,954.95 to 3,495.27. Overall, the immune score was higher than the stromal score across the entire patient cohort. Figure 1A-1C display the Kaplan-Meier survival curves for the three scores divided into high and low groups based on the optimal cutoff values. All three high-score groups showed higher OS compared to the low-score group, suggesting that patients with high immune scores and high stromal scores had a better prognosis. In addition, we correlated the immune score and stromal score with clinical characteristics. The results indicated that only the N stage was significantly associated with the distribution of the immune score. Figure 1D shows that the immune scores of the N0 stage were generally higher than those of the N1 and N2 stages, and the P values were all less than 0.05.

Figure 1 Correlation of immune and stromal scores with overall survival and clinical characteristics. (A-C) The yellow color represents the high immune/stromal/ESTIMATE score group, while the blue color represents the low immune/stromal/ESTIMATE score group. The horizontal axis indicates the number of days of survival, and the vertical axis indicates the overall survival rate. (D) The solid points indicate the distribution of immune scores for each case in different N stages. ESTIMATE, Estimation of Stromal and Immune cells in MAlignant Tumor tissues using Expression data.

Identification of differential genes

Firstly, we analyzed the differential gene expression between the high and low immune score groups. A total of 1,260 DEGs were screened based on the criteria of P value <0.05 and |log2(fold change)| >1. This included 869 DEGs with up-regulated expression and 391 DEGs with down-regulated expression (Figure 2A). A total of 862 DEGs were identified between the two sets of stromal scores, comprising 786 DEGs with up-regulated expression and 76 DEGs with down-regulated expression (Figure 2B). Subsequently, 490 common DEGs were identified using Venn diagrams. Among these, 462 genes were up-regulated and 28 genes were down-regulated in both the high immune score and high stromal score groups (Figure 2C,2D).

Figure 2 Volcano and Venn plots of DEGs. (A) DEGs between high and low immune score groups. (B) DEGs between high and low stromal score groups. Red indicates elevated expression of this gene, blue indicates decreased expression of this gene, and gray shows no significant difference in gene expression. (C,D) The blue color represents the number of DEGs according to the immune score, the purple color represents the number of DEGs based on the stromal score, and the intersecting part represents the number of DEGs common to both. DEGs, differently expressed genes.

Functional enrichment analysis of DEGs

To further investigate the potential mechanism of the role of DEGs in the immune infiltration of the tumor microenvironment, we conducted GO functional annotation, KEGG pathway enrichment analysis, and Reactome pathway enrichment analysis. As depicted in the GO functional annotation of Figure 3A, we observed that DEGs were primarily associated with the formation of the extracellular matrix and certain immune complexes. These genes could enhance the activation of neutrophils in the immune response and also positively regulate the production of cytokines. The enrichment analysis results of the KEGG pathway in Figure 3B showed that DEGs were mainly involved in immune response pathways such as tuberculosis, Staphylococcus aureus infection, and rheumatoid arthritis, and were related to the interaction of cytokine-cytokine receptors. The results of the Reactome pathway enrichment analysis in Figure 3C showed that DEGs are involved in innate immunity, acquired immunity, cytokine signaling in the immune system, and the composition and degradation of the extracellular matrix.

Figure 3 Biological function analysis of DEGs. (A) GO functional annotation. The results of GO enrichment analysis of DEGs were categorized based on MF, BP, and CC. The top 10 GO term items with the smallest P value and the most significant enrichment in each GO category were selected. (B) KEGG pathway enrichment analysis. (C) Reactome pathway enrichment analysis. Vertical entries represent the names of the pathways, while the length of the horizontal plot indicates the gene ratios. The area of the circle represents the gene count. The depth of the color indicates the adjusted P value. BP, biological process; CC, cellular component; DEGs, differentially expressed genes; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Gene and Genomes; MF, molecular function; MHC, major histocompatibility complex.

Protein interaction network of differential genes

To better understand the interaction relationships among the identified DEGs, we utilized the STRING tool to generate a PPI network comprising 446 genes and 5,632 interaction links. After removing the protein-coding genes with interaction nodes ≤1, the interactions between 438 genes are depicted in Figure 4A. The nodes are color-coded in a gradual transition from red to yellow based on the high ranking of the number of nodes. We used MCODE in Cytoscape to visualize the PPI network. In the analysis results, individual gene modules were ranked according to the network score. The highest scoring gene modules represented the most critical and typical genes in the entire network. Hence, this section of genes was chosen for the subsequent analysis. To visually demonstrate the ordering of the node numbers for each gene in the module, we have arranged them from orange to blue (Figure 4B).

Figure 4 Protein-protein interaction of DEGs. (A) PPI network diagram of DEGs. According to the number of nodes, the color transitions gradually from red to yellow. (B) The most important molecular complexes in the PPI network. According to the number of nodes, the color transitions gradually from orange to blue. DEGs, differently expressed genes; PPI, protein-protein interaction.

Screening of core genes

We conducted univariate Cox regression analysis on 490 DEGs and identified 150 genes that were significantly associated with survival time in patients with rectal cancer. These 150 genes were then cross-analyzed with the aforementioned molecular complexes to identify 17 genes that overlapped with each other. To identify the genes most associated with survival time, LASSO Cox regression analysis was further performed on the 17 prognosis-related genes (Figure 5). The results indicated that the smallest partial likelihood deviation was achieved when only CYBB, CCR4, FOXP3, and CD80 were included in the model. Therefore, these four genes were defined as core genes. It was observed that the expression of the four core genes in the hyperimmune/stromal score was upregulated (Table S1).

Figure 5 LASSO analysis coefficient spectrum of 17 prognosis-related genes. (A) The upper abscissa is the number of non-zero coefficients in this model, and the ordinate is the coefficient value. (B) The numbers above the figure represent the number of parameters in the model. The horizontal axis shows the log(λ) values corresponding to the smallest target coefficient identified in the LASSO coefficient spectrum through 5-fold cross-validation. Two dashed lines are drawn at the smallest target coefficient and at one standard error from the smallest target coefficient. The result shows that the smallest target coefficient occurs when the number of included parameters is 4. LASSO, least absolute shrinkage and selection operator.

Analysis of core genes

To further investigate the relationship between core genes and the prognosis of rectal cancer, we plotted the survival curves using the Kaplan-Meier method. The results showed that among rectal cancer patients, those with high expression of CCR4, CD80, CYBB, and FOXP3 had a good prognosis (Figure 6A-6D). The expression of all four core genes was upregulated in the high immune/stromal score group, which was consistent with the better prognosis of patients with high immune/stromal score, further proving the reliability of our findings. We investigated the relationship between core genes and immune cell infiltration in rectal cancer tissues using the TIMER2.0 database. Figure 6E-6H show the correlation between the expression of CCR4, CD80, CYBB, and FOXP3 with tumor purity and six types of immune cells, respectively. The results showed that the high expression of all four core genes was negatively correlated with tumor purity and positively correlated with the distribution of B cells, CD4+ T cells, CD8+ T cells, myeloid dendritic cells, macrophages, and neutrophils in rectal cancer tissues. The P values were less than 0.05, except for the correlation between CCYB and CD80 and B cells, which were not statistically significant.

Figure 6 Analysis of the core genes. (A-D) Survival curves for core genes. Red lines represent high expression of the genes, and green lines represent low expression. (E-H) The correlation of core genes with immune cell infiltration. The correlation of CYBB, CCR4, FOXP3, and CD80 with the levels of cell infiltration of B cells, CD4+ T cells, CD8+ T cells, myeloid dendritic cells, macrophages, and neutrophils in rectal cancer was demonstrated. TPM, transcripts per million.

Analysis of risk scores

The four core genes were further subjected to multivariate Cox analysis to determine the corresponding regression coefficients (Table S2). Risk scores were calculated for each of the 122 rectal cancer patients based on the following equation: risk score = −0.29021*CYBB − 0.10752*CCR4 − 0.19254*FOXP3 + 0.08065*CD80. Subsequently, these patients were classified into low and high-risk groups. The survival analysis in Figure 7A indicates that the high-risk group was significantly associated with poor survival outcomes. The CIBERSORT analysis in Figure 7B showed that the high-risk score was associated with the infiltration distribution of multiple immune cells. All patients exhibited the highest distribution of resting type memory CD4+ T cells in tumor tissue. In contrast, low-risk patients showed greater macrophage infiltration in rectal cancer tissue, including M0, M1, and M2 subtypes of macrophages, compared to high-risk patients. Given the variances in prognosis and immune cell infiltration in tumor tissues between the high- and low-risk groups, we proceeded to extract DEGs between the low- and high-risk groups. These genes were then sorted based on gene expression ploidy from high to low for gene set enrichment analysis (GSEA) analysis to investigate the primary functional pathways associated with the genes in the low-risk group. As shown in Figure 7C,7D, genes up-regulated in the low-risk group were mainly enriched in immune-related activities, such as the activation of neutrophils in the immune response, involvement in the regulation of nucleotide-binding and oligomerization domain (NOD)-like receptor signaling pathways, and complement-receptor interactions. However, genes up-regulated in the high-risk group were rarely enriched in these pathways.

Figure 7 Analysis of risk scores. (A) Correlation of risk scores with survival time. (B) Correlation of risk score with immune cell infiltration. (C,D) GSEA of differentially expressed genes between low and high-risk score groups. (C) The GO functional annotations in the GSEA results, featuring the top 10 enrichment outcomes. (D) The KEGG-enriched pathways in the GSEA analyses with the top 10 enrichment results. ***, P<0.001; ****, P<0.0001. GO, Gene Ontology; GSEA, gene set enrichment analysis; KEGG, Kyoto Encyclopedia of Gene and Genomes; NK, natural killer; Treg, regulatory T cell.

Construction and validation of the prognostic model

To validate the independent prognostic value of the risk score, we performed univariate and multivariate Cox analyses by combining them with clinical characteristics (Table 2). The results of multivariate Cox analysis showed that only age and risk score were independent prognostic factors, and combined with clinical practice, we finally included age, N stage, M stage, and risk score in the prognostic model and visualized them with a nomogram (Figure 8A). The prognosis of patients with high-risk scores, age >65 years, and more lymph node metastases and distant metastases was poor as seen in the figure. We then calculated a C-index of 0.83 [95% confidence interval (CI): 0.75–0.91] for the prognostic model, and the calibration curves for the 3-year and 5-year prognosis depicted using R (Figure 8B,8C) showed a good agreement between the predicted and observed values, and the area under the curve for the 3-year and 5-year survival rates were 0.795 and 0.797, respectively (Figure 8D). All of these results indicated that this prognostic model could accurately predict the 3-year or 5-year survival rate of patients and could better distinguish patients with poor prognoses from those with good prognoses. Finally, to further validate the accuracy of the prognostic model, we used GSE103479 as the external validation data set, and the calibration curves and ROCs are shown in Figure 8E-8G, respectively. It can be seen that due to the small number of cases used as validation, the 95% confidence interval for each prediction point in the calibration curve was larger, but the overall prediction still showed good accuracy, with an area under the curve of 0.955 for 3-year survival and 0.666 for 5-year survival.

Table 2

Univariate and multivariate Cox analysis of risk scores combined with clinical characteristics

Variables Univariate analysis Multivariate analysis
HR (95% CI) P value HR (95% CI) P value
Sex
   Female Reference
   Male 0.86 (0.34–2.17) >0.05
Age (years)
   ≤65 Reference Reference
   >65 9.32 (2.14–40.66) 0.003 2.15 (1.91–38.37) 0.005
T stage
   T1 Reference
   T2 1 >0.05
   T3 1.800 (1.430–2.267) >0.05
   T4 1 >0.05
N stage
   N0 Reference Reference
   N1 3.20 (1.05–9.76) 0.04 0.50 (0.10–2.63) 0.42
   N2 2.62 (0.79–8.72) >0.05 1.29 (0.34–4.87) 0.71
M stage
   M0 Reference Reference
   M1 4.05 (1.47–11.12) 0.007 2.81 (0.84–9.36) 0.09
Stage
   I Reference
   II 1.02 (0.18–5.64) >0.05
   III 2.04 (0.40–10.39) >0.05
   IV 5.48 (1.08–27.74) 0.04
Risk score
   Low Reference Reference
   High 6.17 (2.43–15.70) <0.001 5.84 (1.72–19.81) 0.004

CI, confidence interval; HR, hazard ratio.

Figure 8 Construction and validation of prognostic models. (A) The nomogram is used to predict 3- and 5-year survival in patients with rectal cancer. (B-D) Calibration curve and ROC curve of the nomogram. (E-G) Calibration curve and ROC curve were validated using the GSE103479 dataset. Calibration curve: the X-axis represents the survival probability predicted by the nomogram, while the Y-axis represents the observed survival probability. The more the solid red line aligns with the gray line, the more accurate the prediction. AUC, area under the curve; OS, overall survival; ROC, receiver operating characteristic.

Discussion

In this study, we analyzed the gene expression data from TCGA using the ESTIMATE algorithm. We discovered a positive correlation between the immune score, stromal score, overall score, and survival time. This finding has also been validated in other studies (32,33). Subsequently, DEGs between the high and low immunity score groups (or high and low stroma score groups) were analyzed and screened for overlapping genes between the two. A total of 1,260 common DEGs were identified. Functional enrichment analysis revealed that our DEGs are associated with extracellular matrix composition and degradation, inflammatory responses in the tumor microenvironment, and the regulation of innate or acquired immune responses, specifically the activation of neutrophils. Some studies have shown that tumor-associated neutrophils have tumor-supportive functions, promoting tumor invasion, angiogenesis, and distant migration. However, other studies have found anti-tumorigenic roles for neutrophils in intestinal cancer (34,35). Therefore, the specific role of neutrophils in bowel cancer needs to be further explored.

Next, we combined the protein interaction module with Cox survival analysis and identified four core genes—CYBB, CCR4, FOXP3, and CD80—that are linked to the prognosis of rectal cancer patients. CYBB is involved in the formation of nicotinamide adenine dinucleotide phosphate hydrogen (NADPH) oxidase 2 (NOX2), which typically catalyzes the reduction of oxygen molecules to generate reactive oxygen species (ROS), thereby contributing to the defense against microbial pathogens (36). In a study of mouse models, Cybb−/− mice were found to have significantly reduced metastasis of melanoma compared to wild-type mice. Restoration of metastatic capacity was observed when macrophages from wild-type mice were transplanted into Cybb−/− mice. It was hypothesized that the metastatic behavior of tumors could be mainly attributed to ROS produced by NOX2 of macrophages (37). This suggests that CYBB deficiency may inhibit tumor cell invasiveness. CCR4 is a chemokine receptor that can influence the migration of effector regulatory T cells (Tregs) (38). Tumor cells can produce ligands for CCR4, such as CCL17 and CCL22, which cause Tregs to aggregate around tumor cells, thereby suppressing anti-tumor immunity (39). It has been shown that CCR4 is highly expressed in colon cancer cells and is influenced by TNF-α to enhance their migration. CCR4 antibodies or antagonists have been found to attenuate the migration of colon cancer cells (40). CCR4 is associated with a poor prognosis in gastric cancer, breast cancer, tongue cancer, and renal cell carcinoma (41-43). However, the results of the present study suggest that CCR4 may act as a protective factor for prognosis in rectal cancer. Further investigation is needed to determine if CCR4 has other mechanisms of action in intestinal cancer. FOXP3 expression is increased in colorectal tumor tissue compared to normal intestinal mucosa (44). Our study revealed that patients with rectal cancer and high FOXP3 expression had a more favorable prognosis. For example, Salama et al. found that a high density of FOXP3+ Tregs in CRC is associated with better survival rates (45), whereas Betts et al. discovered that high levels of FOXP3+ Tregs can lead to disease progression in CRC patients (46). Another study may offer an explanation for this. Saito et al. found that CRC can be categorized into type A (IL12AlowTGFB1low) and type B (IL12AhighTGFB1high) based on the expression levels of IL12A and TGFB1. In type A CRC, high expression of the FOXP3 gene is associated with a better prognosis compared to low expression of FOXP3. On the contrary, in type B CRC, high FOXP3 gene expression was associated with a poorer prognosis (47). CD80 maintains the balance between T-cell activation and suppression. Its expression is significantly increased in epithelial cells of colon precancerous lesions, playing a major role in immune surveillance (48,49). CD80 has been found to be expressed at low levels in a variety of tumor cell lines in mice, which enhances the resistance of cancer cells to the immune system (50). These core genes are directly or indirectly involved in the regulation of rectal cancer.

A risk score formula was established based on the four core genes, and all cases were divided into high- and low-risk groups for comparison. The results showed that the low-risk group had a relatively better prognosis, which may be related to the fact that more genes are involved in immune activities in the low-risk group (51,52). Finally, a prognostic prediction model was established by combining the risk score and clinical characteristics. The results of the model analysis reconfirmed that the risk score can be used as an independent prognostic factor to assess the survival time of rectal cancer patients. GSE103479 was used for external data validation, further demonstrating the model’s strong predictive capabilities.

However, our research still has its limitations. Firstly, the validation data solely originates from the GEO database, resulting in a relatively small sample size that may affect the broad applicability of our results. Secondly, to gain a more accurate understanding of the expression levels of these four genes in rectal cancer samples, we need to conduct more basic experimental research. Lastly, we should undertake further studies to clarify the precise mechanisms through which these four genes operate in rectal cancer.


Conclusions

Changes in the tumor microenvironment and infiltration of various immune cells impact the development and progression of rectal cancer. Our study analyzed immune and stromal cells in rectal cancer tissues and identified four prognostic genes associated with the tumor microenvironment. The combined derived risk scores were also found to be useful in predicting the survival time of rectal cancer patients. This finding may provide new biomarkers for predicting the prognosis of rectal cancer patients and developing personalized immunotherapy strategies.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1511/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1511/prf

Funding: This study was supported by Science and Technology Bureau of Jiaxing City, Zhejiang Province, China (No. 2022AD10012 to R.C., and No. 2024AY10028 to H.Y.).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1511/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63. [Crossref] [PubMed]
  2. Siegel RL, Wagle NS, Cercek A, et al. Colorectal cancer statistics, 2023. CA Cancer J Clin 2023;73:233-54. [Crossref] [PubMed]
  3. McQuade RM, Stojanovska V, Bornstein JC, et al. Colorectal Cancer Chemotherapy: The Evolution of Treatment and New Approaches. Curr Med Chem 2017;24:1537-57. [Crossref] [PubMed]
  4. Li Y, Wang J, Ma X, et al. A Review of Neoadjuvant Chemoradiotherapy for Locally Advanced Rectal Cancer. Int J Biol Sci 2016;12:1022-31. [Crossref] [PubMed]
  5. De Felice F, Miccini M, Botticelli A, et al. The multidisciplinary management of locally advanced rectal cancer. Expert Rev Anticancer Ther 2024;24:581-7. [Crossref] [PubMed]
  6. Deng Z, Luo Y, Chen X, et al. Pathological response following neoadjuvant immunotherapy and imaging characteristics in dMMR/MSI-H locally advanced colorectal cancer. Front Immunol 2024;15:1466497. [Crossref] [PubMed]
  7. Niu CG, Zhang J, Rao AV, et al. Comparative effectiveness of immunotherapy and chemotherapy in patients with metastatic colorectal cancer stratified by microsatellite instability status. World J Clin Oncol 2024;15:540-7. [Crossref] [PubMed]
  8. Ganesh K, Stadler ZK, Cercek A, et al. Immunotherapy in colorectal cancer: rationale, challenges and potential. Nat Rev Gastroenterol Hepatol 2019;16:361-75. [Crossref] [PubMed]
  9. Hinshaw DC, Shevde LA. The Tumor Microenvironment Innately Modulates Cancer Progression. Cancer Res 2019;79:4557-66. [Crossref] [PubMed]
  10. Hanahan D, Coussens LM. Accessories to the crime: functions of cells recruited to the tumor microenvironment. Cancer Cell 2012;21:309-22. [Crossref] [PubMed]
  11. Galon J, Costes A, Sanchez-Cabo F, et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 2006;313:1960-4. [Crossref] [PubMed]
  12. Hanus M, Parada-Venegas D, Landskron G, et al. Immune System, Microbiota, and Microbial Metabolites: The Unresolved Triad in Colorectal Cancer Microenvironment. Front Immunol 2021;12:612826. [Crossref] [PubMed]
  13. Schmitt M, Greten FR. The inflammatory pathogenesis of colorectal cancer. Nat Rev Immunol 2021;21:653-67. [Crossref] [PubMed]
  14. Wang Y, Zhong X, He X, et al. Liver metastasis from colorectal cancer: pathogenetic development, immune landscape of the tumour microenvironment and therapeutic approaches. J Exp Clin Cancer Res 2023;42:177. [Crossref] [PubMed]
  15. Galon J, Pagès F, Marincola FM, et al. The immune score as a new possible approach for the classification of cancer. J Transl Med 2012;10:1. [Crossref] [PubMed]
  16. Deng Z, Wang J, Xu B, et al. Mining TCGA Database for Tumor Microenvironment-Related Genes of Prognostic Value in Hepatocellular Carcinoma. Biomed Res Int 2019;2019:2408348. [Crossref] [PubMed]
  17. Yang S, Liu T, Nan H, et al. Comprehensive analysis of prognostic immune-related genes in the tumor microenvironment of cutaneous melanoma. J Cell Physiol 2020;235:1025-35. [Crossref] [PubMed]
  18. Alonso MH, Aussó S, Lopez-Doriga A, et al. Comprehensive analysis of copy number aberrations in microsatellite stable colon cancer in view of stromal component. Br J Cancer 2017;117:421-31. [Crossref] [PubMed]
  19. Bruni D, Angell HK, Galon J. The immune contexture and Immunoscore in cancer prognosis and therapeutic efficacy. Nat Rev Cancer 2020;20:662-80. [Crossref] [PubMed]
  20. Galon J, Pagès F, Marincola FM, et al. Cancer classification using the Immunoscore: a worldwide task force. J Transl Med 2012;10:205. [Crossref] [PubMed]
  21. Turley SJ, Cremasco V, Astarita JL. Immunological hallmarks of stromal cells in the tumour microenvironment. Nat Rev Immunol 2015;15:669-82. [Crossref] [PubMed]
  22. Calon A, Lonardo E, Berenguer-Llergo A, et al. Stromal gene expression defines poor-prognosis subtypes in colorectal cancer. Nat Genet 2015;47:320-9. [Crossref] [PubMed]
  23. Yoshihara K, Shahmoradgoli M, Martínez E, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2013;4:2612. [Crossref] [PubMed]
  24. Shi X, Shi D, Yin Y, et al. Cuproptosis-associated genes (CAGs) contribute to the prognosis prediction and potential therapeutic targets in hepatocellular carcinoma. Cell Signal 2024;117:111072. [Crossref] [PubMed]
  25. Yang E, Ding Q, Fan X, et al. Machine learning modeling and prognostic value analysis of invasion-related genes in cutaneous melanoma. Comput Biol Med 2023;162:107089. [Crossref] [PubMed]
  26. Wang Y, Lin MG, Meng L, et al. Identification of necroptosis-related genes for predicting prognosis and exploring immune infiltration landscape in colon adenocarcinoma. Front Oncol 2022;12:941156. [Crossref] [PubMed]
  27. Li B, Ruotti V, Stewart RM, et al. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 2010;26:493-500. [Crossref] [PubMed]
  28. Zhao S, Ye Z, Stanton R. Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA 2020;26:903-9. [Crossref] [PubMed]
  29. Liu TT, Li R, Huo C, et al. Identification of CDK2-Related Immune Forecast Model and ceRNA in Lung Adenocarcinoma, a Pan-Cancer Analysis. Front Cell Dev Biol 2021;9:682002. [Crossref] [PubMed]
  30. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010;26:139-40. [Crossref] [PubMed]
  31. Yu G, Wang LG, Han Y, et al. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 2012;16:284-7. [Crossref] [PubMed]
  32. Wang Y, Wang Y, Xu C, et al. Identification of Novel Tumor-Microenvironment-Regulating Factor That Facilitates Tumor Immune Infiltration in Colon Cancer. Mol Ther Nucleic Acids 2020;22:236-50. [Crossref] [PubMed]
  33. Zhang X, Zhao H, Shi X, et al. Identification and validation of an immune-related gene signature predictive of overall survival in colon cancer. Aging (Albany NY) 2020;12:26095-120. [Crossref] [PubMed]
  34. Mizuno R, Kawada K, Itatani Y, et al. The Role of Tumor-Associated Neutrophils in Colorectal Cancer. Int J Mol Sci 2019;20:529. [Crossref] [PubMed]
  35. Granot Z, Jablonska J. Distinct Functions of Neutrophil in Cancer and Its Regulation. Mediators Inflamm 2015;2015:701067. [Crossref] [PubMed]
  36. Martner A, Aydin E, Hellstrand K. NOX2 in autoimmunity, tumor growth and metastasis. J Pathol 2019;247:151-4. [Crossref] [PubMed]
  37. Okada F, Kobayashi M, Tanaka H, et al. The role of nicotinamide adenine dinucleotide phosphate oxidase-derived reactive oxygen species in the acquisition of metastatic ability of tumor cells. Am J Pathol 2006;169:294-302. [Crossref] [PubMed]
  38. Wegrzyn AS, Kedzierska AE, Obojski A. Identification and classification of distinct surface markers of T regulatory cells. Front Immunol 2022;13:1055805. [Crossref] [PubMed]
  39. Feng G, Bajpai G, Ma P, et al. CCL17 Aggravates Myocardial Injury by Suppressing Recruitment of Regulatory T Cells. Circulation 2022;145:765-82. [Crossref] [PubMed]
  40. Ou B, Zhao J, Guan S, et al. CCR4 promotes metastasis via ERK/NF-κB/MMP13 pathway and acts downstream of TNF-α in colorectal cancer. Oncotarget 2016;7:47637-49. [Crossref] [PubMed]
  41. Lee JH, Cho YS, Lee JY, et al. The chemokine receptor CCR4 is expressed and associated with a poor prognosis in patients with gastric cancer. Ann Surg 2009;249:933-41. [Crossref] [PubMed]
  42. Li JY, Ou ZL, Yu SJ, et al. The chemokine receptor CCR4 promotes tumor growth and lung metastasis in breast cancer. Breast Cancer Res Treat 2012;131:837-48. [Crossref] [PubMed]
  43. Liu Q, Rexiati M, Yang Y, et al. Expression of chemokine receptor 4 was associated with poor survival in renal cell carcinoma. Med Oncol 2014;31:882. [Crossref] [PubMed]
  44. Xu J, Gao Y, Ding Y, et al. Correlation between Tregs and ICOS-induced M2 macrophages polarization in colorectal cancer progression. Front Oncol 2024;14:1373820. [Crossref] [PubMed]
  45. Salama P, Phillips M, Grieu F, et al. Tumor-infiltrating FOXP3+ T regulatory cells show strong prognostic significance in colorectal cancer. J Clin Oncol 2009;27:186-92. [Crossref] [PubMed]
  46. Betts G, Jones E, Junaid S, et al. Suppression of tumour-specific CD4+ T cells by regulatory T cells is associated with progression of human colorectal cancer. Gut 2012;61:1163-71. [Crossref] [PubMed]
  47. Saito T, Nishikawa H, Wada H, et al. Two FOXP3(+)CD4(+) T cell subpopulations distinctly control the prognosis of colorectal cancers. Nat Med 2016;22:679-84. [Crossref] [PubMed]
  48. Greenwald RJ, Freeman GJ, Sharpe AH. The B7 family revisited. Annu Rev Immunol 2005;23:515-48. [Crossref] [PubMed]
  49. Marchiori C, Scarpa M, Kotsafti A, et al. Epithelial CD80 promotes immune surveillance of colonic preneoplastic lesions and its expression is increased by oxidative stress through STAT3 in colon cancer cells. J Exp Clin Cancer Res 2019;38:190. [Crossref] [PubMed]
  50. Tirapu I, Huarte E, Guiducci C, et al. Low surface expression of B7-1 (CD80) is an immunoescape mechanism of colon carcinoma. Cancer Res 2006;66:2442-50. [Crossref] [PubMed]
  51. Talaat IM, Elemam NM, Saber-Ayad M. Complement System: An Immunotherapy Target in Colorectal Cancer. Front Immunol 2022;13:810993. [Crossref] [PubMed]
  52. Masucci MT, Minopoli M, Carriero MV. Tumor Associated Neutrophils. Their Role in Tumorigenesis, Metastasis, Prognosis and Therapy. Front Oncol 2019;9:1146. [Crossref] [PubMed]
Cite this article as: Cai R, Hong Z, Yin H, Chen H, Qin M, Huang Y. Constructing and validating a novel prognostic risk score model for rectal cancer based on four immune-related genes. Transl Cancer Res 2025;14(2):1053-1069. doi: 10.21037/tcr-24-1511

Download Citation