Integrating prognosis-related genes with immune-related gene signature for development and validation of a survival stratification model for early-stage colorectal cancer
Highlight box
Key findings
• Integrating prognosis-related genes with an immune-related gene signature improves the prognostic performance of the prediction model for survival stratification of early-stage colorectal cancer (CRC).
What is known and what is new?
• Current prognostic factors are inadequate for identifying patients at risk of disease recurrence after treatment for early-stage CRC.
• An individualized gene index based on gene-pairs was built to estimate prognosis in patients with early-stage CRC and it is robust against the technical biases across different sequencing platforms.
What is the implication, and what should change now?
• Prospective studies are needed to evaluate the clinical utility of the signature in predicting prognosis and assisting in treatment decisions.
Introduction
The role of adjuvant chemotherapy in early-stage colorectal cancer (CRC) remains a subject of ongoing debate (1-5). Currently, clinical guidelines typically recommend adjuvant therapy based on specific high-risk factors, such as T4 primary tumors, poor tumor differentiation, bowel perforation, obstruction, lymphovascular and perineural invasion, and harvesting of fewer than 12 lymph nodes (1,6,7). One study integrated multiple clinical features into a nomogram that significantly improved the accuracy of CRC prognosis prediction (8) and offered a strategy for prognostic stratification of early-stage disease. However, despite a degree of specificity, these clinical prognostic factors are not accurate enough to identify patients with a potentially poor prognosis, especially in early-stage cases (9). A subset of patients without these high-risk clinical features still suffers recurrence (10,11), underscoring the urgent need for a novel signature to stratify early-stage CRC patients by recurrence risk.
Gene expression signatures have become valuable tools for customizing therapies for CRC patients, taking into account the unique molecular characteristics of each tumor (12-14). In recent years, several gene sets have shown potential for predicting CRC prognosis, including cuproptosis-related genes (15), telomere-related genes (16), and mitochondrial-related genes (17). Each has demonstrated high accuracy, but their utility in early-stage CRC remains unclear. Multi-omics detection strategies that integrate genomics, transcriptomics, and proteomics have also been proposed (18,19), but their technical complexity and high cost have limited clinical adoption. Several multi-gene prognostic signatures, including Oncotype DX, ColoPrint, and ColDx, have demonstrated potential for predicting outcomes in early-stage CRC and have already been applied in clinical tests (20-22); however, their limited accuracy and issues such as insufficient validation or overfitting on small discovery datasets hamper their further application (11,23). Furthermore, the integration of datasets for broader analysis remains difficult due to biological variability and technical inconsistencies across sequencing platforms (24). To address these limitations, innovative approaches have been introduced that focus on the relative ranking of gene expression levels. These methods aim to standardize data processing, providing more reliable signatures for clinical application (25-27).
The immune system has been shown to affect all aspects of CRC, from tumorigenesis to treatment (28,29). Prognosis in CRC patients is considered to be positively correlated with enrichment of signatures for type I adaptive immune response and T cell signaling within the tumor stroma (28). Several prognostic algorithms based on immune signatures have emerged (30-33), according to which the presence of CD8+ T cells, CD27−CD45RA− effector memory T cells, and a Th1 gene signature has been linked to improved disease-free survival (DFS). However, despite these advancements, the full prognostic potential of immune-based molecular characteristics in CRC, particularly in early-stage patients, remains underexplored and requires further investigation to determine their broader clinical utility.
Here, we leveraged five independent datasets to develop and validate a novel individualized prognostic signature for CRC: a gene pair index (GPI) of 10 gene pairs incorporating immune-related gene signatures (IRGS). Additionally, we assessed the prognostic and predictive capabilities of the GPI, particularly in relation to the effectiveness of adjuvant chemotherapy in early-stage CRC patients. This approach aims to enhance personalized treatment strategies by offering a more accurate prediction of patient outcomes based on molecular and immune-related markers. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2024-2511/rc).
Methods
Patients
Gene expression profiles from five independent public cohorts were retrospectively analyzed. Among them, 309 CRC patients who did not receive adjuvant chemotherapy were identified from the Cluster Identification Tool (CIT) gene microarray dataset (GSE39582) and served as the discovery cohort. The two larger datasets, CIT/GSE39582 and The Cancer Genome Atlas (TCGA) CRC, were selected for model training and external validation-1, respectively. The remaining three cohorts (GSE17536, GSE33113, and GSE39084) were merged into a meta-validation cohort (validation-2). The GSE datasets were downloaded from Gene Expression Omnibus (GEO) in their sorted form using the Bioconductor package ‘GEOquery’. The RNA expression profile of the TCGA CRC cohort was obtained from Broad GDAC Firehose (http://gdac.broadinstitute.org/), with log2-transformed transcripts per million (TPM) used for analysis. Batch effects were removed using the ‘ComBat’ function in the R package ‘sva’. Overall, 1,525 patients were included in our study. Data collection occurred from July 22 to October 22, 2018, with both paper charts and electronic medical records reviewed as needed. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Institutional Review Board of The Sixth Affiliated Hospital, Sun Yat-sen University (approval No. 2018ZSLYEC-082). The clinical and pathological characteristics of patients in each dataset are outlined in Table 1.
Table 1
| Characteristics | GSE39582 (n=566) | TCGA (n=624) | Meta-validation (n=335) |
|---|---|---|---|
| Age (years), mean (SD) | 66.9 (13.3) | 66.3 (12.8) | 65.5 (14.6) |
| Sex | | | |
| Male | 310 | 332 | 171 |
| Female | 256 | 292 | 164 |
| Tumor stage | | | |
| 1 | 33 | 105 | 32 |
| 2 | 264 | 229 | 170 |
| 3 | 205 | 181 | 73 |
| 4 | 60 | 70 | 26 |
| NA | 4 | 39 | 34 |
| T stage | | | |
| T1 | 12 | 21 | 1 |
| T2 | 45 | 105 | 11 |
| T3 | 367 | 425 | 119 |
| T4 | 119 | 70 | 26 |
| NA | 23 | 3 | 178 |
| Tumor location | | | |
| Left | 342 | 354 | 36 |
| Right | 224 | 270 | 31 |
| NA | 0 | 0 | 268 |
| Adjuvant chemotherapy | | | |
| With | 316 | 393 | 0 |
| Without | 233 | 231 | 0 |
| NA | 17 | 0 | 335 |
| MMR status | | | |
| MSI | 75 | 189 | 44 |
| MSS | 444 | 431 | 114 |
| NA | 47 | 4 | 177 |
| CIMP status | | | |
| Positive | 91 | 0 | 39 |
| Negative | 405 | 0 | 118 |
| NA | 70 | 624 | 178 |
| CIN status | | | |
| Positive | 353 | 0 | 0 |
| Negative | 110 | 0 | 0 |
| NA | 103 | 624 | 335 |
| RFS event | | | |
| Yes | 177 | 100 | 79 |
| No | 380 | 416 | 222 |
| NA | 9 | 108 | 34 |
| OS event | | | |
| Yes | 191 | 67 | 99 |
| No | 371 | 557 | 146 |
| NA | 4 | 0 | 90 |
| DFS event | | | |
| Yes | 248 | 146 | 133 |
| No | 314 | 386 | 201 |
| NA | 4 | 92 | 1 |
CIMP, CpG island methylator phenotype; CIN, chromosomal instability; DFS, disease-free survival; MMR, mismatch repair; MSI, microsatellite instable; MSS, microsatellite stable; NA, not available; OS, overall survival; RFS, recurrence-free survival; SD, standard deviation; T, tumor; TCGA, The Cancer Genome Atlas.
Development and validation of the individualized GPI-based prognostic signature
The general workflow is shown in Figure 1. DFS was chosen as the outcome variable of the study. A total of 18,113 genes were measured across these datasets. We first selected 6,054 genes with expression levels higher than average and a median absolute deviation (MAD) over 0.5 among the 309 patients from the discovery cohort. Gene candidates were further refined by applying Cox proportional hazards regression with 1,000 randomizations (using 80% of the patients in each iteration) to evaluate the association between individual genes and DFS in the discovery dataset. This approach supported robust selection by reducing overfitting and increasing the likelihood of identifying genes with consistent prognostic value across different patient subsets. Immune-related genes were obtained from the ImmPort database (https://immport.niaid.nih.gov), among which 463 genes overlapping with genes in the discovery dataset were selected. These two gene sets were then combined, resulting in 753 unique genes, which were used to form 283,128 gene pairs [C(753, 2) =283,128]. Each gene pair was constructed by pairwise comparison of gene expression levels within a specific sample. In this gene pair-based approach, a score of one was assigned if the expression of gene 1 was lower than that of gene 2 in a specific sample; otherwise, a score of zero was assigned. This method offers a key advantage: it relies solely on the gene expression profile of each tumor sample, making it applicable on an individualized basis without the need for normalization across datasets. However, gene pairs that yielded constant values (either always zero or always one) in a specific platform or dataset were excluded. These constant values could arise from two factors: (I) platform-dependent biases in measurement, which might affect reproducibility across different platforms; and (II) biologically preferential transcription, which might fail to provide discriminative information for patient survival. After filtering out such gene pairs, 373 gene pairs remained for further analysis.
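The within-sample scoring and the constant-pair filter described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names `gene_pair_scores` and `drop_constant_pairs`, the gene labels, and the toy expression values are all hypothetical.

```python
from itertools import combinations
from math import comb


def gene_pair_scores(sample, genes):
    """Score every gene pair in one sample: 1 if gene 1 < gene 2, else 0."""
    return {
        (g1, g2): 1 if sample[g1] < sample[g2] else 0
        for g1, g2 in combinations(genes, 2)
    }


def drop_constant_pairs(score_rows):
    """Keep only pairs whose score varies across the samples of a dataset."""
    pairs = score_rows[0].keys()
    return [p for p in pairs if len({row[p] for row in score_rows}) > 1]


# Toy expression values (hypothetical, for illustration only).
genes = ["G1", "G2", "G3"]
samples = [
    {"G1": 2.0, "G2": 5.0, "G3": 1.0},
    {"G1": 6.0, "G2": 4.0, "G3": 1.5},
    {"G1": 3.0, "G2": 7.0, "G3": 0.5},
]

rows = [gene_pair_scores(s, genes) for s in samples]
informative = drop_constant_pairs(rows)

# Number of unordered pairs that can be formed from 753 genes.
n_pairs_753 = comb(753, 2)
```

In this toy example only the pair (G1, G2) changes value between samples, so it is the only pair retained. Because each score depends only on the ordering of two genes within one sample, it is unchanged by any monotone rescaling of that sample's expression values.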
In this study, all stages of CRC were included, as the molecular determinants of prognosis are likely shared across different stages. Gene pairs with prognostic significance, defined by a family-wise error rate of less than 0.05, were considered as candidates for building the GPI. To reduce the risk of overfitting, we used a Cox proportional hazards regression model combined with the least absolute shrinkage and selection operator (LASSO). The penalty parameter for LASSO was estimated using 10-fold cross-validation in the training dataset, selecting the value at one standard error (SE) beyond the minimum partial likelihood deviance, which balances predictive accuracy with model simplicity. This method ensured that the final set of gene pairs used in the GPI had strong prognostic relevance without overfitting the data.
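The one-standard-error rule for choosing the LASSO penalty can be illustrated with a short sketch. It assumes the cross-validated mean deviance and its SE are already available for each candidate penalty; the helper name `lambda_one_se` and all numbers are hypothetical, and the fitting of the penalized Cox model itself is not shown.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Return (penalty at minimum deviance, largest penalty within one SE of it)."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = [lam for lam, m in zip(lambdas, cv_mean) if m <= threshold]
    return lambdas[i_min], max(eligible)


# Hypothetical 10-fold cross-validation summary (partial likelihood deviance).
lambdas = [0.20, 0.10, 0.05, 0.02, 0.01]
cv_mean = [12.9, 12.3, 12.0, 11.9, 12.2]
cv_se = [0.30, 0.30, 0.25, 0.20, 0.25]

lam_min, lam_1se = lambda_one_se(lambdas, cv_mean, cv_se)
```

Here the minimum deviance occurs at a penalty of 0.02, while the rule selects 0.05: a stronger penalty, and therefore a sparser model, whose cross-validated deviance is statistically indistinguishable from the minimum.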
To classify patients into low- or high-risk groups, the optimal cutoff for the GPI was determined using a time-dependent receiver operating characteristic (ROC) curve analysis at 5 years in the training dataset, implemented with the ‘survivalROC’ package (version 1.0.3). The Kaplan-Meier method was used to estimate the ROC curve, and the cutoff value for GPI was defined as the point on the ROC curve that had the shortest distance to the ideal point (100% true-positive rate and 0% false-positive rate).
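The shortest-distance criterion amounts to choosing the cutoff whose ROC point lies nearest to the top-left corner of the ROC plot, where the true-positive rate (TPR) is 1 and the false-positive rate (FPR) is 0. A minimal sketch follows, assuming the time-dependent TPR and FPR at each candidate cutoff have already been estimated; the function name `closest_to_corner` and the values are hypothetical.

```python
from math import hypot


def closest_to_corner(cutoffs, tpr, fpr):
    """Return the cutoff whose ROC point is nearest to (FPR=0, TPR=1)."""
    distances = [hypot(f, 1.0 - t) for t, f in zip(tpr, fpr)]
    best = min(range(len(cutoffs)), key=lambda i: distances[i])
    return cutoffs[best], distances[best]


# Hypothetical ROC coordinates at 5 years for four candidate cutoffs.
cutoffs = [0.1, 0.3, 0.5, 0.7]
tpr = [0.95, 0.80, 0.60, 0.30]
fpr = [0.70, 0.30, 0.15, 0.05]

best_cutoff, best_distance = closest_to_corner(cutoffs, tpr, fpr)
```

With these values the cutoff 0.3 is selected, since its ROC point (FPR 0.30, TPR 0.80) lies about 0.36 from the ideal corner, closer than any other candidate.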
The prognostic value of the GPI was then evaluated in stage-specific (I/II) CRC patients, as well as across all stages, using univariate analyses in both the training and independent validation cohorts. To further refine the prognostic assessment, GPI was integrated with available clinical and pathological variables in multivariate analyses, ensuring a more comprehensive evaluation of its predictive capability.
Functional annotation and analysis
To better understand the biological relevance of the GPI, we performed gene set enrichment analysis (GSEA) on the immune-related genes that composed the GPI using the Bioconductor package ‘HTSanalyzeR’ (34). The reference gene list consisted of genes that were measured across all platforms to ensure consistency. We examined gene sets from the H (hallmark), C2 (curated), and C5 (gene ontology) categories of the Molecular Signatures Database (MSigDB) (35). Patients were categorized into different immune risk groups based on their GPI values. In the TCGA dataset, we also utilized RNA sequencing data and information regarding the infiltration of various immune cells, such as lymphocytes, monocytes, and neutrophils, along with the necrosis percentage of tumor samples. The Estimation of STromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE) (36) algorithm was applied to estimate the proportions of immune and stromal cells in tumor tissues. The pathological characteristics, including immune cell infiltration and stromal content, were compared between different immune risk groups using both one-tailed and two-tailed t-tests, to assess the statistical significance of differences in immune and stromal cell infiltration across these groups.
Statistical analysis
Statistical analyses were conducted using the Statistical Package for Social Sciences (SPSS, version 22.0.0, IBM Corporation) and R software (version 3.5.1). Descriptive statistics were calculated for all variables, with mean and standard deviation (SD) or median and interquartile range (IQR) reported for continuous variables, and frequencies for categorical variables. Univariate analysis of the association between the GPI and other clinicopathologic factors with DFS was performed using the log-rank test. For factors that were significantly associated with DFS in univariate analyses, multivariate analysis was conducted using the Cox proportional hazards regression model to adjust for potential confounders. The C-index, a measure of the predictive accuracy of the model, was calculated using the ‘survcomp’ package (version 1.22.0), and model performance was compared using the ‘compareC’ package (version 1.3.1). Time-dependent ROC curve analysis was performed using the ‘timeROC’ package (version 0.4) and area under the curve (AUC) was calculated to represent the accuracy of the prognostic model. A two-sided P value less than 0.05 was considered statistically significant.
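For readers unfamiliar with the C-index, the following sketch computes Harrell's concordance for right-censored data. It is an illustration only: the estimator and confidence intervals produced by the ‘survcomp’ package may differ in detail, and the function name `harrell_c_index` and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the higher-risk
    subject has the earlier observed event (ties in risk count as 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data: follow-up time, event indicator (1 = event, 0 = censored), risk score.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.5, 0.3, 0.1]

c_index = harrell_c_index(times, events, risk)
```

In this example 7 of the 8 comparable pairs are correctly ordered, giving a C-index of 0.875; a value of 0.5 corresponds to random ordering and 1.0 to perfect discrimination.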
Results
Construction and definition of the GPI
A total of 309 CRC patients who did not receive adjuvant chemotherapy were selected from the CIT gene microarray dataset (GSE39582) to serve as the discovery cohort. Cox proportional hazards regression, combined with 1,000 randomization tests, was used to identify 309 genes significantly associated with survival. These genes were then combined with 463 immune-related genes from the ImmPort database, resulting in a final set of 753 unique genes for further analysis. The 753 unique genes were used to construct 283,128 gene pairs [C(753, 2) =283,128]. We obtained 373 gene pairs after removing 282,755 gene pairs that showed no variation (MAD =0) in the discovery dataset. Using LASSO Cox regression, we finally selected 10 gene pairs comprising 20 unique genes to construct the GPI, based on early-stage (I/II) CRC samples in the discovery cohort (Figure S1). The 75th percentile was selected as the cutoff to distinguish the immune high- and low-risk groups (Figure S2).
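As an illustration of how such an index can be turned into risk groups, the sketch below assumes that the GPI is a linear combination of the binary gene pair scores weighted by their penalized Cox coefficients, which is the usual form of a LASSO Cox risk score but is not stated explicitly in the text. The coefficients, scores, and the rule that values strictly above the 75th percentile are high risk are all hypothetical.

```python
from statistics import quantiles


def gene_pair_index(pair_scores, coefficients):
    """Weighted sum of 0/1 gene pair scores (assumed form of the index)."""
    return sum(c * s for c, s in zip(coefficients, pair_scores))


# Hypothetical coefficients for three gene pairs and scores for five samples.
coefficients = [0.5, -0.25, 0.75]
score_matrix = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

gpi = [gene_pair_index(row, coefficients) for row in score_matrix]

# 75th percentile of the index in the training samples (inclusive method).
cutoff = quantiles(gpi, n=4, method="inclusive")[2]

# Assumed rule: index strictly above the cutoff is labelled high risk.
groups = ["high" if value > cutoff else "low" for value in gpi]
```

Because the cutoff is fixed in the training data, a new sample can be classified from its own gene pair scores alone, without reference to other samples.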
Validation of the GPI as a stage-independent prognostic factor of CRC patients
A total of 1,525 CRC patients were included in the analysis, of whom 566 patients from the GSE39582 dataset formed the training cohort. In addition, 624 patients from the TCGA dataset (validation-1) and 335 patients from the meta-validation cohort (validation-2) were included for independent validation. No significant differences in clinical and pathologic factors were found among these cohorts (Table 1).
Among early-stage patients (stage 1 and 2), more recurrent cases were observed in the GPI-stratified high-risk group in each cohort (Figure 2A-2C). Time-dependent ROC curve analysis showed strong prognostic value in the training cohort (AUC =0.753 at 2 years; AUC =0.770 at 3 years; AUC =0.775 at 5 years), validation-1 (AUC =0.652 at 2 years; AUC =0.683 at 3 years; AUC =0.622 at 5 years), and validation-2 (AUC =0.654 at 2 years; AUC =0.678 at 3 years; AUC =0.669 at 5 years; Figure 2D-2F). The GPI significantly stratified patients into risk groups with respect to DFS. A higher GPI was associated with worse prognosis in early-stage CRC patients [training: hazard ratio (HR) =3.91, 95% confidence interval (CI): 2.36–6.5, P<0.001; validation-1: HR =3.01, 95% CI: 1.53–5.93, P<0.001; validation-2: HR =3.25, 95% CI: 1.65–6.38, P<0.001; Figure 2G-2I].
Furthermore, using data from CRC patients of all stages, the GPI continued to demonstrate encouraging prognostic value in both the training (HR =1.89, 95% CI: 1.4–2.54, P<0.001) and validation cohorts (validation-1: HR =1.68, 95% CI: 1.13–2.5, P<0.001; validation-2: HR =2.15, 95% CI: 1.37–3.36, P<0.001; Figure S3A-S3F). AUC analysis showed results similar to those in early-stage CRC, although the areas under the curve were slightly smaller (Figure S3G-S3I). In multivariate analyses, the GPI remained an independent prognostic factor after adjusting for clinical and pathological factors such as tumor stage (training cohort: HR =35.58, 95% CI: 10.03–126.15, P<0.001; TCGA: HR =16.13, 95% CI: 2.37–109.81, P=0.004) (Table 2).
Table 2
| Characteristic | GSE39582 | TCGA CRC | Meta-validation | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Univariate | Multivariate | Univariate | Multivariate | Univariate | Multivariate | ||||||||||||
| HR (95% CI) | P value | HR (95% CI) | P value | HR (95% CI) | P value | HR (95% CI) | P value | HR (95% CI) | P value | HR (95% CI) | P value | ||||||
| GPI | 37.27 (12.46–111.44) | <0.001 | 35.58 (10.03–126.15) | <0.001 | 16.13 (2.37–109.81) | 0.004 | 16.13 (2.37–109.81) | 0.004 | 23.86 (3.71–153.29) | <0.001 | 114.29 (0.51–25,602.29) | 0.09 | |||||
| Sex | 1.53 (0.89–2.62) | 0.12 | 1.50 (0.78–2.90) | 0.23 | 0.58 (0.30–1.15) | 0.12 | |||||||||||
| Age | 1.01 (0.99–1.03) | 0.58 | 1.01 (0.98–1.04) | 0.46 | 0.99 (0.97–1.01) | 0.50 | |||||||||||
| Tumor location | 1.08 (0.64–1.84) | 0.78 | 1.15 (0.61–2.17) | 0.66 | 0.13 (0.02–1.08) | 0.03 | 0.27 (0.03–2.74) | 0.27 | |||||||||
| TNM stage | 7.89 (1.11–55.91) | 0.01 | 1.52 (0.17–13.36) | 0.70 | 1.79 (0.74–4.32) | 0.19 | 3.99 (0.96–16.64) | 0.04 | 0.35 (0.01–11.96) | 0.56 | |||||||
| T stage | 2.50 (1.50–4.17) | <0.001 | 2.03 (1.12–3.67) | 0.02 | 1.64 (0.85–3.17) | 0.14 | 2.68 (1.15–6.27) | 0.03 | 0.35 (0.01–11.96) | 0.26 | |||||||
| MMR status | 1.63 (0.70–3.82) | 0.25 | 0.61 (0.32–1.19) | 0.14 | 0.88 (0.38–2.04) | 0.76 | |||||||||||
| CIMP status | 0.95 (0.44–2.02) | 0.89 | 0.91 (0.36–2.28) | 0.84 | |||||||||||||
| CIN status | 1.69 (0.75–3.81) | 0.20 | |||||||||||||||
| TP53 mutation | 1.39 (0.78–2.48) | 0.27 | 2.74 (0.60–12.43) | 0.17 | |||||||||||||
| KRAS mutation | 1.44 (0.86–2.40) | 0.16 | 1.02 (0.23–4.60) | 0.98 | 1.23 (0.53–2.87) | 0.63 | |||||||||||
| BRAF mutation | 1.42 (0.57–3.58) | 0.45 | 0.00 (0.00–Inf) | 0.72 | 2.04 (0.81–5.10) | 0.12 | |||||||||||
CI, confidence interval; CIMP, CpG island methylator phenotype; CIN, chromosomal instability; CRC, colorectal cancer; GPI, gene pair index; HR, hazard ratio; IRGS, immune-related gene signatures; MMR, mismatch repair; TCGA, The Cancer Genome Atlas; TNM, tumor-node-metastasis.
Prognostic and predictive value of GPI for adjuvant chemotherapy among CRC patients
To evaluate the role of the GPI in forecasting chemotherapy benefit, patients with adjuvant chemotherapy information were selected for further study. Among those without adjuvant chemotherapy, the GPI-stratified high-risk group had distinctly worse DFS than the low-risk group in early-stage cases (training cohort: HR =7.82, 95% CI: 4.13–14.8, P<0.001; and validation-1: HR =3.07, 95% CI: 1.33–7.07, P=0.006) (Figure 3A,3B). Similar prognostic value was found across all stages without adjuvant chemotherapy (training cohort: HR =4.43, 95% CI: 2.76–7.11, P<0.001; validation-1: HR =2.36, 95% CI: 1.26–4.44, P=0.006) (Figure 3C,3D). However, among those who had received adjuvant chemotherapy, no difference in DFS was observed between the risk groups, except among those with early-stage CRC in the TCGA cohort (Figure S4). Collectively, the GPI appears to estimate DFS more effectively in CRC patients without adjuvant chemotherapy than in those with chemotherapy.
When restricted to patients in the GPI-stratified low-risk group, those treated with adjuvant chemotherapy had worse DFS (training cohort: HR =4.74, 95% CI: 2.28–9.82, P<0.001; validation-1: HR =2.85, 95% CI: 0.95–8.59, P=0.051), both for early-stage (Figure 4A,4B) and all-stage CRC patients (Figure 4C,4D). However, this difference was not found for high-risk patients (Figure S5).
Functional annotation of the GPI
We further evaluated the biological significance of the GPI. Stromal and immune signaling contributions were calculated with the ESTIMATE algorithm. Higher immune and stromal infiltration was found in the GPI-stratified high-risk group (stromal score: P=0.01; immune score: P=0.001; and ESTIMATE score: P=0.003) (Figure 5A). Enrichment analysis of the 20 genes included in the GPI model identified several overrepresented biological processes within gene ontology, such as the P53 pathway, KRAS signaling, IL6/JAK/STAT3 signaling, TNFα/NF-κB signaling, epithelial-mesenchymal transition (EMT), apoptosis, and angiogenesis (Figure 5B).
Comparison with other gene expression signature (Oncotype DX recurrence score)
To further determine the prognostic value, we compared GPI with commercialized 12-gene Oncotype DX recurrence score. Higher C-indexes were found for GPI than Oncotype DX in training cohort (C-index =0.81, 95% CI: 0.73–0.89 vs. C-index =0.65, 95% CI: 0.52–0.77, P=0.03) and TCGA validation cohort (C-index =0.72, 95% CI: 0.56–0.88 vs. C-index =0.61, 95% CI: 0.44–0.78, P=0.04) (Figure 6).
Discussion
In this study, we trained and validated the GPI model incorporating IRGS to predict DFS among CRC patients, including those with early-stage disease. The GPI divided patients into two risk groups. Among early-stage CRC patients in the low-risk group, we observed worse DFS in those treated with adjuvant chemotherapy than in those without. To our knowledge, this is the first study to leverage an individualized gene pair-based signature incorporating an immune-related gene signature, independent of clinical parameters, for prognostic prediction of CRC.
Early-stage CRC patients still face substantial risk of recurrence, even after complete surgical resection (37-39). The controversy on adjuvant chemotherapy in early-stage CRC is caused by the lack of convincing data and inconsistent results from previous studies. This highlights the urgent need to establish a novel signature to predict chemotherapy benefit among early-stage CRC by risk stratification. Several multi-gene prognostic signatures reported previously including Oncotype DX, ColoPrint, and ColDx, were limited by issues such as data normalizations (20-22). The necessity of data normalization to address batch effects caused by experimental issues has resulted in potentially spurious risk classification produced by these signatures simply based on gene expression levels (40). To tackle this issue, we incorporated gene expression profiles from different datasets and utilized methods based on the relative ranking of gene expression, which were specifically designed to perform robustly across different platforms, considering the inherent technical biases (26,41,42).
The rank-based gene pair characterization that we developed in this study involves only pairwise comparison in gene expression profile of particular samples, making it robust against data normalization and batch effects consequently. Thus, our gene pair-based characterization can be easily applied to survival prediction in individual samples and deserves further validation in clinical trials. As expected, the GPI derived in our study achieved a higher accuracy compared to the Oncotype DX Score.
The GPI effectively separated patients into high- and low-risk groups for DFS among CRC patients, including those with early-stage disease. This is valuable for both CRC patients and colorectal surgeons, as it offers a useful tool for estimating the risk of recurrence after surgery. Additionally, we used GPI to evaluate the prognostic outcome stratified by the presence of chemotherapy and found that those treated with adjuvant chemotherapy had worse DFS among low-risk group in early-stage CRC. Our findings provide additional evidence on the value of GPI in identifying early-stage CRC patients who may not benefit from adjuvant chemotherapy and thus should avoid receiving chemotherapy.
The significance of the CRC microenvironment, containing stromal and immune cells, in survival prediction has been repeatedly demonstrated (43-47). For instance, an increased gene signature induced by TGFβ in tumor stromal cells has been indicated to define poor-prognosis subtypes in CRC (44). Besides, decreased densities of CD8(+) cytotoxic T-lymphocyte infiltrate have been linked with the growth of the primary CRC and metastatic spread (46). In the current study, we also observed a remarkably different stromal and immune score between GPI stratified high- and low-risk groups, further indicating the capability of GPI in risk stratification and prognostic prediction among CRC patients. These findings also suggest that dysregulated stromal and immune signature might account for the survival differences observed between GPI-defined patient groups. Additionally, enrichment analysis of our gene pair signature identified several valuable biological processes such as KRAS signaling, P53 pathway, IL6/JAK/STAT3 signaling, and EMT, which have been already demonstrated to be correlated with CRC survival (48-51). Considering the complex cell interactions in the tumor microenvironment among CRC patients, a comprehensive understanding of tumor-associated normal stroma and immune cells and the cross-talk among different signaling pathways in tumor tissues can provide valuable information for understanding CRC progression and help develop reliable prognostic features.
This study has several limitations. First, it is limited by its retrospective design, although rigorous validation was performed across multiple independent datasets. Second, CRC is known for its intra-tumor genetic heterogeneity (52,53), which could introduce sampling bias when constructing the gene pair signature. Moreover, not all batch effects can be addressed (24), although we made efforts to reduce them by excluding gene pairs with constant values.
Conclusions
The proposed GPI incorporating prognosis-related genes and IRGS is a promising prognostic signature for CRC, particularly for early-stage cases. Its robustness against technical biases across different sequencing platforms makes it practical and clinically applicable. For clinical implementation, future studies using multicenter datasets for prospective validation will be necessary to determine whether the GPI can improve risk assessment for CRC patients and guide adjuvant chemotherapy decisions for those with early-stage disease.
Acknowledgments
This work was presented at Digestive Disease Week (DDW) in May 2021, and the abstract was published in the conference proceedings.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2024-2511/rc
Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2024-2511/prf
Funding: This study was funded by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2024-2511/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Institutional Review Board of The Sixth Affiliated Hospital, Sun Yat-sen University (approval No. 2018ZSLYEC-082).
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Tournigand C, de Gramont A. Chemotherapy: Is adjuvant chemotherapy an option for stage II colon cancer? Nat Rev Clin Oncol 2011;8:574-6. [Crossref] [PubMed]
- Meropol NJ. Ongoing challenge of stage II colon cancer. J Clin Oncol 2011;29:3346-8. [Crossref] [PubMed]
- Simillis C, Singh HKSI, Afxentiou T, et al. Postoperative chemotherapy improves survival in patients with resected high-risk Stage II colorectal cancer: results of a systematic review and meta-analysis. Colorectal Dis 2020;22:1231-44. [Crossref] [PubMed]
- Dienstmann R, Salazar R, Tabernero J. Personalizing colon cancer adjuvant therapy: selecting optimal treatments for individual patients. J Clin Oncol 2015;33:1787-96. [Crossref] [PubMed]
- Lee JJ, Chu E. Adjuvant Chemotherapy for Stage II Colon Cancer: The Debate Goes On. J Oncol Pract 2017;13:245-6. [Crossref] [PubMed]
- Brenner H, Kloor M, Pox CP. Colorectal cancer. Lancet 2014;383:1490-502. [Crossref] [PubMed]
- Costas-Chavarri A, Nandakumar G, Temin S, et al. Treatment of Patients With Early-Stage Colorectal Cancer: ASCO Resource-Stratified Guideline. J Glob Oncol 2019;5:1-19. [Crossref] [PubMed]
- An Y, Gong J, Xiao A. Development and validation of nomograms for predicting the prognosis of colorectal cancer patients. Transl Cancer Res 2025;14:1651-63. [Crossref] [PubMed]
- Van Cutsem E, Oliveira J. Primary colon cancer: ESMO clinical recommendations for diagnosis, adjuvant treatment and follow-up. Ann Oncol 2009;20:49-50. [Crossref] [PubMed]
- Dotan E, Cohen SJ. Challenges in the management of stage II colon cancer. Semin Oncol 2011;38:511-20. [Crossref] [PubMed]
- Sveen A, Nesbakken A, Ågesen TH, et al. Anticipating the clinical use of prognostic gene expression-based tests for colon cancer stage II and III: is Godot finally arriving? Clin Cancer Res 2013;19:6669-77. [Crossref] [PubMed]
- Agesen TH, Sveen A, Merok MA, et al. ColoGuideEx: a robust gene classifier specific for stage II colorectal cancer prognosis. Gut 2012;61:1560-7. [Crossref] [PubMed]
- Marisa L, de Reyniès A, Duval A, et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med 2013;10:e1001453. [Crossref] [PubMed]
- Srivastava G, Renfro LA, Behrens RJ, et al. Prospective multicenter study of the impact of oncotype DX colon cancer assay results on treatment recommendations in stage II colon cancer patients. Oncologist 2014;19:492-7. [Crossref] [PubMed]
- Chen W, Hu K, Liu Y, et al. Comprehensive analysis of cuproptosis-related genes involved in prognosis and tumor microenvironment infiltration of colorectal cancer. Transl Cancer Res 2024;13:4555-73. [Crossref] [PubMed]
- Chen H, Pan Y, Lv C, et al. Telomere-related gene risk model for prognosis prediction in colorectal cancer. Transl Cancer Res 2024;13:3495-521. [Crossref] [PubMed]
- Wang S, Li Y, Wang Z, et al. Constructing a mitochondrial-related genes model based on machine learning for predicting the prognosis and therapeutic effect in colorectal cancer. Discov Oncol 2025;16:661. [Crossref] [PubMed]
- Li B, Xiao M, Zeng R, et al. Developing a multiomics data-based mathematical model to predict colorectal cancer recurrence and metastasis. BMC Med Inform Decis Mak 2025;25:188. [Crossref] [PubMed]
- Sun J, Liu Y, Zhao J, et al. Plasma proteomic and polygenic profiling improve risk stratification and personalized screening for colorectal cancer. Nat Commun 2024;15:8873. [Crossref] [PubMed]
- O'Connell MJ, Lavery I, Yothers G, et al. Relationship between tumor gene expression and recurrence in four independent studies of patients with stage II/III colon cancer treated with surgery alone or surgery plus adjuvant fluorouracil plus leucovorin. J Clin Oncol 2010;28:3937-44. [Crossref] [PubMed]
- Salazar R, Roepman P, Capella G, et al. Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. J Clin Oncol 2011;29:17-24. [Crossref] [PubMed]
- Kennedy RD, Bylesjo M, Kerr P, et al. Development and independent validation of a prognostic assay for stage II colon cancer using formalin-fixed paraffin-embedded tissue. J Clin Oncol 2011;29:4620-6. [Crossref] [PubMed]
- Sharif S, O'Connell MJ. Gene Signatures in Stage II Colon Cancer: A Clinical Review. Curr Colorectal Cancer Rep 2012;8:225-31. [Crossref] [PubMed]
- Leek JT, Scharpf RB, Bravo HC, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 2010;11:733-9. [Crossref] [PubMed]
- Heinäniemi M, Nykter M, Kramer R, et al. Gene-pair expression signatures reveal lineage control. Nat Methods 2013;10:577-83. [Crossref] [PubMed]
- Li B, Cui Y, Diehn M, et al. Development and Validation of an Individualized Immune Prognostic Signature in Early-Stage Nonsquamous Non-Small Cell Lung Cancer. JAMA Oncol 2017;3:1529-37. [Crossref] [PubMed]
- Salunkhe S, Chandran N, Chandrani P, et al. CytoPred: 7-gene pair metric for AML cytogenetic risk prediction. Brief Bioinform 2020;21:348-54. [PubMed]
- Disis ML. Immune regulation of cancer. J Clin Oncol 2010;28:4531-8. [Crossref] [PubMed]
- Chen DS, Mellman I. Elements of cancer immunity and the cancer-immune set point. Nature 2017;541:321-30. [Crossref] [PubMed]
- Camus M, Tosolini M, Mlecnik B, et al. Coordination of intratumoral immune reaction and human colorectal cancer recurrence. Cancer Res 2009;69:2685-93. [Crossref] [PubMed]
- Pagès F, Galon J, Dieu-Nosjean MC, et al. Immune infiltration in human tumors: a prognostic factor that should not be ignored. Oncogene 2010;29:1093-102. [Crossref] [PubMed]
- Wu X, Li J, Zhang Y, et al. Identification of immune cell infiltration landscape for predicting prognosis of colorectal cancer. Gastroenterol Rep (Oxf) 2023;11:goad014. [Crossref] [PubMed]
- Wankhede D, Yuan T, Kloor M, et al. Clinical significance of combined tumour-infiltrating lymphocytes and microsatellite instability status in colorectal cancer: a systematic review and network meta-analysis. Lancet Gastroenterol Hepatol 2024;9:609-19. [Crossref] [PubMed]
- Wang X, Terfve C, Rose JC, et al. HTSanalyzeR: an R/Bioconductor package for integrated network analysis of high-throughput screens. Bioinformatics 2011;27:879-80. [Crossref] [PubMed]
- Liberzon A, Birger C, Thorvaldsdóttir H, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 2015;1:417-25. [Crossref] [PubMed]
- Yoshihara K, Shahmoradgoli M, Martínez E, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2013;4:2612. [Crossref] [PubMed]
- Danielsen HE, Hveem TS, Domingo E, et al. Prognostic markers for colorectal cancer: estimating ploidy and stroma. Ann Oncol 2018;29:616-23. [Crossref] [PubMed]
- Young PE, Womeldorph CM, Johnson EK, et al. Early detection of colorectal cancer recurrence in patients undergoing surgery with curative intent: current status and challenges. J Cancer 2014;5:262-71. [Crossref] [PubMed]
- Lee MM, MacKinlay A, Semira C, et al. Stage-based Variation in the Effect of Primary Tumor Side on All Stages of Colorectal Cancer Recurrence and Survival. Clin Colorectal Cancer 2018;17:e569-77. [Crossref] [PubMed]
- Qi L, Chen L, Li Y, et al. Critical limitations of prognostic signatures based on risk scores summarized from gene expression levels: a case study for resected stage I non-small-cell lung cancer. Brief Bioinform 2016;17:233-42. [Crossref] [PubMed]
- Eddy JA, Sung J, Geman D, et al. Relative expression analysis for molecular cancer diagnosis and prognosis. Technol Cancer Res Treat 2010;9:149-59. [Crossref] [PubMed]
- Tan AC, Naiman DQ, Xu L, et al. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics 2005;21:3896-904. [Crossref] [PubMed]
- Calon A, Espinet E, Palomo-Ponce S, et al. Dependency of colorectal cancer on a TGF-β-driven program in stromal cells for metastasis initiation. Cancer Cell 2012;22:571-84. [Crossref] [PubMed]
- Isella C, Terrasi A, Bellomo SE, et al. Stromal contribution to the colorectal cancer transcriptome. Nat Genet 2015;47:312-9. Erratum in: Nat Genet 2016;48:1296. [Crossref] [PubMed]
- Calon A, Lonardo E, Berenguer-Llergo A, et al. Stromal gene expression defines poor-prognosis subtypes in colorectal cancer. Nat Genet 2015;47:320-9. [Crossref] [PubMed]
- Galon J, Pagès F, Marincola FM, et al. The immune score as a new possible approach for the classification of cancer. J Transl Med 2012;10:1. [Crossref] [PubMed]
- Mlecnik B, Tosolini M, Kirilovsky A, et al. Histopathologic-based prognostic factors of colorectal cancers are associated with the state of the local immune reaction. J Clin Oncol 2011;29:610-8. [Crossref] [PubMed]
- Wong CC, Qian Y, Li X, et al. SLC25A22 Promotes Proliferation and Survival of Colorectal Cancer Cells With KRAS Mutations and Xenograft Tumor Progression in Mice via Intracellular Synthesis of Aspartate. Gastroenterology 2016;151:945-960.e6. [Crossref] [PubMed]
- Veschi V, Liu Z, Voss TC, et al. Epigenetic siRNA and Chemical Screens Identify SETD8 Inhibition as a Therapeutic Strategy for p53 Activation in High-Risk Neuroblastoma. Cancer Cell 2017;31:50-63. [Crossref] [PubMed]
- Schulz-Heddergott R, Stark N, Edmunds SJ, et al. Therapeutic Ablation of Gain-of-Function Mutant p53 in Colorectal Cancer Inhibits Stat3-Mediated Tumor Growth and Invasion. Cancer Cell 2018;34:298-314.e7. [Crossref] [PubMed]
- Spaderna S, Schmalhofer O, Hlubek F, et al. A transient, EMT-linked loss of basement membranes indicates metastasis and poor survival in colorectal cancer. Gastroenterology 2006;131:830-40. [Crossref] [PubMed]
- Árnadóttir SS, Jeppesen M, Lamy P, et al. Characterization of genetic intratumor heterogeneity in colorectal cancer and matching patient-derived spheroid cultures. Mol Oncol 2018;12:132-47. [Crossref] [PubMed]
- Sobral D, Martins M, Kaplan S, et al. Genetic and microenvironmental intra-tumor heterogeneity impacts colorectal cancer evolution and metastatic development. Commun Biol 2022;5:937. [Crossref] [PubMed]

