Integrative single-cell and bulk transcriptomes analyses reveal T cell-related prognostic risk model and tumor immune microenvironment modulation in colorectal cancer
Original Article

Integrative single-cell and bulk transcriptomes analyses reveal T cell-related prognostic risk model and tumor immune microenvironment modulation in colorectal cancer

Ping Liu1#, Yizhou Wang1#, Shu Huang2,3, Rui Luo1, Ruiheng Xie1, Jieyv Peng1, Ruiyu Wang1, Ping Wang1, Xiaomin Shi1, Wei Zhang1, Lei Shi1, Xian Zhou1, Xiaowei Tang1

1Department of Gastroenterology, the Affiliated Hospital of Southwest Medical University, Luzhou, China; 2Department of Gastroenterology, Lianshui County People’ Hospital, Huai’an, China; 3Department of Gastroenterology, Lianshui People’ Hospital of Kangda College Affiliated to Nanjing Medical University, Huai’an, China

Contributions: (I) Conception and design: P Liu, Y Wang, X Zhou, X Tang; (II) Administrative support: X Zhou, X Tang; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: P Liu, S Huang, R Xie, R Luo, J Peng, X Zhou; (V) Data analysis and interpretation: P Liu, R Wang, P Wang, X Shi, W Zhang, L Shi; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Xiaowei Tang, MD, PhD; Xian Zhou, MD. Department of Gastroenterology, the Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou 646099, China. Email: solitude5834@hotmail.com; 853023378@qq.com.

Background: Colorectal cancer (CRC) is among the most common and lethal cancers worldwide. Recent advances in tumor immunotherapy have highlighted the importance of T-cell subsets and the tumor microenvironment (TME) in CRC, both critical for mounting a successful anti-tumor immune response. Thus, there is an urgent need for comprehensive research in these areas to accelerate the development of personalized immunotherapeutic strategies for CRC patients. Therefore, this study aims to explore T-cell heterogeneity, identify characteristic genes, and develop a reliable prognostic model to predict patient outcomes and immunotherapy responses.

Methods: CRC single-cell RNA sequencing (scRNA-seq) data were downloaded from the Gene Expression Omnibus (GEO) database. We identified T cell feature genes and employed Cox and least absolute shrinkage and selection operator (LASSO) regression methods to construct a prognostic model using The Cancer Genome Atlas (TCGA) dataset, while GEO dataset was utilized for validation. Additionally, we analyzed the immune microenvironment, immune checkpoints, and drug sensitivity between the high-risk and low-risk groups.

Results: Based on scRNA-seq data, we identified 4,440 T cell marker genes. There were also interactions between T cells and various other cell types. The results of the enrichment analysis revealed that these marker genes were primarily involved in immune-related pathways. After conducting univariate Cox and LASSO analyses, we developed a prognostic model based on the expression levels of three prognostic genes: TIMP1, HMMR, and HIST1H1C. The feasibility of the model was validated using external validation cohort. In addition, the high-risk group exhibited higher immune scores, stromal scores, ESTIMATE scores, and Tumor Immune Dysfunction and Exclusion (TIDE) prediction scores compared to the low-risk group, indicating a poorer response to immunotherapy.

Conclusions: This study discovered three key genes with prognostic relevance in CRC, providing valuable insights into T-cell heterogeneity and the associated prognostic risk model. The results demonstrated that this model accurately predicts patient prognosis and responses to immunotherapy.

Keywords: Colorectal cancer (CRC); single-cell RNA sequencing (scRNA-seq); T-cell subsets; prognosis; tumor microenvironment (TME)


Submitted May 24, 2025. Accepted for publication Sep 04, 2025. Published online Oct 29, 2025.

doi: 10.21037/tcr-2025-1104


Highlight box

Key findings

• Single-cell RNA sequencing (scRNA-seq) analysis of colorectal cancer (CRC) identified eight major cell types, with T cells further subdivided into distinct subtypes exhibiting specific interactions, particularly via the MIF signaling pathway. A robust three-gene prognostic signature (TIMP1, HMMR, HIST1H1C) was constructed and validated, effectively stratifying CRC patients into high-risk and low-risk groups. High-risk patients displayed elevated immune/stromal/ESTIMATE scores, increased immunosuppressive cell infiltration, upregulated immune checkpoints, and higher Tumor Immune Dysfunction and Exclusion scores, indicating reduced immunotherapy efficacy.

What is known and what is new?

• T cell heterogeneity and the tumor microenvironment (TME) are known to influence CRC progression and immunotherapy response. scRNA-seq is increasingly used to dissect TME complexity.

• This study integrates scRNA-seq and bulk transcriptome data to decode T cell-specific heterogeneity, construct a novel T cell-derived gene signature, and comprehensively characterize the immune landscape and drug sensitivity patterns associated with the risk model.

What is the implication, and what should change now?

• The three-gene signature may serve as a clinical tool for prognostic stratification and immunotherapy guidance in CRC. Validation in prospective cohorts and functional studies are urgently needed to confirm mechanistic roles of TIMP1, HMMR, and HIST1H1C in T cell regulation and immunotherapy resistance. Clinical translation should incorporate immune microenvironment profiling to personalize therapeutic strategies.


Introduction

Colorectal cancer (CRC) is one of the most common malignancies worldwide, ranking third in terms of incidence and second in mortality according to global statistics (1). Its onset results from genetic, environmental, and lifestyle factors, including high-risk diets, obesity, physical inactivity, smoking, and excessive alcohol consumption (2,3). Moreover, individuals with a family history of CRC or inherited syndromes such as Lynch syndrome and familial adenomatous polyposis are at significantly higher risk (4,5). The prognosis for CRC patients is particularly poor when the disease is diagnosed at an advanced stage, such as with synchronous peritoneal metastases (SPM), which significantly complicates treatment and reduces survival rates (6). CRC treatment strategies are highly stage-dependent. Early stages often involve surgical resection and adjuvant chemotherapy, while advanced CRC uses systemic treatments like chemotherapy, targeted therapies, and immunotherapies (7,8). Andre et al. investigated immune checkpoint inhibitors such as pembrolizumab for microsatellite instability-high (MSI-H), demonstrating promising efficacy (9). Although CRC mortality is declining due to better screening and treatment, future trends may be less favorable (10). Thus, comprehensive investigation into CRC is crucial for guiding clinical treatment.

The tumor microenvironment (TME) consists of cellular components, such as fibroblasts, endothelial cells, and immune cells, and non-cellular components, including matrix proteins, cytokines, growth factors, and other elements (11). Among immune cells, T cells are vital in treating various cancers, including CRC. Chimeric antigen receptor (CAR) T-cell therapy, a novel immunotherapy, reprograms patient-derived T cells to produce potent antitumor activity (12,13). Furthermore, achieving permanent CAR-T-cell binding can be accomplished by reshaping the tumor immune microenvironment (TIME) (14). Recent advancements in cancer genomics have made high-throughput sequencing essential for identifying gene alterations as therapeutic targets for CRC. Single-cell sequencing offers detailed insights into cell interactions within the TME, aiding the development of precise and personalized tumor treatments (15). Nevertheless, the treatment of patients with CRC remains a complex challenge.

Specific T cell subsets play critical roles in CRC prognosis: high densities of CD8+ and Th1 cells are linked to better survival, while Tregs and exhausted T cells often indicate poor outcomes. The prognostic value of Th17 cells is context-dependent. However, current models lack single-cell resolution and multi-omics integration, limiting personalized prediction. In this study, we identified T cell feature genes using single-cell RNA sequencing (scRNA-seq), constructed and validated a prognostic model with The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) data, identified prognosis-related genes, and characterized the CRC TIME, aiming to inform improved clinical management. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1104/rc).


Methods

Data download

The CRC scRNA-seq dataset files of GSE188711 were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/), containing three left-sided, three right-sided CRC samples and 27,927 cells. In addition, bulk transcriptome and clinical information of CRC patients were downloaded from TCGA dataset through the UCSC Xena database (http://xena.ucsc.edu/), comprising 51 normal tissue samples and 383 CRC samples. 375 CRC patients were finally selected to obtain a training set for this study, excluding cases of incomplete survival information. Furthermore, the GSE87211 datasets as validation cohort was acquired from GEO database, which included 363 CRC samples. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Single-cell RNA-sequencing data processing

We performed scRNA-seq data analysis by using the “CreateSeuratObject” function from the Seurat package in R software (version 4.3.3) (16), converting the CRC samples into a Seurat object. We removed genes that can only be detected in three or fewer cells and low-quality cells with fewer than 200 genes. Cells with more than 20% of mitochondrial genes were also excluded. The top 2,000 genes with highly variable characteristics were selected using the “FindVariableFeatures” function. Then, the data were subjected to principal component analysis (PCA). Cell clustering analysis was conducted using the “FindNeighbors” and “FindClusters” functions, with a resolution of 0.4. The algorithm of Uniform Manifold Approximation and Projection (UMAP) was employed to visualize the data.

Marker genes and cell types identification

Differentially expressed genes (DEGs) were identified using the “FindMarkers” function, with the expression ratio for the least differential genes set at 0.25, the log2 fold change threshold of 0.25, and the P value ≤0.05. Then, T cell markers were obtained by reviewing the literature and searching the CellMarker 2.0 database (http://bio-bigdata.hrbmu.edu.cn/CellMarker/). Based on this process, T cells were clustered and annotated into distinct subtypes (17,18).

Cell-cell communication analysis

The CellChat R software package integrates gene expression data with existing knowledge of the interactions among signaling ligands, receptors, and their cofactors to model potential cell-cell communication (19). Based on the package CellChat, we excluded cell-cell communications expressed in fewer than 10 cells within certain cell groups, using the “netVisual” function to visualize the interaction patterns. Then, the “netAnalysis_distribution” function was employed to systematically quantify and effectively illustrate the contributions of each individual ligand-receptor pair to a comprehensive understanding of the signaling pathway. The “PlotGeneExpression” function was applied to visualize the expression levels of ligands and receptors associated with selected signaling pathways in T cells. Additionally, we used the “netAnalysis_teCentricity” function to calculate network centrality scores, which helped identify dominant senders, receivers, mediators, and influencers in the networks.

Pseudotime trajectories analysis

The state of a cell is continuously evolving and undergoing changes throughout the process of development (20). All T-cell subsets and their pseudo-time trajectories were analyzed using the Monocle2 package in R. Using the “newCellDataSet” function constructed a Monocle object, with the parameter expressionFamily = negbinomial.size. Moreover, we exclusively included genes that exhibited a mean expression level of 0.1 or higher for the trajectory analysis. Low-quality cells were filtered out, and only genes expressed in a minimum of 10 cells were retained. Then, the “reduceDimension” function with the parameters method = “DDRTree” and max_components = 2 was used for dimension reduction. The cell development trajectory and gene expression heatmap were visualized using the “plot_cell_trajectory” and “plot_genes_branched_heatmap” functions.

Functional enrichment analysis

To investigate the potential biological processes and functions associated with T cell clusters, we conducted a comprehensive screening of all T cell marker genes for enrichment analyses using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). This analysis was performed with the support of various R software packages, including “clusterProfiler”, “enrichplot”, and “org.Hs.eg.db”. In addition, we implemented gene set enrichment analysis (GSEA) to uncover genes that hold significant biological relevance and to identify key signaling pathways that may play crucial roles in the regulation and functionality of T cells.

Prognostic model construction and validation

First, DEGs with a P value <0.05 and |Log2FC| >1 in the TCGA training cohort were calculated by the “limma” package, and the identified DEGs were intersected with T cell markers. Subsequently, univariate Cox regression analysis was performed to screen out genes related to prognosis. Meanwhile, we conducted the least absolute shrinkage and selection operator (LASSO) regression algorithm, with 10-fold cross-validation employed to select key genes. These genes were then used for multivariate Cox regression analysis. Additionally, the prognostic model was constructed based on the following formula: Riskscore = (coefficient1 × gene1 expression) + (coefficient2 × gene2 expression) + ... + (coefficientN × geneN expression), where gene expression denotes the expression levels of the genes and coefficient represents the coefficients of the genes calculated by multivariate Cox regression. Patients in the TCGA cohort were divided into high-risk and low-risk groups based on the median riskscore. The Kaplan-Meier (K-M) curve analysis was employed to assess and compare the overall survival (OS) rates between the two groups. The time-dependent receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC) were calculated using the “survivalROC” package, facilitating the evaluation of the prognostic predictive performance of the model. Finally, the GSE87211 cohort was utilized as an independent external cohort to validate the efficacy of the prognostic model.

Immune microenvironment analysis

We utilized the “ESTIMATE” R software package to analyze gene expression profiling in CRC patients, assessing the immune score, stromal score, ESTIMATE score, and tumor purity, thereby gaining insights into the characteristics of the TME. Meanwhile, the Tumor Immune Dysfunction and Exclusion (TIDE) score was calculated using the TIDE algorithm to evaluate the relationship between risk scores and the effectiveness of immunotherapy. Furthermore, CIBERSORT is a method used to estimate the relative proportions of immune cell types based on gene expression data (21). In this study, we employed the R package “CIBERSORT” to estimate the abundance of 22 immune cell infiltration in the high-risk and low-risk score groups. Besides, the correlation between risk scores and the expression of immune checkpoints was also investigated.

Drug sensitivity analysis

To explore the potential role of risk scores for CRC in predicting drug sensitivity, we used the R package “oncoPredict” to obtain the IC50 values of drugs from the Genomics of Drug Sensitivity in Cancer (GDSC) database. The results were then visualized by plotting box plots using the R language “ggplot2”.

Statistical analysis

The Wilcoxon rank-sum test was used to evaluate the relationship of continuous variables between the two groups. LASSO regression and Cox regression analyses were used to construct a prognostic model for CRC. Survival analysis was conducted using the Kaplan-Meier curve, with the P value calculated through the log-rank test. AUC of ROC was used to assess diagnostic performance and predictive ability. All statistical analyses were conducted using R software version 4.3.3, and P<0.05 was considered statistically significant.


Results

Identification of CRC cell subtypes

A total of 6 CRC single-cell samples (3 left-sided and 3 right-sided samples) were included for analysis in this study (Figure 1A). First, after filtering out low-quality cells, 23,698 cells were included in the subsequent analysis. Following data normalization, the top 2,000 highly variable genes were selected. Besides, we used the PCA method for dimensionality reduction. To elucidate the characteristics of each cell subtype, we conducted an analysis of the marker genes across all subpopulations. Then, fifteen cell clusters were annotated using the UMAP algorithm in GSE188711 (Figure 1B). These clusters were divided into eight cell types, including T cells, B cells, plasma cells, myeloid cells, mast cells, fibroblast cells, endothelial cells and epithelial cells (Figure 1C). Among them, we found that T cells were present in four subpopulations and had high cell numbers, which suggested that T cells might have played a pivotal role in the invasion and metastasis of CRC. Figure 1D showed the presence of various cell types in 6 CRC samples. The expression of key marker genes for each cell type was visualized through bubble plots (Figure 1E). Based on the marker genes identified in the literature, T cells were further classified and annotated. In addition, T cells were classified into 9 subclusters and annotated as regulatory T cell, effector memory T cell, central memory T cell, T helper 17 cell and effector T cell (Figure 2A,2B). The T cells highly expressed FOXP3, RTKN2, CD8B, GZMK, IL7R, CCR7, IL17A, CTSH, and GNLY (Figure 2C). Figure 2D illustrated the specific distribution of the aforementioned nine genes within T cell clusters.

Figure 1 Integration and clustering of CRC single-cell data (GSE188711). (A) Six CRC samples were visualized using the UMAP algorithm. (B) The GSE188711 dataset was divided into 15 cell clusters. (C) Eight cell types were identified based on marker genes. (D) The cell types were detected in various samples. (E) Dot plot showing expression levels of marker genes for each cell subtype. CRC, colorectal cancer; UMAP, Uniform Manifold Approximation and Projection.
Figure 2 Identification of T-cell marker genes by scRNA-seq analysis. (A) UMAP of the nine cell clusters. (B) All five cell types were annotated using marker genes based on literature and CellMarker. (C) Dot plot showed the marker genes of T cell subtypes. (D) Feature plots depicting the marker gene expressions among the five cell types. scRNA-seq, single-cell RNA sequencing; UMAP, Uniform Manifold Approximation and Projection.

Cell-cell interaction analysis concerning T cells

In order to explore the mechanisms of interaction between T cells and other cell types, we investigated cell-cell communication networks to predict the interactions between different cell types through specific pathways and ligand-receptor pairs. We observed that the frequency and intensity of interactions were high between myeloid and endothelial cells, fibroblasts and T cells, as well as endothelial cells and T cells (Figure 3A,3B). The findings indicated that ligand-receptor-mediated cellular interactions primarily occurred within the MIF signaling pathway, involving MIF-CD74 + CXCR4 and MIF-CD74 + CD44 (Figure 3C,3D). We found that MIF-CD74 + CXCR4 is essential to the communication network of the MIF signaling pathway (Figure 3E). Moreover, the expressions of ACKR3 and CD44 genes were also linked to this signaling pathway (Figure 3F). Figure 3G demonstrated the involvement of T cells in the MIF signaling pathway.

Figure 3 Cell-cell communication analysis associated with T cells. (A) The number of interactions within the cell-cell communication network. (B) The interaction weights/strength within the cell-cell communication network. (C) The cell-cell communication network among different cell types regarding MIF signaling pathway. (D) Bubble diagram displaying the communication probabilities of ligand-receptor interactions among T cells and different cell types. (E) Cell-cell communication network revealed that MIF − (CD74 + CXCR4) are pivotal in the communication network. (F) Violin plot illustrating the expression of associated genes in the MIF signaling pathway. (G) Heatmap displaying the primary senders, receivers, mediators, and influencers in the MIF signaling pathway as inferred from network centrality scores.

Trajectory of T cells maturation

Pseudo-time analysis of T cells was conducted using the Monocle 2 algorithm to further confirm the developmental stages of T cell subsets. The results indicated that the eight clusters can be roughly divided into nine differentiation states (Figure 4A-4D). Figure 4E presented the trajectory axis of cell development, with darker colors indicating earlier developmental stages. The results showed that the cells in clusters 3 and 5 are in the early stages of development, while those in clusters 0 and 1 are in the terminal stages of development. Additionally, we identified four categories of genes that change over developmental time and nine T cell marker genes with dynamic expression (Figure 4F,4G).

Figure 4 Pseudotime analysis of T cells in CRC. (A) Trajectory plots showing the distribution of various clusters in T cells. (B) Different differentiation states of T cells. (C) Five types of T cell subsets and their distribution. (D) Differentiation of nine subtypes in the trajectory curve. (E) Displaying the developmental timeline of T cell subsets. (F) Heatmap of differentially expressed gene expression over pseudotime. (G) The dynamic expression of nine marker genes throughout pseudotime. CRC, colorectal cancer; MIF, macrophage migration inhibitory factor.

Enrichment analysis concerning T cells

To reveal potential biological functions and pathways associated with marker genes in T cells, we conducted GO, KEGG, and GAEA enrichment analyses. The results of GO enrichment showed that the marker genes were enriched in the regulation of immune effector process, regulation of T cell activation, leukocyte cell-cell adhesion and other elements of the biological process (BP). Cellular components (CC) were mainly associated with microtubule, kinetochore, midbody, kinesin complex, intercellular bridge. The molecular functions (MF) were enriched in the tubulin binding, histone binding, immune receptor activity, MHC protein complex binding, and CCR chemokine receptor binding (Figure 5A). The KEGG results showed that T cell-related marker genes were enriched in the Th17 cell differentiation, cell adhesion molecules, natural killer cell mediated cytotoxicity, and so on (Figure 5B). Besides, the GSEA results also indicated immune receptor activity, regulation of cell killing, and leukocyte-mediated cytotoxicity (Figure 5C).

Figure 5 Overview of functional enrichment analysis associated with T cell marker genes. (A) GO enrichment analysis related to T cells marker genes. (B) KEGG enrichment analysis related to T cells marker genes. (C) GSEA enrichment analysis related to T cells marker genes. BP, biological process; CC, cellular components; GO, Gene Ontology; GSEA, gene set enrichment analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular functions.

Risk model related to T cells

First, gene expression matrix in the TCGA dataset was intersected with the 4,440 T cells marker genes from the single-cell data (Figure 6A). We performed a univariate Cox regression analysis to identify potential prognostic DEGs for CRC in the TCGA cohort. A total of 86 genes were significantly associated with OS (P<0.01). Subsequently, the Lasso analysis identified 4 genes based on the optimal lambda value and their corresponding coefficients (Figure 6B,6C). Ultimately, 3 genes were obtained as independent prognostic genes through multivariate Cox regression analysis, including TIMP1, HHMR, and HIST1H1C (Figure 6D). Based on their coefficients, Riskscore = (−0.3547743 × HHMR expression) + (0.3324005 × TIMP1 expression) + (0.2301199 × HIST1H1C expression). The patients were divided into high-risk and low-risk groups according to the median value of Riskscore. Kaplan-Meier survival analysis showed that patients with high-risk group had significantly lower OS than those with low-risk group (Figure 6E). The heatmap of gene expression revealed that HMMR was highly expressed in the low-risk group, while TIMP1 and HIST1H1C exhibited higher expression levels in the high-risk group (Figure 6F). Furthermore, the AUC values for 1-, 2-, 3-, 4-, and 5-year surpassed 0.67, suggesting a better efficacy of the risk model (Figure 6G). To further determine the validity of the risk model, we conducted additional validation analyses in the GSE87211 cohort. The results revealed that the high-risk group had significantly poorer survival outcomes compared to the low-risk group. This finding not only confirms the widespread applicability of the risk model but also emphasizes the importance of identifying high-risk patients in clinical practice (Figure 6H-6K). In addition, a nomogram and decision curve analysis (DCA) were established to account for clinical heterogeneity and risk scores for potential clinical application (Figure S1).

Figure 6 Construction and validation of the prognostic risk model for T cell marker genes. (A) Intersection of CRC-related genes. (B) LASSO coefficient path diagram for 86 genes. (C) Cross-validation for parameter optimization in LASSO regression. (D) Forest plot of optimal prognostic genes based on multivariate Cox regression analysis from the training cohort. (E) Kaplan-Meier survival curve from the training cohort. (F) Risk ternary plot of the training cohort. (G) The AUC of the prediction for 1-, 2-, 3-, 4-, and 5-year survival rates using ROC analysis on the training cohort. (H) Multivariate Cox regression analysis from the validation cohort. (I) Kaplan-Meier survival curve from the validation cohort. (J) Risk ternary plot of the validation cohort. (K) The AUC of the prediction for 1-, 2-, 3-, 4-, and 5-year survival rates using ROC analysis on the validation cohort. AUC, area under the curve; CI, confidence interval; CRC, colorectal cancer; DEGs, differentially expressed genes; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic.

Correlation of CRC with immune cell infiltration and checkpoints

To analyze the clinical significance of immune cell infiltration in CRC, we used R software package ESTIMATE to calculate TME scores. According to the TME score, we found that the immune score, stromal score, and estimate score of the high-risk group were significantly higher than those in the low-risk group (Figure 7A). Subsequently, the TIDE, exclusion, and dysfunction scores were higher in the high-risk group, while the MSI score was higher in the low-risk group, indicating a greater likelihood of immune escape among the high-risk group (Figure 7B). In addition, we used the CIBERSORT tool to predict the proportion of 22 tumor-infiltrating immune cells. The results indicated that T cells CD4 memory resting, T cells CD4 memory activated, and T cells follicular helper had higher expression in the low-risk group, whereas Macrophages M0 and NK cells activated had higher expression in the high-risk group. Plasma cells and resting memory CD4 T cells of the low-risk group were associated with a poor prognosis in CRC patients (Figure 7C). Then, we conducted a comprehensive analysis of the differences in immune checkpoints between high-risk and low-risk groups. The results demonstrated that LDHA, YTHDF1, PVR, and LDHB had higher expression in the low-risk group (P<0.05). Furthermore, the high-risk group of NRP1 was related to the poor prognosis of CRC patients (Figure 7D).

Figure 7 Correlation of the T cells prognostic risk model with immune cell infiltration and immune checkpoints in the TCGA cohort. (A) The boxplot depicting the proportion of stromal and immune cells in CRC tissues. (B) The differences in TIDE scores between the low-risk group and the high-risk group. (C) The infiltration levels of 22 immune cells were compared between the high-risk group and the low-risk group. (D) The levels of immune checkpoints were compared between the high-risk group and the low-risk group. ns, P>0.05; *, P<0.05; **, P<0.01; ***, P<0.001; ****, P<0.0001. CRC, colorectal cancer; TCGA, The Cancer Genome Atlas; TIDE, Tumor Immune Dysfunction and Exclusion.

Drug sensitivity in the low-risk and high-risk groups

To predict the sensitivity of the most commonly used chemotherapy and targeted drugs in CRC, we conducted a drug analysis on the low-risk and high-risk groups using the GDSC database. Alpelisib, lapatinib, vinorelbine, paclitaxel and oxaliplatin showed higher sensitivity in the high-risk group (P=4.1e−05, P<2.22e−16, P=0.037, P=0.00093, and P=5.3e−12, respectively). Additionally, in the high-risk group, dabrafenib, trametinib, selumetinib and dactolisib were less sensitive than in the low-risk group (P=0.00018, P=2.4e−05, P=8.9e−07, and P=0.047, respectively) (Figure 8). These results revealed the potential of riskscore in predicting drug sensitivity in CRC patients.

Figure 8 Screening of therapeutic agents for CRC based on risk models. Drug sensitivity in low-risk and high-risk groups, including dabrafenib, trametinib, selumetinib, alpelisib, dactolisib, lapatinib, paclitaxel, vinorelbine, oxaliplatin. CRC, colorectal cancer.

Discussion

CRC is a common and increasingly prevalent malignant tumor worldwide. Previous studies have shown that genetic factors significantly influence its development and progression, closely linking them to cancer pathogenesis, diagnosis, prognosis, and the immune microenvironment (22-24). Currently, scRNA-seq is widely used to identify T cell subsets within the TME and to investigate tumor heterogeneity (25,26). In this study, we analyzed scRNA-seq data from six GEO samples, including 27,927 cells from CRC patients. By integrating scRNA-seq and bulk transcriptome data, we identified T cell marker genes, analyzed their expression, and built a robust prognostic risk model.

In this study, we analyzed the GSE188711 scRNA-seq dataset to investigate cancer heterogeneity, identifying eight cell types. T cells were further subdivided into subtypes according to marker genes. Cell-cell interaction analysis revealed strong interactions between T cells and other subtypes, with the MIF signaling pathway being the most significant. Pseudotime trajectory analysis showed that central memory T cells are primarily at the initial point of development, while T helper 17 cells, regulatory T cells, and effector memory T cells are located at the terminal point. Our analysis of cell-cell communication further revealed that these terminally differentiated T cell subsets—namely, Th17, regulatory T cells (Tregs), and effector memory T cells—exhibited robust interactions with myeloid cells, particularly macrophages. The most significant ligand-receptor pairs mediating this cross-talk were identified within the Macrophage Migration Inhibitory Factor (MIF) signaling pathway, specifically the MIF-(CD74 + CXCR4) and MIF-(CD74 + CD44) pairs, with MIF–CD74+CXCR4 being the most pivotal. GO/KEGG analysis indicated enrichment in the cell cycle, Th17 cell differentiation, and the intestinal immune network for IgA production, suggesting these pathways are crucial for the proliferation and progression of CRC and may serve as potential therapeutic targets. Guo et al. found that loss of cyclin A2 (CCNA2) increases colonic epithelial cell proliferation and dysplasia, promoting cancer development, with high CCNA2 expression correlating to better prognosis (27). The Th17 pathway has also been linked to colorectal tumor development (28,29), with Wong et al. showing that Th1 and Th17 cell infiltration can enhance tumorigenesis via gut microbiota from cancer patients (30). Th17 cells can inhibit tumor growth by recruiting immune effector cells and activating CD8+ T cells (31). Additionally, IgA-producing cells, while serving as a primary immune defense, are considered immunosuppressive due to high levels of PD-L1, IL-10, TGF-β1, and IgA expression (32). Zhong et al. demonstrated that reducing IgA+ cells inhibits cancer progression (33).

To assess the link between T cells and CRC prognosis, we constructed a three-gene model (TIMP1, HMMR, HIST1H1C) using univariate Cox regression and LASSO in the TCGA cohort. This model was further validated in the GEO cohort. Tissue inhibitor of metalloproteinase-1 (TIMP-1) is a significant glycoprotein involved in regulating the immune microenvironment and inhibiting apoptotic pathways. As a key member of the TIMP family, it inhibits matrix metalloproteinases (MMPs) and is crucial for extracellular matrix remodeling (34,35). Aberrant TIMP1 expression is linked to poor prognosis in various tumors, including gastric and breast cancers (36,37), and may serve as a diagnostic or prognostic biomarker for CRC (38). Wang et al. showed that high expression of TIMP1 may promote the formation of an immunosuppressive microenvironment by inhibiting the recruitment or function of cytotoxic T cells, thereby leading to immune evasion and tumor progression (39). Detecting TIMP-1 could enhance early diagnosis and therapeutic decision-making, potentially improving clinical outcomes. Hyaluronic acid-mediated motility receptor (HMMR), also known as RHAMM, is overexpressed in several cancers, including colorectal, and is associated with poor prognosis (40). High HMMR levels correlate with tumor grade, lymph node invasion, and distant metastasis, suggesting its role in cancer progression (41). HMMR-positive tumor budding regions show increased CD8+ T cell infiltration and PD-1/PD-L1 upregulation, indicating HMMR modulates the TME by boosting cytotoxic T cell activity and activating immunosuppressive pathways (42). Histone cluster 1 H1 family member C (HIST1H1C) maintains T cell quiescence by regulating chromatin structure and repressive epigenetic marks; its loss disrupts immune homeostasis and impacts tumor immunity and immunotherapy response (43). Recent research has established that HIST1H1C is related to prognosis in resectable pancreatic cancer (44), suggesting its potential as a prognostic biomarker. In conclusion, TIMP1, HMMR, and HIST1H1C are important prognostic markers for CRC, which have opened up new avenues for targeted therapy in the future.

The TME is crucial in the development and metastasis of CRC, with complex interactions among immune cells, tumor cells, stromal cells, and surrounding tissues significantly influencing cancer progression and treatment response (45). First, this study categorized all samples into low-risk and high-risk groups according to the calculated risk score. We observed a significant elevation in the immune score, stromal score, and ESTIMATE score in the high-risk group compared to the low-risk group. Simultaneously, TIDE scores indicated that high-risk patients are less likely to benefit from immunotherapy. Further evaluation of the infiltration levels of 22 immune cell subtypes revealed distinct immunophenotypes between the risk groups. The findings revealed that T cells CD4 memory resting, T cells CD4 memory activated, and T cells follicular helper were more expressed in the low-risk group, while Macrophages M0 and NK cells activated were more expressed in the high-risk group. There is evidence that M0 macrophages are associated with reduced immune activity, distant metastasis, and poor prognosis in digestive system tumors (46,47). Furthermore, one study suggested that Natural killer (NK) cells are key lymphocytes involved in tumor and viral infections and can kill tumor cells (48), the increased abundance of activated NK cells in the high-risk group may suggest a complex, possibly suppressed cytotoxic response in the aggressive TME. We also systematically examined the expression profiles of key immune checkpoints and identified significant differences between the high-risk and low-risk groups. Specifically, LDHA, YTHDF1, PVR, and LDHB were highly expressed in the high-risk group. Moreover, NRP1 was related to poor survival prognosis in CRC patients. Neurociliary protein 1 (NRP1) is a multifunctional transmembrane protein with angiogenesis, tumorigenesis, and immune system function. At present, numerous studies have confirmed that NRP1 expression serves as a potential and useful biomarker in various tumors, including renal cell carcinoma and hepatocellular carcinoma (49,50). Additionally, the intracellular complex formed by NRP1 persistently activates the FAK/p130Cas signaling pathway, thereby promoting the migration, invasion, and metastasis of CRC cells (51). These comprehensive immunophenotyping results not only highlight the distinct immune microenvironmental landscapes between the two risk groups but also underscore the potential of immune-based stratification for prognostic prediction and therapeutic decision-making. Besides, we proposed a translational framework to enhance personalized treatment in CRC by leveraging immune cell infiltration profiles to inform immunotherapy selection, directing high-risk patients with elevated TIDE scores toward combination or alternative regimens; and utilizing drug sensitivity data from the GDSC database to tailor chemotherapy and targeted therapy. This approach might facilitate clinical implementation of our prognostic model, enabling individualized therapies and improved CRC outcomes. Ultimately, our research identified potential drug targets from the GDSC database based on the prognostic model, showing significant differences in responses to oxaliplatin, MAPK pathway inhibitors (trametinib, selumetinib), PI3K/mTOR pathway inhibitors (alpelisib, dactolisib), the BRAF inhibitor dabrafenib, the tyrosine kinase inhibitor lapatinib, and microtubule inhibitors (vinorelbine, paclitaxel) between high-risk and low-risk groups.

Limitations

Although the findings of this study enhance our understanding of the T cell prognostic model in CRC and its related TME, there are still several limitations. First, this T cell prognostic model was developed using a public dataset, and its predictive capability requires further validation in large-scale prospective clinical studies. Second, there were no experimental investigations to explore the potential molecular mechanisms that underlie the prognostic models in predicting patient prognosis and response to immunotherapy. Third, the limited number of scRNA-seq samples, the restricted data volume, heterogeneity in treatment protocols between the TCGA training set and the GEO validation set, and the absence of critical clinical variables in public databases may introduce potential biases. Therefore, the applicability of our research findings may be somewhat constrained.


Conclusions

In conclusion, we explored the T cell profile and TME of CRC using scRNA-seq and bulk RNA transcriptomes, and revealed critical potential prognostic genes. Subsequently, the heterogeneity of T cells with respect to functional enrichment, cell differentiation trajectories, and intercellular communication was initially elucidated. Furthermore, a prognostic risk model consisting of three T cell feature genes was developed to forecast the clinical outcomes of CRC patients, which showed excellent prognostic value. These results emphasize TIMP1, HMMR, and HIST1H1C as significant diagnostic and prognostic markers for CRC, demonstrating a wide array of biological functions. Our research also enhances our understanding of the regulation of tumor-infiltrating cells in CRC. We anticipate that the integrated analysis of scRNA-seq and bulk RNA transcriptomes will facilitate the advancement of targeted immunotherapy for CRC. The findings of this study require further validation through experimental and clinical practices.


Acknowledgments

We express our sincere gratitude to the researchers who provided the original open single-cell data. Additionally, we appreciate the research teams that played a significant role in generating data from CRC samples from the TCGA and GEO databases.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1104/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1104/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1104/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63. [Crossref] [PubMed]
  2. Arnold M, Abnet CC, Neale RE, et al. Global burden of 5 major types of gastrointestinal cancer. Gastroenterology 2020;159:335-49.e15. [Crossref] [PubMed]
  3. Ruan X, Che T, Chen X, et al. Mendelian randomisation analysis for intestinal disease: achievement and future. eGastroenterology 2024;2:e100058. [Crossref] [PubMed]
  4. International Mismatch Repair Consortium. Variation in the risk of colorectal cancer in families with Lynch syndrome: a retrospective cohort study. Lancet Oncol 2021;22:1014-22. [Crossref] [PubMed]
  5. Karstensen JG, Bülow S, Højen H, et al. Cancer in Patients With Familial Adenomatous Polyposis: A Nationwide Danish Cohort Study With Matched Controls. Gastroenterology 2023;165:573-581.e3. [Crossref] [PubMed]
  6. Qin X, Yang Z, Li Y, et al. Treatment and prognosis of colorectal cancer with synchronous peritoneal metastases: 11-year single institute experience. eGastroenterology 2023;1:e100016. [Crossref] [PubMed]
  7. Benson AB, Venook AP, Al-Hawary MM, et al. NCCN Guidelines Insights: Rectal Cancer, Version 6.2020. J Natl Compr Canc Netw 2020;18:806-15. [Crossref] [PubMed]
  8. Yoshino T, Arnold D, Taniguchi H, et al. Pan-Asian adapted ESMO consensus guidelines for the management of patients with metastatic colorectal cancer: a JSMO-ESMO initiative endorsed by CSCO, KACO, MOS, SSO and TOS. Ann Oncol 2018;29:44-70. [Crossref] [PubMed]
  9. André T, Shiu KK, Kim TW, et al. Pembrolizumab in Microsatellite-Instability-High Advanced Colorectal Cancer. N Engl J Med 2020;383:2207-18. [Crossref] [PubMed]
  10. Siegel RL, Wagle NS, Cercek A, et al. Colorectal cancer statistics, 2023. CA Cancer J Clin 2023;73:233-54. [Crossref] [PubMed]
  11. Zubair H, Khan MA, Anand S, et al. Modulation of the tumor microenvironment by natural agents: implications for cancer prevention and therapy. Semin Cancer Biol 2022;80:237-55. [Crossref] [PubMed]
  12. Waldman AD, Fritz JM, Lenardo MJ. A guide to cancer immunotherapy: from T cell basic science to clinical practice. Nat Rev Immunol 2020;20:651-68. [Crossref] [PubMed]
  13. Aparicio C, Belver M, Enríquez L, et al. Cell Therapy for Colorectal Cancer: The Promise of Chimeric Antigen Receptor (CAR)-T Cells. Int J Mol Sci 2021;22:11781. [Crossref] [PubMed]
  14. Chuang L, Qifeng J, Shaolei Y. The tumor immune microenvironment and T-cell-related immunotherapies in colorectal cancer. Discov Oncol 2024;15:244. [Crossref] [PubMed]
  15. Chen S, Zhou Z, Li Y, et al. Application of single-cell sequencing to the research of tumor microenvironment. Front Immunol 2023;14:1285540. [Crossref] [PubMed]
  16. Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multimodal single-cell data. Cell 2021;184:3573-3587.e29. [Crossref] [PubMed]
  17. Zhang L, Yu X, Zheng L, et al. Lineage tracking reveals dynamic relationships of T cells in colorectal cancer. Nature 2018;564:268-72. [Crossref] [PubMed]
  18. Hu C, Li T, Xu Y, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res 2023;51:D870-6. [Crossref] [PubMed]
  19. Jin S, Guerrero-Juarez CF, Zhang L, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12:1088. [Crossref] [PubMed]
  20. Haghverdi L, Büttner M, Wolf FA, et al. Diffusion pseudotime robustly reconstructs lineage branching. Nat Methods 2016;13:845-8. [Crossref] [PubMed]
  21. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453-7. [Crossref] [PubMed]
  22. Wang Y, Shi T, Song X, et al. Gene fusion neoantigens: Emerging targets for cancer immunotherapy. Cancer Lett 2021;506:45-54. [Crossref] [PubMed]
  23. Ardizzone A, Calabrese G, Campolo M, et al. Role of miRNA-19a in Cancer Diagnosis and Poor Prognosis. Int J Mol Sci 2021;22:4697. [Crossref] [PubMed]
  24. Heide T, Househam J, Cresswell GD, et al. The co-evolution of the genome and epigenome in colorectal cancer. Nature 2022;611:733-43. [Crossref] [PubMed]
  25. Ding S, Chen X, Shen K. Single-cell RNA sequencing in breast cancer: Understanding tumor heterogeneity and paving roads to individualized therapy. Cancer Commun (Lond) 2020;40:329-44. [Crossref] [PubMed]
  26. Zhuang J, Qu Z, Chu J, et al. Single-cell transcriptome analysis reveals T population heterogeneity and functions in tumor microenvironment of colorectal cancer metastases. Heliyon 2023;9:e17119. [Crossref] [PubMed]
  27. Guo Y, Gabola M, Lattanzio R, et al. Cyclin A2 maintains colon homeostasis and is a prognostic factor in colorectal cancer. J Clin Invest 2021;131:e131517. [Crossref] [PubMed]
  28. Cui G, Li Z, Florholmen J, et al. Dynamic stromal cellular reaction throughout human colorectal adenoma-carcinoma sequence: A role of TH17/IL-17A. Biomed Pharmacother 2021;140:111761. [Crossref] [PubMed]
  29. Xing C, Duan T, Li L, et al. T(H)17 cells regulate chemokine expression in epithelial cells through C/EBPβ and dictate host sensitivity to colitis and cancer immunity. Sci Adv 2025;11:eads3530. [Crossref] [PubMed]
  30. Wong SH, Zhao L, Zhang X, et al. Gavage of Fecal Samples From Patients With Colorectal Cancer Promotes Intestinal Carcinogenesis in Germ-Free and Conventional Mice. Gastroenterology 2017;153:1621-1633.e6. [Crossref] [PubMed]
  31. Cerboni S, Gehrmann U, Preite S, et al. Cytokine-regulated Th17 plasticity in human health and diseases. Immunology 2021;163:3-18. [Crossref] [PubMed]
  32. Shalapour S, Font-Burgada J, Di Caro G, et al. Immunosuppressive plasma cells impede T-cell-dependent immunogenic chemotherapy. Nature 2015;521:94-8. [Crossref] [PubMed]
  33. Zhong Z, Zhang H, Nan K, et al. Fasting-Mimicking Diet Drives Antitumor Immunity against Colorectal Cancer by Reducing IgA-Producing Cells. Cancer Res 2023;83:3529-43. [Crossref] [PubMed]
  34. Ma B, Ueda H, Okamoto K, et al. TIMP1 promotes cell proliferation and invasion capability of right-sided colon cancers via the FAK/Akt signaling pathway. Cancer Sci 2022;113:4244-57. [Crossref] [PubMed]
  35. Rao VS, Gu Q, Tzschentke S, et al. Extravesicular TIMP-1 is a non-invasive independent prognostic marker and potential therapeutic target in colorectal liver metastases. Oncogene 2022;41:1809-20. [Crossref] [PubMed]
  36. Zheng M, Wang P, Wang Y, et al. Clinicopathological and prognostic significance of TIMP1 expression in gastric cancer: a systematic review and meta-analysis. Expert Rev Anticancer Ther 2024;24:1169-76. [Crossref] [PubMed]
  37. Yang E, Ding Q, Fan X, et al. Machine learning modeling and prognostic value analysis of invasion-related genes in cutaneous melanoma. Comput Biol Med 2023;162:107089. [Crossref] [PubMed]
  38. Qiu X, Quan G, Ou W, et al. Unraveling TIMP1: a multifaceted biomarker in colorectal cancer. Front Genet 2023;14:1265137. [Crossref] [PubMed]
  39. Wang DW, Su F, Yang LJ, et al. Bioinformatics Analysis and Identification of Potential Genes Associated with Pathogenesis and Prognosis of Gastric Cancer. Curr Med Sci 2022;42:357-72. [Crossref] [PubMed]
  40. Tang YP, Yin YX, Xie MZ, et al. Systematic Analysis of the Clinical Significance of Hyaluronan-Mediated Motility Receptor in Colorectal Cancer. Front Mol Biosci 2021;8:733271. [Crossref] [PubMed]
  41. Zhu Y, Sun L, Yu J, et al. Identification of biomarkers in colon cancer based on bioinformatic analysis. Transl Cancer Res 2020;9:4879-95. [Crossref] [PubMed]
  42. Burren S, Reche K, Blank A, et al. RHAMM in liver metastases of stage IV colorectal cancer with mismatch-repair proficient status correlates with tumor budding, cytotoxic T-cells and PD-1/PD-L1. Pathol Res Pract 2021;223:153486. [Crossref] [PubMed]
  43. Yusufova N, Kloetgen A, Teater M, et al. Histone H1 loss drives lymphoma by disrupting 3D chromatin architecture. Nature 2021;589:299-305. [Crossref] [PubMed]
  44. Wu C, Wu Z, Tian B. Five gene signatures were identified in the prediction of overall survival in resectable pancreatic cancer. BMC Surg 2020;20:207. [Crossref] [PubMed]
  45. Shah DD, Chorawala MR, Raghani NR, et al. Tumor microenvironment: recent advances in understanding and its role in modulating cancer therapies. Med Oncol 2025;42:117. [Crossref] [PubMed]
  46. Nie K, Zheng Z, Wen Y, et al. Construction and validation of a TP53-associated immune prognostic model for gastric cancer. Genomics 2020;112:4788-95. [Crossref] [PubMed]
  47. Chang Z, Huang R, Fu W, et al. The Construction and Analysis of ceRNA Network and Patterns of Immune Infiltration in Colon Adenocarcinoma Metastasis. Front Cell Dev Biol 2020;8:688. [Crossref] [PubMed]
  48. Pesini C, Lanuza PM, Pardo J, et al. Expansion and activation of NK cells supported by accessory cells. Phenotypic and functional characterization. Methods Cell Biol 2025;196:237-50. [Crossref] [PubMed]
  49. Morin E, Lindskog C, Johansson M, et al. Perivascular Neuropilin-1 expression is an independent marker of improved survival in renal cell carcinoma. J Pathol 2020;250:387-96. [Crossref] [PubMed]
  50. Fernández-Palanca P, Payo-Serafín T, Fondevila F, et al. Neuropilin-1 as a Potential Biomarker of Prognosis and Invasive-Related Parameters in Liver and Colorectal Cancer: A Systematic Review and Meta-Analysis of Human Studies. Cancers (Basel) 2022;14:3455. [Crossref] [PubMed]
  51. Huang X, Ye Q, Chen M, et al. N-glycosylation-defective splice variants of neuropilin-1 promote metastasis by activating endosomal signals. Nat Commun 2019;10:3708. [Crossref] [PubMed]
Cite this article as: Liu P, Wang Y, Huang S, Luo R, Xie R, Peng J, Wang R, Wang P, Shi X, Zhang W, Shi L, Zhou X, Tang X. Integrative single-cell and bulk transcriptomes analyses reveal T cell-related prognostic risk model and tumor immune microenvironment modulation in colorectal cancer. Transl Cancer Res 2025;14(10):7119-7135. doi: 10.21037/tcr-2025-1104

Download Citation