Identification and prognostic assessment of clonal and functional heterogeneity in head and neck squamous cell carcinoma
Original Article

Identification and prognostic assessment of clonal and functional heterogeneity in head and neck squamous cell carcinoma

Yibo Guo1,2#, Xiuyin Wu3#, Yi Xu4#, Yu Zhang1,2, Zheqi Liu1, Tong Ji1,5, Jie Hao6, Yang Wang1,7

1Department of Oral and Maxillofacial Surgery, Zhongshan Hospital, Fudan University, Shanghai, China; 2Department of Stomatology, Zhongshan Hospital, Fudan University, Shanghai, China; 3Department of Stomatology, People’s Hospital Affiliated to Shandong First Medical University, Jinan, China; 4Shanghai Jing’an District Dental Disease Prevention and Control Institute, Shanghai, China; 5Shanghai Stomatological Hospital & School of Stomatology, Fudan University, Shanghai, China; 6Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China; 7Department of Oral and Maxillofacial-Head and Neck Oncology, Fengcheng Hospital of Fengxian District, Shanghai, China

Contributions: (I) Conception and design: Y Wang, Y Guo; (II) Administrative support: Y Wang, T Ji; (III) Provision of study materials or patients: Y Wang, J Hao, T Ji; (IV) Collection and assembly of data: Y Guo, X Wu, Y Zhang; (V) Data analysis and interpretation: Y Guo, Y Xu, Z Liu; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Jie Hao, PhD. Institute of Clinical Science, Zhongshan Hospital, Fudan University, No. 180, Fenglin Road, Xuhui District, Shanghai 200032, China. Email: jhao@fudan.edu.cn; Yang Wang, MD. Department of Oral and Maxillofacial Surgery, Zhongshan Hospital, Fudan University, No. 180, Fenglin Road, Xuhui District, Shanghai 200032, China; Department of Oral and Maxillofacial-Head and Neck Oncology, Fengcheng Hospital of Fengxian District, Shanghai, China. Email: wangyang_zsyy@fudan.edu.cn.

Background: Head and neck squamous cell carcinoma (HNSCC) ranks sixth in incidence and seventh in mortality among cancers in China, affecting 750,000 individuals worldwide and resulting in over 300,000 deaths each year. The intratumoral heterogeneity of HNSCC poses significant challenges for prognosis and therapeutic interventions. This study aimed to investigate clonal heterogeneity and functional regulatory factors in HNSCC and establish a robust prognostic signature.

Methods: Single-cell and bulk RNA sequencing datasets were integrated to identify clonal subpopulations and functional states in HNSCC. Comprehensive analyses, including clonality assessment, pseudotime trajectory, and functional annotation, were employed to characterize intratumoral heterogeneity. The prognostic significance of these findings was evaluated through survival analyses. Subsequently, a prognostic signature was developed and its robustness was validated across multiple independent patient cohorts.

Results: Distinct clonal subpopulations were identified within HNSCC tumors, characterized by specific transcriptional profiles associated with stemness, proliferation, and epithelial-mesenchymal transition (EMT). Increased clonal and functional heterogeneity correlated significantly with worse patient survival. A novel prognostic signature, based on heterogeneity, exhibited strong predictive capabilities across multiple cohorts. The signature’s utility was further confirmed in an internal cohort, in which the risk score was associated with the aggressiveness of the tumor phenotypes.

Conclusions: This study identifies intratumoral clonal and functional heterogeneity as key drivers of poor prognosis in HNSCC. We found that increased heterogeneity, characterized by distinct cellular states, correlates significantly with worse patient survival. Consequently, we developed and validated a novel prognostic signature derived from genes governing the intercellular communication between these heterogeneous populations. This signature provides a robust tool for patient risk stratification and highlights novel therapeutic targets to overcome treatment resistance, offering valuable insights for personalized management of HNSCC.

Keywords: Head and neck squamous cell carcinoma (HNSCC); clonal heterogeneity; single-cell sequencing; prognostic biomarkers


Submitted May 15, 2025. Accepted for publication Sep 24, 2025. Published online Nov 26, 2025.

doi: 10.21037/tcr-2025-1019


Highlight box

Key findings

• Using single-cell analysis, we identified distinct genomic clones and functional subpopulations in head and neck squamous cell carcinoma (HNSCC).

• Higher intratumoral heterogeneity was significantly associated with worse patient survival.

• A novel 6-gene prognostic signature was developed from genes governing the intercellular communication between these heterogeneous cells. The signature robustly stratified patients in independent cohorts.

What is known and what is new?

• Intratumoral heterogeneity in HNSCC is a well-known driver of poor prognosis.

• This study newly identifies intercellular communication as the mechanistic link behind heterogeneity’s adverse impact. It provides a biologically informed signature that directly measures this pathological crosstalk, moving beyond simple statistical association.

What is the implication, and what should change now?

• The signature can improve clinical risk stratification for HNSCC patients, guiding personalized therapy. The identified communication pathways are novel therapeutic targets; research should now focus on disrupting this crosstalk to overcome treatment resistance.


Introduction

Head and neck cancers are among the most common malignant tumors, including laryngeal cancer, thyroid cancer, oral cancer, nasopharyngeal cancer, paranasal sinus cancer, hypopharyngeal cancer, etc. (1). The most common pathological type is squamous cell carcinoma (2). In China, head and neck squamous cell carcinoma (HNSCC) ranks sixth in incidence and seventh in mortality (3). Globally, HNSCC affects nearly 750,000 patients, with more than 300,000 deaths each year (4). The most common risk factors are smoking, excessive alcohol consumption, and human papillomavirus (HPV) infection (5).

At present, surgery or radiotherapy is still the first-line treatment for HNSCC (6), but the response rate of monotherapy for patients has reached the bottleneck stage (7). In addition, there is no effective treatment for patients who are inoperable or unwilling to undergo surgical resection. With the continuous expansion of the understanding of the genomic landscape of head and neck cancer, the selection of appropriate treatment based on the accurate understanding of the genomic variation of individual patients can help improve the prognosis of patients.

In the process of tumor occurrence and development, the genetic material in tumor cells undergoes mutations, and the accumulation of mutations will make the primitive cells develop into different types of cells. We call these different types of tumor cell populations in the same tumor tissue clones (8), and call the composition of clones clonal structures (9). The clonal evolution model suggests that tumors originate from normal cells. Clonal diversity constitutes the heterogeneity of tumors, resulting in differences in tumor growth rate, invasion ability, sensitivity to drugs, prognosis and other aspects (10,11). Tumor heterogeneity is one of the characteristics of malignant tumors. The study has shown that the generation of new clonal structures is one of the reasons for the failure of antitumor therapy and tumor recurrence (12). The circulating tumor DNA (ctDNA) can be used as a non-invasive means to monitor the clonal structure changes of tumors in hepatocellular carcinoma (HCC) (13). In multiple myeloma, with the development of the disease, clonal screening for treatment and the change of tumor microenvironment, clonal heterogeneity will gradually become more complicated, and eventually lead to disease progression and drug resistance of tumor cells (14). Beyond the genomic diversity defined by clonal structures, tumors also exhibit a profound degree of functional heterogeneity. This means that even cells within the same genetic clone can exist in different functional states, contributing uniquely to tumor progression, invasion, and therapeutic resistance. These distinct cellular states are governed by specific functional regulatory factors, which we define as the underlying gene expression programs and signaling pathways that dictate cellular behavior. Therefore, a comprehensive understanding of HNSCC requires a dual approach: first, identifying the genomic clonal architecture, and second, deconvoluting the functional states and their regulatory factors that operate within and across these clones. This integrated perspective provides a clearer picture of the tumor ecosystem and can reveal novel therapeutic vulnerabilities. At present, the programmed death 1 (PD-1) monoclonal antibody (mAb) provides a new treatment for patients with advanced HNSCC refractory to platinum-based chemotherapy (15). However, the side effects of immunotherapy are significant, the symptoms are often relieved slowly, and the quality of life of patients will be affected. Therefore, there is still a lack of reliable markers for the diagnosis and immunotherapy of HNSCC (16). The clinical value of studying clonal heterogeneity lies in the early treatment of patients with HNSCC and the exploration of whether the intervention at the stage of low clonal heterogeneity can improve the clinical efficacy. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1019/rc).


Methods

Data collection

This study involved four datasets: single-cell transcriptome sequencing dataset GSE164690, prognosis model training set GSE65858, validation set The Cancer Genome Atlas (TCGA)-HNSC, and user self-test data. The TCGA-HNSC dataset collects transcriptome data, mutation data, copy number variation (CNV) and clinical information. The data source is Xena. GSE164690 and GSE65858 data were downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.

Overview of the data

Data sets used by the project are listed in Table S1.

Data preprocessing

The GSE164690 dataset is single-cell RNA (scRNA) data and contains 134,606 single-cell transcriptome profiles. GSE164690 data set of raw data was downloaded from https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE164690&format=file. The dataset contains CD45+, CD45, and three cell types data from peripheral blood lymphocyte (PBL). All three types of cells were collected from 18 patients. Details about the sample collection process refer to the methods section of the original source of this dataset (17).

CD45 cells are non-immune cells, which are the target cell population of this project, while CD45+ cells paired with CD45 were absent in patient numbers HN02, HN03, and HN04. Therefore, to ensure subsequent analysis, CD45+ cells from HN02, HN03, and HN04 patients were first filtered. Of note, PBL sequencing results from HN10 patient were also filtered because the read10X function could not read due to data problems with the original downloaded data. It is suspected that an error occurred when the original author uploaded to the Gene Expression Omnibus (GEO) database. As PBL cells were not the focus of this study, their exclusion did not impact the subsequent analysis.

After sample screening, the merge function of Seurat R package was used to integrate the samples. The integrated sample had 23,946 genes sequenced. The integrated expression matrix was subjected to quality control and selected based on the following criteria: cells with more than 1,000 unique molecular identifier (UMI) counts; more than 200 genes and fewer than 6,000 genes; in UMI counts, less than 20% of mitochondrial genes were expressed. From filtered cells, gene expression matrices were normalized to total UMI counts per cell using SCTransform and transformed to a natural logarithmic scale. For batch correction, we used the Harmony R package (v1.0). The algorithm was run with default parameters, which include theta =2, lambda =1, to integrate cells from different patients. After data integration and correction of expression data, cells from CD45 were extracted and entered into downstream analysis.

Tumor cell extraction

Epithelial cells specifically expressed EPCAM, and endothelial cells expressed PECAM1, but neither expressed immune cell-related genes PTPRC, MME, CD3G, CD3E, and CD79A. Using R package Seurat, based on CD45 cells, the expression distribution of the above genes in cells can be used to separate epithelial cells from immune cells. In the epithelium, after further removal of endothelial cells and fibroblasts, the remaining epithelial cells are defined as tumor cells.

Single-cell copy number variation analysis

CopyKat was used to analyze the CNV spectrum of single cells. The CNV of annotated tumor cells isolated from CD45 samples were analyzed using endothelial cells and fibroblasts in CD45 samples as reference. The parameters to CopyKat are id.type = “S”, ngene.chr =5, win.size =25, KS.cut =0.1.

Differentiation of malignant cells from non-malignant cells

The CopyKat algorithm statistically classifies each cell as “aneuploid” or “diploid” based on its CNV profile relative to the provided normal reference cells. Cells predicted as “aneuploid” were considered malignant, while those predicted as “diploid” were considered non-malignant epithelial cells for the purpose of this study.

Cloning inference

In malignant cells, based on CNV of malignant cells, hierarchical clustering was performed using the hclust function with Euclidean distance and Ward.D2 linkage to infer clonal structures.

Functional enrichment

To quantify pathway activities at the single-cell level, we utilized the scGSVA R package. We performed functional enrichment using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway gene sets obtained from the Molecular Signatures Database (MSigDB, v7.4). The analysis was run with default parameters.

Dimensionality reduction and cell clustering

We visualized cell clusters and trajectories using the standard Monocle workflow. Monocle internally handles all normalization required for dimensionality reduction, visualization, and differential expression by controlling size factors for variability in the efficiency of cross-cellular library construction. After estimating the library size factors for each cell by estimating SizeFactors and the expression dispersion of each gene in the data set by estimating dispersions. In terms of dispersion, the top 1,500 highly variable genes were selected for clustering.

The expression values of these 1,500 genes in each cell were log-transformed by Monocle’s data preprocessing function [preprocess CellDataSet (CDS)] and projected onto the top 25 principal components (PCs). These low-dimensional coordinates are then used to initialize the nonlinear manifold learning algorithm implemented in Monocle 3, called “UMAP” (reduce dimension). This allows us to visualize the data in two or three dimensions. Specifically, we use the “cosine distance” metric to project onto two components, setting the parameters n_neighbors =50 and min_dist =0.1.

The Louvain method is used to detect cell clusters in the two-dimensional representation of our dataset (partition cells). This resulted in 25 clusters (Louvain communities). Cells are then clustered into supergroups using a method derived from approximate graph abstraction, and for each supergroup, cell trajectories are drawn on top of the projection using Monocle’s reverse graph embedding algorithm, which is derived from Simple PPT (learn Graph)

Analysis of intercellular communication

To infer and visualize intercellular communication networks, we employed the CellChat R package (v1.1.3). We used the curated human ligand-receptor interaction database, CellChatDB.human, for the analysis. The standard workflow was followed, which involves identifying over-expressed ligands and receptors in each cell group and then modeling communication probability based on the law of mass action.

Construction of a marker system for immune infiltrating cells to evaluate the prognosis of HNSCC

To build a prognostic signature from the 67 key pathway genes, we performed least absolute shrinkage and selection operator (LASSO) Cox regression analysis on the GSE65858 training set using the cv.glmnet function from the glmnet R package. The optimal regularization parameter, λ, was determined via 10-fold cross-validation, selecting the value that gave the minimum mean cross-validated error (lambda.min). Genes with non-zero coefficients at the optimal λ were included in the final prognostic model. The risk score for each patient was then calculated as the linear combination of the expression levels of these genes, weighted by their respective LASSO coefficients.

RiskScore=i=1nexpressionofgeneiLASSOcoefficientofgenei

Kaplan-Meier (K-M) survival analysis

K-M analysis was performed using the survival process of the R package, ggsurvplot was used for plotting, default parameters were used for analysis, and median values were used for grouping.

Statistical analysis

All statistical tests were performed with R (version 4.0). Differences were analyzed using the Wilcoxon test. The significance threshold is P<0.05, or P<0.01, as specified in each analysis.

Ethical statement

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.


Results

To identify clonal structures of HNSCC

After pretreatment of the original sample, tumor cells needed to be extracted from the single-cell sample from the tumor due to the influence of tumor purity. Using R package Seurat, the expression of specific marker genes based on epithelial cells, immune cells and stromal cells was first extracted, and stromal cells and immune cells in tumor samples could be eliminated. Finally, we obtained 14,587 epithelial cells, 11,195 immune cells, 7,912 endothelial cells and 4,684 fibroblasts from 38,378 CD45 cells (Figure 1A-1D, Table S2). Considering the cell heterogeneity in tumor tissues, 29.17% of CD45 cells were still immune cells, suggesting that although these immune cells did not express CD45, their gene expression profiles were still highly similar to those of immune cells. These CD45 cells may also be an important part of the tumor microenvironment.

Figure 1 CNV and clonal structure in HNSCC populations. (A) Expression of different marker genes in different subpopulations. (B) Marker gene expression among different cell types. (C,D) Two-dimensional distribution of all cells under UMAP and TSNE algorithms. (E) CNV profiles between normal and epithelial cells. (F) Clonal structure of tumor cells. (G) Clonal frequency distribution of different tumors. (H) Cell number statistics of different clones in different tumors. CNV, copy number variation; HNSCC, head and neck squamous cell carcinoma; HPV, human papillomavirus.

Next, 12,596 CD45 endothelial cells and fibroblasts were used as normal controls, and 14,587 epithelial cells were used as tumor cells. CopyKat was utilized to analyze single-cell CNV profiles. Compared with normal diploid cells, 14,071 cells were identified as having aneuploid variants. Due to the characteristic that aneuploid were malignant tumor cells, a total of 13,948 aneuploid epithelial cells were identified as malignant cells (Figure 1E).

Based on malignant cells, we used hierarchical clustering to cluster malignant cells based on cell CNV. We identified nine different clusters. Due to the heritable nature of genomic variation, these nine types were defined as nine clones with a common ancestor (Figure 1F). The cell numbers of these 9 clones were Clone 1: 2,911, Clone 2: 1,427, Clone 3: 845, Clone 4: 1,818, Clone 5: 876, Clone 6: 3,398, Clone 7: 1,021, Clone 8: 1,240 and Clone 9: 412 (Table S3). None of the nine clones was sample-specific. In other words, at least two clones were shared between different samples (Figure 1G,1H). For example, Clone 2 was detected in all 14 samples, while Clone 9 was detected in only two samples. These results indicate that these clones are not sample unique, but are widespread in HNSCC.

In addition, we also found that the long arm of chromosome 11 was universally amplified in HPV-infected samples. However, this fragment showed deletion status in the HPV samples. These results suggest that HPV infection affects chromosome copy number status in HNSCC. Interestingly, samples in Clone 1 and Clone 3 were all HPV+, while Clone 2 had both HPV+ and HPV samples, and the remaining clones were HPV samples. Considering that the main differences between Clone 1 and Clone 3 and other clones on CNV, such as the deletion of chromosome 1, are unrelated to the status of HPV, we believe that although HPV affects the chromosome copy number of HNSCC, it is not the decisive factor of cloning status in the whole clonal evolution process (Figure 1F).

Functional status of HNSCC clones

Since the tumor cells in the tumor tissue have clonal structure, the cells among these clones may have certain functional similarity, and thus have different functional localization. Next, single cell expression profiles were used as criteria to identify the similarities between single cells and explore the functional subsets of tumor cells. Based on single cell transcriptomes, we used the monocle3 function cluster_cells to cluster all malignant tumor cells. The results showed that these tumor cells could be divided into 14 functional subpopulations according to their expression profiles (Figure 2A,2B, Figure S1A-S1C).

Figure 2 Functional status of HNSCC clones. (A,B) Distribution of functional subsets and association of clones. (C) The percentage of cells in different clones with different functional subsets. (D) The number of cells of different functional subsets in different clones. (E) Cloning and clustering of functional subgroups. The black boxes show the functional subsets of the top 5 IQR. The upper left subgraph shows the clustering of 5 functional subsets and 9 clones. Red represents the subpopulation present in the clone, and gray represents the subpopulation not present in the clone. (F) Top 30 showed the enrichment results of KEGG pathways for the five subpopulations. The pathway enrichment value is the mean of the pathway enrichment score of the subpopulation of cells. HNSCC, head and neck squamous cell carcinoma; IQR, interquartile range; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Next, we examined the relationship between clones divided by genomic characteristics and functional subsets divided by gene expression. We see that no subgroup exists within a single clone. In other words, any clone contains at least 2 or more functional subsets, indicating that there are similar functional subsets within different clones (Figure 2C,2D). However, it is worth noting that although there are multiple functional subsets within different clones, the proportions among different subsets vary greatly among clones. For example, 99.57% of the cells in Clone 1 belonged to functional subgroup 1, although the clone also contained functional subgroups 3, 5, 7, and 9. Based on this, we used interquartile range (IQR) as an index to analyze the enrichment degree of different functional subsets in clones (Table S4). A high IQR indicates that the functional subsets are evenly distributed among different clones, while a low IQR indicates that the functional subsets tend to be distributed centrally in a clone. In Figure 2D, we can see that functional subgroups 3 and 9 are widely distributed in almost all clones, suggesting that these two subgroups appeared earlier, which is consistent with the results of hierarchical clustering (Figure 2E, Figure S1C). Based on this, functional subsets 3, 9, 1, 8 and 2 of the top 5 IQR were selected for subsequent analysis. These 5 functional subsets have high IQR, and we can assume that they are relatively evenly distributed among different clones. This subgroup has similar functions in different clones, and we believe that their functions have important implications for tumor growth.

Next, we performed KEGG pathway enrichment analysis on malignant cells belonging to functional subsets 3, 9, 1, 8 and 2 using scGSVA package. Using analysis of variance (ANOVA) test, we analyzed the differences in enrichment scores of 226 KEGG pathways in five subsets. Surprisingly, all KEGG pathways showed significant differences among the five subsets (adj.P<0.05; Table S5). This suggests that there may be extensive purely functional complementarity among these 5 subgroups. In this regard, we focused on the KEGG pathway with top 30 significance (Figure 2F, Figure S1D,S1E). For example, the extracellular matrix (ECM) receptor interaction pathway, which is related to communication between cells, was highly enriched in functional subgroup 1 and low enriched in functional subgroup 3, which co-existed in Clones 1, 2, 3 and 7. These results suggest that these four clones have complementary ECM receptor interaction pathways.

Evolutionary trajectories of different functional states

Using Monocle3, we inferred the evolutionary trajectories of 14 functional subgroups. Since functional subgroup 3 exists in eight clones, based on the basic principles of evolution, functional subgroup 3 is defined as the root node. Thus, we plotted the evolution trajectory of functional subgroups (Figure 3A). Next, we employed Monocle3 to analyze the expression module of differences between functional subsets and clones (Figure 3B,3C). According to the module with highly co-expressed functional subsets and clones as the standard, the module with specific co-expressed functional subsets and clones was selected for downstream analysis (Tables S6,S7).

Figure 3 Evolutionary trajectories and pathway enrichment of functional subsets. (A) Evolutionary trajectories of functional subsets. The left shows functional subgroups, the middle shows clones, and the right shows pseudotime. (B) Coexpression modules for functional subsets. (C) Cloned co-expression module. (D) KEGG pathway enrichment statistics of functional subsets and cloned co-expressed module genes. The upper half indicates the significance of pathway enrichment in functional subsets. Orange represents significant enrichment and white represents insignificant enrichment. The lower half represents the significance of pathway enrichment in the clone. Tomato color represents significant enrichment, while white represents insignificant enrichment. KEGG, Kyoto Encyclopedia of Genes and Genomes.

Next, we performed KEGG pathway enrichment analysis with high co-expression module genes for each functional subgroup and clone. P<0.05 and q<0.2 were taken as the threshold value for filtering the enrichment results (Figure S2). Pathways significantly enriched in both clones and functional subsets contained in clones indicate the existence of functional consistency between clones and subsets (Figure 3D). Finally, we found that Clone 1 and functional subgroup 1 had identical functional regulators (126 genes, Table S8). Clone 2 and functional subsets 11 and 7 had constant regulators of functional regulators (1,441 genes, available online: https://cdn.amegroups.cn/static/public/tcr-2025-1019-1.xlsx). Clone 8 and functional subsets 11 and 7 had constant regulators of functional regulators (970 genes, available online: https://cdn.amegroups.cn/static/public/tcr-2025-1019-2.xlsx).

Cell communication between populations and clones with different functional states

Next, CellChat was used to analyze cell communication between different functional subsets and between clones. First, for functional subsets, we found that all functional subsets were closely related to each other (Figure 4A, Figure S3). Among the 14 functional subsets, there were 13 pathways for cell interaction (Figure 4B). Using the relationship between cophenetic and the number of patterns, we determined two patterns for the functional subgroup in outgoing (Figure 4C). In these two patterns, the relationship between corresponding functional subsets and expression patterns and cellular communication pathways is shown in Figure 4D. In the same way, we performed the same analysis on the same clones. We found that all clones were closely related to each other (Figure 4E, Figure S4). Among the 9 clones, there were 12 pathways for cell interaction (Figure 4F). Using the relationship between cophenetic and the number of patterns, we determined two patterns for the clones in outgoing (Figure 4G, Figure S5). In these two modes, the relationship between corresponding functional subsets, expression patterns and cell communication pathways is shown in Figure 4H.

Figure 4 Cell-to-cell communication and prognostic value of genes in key communication signaling pathways. (A-D) Cell to cell communication between functional subsets. (E-H) Cell-cell communication between clones. (I,J) LASSO related parameters. (K) LASSO coefficient of the six basic factors. (L) Prognostic differences between high- and low-RiskScore groups in the GSE65858 dataset. (M) 1-, 2-, and 3-year AUROC curves for the GSE65858 dataset. (N) Prognostic differences between high- and low-RiskScore groups in the TCGA-HNSC dataset. (O) AUROC curves for 1-, 2-, and 3-year in the TCGA-HNSC dataset. AUC, area under the curve; AUROC, area under the receiver operating characteristic curve; CI, confidence interval; HR, hazard ratio; LASSO, least absolute shrinkage and selection operator; TCGA-HNSC, The Cancer Genome Atlas-head and neck squamous cell carcinoma.

Next, we extracted the key outgoing subsets based on the maximum ratio principle of outgoing/incoming. For the functional subgroup, the key outgoing subgroup was subgroup 1 and 2, the corresponding pattern was Pattern 2, and the corresponding pathways were CD99, APP, MHC-I, DESMOSOME, EPHA, NECTIN and MPZ. For the clone, the key outgoing subgroup is Clone 1, the corresponding pattern is Pattern 2, and the corresponding pathways are CD99, APP, EPHA, NECTIN and MPZ. We set the intersection of functional subsets and clonal interaction pathways, and finally obtained the key downstream pathways: CD99, APP, EPHA, NECTIN and MPZ.

Next, we mined key pathway genes and finally obtained a gene set containing 67 genes by referring to the existing study (Table S9) (18). Further, using LASSO regression analysis, we identified six factors of the above 67 key pathway genes with significant prognostic value in the GSE65858 dataset (Figure 4I-4K). The six genes were CDH1, PVR, PTPRM, PDGFB, RAP1A and EPHA8. Significant prognostic genes related to the six key pathways reported by LASSO regression analysis were summarized above. RiskScore was calculated based on LASSO coefficients and expression levels of the six genes.

RiskScore=CDH1×(0.414)+PVR×1.03+PTPRM×0.4+PDGFB×0.48+RAP1A×2.397+EPHA8×2.53

Using RiskScore, we performed efficiency analysis and found that these six genes had significant prognostic power in the GSE65858 dataset (Figure 4L,4M). The area under the curve (AUC) of 1–3 years is higher than 0.698. In the TCGA-HNSC dataset, these six genes also had significant prognostic ability (Figure 4N,4O). The AUC of 1–3 years is above 0.545.

Finally, the center self-test data were used to verify the status of the above six prognostic genes in the user data. The database has only 6 samples, divided into 2 groups; each group has only 3 samples. The number of samples was small and the statistical test was not significant. We first calculated the RiskScore of samples, and found that although the RiskScore value between samples N and P was statistically insignificant (t-test, P=0.07), the mean RiskScore in P was higher than that in N (Figure 5A). Among the above 6 genes, except for EPHA8 expression level being 0 in all samples, only RAP1A gene expression level in P samples was significantly higher than that in N samples (Figure 5B).

Figure 5 Validation of the 6-gene prognostic signature in an internal patient cohort. (A) Risk score difference between N and P samples. (B) Comparison of the expression levels of the 6 signature genes between N and P samples. N, normal tissue; P, primary tumor tissue; TPM, transcripts per million.

Discussion

HNSCC is the seventh most common cancer, with about 700,000 new cases and 350,000 deaths in 2018 (2). HNSCC is also one of the most common cancers in China. In 2015, there were 77,000 new cases and 38,000 deaths in China (19). More than 60% of HNSCC patients were diagnosed as locally advanced stage. The prognosis of patients with locally advanced HNSCC is poor. About 50–60% of patients have local recurrence within 2 years, 20–30% of patients have distant metastasis, and the 5-year overall survival rate is less than 50% (20). At present, surgery combined with radiotherapy is still the first-line treatment for HNSCC (21). However, there is no effective treatment plan for some patients who cannot be surgically resected or are unwilling to undergo surgical resection.

Heterogeneity is one of the characteristics of malignant tumors, and clonal diversity is the cause of tumor heterogeneity. In the process of tumor development, the genetic material in tumor cells will mutate. Due to the uncertainty of the mutation process and the different microenvironment of tumor tissue, the clonal structure in tumor tissues of different individuals is often different. Clinical treatment may eliminate the clonal structure in the tumor and destroy its growth environment, but inevitably, there will be potential selective pressure on it, which will eventually favor the growth of resistant clones and lead to treatment failure.

A primary goal of this study was to bridge the conceptual gap between the complex biological phenomenon of intratumoral heterogeneity and the clinical need for a simple prognostic tool. Our approach followed a systematic, multi-step logic. We began by deconstructing HNSCC’s complexity into its fundamental genomic (clonal) and transcriptional (functional) units. We then hypothesized that the prognostic impact of this diversity is mediated through intercellular communication, which we modeled using single-cell data. By identifying the key signaling pathways that were consistently active in this cross-talk, we defined the core molecular machinery of the heterogeneous tumor ecosystem. Our final 6-gene signature was derived directly from these pathways, thereby serving as a quantitative molecular readout of the adverse, heterogeneity-driven cellular interactions.

While several prognostic signatures for HNSCC have been previously reported, the novelty of our 6-gene signature lies in its unique biological basis. Rather than being derived from bulk tissue differential expression, our model is a direct result of deconstructing the tumor’s single-cell heterogeneity. It specifically captures the molecular essence of the intercellular communication networks that are critical for coordinating the activities of diverse clonal and functional subpopulations. We believe this mechanistic foundation—linking the prognostic markers directly to the biology of tumor heterogeneity—is the signature’s primary advantage. Its consistent performance across two independent cohorts with different clinical and technical characteristics further validates its robustness. While direct statistical comparisons with other models are challenging due to cohort and platform heterogeneity, the strong biological rationale and validated performance of our signature establish it as a valuable new tool for understanding and stratifying HNSCC.

In conclusion, our findings have significant clinical implications and open new avenues for future research. The proposed 6-gene signature offers a promising and biologically-informed tool for the risk stratification of HNSCC patients, potentially helping to identify individuals who may benefit from more aggressive treatment or closer monitoring. More importantly, by identifying the critical role of intercellular communication in driving the poor prognosis associated with heterogeneity, our study highlights this process as a potential therapeutic vulnerability. Future strategies aimed at disrupting these specific signaling pathways could represent a novel approach to overcoming therapy resistance and improving clinical outcomes for HNSCC patients. Further validation of this signature in prospective clinical trials is warranted to confirm its clinical utility.


Conclusions

In this study, we demonstrate that intratumoral clonal and functional heterogeneity are key drivers of poor prognosis in HNSCC. The adverse clinical impact of this heterogeneity is mediated by a complex network of intercellular communication. Based on these findings, we developed and validated a novel 6-gene signature derived from core communication pathways, which serves as a robust tool for patient risk stratification. These results not only provide a promising biomarker for clinical use but also highlight the intercellular communication network as a potential therapeutic target to overcome treatment resistance in HNSCC.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1019/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1019/prf

Funding: This work was supported by the following grants: National Natural Science Foundation of China (No. 82472922 to Y.W., Nos. 82173147 and 82372862 to T.J., No. 82403363 to Y.Z.), Shanghai Natural Science Foundation of China (No. 21ZR1453900 to Y.W.).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1019/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Mody MD, Rocco JW, Yom SS, et al. Head and neck cancer. Lancet 2021;398:2289-99. [Crossref] [PubMed]
  2. Johnson DE, Burtness B, Leemans CR, et al. Head and neck squamous cell carcinoma. Nat Rev Dis Primers 2020;6:92. [Crossref] [PubMed]
  3. Qi J, Li M, Wang L, et al. National and subnational trends in cancer burden in China, 2005-20: an analysis of national mortality surveillance data. Lancet Public Health 2023;8:e943-e55. [Crossref] [PubMed]
  4. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49. [Crossref] [PubMed]
  5. Shamseddine AA, Burman B, Lee NY, et al. Tumor Immunity and Immunotherapy for HPV-Related Cancers. Cancer Discov 2021;11:1896-912. [Crossref] [PubMed]
  6. Wintergerst L, Selmansberger M, Maihoefer C, et al. A prognostic mRNA expression signature of four 16q24.3 genes in radio(chemo)therapy-treated head and neck squamous cell carcinoma (HNSCC). Mol Oncol 2018;12:2085-101. [Crossref] [PubMed]
  7. Pulte D, Brenner H. Changes in survival in head and neck cancers in the late 20th and early 21st century: a period analysis. Oncologist 2010;15:994-1001. [Crossref] [PubMed]
  8. Fowler JC, King C, Bryant C, et al. Selection of Oncogenic Mutant Clones in Normal Human Skin Varies with Body Site. Cancer Discov 2021;11:340-61. [Crossref] [PubMed]
  9. Gillis S, Roth A. PyClone-VI: scalable inference of clonal population structures using whole genome data. BMC Bioinformatics 2020;21:571. [Crossref] [PubMed]
  10. Morita K, Wang F, Jahn K, et al. Clonal evolution of acute myeloid leukemia revealed by high-throughput single-cell genomics. Nat Commun 2020;11:5327. [Crossref] [PubMed]
  11. Tanner G, Westhead DR, Droop A, et al. Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data. Nat Commun 2021;12:6396. [Crossref] [PubMed]
  12. Chen G, Cai Z, Li Z, et al. Clonal evolution in long-term follow-up patients with hepatocellular carcinoma. Int J Cancer 2018;143:2862-70. [Crossref] [PubMed]
  13. Kaseb AO, Sánchez NS, Sen S, et al. Molecular Profiling of Hepatocellular Carcinoma Using Circulating Cell-Free DNA. Clin Cancer Res 2019;25:6107-18. [Crossref] [PubMed]
  14. Mishima Y, Mishima Y, Shirouchi Y, et al. The clonal evolution during long-term clinical course of multiple myeloma. Int J Hematol 2021;113:279-84. [Crossref] [PubMed]
  15. Chen SW, Li SH, Shi DB, et al. Expression of PD-1/PD-L1 in head and neck squamous cell carcinoma and its clinical significance. Int J Biol Markers 2019;34:398-405. [Crossref] [PubMed]
  16. Yin G, Guo W, Duan H, et al. Role of PD-1/PD-L1 inhibitors in the treatment of recurrent/metastatic head and neck squamous cell carcinoma: A systematic review and meta-analysis. Clin Otolaryngol 2021;46:1013-20. [Crossref] [PubMed]
  17. Kürten CHL, Kulkarni A, Cillo AR, et al. Investigating immune and non-immune cell interactions in head and neck tumors by single-cell RNA sequencing. Nat Commun 2021;12:7338. [Crossref] [PubMed]
  18. Schaefer CF, Anthony K, Krupa S, et al. PID: the Pathway Interaction Database. Nucleic Acids Res 2009;37:D674-9. [Crossref] [PubMed]
  19. Zheng R, Zhang S, Zeng H, et al. Cancer incidence and mortality in China, 2016. J Natl Cancer Cent 2022;2:1-9. [Crossref] [PubMed]
  20. Xu X, Li R, Zhang L, et al. Identification of factors related to immunotherapy efficacy and prognosis in patients with advanced head and neck squamous cell carcinoma. Diagn Pathol 2021;16:110. [Crossref] [PubMed]
  21. Lee NY, Ferris RL, Psyrri A, et al. Avelumab plus standard-of-care chemoradiotherapy versus chemoradiotherapy alone in patients with locally advanced squamous cell carcinoma of the head and neck: a randomised, double-blind, placebo-controlled, multicentre, phase 3 trial. Lancet Oncol 2021;22:450-62. [Crossref] [PubMed]
Cite this article as: Guo Y, Wu X, Xu Y, Zhang Y, Liu Z, Ji T, Hao J, Wang Y. Identification and prognostic assessment of clonal and functional heterogeneity in head and neck squamous cell carcinoma. Transl Cancer Res 2025;14(11):7758-7772. doi: 10.21037/tcr-2025-1019

Download Citation