Identification of differentiation markers and immune microenvironment in head and neck squamous cell carcinoma using machine learning combined with single-cell analysis
Original Article

Identification of differentiation markers and immune microenvironment in head and neck squamous cell carcinoma using machine learning combined with single-cell analysis

Zishanbai Zhang# ORCID logo, Yue Li#, Miao Wang, Yixin Jing, Honglian Hu, Menglin Shi, Yiming Ding, Xiaohong Chen ORCID logo

Department of Otolaryngology, Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China

Contributions: (I) Conception and design: Z Zhang, Y Li, Y Ding; (II) Administrative support: Y Ding, X Chen; (III) Provision of study materials or patients: Z Zhang, Y Li, Y Jing, H Hu, M Shi; (IV) Collection and assembly of data: Z Zhang, Y Li; (V) Data analysis and interpretation: Z Zhang, Y Ding, M Wang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Xiaohong Chen, PhD; Yiming Ding, PhD. Department of Otolaryngology, Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, No. 1 Dongjiaominxiang Street, Dongcheng District, Beijing 100005, China. Email: trchxh@163.com; dyment@126.com.

Background: Head and neck squamous cell carcinoma (HNSCC) is the sixth most common malignant tumor globally. Histopathological grading classifies HNSCC into well-differentiated and poorly differentiated types. Poorly differentiated tumors tend to exhibit greater aggressiveness, with higher risks of invasion, metastasis, and mortality. However, current methods for assessing differentiation lack reliable molecular markers for objective diagnosis. Furthermore, the differences in immune microenvironment between well-differentiated and poorly differentiated tumors are not fully understood. In this study, we integrate machine learning and single-cell transcriptomic analysis to identify molecular markers closely associated with differentiation and explore the immune microenvironment differences across various differentiation states.

Methods: We obtained transcriptomic data, including pathological differentiation and survival annotations, from The Cancer Genome Atlas-Head and Neck Squamous Cell Carcinoma (TCGA-HNSC) dataset. Weighted gene co-expression network analysis (WGCNA) was employed to identify gene modules associated with tumor differentiation. Pseudotime trajectory analysis was performed using the Monocle tool on single-cell RNA sequencing data from the Gene Expression Omnibus (GEO) database. Branch Expression Analysis Modeling (BEAM) analysis was subsequently conducted to pinpoint gene changes at differentiation branch points. The integration of these analyses led to the identification of differentiation-related genes and the development of a differentiation prediction model. Finally, key differentiation-related genes and immune microenvironment differences were validated through western blot, quantitative polymerase chain reaction, immunohistochemistry, spheroid formation assays, and multiplex immunofluorescence.

Results: This study identified cellular retinoic acid-binding protein 2 (CRABP2) as a key gene associated with well-differentiated HNSCC. We found that low expression of CRABP2 may facilitate the formation of poorly differentiated HNSCC and contribute to the development of an immune-suppressive microenvironment. Tumor immune microenvironment remodeling was characterized by increased infiltration of regulatory T cells (Tregs) and reduced neutrophil infiltration, which may be linked to the invasiveness and poorer prognosis of poorly differentiated tumors.

Conclusions: Our findings highlight significant molecular differences between well-differentiated and poorly differentiated HNSCC, with low CRABP2 expression identified as a critical driver of poorly differentiated HNSCC and a central factor in the immune-suppressive tumor microenvironment. For the first time, we report that poorly differentiated HNSCC are associated with an immune-suppressive microenvironment, characterized by increased Treg infiltration and decreased neutrophil infiltration, which may contribute to their worse prognosis. This study identifies novel marker genes associated with HNSCC differentiation and offers new insights into the mechanisms driving tumor differentiation. These results pave the way for developing personalized therapeutic strategies targeting differentiation pathways and immune modulation, aiming to improve treatment outcomes for HNSCC patients.

Keywords: Head and neck squamous cell carcinoma (HNSCC); tumor differentiation markers; immune microenvironment; single-cell transcriptomics; machine learning


Submitted Aug 06, 2025. Accepted for publication Sep 28, 2025. Published online Dec 29, 2025.

doi: 10.21037/tcr-2025-1723


Highlight box

Key findings

• Cellular retinoic acid-binding protein 2 (CRABP2) was identified as a key regulator of tumor differentiation in head and neck squamous cell carcinoma (HNSCC) using single-cell analysis and transcriptomic machine learning.

• High CRABP2 expression promotes well-differentiated phenotypes, while low expression drives poor differentiation.

• Poorly differentiated HNSCC exhibits an immunosuppressive microenvironment with increased regulatory T cell (Treg) and decreased neutrophil infiltration, contributing to poorer prognosis.

What is known and what is new?

• Differentiation status is a major prognostic factor in HNSCC, yet its molecular regulatory mechanisms remain largely unclear.

• This study is the first to identify CRABP2 as a potential driver of differentiation in HNSCC and to establish a link between tumor differentiation and immune microenvironmental changes. Poor differentiation is associated with stronger immunosuppression.

What is the implication, and what should change now?

• Findings reveal new biological insights into HNSCC differentiation and propose CRABP2 as a prognostic biomarker and therapeutic target.

• Differentiation-related molecular profiling may improve personalized treatment and prognostic accuracy.

• Functional validation of CRABP2 and development of differentiation-targeted or immune-modulating therapies are warranted to improve patient outcomes.


Introduction

Head and neck squamous cell carcinoma (HNSCC) is a malignant tumor originating from the mucosal epithelial cells of the oral cavity, pharynx, larynx, and sinuses. It is the sixth most common cancer globally and the most prevalent malignant tumor in the head and neck region. Beyond mortality directly caused by HNSCC, this cancer has the second highest suicide rate among survivors, who often experience severe psychological distress and a significant decline in quality of life (QOL) (1).

The histopathological grading of squamous cell carcinoma (SCC) was initially described by Broders (2). Current diagnostic criteria remain centered on similar principles, classifying tumors into well-differentiated and poorly differentiated categories based on the degree of keratinization and the overall similarity of the dysplastic squamous epithelium to normal squamous epithelium (3). Poorly differentiation has long been considered a poor prognostic factor in HNSCC. Compared to well-differentiated SCC, poorly differentiated SCC is associated with an increased risk of perineural invasion, metastasis, and mortality (4). Despite improvements in the 5-year survival rate, it remains significantly lower than that of patients with well-differentiated SCC (73.1% vs. 55.9%) (5). These differences in clinical outcomes suggest that well-differentiated and poorly differentiated HNSCC may harbor distinct genetic and epigenetic landscapes, contributing to their divergent biological and clinical behaviors (6).

Currently, the determination of tumor differentiation primarily relies on manual pathological diagnosis. Well-differentiated HNSCC resembles stratified epithelium, with mature cells arranged in layers exhibiting irregular keratinization, most characteristically forming “keratin pearls”. Poorly differentiated tumors, on the other hand, are characterized by immature cells with nuclear pleomorphism and atypical mitoses, displaying higher mitotic activity, fewer stromal bridges, and little to no layering or keratinization (7). Determining differentiation largely depends on the subjective judgment of pathologists, leading to significant variability. Furthermore, there are no reliable molecular markers at the genomic or proteomic level to assess tumor differentiation objectively. From a therapeutic perspective, no studies have systematically explored treatment strategies explicitly tailored to well or poorly differentiated tumors. This lack of objective diagnostic markers and differentiation-specific treatment approaches challenges accurate diagnosis and optimal therapeutic decision-making.

Cancer stem cells (CSCs), also referred to as tumor-initiating cells (TICs), are believed to be the source of malignant tumor cell propagation in various human cancers. Their presence within tumors may indicate a high malignant potential, mediate resistance to conventional chemotherapy or radiotherapy, and ultimately contribute to poor clinical outcomes (8). CSCs, which possess unique self-renewal capabilities (9), comprise only a tiny fraction (1–3%) of primary tumor cells. Some studies suggest that CSCs may also serve as markers associated with tumor grade (10). Among HNSCC, the most extensively validated CSC markers with prognostic significance include CD44, CD133, and ALDH1. HNSCC cells with high CD44 expression exhibit enhanced self-renewal capacity, and elevated CD44 levels within HNSCC tumors have been associated with increased metastatic potential and poorer prognosis (11,12). However, the potential correlation between cancer stemness and the differentiation status of HNSCC has not been clearly investigated.

At present, the mechanisms underlying the differentiation heterogeneity of tumors remain poorly understood, and no specific treatment guidelines have been established for HNSCC based on the degree of differentiation. Therefore, elucidating the molecular mechanisms governing HNSCC differentiation is of critical importance. In this study, we integrated machine learning approaches with single-cell transcriptomic analysis to identify key molecular markers associated with well-differentiated and poorly differentiated SCC. Moreover, we further investigated the differences in the immune microenvironment between distinct differentiation states, along with the underlying molecular pathways. Through this work, we aim to advance our understanding of the differentiation landscape in HNSCC and provide a novel theoretical basis for the development of personalized therapeutic strategies. We present this article in accordance with the MDAR and TRIPOD reporting checklists (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1723/rc).


Methods

HNSCC data download

We obtained a dataset of 519 HNSCC samples with corresponding pathological differentiation information and survival annotations from The Cancer Genome Atlas-Head and Neck Squamous Cell Carcinoma (TCGA-HNSC) database. Gene expression data were downloaded from the cBioPortal database (http://www.cbioportal.org) and normalized to transcripts per million (TPM) (13). To focus on the extremes of differentiation, we excluded moderately differentiated samples, retaining 61 well-differentiated samples and 132 poorly or undifferentiated samples for subsequent analyses.

Weighted gene co-expression network analysis (WGCNA) analysis

We utilized the WGCNA R package (version 1.72-5) to identify gene co-expression networks associated with different levels of differentiation in the TCGA-HNSC dataset (14,15). The workflow follows: we first defined a similarity matrix and determined the optimal soft-thresholding power using the pickSoftThreshold function. This threshold was used to select the weighting coefficient and calculate the weighted adjacency matrix. The TOMsimilarity function was applied to the adjacency matrix to compute the topological overlap matrix (TOM). Subsequently, a dissimilarity matrix based on TOM (dissTOM) was calculated for hierarchical clustering. Using dissTOM, we performed hierarchical clustering to generate a dendrogram. Dynamic tree cutting was implemented via the cutreeDynamic function, with a specified minimum module size, to identify gene modules. For each module, the module eigengene (ME) was calculated to represent the primary expression profile of the module. Gene modules were correlated with HNSCC differentiation levels to identify the most positively and negatively associated modules. Genes from the most strongly related modules were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses to uncover their biological significance.

Differential gene expression and functional enrichment analysis

We performed differential expression analysis to identify differentially expressed genes (DEGs) using the “limma” R package (version 3.54.2) (16). Genes were considered significantly differentially expressed between the well and poorly differentiated groups if they met the criteria of false discovery rate (FDR) <0.05 and |log2fold change (FC)| >1. We used a Venn algorithm to identify overlapping genes between DEGs and differentiation-associated genes identified through WGCNA, focusing on genes most relevant to differentiation. The “clusterProfiler” R package (version 4.6.2) was GO and KEGG pathway enrichment analyses (17). The results were visualized using bar plots and bubble charts, providing an intuitive representation of enriched pathways and biological processes (BP).

Random forest (RF) classification

In this study, the randomForest package was used to construct a RF classification model based on DEGs. The dataset was randomly divided into a training set (75%) and a testing set (25%). The model was trained on the training set and independently validated on the testing set. The dependent variable was the categorical variable grade (“low” and “high”), and the independent variables were the selected candidate genes.

During model construction, the parameters mtry (the number of variables available for splitting at each node) and ntree (the number of decision trees) were considered as key factors affecting classification performance. In this study, mtry =6 and ntree =500 were set as the primary parameters, and the out-of-bag (OOB) error was used to estimate the generalization error of the model.

Furthermore, feature importance (MeanDecreaseAccuracy and MeanDecreaseGini) was calculated, and variable importance plots were generated to identify the major contributing genes. The model’s classification performance was evaluated on both the training and testing sets using receiver operating characteristic (ROC) curves and confusion matrices, with the optimal classification threshold determined by the Youden index (maximizing sensitivity + specificity −1).

Protein-protein interaction (PPI) network analysis

We utilized the Gene Multiple Association Network Integration Algorithm (GeneMANIA) database (http://genemania.org/) to analyze the relationships between genes and identify genes related to those in the input list (18). Given a set of query genes, GeneMANIA integrates extensive genomic and proteomic datasets to identify genes with similar functions. It predicts associations by weighting each functional genomic dataset based on its relevance to the query genes, providing a comprehensive network of gene-gene and PPIs.

Stemness characteristics collection and consensus clustering of HNSCC stemness subtypes

We utilized the StemChecker tool (http://stemchecker) to analyze 26 stemness gene sets (19). These gene sets are derived from one of the most comprehensive and up-to-date collections of stemness features, defined through gene expression profiles, RNA screens, transcription factor (TF) target gene sets, literature reviews, and computational summaries. To quantify the stemness enrichment scores for each HNSCC sample, we applied single-sample Gene Set Enrichment Analysis (ssGSEA) using the “GSVA” R package (version 1.46.0) (20). This method calculated the enrichment levels of the 26 stemness gene sets across all samples.

Cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT)

We employed the CIBERSORT deconvolution algorithm to analyze bulk RNA-seq data, accurately quantifying the relative proportions of various immune cell subpopulations (21). Using the “CIBERSORT” R package, we calculated the immune cell composition for each sample and quantified the tumor microenvironment (TME) scores. We compared the differences in immune cell proportions across different groups and selected samples with a P value <0.05 for further analysis. For visualization, we utilized the “ggplot2” R package (version 3.5.1) to generate plots illustrating differences in immune cell abundance and related analytical results (22).

Construction of the differentiation risk score and nomogram prediction model

A differentiation risk score was established using multivariate Cox regression analysis. The nomogram was developed using the rms R package (23), integrating the risk score with clinicopathological characteristics to predict the 1-, 3-, and 5-year survival probabilities of patients with HNSCC.

HNSCC single-cell RNA sequencing (scRNA-seq) data download, processing, and cell type identification

Three HNSCC scRNA-Seq datasets [accession numbers GSE181919 (24), GSE185965 (25), and GSE206332 (26)], comprising a total of 41 samples, were obtained from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). First, human papillomavirus (HPV)-positive samples were excluded. The scRNA-seq data were then filtered by requiring that each gene be expressed in at least three cells and that each cell express at least 200 genes. Subsequently, the proportions of mitochondrial and ribosomal RNA (rRNA) genes were calculated using the PercentageFeatureSet function.

Different filtering thresholds were applied to each sample: for GSM5627944 and GSM6251296, each cell was required to express between 200 and 9,000 genes; for GSM5678434 and GSM6251294, the range was set between 200 and 7,500 genes; for GSM5678435, the range was between 200 and 5,000 genes; and for GSM6251297, GSM6251299, GSM6251300, and GSM6251303, the range was set between 200 and 8,000 genes. Additionally, the mitochondrial content for all samples was kept below 20%. After filtering, a total of 4,763 cells were retained. Subsequently, the data from the four samples were merged using the merge function. The merged dataset was normalized using the NormalizeData function, and highly variable genes were identified using the FindVariableFeatures function, with variability determined based on variance stabilizing transformation (“vst”). The top 2,000 features with the highest variability were selected. The data were scaled using the ScaleData function, with z-score transformation applied to each feature. Principal component analysis (PCA) was then performed to identify anchors for dimensionality reduction. The number of principal components (PCs) was determined using the following criteria: (I) cumulative contribution of PCs exceeded 90%; (II) individual PC contribution to variance was less than 5%; and (III) the difference between consecutive PCs was less than 0.1%. Based on these criteria, we selected 14 dimensions. Next, batch effects were corrected using the harmony function, followed by clustering analysis using the FindNeighbors and FindClusters functions with a resolution of 0.4 (27,28).

Finally, we identified marker genes for each cell cluster using the FindAllMarkers function. Human cell marker genes were downloaded from the CellMarker database (http://biocc.hrbmu.edu.cn/CellMarker/) (29), and marker genes specific to laryngeal tissues and cells were selected. The cell clusters were renamed accordingly using the RenameIdents function.

Pseudotime analysis

We utilized the Monocle2 R package (version 2.26.0) to determine cell differentiation trajectories based on highly variable genes (30). A cds object for pseudotime trajectory analysis was generated using the newCellDataset function with the parameter expressionFamily = negbinomial.size. Genes with an average expression value greater than 0.1 detected in at least 10 cells were included for pseudotime trajectory analysis. DEGs were identified using the differential gene testing function, and genes with a q-value <1e−6 were selected for dimensionality reduction. Dimensionality reduction was performed using the reduceDimension function with the parameters reduced_method = “DDRTree” and max_components =2. Cells were ordered, and the trajectory was constructed using the orderCells function. Genes were visualized using the plot_pseudotime_heatmap function. To analyze branching in single-cell differentiation trajectories, we employed the Branch Expression Analysis Modeling (BEAM) function to identify genes with significantly different expression patterns between two branches. The branching genes were visualized using the heatmap R package (version 1.0.12), which generated heatmaps of the differentially expressed branching genes (31).

Cellular trajectory reconstruction analysis using gene counts and expression (CytoTRACE)

To further elucidate the stemness differences among epithelial cells, we performed CytoTRACE analysis using the CytoTRACE R package (version 0.3.3) to evaluate cell stemness (32). CytoTRACE introduces a novel framework for quantifying cellular differentiation potential, leveraging gene counts to significantly improve differentiation assessment at the single-cell level. Higher scores indicate greater stemness and position cells along a trajectory corresponding to the BP of interest—in our case, cellular differentiation.

Cell-cell communication analysis

To infer and analyze intercellular communication, we employed CellChat (version 1.1.0) (33), a public repository of ligands, receptors, cofactors, and their interactions. It was designed to discover novel cell-cell communication mechanisms and construct intercellular communication networks.

Antibodies and reagents

The antibodies used in this study include anti-cellular retinoic acid-binding protein 2 (CRABP2) (10225-1-AP, Proteintech, Wuhan, China), anti-CD44 (37259S, Cell Signaling Technology, Danvers, MA, USA), anti-Involucrin (28462-1-AP, Proteintech), anti-forkhead box protein 3 (FOXP3) (ab20034, Abcam, Cambridge, UK), anti-CD4 (ab133616, Abcam, Cambridge, UK). These antibodies were utilized for western blotting (WB) and immunohistochemistry (IHC) analyses.

HNSCC samples

The human tissue samples analyzed in this study included 5 cases of primary well-differentiated HNSCC and 5 cases of poorly differentiated HNSCC. These samples were collected from the Department of Thyroid and Head and Neck Surgery at Beijing Tongren Hospital, Capital Medical University. Data on age, sex, tumor grade, and tumor-node-metastasis (TNM) staging were extracted from patient medical records (Table S1).

Cell lines

Fadu cells were obtained from the American Type Culture Collection (ATCC) and were cultured for no more than 15 passages. The cells were maintained in MEM supplemented with 10% fetal bovine serum (FBS) at 37 ℃ in a 5% CO2 atmosphere. Mycoplasma contamination was tested annually using polymerase chain reaction (PCR), and the morphology, growth characteristics, and absence of mycoplasma in the cells were routinely monitored.

RNA interference (RNAi)

All small interfering RNA (siRNA) transfections were performed using Lipofectamine RNAi MAX (Invitrogen, Carlsbad, CA, USA) following the manufacturer’s instructions. The siRNA concentration was maintained at 10 nM. Cells were harvested 72–96 hours post-transfection to meet the specific requirements of each experiment. For RNAi studies, at least four different siRNA sequences were tested for each gene. The two most effective sequences were selected for further experiments based on knockdown efficiency. Details of the siRNA sequences are provided in Table S2.

Quantitative real-time PCR (qPCR)

Total RNA was extracted from cells using TRIzol reagent (Invitrogen), and cDNA was synthesized using the reverse transcription system (Roche, Basel, Switzerland). Gene expression was quantified using qPCR on a LightCycler 480 real-time PCR system with Power SYBR Green PCR Master Mix (Roche), and UBC was used as the internal control. The primer sequences used for qPCR are listed in Table S3.

Tissue specimens and immunohistochemical staining

Tissue samples were obtained from surgical resection specimens of HNSCC patients. After surgical removal, the samples were frozen in liquid nitrogen and stored at −80 ℃ until human HNSCC tissue sections were prepared. HNSCC samples were embedded in paraffin and sectioned into 5-µm slices. After deparaffinization in xylene and rehydration through graded ethanol, the sections underwent antigen retrieval in an alkaline retrieval solution at high temperature, followed by blocking endogenous peroxidase activity with 3% hydrogen peroxide. The treated sections were then incubated overnight at 4 ℃ with primary antibodies against CRABP2, washed with PBS, and incubated for 30 minutes at 37 ℃ with horseradish peroxidase-conjugated secondary antibodies (KIT-9710, Mxb Biotechnologies, Fuzhou, China). Staining was performed using a DAB substrate kit (DAB-0031, Mxb Biotechnologies) and hematoxylin. Image taken through a microscope. Image quality was assessed, and any background with uneven illumination was corrected using Image-Pro Plus software. The staining intensity and extent of nuclear positivity were evaluated using Image-Pro Plus software to determine the scores for stained sections.

Sphere formation assay

HNSCC single cells were resuspended in sphere-forming MEM medium and seeded into ultra-low attachment 6-well plates (2,000 cells/20 µL). After 3 days of culture, the spheres were counted to evaluate the sphere-forming ability of HNSCC cells.

Multiplex immunofluorescence (mIHC)

Formalin-fixed, paraffin-embedded (FFPE) tumor samples from three well-differentiated and three poorly differentiated HNSCC patients were subjected to mIHC staining. The paraffin sections were deparaffinized in xylene and rehydrated in absolute ethanol. Antigen retrieval was performed using a citrate buffer solution, followed by the removal of endogenous peroxidase activity using a 3% hydrogen peroxide solution. After blocking with 3% bovine serum albumin (BSA) (GC305006, GC305006, Servicebio, Wuhan, China), the sections were incubated overnight at 4 ℃ with primary antibodies [CIAP2 (GB111051, GB111051, Servicebio, Wuhan, China), FOXP3 (GB112325), CD4 (GB11064)]. The tissue sections were then incubated with horseradish peroxidase (HRP)-labeled secondary antibodies corresponding to the species of the primary antibodies at room temperature for 50 minutes. Subsequently, the sections were incubated with fluorescent dyes iF488, iF647, and iF546. After adding 4’,6-diamidino-2-phenylindole (DAPI) staining solution, the sections were incubated in the dark at room temperature for 10 minutes. An autofluorescence quencher was applied for 5 minutes, and the slides were mounted using an anti-fade mounting medium.

Scanning was performed using the Pannoramic MIDI system (3DHISTECH, Budapest, Hungary), and quantitative analysis, including the number of positive cells, positive area, and colocalization, was conducted using the HALO platform (Indica Labs, Albuquerque, NM USA). The percentage of positive cells was defined as the number of positive cells divided by the total number of cells, while the positive cell density was defined as the number of positive cells per tissue area. The colocalization-positive cell ratio was calculated as the number of colocalized cells divided by the total number of cells, and the colocalization-positive cell density was defined as the number of colocalized cells per tissue area. The colocalization-positive cell ratio in CD4 cells was determined as the number of colocalized cells divided by the total number of CD4-positive cells, representing the proportion of CD4-positive cells colocalized with FOXP3.

Hematoxylin and eosin (HE) staining

The sections were deparaffinized and rehydrated in dewaxing solution and absolute ethanol. The sections were treated with a high-definition constant-staining pretreatment solution for 1 minute, followed by staining with hematoxylin for 3–5 minutes. Differentiation was performed using a differentiation solution, and bluing was achieved with a bluing reagent. The sections were dehydrated in 95% ethanol for 1 minute, then counterstained with eosin for 15 seconds. Subsequently, the sections were dehydrated in absolute ethanol and butanol I, cleared in xylene, and mounted with neutral resin. Microscopic examination and image acquisition were performed for further analysis.

Statistical analysis

Data from triplicate biological experiments were presented as mean ± standard deviation with error bars. A two-tailed t-test or analysis of variance (ANOVA) was used to compare data between groups. Statistical significance was defined as P<0.05. All statistical analyses were performed using SPSS software (Statistical Product and Service Solutions).

Ethical statement

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Medical Ethics Committee of Beijing Tongren Hospital (approval No. TREC2023-KY011) and informed consent was taken from all the patients.


Results

WGCNA identifies HNSCC differentiation-related modules and hub genes

To explore the molecular mechanisms underlying differentiation differences in HNSCC, this study employed WGCNA to identify genes that are most closely associated with differentiation in the TCGA-HNSC cohort. First, the optimal soft-thresholding power β was set to 6 to ensure a scale-free network structure (scale-free R2=0.85) (Figure S1). Subsequently, the minimum gene count for each module was set to 30, resulting in 35 modules that clustered genes with similar expression patterns. Among the 35 modules, we found that the green module exhibited the strongest positive correlation with well-differentiation (ME =0.66, P=9e−25). Meanwhile, the purple module exhibited the strongest negative correlation with well-differentiation, and thus the strongest positive correlation with poor differentiation (ME =−0.47, P=2e−11) (Figure 1A-1C). Therefore, the green module was identified as most associated with well-differentiation, containing 1,110 candidate hub genes. Similarly, the purple module was recognized as the module most associated with poorly differentiation, containing 3,230 candidate hub genes.

Figure 1 Identification of differentiation-associated hub genes via WGCNA. (A,B) Heatmaps showing the correlation between MEs and clinical traits, including tumor differentiation status. (C) Scatter plots illustrating the relationship between MM and GS for the green and purple modules, both associated with differentiation. (D,E) GO and KEGG enrichment analyses for genes in the green (D) and purple (E) modules. BP, biological processes; CC, cellular components; Cor, correlation coefficient; GnRH, gonadotropin-releasing hormone; GS, gene significance; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; ME, module eigengenes; MF, molecular functions; MM, module membership; PPAR, peroxisome proliferator-activated receptor; WGCNA, weighted gene co-expression network analysis.

We further explored the biological functions of the genes in the green and purple modules through GO and KEGG pathway enrichment analyses. For the green module associated with well-differentiation, the key enriched GO pathways in BP, cellular components (CC), and molecular functions (MF) were epidermal development, cornified envelope, and structural constituent of skin epidermis. KEGG analysis revealed that the green module was primarily involved in gonadotropin-releasing hormone (GnRH) and peroxisome proliferator-activated receptor (PPAR) signaling pathways (Figure 1D). For the purple module associated with poorly differentiation, the primary enriched GO terms in BP, CC, and MF were nuclear division, chromosomal region and tubulin binding. KEGG analysis indicated that the purple module was mainly involved in the P53 pathway and cell cycle (Figure 1E).

Identification of core genes associated with differentiation using DEGs and RF

To further investigate the core genes influencing differentiation in HNSCC, we conducted a DEG analysis to compare gene expression differences between well-differentiated and poorly differentiated samples. Using DESeq2 analysis with the thresholds of |log2FC| ≥1 and FDR <0.05, we successfully identified 793 DEGs. Based on the log2FC values, 544 genes were labeled as upregulated and 226 genes as downregulated (Figure 2A,2B). To further refine the genes related to HNSCC differentiation, we intersected the results of WGCNA with the DEG analysis. Specifically, we associated upregulated genes with the green module and downregulated genes with the purple module, resulting in 502 genes most relevant to differentiation (Figure 2C). To understand the biological functions of these core genes, we performed GO and KEGG pathway annotation analyses on the 502 genes most associated with differentiation. The main GO pathways enriched in BP, CC, and MF included epidermal development, stratum corneum, and peptidase regulatory activity. Additionally, KEGG analysis indicated that differentiation-related genes were primarily involved in estrogen signaling and GnRH signaling pathways (Figure 2D, Figure S2).

Figure 2 Screening and characterization of differentiation-associated genes. (A) Heatmap showing differential gene expression between well-differentiated and poorly differentiated tumor groups. (B) Volcano plot of DEGs; gray dots denote non-significant genes, red dots indicate genes upregulated in well-differentiated tumors, and blue dots represent those upregulated in poorly differentiated tumors. (C) Venn diagrams showing the overlap between DEGs and genes from the green (well-differentiated) and purple (poorly differentiated) WGCNA modules. (D) GO and KEGG enrichment analyses of core gene sets associated with differentiation. (E) Top 30 hub genes identified by random survival forest analysis, ranked by relative importance. (F) ROC curves of the random forest model: red for the training set and black for the testing set. BP, biological processes; CC, cellular components; DEGs, differentially expressed genes; FC, fold change; FDR, false discovery rate; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, molecular functions; ROC, receiver operating characteristic; WGCNA, weighted gene co-expression network analysis.

To further validate the impact of these core genes on differentiation status, we utilized RF, a more mature ensemble classification technique compared to traditional decision trees. In our training cohort, we used 500 trees to predict the differentiation status of patients with HNSCC. Based on the 502 core differentiation genes, we performed RF analysis to assess the impact of each gene on HNSCC differentiation and further identify genes with significant effects on differentiation. The data was divided into training and testing sets, and ultimately, in the testing set, we identified the top 30 genes most strongly associated with differentiation (Figure 2E). The model demonstrated excellent sensitivity and specificity in the training cohort. It also performed exceptionally well in the testing cohort, achieving a validation AUC of 1.000 and a training AUC of 0.986 (Figure 2F). These results suggest that our RF model can efficiently and accurately identify the degree of differentiation in HNSCC and pinpoint the core genes closely associated with HNSCC differentiation. Moreover, this model provides a powerful tool for further investigating the molecular mechanisms underlying differentiation in HNSCC. These genes and their associated pathways may serve as important biomarkers and therapeutic targets for the development of differentiation-related treatment strategies in HNSCC.

Establishing the differentiation trajectory of HNSCC epithelial cells and identifying the core differentiation gene CRABP2 via single-cell transcriptomic analysis

To investigate differentiation-related genes in HNSCC, we utilized scRNA-seq data from the GSE206332, GSE185965, and GSE181919 datasets available in the GEO database. We performed cell annotation for the GSE206332 and GSE185965 datasets, while the GSE181919 dataset retained its original annotations. Initially, we conducted data preprocessing and used the Harmony algorithm to remove batch effects. The UMAP plot in Figure 3A illustrates the clustering of cells from 9 different samples. We then applied the FindNeighbors and FindClusters functions for cell clustering, which successfully resulted in the identification of 15 distinct clusters, annotated according to characteristic genes (Figure 3B, Figure S3A,S3B,S3C).

Figure 3 Single-cell transcriptomic profiling and validation of CRABP2. (A,B) UMAP plots showing clustering of cells from nine mixed samples (A) and annotation of 11 distinct cell types (B). (C) UMAP plot of epithelial and malignant cells across all samples. (D) Left: differentiation trajectory of epithelial and malignant cells colored by cell state. Middle: Pseudotime trajectory of epithelial cells. Right: stemness score along the epithelial cell trajectory. (E) Differentially expressed genes between cell fate 1 and fate 2. (F) Dynamic expression patterns of DEGs along pseudotime in well- versus poorly differentiated groups. (G) IHC of CRABP2 in well- and poorly differentiated HNSCC tissue samples (n=5 each). Top: representative IHC images (magnification, ×10). Bottom: quantification of stained area and intensity using Image-Pro Plus software. (H) Western blot (left) and RT-PCR (right) showing expression of CRABP2 and differentiation markers (CD44, IVL) following CRABP2 knockdown in FADU cells. (I) Representative images (left) and quantification (right) of cell sphere formation in NC and siCRABP2-treated cells (magnification, ×20). *, P<0.05; **, P<0.01; ***, P<0.001. AOD, average optical density; ; DEGs, differentially expressed genes; HNSCC, head and neck squamous cell carcinoma; IHC, immunohistochemistry; NC, negative control; RT-PCR, reverse transcription polymerase chain reaction; UMAP, Uniform Manifold Approximation and Projection.

Next, we extracted epithelial cells from these datasets and merged them with epithelial and tumor cells from the GSE181919 dataset, ensuring batch effect removal (Figure 3C). To explore the heterogeneity of epithelial cells in HNSCC and investigate their differentiation trajectory, we conducted single-cell pseudotime trajectory analysis. This analysis revealed how epithelial cells transition from normal mucosa to cancerous epithelium (Figure 3D). We found that normal epithelial cells predominantly clustered in State5, so we selected State5 as the starting point for the differentiation trajectory (Figure S3C). In this trajectory, two major branches emerged at Node 2, leading to distinct terminal cell fates. Using HNSCC transcriptomic data with differentiation levels, we further validated that poorly differentiated HNSCC exhibits higher stemness (Figure S3D). Subsequently, we applied the CytoTRACE algorithm to the single-cell dataset to assess cellular stemness, helping to determine the differentiation status of the cells. CytoTRACE analysis indicated that cells in State5 exhibited the lowest stemness, supporting our hypothesis that State5 represents the starting point of differentiation. Moreover, CytoTRACE analysis revealed that cell fate 1 (State4) had significantly higher stemness than cell fate 2 (State1) (Figure 3D, Figure S3C). Based on these findings, we propose that, driven by genetic factors at Node 2, normal epithelial cells differentiate into two distinct cell states with varying stemness, which correspond to well differentiation and poorly differentiation HNSCC in clinical settings. To validate this hypothesis, we further evaluated the expression of well-established differentiation-related genes in HNSCC. Our analysis showed that poorly differentiation markers, including CD44 and BMI, were highly expressed in State1, while the well differentiation marker IVL exhibited higher expression in State4 (Figure 3E). These results provide additional evidence supporting our conclusion that normal epithelial cells at Node 2 differentiate into two branches: one branch differentiates into well differentiation (State1), and the other into poorly differentiation (State4).

Building on this, we calculated the differential genes between the two branches after Node 2 to identify potential core genes that regulate differentiation (Figure 3E). In addition to the previously identified differentiation-related genes CD44, BMI1, and IVL, CRABP2 emerged as a key differential gene. CRABP2 not only appeared in our transcriptome-based identification of core differentiation genes but also exhibited an expression pattern consistent with our transcriptomic observations (Figure 3F). Therefore, we propose that CRABP2 may be a core gene regulating HNSCC differentiation.

To validate this hypothesis, we performed IHC to assess the expression levels of CRABP2 in well-differentiated (5 cases) and poorly differentiated (5 cases) HNSCC tissues. Image-Pro Plus analysis revealed that CRABP2 expression was upregulated in well-differentiated HNSCC tissues (Figure 3G), consistent with our earlier findings. Subsequently, to further verify these results, we used two different siRNA groups to knock down CRABP2 in Fadu cells. We then assessed the expression of differentiation-related genes, including stemness marker CD44 and keratinization marker IVL, using qPCR and WB. The results showed that knocking down CRABP2 significantly downregulated IVL expression while upregulating CD44 expression (Figure 3H). Additionally, the cell sphere formation assay demonstrated a significant increase in cellular stemness following CRABP2 knockdown (Figure 3I).

CellChat analysis of immune microenvironment differences in HNSCC with varying levels of differentiation

We utilized CellChat to comprehensively evaluate immune cell interactions between well-differentiated and poorly differentiated tumor tissues and normal tissues in HNSCC. This analysis, based on cell communication frequency and signal intensity, aimed to uncover differences in the immune microenvironment between high and low differentiated HNSCC. Compared to well-differentiated cells (State1), poorly differentiated cells (State4) displayed a significant increase in interactions with T cells, B cells, and mast cells. Furthermore, the signaling intensity of these interactions was also markedly enhanced (Figure 4A-4C). These results indicate that poorly differentiated tumor cells exhibit more active interactions with immune cells, which may be crucial for immune evasion and tumor progression. To further explore the activation of these signaling pathways, we analyzed their activation levels across various pathways. We observed that the TGF-β signaling pathways were significantly activated in CellChat communication between poorly differentiated cells (State4) and T cells (Figure 4D). Moreover, the respective receptors of TGF-β were widely activated in the CellChat signaling pathways between State4 cells and T cells (Figure 4E).

Figure 4 Intercellular communication and TGF-β pathway dynamics in the tumor microenvironment. (A-C) Quantification of cell-cell interactions between T cells (A), B cells (B), and mast cells (C) and other cell types. (D) TGF-β pathway interaction map across different cell types. (E) Expression of ligand-receptor pairs within the TGF-β signaling pathway across cell types. (F) UMAP plot of T-cell subclusters. (G) Interaction strength (left) and number (right) of interactions between Treg cells and other cell populations. (H,I) TGF-β pathway interaction networks (H) and ligand-receptor expression (I) in well-differentiated, poorly differentiated, and T-cell subpopulations. (J) Violin plots illustrating immune cell composition in well- (red) versus poorly differentiated (blue) tumor tissues. (K) Top: mIHC images (magnification, ×10) showing CD4+ and FOXP3+ cells in well- and poorly differentiated HNSCC samples with corresponding H&E staining (magnification, ×10). Bottom: quantitative analysis of positive area, cell count, and colocalization using Image-Pro Plus. *, P<0.05; **, P<0.01; ***, P<0.001. Commun. Prob., communication probability; H&E, hematoxylin and eosin; HNSCC, head and neck squamous cell carcinoma; mIHC, multiplex immunohistochemistry; NK, natural killer; ns, not significant; Treg, regulatory T; UMAP, Uniform Manifold Approximation and Projection.

Given that TGF-β is recognized as crucial pathways for the activation of regulatory T cells (Tregs), we conducted further analysis on T cell subtypes (Figure 4F, Figure S4). Our analysis revealed that the interaction and signaling strength between poorly differentiated tumor tissues (State4) and immunosuppressive CD4+FOXP3+ Treg cells were significantly enhanced (Figure 4G). This suggests that poorly differentiated tumors may enhance tumor growth by activating Treg cells, which suppress the activation and proliferation of other immune cells, thereby creating an immunosuppressive microenvironment that promotes tumor growth and metastatic potential. We further analyzed the activation of TGF-β signaling pathways in State1 and State4 cells interacting with Treg cells (Figure 4H). The results demonstrated that TGF-β receptors were extensively activated in the CellChat communication between State4 cells and Treg cells (Figure 4I,4J).

To further explore the relationship between CRABP2 expression and TGF-β signaling activity, we performed a pan-cancer correlation analysis across multiple solid tumors, including HNSCC, lung squamous cell carcinoma, esophageal squamous cell carcinoma (ESCC), and colorectal cancer (Figure S5). The results consistently showed that CRABP2 expression was positively correlated with the expression of key TGF-β pathway factors, such as TGFB1, across these tumor types, although the correlation strength varied among cancers. These findings suggest that the association between CRABP2 and TGF-β signaling is not restricted to HNSCC but may represent a more generalizable feature across multiple squamous and non-squamous solid tumors. This supports the hypothesis that CRABP2 may contribute to shaping an immunosuppressive microenvironment, at least in part, through modulation of the TGF-β pathway.

To validate the reliability of these findings, we combined transcriptome data for an immune cell subtype composition analysis, comparing the immune cell composition in tumor tissues with different differentiation states (well-differentiated vs. poorly differentiated). Consistent with single-cell data, the analysis revealed that Tregs were significantly enriched in poorly differentiated tumors compared to well-differentiated tumors (P=0.01). Additionally, within myeloid cells, eosinophils were significantly enriched in well-differentiated tumors (P=0.005), while neutrophils were significantly reduced (P<0.001). The proportion of activated mast cells was also significantly lower in well-differentiated tumors (P=0.006). These findings were in strong agreement with the single-cell data. To validate this finding, we performed mIHC analysis (Figure 4K). The results show that the proportion of Tregs in poorly differentiated HNSCC is significantly higher than in well-differentiated HNSCC.

In conclusion, we discovered that CRABP2 not only plays a role in promoting tumor dedifferentiation but may also shape an immunosuppressive microenvironment by activating the TGF-β pathways, thereby facilitating the activation and recruitment of Treg cells. This process could be a significant factor contributing to the faster progression and shorter survival of poorly differentiated tumors.

Construction and evaluation of the survival prediction model

We further developed a nomogram model that incorporates clinical characteristics, including CRABP2 expression, M stage, risk score, and age (Figure 5A). Among these variables, CRABP2 expression had the most significant impact on survival prediction, further emphasizing the importance of this gene in the prognostic assessment of HNSCC. As shown in Figure 5B, the calibration curves for 1-, 3-, and 5-year survival rates closely align with the ideal reference line, providing strong evidence for the exceptional predictive accuracy of this nomogram.

Figure 5 Prognostic modeling of differentiation-associated genes. (A) Nomogram integrating gene-based risk score, M stage (M0/M1), and patient age to estimate 1-, 3-, and 5-year overall survival probabilities. (B) Calibration curves evaluating the predictive performance of the nomogram at 1 year (red), 3 years (green), and 5 years (blue). CIs, confidence intervals; M, metastasis; ORs, odds ratios; OS, overall survival.

In summary, the nomogram model that integrates CRABP2 expression and clinical factors offers a precise predictive tool for the prognostic evaluation of HNSCC, demonstrating significant clinical application value.


Discussion

Given the significant prognostic differences between patients with well-differentiated HNSCC and those with poorly differentiated HNSCC, investigating the molecular mechanisms underlying differentiation changes in HNSCC is of critical clinical importance for personalized treatment and improving patient outcomes. However, to date, no studies have specifically focused on the molecular mechanisms of differentiation in HNSCC. In this study, we employed single-cell analysis and transcriptomic machine learning methods for the first time to identify that the differentiation process in HNSCC may be driven by the CRABP2, influencing the extent of differentiation as normal epithelial cells transform into tumor cells. Our findings indicate that high CRABP2 expression promotes HNSCC differentiation toward a well-differentiated phenotype, whereas low CRABP2 expression tends to drive the tumor toward a poorly differentiated state. Furthermore, this study is the first to report that poorly differentiated HNSCC is associated with an immunosuppressive microenvironment when compared to well-differentiated HNSCC. This immunosuppressive environment, characterized by relatively higher Treg infiltration and lower neutrophil infiltration, appears to be associated with the poorer prognosis of poorly differentiated tumors, although direct functional evidence is still lacking and further studies are needed to validate this link.

CRABP2 is a lipid-binding protein of which its primary function is to bind and transport retinoic acid (RA) to the retinoic acid receptors (RAR/RXR) in the cell nucleus, thereby activating the transcription of downstream target genes (34). RA can induce the differentiation of mouse F9 embryocarcinoma (EC) cells into various types of germ layer-like cells, including primitive, parietal, and visceral ectoderm-like cells (35). RA is essential for the normal keratinization process of various epithelial cells (34). Because CRABP2 facilitates the transport of RA to RARs, it plays a crucial role in coordinating these effects. CRABP2 inhibits tumorigenesis by cooperating with RAR and enhancing HuR-mediated transcript stability (36).

Several studies have shown that CRABP2 exhibits anticancer activity, with its expression downregulated in various cancers, including prostate cancer (36), HNSCC (37), and astrocytic gliomas (38). In ESCC, CRABP2 expression is significantly associated with pathological type, TNM staging, tumor size, depth of infiltration, and cellular differentiation status, and negatively regulates the metastatic potential of esophageal tumor cells through epithelial-mesenchymal transition (EMT) (36,39). Other studies have shown that CRABP2 enhances apoptosis in breast cancer cells (40). Feng et al. have shown that CRABP2 can inhibit metastasis and invasion in breast cancer cells (41). In HNSCC, it has been found that patients with negative CRABP2 expression have a higher risk of local recurrence or distant metastasis compared to patients with positive CRABP2 expression (37). Although the role of CRABP2 in various cancers has been widely studied, its specific function in tumor differentiation has not been clearly characterized to date. Our research demonstrates that the expression level of CRABP2 is closely associated with the degree of differentiation in HNSCC. We have found that high expression of CRABP2 may be one of the key factors in maintaining the well-differentiated state of HNSCC, helping to sustain tumor cells in a high-differentiation status. In contrast, low expression of CRABP2 may promote tumor cells’ transition to a poorly differentiated and more invasive state by modulating the TME and cell signaling pathways. This finding indicates that CRABP2 not only plays a crucial role in tumor initiation and progression but may also serve as an important molecular marker for assessing the differentiation status of HNSCC. Furthermore, the expression level of CRABP2 may offer valuable insights into patient prognosis and survival predictions, thus holding potential clinical significance.

The TME is a complex ecosystem composed of various immune cell types that influence tumor cell elimination or immune evasion processes during tumor progression (42,43). Recent studies have revealed that the extent of immune cell infiltration within the TME is closely associated with tumor development, metastasis, and patient prognosis (44). Despite significant advances in TME research, there remains a limited understanding of the specific immune microenvironment differences across various types of HNSCC, especially between well-differentiated and poorly differentiated HNSCC. This area of research has been relatively underexplored. In our study, we specifically focused on the immune microenvironment differences between poorly differentiated HNSCC and well-differentiated HNSCC. Our results suggest that the immune microenvironment of poorly differentiated HNSCC shows distinct features from that of well-differentiated HNSCC. Specifically, we found that in poorly differentiated HNSCC tumors, the number of immunosuppressive Tregs was significantly higher, whereas the number of neutrophils, which are involved in tumor cell killing, was notably reduced. Tregs are a subset of T cells defined by the expression of the TF Foxp3, and they are thought to promote tumor progression by limiting the activity of immune effector cells (45). Tregs suppress immune responses by secreting inhibitory cytokines such as TGF-β, which directly inhibit the proliferation and activity of CD8+ T cells and natural killer (NK) cells, thereby potentially facilitating tumor immune evasion. Additionally, Tregs further diminish immune cell function by inhibiting IL-2 production or increasing its consumption, thereby strengthening their immunosuppressive effects (46,47). TGF-β has been implicated in this process as it not only directly inhibits the activity of immune effector cells but also cooperates with TNF-α to induce the differentiation of cancer-associated fibroblasts (CAFs) and Tregs, forming a positive feedback loop that impairs anti-tumor immunity and ultimately promotes tumor growth (48). In our research, we also observed that compared to well-differentiated HNSCC, the TGF-β pathway was significantly activated in the immune microenvironment of poorly differentiated HNSCC. This finding suggests a possible role for TGF-β in the formation of an immunosuppressive microenvironment in poorly differentiated HNSCC. These results not only reveal the mechanisms underlying the formation of an immunosuppressive microenvironment in poorly differentiated HNSCC but also suggest that Tregs and TGF-β may be involved in this process.

By further exploring the interactions between immune cells in the TME, especially the functions and mechanisms of Tregs, we can uncover new strategies and potential targets for immunotherapy in HNSCC. For example, poorly differentiated HNSCC patients may benefit from combined ipilimumab treatment. Studies have shown that ipilimumab can deplete Tregs and restore NK cell function, thereby enhancing the immune system’s ability to eliminate tumors. In the future, immunotherapy strategies targeting Tregs or the TGF-β signaling pathway may become a crucial breakthrough for improving the treatment outcomes of poorly differentiated HNSCC, improving patient survival, and overcoming immune evasion by tumors (46). These therapeutic approaches not only enhance the immune system’s ability to target and attack tumors but also hold promise for providing more personalized and precise treatment options for poorly differentiated HNSCC patients. Therefore, further exploration of immune regulatory mechanisms in the TME, particularly the roles of Tregs and TGF-β pathways, will open new avenues for immunotherapy in HNSCC and provide patients with more survival opportunities.

Our study is the first to uncover the molecular mechanisms associated with the differentiation of HNSCC and to analyze the differences in the immune microenvironment between well-differentiated and poorly differentiated HNSCC. However, there are several limitations that need to be addressed. The main conclusions of this study are based on bioinformatics analysis. Although we supported our hypothesis with in vitro experiments, further in vivo mouse model studies will provide a more comprehensive understanding and help reveal the specific molecular effects of differentiation and the immune microenvironment at a deeper level. Our findings indicate that CRABP2 is closely associated with the differentiation status of HNSCC. Nevertheless, further mechanistic studies are needed to elucidate its downstream molecular pathways, together with clinical validation, to determine their practical value in guiding therapeutic decision-making. This study has several limitations. First, the HPV status exerts a profound impact on the prognosis, differentiation patterns, and immune responses of HNSCC (49,50). Previous studies have shown that HPV-negative tumors are associated with poorer clinical outcomes, a higher frequency of CSCs, and a greater propensity for spontaneous dedifferentiation, with stemness markers more enriched in HPV-negative tumor populations (51,52). Based on these observations, our study primarily focused on HPV-negative HNSCC. Accordingly, HPV-positive samples were excluded from the single-cell analysis, and HPV-negative cell lines were used in functional validation experiments. Future work should extend these investigations to HPV-positive samples and cell lines in order to identify potential stemness-associated driver genes in HPV-positive HNSCC and to enable a systematic comparison between HPV-positive and HPV-negative tumors. Second, the validation cohort was relatively small. Although our findings were consistent across both single-cell and transcriptomic analyses, the limited sample size may reduce statistical power and restrict the generalizability of the results. This limitation is mainly due to the challenges in obtaining sufficient high-quality samples and conducting reliable batch correction across datasets. Therefore, the validation results should be considered supportive but preliminary. Larger, multi-center cohorts and additional experimental validation will be needed to confirm the robustness and clinical relevance of our findings. Third, the lack of single-cell datasets containing detailed differentiation information may limit the statistical power of our analysis. Finally, the mechanistic validation of CRABP2 was conducted in vitro using a single HNSCC cell line. Although supported by transcriptomic and pan-cancer correlation analyses, broader functional experiments are still lacking. This is largely due to the limited access to additional authenticated cancer cell lines and resource constraints during the current stage of the project. As a result, the role of CRABP2 in tumor differentiation across other cancer types remains to be clarified. In future studies, we plan to systematically expand the analyses to multiple tumor types using TCGA datasets and perform CRABP2 knockdown experiments in additional independent cell lines to provide more comprehensive evidence. Despite these limitations, this work provides a solid foundation for future research in the field, holds important clinical implications for improving patient prognosis through personalized treatment, and offers valuable insights into the diagnosis and therapeutic strategies for both benign and malignant differentiated tumors.


Conclusions

In conclusion, our study provides novel insights into the molecular mechanisms underlying differentiation in HNSCC. Through the integration of single-cell analysis and transcriptomic machine learning, we identified CRABP2 as a key regulator potentially driving the differentiation process in HNSCC. Elevated CRABP2 expression appears to promote a well-differentiated phenotype, while its reduced expression may facilitate a transition toward poor differentiation. Moreover, we demonstrated that poorly differentiated HNSCC is characterized by an immunosuppressive TME, marked by increased Treg infiltration and reduced neutrophil presence, which may contribute to its unfavorable prognosis. These findings not only reveal new biological insights into HNSCC differentiation but also highlight CRABP2 as a potential biomarker and therapeutic target for improving individualized treatment strategies. Future studies with functional validation are warranted to further clarify the causal mechanisms underlying these associations.


Acknowledgments

The authors sincerely appreciate Ziming Mao and You Qun for all the happy and lovely moments.


Footnote

Reporting Checklist: The authors have completed the MDAR and TRIPOD reporting checklists. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1723/rc

Data Sharing Statement: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1723/dss

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1723/prf

Funding: This work was supported by the Beijing Hospital Management Center “Dengfeng” Talent Cultivation Program (Program No. DFL20220201).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1723/coif) and report that this work was supported by the Beijing Hospital Management Center “Dengfeng” Talent Cultivation Program (Program No. DFL20220201). The authors have no other conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Medical Ethics Committee of Beijing Tongren Hospital (Approval No. TREC2023-KY011) and informed consent was taken from all the patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Osazuwa-Peters N, Simpson MC, Zhao L, et al. Suicide risk among cancer survivors: Head and neck versus other cancers. Cancer 2018;124:4072-9. [Crossref] [PubMed]
  2. Broders A. Practical points on the microscopic grading of carcinoma. NY State J Med 1932;32:667-71.
  3. Brougham ND, Dennett ER, Cameron R, et al. The incidence of metastasis from cutaneous squamous cell carcinoma and the impact of its risk factors. J Surg Oncol 2012;106:811-5. [Crossref] [PubMed]
  4. Roland NJ, Caslin AW, Nash J, et al. Value of grading squamous cell carcinoma of the head and neck. Head Neck 1992;14:224-9. [Crossref] [PubMed]
  5. Mehta V, Yu GP, Schantz SP. Population-based analysis of oral and oropharyngeal carcinoma: changing trends of histopathologic differentiation, survival and patient demographics. Laryngoscope 2010;120:2203-12. [Crossref] [PubMed]
  6. Chen P, Yu W, Huang J, et al. Matched-pair analysis of survival in patients with poorly differentiated versus well-differentiated glottic squamous cell carcinoma. Oncotarget 2017;8:14770-6. [Crossref] [PubMed]
  7. Fletcher CDM. Diagnostic Histopathology of Tumors. New York: Churchill Livingstone; 1995.
  8. Gupta PB, Chaffer CL, Weinberg RA. Cancer stem cells: mirage or reality? Nat Med 2009;15:1010-2. [Crossref] [PubMed]
  9. Krishnamurthy S, Dong Z, Vodopyanov D, et al. Endothelial cell-initiated signaling promotes the survival and self-renewal of cancer stem cells. Cancer Res 2010;70:9969-78. [Crossref] [PubMed]
  10. Thon N, Damianoff K, Hegermann J, et al. Presence of pluripotent CD133+ cells correlates with malignancy of gliomas. Mol Cell Neurosci 2010;43:51-9. [Crossref] [PubMed]
  11. Faber A, Barth C, Hörmann K, et al. CD44 as a stem cell marker in head and neck squamous cell carcinoma. Oncol Rep 2011;26:321-6. [Crossref] [PubMed]
  12. Yu SS, Cirillo N. The molecular markers of cancer stem cells in head and neck tumors. J Cell Physiol 2020;235:65-73. [Crossref] [PubMed]
  13. Cerami E, Gao J, Dogrusoz U, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov 2012;2:401-4. [Crossref] [PubMed]
  14. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559. [Crossref] [PubMed]
  15. Yip AM, Horvath S. Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics 2007;8:22. [Crossref] [PubMed]
  16. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47. [Crossref] [PubMed]
  17. Yu G, Wang LG, Han Y, et al. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 2012;16:284-7. [Crossref] [PubMed]
  18. Warde-Farley D, Donaldson SL, Comes O, et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res 2010;38:W214-20. [Crossref] [PubMed]
  19. Pinto JP, Kalathur RK, Oliveira DV, et al. StemChecker: a web-based tool to discover and explore stemness signatures in gene sets. Nucleic Acids Res 2015;43:W72-7. [Crossref] [PubMed]
  20. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7. [Crossref] [PubMed]
  21. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453-7. [Crossref] [PubMed]
  22. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2016.
  23. Núñez E, Steyerberg EW, Núñez J. Regression modeling strategies. Rev Esp Cardiol 2011;64:501-7. [Crossref] [PubMed]
  24. Choi JH, Lee BS, Jang JY, et al. Single-cell transcriptome profiling of the stepwise progression of head and neck cancer. Nat Commun 2023;14:1055. [Crossref] [PubMed]
  25. Chung CH, Lin CY, Chen CY, et al. Ferroptosis Signature Shapes the Immune Profiles to Enhance the Response to Immune Checkpoint Inhibitors in Head and Neck Cancer. Adv Sci (Weinh) 2023;10:e2204514. [Crossref] [PubMed]
  26. Sun Y, Chen S, Lu Y, et al. Single-cell transcriptomic analyses of tumor microenvironment and molecular reprograming landscape of metastatic laryngeal squamous cell carcinoma. Commun Biol 2024;7:63. [Crossref] [PubMed]
  27. Stuart T, Butler A, Hoffman P, et al. Comprehensive Integration of Single-Cell Data. Cell 2019;177:1888-1902.e21. [Crossref] [PubMed]
  28. Korsunsky I, Millard N, Fan J, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 2019;16:1289-96. [Crossref] [PubMed]
  29. Zhang X, Lan Y, Xu J, et al. CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res 2019;47:D721-8. [Crossref] [PubMed]
  30. Trapnell C, Cacchiarelli D, Grimsby J, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014;32:381-6. [Crossref] [PubMed]
  31. Qiu X, Hill A, Packer J, et al. Single-cell mRNA quantification and differential analysis with Census. Nat Methods 2017;14:309-15. [Crossref] [PubMed]
  32. Gulati GS, Sikandar SS, Wesche DJ, et al. Single-cell transcriptional diversity is a hallmark of developmental potential. Science 2020;367:405-11. [Crossref] [PubMed]
  33. Jin S, Guerrero-Juarez CF, Zhang L, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12:1088. [Crossref] [PubMed]
  34. Schug TT, Berry DC, Shaw NS, et al. Opposing effects of retinoic acid on cell growth result from alternate activation of two different nuclear receptors. Cell 2007;129:723-33. [Crossref] [PubMed]
  35. Rochette-Egly C, Chambon P. F9 embryocarcinoma cells: a cell autonomous model to study the functional selectivity of RARs and RXRs in retinoid signaling. Histol Histopathol 2001;16:909-22. [Crossref] [PubMed]
  36. Jiao X, Liu R, Huang J, et al. Cellular Retinoic-Acid Binding Protein 2 in Solid Tumor. Curr Protein Pept Sci 2020;21:507-16. [Crossref] [PubMed]
  37. Calmon MF, Rodrigues RV, Kaneto CM, et al. Epigenetic silencing of CRABP2 and MX1 in head and neck tumors. Neoplasia 2009;11:1329-39. [Crossref] [PubMed]
  38. Campos B, Warta R, Chaisaingmongkol J, et al. Epigenetically mediated downregulation of the differentiation-promoting chaperon protein CRABP2 in astrocytic gliomas. Int J Cancer 2012;131:1963-8. [Crossref] [PubMed]
  39. Li M, Li C, Lu P, et al. Expression and function analysis of CRABP2 and FABP5, and their ratio in esophageal squamous cell carcinoma. Open Med (Wars) 2021;16:1444-58. [Crossref] [PubMed]
  40. Donato LJ, Noy N. Suppression of mammary carcinoma growth by retinoic acid: proapoptotic genes are targets for retinoic acid receptor and cellular retinoic acid-binding protein II signaling. Cancer Res 2005;65:8193-9. [Crossref] [PubMed]
  41. Feng X, Zhang M, Wang B, et al. CRABP2 regulates invasion and metastasis of breast cancer through hippo pathway dependent on ER status. J Exp Clin Cancer Res 2019;38:361. [Crossref] [PubMed]
  42. Bejarano L, Jordāo MJC, Joyce JA. Therapeutic Targeting of the Tumor Microenvironment. Cancer Discov 2021;11:933-59. [Crossref] [PubMed]
  43. Lei X, Lei Y, Li JK, et al. Immune cells within the tumor microenvironment: Biological functions and roles in cancer immunotherapy. Cancer Lett 2020;470:126-33. [Crossref] [PubMed]
  44. Huntington ND, Cursons J, Rautela J. The cancer-natural killer cell immunity cycle. Nat Rev Cancer 2020;20:437-54. [Crossref] [PubMed]
  45. Dadey RE, Workman CJ, Vignali DAA, Regulatory T. Cells in the Tumor Microenvironment. Adv Exp Med Biol 2020;1273:105-34. [Crossref] [PubMed]
  46. Ruffin AT, Li H, Vujanovic L, et al. Improving head and neck cancer therapies by immunomodulation of the tumour microenvironment. Nat Rev Cancer 2023;23:173-88. [Crossref] [PubMed]
  47. Choi SW, Kim JH, Hong J, et al. Mapping immunotherapy potential: spatial transcriptomics in the unraveling of tumor-immune microenvironments in head and neck squamous cell carcinoma. Front Immunol 2025;16:1568590. [Crossref] [PubMed]
  48. Li L, Yang SH, Yao Y, et al. Block of both TGF-β and IL-2 signaling impedes Neurophilin-1(+) regulatory T cell and follicular regulatory T cell development. Cell Death Dis 2016;7:e2439. [Crossref] [PubMed]
  49. Sunny G, Patnaik A, Yadav R. Emerging role of immune checkpoint inhibitors in recurrent and metastatic head and neck squamous cell carcinoma. Int J Mol Immuno Oncol doi: 10.25259/IJMIO_22_2025.
  50. Becker AS, Wieder N, Zonnur S, et al. CMTM6 status predicts survival in head and neck squamous cell carcinoma and correlates with PD-L1 expression. Discov Oncol 2024;15:745. [Crossref] [PubMed]
  51. Vlashi E, Chen AM, Boyrie S, et al. Radiation-Induced Dedifferentiation of Head and Neck Cancer Cells Into Cancer Stem Cells Depends on Human Papillomavirus Status. Int J Radiat Oncol Biol Phys 2016;94:1198-206. [Crossref] [PubMed]
  52. Olivero C, Lanfredini S, Borgogna C, et al. HPV-Induced Field Cancerisation: Transformation of Adult Tissue Stem Cell Into Cancer Stem Cell. Front Microbiol 2018;9:546. [Crossref] [PubMed]
Cite this article as: Zhang Z, Li Y, Wang M, Jing Y, Hu H, Shi M, Ding Y, Chen X. Identification of differentiation markers and immune microenvironment in head and neck squamous cell carcinoma using machine learning combined with single-cell analysis. Transl Cancer Res 2025;14(12):8579-8599. doi: 10.21037/tcr-2025-1723

Download Citation