Circadian-immune-related gene signature for lung squamous cell carcinoma: machine learning and multi-omics analysis
Original Article

Circadian-immune-related gene signature for lung squamous cell carcinoma: machine learning and multi-omics analysis

Yiheng Lu1,2,3#, Hui Li1#, Siyao Wang1,2#, Fan Yang2, Shuyan Xiao2, Yining Liu2, Qiaowei Liu1*, Xiao Han1*, Yi Hu1,4* ORCID logo

1Senior Department of Oncology, Chinese PLA General Hospital, Beijing, China; 2Medical School of Chinese PLA, Beijing, China; 3Department of Oncology, Hainan Hospital of Chinese People’s Liberation Army General Hospital, Sanya, China; 4Department of Oncology, The First Medical Center, Chinese PLA General Hospital, Beijing, China

Contributions: (I) Conception and design: Y Lu, H Li; (II) Administrative support: Q Liu, X Han, Y Hu; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: Y Lu, F Yang, S Xiao, Y Liu; (V) Data analysis and interpretation: Y Lu, H Li, S Wang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

*These authors contributed equally to this work.

Correspondence to: Qiaowei Liu. Senior Department of Oncology, Chinese PLA General Hospital, Beijing 100853, China. Email: dr_liuqiaowei@126.com; Xiao Han. Senior Department of Oncology, Chinese PLA General Hospital, Beijing 100853, China. Email:hanXiaoplagh@126.com; Yi Hu. Senior Department of Oncology, Chinese PLA General Hospital, Beijing 100853, China; Department of Oncology, The First Medical Center, Chinese PLA General Hospital, Beijing, China. Email: huyi301zlxb@sina.com.

Background: Circadian disruption promotes tumor progression and therapy resistance, but its prognostic value and impact on the tumor immune microenvironment in lung squamous cell carcinoma (LUSC) remain unclear. This study aimed to develop a circadian-immune-related gene signature to improve LUSC risk stratification and therapy guidance.

Methods: We integrated multi-omics data from LUSC patients (n=494) across The Cancer Genome Atlas and Gene Expression Omnibus (GEO) database GSE73403 (n=69). A circadian-immune-related gene prognostic signature (CIGPS) was constructed from 1,677 circadian rhythm-related genes through weighted gene co-expression network analysis (WGCNA) and ten machine learning algorithms (StepCox + GBM optimized) and externally validated. Bioinformatics analyses assessed the tumor immune microenvironment, mutation landscape, and therapy response. Single-cell RNA sequencing data (GSE148071) explored gene expression at cellular resolution.

Results: A six-gene CIGPS was developed, comprising CLDN1, FGG, ALPL, TREM1, NKX2-1, and CLDN5. This signature stratified patients into high- and low-risk groups with divergent overall survival in both cohorts. Unsupervised clustering revealed C1/C2 subtypes: C1 displayed an immune-hot yet excluded phenotype with higher CD8+ T cell infiltration but elevated physical barriers; C2 exhibited an immune-cold phenotype with DNA replication and epigenetic machinery activation. Both subtypes demonstrated distinct therapeutic vulnerabilities and divergent drug sensitivity profiles. Single-cell profiling demonstrated heterogeneous spatial distribution of signature genes across distinct tumor microenvironmental compartments.

Conclusions: The CIGPS serves as a potent prognostic tool that elucidates the heterogeneous tumor immune microenvironment in LUSC. It holds significant promise for guiding risk assessment and informing personalized therapeutic strategies based on distinct molecular subtypes.

Keywords: Lung squamous cell carcinoma (LUSC); circadian rhythm; prognostic signature; tumor immune microenvironment; multi-omics


Submitted Feb 06, 2026. Accepted for publication Apr 28, 2026. Published online May 15, 2026.

doi: 10.21037/tcr-2026-1-0302


Highlight box

Key findings

• A six-gene circadian-immune signature (CIGPS) was developed via ensemble machine learning (StepCox-GBM optimized), demonstrating robust prognostic stratification across The Cancer Genome Atlas and Gene Expression Omnibus cohorts. Unsupervised clustering revealed two molecular subtypes: C1 (immune-hot with exclusion barriers) and C2 (immune-cold with activated DNA replication/epigenetic machinery). Single-cell analysis localized signature genes to distinct tumor microenvironment compartments (epithelial, myeloid, endothelial).

What is known and what is new?

• Circadian disruption and immune dysregulation individually promote lung cancer progression, yet their integrative prognostic value in lung squamous cell carcinoma (LUSC) remains unexplored.

• This study establishes the first circadian-immune integrated prognostic model for LUSC, revealing that circadian-immune crosstalk operates through multicellular networks rather than tumor cell-autonomous mechanisms. The identification of subtype-specific drug vulnerabilities provides a novel framework for precision oncology.

What is the implication, and what should change now?

• The CIGPS offers a clinically applicable tool for risk stratification and treatment selection, potentially guiding the choice between cytotoxic chemotherapy and targeted agents based on molecular subtype. Integration of circadian biology into prognostic assessment may improve precision in personalized management.

• Prospective clinical trials validating CIGPS-guided patient stratification and chronomodulated therapeutic schedules are warranted. Further experimental studies are needed to elucidate the causal mechanisms linking circadian clock disruption to immune microenvironment remodeling in LUSC.


Introduction

Lung squamous cell carcinoma (LUSC), the second most common histological subtype of non-small cell lung cancer (NSCLC), accounts for approximately 25–30% of all lung cancer cases and represents a distinct and aggressive malignancy characterized by its unique molecular profile and strong association with smoking history (1,2). Despite advances in surgical resection, cytotoxic chemotherapy, and the recent emergence of immunotherapy, the 5-year survival rate for LUSC patients remains suboptimal, particularly for those presenting with locally advanced or metastatic disease (3,4). These clinical challenges underscore the urgent imperative to discover and validate novel biomarkers serving as refined stratification tools that reliably identify LUSC subpopulations most likely to derive substantial benefit from intensive therapeutic approaches, thereby furnishing critical prognostic and predictive insights essential for advancing personalized management strategies (5).

Circadian rhythms, evolutionarily conserved ~24-hour oscillators maintained by the CLOCK-BMAL1/PER-CRY transcriptional-translational feedback loop (6,7), govern critical cellular functions including cell cycle, DNA repair, and immune response through downstream clock-controlled genes (CCGs). Disruption of these circadian rhythm-related genes (CRGs) promotes tumorigenesis and therapeutic resistance across malignancies (8-10).

Circadian rhythm-dependent immunoregulation has emerged as a critical determinant of cancer progression. Accumulating evidence demonstrates that the circadian clock modulates multiple immune-related pathways and gene expression across various immune cell subsets (8,11,12). In murine models, intrinsic circadian clocks within CD8+ T cells and dendritic cells (DCs) govern their tumor infiltration patterns and effector functions, with treatment timing significantly influencing tumor growth dynamics (8,12). Furthermore, while immune checkpoint inhibitors (ICIs) have revolutionized cancer therapy, their efficacy is profoundly moderated by dosing time, with morning infusions associated with significantly superior progression-free and overall survival (OS) compared to late afternoon administration (13,14). Taken together, these findings underscore the substantial value of circadian-immune regulatory mechanisms in cancer. However, the prognostic significance of circadian-immune related genes (CIRGs) in LUSC remains largely unexplored.

In this study, we aimed to construct a robust prognostic signature by integrating circadian rhythm and immune-related genes in LUSC. Through machine learning-based integration of multi-omics data, we developed and validated a 6-gene circadian-immune-related gene prognostic signature (CIGPS). We further leveraged this signature to identify distinct molecular subtypes, characterize the tumor immune microenvironment and mutational landscapes, and systematically predict therapeutic responses. Additionally, functional enrichment analyses, single-cell transcriptomic profiling, and nomogram construction were performed to enhance the clinical interpretability and utility of the model. Collectively, this study provides a readily translatable framework for circadian-based risk stratification and personalized therapeutic decision-making in LUSC. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0302/rc).


Methods

Data acquisition and preprocessing

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. Figure 1 illustrates the overall study design. For bulk transcriptomic analyses, level-3 RNA-seq count data, somatic mutation calls, and clinical metadata for LUSC were obtained from The Cancer Genome Atlas (TCGA) via the TCGAbiolinks R package (v2.25.3) (15). The TCGA-LUSC cohort initially comprised 501 patients. After excluding seven cases with incomplete survival data, 494 patients served as the discovery set for model construction. Of the 494 patients with complete survival data, 487 (98.6%) were treatment-naïve at the time of primary surgical resection, whereas 7 patients (1.4%) had documented prior systemic therapy before sample collection. The tumor samples analyzed therefore overwhelmingly represent baseline gene expression profiles. The independent GEO dataset GSE73403 (n=69) served as the external validation cohort. Clinicopathological variables encompassing OS, vital status, age, sex and TNM stage were extracted and harmonized across cohorts. To dissect tumor heterogeneity at single-cell resolution, we retrieved scRNA-seq profiles from 6 primary LUSC tissue samples deposited in GSE148071 (16). Raw expression matrices were processed using the Seurat pipeline (see below). Finally, a reference list of 1,677 circadian rhythm-associated genes was curated from the Circadian Gene Database (CGDB, http://cgdb.biocuckoo.org, accessed July, 2025) (17) (available online: https://cdn.amegroups.cn/static/public/tcr-2026-1-0302-1.xlsx).

Figure 1 Schematic overview of the study design. This integrative bioinformatics framework leveraged transcriptomic profiles from three independent cohorts: TCGA-LUSC (n=494) as the discovery set for model construction, and GEO (GSE73403, n=69) for external validation, alongside single-cell RNA sequencing data (GSE148071) for tumor heterogeneity dissection. Circadian rhythm-related genes (n=1,677) were curated from the CGDB and subjected to WGCNA integrated with immune infiltration profiling to identify prognostic circadian-immune candidates. An ensemble machine learning approach was subsequently employed to construct and optimize the six-gene CIGPS. Comprehensive downstream investigations encompassed consensus clustering-based molecular subtyping, clinicopathological correlation, TMB quantification, immune microenvironment characterization, therapeutic response prediction, and single-cell resolution analysis of circadian-immune crosstalk. CGDB, Circadian Gene Database; CIGPS, circadian-immune-related gene prognostic signature; GEO, Gene Expression Omnibus; LUSC, lung squamous cell carcinoma; TCGA, The Cancer Genome Atlas; TMB, tumor mutational burden; WGCNA, weighted gene co-expression network analysis.

Key prognostic gene screening and model construction

Co-expression network construction and module identification

We employed WGCNA (v1.72) (18) to dissect tumor microenvironment (TME) architecture and identify immune-associated co-expression modules within the TCGA-LUSC cohort. Network construction utilized the 85% most variable genes, with soft-thresholding power (β) optimized to achieve scale-free topology (R2>0.85). Initial gene modules (minimum 100 genes) were delineated via hierarchical clustering of the topological overlap matrix (TOM) followed by dynamic branch cutting. ESTIMATE algorithm-derived scores (Immune, Stromal, and composite) were subsequently computed for each sample to quantify TME composition. Module-trait correlation analysis identified the eigengene most strongly associated with immune infiltration as the candidate microenvironment module for downstream circadian-immune integration. Highly correlated modules (r>0.3) were consolidated prior to final trait analysis.

Transcriptomic profiling

To identify aberrantly expressed transcripts in LUSC, we conducted differential expression analysis comparing tumor specimens with histologically normal lung parenchyma using the limma R package (version 3.54) (19). Transcripts exhibiting |log2 fold change| ≥1.0 with a false discovery rate (FDR)-adjusted P value <0.05 were designated as significantly differentially expressed genes (DEGs) (available online: https://cdn.amegroups.cn/static/public/tcr-2026-1-0302-2.xlsx).

Integrative selection of circadian-immune candidate gene

The definitive candidate gene pool was established through intersection analysis, retaining only those genes common to: (I) the WGCNA module exhibiting strongest phenotypic correlation, (II) the LUSC-related DEG repertoire identified above, and (III) the curated circadian rhythm gene compendium. This integrative filtering strategy yielded the final candidate set for prognostic signature construction.

Machine learning-based construction of CIGPS

Univariate Cox regression analysis (P<0.05) against OS in the TCGA discovery cohort served as the initial filter for prognostic candidate genes. Significant hits were then integrated into a comprehensive machine learning ensemble comprising ten distinct algorithms, from which the CIGPS was derived. This algorithmic approach was cross-validated to optimize predictive accuracy while mitigating risks of both overfitting and model under-complexity (underfitting) (20).

This comparative modeling approach generated 117 distinct predictive signatures through 10-fold cross-validation within the TCGA training set. Model discrimination was evaluated using the concordance index (C-index) averaged across both the training data and the pooled external validation cohort (GSE73403). The algorithmic combination demonstrating superior generalizability and predictive accuracy—StepCox (bidirectional) coupled with GBM—was selected as the definitive modeling strategy.This final model applies bidirectional StepCox regression for initial feature selection followed by GBM ensemble learning, capturing complex non-linear relationships between gene expression and survival while maintaining robustness against overfitting through shrinkage and subsampling.

Prognostic model performance assessment and validation

Survival and ROC analysis

Risk scores were calculated for each patient in the TCGA and GEO cohorts using the established CIGPS formula. Subsequently, patients were stratified into high-risk and low-risk categories based on the median risk score derived from the training cohort. Prognostic discrimination between risk strata was evaluated using Kaplan-Meier survival analysis with log-rank testing. Missing values were imputed using the predictive mean matching (PMM) algorithm (single imputation) with the ‘mice’ package in R. The predictive accuracy of the continuous risk score was further quantified through time-dependent receiver operating characteristic (ROC) curve analysis at 1-, 3-, and 5-year time points, with model performance expressed as the area under the ROC curve (AUC).

Independent prognostic value assessment

To determine whether the CIGPS-derived risk score provides prognostic information beyond established clinical parameters, we performed univariate and multivariate Cox proportional hazards regression analyses. Variables with >50% missing data were excluded to ensure model stability. Clinical covariates including age at diagnosis, sex and pathologic stage were incorporated alongside the risk score to evaluate its independent prognostic significance across cohorts.

Nomogram construction

Integrating the CIGPS-derived risk score with other independent clinicopathological covariates, we developed a comprehensive nomogram for individualized prediction of 1-, 3-, and 5-year OS probabilities using the rms package in R (21). Predictive discrimination was quantified via the concordance index (C-index), while model calibration was assessed through calibration curves comparing predicted survival probabilities against observed outcomes.

Consensus clustering and functional mechanism analysis

Molecular subtype delineation

Unsupervised consensus clustering was performed on the CIGPS expression matrix using ConsensusClusterPlus (v1.66.0) to identify intrinsic molecular subtypes. K-means clustering with Pearson distance metrics was employed across 80% resampling iterations (n=10), with the optimal cluster number (k) determined via cumulative distribution function (CDF) plots and average consensus scores.

Functional enrichment analysis

Biological pathway implications were interrogated through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses (clusterProfiler v4.10.1 (22). Differential pathway activity between subtypes was further characterized via Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) utilizing Hallmark gene sets.

Somatic mutation analysis

Somatic mutation landscapes were characterized using mutation annotation format (MAF) data retrieved from cBioPortal. Tumor mutation burden (TMB) quantification and mutational spectrum comparisons between subtypes were performed using maftools (v2.18.0) (23) based on TCGA-LUSC somatic mutation profiles.

Circadian rhythm analysis

The circadian phases of individual samples from the TCGA-LUSC cohort were inferred using the TimeTeller algorithm (24). Phase differences between molecular subtypes (Cluster 1 vs. Cluster 2) and risk stratifications (high-risk vs. low-risk) were calculated, with prediction uncertainties recorded to assess estimate reliability.

Tumor immune microenvironment analysis

We employed CIBERSORT deconvolution (25) on bulk transcriptomic profiles to infer the relative abundances of 22 distinct immune cell subsets. Complementarily, ESTIMATE algorithm-derived metrics (26) quantified immune and stromal infiltration levels, yielding composite immune scores, stromal scores, and tumor purity estimates for each sample.

TIDE (Tumor Immune Dysfunction and Exclusion) scores (27) were computed to assess potential immune evasion mechanisms. Immunophenoscores (IPS) specific to LUSC were retrieved from The Cancer Immunome Atlas (TCIA) (28) to predict responsiveness to CTLA-4 and PD-1 blockade. Furthermore, differential expression patterns of antigen presentation machinery (HLA molecules), immunogenic cell death (ICD) modulators, and immune checkpoint genes (ICGs) were interrogated to delineate subtype-specific immunotherapy vulnerabilities.

Drug sensitivity prediction

The R package oncoPredict (version 1.2) (29), based on the GDSC2 database, was used to predict the half-maximal inhibitory concentration (IC50) for 198 FDA-approved drugs for each patient. To enhance clinical interpretability, we focused our presentation on a representative set of 16 compounds: eight standard-of-care therapies in lung cancer management (cisplatin, docetaxel, paclitaxel, gemcitabine, vinorelbine, erlotinib, gefitinib, and afatinib) and eight additional agents exhibiting the most pronounced differential sensitivity between C1 and C2 subtypes (including selumetinib and ribociclib). Differential drug sensitivity for these 16 compounds is visualized in Figure S1.

Single-cell transcriptomic analysis

Quality control and cell type annotation

Publicly available scRNA-seq profiles (GSE148071) were retrieved from the Gene Expression Omnibus (GEO) repository. Preprocessing and analytical workflows were implemented using Seurat (v5.3.0) (30). Cells were subjected to stringent quality control to exclude low-quality cells, doublets, and erythrocyte contaminants based on: (I) transcript capture depth (nCount_RNA between 500 and 20,000); (II) gene detection sensitivity (nFeature_RNA between 500 and 10,000); (III) mitochondrial content (<20%); and (IV) hemoglobin gene expression (<5%). Additionally, genes detected in fewer than 100 cells were excluded, and doublet artifacts were identified and removed using scDblFinder (31). Following quality control, dimensionality reduction was performed via principal component analysis (PCA). Graph-based clustering conducted on the top 30 principal components employed a resolution parameter of 0.6, selected through systematic multi-resolution analysis (range, 0.2–2.0) balancing technical metrics (silhouette score: 0.111; modularity: 0.879) against biological interpretability. This parameterization yielded 18 distinct clusters corresponding to major immune (T cells, B cells, NK cells, macrophages, Macrophages) and stromal (epithelial, endothelial, Fibroblasts) populations. Cell identities were assigned by mapping clusters against the Human Primary Cell Atlas (HPCA) reference using SingleR (v2.4.1), validated through canonical lineage marker expression.

Key gene expression profiling

The expression patterns of CIGPS constituent genes across cellular subpopulations were interrogated via UMAP-based visualization.

Immunohistochemical (IHC) validation

Protein expression patterns of CIGPS constituent genes were validated using IHC staining data retrieved from The Human Protein Atlas (HPA, https://www.proteinatlas.org) Representative staining images were systematically compared between normal lung parenchyma and LUSC tissues for signature genes to corroborate transcriptomic findings at the protein level.

Statistical analysis

All computational analyses were executed using R software (version 4.3.3) (32) . Continuous variables were compared between two groups using either Student’s t-test (parametric) or Wilcoxon rank-sum test (non-parametric), while Kruskal-Wallis analysis was employed for multi-group comparisons. Survival distributions were evaluated using the log-rank test. Associations between continuous parameters were quantified via Spearman’s rank correlation. A two-sided P value <0.05 was considered statistically significant unless otherwise specified.


Results

Identification of circadian rhythm-related prognostic candidate genes

The baseline clinicopathological characteristics of the TCGA-LUSC discovery cohort and the GSE73403 validation cohort are summarized in Table 1. Comprehensive bioinformatics screening was conducted to identify circadian-immune candidates associated with tumor immune microenvironment (Figure 2). A compendium of 1,677 circadian rhythm-related genes (CRGs) was curated from the CGDB database. WGCNA performed on the TCGA-LUSC discovery cohort identified co-expression modules tightly linked to immune infiltration patterns. Network topology analysis established a soft-thresholding power of β=4 (Figure 2A), yielding 9 distinct co-expression modules (Figure 2C). The MEbrown module exhibited the strongest positive correlation with ESTIMATE immune scores (Figure 2D), with robust module preservation evidenced by significant correlation between gene significance (GS) and module membership (MM) (r=0.92, P<1×10−200; Figure 2B). To establish a stringent candidate gene set, three-way intersection analysis was performed incorporating: (I) the immune-associated WGCNA module (MEbrown), (II) DEGs between LUSC and adjacent normal tissues, and (III) the curated CGDB gene list. This integrative filtering strategy, constrained to genes consistently detected across both the TCGA train and GEO validation cohorts (GSE73403), yielded 258 robust circadian-immune candidates for subsequent machine learning-based prognostic modeling (Figure S2).

Table 1

Baseline clinicopathological characteristics and data availability of the TCGA-LUSC cohort (n=494) and GSE73403 cohort (n=69)

Characteristic Low-risk, N=247 High-risk, N=247 P value
TCGA-LUSC cohort (N=494)
   Age (years) 69 [61–74] 69 [63–74] 0.45
    Missing 2 (0.8) 3 (1.2)
   Age group 0.55
    ≤ median 127 (51.8) 133 (54.5)
    > median 118 (48.2) 111 (45.5)
    Missing 2 (0.8) 3 (1.2)
   Gender 0.30
    Female 59 (23.9) 69 (27.9)
    Male 188 (76.1) 178 (72.1)
   Pathological stage 0.24
    I–II 205 (83.7) 195 (79.6)
    III–IV 40 (16.3) 50 (20.4)
    Missing 2 (0.8) 2 (0.8)
   T stage 0.69
    T1 59 (23.9) 55 (22.3)
    T2 142 (57.5) 145 (58.7)
    T3 37 (15.0) 33 (13.4)
    T4 9 (3.6) 14 (5.7)
   N stage 0.72
    N0 154 (62.9) 162 (66.7)
    N1 69 (28.2) 58 (23.9)
    N2 20 (8.2) 20 (8.2)
    N3 2 (0.8) 3 (1.2)
    Missing 2 (0.8) 4 (1.6)
   M stage >0.99
    M0 198 (98.5) 208 (98.1)
    M1 3 (1.5) 4 (1.9)
    Missing 46 (19) 35 (14)
   Smoking history 0.66
    Non-smoker 8 (3.3) 10 (4.1)
    Smoker 231 (96.7) 233 (95.9)
    Missing 8 (3.2) 4 (1.6)
   ECOG performance status 0.046
    0 32 (30.5) 37 (43.0)
    1 51 (48.6) 40 (46.5)
    2 20 (19.0) 6 (7.0)
    3 2 (1.9) 3 (3.5)
    Missing 142 (57.5) 161 (65.2)
   Chemotherapy 0.71
    No 53 (40.8) 47 (43.1)
    Yes 77 (59.2) 62 (56.9)
    Missing 117 (47) 138 (56)
   Radiation therapy 0.12
    No 73 (70.9) 56 (60.2)
    Yes 30 (29.1) 37 (39.8)
    Missing 144 (58.3) 154 (62.3)
   Targeted mutation 0.24
    Wildtype 159 (100.0) 150 (98.7)
    Mutated 0 2 (1.3)
    Missing 88 (35.6) 95 (38.5)
   Pre-sampling treatment >0.99
    No 244 (98.8) 243 (98.4)
    Yes 3 (1.2) 4 (1.6)
GEO validation cohort (GSE73403) (N=69)
   Age (years) 59 [52–66] 62 [54–69] 0.34
   Age group 0.23
    ≤ median 20 (58.8) 15 (42.9)
    > median 14 (41.2) 20 (57.1)
   Gender 0.36
    Female 3 (8.8) 1 (2.9)
    Male 31 (91.2) 34 (97.1)
   Pathological stage 0.21
    I–II 20 (58.8) 26 (74.3)
    III–IV 14 (41.2) 9 (25.7)
   T stage 0.27
    T1 1 (2.9) 3 (8.6)
    T2 19 (55.9) 23 (65.7)
    T3 11 (32.4) 9 (25.7)
    T4 3 (8.8) 0
   N stage 0.74
    N0 16 (47.1) 19 (54.3)
    N1 8 (23.5) 9 (25.7)
    N2 10 (29.4) 7 (20.0)
    N3 0 0
   M stage >0.99
    M0 34 (100.0) 35 (100.0)
    M1 0 0
   Smoking hisory 0.34
    Non-smoker 7 (20.6) 4 (11.4)
    Smoker 27 (79.4) 31 (88.6)

Continuous variables are presented as median [Q1–Q3] and compared by Wilcoxon rank-sum test; Categorical variables are presented as n (%) and compared by Pearson’s Chi-squared test or Fisher’s exact test. Missing values are counted but excluded from percentage denominators. LUSC, lung squamous cell carcinoma; M, metastasis; N, node; T, tumor; TCGA, The Cancer Genome Atlas.

Figure 2 Construction of the circadian-immune candidate gene set. (A) Scale-free topology assessment for soft-thresholding power (β) selection in WGCNA. (B) Correlation between GS and MM within the MEbrown module (immune-associated). (C) Hierarchical cluster dendrogram of co-expression modules identified in the TCGA-LUSC cohort. (D) Module-trait relationship heatmap displaying correlations between module eigengenes and ESTIMATE-derived microenvironment scores (Immune, Stromal, and ESTIMATE scores). (E) Forest plot of univariate Cox regression analysis for CIGPS genes significantly associated with overall survival. *, P<0.05; **, P<0.01; ***, P<0.001 by univariate Cox regression. CI, confidence interval; CIGPS, circadian-immune-related gene prognostic signature; GS, gene significance; LUSC, lung squamous cell carcinoma; MM, module membership; TCGA, The Cancer Genome Atlas; WGCNA, weighted gene co-expression network analysis.

Construction and validation of the CIGPS

An ensemble machine learning framework integrating ten distinct algorithms was applied to the circadian-immune candidates to derive the optimal prognostic model, yielding a six-genes (CLDN1, FGG, ALPL, TREM1, NKX2-1, and CLDN5) constituting the CIGPS (Figure S3A, Table S1).

Risk scores were calculated for all TCGA-LUSC patients, with stratification into high-risk (HRG) and low-risk groups (LRG) based on the median value from the discovery cohort. Kaplan-Meier survival analysis revealed significantly inferior OS in the HRG compared to the LRG (P<0.001; Figure 3A). External validation confirmed the prognostic robustness of the CIGPS across independent cohorts. In the GSE73403 dataset, significant survival discrimination persisted between high-risk and low-risk groups (P<0.01; Figure 3B), with time-dependent AUCs of 0.861, 0.647, and 0.681 for 1-, 3-, and 5-year OS, respectively (Figure S3B).

Figure 3 Construction and evaluation of the CIGPS. (A) Kaplan-Meier survival curves comparing overall survival between high-risk and low-risk groups in the TCGA-LUSC cohort. (B,C) External validation of the CIGPS in the GSE73403 (B) cohort. (C) Risk score distribution, survival status, and heatmap of the six signature genes across risk groups in the TCGA cohort. (D) Time-dependent ROC curves predicting 1-, 3-, and 5-year overall survival in the TCGA cohort. (E) Performance comparison of 117 machine learning models based on average C-index across training and validation cohorts. AUC, area under the curve; CI, confidence interval; CIGPS, circadian-immune-related gene prognostic signature; FPR, false positive rate; LUSC, lung squamous cell carcinoma; OS, overall survival; TCGA, The Cancer Genome Atlas; TPR, true positive rate.

Risk score distributions demonstrated a positive correlation between escalating risk scores and mortality events (Figure 3C). Time-dependent ROC analysis confirmed robust predictive accuracy for 1-, 3-, and 5-year OS (AUC =0.722, 0.781, and 0.792, respectively; Figure 3D). Among 117 algorithmic combinations, The StepCox (bidirectional)-GBM hybrid demonstrated the highest averaged C-index across both the training data and the pooled external validation cohort (Figure 3E) and was selected as the definitive modeling strategy.

Multivariate Cox proportional hazards regression, adjusting for age, sex, and pathologic stage, identified the CIGPS-derived risk score as an independent prognostic indicator (HR =8.60, 95% CI: 5.92–12.49, P<0.001) after adjustment for clinical covariates (Figure 4A-4D).

Figure 4 Prognostic nomogram construction and validation. (A-C) Time-dependent ROC curves assessing predictive accuracy for 1-, 3-, and 5-year overall survival in the TCGA-LUSC discovery cohort (A,B) and pooled external validation sets (C). (D) Forest plot of multivariate Cox regression analysis depicting the independent prognostic value of the CIGPS risk score relative to clinicopathological covariates. (E) Prognostic nomogram integrating the CIGPS risk score with age and TNM stage for individualized survival prediction. (F) Calibration curves evaluating agreement between nomogram-predicted and observed 1-, 3-, and 5-year survival probabilities. (G,H) DCA quantifying clinical net benefit of the nomogram compared to alternative management strategies across threshold probabilities. ***, P<0.001. CIGPS, circadian-immune-related gene prognostic signature; DCA, decision curve analysis; FPR, false positive rate; LUSC, lung squamous cell carcinoma; TCGA, The Cancer Genome Atlas; TPR, true positive rate.

To further exclude the potential confounding effect of prior therapy on gene expression profiles, we performed a sensitivity analysis excluding the 7 patients with documented preoperative systemic treatment. The prognostic discrimination of the CIGPS remained virtually unchanged in this treatment-naïve cohort (n=487; C-index: 0.716 vs. 0.717 in the full cohort), with persistent significant survival separation between risk groups (log-rank P<0.001; Figure S3C).

To facilitate individualized risk assessment, a prognostic nomogram integrating the CIGPS risk score with key clinicopathological variables (age, gender and pathologic stage) was constructed (Figure 4E). Calibration curves demonstrated strong concordance between predicted and observed survival probabilities (Figure 4F). Decision curve analysis (DCA) further indicated superior clinical net benefit of the combined nomogram relative to reliance on individual clinical parameters alone (Figure 4G).

We examined the expression of CIGPS genes using the HPA database, focusing on immunohistochemistry images of protein expression in LUSC samples compared to normal lung tissue (Figure S4A). Among these genes, all exhibited protein expression levels in normal tissue that were greater than or equal to those observed in LUSC tissue except CLDN1. This finding is consistent with the transcriptional trends identified in the TCGA-LUSC dataset (Figure S4B).

Molecular subtyping via consensus clustering reveals distinct CIGPS-based phenotypes

Consensus clustering of the 6-gene CIGPS expression profiles delineated two distinct molecular subtypes within the TCGA-LUSC cohort [Cluster 1 (C1), n=320; Cluster 2 (C2), n=181]. Optimal partitioning at k=2 was supported by CDF and consensus matrix analyses (Figure 5A,5B; Figure S5A-S5D). Principal component analysis (PCA) confirmed marked transcriptomic divergence between C1 and C2 subtypes (Figure 5C). At the molecular level, all six signature genes exhibited significant differential expression between subtypes (P<0.01; Figure 5D), substantiating the biological coherence of this classification. Sankey diagram analysis revealed robust concordance between molecular subtypes and CIGPS-defined risk stratification, with the low-risk group predominantly mapping to C1 and the high-risk group to C2 in the TCGA-LUSC cohort (Figure 5E). Clinically, C1 patients demonstrated significantly prolonged OS compared with C2 (P<0.001; Figure 5F).

Figure 5 Molecular subtyping based on the 6-gene CIGPS. (A) Consensus clustering matrix for k=2 in the TCGA-LUSC cohort demonstrating cluster stability. (B) CDF plots for determining the optimal cluster number (k=2). (C) Three-dimensional PCA visualization showing distinct separation between C1 and C2 subtypes. (D) Boxplot comparing CIGPS gene expression levels between C1 and C2 subtypes in the TCGA cohort. (E) Sankey diagram illustrating concordance between molecular subtypes, risk groups (LRG/HRG), and clinicopathological features. (F) Kaplan-Meier survival curves comparing overall survival between C1 and C2 molecular subtypes among TCGA cohort. **, P<0.01; ***, P<0.001. PCA, principal component analysis. CDF, cumulative distribution function; CIGPS, circadian-immune-related gene prognostic signature; HRG, high-risk group; LRG, low-risk group; LUSC, lung squamous cell carcinoma; TCGA, The Cancer Genome Atlas.

Circadian phase characteristics were inferred for TCGA-LUSC samples utilizing the TimeTeller package (24). Comparative analyses between molecular subtypes (C1 versus C2) and risk stratifications (high-risk versus low-risk groups) revealed estimated phase discrepancies of approximately one hour (Figure S6A,S6B). However, these differences were substantially smaller than the inherent prediction uncertainty (>2 hours), suggesting that the observed phase variations are likely negligible and lack biological significance.

Somatic mutation profiling of the TCGA-LUSC cohort revealed distinct mutational landscapes between C1 and C2 subtypes (Figure 6; Figure S7). The predominant somatic alterations in LUSC comprised missense mutations, nonsense mutations, frameshift deletions, splice-site mutations, frameshift insertions, and in-frame deletions, with missense variants representing the most prevalent category. Single nucleotide variants (SNVs) constituted the primary variant type, frequently accompanied by deletion (DEL) events. Among SNVs, C>T transitions predominated in C1, whereas C>A transversions were most prevalent in C2.

Figure 6 Somatic mutation landscape of CIGPS genes in TCGA-LUSC. (A,B) Comprehensive overview of somatic alteration profiles in Cluster 1 (A) and Cluster 2 (B) subtypes. (C,D) Waterfall plots depicting mutation patterns across CIGPS constituent genes in C1 (C) and C2 (D). (E,F) Transition and transversion (Ti/Tv) spectra for C1 (E) and C2 (F), with stacked bars indicating the relative proportion of each nucleotide substitution class. CIGPS, circadian-immune-related gene prognostic signature; LUSC, lung squamous cell carcinoma; TCGA, The Cancer Genome Atlas; TMB, tumor mutation burden.

All six CIGPS constituent genes exhibited somatic mutations within the TCGA-LUSC cohort. Mutational frequency varied significantly between subtypes: within C1, 15 samples (4.45%) harbored mutations in CIGPS genes, whereas C2 exhibited a comparatively elevated mutation frequency, with 10 samples (5.18%) affected. Notably, TREM1 exhibited the highest mutational prevalence among signature genes, detected in 1% of C1 samples and 3% of C2 samples (Figure 6C,6D). The mutational landscape across TCGA-LUSC samples was visualized via stacked bar plots (Figure 6E,6F). TMB analysis revealed slight numerical differences between subtypes (C1: 3.62; C2: 3.96 mutations/Mb) (Figure S7).

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the six CIGPS constituent genes revealed significant functional enrichment in biological processes encompassing regulation of transforming growth factor-beta (TGF-β) response, tissue homeostasis maintenance, and bicellular tight junction assembly; cellular components comprising lateral plasma membrane, bicellular tight junctions, and apical junction complexes; and molecular functions associated with intronic transcription regulatory region sequence-specific DNA binding. KEGG pathway analysis further implicated leukocyte transendothelial migration and cell adhesion molecule (CAM) interactions (Figure S8A-S8D).

Gene set variation analysis (GSVA) revealed distinct immunophenotypic and proliferative profiles between molecular subtypes (Figure S8B; available online: https://cdn.amegroups.cn/static/public/tcr-2026-1-0302-3.xlsx). Specifically, C1 exhibited significant enrichment in adaptive immune cell pathways, encompassing antibody-secreting B-cell activity, cytotoxic T-lymphocyte function, and DC regulation. Conversely, C2 demonstrated predominant activation of DNA replication and epigenetic machinery, including DNA strand elongation, origin recognition complex (ORC) assembly at replication origins, and DNA methylation processes (Figure S8E,S8F).

Association of molecular subtypes with immune microenvironment and therapy response

Deconvolution of bulk transcriptomic profiles via CIBERSORT and ESTIMATE algorithms revealed marked heterogeneity in immune infiltration patterns between C1 and C2 subtypes (Figure 7A,7B). C1 exhibited elevated cytotoxic CD8+ T-cell infiltration and M1 macrophage polarization, suggestive of activated immune effector states. Conversely, C2 was characterized by resting memory CD4+ T cells, protumorigenic M2 macrophages, and neutrophil accumulation. Furthermore, Spearman correlation analysis revealed distinct associations between individual CIGPS constituent gene expression and immune cell infiltration levels (Figure 7C).

Figure 7 Immune infiltration landscapes of CIGPS-defined molecular subtypes. (A) Compositional analysis of 22 immune cell subsets illustrating distinct infiltration patterns between C1 and C2 subtypes in the TCGA-LUSC cohort. (B) Comparative analysis of differentially abundant immune populations, with violin plots depicting the distribution of infiltrating cell types. (C) Spearman correlation matrix between CIGPS constituent gene expression and immune cell infiltration levels. *, P<0.05; **, P<0.01; ***, P<0.001. CIGPS, circadian-immune-related gene prognostic signature; LUSC, lung squamous cell carcinoma; TCGA, The Cancer Genome Atlas.

ESTIMATE algorithm-derived quantification confirmed significantly higher immune scores, stromal scores, and composite ESTIMATE scores in C2, concomitant with lower tumor purity, indicating extensive immune and stromal infiltration in this subtype (Figure 8A-8D). However, detailed analysis of T-cell functional states revealed paradoxical features: C1 displayed lower T-cell dysfunction scores yet significantly higher T-cell exclusion scores, suggesting intact cytotoxic potential but substantial physical barriers to immune infiltration (Figure 8E,8F). Conversely, C2 demonstrated markedly elevated expression of human leukocyte antigen (HLA) genes (Figure 8G), indicating enhanced antigen presentation machinery, alongside differential expression patterns of ICGs and immunogenic cell death modulators (Figure S9). These findings delineate two distinct immunological contexts: C1 harbors activated cytotoxic compartments but faces exclusion-mediated immune evasion, whereas C2 exhibits an extensively infiltrated, antigen-rich but potentially immunosuppressed microenvironment.

Figure 8 Immune microenvironment characterization and evasion mechanisms. (A-D) Violin plots comparing ESTIMATE-derived metrics between C1 and C2 subtypes: (A) Immune score, (B) Stromal score, (C) ESTIMATE composite score, and (D) Tumor purity. (E) T cell dysfunction scores reflecting cytotoxic T lymphocyte functional states. (F) T cell exclusion scores indicating physical barriers to immune cell infiltration. (G) Differential expression of HLA genes between molecular subtypes. *, P<0.05; **, P<0.01; ***, P<0.001. HLA, human leukocyte antigen.

Drug sensitivity analysis was performed across all 198 agents in the GDSC2 database. Among the 16 compounds selected for focused presentation, all exhibited statistically significant differential sensitivity between C1 and C2 subtypes. Cluster 1 (C1) demonstrated significantly higher predicted sensitivity to standard first-line cytotoxic agents, including platinum-based chemotherapy (cisplatin), taxanes (docetaxel, paclitaxel), gemcitabine, and vinorelbine, as well as EGFR tyrosine kinase inhibitors (erlotinib, gefitinib, afatinib). Conversely, C2 showed significantly higher predicted sensitivity to specific targeted compounds including selumetinib and ribociclib. Differential drug sensitivity for these 16 compounds is visualized in Figure S1.

Expression and functional exploration of key genes at single-cell resolution

Examination of CIGPS constituent genes revealed heterogeneous spatial distribution patterns across major cell types (Figure 9A). Following quality control and unsupervised clustering, analysis of the scRNA-seq dataset GSE148071 resolved the LUSC TME into distinct major cell populations (Figure 9B). Feature plots further localized representative signature genes to specific microenvironmental compartments: CLDN1 exhibited specific enrichment in epithelial cell clusters, TREM1 was predominantly localized to monocyte and macrophage populations, and ALPL and CLDN5 displayed endothelial cell-restricted expression. (Figure 9A,9C-9F). These distinct compartmentalization patterns suggest that the prognostic signature captures heterogeneous biological processes encompassing tumor epithelial integrity, myeloid inflammatory states, and angiogenic features, alongside ubiquitous stromal or secretory signals.

Figure 9 Single-cell RNA sequencing analysis of CIGPS genes. (A) Dot plot showing the expression distribution of signature genes across major cell types. (B) UMAP visualization of the LUSC tumor microenvironment, colored by annotated cell types. (C-F) UMAP feature plots visualizing the expression of representative signature genes. CIGPS, circadian-immune-related gene prognostic signature; LUSC, lung squamous cell carcinoma; UMAP, Uniform Manifold Approximation and Projection.

Discussion

Summary of main findings

By interrogating the intersection of circadian regulation and immune surveillance in LUSC, we derived a compact six-gene prognostic classifier (CIGPS) through ensemble machine learning. To the best of our knowledge, this study is the first to construct and apply a CIGPS to LUSC patients. Unlike expansive transcriptional signatures that risk overfitting, this parsimonious model—comprising CLDN1, FGG, ALPL, TREM1, NKX2-1, and CLDN5—demonstrated robust discrimination of survival outcomes across independent cohorts. Beyond individual risk stratification, unsupervised partitioning revealed two biologically distinct entities: C1, characterized by cytotoxic lymphocyte enrichment coupled paradoxically with physical exclusion barriers, and C2, exhibiting predominant activation of DNA replication and epigenetic regulatory programs alongside functionally exhausted immune surveillance despite preserved antigen presentation capacity. Single-cell resolution further mapped these genes onto discrete microenvironmental niches—epithelial barriers, myeloid infiltrates, and vascular endothelium—suggesting that circadian disruption manifests through compartment-specific mechanisms in LUSC.

Biological significance of genes in CIGPS

NKX2-1 functions as a lineage-specific transcription factor and tumor suppressor significantly downregulated in LUSC; mechanistic investigations demonstrate that NKX2-1 directly binds to the AKR1B10 promoter region (-394 to -388 bp) containing the core motif CACTTGA, thereby suppressing AKR1B10 transcription (33). This transcriptional repression of AKR1B10 underlies NKX2-1-mediated suppression of tumor cell proliferation, migration, and invasion, as evidenced by functional rescue experiments. CLDN1, a tight junction protein, exhibits markedly elevated expression in LUSC compared to LUSC (34). Functionally, CLDN1 promotes cisplatin resistance in NSCLC by interacting with ULK1 and upregulating its Ser555 phosphorylation, thereby activating cytoprotective autophagy (35). Conversely, CLDN5 is an endothelial-specific tight junction protein significantly downregulated in LUSC, and its loss compromises vascular barrier integrity to facilitate tumor cell intravasation and metastasis (36,37). FGG is a coagulation component markedly upregulated in LUSC and serves as an independent prognostic factor for poor survival, promoting fibrin deposition and establishing an immunosuppressive TME (38). TREM1 is a myeloid cell receptor predominantly expressed on PMN-MDSCs in the cancer microenvironment, facilitating tumor immune evasion by amplifying immunosuppressive functions (39,40). ALPL demonstrates context-dependent roles in lung cancer biology; while its tissue expression is downregulated in LUSC, where it suppresses metastasis via the p-ERK/c-Myc/RhoA axis (41), elevated serum ALPL levels serve as an independent prognostic biomarker in NSCLC (42).

Model and clinical implications

Our analysis provided an in-depth characterization of the immune context associated with CIGPS. C1’s relative immune-”hot” designation belies a fundamental therapeutic challenge: despite abundant CD8+ T cell infiltrates, elevated exclusion scores indicate sequestration behind intact physical barriers. This “excluded” phenotype aligns with sensitivity to cytotoxic agents—docetaxel and cisplatin—suggesting that pharmacological penetration may overcome spatial immune exclusion where checkpoint inhibitors fail.

Conversely, C2 exemplifies a “modified” immune desert: while mutational burden suggests neoantigen availability and HLA upregulation indicates intact antigen presentation machinery, T cell dysfunction scores reveal exhausted or anergic cytotoxic populations. This biological state—antigen-rich yet response-poor—explains the selective vulnerability to cell cycle inhibitors (ribociclib) and MEK blockade (selumetinib) rather than immunotherapy. These findings suggest that the CIGPS may serve as a supplementary indicator for risk and treatment stratification, offering preliminary insights into subtype-specific drug sensitivities that complement existing clinicopathological assessment.

It is critical to contextualize these drug sensitivity predictions within the known limitations of the GDSC2 platform. First, this database derives from in vitro cell line screening and therefore cannot evaluate agents whose efficacy depends on the intact tumor immune microenvironment, most notably ICIs. We addressed this gap through TIDE and IPS analyses (Figure 8), which predicted differential immunotherapy responsiveness between subtypes. Second, carboplatin—commonly used as an alternative to cisplatin in clinical practice to reduce nephrotoxicity—is not profiled in GDSC2; we therefore used cisplatin as the representative platinum backbone. Third, while selumetinib and ribociclib are not currently indicated for LUSC, their subtype-selective sensitivity profiles highlight hypothesis-generating druggable vulnerabilities in C2 that may warrant exploration in preclinical models or basket trials for chemotherapy-refractory disease.

Methodological constraints in circadian inference

Circadian rhythms govern the expression of over a quarter of the human genome across tissues. In this context, the lung ranks among the tissues with a higher proportion of rhythmic transcripts, underscoring the potential significance of circadian regulation in pulmonary physiology and, by extension, in LUSC. A major obstacle in studying this relationship retrospectively is the lack of sample collection time metadata in most transcriptomic datasets, which introduces an unmeasured confounding variable. To address this, we employed a machine learning-based approach to infer circadian time and mitigate its potential bias. While studies have confirmed that circadian rhythms can be quantified from single timepoint samples of normal lung tissue, these same analyses reveal that LUSC tissues exhibit markedly dampened circadian amplitude compared to their normal counterparts. This inherent dysregulation of the core clock in tumors necessarily constrains the performance of any in-silico phase inference algorithm. This specific methodological consideration for circadian rhythm analysis constitutes one of the several limitations of our study, which we comprehensively address in the following section

Limitations and future perspectives

The present findings remain subject to constraints inherent in the retrospective utilization of publicly archived transcriptomic repositories. The absence of standardized sample acquisition protocols—particularly regarding tissue procurement timing, ischemia duration, and patient chronotype metadata—introduces unquantifiable variability that may obscure genuine circadian expression signals or, conversely, generate spurious associations. While the ensemble machine learning approach mitigates overfitting risks inherent in high-dimensional genomic data, the reliance on bulk tissue profiles necessarily masks intratumoral heterogeneity and rare cell state transitions that might critically influence prognostic trajectories.

Second, as detailed in Table 1, the clinical robustness of our prognostic model is constrained by the absence of several established prognostic variables in the TCGA-LUSC database. While we fully acknowledge that ECOG performance status, systemic inflammatory markers (e.g., LIPI, NLR), detailed treatment histories, and actionable mutation profiles carry substantial prognostic weight in lung cancer, these data were either not systematically collected or suffered from prohibitive missingness in the TCGA repository (ECOG missing in 61.3% of cases; chemotherapy and radiotherapy records missing in 51.6% and 60.3%, respectively; only <0.5% harbored documented EGFR/ALK/KRAS alterations). Inclusion of such incomplete variables would have introduced unacceptable bias, model non-convergence, and severe loss of statistical power. Consequently, our multivariable model was necessarily limited to age, sex, and pathologic stage. Future prospective cohorts with comprehensive clinical metadata are warranted to further refine the model’s translational applicability.

Third, although we delineated the expression patterns of CIGPS genes, their mechanistic contributions within the specific context of LUSC circadian biology remain correlative; functional studies utilizing CRISPR perturbation or patient-derived organoids are required to establish causality.

Fourth, while the vast majority of TCGA-LUSC samples were obtained from treatment-naïve patients undergoing primary surgical resection, a small subset (7/494, 1.4%) had received prior systemic therapy. Although this proportion is too low to introduce systematic bias—and a sensitivity analysis excluding these patients confirmed the robustness of our prognostic model (C-index: 0.716; Figure S2C)—we acknowledge that prior therapy could theoretically alter circadian and immune gene expression patterns in individual cases. Future prospective studies utilizing strictly treatment-naïve cohorts with standardized biospecimen acquisition protocols will be needed to fully exclude this potential confounder.

Additionally, our single-cell analysis, while spatially informative, lacks temporal resolution—critical for verifying whether the six genes retain circadian oscillatory capacity in malignant cells or merely represent “frozen” phase markers of disrupted clocks. Future directions should integrate temporally resolved sampling (e.g., serial biopsies across circadian phases) with spatial transcriptomics to map how circadian gating affects immune cell trafficking in C1 versus C2 microenvironments. Finally, the exploration of chronotherapeutic interventions—timed administration of cisplatin or immune checkpoint blockade stratified by CIGPS risk categories—represents a promising avenue to translate these observational findings into clinical utility.

Future investigations should prioritize prospective collection of temporally annotated biospecimens to disentangle circadian phase effects from baseline expression differences, alongside spatial transcriptomic technologies to map the physical architecture of C1/C2 microenvironments. Functional dissection of the endothelial-epithelial-macrophage crosstalk networks identified herein will be essential to translate observational associations into therapeutic targets. Ultimately, integration of the CIGPS into prospective clinical trials—evaluating its utility in predicting differential responses to chronomodulated chemotherapy versus immune checkpoint blockade—will determine its translational viability beyond bioinformatic discovery.


Conclusions

Through the integration of circadian biology with immune profiling, this study establishes a compact six-gene signature that refines prognostic risk stratification in LUSC. By delineating two molecular subtypes with divergent immune microenvironmental characteristics and drug sensitivity profiles, our findings illuminate the biological heterogeneity underlying patient outcomes beyond conventional clinical staging. The compartment-specific localization of signature constituents to epithelial, myeloid, and endothelial niches further suggests that circadian-immune interactions in LUSC operate through multicellular networks rather than tumor cell-autonomous mechanisms alone.

While prospective validation remains essential before clinical application, these results offer a conceptual framework for understanding how circadian disruption may differentially influence therapeutic vulnerabilities across patient subpopulations. The CIGPS and its associated molecular subtypes may serve as a foundation for future investigations exploring chronobiology-informed patient stratification, potentially informing the design of clinical trials that account for both molecular identity and microenvironmental context in treatment selection.


Acknowledgments

We thank the contributors of the public databases used in this research. We acknowledge the use of DeepSeek V3 for language editing and polishing of this manuscript. This AI tool was not involved in study design, data analysis, or interpretation of results.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0302/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0302/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0302/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Leiter A, Veluswamy RR, Wisnivesky JP. The global burden of lung cancer: current status and future trends. Nat Rev Clin Oncol 2023;20:624-39. [Crossref] [PubMed]
  2. Travis WD, Brambilla E, Burke AP, et al. Introduction to The 2015 World Health Organization Classification of Tumors of the Lung, Pleura, Thymus, and Heart. J Thorac Oncol 2015;10:1240-2. [Crossref] [PubMed]
  3. Herbst RS, Morgensztern D, Boshoff C. The biology and management of non-small cell lung cancer. Nature 2018;553:446-54. [Crossref] [PubMed]
  4. Chen Z, Fillmore CM, Hammerman PS, et al. Non-small-cell lung cancers: a heterogeneous set of diseases. Nat Rev Cancer 2014;14:535-46. [Crossref] [PubMed]
  5. Wang H, Niu X, Jin Z, et al. Immunotherapy resistance in non-small cell lung cancer: from mechanisms to therapeutic opportunities. J Exp Clin Cancer Res 2025;44:250. [Crossref] [PubMed]
  6. Takahashi JS. Transcriptional architecture of the mammalian circadian clock. Nat Rev Genet 2017;18:164-79. [Crossref] [PubMed]
  7. Sulli G, Lam MTY, Panda S. Interplay between Circadian Clock and Cancer: New Frontiers for Cancer Treatment. Trends Cancer 2019;5:475-94. [Crossref] [PubMed]
  8. Wang C, Zeng Q, Gül ZM, et al. Circadian tumor infiltration and function of CD8(+) T cells dictate immunotherapy efficacy. Cell 2024;187:2690-2702.e17. [Crossref] [PubMed]
  9. Xuan W, Khan F, James CD, et al. Circadian regulation of cancer cell and tumor microenvironment crosstalk. Trends Cell Biol 2021;31:940-50. [Crossref] [PubMed]
  10. Li Q, Xia D, Wang Z, et al. Circadian Rhythm Gene PER3 Negatively Regulates Stemness of Prostate Cancer Stem Cells via WNT/β-Catenin Signaling in Tumor Microenvironment. Front Cell Dev Biol 2021;9:656981. [Crossref] [PubMed]
  11. Nobis CC, Dubeau Laramée G, Kervezee L, et al. The circadian clock of CD8 T cells modulates their early response to vaccination and the rhythmicity of related signaling pathways. Proc Natl Acad Sci U S A 2019;116:20077-86. [Crossref] [PubMed]
  12. Wang C, Barnoud C, Cenerenti M, et al. Dendritic cells direct circadian anti-tumour immune responses. Nature 2023;614:136-43. [Crossref] [PubMed]
  13. Karaboué A, Collon T, Pavese I, et al. Time-Dependent Efficacy of Checkpoint Inhibitor Nivolumab: Results from a Pilot Study in Patients with Metastatic Non-Small-Cell Lung Cancer. Cancers (Basel) 2022;14:896. [Crossref] [PubMed]
  14. Santoni M, Molina-Cerrillo J, Massari F, et al. Re: Effect of Immunotherapy Time-of-day Infusion on Overall Survival Among Patients with Advanced Melanoma in the USA (MEMOIR): A Propensity Score-matched Analysis of a Single-centre, Longitudinal Study. Eur Urol 2022;81:623-4. [Crossref] [PubMed]
  15. Colaprico A, Silva TC, Olsen C, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res 2016;44:e71. [Crossref] [PubMed]
  16. Fan J, Chen Y, Gong Y, et al. Single-cell RNA sequencing reveals potential therapeutic targets in the tumor microenvironment of lung squamous cell carcinoma. Sci Rep 2025;15:10374. [Crossref] [PubMed]
  17. Li S, Shui K, Zhang Y, et al. CGDB: a database of circadian genes in eukaryotes. Nucleic Acids Res 2017;45:D397-403. [Crossref] [PubMed]
  18. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559. [Crossref] [PubMed]
  19. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47. [Crossref] [PubMed]
  20. Liu H, Zhang W, Zhang Y, et al. Mime: A flexible machine-learning framework to construct and visualize models for clinical characteristics prediction and feature selection. Comput Struct Biotechnol J 2024;23:2798-810. [Crossref] [PubMed]
  21. Harrell FE Jr. Regression Modeling Strategies. New York: Springer, 2001.
  22. Yu G, Wang LG, Han Y, et al. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 2012;16:284-7. [Crossref] [PubMed]
  23. Mayakonda A, Lin DC, Assenov Y, et al. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res 2018;28:1747-56. [Crossref] [PubMed]
  24. Vlachou D, Veretennikova M, Usselmann L, et al. TimeTeller: A tool to probe the circadian clock as a multigene dynamical system. PLoS Comput Biol 2024;20:e1011779. [Crossref] [PubMed]
  25. Newman AM, Steen CB, Liu CL, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 2019;37:773-82. [Crossref] [PubMed]
  26. Yoshihara K, Shahmoradgoli M, Martínez E, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2013;4:2612. [Crossref] [PubMed]
  27. Jiang P, Gu S, Pan D, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med 2018;24:1550-8. [Crossref] [PubMed]
  28. Charoentong P, Finotello F, Angelova M, et al. Pan-cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade. Cell Rep 2017;18:248-62. [Crossref] [PubMed]
  29. Maeser D, Gruener RF, Huang RS. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief Bioinform 2021;22:bbab260. [Crossref] [PubMed]
  30. Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multimodal single-cell data. Cell 2021;184:3573-3587.e29. [Crossref] [PubMed]
  31. Germain PL, Lun A, Garcia Meixide C, et al. Doublet identification in single-cell sequencing data using scDblFinder. F1000Res 2021;10:979. [Crossref] [PubMed]
  32. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2023.
  33. Yu D, Liu P, Gao R, et al. NKX2-1 Restricts the Growth and Metastasis of Lung Squamous Cell Carcinoma Through Transcriptive Suppression of AKR1B10. J Biochem Mol Toxicol 2025;39:e70507. [Crossref] [PubMed]
  34. Jung JH, Jung CK, Choi HJ, et al. Diagnostic utility of expression of claudins in non-small cell lung cancer: different expression profiles in squamous cell carcinomas and adenocarcinomas. Pathol Res Pract 2009;205:409-16. [Crossref] [PubMed]
  35. Zhao Z, Li J, Jiang Y, et al. CLDN1 Increases Drug Resistance of Non-Small Cell Lung Cancer by Activating Autophagy via Up-Regulation of ULK1 Phosphorylation. Med Sci Monit 2017;23:2906-16. [Crossref] [PubMed]
  36. Ma SC, Li Q, Peng JY, et al. Claudin-5 regulates blood-brain barrier permeability by modifying brain microvascular endothelial cell proliferation, migration, and adhesion to prevent lung cancer metastasis. CNS Neurosci Ther 2017;23:947-60. [Crossref] [PubMed]
  37. Selim MS, Matani BR, Henry-Ojo HO, et al. Claudin 5 Across the Vascular Landscape: From Blood-Tissue Barrier Regulation to Disease Mechanisms. Cells 2025;14:1346. [Crossref] [PubMed]
  38. Wu R, Ma R, Duan X, et al. Identification of specific prognostic markers for lung squamous cell carcinoma based on tumor progression, immune infiltration, and stem index. Front Immunol 2023;14:1236444. [Crossref] [PubMed]
  39. Ajith A, Mamouni K, Horuzsko DD, et al. Targeting TREM1 augments antitumor T cell immunity by inhibiting myeloid-derived suppressor cells and restraining anti-PD-1 resistance. J Clin Invest 2023;133:e167951. [Crossref] [PubMed]
  40. Cai Y, Li S, Li H, et al. Pan-cancer analysis reveals TREM1(+) PMN-MDSCs as critical regulators of immune suppression and tumor microenvironment remodeling. Commun Biol 2025;9:75. [Crossref] [PubMed]
  41. Lou Z, Lin W, Zhao H, et al. Alkaline phosphatase downregulation promotes lung adenocarcinoma metastasis via the c-Myc/RhoA axis. Cancer Cell Int 2021;21:217. [Crossref] [PubMed]
  42. Yang T, Cheng J, Fu S, et al. Pretreatment levels of serum alkaline phosphatase are associated with the prognosis of patients with non-small cell lung cancer receiving immune checkpoint inhibitors. Oncol Lett 2023;25:154. [Crossref] [PubMed]
Cite this article as: Lu Y, Li H, Wang S, Yang F, Xiao S, Liu Y, Liu Q, Han X, Hu Y. Circadian-immune-related gene signature for lung squamous cell carcinoma: machine learning and multi-omics analysis. Transl Cancer Res 2026;15(5):431. doi: 10.21037/tcr-2026-1-0302

Download Citation