Chromatin remodeling-driven stratification and a five-gene RiskScore define proliferative aggressiveness and therapeutic vulnerabilities in neuroblastoma

Aiguo Zhu; Xin Li; Zhengchao Jin; Yi Pan; Jian Wang

doi:10.21037/tcr-2026-1-0195

Original Article

Chromatin remodeling-driven stratification and a five-gene RiskScore define proliferative aggressiveness and therapeutic vulnerabilities in neuroblastoma

Aiguo Zhu^1,2#, Xin Li^1,2#, Zhengchao Jin^1,2, Yi Pan^1,2, Jian Wang^1,2

¹Tianjin Cancer Hospital Airport Hospital, National Clinical Research Center for Cancer, Tianjin, China; ²Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin, China

Contributions: (I) Conception and design: Z Jin, J Wang, Y Pan; (II) Administrative support: J Wang; (III) Provision of study materials or patients: A Zhu, X Li; (IV) Collection and assembly of data: A Zhu, X Li; (V) Data analysis and interpretation: A Zhu, X Li, J Wang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^#These authors contributed equally to this work as co-first authors.

Correspondence to: Zhengchao Jin, MD; Yi Pan, MD; Jian Wang, MD. Tianjin Cancer Hospital Airport Hospital, National Clinical Research Center for Cancer, Tianjin, China; Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, East Five Road, Dongli District, Tianjin 300060, China. Email: tzkg_jinzc@foxmail.com; tjphyLP@126.com; wang5862@163.com.

Background: Neuroblastoma is clinically heterogeneous, and epigenetic dysregulation—particularly chromatin remodeling—may organize aggressive tumor states and inform prognosis and therapeutic opportunities. This study aimed to define chromatin remodeling-associated neuroblastoma subtypes and develop a robust, biologically interpretable prognostic risk score derived from subtype-associated transcriptional changes, with external validation and functional characterization.

Methods: A curated chromatin remodeling gene set was analyzed in RNA sequencing (RNA-seq) cohorts. Consensus clustering defined molecular subtypes in GSE62564 (n=498). Differential expression between subtypes [adjusted P value <0.05, |log fold change (FC)| ≥1] were screened by least absolute shrinkage and selection operator (LASSO) Cox regression. Collinearity control and multivariable Cox refinement yielded a five-gene signature. The RiskScore was evaluated by Kaplan-Meier analysis and time-dependent receiver operating characteristic (ROC) in GSE62564 and externally validated in GSE181559 (n=97) and FWR144 (n=144). Pathway analyses, transcription factor (TF) activity inference, and in silico drug sensitivity prediction were performed to support biological interpretation and therapeutic hypotheses.

Results: Chromatin remodeling-based clustering stratified patients into prognostically distinct subtypes. The five-gene RiskScore robustly discriminated overall survival (OS) and achieved favorable time-dependent area under the curve (AUC) in the development cohort, remaining prognostic in both external cohorts. High-RiskScore tumors exhibited enhanced cell-cycle and DNA repair-related programs, consistent with a proliferative tumor-intrinsic aggressive phenotype. Drug sensitivity prediction suggested candidate vulnerabilities concordant with these mechanisms.

Conclusions: Chromatin remodeling programs define reproducible neuroblastoma subtypes and support a stable five-gene prognostic RiskScore with external validation, linking aggressive biology to mechanistic pathways and therapeutic hypotheses.

Keywords: Neuroblastoma; chromatin remodeling; prognostic signature; risk score; cell cycle

Submitted Jan 22, 2026. Accepted for publication Mar 10, 2026. Published online Mar 27, 2026.

doi: 10.21037/tcr-2026-1-0195

Highlight box

Key findings

• Consensus clustering of chromatin remodeling-related genes defined two neuroblastoma subtypes with distinct prognosis and proliferative transcriptional programs.

• A parsimonious five-gene RiskScore was constructed from subtype-associated differentially expressed genes using least absolute shrinkage and selection operator (λ_1se) followed by multivariable Cox modeling, and it robustly stratified overall survival in the training cohort and two external RNA-seq cohorts.

• Integrated functional and therapeutic inference linked the high-risk state to cell-cycle/E2F-MYC activity and predicted vulnerabilities enriched in cell-cycle/replication stress, DNA damage response, and epigenetic/chromatin-targeting drug classes.

What is known and what is new?

• Neuroblastoma heterogeneity and proliferation-associated programs contribute to aggressive disease, yet clinically tractable biomarkers that connect chromatin remodeling to biology and treatment opportunities remain limited.

• This study proposes a chromatin remodeling-driven stratification framework and a downstream five-gene prognostic signature derived from subtype-associated transcriptional differences, which couples prognostic risk with mechanistic hallmarks and drug-class hypotheses across multiple RNA-seq cohorts.

What is the implication, and what should change now?

• The five-gene RiskScore provides an interpretable tool for risk refinement and for prioritizing mechanism-consistent therapeutic strategies (DNA damage response/replication stress, cell-cycle, epigenetic/chromatin agents), motivating prospective validation and experimental testing for clinical translation.

Introduction

Neuroblastoma is a pediatric malignancy arising from the developing sympathetic nervous system and is characterized by pronounced clinical and biological heterogeneity (1,2). Current risk-stratification frameworks integrate clinicopathologic variables with genomic features; however, outcomes remain highly divergent, particularly among patients classified as high risk. Despite intensive multimodal therapy, including induction chemotherapy, surgery, radiotherapy, consolidation, and anti-GD2 based immunotherapy, long-term survival for high-risk neuroblastoma remains suboptimal, and relapse or refractory disease continues to account for most neuroblastoma-related deaths (3,4).

A major driver of this heterogeneity is the interplay among oncogenic drivers, lineage state, and epigenetic regulation. Beyond canonical alterations such as MYCN amplification, high-risk neuroblastoma frequently harbors aberrations in chromatin regulators (e.g., ATRX) and exhibits transcriptional programs associated with replication stress, DNA repair, and cell-cycle control (5,6). These observations support the concept that chromatin dysregulation is not merely a bystander event but may act as an organizing principle shaping tumor aggressiveness and therapeutic vulnerability.

Chromatin remodeling complexes and related epigenetic regulators modulate nucleosome positioning, enhancer accessibility, and transcriptional plasticity, thereby enabling cellular state transitions relevant to tumor initiation, progression, and metastasis. Across cancer types, perturbations of SWI/SNF (BAF) and other remodeling machineries can rewire transcriptional dependencies and influence tumor-microenvironment interactions. This provides a biological rationale for why chromatin remodeling signatures may capture aggressive phenotypes more reliably than single-gene markers (7,8).

Nevertheless, translational deployment of epigenetically informed biomarkers in neuroblastoma remains limited. Many published prognostic signatures are derived from transcriptome-wide screening without anchoring to a priori biological processes, which may compromise interpretability and cross-cohort stability. A process-guided framework is therefore needed to identify clinically meaningful subtypes using coherent gene sets, distill these signals into a parsimonious and interpretable RiskScore, and interrogate the associated pathways, and potential drug vulnerabilities to generate testable hypotheses.

In this study, we focused on a curated chromatin remodeling gene set and systematically evaluated its relevance in neuroblastoma across multiple independent RNA sequencing (RNA-seq) cohorts with overall survival (OS) endpoints. We first applied consensus-based unsupervised clustering to define chromatin remodeling associated subgroups, then derived a compact prognostic signature via regularized survival modeling followed by refinement to mitigate collinearity and preserve independent prognostic contributions. Finally, we linked the resulting RiskScore to coordinated transcriptional programs, transcription factor (TF) activity, and predicted pharmacologic sensitivities to support mechanistic interpretability and translational directions. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0195/rc).

Methods

Study design

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study aimed to characterize chromatin remodeling-associated transcriptional heterogeneity in neuroblastoma and to develop a biologically grounded, parsimonious prognostic model. Briefly, we performed molecular subtyping using a curated chromatin remodeling gene set, identified subtype-associated differentially expressed genes to define the candidate feature space, and then applied penalized survival modeling followed by collinearity control and multivariable coefficient estimation to derive a five-gene RiskScore. The chromatin remodeling-related gene set (CRRGs; n=49) was not generated de novo in the present study. Instead, it was adopted from a previously published chromatin remodeling-focused work, where these genes were curated as core regulators involved in chromatin remodeling and epigenetic control (9). We used the identical gene list as the feature space for consensus clustering and downstream subtype-based analyses to ensure biological interpretability and reproducibility. The complete list of the 49 genes is provided in Table S1. The model was evaluated in the development cohort and externally validated in independent RNA-seq cohorts using a fixed cutoff transferred from the development cohort. Biological and translational interpretation was conducted through pathway activity inference, TF activity analysis, and in silico drug sensitivity prediction.

Datasets and clinical endpoint

RNA-seq neuroblastoma datasets were curated and processed for model development and external validation. The development cohort was GSE62564 (n=498) (10). Two independent RNA-seq cohorts were used for external validation, including GSE181559 (n=97) and FWR144 (2010fwr144; n=144) (11,12). OS was defined as the interval from diagnosis to death or last follow-up. Death was coded as event =1 and censoring as event =0. OS time was analyzed in days across all cohorts.

RNA-seq preprocessing and sample harmonization

Expression matrices were organized in a gene-by-sample format for downstream analyses. Expression values were log-transformed. Sample identifiers were harmonized between expression matrices and clinical tables, and analyses were restricted to samples with matched expression and survival information. Gene identifiers were handled in gene symbol space when available. When duplicated gene symbols occurred, standard deduplication procedures were applied to retain a single expression value per gene per sample prior to survival modeling and enrichment analyses.

Chromatin remodeling gene set

A curated chromatin remodeling associated gene set consisting of 49 genes was used as the primary feature space for molecular subtyping. This gene set served as the foundation for subtype discovery and the subsequent subtype-informed candidate gene screening.

Consensus clustering and subtype definition

Unsupervised molecular subtyping was performed in the development cohort using the expression of the 49 chromatin remodeling genes. Consensus clustering was implemented with ConsensusClusterPlus using maxK equal to six, reps equal to one thousand, pItem equal to 0.8, pFeature equal to one, Euclidean distance, average inner linkage, k-means clustering algorithm (13). The optimal number of clusters was determined based on consensus stability and interpretability using consensus matrices and stability trends across candidate K values. Dimensionality reduction methods implemented in the analysis scripts were used to visualize subtype separation.

Clinical association and survival analyses of subtypes

Associations between molecular subtypes and key clinical variables were assessed using appropriate categorical or continuous tests depending on data type. Survival differences between subtypes were evaluated using Kaplan-Meier analysis with log-rank testing. Cox proportional hazards regression was performed to evaluate the independent prognostic contribution of subtype assignment after adjustment for available clinical covariates.

Differential expression analysis between subtypes

To identify subtype-associated genes for downstream model construction, differential expression analysis was conducted between the two molecular subtypes in the development cohort using limma (14). Differentially expressed genes were defined by an adjusted P value below 0.05 using the Benjamini-Hochberg procedure and an absolute log fold change (FC) greater than or equal to one.

Construction of the prognostic model and derivation of the five-gene RiskScore

Candidate genes for model building were selected from subtype-associated differentially expressed genes (DEGs) and subjected to penalized Cox regression using least absolute shrinkage and selection operator (LASSO). The tuning parameter λ was selected using the 1-standard-error criterion (λ_1se) to favor a more parsimonious and stable model. To further reduce redundancy and mitigate multicollinearity among selected genes, collinearity screening was performed. The remaining genes were entered into a multivariable Cox regression model to estimate coefficients and define the final five-gene signature. The RiskScore for each patient was computed as the linear predictor (weighted sum) of the expression values multiplied by the multivariable Cox coefficients.

RiskScore calculation and risk stratification

For each patient, a continuous RiskScore was calculated as the linear predictor derived from the multivariable Cox model. Patients were stratified into high-risk and low-risk groups using the median RiskScore of the development cohort as the cutoff. This cutoff was subsequently fixed and transferred to external validation cohorts for consistent risk stratification.

Prognostic performance assessment in the development cohort

Prognostic discrimination was evaluated by Kaplan-Meier survival analysis comparing high-risk and low-risk groups, with statistical significance assessed by the log-rank test. Cox regression was used to estimate the effect size of RiskScore or risk group on OS. Time-dependent receiver operating characteristic (ROC) analysis was performed using timeROC at 1, 3, and 5 years, corresponding to 365, 1,095, and 1,825 days, respectively, and area under the curve (AUC) values were reported (15). RiskScore distribution and outcome patterns were visualized using RiskScore-ranked plots and event or censoring status representations. Proportional hazards assumptions were examined using Schoenfeld residual based diagnostics as implemented in the analysis scripts, and internal stability was explored using resampling-based summaries of discrimination metrics when applicable.

External validation

External validation was performed in GSE181559 and FWR144 using the coefficients derived from the development cohort. Expression matrices were processed to match gene symbols and sample identifiers, and RiskScores were calculated using the same formula. Patients were stratified using the fixed development-cohort cutoff. Prognostic performance was evaluated using Kaplan-Meier analysis and time-dependent ROC analyses in each validation cohort.

Functional interpretation and pathway activity analyses

To interpret biological processes underlying the risk phenotype, differential expression analysis was performed between high-risk and low-risk groups in the development cohort. Enrichment analyses were conducted to identify dysregulated pathways and biological processes, including Gene Ontology (GO) biological process and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. In addition, pathway activity at the sample level was inferred by single-sample gene set enrichment analysis/gene set variation analysis (ssGSEA/GSVA)-based approaches to quantify hallmark and other curated gene set activities and to examine their associations with RiskScore (16,17).

TF activity inference

TF activity was inferred from bulk expression using regulon-based methods. High-confidence TF-target interactions were obtained from DoRothEA, and TF activity scores were computed using a linear model-based approach implemented in decoupleR (18). Cell cycle related TFs were prioritized for downstream interpretation. Associations between TF activities and RiskScore were assessed using Spearman correlation, and groupwise differences between high-risk and low-risk samples were evaluated using Wilcoxon rank-sum testing with false discovery rate correction.

In silico drug sensitivity prediction and candidate prioritization

In silico drug sensitivity prediction was performed using oncoPredict with GDSC2 as the reference training resource (19). Predicted drug response metrics [e.g., half maximal inhibitory concentration (IC₅₀)] were generated at the sample level. Candidate compounds were prioritized based on differential predicted sensitivity between risk groups and consistency with the biological phenotype inferred from pathway and TF analyses. Visualization included volcano-style global screening plots and focused comparisons for representative agents.

Statistical analysis and software

All statistical tests were two-sided. Kaplan-Meier differences were evaluated using log-rank tests, and Cox proportional hazards models were fitted for univariable and multivariable analyses. Multiple testing correction was applied where relevant. Analyses were conducted in R with packages commonly used for survival analysis, differential expression, enrichment analysis, and visualization, as implemented in the accompanying scripts.

Results

Chromatin remodeling related transcriptional programs define two robust neuroblastoma subtypes associated with adverse clinical features and survival

To determine whether chromatin remodeling-associated transcriptional variation captures clinically relevant heterogeneity in neuroblastoma, we performed unsupervised consensus clustering using the curated chromatin remodeling gene set. The consensus matrix demonstrated strong within-cluster agreement and weak between-cluster similarity, supporting a stable two-class solution (Figure 1A). Concordantly, the consensus cumulative distribution function (CDF) curves suggested that clustering stability plateaued at K=2, implying limited incremental robustness with additional clusters (Figure 1B). Dimensionality reduction further showed a clear separation of the two subtypes in transcriptomic space (Figure 1C).

Figure 1 Chromatin remodeling gene set based unsupervised clustering identifies two prognostically distinct neuroblastoma subtypes. (A) Consensus clustering matrix heatmap based on the chromatin remodeling gene set, showing robust separation into two clusters. (B) CDF curves (and corresponding delta area) supporting selection of K=2 as the optimal number of clusters. (C) Dimensionality reduction visualization (t-SNE as shown) demonstrating transcriptional separation between the two clusters. (D) Kaplan-Meier OS curves comparing the two clusters (log-rank test). (E-G) Distribution of key clinicopathologic variables across clusters (including age group, INSS stage, and MYCN status as shown). P values were calculated using χ²/Fisher’s exact test for categorical variables and Wilcoxon test for continuous variables, where applicable. CDF, cumulative distribution function; CI, confidence interval; INSS, International Neuroblastoma Staging System; OS, overall survival; t-SNE, t-distributed stochastic neighbor embedding.

Survival analyses revealed that the two subtypes differed substantially in OS (Figure 1D), indicating that chromatin remodeling-linked transcriptional states carry prognostic information. We next evaluated clinicopathologic correlates and found that the subtype with inferior OS was enriched for adverse clinical features, including a higher frequency of MYCN amplification, an older age distribution, and more advanced stage composition (Figure 1E-1G). Collectively, these results establish a chromatin remodeling-anchored molecular stratification that aligns with both established risk features and patient outcomes.

A five-gene prognostic signature was derived from subtype-associated transcriptional differences via penalized selection, collinearity control, and multivariable Cox coefficient estimation

To translate subtype-level biology into an individualized prognostic tool, we implemented a feature-selection strategy that explicitly linked biology-driven contrasts to survival modeling. We first identified DEGs between the two chromatin remodeling subtypes, thereby constraining subsequent modeling to transcriptional changes most directly reflecting subtype-associated biology. Importantly, the final prognostic signature was not selected directly from the original 49 chromatin remodeling-related genes, but from downstream differentially expressed genes associated with the chromatin remodeling-defined subtypes. This DEG-restricted feature set was then subjected to LASSO-penalized Cox regression with cross-validation to reduce dimensionality while limiting overfitting. The cross-validation curve and coefficient path profiles demonstrated stable penalization behavior and convergence to a sparse candidate set at the selected λ criterion (Figure 2A-2C).

Figure 2 Development and internal evaluation of a five-gene prognostic RiskScore derived from subtype-associated transcriptomic differences. (A) LASSO-penalized Cox regression coefficient trajectories for candidate genes across log(λ). (B) Ten-fold cross-validation curve showing partial likelihood deviance and selection of optimal λ (λ_min and/or λ_1se as displayed). (C) Coefficients of genes selected by the penalized model under the chosen λ, illustrating the sparse candidate set. (D) Final five-gene prognostic signature after collinearity reduction and multivariable Cox modeling. The RiskScore was computed as a weighted linear combination of PGM2L1, ECEL1, ASPM, NCAN, and KIAA1456 using Cox coefficients (formula shown in the Results). (E) Kaplan-Meier OS curves for high- versus low-risk groups stratified by the median RiskScore in the training cohort (log-rank test). (F) Time-dependent ROC curves at 1, 3, and 5 years for the RiskScore in the training cohort. (G) RiskScore distribution and survival status plot in the training cohort, with patients ordered by RiskScore and annotated by event status. AUC, area under the curve; CI, confidence interval; HR, hazard ratio; LASSO, least absolute shrinkage and selection operator; OS, overall survival; ROC, receiver operating characteristic.

Because correlated predictors can yield unstable coefficient estimates and reduce cross-cohort transportability, we further filtered the LASSO-selected candidates to remove collinear genes and reduce redundancy. The remaining genes were entered into multivariable Cox regression to obtain final predictors and coefficients for RiskScore construction. This sequential pipeline yielded a five-gene signature comprising PGM2L1, ECEL1, ASPM, NCAN, and KIAA1456, and the multivariable Cox coefficients defined the continuous RiskScore (Figure 2D). The RiskScore was calculated as:

$\begin{array}{l} RiskScore = (- 0.5647607) \times PGM 2 L 1 + (- 0.1808020) \times ECEL 1 \\ + (0.8049430) \times ASPM + (0.1779951) \times NCAN \\ + (- 0.4127922) \times KIAA 1456 \end{array}$ [1]

Notably, this model should be interpreted as a stability-oriented prognostic readout of the downstream transcriptional consequences associated with chromatin remodeling-defined subtypes, rather than as a panel of canonical chromatin remodeling genes.

The five-gene RiskScore stratifies survival and demonstrates strong time-dependent discrimination in the training cohort

Using the five-gene coefficients, RiskScore was computed for each patient in the training cohort, and patients were stratified into high- versus low-risk groups by the training-cohort median RiskScore (cutoff =−1.584072). Kaplan-Meier analysis demonstrated a pronounced OS separation between the two groups (Figure 2E), indicating that the RiskScore captures clinically meaningful outcome heterogeneity at the individual level. Time-dependent ROC analyses further supported robust prognostic discrimination, with AUCs of 0.852 at 1 year, 0.919 at 3 years, and 0.930 at 5 years (Figure 2F). Visualization of RiskScore distribution and survival status in ordered patients showed that events and shorter survival times clustered toward the high-risk end of the RiskScore continuum (Figure 2G), consistent with a monotonic risk gradient and supporting the internal coherence of the scoring system.

External validation using a fixed training-derived cutoff supports generalizability across independent cohorts

To assess transportability in a stringent and clinically relevant manner, we applied the fixed cutoff derived from the training cohort (median RiskScore =−1.584072) to independent cohorts without re-optimizing the threshold. In GSE181559, this fixed cutoff reproduced clear OS stratification, with substantially inferior survival in the high-risk group (Figure 3A). Discrimination remained strong across time horizons, with AUCs of 0.840 (1 year), 0.929 (3 years), and 0.917 (5 years) (Figure 3B), and the ordered RiskScore/survival status plot showed outcome enrichment in the high-risk region (Figure 3C).

Figure 3 External validation of the five-gene RiskScore using a fixed cutoff transferred from the training cohort. (A) Kaplan-Meier OS curves in the GSE181559 cohort stratified using the fixed training-derived cutoff (training median RiskScore). (B) Time-dependent ROC curves at 1, 3, and 5 years in GSE181559. (C) RiskScore distribution and survival status in GSE181559, with samples ordered by RiskScore. (D) Kaplan-Meier OS curves in the FWR144 cohort stratified using the same fixed cutoff. (E) Time-dependent ROC curves at 1, 3, and 5 years in FWR144. (F) RiskScore distribution and survival status in FWR144. Across external cohorts, no threshold recalibration was performed; the training cutoff was directly applied. AUC, area under the curve; HR, hazard ratio; OS, overall survival; ROC, receiver operating characteristic.

In FWR144, the fixed cutoff also separated patients into distinct prognostic strata (Figure 3D), with time-dependent AUCs of 0.703 (1 year), 0.774 (3 years), and 0.763 (5 years) (Figure 3E), and consistent RiskScore-to-outcome concordance (Figure 3F). Together, these results indicate that the five-gene RiskScore retains prognostic value across independent cohorts and platforms when a training-defined threshold is directly transferred.

Mechanistic analyses link high RiskScore to proliferative programs, replication stress tolerance, and DNA repair activity

To interpret the biological processes captured by the RiskScore, we compared high- versus low-risk tumors using differential expression and pathway-level analyses. Gene set enrichment analysis (GSEA) of GO biological process terms demonstrated enrichment of proliferation-related processes in the high-risk group, including DNA replication, chromosome segregation, mitotic cell cycle progression, and related programs (Figure 4A). Complementary KEGG enrichment highlighted canonical pathways such as cell cycle and DNA replication (Figure 4B), reinforcing that the high-risk phenotype is characterized by proliferative activation and cell-cycle engagement.

Figure 4 Mechanistic characterization links high RiskScore to proliferative programs, DNA replication/repair activity, and cell cycle regulatory signals. (A) GSEA of GO BP terms comparing high- versus low-risk tumors; representative significantly enriched pathways are shown (dot plot). (B) GSEA of KEGG pathways comparing high- versus low-risk tumors (dot plot). (C) ssGSEA/GSVA analyses of curated hallmark pathways; representative proliferation/replication-associated signatures (e.g., E2F targets, G2M checkpoint, MYC targets, DNA repair) are displayed as groupwise comparisons (boxplots). (D) Spearman correlation of hallmark pathway activity scores with continuous RiskScore; top associated pathways are shown. (E) Inferred TF activity (e.g., via regulon-based ULM) for selected cell cycle related TFs, shown as groupwise differences and/or correlations with RiskScore (as displayed). P values were assessed by Wilcoxon test for group comparisons and Spearman correlation for continuous associations; multiple testings were adjusted where applicable. *, P<0.05; ****, P<0.0001. BP, biological process; GO, Gene Ontology; GSEA, gene set enrichment analysis; GSVA, gene set variation analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; ssGSEA, single-sample gene set enrichment analysis; TF, transcription factor; ULM, univariate linear model.

We then quantified hallmark pathway activity by ssGSEA and examined both between-group differences and associations with RiskScore. Proliferation-linked signatures, including E2F targets, G2M checkpoint, MYC targets, and DNA repair, consistently showed prominent risk-associated patterns (Figure 4C,4D), supporting a coherent mechanistic axis in which high RiskScore reflects a proliferative and replication/repair-engaged state.

To further integrate a regulatory layer, we inferred TF activity using a regulon-based approach and focused on cell cycle-associated regulators. Multiple canonical cell-cycle regulators displayed significant associations with RiskScore and/or between-group activity differences (Figure 4E), providing regulatory support for the enrichment of E2F/G2M-related programs in high-risk tumors. Because regulon-based inference reflects network-level target regulation rather than TF expression alone, specific TF behaviors may not always match expectations derived from pathway labels; thus, regulatory findings were interpreted in the context of the consistent pathway-level proliferation and replication/repair signals.

In silico drug sensitivity prediction prioritizes mechanism-aligned candidate compounds associated with RiskScore

Finally, to translate RiskScore-defined biology into actionable hypotheses, we performed in silico drug sensitivity prediction using GDSC2-based reference models and compared predicted IC₅₀ values between risk groups. A volcano-style summary highlighted compounds exhibiting large between-group differences in predicted sensitivity with statistical support (Figure 5A), enabling systematic prioritization. Because these predictions are learned separately for each compound from reference pharmacogenomic data, drugs sharing a nominal mechanism class (e.g., BET/bromodomain inhibition) may still show discordant predicted susceptibility. Therefore, drug-class annotations were used only for interpretability, and compound-level differences were interpreted cautiously. We then selected representative compounds from mechanism-relevant categories, particularly those related to cell-cycle/replication stress and DNA damage response pathways, to align therapeutic hypotheses with the proliferative and repair-linked risk phenotype. Boxplots of these candidates demonstrated consistent groupwise shifts in predicted IC₅₀ values (Figure 5B), nominating a mechanism-coherent set of candidate vulnerabilities for downstream experimental validation.

Figure 5 In silico drug sensitivity prediction highlights mechanism-aligned candidate compounds associated with RiskScore. (A) Global screening volcano plot summarizing differential predicted drug sensitivity between high- and low-risk groups [e.g., Δ median predicted IC₅₀ and −log₁₀(FDR)]; selected representative drugs are labeled. (B) Boxplots of predicted IC₅₀ values for representative drugs selected for mechanistic interpretability (e.g., cell cycle/replication stress and DNA damage response-related compounds), stratified by RiskGroup. Lower predicted IC₅₀ indicates higher predicted sensitivity. Groupwise differences were tested using Wilcoxon rank-sum test with multiple testing correction where applicable. ****, P<0.0001. DDR, DNA damage response; FDR, false discovery rate; IC₅₀, half maximal inhibitory concentration.

Discussion

Our findings support the view that chromatin remodeling-related transcriptional programs constitute an upstream organizing principle for neuroblastoma heterogeneity and clinical outcome. By first identifying robust molecular subtypes using a curated chromatin remodeling gene set and subsequently distilling subtype-associated signals into a parsimonious prognostic model, this work links biology-driven stratification with patient-level quantification. This framework addresses a frequent limitation of expression-based signatures in neuroblastoma, where models may achieve prognostic separation yet remain only weakly connected to a coherent biological axis, thereby constraining interpretability and undermining mechanistic and therapeutic extrapolation. Anchoring model construction to a defined regulatory theme strengthens biological plausibility and provides a clearer scaffold for downstream interpretation. Importantly, the final five-gene RiskScore should not be interpreted as a panel of canonical chromatin remodeling genes; rather, it represents a compact prognostic surrogate of the downstream transcriptional states associated with chromatin remodeling-defined tumor subtypes.

A central implication is that chromatin remodeling-linked transcriptional states capture clinically meaningful aggressiveness that extends beyond conventional clinicogenetic factors. Neuroblastoma exhibits pronounced inter-patient variability shaped by developmental lineage, oncogenic signaling, and epigenetic dysregulation; contemporary syntheses emphasize that transcriptional and epigenomic programs, together with hallmark genomic events, jointly govern disease trajectory and treatment response (6,7,20). In this context, the unsupervised stratification observed here suggests that chromatin remodeling programs are not merely downstream correlates of disease severity but may reflect higher-order state variables that compress complex biology into clinically actionable subgroups. Accordingly, the prognostic model derived downstream of this stratification should be viewed as a readout of a chromatin remodeling-linked tumor state, rather than as direct evidence that the model genes themselves are core chromatin remodeling machinery components. This interpretation is aligned with broader concepts in pediatric oncology, where relatively modest mutational burdens coexist with prominent transcriptional plasticity and where epigenetic regulators can function as state controllers.

The construction of the five-gene signature also carries a methodological message. Rather than emerging from a single algorithmic step, the final model results from successive constraints designed to favor stability and transportability. Penalized regression reduces dimensionality and curbs overfitting, whereas explicit collinearity control prior to multivariable Cox modeling mitigates coefficient inflation and improves cross-cohort robustness. Such sequential construction is particularly relevant in neuroblastoma, where biologically coupled modules are common, for example proliferation programs that co-vary with MYCN/E2F signaling, and where “high performance” can be inflated by internal splitting in highly structured data. The preservation of prognostic discrimination across independent RNA-seq cohorts supports the interpretation that the RiskScore captures a reproducible biological gradient downstream of chromatin remodeling-associated stratification, rather than a dataset-specific effect tied to the final five genes per se.

Mechanistically, multiple downstream analyses converge on a phenotype biased toward proliferation and genome maintenance in the high-risk stratum. In neuroblastoma, accelerated proliferation and checkpoint adaptation are frequently intertwined with MYCN-driven circuitry and E2F activity, and these states tend to align with treatment resistance and relapse propensity (5). Accordingly, the RiskScore is most plausibly interpreted as an integrated readout of the downstream proliferative state associated with chromatin remodeling-defined subtypes, rather than as a direct proxy for chromatin remodeling machinery itself or any single pathway perturbation. This framing also rationalizes why certain single-cell validations can appear visually weaker than anticipated: sparse expression matrices, dropout effects, and transient cell-cycle-linked bursts may dilute gene-level contrasts, whereas aggregation strategies, including patient-level pseudobulk and model-based residualization, can recover the signal more consistently.

The drug-sensitivity prediction component should be interpreted as hypothesis-generating rather than as evidence for immediate clinical translation. Computational mapping of patient transcriptomes to large pharmacogenomic reference panels provides a principled strategy to prioritize compounds, yet it inherits well-recognized limitations: cell lines incompletely recapitulate neuroblastoma developmental contexts, tumor-stroma interactions are absent, and predicted IC₅₀ differences may partly reflect transcriptional correlates of proliferation rather than direct target dependence. Importantly, compound-level predictions are not guaranteed to be concordant even among drugs grouped under the same nominal mechanism. For example, BET/bromodomain inhibitors may differ in scaffold, potency, and off-target effects, and their response patterns in reference cell lines are learned independently by the prediction model. Similarly, NU7441 (DNA-PK inhibitor) showed relatively higher predicted IC₅₀ in the high-risk group in our dataset, which may reflect reduced dependence on DNA-PK-mediated repair due to compensatory DNA damage response (DDR) wiring in highly proliferative tumors, or non-specific transcriptional correlates captured by the model. These observations reinforce that the drug analysis is hypothesis-generating and requires experimental validation. Nevertheless, the concordance between predicted vulnerabilities and the inferred biology is notable. In particular, the prominence of DNA damage response-related hypotheses aligns with prior neuroblastoma literature implicating DNA repair pathways and PARP-associated mechanisms as therapeutically relevant (21,22). Accordingly, the most defensible translational inference is not that any single agent will necessarily be effective, but that the high-risk transcriptional state nominates therapeutic classes, including DNA damage response modulation, replication stress targeting, or checkpoint interference, as priorities for experimental validation in neuroblastoma models.

Several limitations should be acknowledged. First, although external validation supports the reproducibility of the model across independent datasets, all cohorts analyzed in this study were retrospective and publicly available, and no prospective validation in an independent contemporary cohort was performed. In addition, the current study does not include in vitro or in vivo validation in preclinical neuroblastoma models. Therefore, the present findings should be interpreted as establishing a retrospective, hypothesis-generating prognostic framework rather than a clinically ready model. Second, because the signature was derived from expression data, it cannot fully distinguish whether the observed transcriptional pattern is driven by upstream genomic lesions, including recurrent segmental chromosomal aberrations, or by non-genetic state changes. In particular, some model genes may partially reflect chromosomal dosage effects rather than purely transcriptional regulation. Integrating matched DNA-level copy-number and transcriptomic data in future studies would help refine the relative contributions of structural genomic alterations and downstream tumor state. Finally, the pharmacogenomic prediction module is constrained by the representativeness of the training panels and should be complemented by wet-lab validation before any clinical extrapolation.

Conclusions

In summary, this study advances a coherent narrative in which chromatin remodeling-linked transcriptional states stratify neuroblastoma into reproducible phenotypes, a stability-oriented five-gene RiskScore quantifies these states at the patient level, and high-risk tumors are characterized by proliferative and repair-oriented programs, jointly motivating rational therapeutic hypotheses. By integrating subtype discovery, robustness-focused model construction, and biology-guided interpretation, this work provides an integrative scaffold that is distinct from signature-only studies and may facilitate future validation. If further strengthening is required, the most efficient additions would include orthogonal corroboration of the RiskScore cell-cycle relationship using independent proliferative indices, focused experimental testing of representative therapeutic classes in neuroblastoma models, and integration of established genomic risk features within a unified multivariable framework to further clarify incremental clinical value. At the current stage, the principal contribution of this study is to provide a biologically anchored and externally tested retrospective framework that can guide future mechanistic and prospective validation, rather than to claim immediate clinical implementation.

Acknowledgments

The authos thank all investigators who made publicly available the datasets used in this study.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0195/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0195/prf

Funding: This work was supported by grants from the Tianjin Health Technology Project (No. 2022QN106) and Hebei Provincial Traditional Chinese Medicine Scientific Research Project (No. T2026078, to X.L.).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0195/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Qiu B, Matthay KK. Advancing therapy for neuroblastoma. Nat Rev Clin Oncol 2022;19:515-33. [Crossref] [PubMed]
Sharma R, Yadav J, Bhat SA, et al. Emerging Trends in Neuroblastoma Diagnosis, Therapeutics, and Research. Mol Neurobiol 2025;62:6423-66. [Crossref] [PubMed]
Mao C, Poimenidou M, Craig BT. Current Knowledge and Perspectives of Immunotherapies for Neuroblastoma. Cancers (Basel) 2024;16:2865. [Crossref] [PubMed]
Balaguer J, García Hidalgo L, Hladun R, et al. Recent Evidence-Based Clinical Guide for the Use of Dinutuximab Beta in Pediatric Patients with Neuroblastoma. Target Oncol 2023;18:77-93. [Crossref] [PubMed]
Huang M, Weiss WA. Neuroblastoma and MYCN. Cold Spring Harb Perspect Med 2013;3:a014415. [Crossref] [PubMed]
Duan XF, Zhao Q. TERT-mediated and ATRX-mediated Telomere Maintenance and Neuroblastoma. J Pediatr Hematol Oncol 2018;40:1-6. [Crossref] [PubMed]
Hao F, Zhang Y, Hou J, et al. Chromatin remodeling and cancer: the critical influence of the SWI/SNF complex. Epigenetics Chromatin 2025;18:22. [Crossref] [PubMed]
Liapodimitri A, Tetens AR, Craig-Schwartz J, et al. Progress Toward Epigenetic Targeted Therapies for Childhood Cancer. Cancers (Basel) 2024;16:4149. [Crossref] [PubMed]
Feng J, Chen Z, Wang Y, et al. Identification of chromatin remodeling-related gene signature to predict the prognosis in breast cancer. Clin Exp Med 2025;25:137. [Crossref] [PubMed]
Huang M, Zeki J, Sumarsono N, et al. Epigenetic Targeting of TERT-Associated Gene Expression Signature in Human Neuroblastoma with TERT Overexpression. Cancer Res 2020;80:1024-35. [Crossref] [PubMed]
Hartlieb SA, Sieverling L, Nadler-Holly M, et al. Alternative lengthening of telomeres in childhood neuroblastoma from genome to proteome. Nat Commun 2021;12:1269. [Crossref] [PubMed]
Hagemann S, Misiak D, Bell JL, et al. IGF2BP1 induces neuroblastoma via a druggable feedforward loop with MYCN promoting 17q oncogene expression. Mol Cancer 2023;22:88. [Crossref] [PubMed]
Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 2010;26:1572-3. [Crossref] [PubMed]
Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47. [Crossref] [PubMed]
Li J, Ma S. Time-dependent ROC analysis under diverse censoring patterns. Stat Med 2011;30:1266-77. [Crossref] [PubMed]
Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7. [Crossref] [PubMed]
Wu T, Hu E, Xu S, et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb) 2021;2:100141. [Crossref] [PubMed]
Badia-I-Mompel P. decoupleR: ensemble of computational methods to infer biological activities from omics data. Bioinform Adv 2022;2:vbac016. [Crossref] [PubMed]
Maeser D, Gruener RF, Huang RS. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief Bioinform 2021;22:bbab260. [Crossref] [PubMed]
Polychronopoulos PA, Bedoya-Reina OC, Johnsen JI. The Neuroblastoma Microenvironment, Heterogeneity and Immunotherapeutic Approaches. Cancers (Basel) 2024;16:1863. [Crossref] [PubMed]
Rader J, Russell MR, Hart LS, et al. Dual CDK4/CDK6 inhibition induces cell-cycle arrest and senescence in neuroblastoma. Clin Cancer Res 2013;19:6173-82. [Crossref] [PubMed]
Southgate HED, Chen L, Curtin NJ, et al. Targeting the DNA Damage Response for the Treatment of High Risk Neuroblastoma. Front Oncol 2020;10:371. [Crossref] [PubMed]

Cite this article as: Zhu A, Li X, Jin Z, Pan Y, Wang J. Chromatin remodeling-driven stratification and a five-gene RiskScore define proliferative aggressiveness and therapeutic vulnerabilities in neuroblastoma. Transl Cancer Res 2026;15(4):274. doi: 10.21037/tcr-2026-1-0195

Chromatin remodeling-driven stratification and a five-gene RiskScore define proliferative aggressiveness and therapeutic vulnerabilities in neuroblastoma

Highlight box

Introduction

Methods

Study design

Datasets and clinical endpoint

RNA-seq preprocessing and sample harmonization

Chromatin remodeling gene set

Consensus clustering and subtype definition

Clinical association and survival analyses of subtypes

Differential expression analysis between subtypes

Construction of the prognostic model and derivation of the five-gene RiskScore

RiskScore calculation and risk stratification

Prognostic performance assessment in the development cohort

External validation

Functional interpretation and pathway activity analyses

TF activity inference

In silico drug sensitivity prediction and candidate prioritization

Statistical analysis and software

Results

Chromatin remodeling related transcriptional programs define two robust neuroblastoma subtypes associated with adverse clinical features and survival

A five-gene prognostic signature was derived from subtype-associated transcriptional differences via penalized selection, collinearity control, and multivariable Cox coefficient estimation

The five-gene RiskScore stratifies survival and demonstrates strong time-dependent discrimination in the training cohort

External validation using a fixed training-derived cutoff supports generalizability across independent cohorts

Mechanistic analyses link high RiskScore to proliferative programs, replication stress tolerance, and DNA repair activity

In silico drug sensitivity prediction prioritizes mechanism-aligned candidate compounds associated with RiskScore

Discussion

Conclusions

Acknowledgments

Footnote

References

Article Options

Download Citation

Share