Brain tumor detection based on transcription regulation features identified from public cerebrospinal fluid cell-free DNA sequencing data

Sixian Xia; Wei Dai; Jian Wu

doi:10.21037/tcr-2025-aw-2286

Original Article

Sixian Xia¹#, Wei Dai²#, Jian Wu³

¹First Clinical Medical College, Nanjing Medical University, Nanjing, China; ²School of Animal Science and Food Engineering, Jinling Institute of Technology, Nanjing, China; ³Department of Bioinformatics, Nanjing Medical University, Nanjing, China

Contributions: (I) Conception and design: J Wu; (II) Administrative support: J Wu, W Dai; (III) Provision of study materials or patients: S Xia; (IV) Collection and assembly of data: S Xia; (V) Data analysis and interpretation: S Xia, W Dai; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Jian Wu, PhD. Department of Bioinformatics, Nanjing Medical University, Longmian Avenue 101, Nanjing 211166, China. Email: wujian@njmu.edu.cn.

Background: Characterization of the molecular features of a brain tumor is a critical step for patient treatment. Tissue-based detection methods are limited by the location of brain tumors and high intratumor heterogeneity, which also preclude repeat sampling to monitor tumor progression. Cerebrospinal fluid (CSF)-based noninvasive methods may provide an opportunity to solve these problems, but efficient markers are lacking. This study aims to develop and validate a CSF-based liquid biopsy approach to investigate the molecular characterization and transcriptional regulation features of brain tumors.

Methods: We conducted a genome-wide analysis of CSF cell-free DNA (cfDNA) whole genome bisulfite sequencing (WGBS) data from patients with Sonic hedgehog (SHH) pathway-activated medulloblastoma (MB), sourced from the Gene Expression Omnibus (GEO) database, to identify genome features that differed significantly between patients with MB and those with hydrocephalus (P<0.001).

Results: A total of 397 differential cfDNA genomic loci were identified and verified as SHH-MB specific by assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq). Of these, 114 were located in promoter regions and were related to genes specifically expressed in SHH-MB; combined with the DNA methylation state of these regions, they could be used to distinguish the SHH-MB subtype among 763 MB samples. Twelve of the 283 non-promoter loci were identified as super-enhancers, and binding sites for transcription factors related to brain tumors were also identified in the associated genomic regions. Patients with SHH-MB were then classified using these CSF cfDNA-derived transcription regulation features.

Conclusions: CSF cfDNA from patients with brain tumors was used to determine transcription regulation features, which could reflect the molecular characteristics of brain tumors. Further, these features represent biomarkers with potential to identify patients with tumors. Our study provides a new application for CSF cfDNA and extends its use for investigating tumor-specific gene transcription regulation.

Keywords: Brain tumor; liquid biopsy; cerebrospinal fluid (CSF); cell-free DNA (cfDNA); transcription regulation

Submitted Oct 19, 2025. Accepted for publication Dec 26, 2025. Published online Feb 02, 2026.

Highlight box

Key findings

• The study defined transcriptional regulation features specific to patients with Sonic hedgehog medulloblastoma (SHH-MB) using sequencing data derived from cerebrospinal fluid (CSF) cell-free DNA (cfDNA), demonstrating that CSF cfDNA can reflect the tumor’s transcription regulation characteristics.

What is known and what is new?

• Molecular characterization of brain tumors is critical for treatment. CSF cfDNA can carry genetic and epigenetic information relevant to brain tumor biomarkers, but specific, efficient markers are lacking.

• This study identified SHH-MB-specific transcriptional regulation features using CSF cfDNA, verified them using multi-omics data, and then constructed a subtype classifier. Our study establishes a novel conceptual link: CSF cfDNA is not just a carrier of mutations but also a direct reflection of the tumor’s active transcriptional regulatory features.

What is the implication, and what should change now?

• Our study provides a blueprint for a less invasive, repeatable liquid biopsy for brain tumors using CSF cfDNA. The study extends the use of CSF cfDNA to the investigation of chromatin accessibility features of brain tumors, which opens new avenues for understanding tumor biology directly from biofluids.

• The findings require prospective validation in large, independent, and multi-center cohorts to confirm sensitivity, specificity, and robustness across diverse patient populations. Research should explore if this approach works for other MB subtypes and other brain tumor types.

Introduction

As a noninvasive molecular testing method, plasma cell-free DNA (cfDNA)-based liquid biopsy has developed rapidly and is used for the detection of several solid tumors, including lung (1,2) and liver (3,4) cancers. These applications have primarily focused on identifying mutations, copy number variations (CNVs), or methylation changes in cfDNA; however, use of such plasma-based approaches in patients with brain tumors has been impeded by low concentrations of circulating tumor DNA (ctDNA), due to blockage by the blood-brain barrier (BBB). Cerebrospinal fluid (CSF), which circulates throughout the central nervous system (CNS), may provide more information about intracranial lesions and contain a larger percentage of ctDNA than plasma from patients with brain tumors (5). Therefore, several studies have focused on detecting and predicting brain tumors by analysis of CSF. For example, ctDNA in CSF was used to track glioma tumor evolution, based on the status of mutations, although ctDNA could only be detected in 49.4% of patients (6). Tumor-associated chromosomal CNVs analyzed in CSF cfDNA from patients with medulloblastoma (MB) can also serve as markers of measurable residual disease (MRD), and as a tumor-derived cfDNA-based method; 62% of cytologic-negative CSF samples were found to contain ctDNA, which decreased the sensitivity of detection (7). Use of CSF ctDNA-based liquid biopsy of brain tumors has also been limited by the detection rate, due to low cfDNA concentration or lack of efficient biomarkers. To overcome these shortcomings, development of new biopsy methods and markers is urgently required.

Dysregulation of gene transcription is a hallmark of cancers (8), and chromatin accessibility is an important transcription regulation mechanism that serves key roles during cancer development (9-11). The chromatin state of oncogene regulatory elements can be maintained in a closed or open state, which may lead to transcriptional dysregulation in cancers (12). Several genes related to chromatin remodeling, such as histone H3 and enhancer of zeste homolog 2 (EZH2), harbor mutations or are aberrantly expressed in brain tumors (13,14). The chromatin accessibility profiles of brain tumors are distinct, and even differ among CNS tumor subtypes (15). Additionally, in glioblastoma, a cancer stem cell population with a specific chromatin accessibility profile is associated with patient survival (16). Given these features, changes in chromatin state (open/closed) have considerable potential for application as a detection marker to predict and classify brain cancers. In our previous study, we demonstrated that chromatin accessibility could be investigated in plasma cfDNA, and used to successfully detect esophageal cancers (17).

MB, which is the most common malignant pediatric brain tumor and is associated with high morbidity and mortality rates in the pediatric population (18,19), was selected as an example to investigate the possibility of detecting brain tumors based on transcription regulation features derived from cfDNA. Compared with other pediatric and adult cancers, MB tumors carry few genomic mutations (20), which hinders the application of liquid biopsies for patients with this type of tumor, and new diagnostic markers are needed. In this study, genome characteristics in CSF cfDNA data from patients with MB and those with hydrocephalus, sourced from the Gene Expression Omnibus (GEO) database, were compared with each other and with publicly available MB chromatin features. A total of 397 chromatin regions were identified as exhibiting MB-specific chromatin accessibility status; 114 of these loci were in promoter regions and related to gene expression levels. We also identified 12 super-enhancers within the 283 associated non-promoter regions. MB subtypes could also be classified based on the transcription regulation features determined by CSF cfDNA analysis. Our findings provide new insights into the information available from CSF cfDNA from patients with brain tumors and broaden the application of cfDNA to the analysis of transcription regulation, which can be used to analyze brain tumor molecular characteristics. We present this article in accordance with the STREGA reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2286/rc).

Methods

Data collection

CSF cfDNA sequencing data used in this study were sourced from the GEO database, a public functional genomics data repository maintained by the National Center for Biotechnology Information (NCBI). GEO archives high-throughput gene expression, epigenomic, and other functional genomics datasets submitted by the research community worldwide. The CSF cfDNA data retrieved from GEO comprised samples from five patients with Sonic hedgehog (SHH) pathway-activated MB and four patients with hydrocephalus (accession No. GSE142241). Datasets were selected based on a systematic search of GEO using relevant keywords and predefined inclusion/exclusion criteria. Gene expression data from MB and normal brain tissues [accession No. GSE124814, gene expression array; GSE164677, RNA sequencing (RNA-seq)] were also downloaded from the GEO database. Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) data from MB tumor tissues were obtained from GEO (accession No. GSE240985). DNA methylation data generated using an Illumina Human Methylation 450K Bead Chip were downloaded from GEO (accession No. GSE85212). Detailed information on the data used is provided in Table S1.

GEO is a fully public repository. All data accessed are de-identified and consented for secondary research use under the original study’s ethical approvals. The primary data submitted to GEO were generated by the original submitting investigators, who obtained necessary patient consent and institutional review board (IRB) or ethics committee approval, as required by their institutional and national guidelines. These approvals are documented in the original published papers associated with each dataset. As this study constitutes a secondary analysis of fully de-identified, publicly available data, it is generally classified as non-human subjects research. Use of GEO data adheres to the NCBI Terms and conditions and any specific data use agreements or restrictions stipulated by the original data submitters, as noted on the respective GEO Series pages. No data with explicit prohibitions on secondary analysis were included. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Analysis of cfDNA sequencing data

Raw data were aligned to the hg19 reference genome using LiBis, which is based on BSMAP (v2.90) (21), with default parameters as described in the original cfDNA study (22), with additional steps. Briefly, the read 1 and read 2 FASTQ files of paired-end sequencing samples were mapped to the hg19 reference genome separately, and only reads for which both mates were mappable were kept for subsequent analysis. Duplicate reads were filtered using sambamba (v0.7.1) (23). The hg19 reference genome was split into 1 kb windows using the bedtools makewindows function (24). cfDNA read counts in each window were calculated using bedtools intersect and normalized to reads per million (RPM). Limma was used to identify differential genome regions, with P<0.05 as the cutoff for significantly different regions (25). Genome region annotation was performed using ChIPseeker (v1.30.3) (26).
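The window counting and RPM normalization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study (which relied on bedtools): reads are assumed to be given as (chromosome, 0-based start) pairs, each read is assigned to a single window by its start position (bedtools intersect counts overlaps instead), and the function names `window_counts` and `to_rpm` are hypothetical.

```python
from collections import Counter

WINDOW_SIZE = 1_000  # 1 kb windows, as described in the text


def window_counts(read_starts, window_size=WINDOW_SIZE):
    """Count reads per (chromosome, window index).

    Assumption: each read is assigned to one window by its 0-based start.
    """
    counts = Counter()
    for chrom, start in read_starts:
        counts[(chrom, start // window_size)] += 1
    return counts


def to_rpm(counts, total_reads):
    """Scale raw window counts to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return {window: n * scale for window, n in counts.items()}


# Toy input: four reads on two chromosomes (illustrative values only).
reads = [("chr1", 10), ("chr1", 999), ("chr1", 1000), ("chr2", 2500)]
counts = window_counts(reads)
rpm = to_rpm(counts, total_reads=len(reads))
```

Normalizing by the total read count makes window values comparable across samples sequenced to different depths, which is a prerequisite for the differential analysis.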

Analysis of high-throughput sequencing datasets

For ATAC-seq data, raw reads were mapped to the hg19 genome using Bowtie2 (v2.3.5.1) (27) with default settings. ATAC-seq peaks were called using MACS2 (v2.2.7.1) (28). All analyses of intersections between peaks were performed using bedtools intersect. Transcription factor (TF) binding sites were predicted using FIMO with the TF motif matrices in the HOCOMOCO v11 core collection (29). The Rank Ordering of Super-Enhancers (ROSE) algorithm was used to identify super-enhancers in differential genome regions (30). Differential DNA methylation analysis was performed using the methyratio.py program in BSMAPz. The RNA-seq count matrix was analyzed using limma to identify significantly differentially expressed genes (DEGs) between SHH MB tissues and para-tumor tissues, with P value <0.05 and log₂ fold change >0.5 as cutoffs. For gene expression array analysis, SHH MB and control sample expression data were selected, and DEGs were identified using the limma package with the same thresholds (P value <0.05 and log₂ fold change >0.5) (31). To determine the DNA methylation state of selected genome regions for each sample, Illumina Human Methylation 450K Bead Chip probe sequences were mapped to hg19 using Bowtie2, and bedtools intersect was used to identify overlaps between probes and genome regions. The β values of the probe positions in each region were then summed and divided by the region length, and the resulting scores were used for classification analysis based on a logistic regression model constructed using the glm function in R (version 4.2.0). t-distributed stochastic neighbor embedding (t-SNE) was performed using the Rtsne package in R, and principal components analysis (PCA) was performed using the prcomp function in R.
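The per-region methylation score described above (β values summed and divided by the region length) can be sketched as follows. This is a hedged illustration, not the study's R code: it assumes half-open, BED-style region coordinates and probes given as (chromosome, position, β) tuples, and the function name `region_methylation_score` is hypothetical.

```python
def region_methylation_score(region, probes):
    """Sum the beta values of probes inside a region, divided by region length.

    Assumptions: region is (chrom, start, end) with half-open coordinates;
    probes is an iterable of (chrom, position, beta) tuples.
    """
    chrom, start, end = region
    if end <= start:
        raise ValueError("region end must be greater than start")
    total = sum(
        beta
        for p_chrom, pos, beta in probes
        if p_chrom == chrom and start <= pos < end
    )
    return total / (end - start)


# Toy input: two probes fall inside the region, two fall outside.
region = ("chr1", 1000, 2000)
probes = [
    ("chr1", 1100, 0.8),
    ("chr1", 1900, 0.2),
    ("chr1", 2000, 0.9),  # outside: the end coordinate is exclusive
    ("chr2", 1500, 0.5),  # outside: different chromosome
]
score = region_methylation_score(region, probes)
```

One score per region and sample yields a feature matrix that a logistic regression model can take as input.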

Statistical analysis

Statistical analyses were conducted using R software (version 4.2.0) and its associated packages. For statistical analyses not described in the previous section, Student’s t-test was used. For all analyses, a two-sided P value less than 0.05 was considered statistically significant.

Results

Differences in CSF cfDNA genome distribution between patients with and without MB tumors

We hypothesized that the genome-wide distribution of cfDNA differs between patients with MB and patients with hydrocephalus, because nucleosome positioning may be distinct in the two groups and could play a regulatory role in the expression of disease-specific genes. If correct, this should allow inferences about the transcription regulation of individual genes from the distribution of cfDNA, and the distribution could also be used to classify patients with tumors and to identify specific MB subtypes (Figure 1). To investigate feature distribution throughout the whole genome of CSF cfDNA, read counts in each 1 kb window were calculated for samples from patients with MB and those with hydrocephalus. The distribution of cfDNA exhibited high diversity (Figure 2A). Differential distribution was detected both between MB and non-tumor samples and among different patients with MB, for example in the regions of chromosome X highlighted in Figure 2A. Based on these results, we inferred that patient types could be classified using the distribution of genome features in CSF cfDNA. To further investigate the differences in genome distribution, windows were annotated with genome features and normalized read counts were calculated for each feature. In both the MB and control groups, the majority of reads were located in distal intergenic regions, which may serve as transcription regulation elements in MB (32,33) (Figure 2B). Additionally, read counts from distal intergenic regions differed significantly between the two groups (P<0.001) (Figure 2C). Read counts from promoter and intron regions also differed significantly (P<0.001) (Figure 2C). All of these genome features are closely related to gene expression regulation (34-36); hence, the results indicate that the distribution of genome features in CSF cfDNA may be useful for inferring transcription regulation characteristics.

Genome regions with significantly different cfDNA read counts were further analyzed to identify their features in patients with MB (Figure 2D). Samples from patients with and without MB tumors showed low similarity and could be clearly separated using these significantly different genome regions (Figure 2E), indicating that these genome features can be used as markers to identify patients with brain abnormalities. To investigate whether genome regions with significantly different read counts between the two groups could serve biological functions during tumor progression, we conducted pathway annotation. Several pathways that are closely related to tumors, such as extracellular matrix organization and degradation of the extracellular matrix, were enriched (37,38) (Figure 2F), indicating that these genome regions were associated with genes that have key roles in driving tumorigenesis and tumor development. Based on these findings, we inferred that the genome feature distribution in CSF cfDNA has great potential as a predictor of brain tumors.

Figure 1 The schematic workflow. Briefly, public datasets were collected from the GEO database, and the genome-wide distribution features of cfDNA derived from the CSF of 3 patients with MB and 4 patients with HYD were identified. Genome regions with significantly different cfDNA distribution were verified against open chromatin regions defined by ATAC-seq of MB tumor tissues. In total, 397 verified genome regions were linked with gene transcription regulation and used to classify the MB and HYD samples. The 397 genome regions were further used to predict gene expression in patients with MB and were linked with the DNA methylation states of the different MB subtypes. Based on the differential DNA methylation states of promoters included in these 397 genome regions, patients with a specific subtype (SHH) were identified. All datasets used in this study are also illustrated in the schematic figure. ATAC-seq, assay for transposase-accessible chromatin with high-throughput sequencing; cfDNA, cell free DNA; CSF, cerebrospinal fluid; GEO, gene expression omnibus; HYD, hydrocephalus; MB, medulloblastoma; RNA-seq, RNA sequencing; SHH, Sonic hedgehog pathway-activated MB; TSS, transcription start site; WNT, Wingless pathway-activated MB.

Figure 2 Distribution of CSF cfDNA collected from patients with MB and hydrocephalus. (A) cfDNA genome distributions in seven samples. Chromosome X is zoomed in to illustrate the order of samples. The color bar, from low to high, is shown. (B) Comparison of genome feature regions in cfDNA. The genome features in cfDNA regions were annotated and the percentage of each genome feature was calculated. (C) Differences in genome features between patients with MB and hydrocephalus. RPM-normalized read counts were calculated for each genome feature for the two groups of samples, and statistical analysis was performed. ***, P<0.001. (D) Differential cfDNA reads in 1 kb genome windows throughout the whole genome. Read counts for each 1 kb window were normalized to 1 million reads per sample. Significantly differential genome regions are shown. SigDown: read count lower in patients with MB. NS: no significant difference between patients with MB and those with hydrocephalus. SigUp: read count higher in patients with MB. (E) Correlation between cfDNA samples from patients with MB and those with hydrocephalus. Similarity scores were calculated based on genome regions with differential cfDNA read counts. (F) Pathways enriched for genes in significantly differential genome regions. All original data related to this figure were obtained from the publicly accessible GEO database. MB: medulloblastoma patient sample. NT: non-tumor, hydrocephalus patient sample. cfDNA, cell free DNA; CSF, cerebrospinal fluid; GEO, gene expression omnibus; RPM, reads per million; UTR, untranslated region.

Significant differences in genome regions related to MB gene expression

As described in our previous paper, cfDNA genome feature distribution can reflect chromatin accessibility and gene expression in esophageal cancer (17). To investigate whether CSF cfDNA genome feature distribution in patients with MB reflects the transcription regulation characteristics of brain tumors, we analyzed the relationship between CSF cfDNA and genes specifically expressed in MB. First, the chromatin accessibility of the differential genome regions identified in CSF cfDNA was validated. Genome regions with significantly different cfDNA read counts were compared with open chromatin regions determined by ATAC-seq, which was used as the gold standard for defining open chromatin and transcriptionally active genome elements. A total of 397 genome regions were verified using ATAC-seq (Figure 3A and the Supplementary Material, available at https://cdn.amegroups.cn/static/public/tcr-2025-aw-2286-1.xlsx). The fold changes of genome regions that did and did not overlap ATAC-seq peaks were compared, which indicated that overlapping regions had higher fold changes (Figure 3B). To investigate whether these 397 genome regions could be used as markers to classify tumor and non-tumor samples, we performed PCA based on these overlapping regions. The MB and non-tumor samples could be separated clearly with two principal components (PC1 and PC2) (Figure 3C). This result indicates that these genome regions, or features extracted from them by PCA, could be used as markers to construct a classifier. The genome features of these regions were also analyzed: promoter (28.72%) and distal intergenic (17.38%) regions together accounted for nearly half (46.1%) of all validated regions (Figure 3D).
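The validation step above amounts to keeping the differential 1 kb windows that overlap at least one ATAC-seq peak. A minimal sketch of that interval test is given below; it is not the bedtools-based procedure used in the study. It assumes half-open (chromosome, start, end) intervals and treats any overlap of at least 1 bp as a match, and the function names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def windows_in_open_chromatin(windows, peaks):
    """Keep windows that overlap at least one peak on the same chromosome."""
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end in windows:
        chrom_peaks = peaks_by_chrom.get(chrom, [])
        if any(overlaps(start, end, p_start, p_end) for p_start, p_end in chrom_peaks):
            kept.append((chrom, start, end))
    return kept


# Toy input: three differential windows and two peaks (illustrative only).
windows = [("chr1", 0, 1000), ("chr1", 5000, 6000), ("chr2", 0, 1000)]
peaks = [("chr1", 900, 1200), ("chr2", 1000, 1500)]
validated = windows_in_open_chromatin(windows, peaks)
```

The linear scan over peaks is adequate for a toy example; for genome-scale data, sorted intervals or an interval tree would be the usual choice.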

Figure 3 Genome regions with differential cfDNA read counts associated with gene expression. (A) Validation of cfDNA-defined genome regions using ATAC-seq data. Overlapping genome regions identified by the two methods are shown; 397 regions were detected by both methods. (B) Comparison of fold change between overlapping and non-overlapping genome regions. Overlap: genome regions located in ATAC-seq-defined open chromatin regions. Not-overlap: genome regions not located in ATAC-seq-defined open chromatin regions. (C) Classification of MB and NT samples using the overlapping genome regions, based on the top 2 PCs derived from PCA. MB: medulloblastoma patient sample; NT: non-tumor, hydrocephalus patient sample. (D) Genome features of the 397 genome regions. The percentage of each genome feature was calculated. (E) Sankey diagram showing the fraction of each genome feature among regions related to significantly differentially expressed genes and among all 397 ATAC-seq-validated regions. SDG regions: genome regions related to significantly differential genes. (F) Expression of genes related to the 397 genome regions with significantly differential cfDNA reads. (G) Expression of genes related to the genome regions located in promoter regions. Normal: gene expression values derived from para-tumor tissue RNA-seq. SHH: gene expression values derived from SHH MB tumor tissue RNA-seq. All original data related to this figure were obtained from the publicly accessible GEO database. ATAC-seq, assay for transposase-accessible chromatin with high-throughput sequencing; cfDNA, cell free DNA; GEO, gene expression omnibus; PC, principal component; PCA, principal components analysis; RNA-seq, RNA sequencing; UTR, untranslated region.

The distribution of CSF cfDNA differed among samples and was closely related to gene expression (Figure 2A). The results above also showed that regions with differential cfDNA distribution were related to chromatin accessibility in patients with MB (Figure 3A). Combining these results, we inferred that the differential regions could be used to predict gene expression levels. To validate this assumption, the genome regions were linked with DEGs between MB and para-tumor tissues defined by RNA-seq. In total, 48 of the 262 genes related to these genome regions were identified as DEGs. Comparison of genome features between all regions and DEG-related regions showed that promoter and first intron regions were more closely related to gene expression (Figure 3E). Both of these genome features play important roles in the regulation of gene expression. The expression levels of DEGs related to all genome regions (Figure 3F) and of DEGs related to promoter regions (Figure 3G) differed significantly between MB and non-tumor samples. The same results were obtained using the gene expression array dataset (Figure S1). Based on these results, we conclude that CSF cfDNA distribution is related to chromatin accessibility and transcription regulation. Further, these features may also be related to specific gene expression levels in patients with MB.

Prediction of MB subtype using differential regions in promoter loci

The results above showed that genome regions with differential cfDNA distribution were related to chromatin accessibility and gene expression. Based on these results, we inferred that these regions might also be closely related to DNA methylation state and play roles in gene transcription regulation. As the CSF cfDNA samples analyzed in this study were subjected to whole genome bisulfite sequencing, our data are suitable for investigating the relationship between cfDNA distribution and DNA methylation states. For each sample, there were regions with significantly higher DNA methylation levels than in other samples (Figure 4A). For the 314 genome regions with significantly differential DNA methylation states, the methylation states of the different groups were diverse (Figure 4B), especially in promoter regions (Figure 4C). To further investigate the diversity of DNA methylation state among samples, the similarity of samples within each group was calculated using the methylation state of promoter regions. As shown in Figure 4D,4E, similarity was low among both MB and non-tumor samples. We then investigated the utility of these regions for classifying cancer types. As the promoter-located regions were identified using CSF cfDNA from patients with SHH MB, we hypothesized that they could distinguish SHH MB samples from the three other MB subtypes. Classification of MB subtypes was performed using a dataset containing 763 MB samples (SHH, n=223; wingless pathway-activated, n=70; Group 3, n=144; and Group 4, n=326), generated using an Illumina 450K DNA methylation array. As shown in Figure 4F, samples collected from patients with SHH MB were clearly distinguished from other MB subtypes.

To further investigate the reliability of these genome features in classifying the samples, a machine learning classifier was constructed based on the methylation states of the defined genome features. The SHH MB samples could be classified with high sensitivity and specificity in both the training and testing datasets (Figure S2A,S2B). The model differentiated SHH from other MB subtypes with an area under the curve (AUC) of 1 in the training dataset and 0.998 in the testing dataset (Figure S2C,S2D). Unsupervised hierarchical clustering of these features was also able to distinguish SHH from other types of samples (Figure S2E,S2F). These results demonstrate that differential genome regions in promoters can be used not only to predict gene expression levels but also to identify specific MB subtypes when combined with analysis of DNA methylation states.

Figure 4 Genome regions with differential cfDNA read counts related to DNA methylation states in MB and non-tumor samples. (A) DNA methylation states of MB and non-tumor samples defined by WGBS. The top 1,000 significantly differential regions are shown. (B) Comparison of the DNA methylation states of the 314 differential genome regions. (C) Comparison of the DNA methylation states of promoter regions within the 314 differential regions. (D,E) Correlation between samples in each class (D, MB patient samples; E, non-tumor patient samples). (F) Separation of MB subtypes based on the DNA methylation state of promoter regions in the 314 genome regions. Classification was performed using the DNA methylation states of promoter regions with significantly different cfDNA distribution, and the SHH subtype could be separated clearly. MB: medulloblastoma patient sample. NT: non-tumor, hydrocephalus patient sample. SHH: Sonic hedgehog pathway-activated MB. WNT: Wingless pathway-activated MB. Group 3 & Group 4: Group 3 and Group 4 subtypes of MB. All original data related to this figure were obtained from the publicly accessible GEO database. cfDNA, cell free DNA; GEO, gene expression omnibus; t-SNE, t-distributed stochastic neighbor embedding; WGBS, whole genome bisulfite sequencing.

Transcription regulation features identified in CSF cfDNA

In addition to the 114 regions identified around the transcription start site (TSS), 283 regions distant from the TSS, which might play roles in long-range regulation of gene transcription, were detected. To investigate the transcription regulation function of these regions, we analyzed the TF binding sites within them. High densities of TF binding sites were detected in these non-promoter regions (Figure 5A). The top 50 TFs by number of binding sites are listed in Figure 5B. Interestingly, the TF with the highest number of binding sites was SP2 (Figure 5B), which has been reported to play roles in cancer metabolism (39). MYC-associated zinc finger protein (MAZ), a reported MYC binding partner that acts through physical interaction with super-enhancers in brain tumors, had the second highest number of binding sites (40) (Figure 5B). Other TFs, such as vascular endothelial zinc finger 1 (VEZF1), which had a large number of binding sites (Figure 5B), have also been found to bind super-enhancer sites in neuroblastoma cell lines (41). We also investigated the correlation between the number of TFs and the number of TF binding sites, although no significant relationship was found (Figure 5C). Given the high density of TF binding sites in these genome regions, we inferred that the distribution features of CSF cfDNA may be useful for investigating tumor-related super-enhancers. Using the ROSE algorithm, 12 regions were defined as super-enhancers (Figure 5D). To further investigate the relationship between cfDNA distribution, TF numbers, and chromatin accessibility, the genome features of a chromatin region on chromosome 21, which had the highest super-enhancer signal value, were displayed in the University of California Santa Cruz (UCSC) genome browser (Figure 5E). The region showed a highly open chromatin state and a high density of TFs, consistent with the features of a super-enhancer (Figure 5E).

Together, these results led us to conclude that the distribution features of CSF cfDNA can provide clues for investigating the role of super-enhancers in MB tumors and have the potential to be used to analyze transcription regulation mechanisms in tumors.

Figure 5 Relationship between cfDNA distribution features and transcription regulation. (A) Number of TFs and TF binding sites in non-promoter regions with significantly different cfDNA read counts. (B) Top 50 TFs ordered by number of binding sites in the 283 non-promoter genome regions. The total number of binding sites of each TF is also shown. (C) The correlation between the number of TF binding sites and the number of TFs in each genome region. (D) Genome regions defined as super-enhancers. Genome regions with the highest signals calculated using ROSE software are shown. The cutoff value used to define super-enhancers is also presented. (E) UCSC genome browser view showing the H3K27Ac state and JASPAR database transcription factor binding sites in the top-ranked super-enhancer (chr21:44936001–44937000) defined by ROSE. The defined super-enhancer region is highlighted in the genome browser. All original data related to this figure were obtained from the publicly accessible GEO database. BS, binding site; cfDNA, cell free DNA; GEO, gene expression omnibus; TF, transcription factor; UCSC, University of California Santa Cruz.

Classification of patients with MB based on cfDNA distribution features

Disordered transcription regulation is an important feature of cancer (42), and the transcription regulation features of tumors exhibit considerable diversity (15). Using cfDNA distribution data, we predicted gene expression levels and super-enhancer sites, indicating that the transcription regulation features of MB are reflected in CSF cfDNA characteristics. Based on these results, we conclude that cfDNA distribution features could be used to classify patients with cancer (Figure 3C). To investigate whether the classification could be performed at lower sequencing depth, we randomly subsampled the cfDNA sequencing datasets to 10⁶ reads. At this depth, 19 regions were not covered (Figure 6A). We compared the fold change between covered and non-covered genome regions; although the fold change of non-covered regions was slightly higher, the difference was not significant (Figure 6B). For the 378 covered genome regions, we conducted PCA to assess how well these CSF cfDNA distribution features could classify samples. Patients with MB and those with hydrocephalus could be separated clearly using two PCs, and the hydrocephalus samples in particular clustered together (Figure 6C). This result indicates that the differential chromatin regions identified in cfDNA could be used to identify patients with MB. In general, the sequencing depth used in a prediction procedure is closely associated with the reliability of the method and the sensitivity of the markers. Our results showed that the features defined in this study have great potential as markers for constructing a classifier even at 10⁶ reads (Figure 6C), at which the mean genome coverage was about 0.07× (Table S2). The result also demonstrated the high sensitivity of using genome regions as markers to predict MB sample types.
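The subsampling check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: reads are reduced to hypothetical (chromosome, start) tuples, regions are 1-based inclusive intervals, and the read length (150 bp) and genome size (3.1 × 10⁹ bp) passed to `mean_coverage` are placeholder values, so the printed depth is not expected to reproduce the 0.07× reported in Table S2.

```python
import random


def subsample_reads(read_starts, n_reads, seed=0):
    """Randomly draw n_reads reads without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(read_starts, n_reads)


def covered_regions(read_starts, regions):
    """Return the regions (chrom, start, end) containing at least one read start."""
    covered = set()
    for chrom, pos in read_starts:
        for region in regions:
            r_chrom, r_start, r_end = region
            if chrom == r_chrom and r_start <= pos <= r_end:
                covered.add(region)
    return covered


def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


# Hypothetical toy data: 1-based inclusive regions and read start positions.
regions = [("chr1", 1, 1000), ("chr1", 5001, 6000), ("chr2", 1, 1000)]
reads = [("chr1", 10), ("chr1", 500), ("chr1", 5500), ("chr3", 42)]
sub = subsample_reads(reads, 2, seed=1)
hit = covered_regions(sub, regions)
missed = [r for r in regions if r not in hit]
print(len(hit), "covered;", len(missed), "not covered")
# Assumed read length and genome size; illustrative only.
print(round(mean_coverage(10**6, 150, 3.1e9), 3))
```

The nested loop is adequate for a few hundred marker regions; for genome-wide interval sets an interval index or a tool such as BEDTools would be the more usual choice.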

Figure 6 Classification of patients with MB and hydrocephalus using cfDNA-defined biomarkers. (A) The number of genome regions covered and not covered by the 10⁶-read datasets. (B) Comparison of fold change between covered and non-covered genome regions in the subsampled datasets. The subsampled datasets contained 10⁶ reads. (C) PCA of samples from tumor and non-tumor patients based on regions with differential cfDNA distribution, using 10⁶ reads. Covered: the genome regions covered by the subsampled 10⁶-read datasets. Non-covered: the genome regions not covered by the subsampled 10⁶-read datasets. MB: medulloblastoma patient sample. NT: non-tumor, hydrocephalus patient sample. All original data related to this figure were obtained from the publicly accessible GEO database. cfDNA, cell free DNA; GEO, gene expression omnibus; PC, principal component; PCA, principal component analysis.

We also analyzed changes in distribution features during therapy in one patient with MB. Although this patient could be separated from other patients with MB using cfDNA collected both during and after treatment, the cfDNA distribution features clearly differed between the time points (Figure S3A). To further demonstrate the difference in cfDNA distribution between the on-treatment and off-treatment samples from this patient, we compared the two samples directly. As shown in Figure S3B, they had distinct distribution features, which suggests potential for distinguishing the two treatment conditions. This diversity in distribution features was also reflected in the similarities among samples (Figure S3C). Further, using features extracted by PCA from the cfDNA distribution features, this patient with MB could be distinguished from patients with hydrocephalus (Figure S3D). Based on these results, we hypothesize that the transcription regulation features identified in cfDNA could serve as markers to trace the state of patients with MB throughout the treatment procedure.
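The PCA step used to separate samples can be sketched in a few lines. The example below is an illustration only, assuming a samples-by-regions matrix of normalized counts; it extracts the first two principal components by power iteration with deflation rather than calling a statistics package, and the toy values and sample grouping are hypothetical, not data from the study.

```python
import math


def center(matrix):
    """Subtract the per-feature (column) mean from every sample (row)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def top_component(cov, iters=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    p = len(cov)
    vec = [float(i + 1) for i in range(p)]          # fixed, non-random start
    start_norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / start_norm for x in vec]
    for _ in range(iters):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:                                # no variance left
            break
        vec = [x / norm for x in nxt]
    eigval = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                 for i in range(p))
    return vec, eigval


def pca_scores(matrix, n_components=2):
    """Project samples onto the first n_components principal components."""
    x = center(matrix)
    n, p = len(x), len(x[0])
    cov = [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
            for j in range(p)] for i in range(p)]
    components = []
    for _ in range(n_components):
        vec, eigval = top_component(cov)
        components.append(vec)
        # Deflation: remove the found component before extracting the next one.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return [[sum(r * c for r, c in zip(row, comp)) for comp in components]
            for row in x]


# Hypothetical normalized counts: rows are samples, columns are marker regions.
mb = [[9.0, 8.0, 1.0], [10.0, 9.0, 2.0], [8.0, 9.0, 1.5]]
nt = [[2.0, 1.0, 7.0], [1.5, 2.0, 8.0], [2.0, 1.5, 7.5]]
scores = pca_scores(mb + nt)
for label, (pc1, pc2) in zip(["MB"] * 3 + ["NT"] * 3, scores):
    print(label, round(pc1, 2), round(pc2, 2))
```

With real data the first two score columns would be plotted against each other, as in Figure 6C and Figure S3D; the sign of each component is arbitrary, so only the relative grouping of samples is meaningful.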

Discussion

Molecular characterization is a key step in cancer treatment, particularly for patients with brain tumors, and generally requires tumor specimens. For brain tumors such as MB and glioma, specimens may not be obtainable because of tumor location. Further, CNS tumors exhibit high intratumor heterogeneity (43); hence, sampled tumor tissues may not represent the full molecular characteristics of a tumor (44). Moreover, collecting multiple tissue samples during therapy to monitor tumor development and therapeutic effects is rarely possible for patients with brain tumors. Hence, new ways to determine the molecular features of brain tumors are urgently needed. In our study, the transcription regulation features of patients with MB were investigated using CSF cfDNA data sourced from the GEO database, and these features could also be used to identify specific MB subtypes based on tumor molecular characteristics.

Collection of CSF for cytology analysis is a routine method for disease staging and assessment, based on the presence of circulating brain tumor cells (45). In MB, which was selected as a model, the majority of patients have hydrocephalus, and CSF drainage is required before surgery (46,47). These procedures provide a potential source of CSF for cfDNA extraction and analysis of the resulting tumor-related information, without additional sampling steps. Because CSF is a liquid that directly interacts with brain and brain tumor tissues, CSF cfDNA has been analyzed in several studies (48,49). These studies focused primarily on tumor-specific mutations (48), CNVs (7), and DNA methylation changes (22), each of which has disadvantages as a detection target. For mutations and CNVs, only low amounts of tumor-derived cfDNA may be present in CSF, while methylation analysis requires tedious steps that can lead to loss of cfDNA. In our study, we analyzed the relationship between MB characteristics and CSF cfDNA distribution features using 1 kb windows across the whole genome, with the aim of identifying genome-wide markers for patient diagnosis and prognosis prediction. We reasoned that, during degradation of cfDNA by nucleases, features derived from a wider genome region (a 1 kb window) are much less likely to be lost than single-base-pair features such as mutations. That is, even if some of the cfDNA located in a 1 kb genome region is degraded, the remaining cfDNA fragments can still contribute to identification. With these genome-wide features, detection sensitivity can be improved compared with mutation-based detection methods, which focus on single-base-pair analysis. Classification methods based on DNA methylation, in turn, require bisulfite conversion, which reduces the amount of cfDNA (50,51). The recovery ratio of cfDNA after bisulfite conversion varies from 22% to 66%, depending on the commercial kit used in this step (52). By contrast, the data used in our analyses were generated by shallow whole genome sequencing of cfDNA and did not require prior treatment of cfDNA samples, greatly reducing cfDNA loss. For these reasons, in our previous studies, which used Single strand Adaptor Library Preparation (SALP) sequencing of cfDNA to investigate the chromatin accessibility features of esophageal cancer, only 200 µL was used to extract input cfDNA for sequencing library construction (17,53).
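The 1 kb window counting described in this paragraph can be sketched as below. This is a minimal illustration, assuming reads are already aligned and reduced to hypothetical (chromosome, 0-based start) tuples. The counts-per-million scaling and pseudocount fold change are common simplifications chosen for the example; they are not the differential test used in the study.

```python
import math
from collections import Counter

WINDOW = 1000  # window size in bp, matching the 1 kb windows described above


def window_counts(read_starts, window=WINDOW):
    """Count reads per (chrom, window_index); positions are 0-based starts."""
    counts = Counter()
    for chrom, pos in read_starts:
        counts[(chrom, pos // window)] += 1
    return counts


def counts_per_million(counts):
    """Scale raw window counts by library size so samples are comparable."""
    total = sum(counts.values())
    return {w: c * 1e6 / total for w, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, window_key, pseudo=1.0):
    """log2 ratio of normalized counts in one window, with a pseudocount."""
    a = cpm_a.get(window_key, 0.0) + pseudo
    b = cpm_b.get(window_key, 0.0) + pseudo
    return math.log2(a / b)


# Hypothetical aligned read starts for one tumor and one control sample.
tumor = [("chr1", 100), ("chr1", 950), ("chr1", 1200), ("chr2", 30)]
control = [("chr1", 100), ("chr1", 2500), ("chr2", 30), ("chr2", 70)]

tumor_counts = window_counts(tumor)
control_counts = window_counts(control)
lfc = log2_fold_change(counts_per_million(tumor_counts),
                       counts_per_million(control_counts), ("chr1", 0))
print(dict(tumor_counts))
print(round(lfc, 3))
```

Because each feature aggregates every read in a 1 kb window, losing a few fragments to degradation lowers the count slightly rather than removing the feature, which is the robustness argument made in the paragraph above.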

Like cfDNA in plasma, CSF cfDNA is generated by dying and apoptotic cells (54); hence, the distribution of cfDNA is closely related to chromosome structure, such as nucleosome positioning. cfDNA wrapped around nucleosomes may be protected from nuclease degradation and be available for high-throughput sequencing, whereas cfDNA not protected by nucleosomes may be degraded when released into the circulation, precluding sequencing. Because of these mechanisms, the distributions of cfDNA in nucleosome-protected and unprotected genome regions differ (Figure 3D). Such genome regions, for example TSSs, are closely related to gene expression levels under different conditions. More importantly, chromosome structure is closely related to chromatin accessibility; therefore, the chromatin opening state can be inferred from the genomic distribution of cfDNA. The relationship between cfDNA and nucleosome position in pediatric cancer has also been studied by other groups, such as Peneder and colleagues (55). Unlike our study, Peneder et al. focused on Ewing sarcoma, which can be investigated using cfDNA from blood (Table S3). In contrast, because of the blood-brain barrier, little brain tumor-related cfDNA can be detected in blood; for this reason, CSF cfDNA was used as the detection target in our study. In addition, in the study by Peneder et al., cfDNA fragments were split into short (100–150 bp) and long (151–200 bp) categories based on fragment length (Table S3), and the ratio between short and long fragments in genome regions was used to identify cancer samples. In our method, the counts of cfDNA fragments in much smaller genome regions (1 kb regions) were calculated and compared between tumor and non-tumor patients (Table S3), which could provide higher resolution of nucleosome position. Our method might also detect the position and release states of nucleosomes directly from the distribution of cfDNA reads.
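The two feature types contrasted above can be put side by side in a short sketch. The length bins (100–150 bp and 151–200 bp) are taken from the text; everything else, including the function names and the fragment lengths, is hypothetical and only illustrates the difference between a short-to-long ratio and a plain fragment count for one region.

```python
def short_long_ratio(fragment_lengths, short=(100, 150), long=(151, 200)):
    """Ratio of short to long fragments; None if no long fragments are present."""
    n_short = sum(short[0] <= f <= short[1] for f in fragment_lengths)
    n_long = sum(long[0] <= f <= long[1] for f in fragment_lengths)
    if n_long == 0:
        return None
    return n_short / n_long


def fragment_count(fragment_lengths):
    """Count-based feature: number of fragments in the region, any length."""
    return len(fragment_lengths)


# Hypothetical fragment lengths (bp) observed in one genome region.
lengths = [120, 140, 150, 151, 167, 180, 210, 90]
ratio = short_long_ratio(lengths)
count = fragment_count(lengths)
print(ratio, count)
```

The ratio needs enough fragments in both bins to be stable, which favors larger regions, whereas a count can be taken on a 1 kb window directly; this is one way to read the resolution argument in the paragraph above.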

Under the assumption that cfDNA positioning closely reflects nucleosome occupancy, CSF cfDNA was used to identify elements involved in transcription regulation in patients with MB. In our proof-of-concept study, although the sample number was limited, the 397 chromatin regions identified were verified by ATAC-seq as active transcription regulation regions. Promoter regions could be used to predict gene expression, while non-promoter regions were associated with super-enhancers, demonstrating the transcription regulation features of MB. Additionally, differences in transcription regulation between patients are a decisive factor in identifying MB subtypes, because they affect the expression levels of specific genes. Therefore, cfDNA distribution features were used to identify subtypes of patients with MB, and we could accurately identify patients with MB using data at different sequencing depths (Figure 3C, Figure 6C). During the identification of super-enhancers using genome regions with differential cfDNA distribution, we found that some regions with large numbers of TF binding sites were not defined as super-enhancers (Figure 5A). We infer that TF types and the relative importance of each TF in transcription regulation should be considered to strengthen the relationship between TF binding site-rich genome regions and super-enhancers. In conclusion, our method could not only identify a specific MB subtype, such as SHH MB, using low read count data, but also has great potential to reflect the transcription regulation characteristics of pediatric MB tumors.

Based on the conclusions of our study, the potential markers should be validated using a larger number of MB CSF samples. With sequencing data from a larger patient set, a machine learning-based classifier could be trained using our defined CSF cfDNA markers, which might classify patient types with high sensitivity and accuracy. With our cfDNA markers, CSF cfDNA derived from patients with CNS disease could be sequenced and the read count of each feature calculated. Using these read counts as input to the classifier, the clinical classification of patients could be determined. Furthermore, during construction of the classifier, the CSF cfDNA features might be further selected. If an appropriate number of features were retained for the classifier, the cfDNA signal could be detected using a low-throughput method such as polymerase chain reaction (PCR). This means that, based on our CSF cfDNA markers, classification of MB patients could be done in a much more efficient way, which could reduce the cost for patients and save time in clinical examination. More importantly, the samples used for testing are collected in a minimally invasive manner, which favors high patient compliance. We also evaluated the subtype classification ability of CSF cfDNA combined with the DNA methylation states of our markers by constructing a machine learning classifier (Figure S2). The high sensitivity and specificity of this classifier also provide a way to identify the specific subtype of MB patients. In addition, the difference in cfDNA distribution during the treatment of one MB patient was illustrated (Figure S3C), which means that our method might have great potential for monitoring the progress of MB. Furthermore, enhancers could be used to dissect the molecular differences between histologically similar tumor entities, which is an urgent issue for pediatric CNS tumor patients (56,57). In our study, super-enhancers could be identified through the distribution features of cfDNA, offering unique information to inform precision therapies for pediatric patients. Moreover, the transcription regulation clues obtained from cfDNA could also provide a chance to illustrate the mechanisms of CNS tumor progression continuously over time. In addition, CSF-derived cfDNA detection, as a minimally invasive molecular diagnostic approach, could be performed throughout the whole treatment process for a patient. Based on serial CSF cfDNA samples, the effect of a therapy strategy could be monitored in a timely manner, which is a critical step in CNS tumor therapy.
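The proposed workflow, in which per-marker read counts are fed to a classifier, can be sketched with a deliberately simple model. The text does not specify a classifier for cfDNA read counts, so the nearest-centroid rule below is a swapped-in technique chosen only because it fits in a few lines; the marker counts, library sizes, and labels are hypothetical, and a real classifier would need the larger cohort called for above.

```python
import math


def log_cpm(counts, library_size, pseudo=1.0):
    """Convert raw marker read counts to log2 counts per million."""
    return [math.log2(c * 1e6 / library_size + pseudo) for c in counts]


def fit_centroids(samples, labels):
    """Average the feature vectors of each class."""
    sums, sizes = {}, {}
    for vec, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        sizes[lab] = sizes.get(lab, 0) + 1
    return {lab: [v / sizes[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))


# Hypothetical training data: read counts in four marker regions per sample,
# each from a library of 1000 reads (toy scale, for illustration only).
LIB = 1000
train_counts = [[90, 80, 10, 20], [85, 95, 15, 5],
                [10, 20, 90, 80], [15, 5, 85, 95]]
train_labels = ["MB", "MB", "NT", "NT"]

centroids = fit_centroids([log_cpm(c, LIB) for c in train_counts], train_labels)
pred_a = predict(centroids, log_cpm([70, 100, 20, 10], LIB))
pred_b = predict(centroids, log_cpm([5, 15, 100, 80], LIB))
print(pred_a, pred_b)
```

If feature selection reduced the marker set to a handful of regions, the same input vector could in principle come from a low-throughput assay such as PCR instead of sequencing, which is the efficiency argument made in the paragraph above.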

Based on the ability to investigate chromatin accessibility and super-enhancer features using CSF cfDNA (58), our results provide a novel way to track the transcription regulation features of brain tumors in real time, which is extremely challenging with traditional tissue biopsy. By capturing dynamic changes in the chromatin open state, our method can reveal tumor progression at the epigenetic level, providing new insight into brain tumor mechanisms. This might reveal further characteristics of brain tumors such as MB and support the development of new therapeutic strategies based on these discoveries.

In our proof-of-concept study, MB was selected as a model for the analysis because it carries few mutations compared with adult tumors. Our classification method is mainly based on the genomic distribution of cfDNA, which represents differences in chromatin accessibility between samples. Because chromatin accessibility heterogeneity among different cancer types and subtypes has been demonstrated (15), this method could also be applied to the many tumors that exhibit specific chromatin accessibility features. Further studies should be performed to predict and diagnose other types of cancer.

This study has some limitations. The control CSF samples were collected from patients with hydrocephalus; although no tumor symptoms were detected in these patients, these non-tumor controls might introduce some bias. For example, some inflammation-related signatures in MB patients might be missed because of the symptoms of the hydrocephalus patients. To overcome this limitation, CSF from diverse non-MB patients, such as patients with other tumor types without brain metastasis, should be collected and included in further studies. In addition, because of the limited sample number, no classifier was constructed to classify MB samples, and the prediction sensitivity and specificity of our method were not calculated, although we did construct a prediction model with high accuracy using the DNA methylation state of the defined genome regions. A larger number of MB CSF samples should be collected to further validate the reliability of our method and markers.

Conclusions

In this study, we analyzed CSF cfDNA data from the GEO database for patients with MB and hydrocephalus, and identified MB-related cfDNA distribution features. Based on these genome features, gene expression levels and super-enhancers were analyzed in patients with MB. Further, important TF binding states were determined by analysis of the identified transcription regulation elements. More importantly, the identified genomic distribution features could serve as biomarkers to identify patients with a specific MB subtype using low-depth sequencing data, which overcomes the shortcomings of CSF cfDNA-based detection methods. Finally, this study provides a new application of CSF cfDNA, which connects CSF cfDNA with the chromatin accessibility diversity of brain tumor patients, and also provides a novel way to detect brain tumors based on epigenetic features.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the STREGA reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2286/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2286/prf

Funding: This work was supported by the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (No. 21KJB310015), Natural Science Foundation of Jiangsu Province (No. BK20210005), and the Research Foundation for Advanced Talents of Jinling Institute of Technology (No. jit-b-202105).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2286/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Chabon JJ, Hamilton EG, Kurtz DM, et al. Integrating genomic features for non-invasive early lung cancer detection. Nature 2020;580:245-51.
2. Moding EJ, Liu Y, Nabet BY, et al. Circulating Tumor DNA Dynamics Predict Benefit from Consolidation Immunotherapy in Locally Advanced Non-Small Cell Lung Cancer. Nat Cancer 2020;1:176-83.
3. Chen L, Abou-Alfa GK, Zheng B, et al. Genome-scale profiling of circulating cell-free DNA signatures for early detection of hepatocellular carcinoma in cirrhotic patients. Cell Res 2021;31:589-92.
4. Zhang X, Wang Z, Tang W, et al. Ultrasensitive and affordable assay for early detection of primary liver cancer using plasma cell-free DNA fragmentomics. Hepatology 2022;76:317-29.
5. Gao X, Tao Y, Lamas V, et al. Treatment of autosomal dominant hearing loss by in vivo delivery of genome editing agents. Nature 2018;553:217-21.
6. Miller AM, Shah RH, Pentsova EI, et al. Tracking tumour evolution in glioma through liquid biopsies of cerebrospinal fluid. Nature 2019;565:654-8.
7. Liu APY, Smith KS, Kumar R, et al. Serial assessment of measurable residual disease in medulloblastoma liquid biopsies. Cancer Cell 2021;39:1519-30.e4.
8. Botten GA, Xu J. Genetic and epigenetic dysregulation of transcriptional enhancers in cancer. Annual Review of Cancer Biology 2025;9:79-97.
9. Hooda J, Novak M, Salomon MP, et al. Early Loss of Histone H2B Monoubiquitylation Alters Chromatin Accessibility and Activates Key Immune Pathways That Facilitate Progression of Ovarian Cancer. Cancer Res 2019;79:760-72.
10. Pierce SE, Granja JM, Corces MR, et al. LKB1 inactivation modulates chromatin accessibility to drive metastatic progression. Nat Cell Biol 2021;23:915-24.
11. Avgustinova A, Symeonidi A, Castellanos A, et al. Loss of G9a preserves mutation patterns but increases chromatin accessibility, genomic instability and aggressiveness in skin tumours. Nat Cell Biol 2018;20:1400-9.
12. Tome-Garcia J, Erfani P, Nudelman G, et al. Analysis of chromatin accessibility uncovers TEAD1 as a regulator of migration in human glioblastoma. Nat Commun 2018;9:4020.
13. Moraitis S, Piperi C. Multi-Faceted Role of Histone Methyltransferase Enhancer of Zeste 2 (EZH2) in Neuroinflammation and Emerging Targeting Options. Biology (Basel) 2025;14:749.
14. Ludzia P, Ishii M, Deák G, et al. The kinetoplastid kinetochore protein KKT23 acetyltransferase is a structural homolog of GCN5 that acetylates the histone H2A C-terminal tail. Structure 2025;33:123-35.e10.
15. Corces MR, Granja JM, Shams S, et al. The chromatin accessibility landscape of primary human cancers. Science 2018;362:eaav1898.
16. Guilhamon P, Chesnelong C, Kushida MM, et al. Single-cell chromatin accessibility profiling of glioblastoma identifies an invasive cancer stem cell population associated with lower survival. Elife 2021;10:e64090.
17. Wu J, Dai W, Wu L, et al. Decoding genetic and epigenetic information embedded in cell free DNA with adapted SALP-seq. Int J Cancer 2019;145:2395-406.
18. Skowron P, Farooq H, Cavalli FMG, et al. The transcriptional landscape of Shh medulloblastoma. Nat Commun 2021;12:1749.
19. Guerreiro Stucklin AS, Ramaswamy V, Daniels C, et al. Review of molecular classification and treatment implications of pediatric brain tumors. Curr Opin Pediatr 2018;30:3-9.
20. Thatikonda V, Islam SMA, Autry RJ, et al. Comprehensive analysis of mutational signatures reveals distinct patterns and molecular processes across 27 pediatric cancers. Nat Cancer 2023;4:276-89.
21. Xi Y, Li W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 2009;10:232.
22. Li J, Zhao S, Lee M, et al. Reliable tumor detection by whole-genome methylation sequencing of cell-free DNA in cerebrospinal fluid of pediatric medulloblastoma. Sci Adv 2020;6:eabb5427.
23. Tarasov A, Vilella AJ, Cuppen E, et al. Sambamba: fast processing of NGS alignment formats. Bioinformatics 2015;31:2032-4.
24. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010;26:841-2.
25. Chen Y, Chen L, Lun ATL, et al. edgeR v4: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets. Nucleic Acids Res 2025;53:gkaf018.
26. Yu G, Wang LG, He QY. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 2015;31:2382-3.
27. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357-9.
28. Cipriano A, Colantoni A, Calicchio A, et al. Transcriptional and epigenetic characterization of a new in vitro platform to model the formation of human pharyngeal endoderm. Genome Biol 2024;25:211.
29. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics 2011;27:1017-8.
30. Whyte WA, Orlando DA, Hnisz D, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 2013;153:307-19.
31. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47.
32. Ghisletti S, Barozzi I, Mietton F, et al. Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages. Immunity 2010;32:317-28.
33. Hah N, Danko CG, Core L, et al. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell 2011;145:622-34.
34. Andersson R, Sandelin A. Determinants of enhancer and promoter activities of regulatory elements. Nat Rev Genet 2020;21:71-87.
35. Haberle V, Stark A. Eukaryotic core promoters and the functional basis of transcription initiation. Nat Rev Mol Cell Biol 2018;19:621-37.
36. Rose AB. Introns as Gene Regulators: A Brick on the Accelerator. Front Genet 2018;9:672.
37. Jackson HK, Mitoko C, Linke F, et al. Extracellular Vesicles Potentiate Medulloblastoma Metastasis in an EMMPRIN and MMP-2 Dependent Manner. Cancers (Basel) 2023;15:2601.
38. Li H, Liu Y, Liu Y, et al. Tumor-associated astrocytes promote tumor progression of Sonic Hedgehog medulloblastoma by secreting lipocalin-2. Brain Pathol 2024;34:e13212.
39. Orzechowska-Licari EJ, LaComb JF, Mojumdar A, et al. SP and KLF Transcription Factors in Cancer Metabolism. Int J Mol Sci 2022;23:9956.
40. Sin-Chan P, Mumal I, Suwal T, et al. A C19MC-LIN28A-MYCN Oncogenic Circuit Driven by Hijacked Super-enhancers Is a Distinct Therapeutic Vulnerability in ETMRs: A Lethal Brain Tumor. Cancer Cell 2019;36:51-67.e7.
41. Decaesteker B, Denecker G, Van Neste C, et al. TBX2 is a neuroblastoma core regulatory circuitry component enhancing MYCN/FOXM1 reactivation of DREAM targets. Nat Commun 2018;9:4866.
42. Bushweller JH. Targeting transcription factors in cancer - from undruggable to reality. Nat Rev Cancer 2019;19:611-24.
43. Schaettler MO, Richters MM, Wang AZ, et al. Characterization of the Genomic and Immunologic Diversity of Malignant Brain Tumors through Multisector Analysis. Cancer Discov 2022;12:154-71.
44. Na MK, Oh Y, Lee D, et al. Comparison of the biological characteristics of glioblastoma tumorspheres obtained from fresh and cryopreserved glioblastoma tissues. J Neurooncol 2025;174:191-206.
45. Garcés JJ, Cedena MT, Puig N, et al. Circulating Tumor Cells for the Staging of Patients With Newly Diagnosed Transplant-Eligible Multiple Myeloma. J Clin Oncol 2022;40:3151-61.
46. Sainte-Rose C, Cinalli G, Roux FE, et al. Management of hydrocephalus in pediatric patients with posterior fossa tumors: the role of endoscopic third ventriculostomy. J Neurosurg 2001;95:791-7.
47. Escudero L, Llort A, Arias A, et al. Circulating tumour DNA from the cerebrospinal fluid allows the characterisation and monitoring of medulloblastoma. Nat Commun 2020;11:5376.
48. Kim S, Baldassari S, Sim NS, et al. Detection of Brain Somatic Mutations in Cerebrospinal Fluid from Refractory Epilepsy Patients. Ann Neurol 2021;89:1248-52.
49. Pagès M, Rotem D, Gydush G, et al. Liquid biopsy detection of genomic alterations in pediatric brain tumors from cell-free DNA in peripheral blood, CSF, and urine. Neuro Oncol 2022;24:1352-63.
50. Munson K, Clark J, Lamparska-Kupsik K, et al. Recovery of bisulfite-converted genomic sequences in the methylation-sensitive QPCR. Nucleic Acids Res 2007;35:2893-903.
51. Grunau C, Clark SJ, Rosenthal A. Bisulfite genomic sequencing: systematic investigation of critical experimental parameters. Nucleic Acids Res 2001;29:e65.
52. Worm Ørntoft MB, Jensen SØ, Hansen TB, et al. Comparative analysis of 12 different kits for bisulfite conversion of circulating cell-free DNA. Epigenetics 2017;12:626-36.
53. Liu S, Wu J, Xia Q, et al. Finding new cancer epigenetic and genetic biomarkers from cell-free DNA by combining SALP-seq and machine learning. Comput Struct Biotechnol J 2020;18:1891-903.
54. Pan W, Gu W, Nagpal S, et al. Brain tumor mutations detected in cerebral spinal fluid. Clin Chem 2015;61:514-22.
55. Peneder P, Stütz AM, Surdez D, et al. Multimodal analysis of cell-free DNA whole-genome sequencing for pediatric cancers with low mutational burden. Nat Commun 2021;12:3230.
56. Mack SC, Pajtler KW, Chavez L, et al. Therapeutic targeting of ependymoma as informed by oncogenic enhancer profiling. Nature 2018;553:101-5.
57. Miller AM, Karajannis MA. Current Role and Future Potential of CSF ctDNA for the Diagnosis and Clinical Management of Pediatric Central Nervous System Tumors. J Natl Compr Canc Netw 2022;20:1363-9.
58. George SL, Lynn C, Stankunaite R, et al. Stratified Medicine Pediatrics: Cell-Free DNA and Serial Tumor Sequencing Identifies Subtype-Specific Cancer Evolution and Epigenetic States. Cancer Discov 2025;15:717-32.

Cite this article as: Xia S, Dai W, Wu J. Brain tumor detection based on transcription regulation features identified from public cerebrospinal fluid cell-free DNA sequencing data. Transl Cancer Res 2026;15(2):103. doi: 10.21037/tcr-2025-aw-2286
