Identification and validation of anoikis-related differentially expressed genes in nasopharyngeal carcinoma

Chaobin Huang; Ying Peng; Xueping Zheng; Siqian Cai; Zhongmei Lin; Yahan Zheng; Wei Zheng; Fengying Peng; Yuanji Xu

doi:10.21037/tcr-2025-1263

Original Article

Chaobin Huang¹#, Ying Peng²#, Xueping Zheng³#, Siqian Cai¹, Zhongmei Lin¹, Yahan Zheng¹, Wei Zheng¹, Fengying Peng⁴, Yuanji Xu¹,⁵

¹Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China; ²Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China; ³Emergency Department, Women and Children’s Hospital, School of Medicine, Xiamen University, Xiamen, China; ⁴Department of Pathology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China; ⁵Fujian Key Laboratory of Advanced Technology for Cancer Screening and Early Diagnosis, Fujian Cancer Hospital, Fuzhou, China

Contributions: (I) Conception and design: F Peng, Y Xu; (II) Administrative support: C Huang, F Peng, Y Xu; (III) Provision of study materials or patients: Y Xu; (IV) Collection and assembly of data: C Huang, Y Peng, X Zheng; (V) Data analysis and interpretation: S Cai, Z Lin, Y Zheng, W Zheng; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Fengying Peng, BM. Department of Pathology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 420 Fuma Road, Fuzhou 350014, China. Email: 446489348@qq.com; Yuanji Xu, PhD. Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 420 Fuma Road, Fuzhou 350014, China; Fujian Key Laboratory of Advanced Technology for Cancer Screening and Early Diagnosis, Fujian Cancer Hospital, Fuzhou, China. Email: xuyuanji@fjmu.edu.cn.

Background: Anoikis resistance is a critical feature enabling cancer cells to survive during detachment from the extracellular matrix. This study aimed to identify and validate anoikis-related differentially expressed genes (ARDEGs) in nasopharyngeal carcinoma (NPC), providing new insights into the molecular mechanisms underlying NPC progression and potential therapeutic targets.

Methods: Four gene expression datasets from the Gene Expression Omnibus (GEO) database were integrated to form the GEO-Combined dataset. NPC and adjacent normal nasopharyngeal tissues comprising the Test_Data were subjected to RNA sequencing. The differentially expressed genes (DEGs) from the GEO-Combined and Test_Data datasets were screened. DEGs associated with anoikis were identified and termed as ARDEGs. The key genes were validated by quantitative real-time polymerase chain reaction (qRT-PCR).

Results: A total of 104 ARDEGs were identified in our study. Five key genes (i.e., PLAUR, PTGS2, SERPINE1, CHI3L1, and ITGAV) were identified using the random forest (RF) and least absolute shrinkage and selection operator (LASSO) algorithms. A nomogram based on these five key genes showed robust diagnostic performance, with the area under the curve (AUC) underscoring its utility as a prognostic tool. Further, the functional enrichment analysis indicated that the risk model was associated with the biological pathways involved in tumor migration and invasion. Based on the model constructed from the five key genes, our study found 152 pairs of messenger RNA (mRNA)-transcription factor (TF) interaction relationships, which may provide insights into the mechanisms of metastasis and recurrence of NPC.

Conclusions: The identification and validation of ARDEGs in NPC highlighted critical molecular players in anoikis resistance, offering potential targets for therapeutic interventions. Our study provides a comprehensive understanding of the role of ARDEGs in NPC, paving the way for further research into targeted therapies for NPC.

Keywords: Nasopharyngeal carcinoma (NPC); anoikis; bioinformatics analysis; therapeutic targets; prognosis

Submitted Jun 12, 2025. Accepted for publication Jul 16, 2025. Published online Jul 27, 2025.

Highlight box

Key findings

• This study identified 104 anoikis-related differentially expressed genes (ARDEGs) in nasopharyngeal carcinoma (NPC). Of these, five key genes were validated as central regulators of anoikis resistance. A predictive nomogram based on these genes demonstrated potential clinical utility for prognosis assessment.

What is known, and what is new?

• Anoikis resistance is a hallmark of cancer metastasis, enabling detached tumor cells to survive and colonize distant organs. NPC is characterized by high metastatic potential and poor clinical outcomes.

• This study identified 104 ARDEGs specific to NPC, including novel candidates (e.g., CHI3L1 and ITGAV) not previously associated with anoikis resistance in this cancer type. A prognostic nomogram integrating five key ARDEGs was proposed, offering a predictive tool for NPC patient stratification.

What is the implication, and what should change now?

• The five key genes (e.g., PLAUR and ITGAV) represent actionable therapeutic targets to disrupt anoikis resistance and metastasis in NPC. The nomogram model provides a framework for personalized prognosis prediction, potentially guiding treatment stratification.

• To translate these findings, further studies should validate the functional roles of these five key genes in anoikis resistance, examine the repurposing of existing inhibitors in NPC models, and validate the nomogram’s clinical utility through multicenter cohorts to optimize therapeutic strategies.

Introduction

Nasopharyngeal carcinoma (NPC) is a malignant tumor that arises from the epithelial cells of the nasopharynx, which is the upper part of the throat behind the nose. Despite being relatively rare globally, NPC is prevalent in certain regions, particularly in Southeast Asia (1), where it represents a significant health burden. NPC is characterized by its unique epidemiology, association with Epstein-Barr virus, and distinct pattern of metastasis and local invasion. The prognosis of NPC largely depends on the stage at the time of diagnosis and the presence of distant metastasis (2). However, the underlying molecular mechanisms that drive NPC progression and metastasis remain incompletely understood, posing a challenge for the development of effective therapeutic strategies.

Recent advancements in high-throughput technologies, such as RNA sequencing and microarray analysis, have enabled the comprehensive profiling of gene expression in various cancers, including NPC (3,4). Studies have identified numerous differentially expressed genes (DEGs) associated with NPC progression, metastasis, and resistance to therapy (5-9). Among these, the genes involved in the process of anoikis (a form of programmed cell death induced by detachment from the extracellular matrix) have garnered attention due to their critical role in cancer metastasis. Anoikis resistance (10) is a hallmark of metastatic cells, allowing them to survive during dissemination and colonize distant organs. The identification of anoikis-related DEGs (ARDEGs) in NPC could provide valuable insights into the mechanisms underlying NPC metastasis and offer potential therapeutic targets.

Several studies have explored the role of DEGs in NPC (11-13); however, comprehensive analyses of ARDEGs specifically are limited. The molecular interactions and pathways involving ARDEGs in NPC have not been fully elucidated, leaving a gap in our understanding of how these genes contribute to NPC progression and metastasis. Additionally, the validation of ARDEGs using independent datasets and experimental techniques, such as quantitative real-time polymerase chain reaction (qRT-PCR), remains limited. Addressing these gaps is crucial for advancing our knowledge of NPC biology and improving patient outcomes.

In this study, we aimed to identify and validate ARDEGs in NPC by integrating data from multiple sources and employing rigorous bioinformatics analyses. We retrieved expression profile datasets from the Gene Expression Omnibus (GEO) database and performed the differential expression analysis using the “limma” package of R software (version 4.2.2). Subsequently, we intersected the DEGs with the anoikis-related genes (ARGs) obtained from the GeneCards database to identify the ARDEGs. We conducted a gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA) to explore the biological pathways and processes associated with the ARDEGs. To identify the key genes with diagnostic value, we employed the random forest (RF) algorithm and constructed a least absolute shrinkage and selection operator (LASSO) risk model. The diagnostic performance of the model was validated by receiver operating characteristic (ROC) curve analysis. Finally, we validated the expression of the key ARDEGs in NPC and normal tissues using qRT-PCR. The findings of this study provide a comprehensive understanding of the role of ARDEGs in NPC and highlight potential targets for therapeutic intervention. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1263/rc).

Methods

Data acquisition and prearrangement

We downloaded four expression profile datasets of NPC patients from the GEO database (14,15) (http://www.ncbi.nlm.nih.gov/geo/): GSE12452 (16), GSE61218 (17), GSE64634 (18), and GSE13597 (19). The datasets were all derived from Homo sapiens. The data platform of both the GSE12452 and GSE64634 datasets was GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array. The data platform of the GSE61218 dataset was GPL19061 Agilent-043965 custom human array oelinc_xw, and that of the GSE13597 dataset was GPL96 (HG-U133A) Affymetrix Human Genome U133A Array. The probes in each dataset were annotated using the corresponding GEO platform (GPL) file.

The R package “sva” (20) was used to remove the batch effects from the GSE12452, GSE61218, GSE64634, and GSE13597 datasets, resulting in an integrated GEO dataset (GEO-Combined). The integrated GEO-Combined dataset comprised 78 NPC samples and 23 control samples. Finally, the GEO-Combined dataset was normalized using the R package “limma” (21) and used as a validation set for the subsequent analysis.

Screening of ARDEGs between normal and tumor samples

To identify the potential mechanisms and related biological characteristics and pathways in NPC, we first used the “limma” package of R software to perform the differential analysis separately on the NPC self-sequencing dataset (Test_Data) and the GEO-Combined dataset. These analyses aimed to identify the DEGs between the NPC and control groups. The screening criteria were set as follows: |log₂fold change| ≥0.5 and P<0.05. The identified DEGs were classified as upregulated or downregulated in the NPC group for further study. The differential analysis results were visualized using volcano plots generated using the R package “ggplot2”.
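The screening rule above can be illustrated with a minimal Python sketch. It is not the authors' R code: the gene names and values are hypothetical, and only the two thresholds (|log₂fold change| ≥0.5 and P<0.05) come from the text.

```python
# Thresholds stated in the Methods.
LOG2FC_CUTOFF = 0.5
P_CUTOFF = 0.05


def classify_gene(log2fc, p_value):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    if p_value < P_CUTOFF and abs(log2fc) >= LOG2FC_CUTOFF:
        return "up" if log2fc > 0 else "down"
    return "ns"


# Hypothetical differential analysis output: gene -> (log2 fold change, P value).
example_results = {
    "GENE_A": (1.20, 0.001),   # passes both criteria, higher in NPC
    "GENE_B": (-0.80, 0.010),  # passes both criteria, lower in NPC
    "GENE_C": (0.30, 0.001),   # fold change too small
    "GENE_D": (2.00, 0.200),   # P value too large
}

labels = {gene: classify_gene(fc, p) for gene, (fc, p) in example_results.items()}
degs = sorted(gene for gene, label in labels.items() if label != "ns")
```

In this toy table only GENE_A and GENE_B would be carried forward as DEGs.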

To identify the ARDEGs, we performed an online search of the GeneCards database (https://www.genecards.org/) using “anoikis” as the search term. We retained only the genes categorized as “Protein Coding”, resulting in a list of 752 ARGs (detailed in table available at https://cdn.amegroups.cn/static/public/tcr-2025-1263-1.xlsx). We then screened the DEGs in the Test_Data and GEO-Combined datasets using the criteria described above (i.e., |log₂fold change| ≥0.5 and P<0.05). The genes shared by the DEGs and the ARGs were identified as the ARDEGs. A Venn diagram was drawn to illustrate the intersection, and a heat map of the expression levels was created using the R package “pheatmap”.
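The intersection step amounts to a set operation. The Python sketch below uses hypothetical gene names and assumes one reading of the text, namely that a gene must be a DEG in both datasets and also appear in the ARG list; the source does not spell out how the two DEG lists are combined.

```python
def find_ardegs(degs_test, degs_geo, args):
    """Return genes present in both DEG lists and in the ARG list.

    Assumption: a three-way intersection. Replace the first '&' with '|'
    if DEGs from either dataset should be accepted instead.
    """
    return set(degs_test) & set(degs_geo) & set(args)


# Hypothetical inputs for illustration only.
degs_test_data = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}
degs_geo_combined = {"GENE_A", "GENE_C", "GENE_D"}
anoikis_genes = {"GENE_A", "GENE_D", "GENE_E"}

ardegs = find_ardegs(degs_test_data, degs_geo_combined, anoikis_genes)
```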

Analysis of the expression and mutation data of the ARDEGs (GSEA and GSVA)

To explore the metabolic pathways and biological processes (BPs) associated with the screened DEGs, a GSEA (22) was conducted to analyze the expression of all genes and their involvement in BPs between the NPC and control groups in the Test_Data dataset. We also examined the connections between the affected cellular components (CCs) and molecular functions (MFs). The criteria for significantly enriched pathways were a P value <0.05 and a false discovery rate (FDR) q value <0.05. The Test_Data results were displayed using a ridge plot.
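For readers unfamiliar with GSEA, its core quantity is a running-sum enrichment score computed over a ranked gene list. The Python sketch below shows that statistic only, under stated assumptions: the gene names and ranking metric are hypothetical, the weighting exponent p=1 is a common default rather than a setting reported here, and the permutation-based P values and normalization are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: gene names sorted by descending ranking metric.
    scores: ranking metric for each gene, in the same order.
    gene_set: collection of gene names in the pathway.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    hit_total = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    if n_hit == 0 or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must partially overlap the ranked list")

    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** p / hit_total  # step up, weighted by the metric
        else:
            running -= 1.0 / n_miss             # step down by a constant
        if abs(running) > abs(best):
            best = running                      # keep the largest deviation from zero
    return best


# Hypothetical ranked list (for example, ordered by log2 fold change).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
metric = [4.0, 3.0, 2.0, 1.0]
es_top = enrichment_score(genes, metric, {"GENE_A", "GENE_B"})
es_bottom = enrichment_score(genes, metric, {"GENE_C", "GENE_D"})
```

A gene set concentrated at the top of the ranking yields a positive score, and one concentrated at the bottom yields a negative score.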

To assess whether different pathways were enriched among different samples, we obtained the “h.all.v7.4.symbols.gmt” gene set from the MSigDB (23) database. We then performed a GSVA (24) on all genes between the different (NPC/control) groups using the NPC dataset Test_Data. We also calculated the functional enrichment of the genes between the different groups. The screening standard for significant enrichment was set as P<0.05.

Screening of key genes, construction of LASSO risk model, and risk score

RF (25) is an ensemble learning algorithm that integrates multiple decision trees. To screen for the key genes, we used the R package “randomForest” (26) to construct a model based on the expression of the ARDEGs in the Test_Data dataset. The parameters for this model were set.seed(234) and ntree =1,000. The Gini coefficient indicates the impurity of a node; a higher Gini coefficient signifies greater impurity. The “MeanDecreaseGini” represents the average reduction in impurity of the nodes split on a given variable across all trees. The larger the MeanDecreaseGini, the more important the variable was for our grouping. We performed five rounds of 10-fold cross-validation, using the cross-validation curve to balance the number of variables. The training set itself was used for cross-validation, and variables with relatively small errors were retained. Important variables were selected for the subsequent analysis based on the MeanDecreaseGini.
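The impurity reduction behind MeanDecreaseGini can be sketched for a single split. This Python example is illustrative only, with hypothetical sample labels; in a random forest, such decreases are accumulated over every split that uses a given gene and then averaged across trees.

```python
def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity reduction achieved by splitting parent into left and right."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node holding five NPC and five control samples.
node = ["NPC"] * 5 + ["control"] * 5

# A gene whose threshold separates the classes perfectly.
perfect = gini_decrease(node, ["NPC"] * 5, ["control"] * 5)

# A gene whose threshold leaves both children as mixed as the parent.
useless = gini_decrease(
    node,
    ["NPC", "control"],
    ["NPC"] * 4 + ["control"] * 4,
)
```

The perfectly separating gene removes all of the parent node's impurity, whereas the uninformative gene removes none.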

We used the R package “glmnet” (27), with set.seed(500), to perform the LASSO (28,29) regression analysis based on the RF screening results. This allowed us to obtain the LASSO risk model and calculate the risk score (RiskScore). The RiskScore was calculated as follows:

$\text{RiskScore} = \sum_{i} \left[\text{Coefficient}(\text{gene}_{i}) \times \text{mRNA expression}(\text{gene}_{i})\right]$ [1]

To avoid overfitting, the run period for the LASSO regression analysis was set to 200, and a penalty term derived from the linear regression was incorporated (lambda × the absolute value of the coefficient) to mitigate model overfitting and enhance generalization. The outcomes of the LASSO regression analysis were illustrated through diagnostic model diagrams and variable trajectory plots. The ARDEGs included in the final LASSO regression model were identified as the key genes for our subsequent analysis.
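The effect of the penalty term (lambda × the absolute value of the coefficient) can be illustrated with the soft-thresholding operator, which is the coefficient update that an L1 penalty produces for a single standardized predictor. The Python sketch below is a generic illustration with hypothetical numbers, not the glmnet implementation or the settings used in this study.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficient estimates for four genes.
unpenalized = {"GENE_A": 0.80, "GENE_B": -1.00, "GENE_C": 0.20, "GENE_D": -0.10}
lam = 0.30  # hypothetical penalty strength

penalized = {gene: soft_threshold(z, lam) for gene, z in unpenalized.items()}
selected = sorted(gene for gene, coef in penalized.items() if coef != 0.0)
```

Coefficients smaller in magnitude than lambda are set exactly to zero, which is how LASSO retains only a subset of the candidate genes.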

We substituted the expression of the key genes in the Test_Data dataset into the LASSO risk score calculation formula to obtain the RiskScores for the Test_Data dataset. Similarly, the expression of the key genes in the GEO-Combined dataset was substituted into the LASSO risk score calculation formula to obtain the RiskScores for the GEO-Combined dataset.

Diagnostic performance of the LASSO risk model and expression validation of the key genes

The ROC curve (30) is a graphical tool used to select the optimal model, reject suboptimal models, and determine the best threshold within a model. It reflects the trade-off between sensitivity and specificity for continuous variables. To assess the diagnostic performance of the LASSO risk model for NPC, we used the R package “pROC” to plot ROC curves for the LASSO risk scores (RiskScores) in both the Test_Data and GEO-Combined datasets. We then computed the area under the curve (AUC) to quantitatively evaluate the diagnostic efficacy of the RiskScores in distinguishing NPC from control samples.
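The AUC has a direct interpretation: it is the probability that a randomly chosen NPC sample receives a higher score than a randomly chosen control sample. The Python sketch below computes it by pairwise comparison on hypothetical scores; it stands in for, and is not, the pROC computation used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for NPC (case), 0 for control. Ties count as half a pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores: two NPC samples followed by two controls.
auc_perfect = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
auc_partial = roc_auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0])
```

An AUC of 1 means every case outranks every control, and 0.5 corresponds to chance-level ranking.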

Subsequently, to explore the interplay among the key genes, the Spearman algorithm was employed to conduct a correlation analysis on the expression levels of the key genes in the Test_Data and GEO-Combined datasets. The results of the correlation analysis were visually represented in correlation heatmaps using the R package “pheatmap”.
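Spearman correlation is the Pearson correlation of the ranks of two variables, which makes it sensitive to any monotonic relationship rather than only a linear one. A minimal Python sketch on hypothetical expression values follows; it is not the authors' R code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression of two genes across four samples.
rho_monotonic = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0])
rho_inverse = spearman([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```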

Finally, to validate the expression differences of the key genes, we used the Mann-Whitney U test (Wilcoxon rank-sum test) to analyze the expression variations of the key genes between the NPC and control groups in the Test_Data and GEO-Combined datasets. The findings of the differential analysis were then illustrated using violin plots generated using the R package “ggplot2”.

Genomic and functional analysis of key genes [Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses]

The GO (31) analysis is a commonly used approach for large-scale functional enrichment studies, encompassing BPs, MFs, and CCs. The KEGG (32) is a comprehensive database with information regarding genomics, biological pathways, diseases, and drugs. We applied the R package “clusterProfiler” (33) for the GO and KEGG annotation analyses of the key genes, using the following entry selection criteria: P<0.05 and FDR q value <0.05, which indicated statistical significance.
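Over-representation analyses of this kind are commonly scored with a hypergeometric test, which asks how likely it is to see at least the observed overlap between a gene list and a term by chance. The Python sketch below shows that calculation with hypothetical counts; it is an illustration of the general test, not a reproduction of the clusterProfiler workflow, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) when drawing n_selected genes without replacement.

    n_background: genes in the background universe.
    n_in_term: background genes annotated to the term.
    n_selected: genes in the list of interest.
    n_overlap: genes in the list that are annotated to the term.
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 annotated to a term,
# 5 genes selected, all 5 of them annotated to the term.
p_all_overlap = hypergeom_enrichment_p(10, 5, 5, 5)
p_any_overlap = hypergeom_enrichment_p(10, 5, 5, 0)
```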

Protein-protein interaction (PPI) network of key genes

The PPI network comprises individual proteins that interact with one another, and play crucial roles in various BPs, including signal transduction, gene expression regulation, energy and substance metabolism, and cell cycle control. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (34) is a resource for searching known PPIs and predicting interactions between proteins. In this study, we used the STRING database to construct a PPI network related to key genes and visualized the PPI network model using Cytoscape (35) (version 3.9.1).

Understanding the structure of proteins is crucial for unraveling their functions. The AlphaFold platform (https://www.alphafold.ebi.ac.uk/) (36) pioneered a computational method for predicting protein structures at atomic precision in the absence of homologous templates. The predicted structures cover 98.5% of known human proteins and a similar proportion for proteins from other organisms. We used the AlphaFold website to predict the protein structures of the key genes and presented the results accordingly.

Molecular interaction network of the key genes

The ENCORI database (37) is version 3.0 of the starBase database. It consolidates various interactions, including microRNA (miRNA)-noncoding RNA (ncRNA), miRNA-messenger RNA (mRNA), ncRNA-RNA, RNA-RNA, RNA-binding protein (RBP)-ncRNA, and RBP-mRNA, based on cross-linking immunoprecipitation (CLIP)-sequencing data and degradome sequencing data (the latter for plants). It offers multiple visualization interfaces for analyzing miRNA targets. We used the ENCORI database to predict the miRNAs associated with the key genes, applying a filtering criterion of pancancerNum >8 to identify relevant mRNA-miRNA interactions. The resulting interaction network was visualized using Cytoscape software.

The CHIPBase database (version 3.0) (38) (https://rna.sysu.edu.cn/chipbase/) identifies thousands of binding motif matrices and their binding sites from the chromatin immunoprecipitation (CHIP)-sequencing data of DNA-binding proteins. It predicts millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. Using the CHIPBase database (version 3.0), we searched for the TFs that bind to the key genes. A filtering criterion of the sum of “number of samples found (upstream)” and “number of samples found (downstream)” >4 was employed to select the mRNA-TF interaction pairs, and the mRNA-TF interaction network was visualized using Cytoscape software.

Additionally, we used the ENCORI database to predict the RBPs associated with the key genes. Employing a filtering criterion of clusterNum >3, we selected the mRNA-RBP interaction pairs, and visualized the mRNA-RBP interaction network using Cytoscape software.

The Comparative Toxicogenomics Database (CTD) (39) (https://ctdbase.org/) links information on chemicals, genes, phenotypes, and diseases, and is a valuable resource for studying connections relevant to human health. We used the CTD to predict potential drugs or small molecular compounds that interact with the key genes. The mRNA-drug interaction pairs were selected using a criterion of “Reference Count” >2, and the mRNA-drug interaction network was visualized using Cytoscape software.
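The four filtering criteria in this section can be summarized in a short Python sketch. The thresholds come from the text, whereas the record layout, dictionary keys, and example entries are hypothetical and do not reflect the actual export formats of ENCORI, CHIPBase, or the CTD.

```python
# One predicate per interaction type, using the cutoffs stated in the Methods.
FILTERS = {
    "miRNA": lambda r: r["pancancerNum"] > 8,
    "TF": lambda r: r["samples_upstream"] + r["samples_downstream"] > 4,
    "RBP": lambda r: r["clusterNum"] > 3,
    "drug": lambda r: r["reference_count"] > 2,
}


def filter_pairs(records, kind):
    """Keep the interaction records of one kind that pass its cutoff."""
    keep = FILTERS[kind]
    return [r for r in records if keep(r)]


# Hypothetical candidate pairs for illustration only.
mirna_pairs = [
    {"gene": "GENE_A", "partner": "miRNA_1", "pancancerNum": 10},
    {"gene": "GENE_A", "partner": "miRNA_2", "pancancerNum": 8},
]
tf_pairs = [
    {"gene": "GENE_B", "partner": "TF_1", "samples_upstream": 3, "samples_downstream": 2},
    {"gene": "GENE_B", "partner": "TF_2", "samples_upstream": 2, "samples_downstream": 2},
]

kept_mirna = filter_pairs(mirna_pairs, "miRNA")
kept_tf = filter_pairs(tf_pairs, "TF")
```

All cutoffs are strict inequalities, so a pair that sits exactly on a threshold is dropped.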

Sample collection and pretreatment

NPC tissues (n=5) and adjacent normal nasopharyngeal tissues (n=5) were retrieved from patients via nasopharyngeal biopsy at the Fujian Cancer Hospital. These samples underwent RNA sequencing and were grouped as the Test_Data. The RNA sequencing procedure employed by our team has been described in detail previously (40). Additionally, 55 tissue samples, including NPC (n=35) and normal nasopharyngeal epithelium (n=20) tissue samples, were obtained from patients at the Fujian Cancer Hospital between March 2020 and January 2024. None of the participants in this study had undergone any treatment prior to undergoing nasopharyngoscopy. These 55 tissue samples were tested using qRT-PCR.

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Biomedical Ethics Committee of Fujian Cancer Hospital (No. K2024-338-01) and informed consent was obtained from all the patients. The clinicopathological staging and classification of the patients were performed in accordance with the criteria of the 8th edition of the American Joint Committee on Cancer.

qRT-PCR validation of the expression of the ARDEGs

The expression of the ARDEGs in the 35 NPC tissues and 20 normal tissues was verified by qRT-PCR (41). Among the 35 NPC tissue samples, one was excluded due to a diagnosis of lymphoma, and another was excluded due to local tissue hyperplasia. Thus, ultimately, 33 NPC tissue samples were included in the analysis. The clinical data for the qRT-PCR of the NPC patients from Fujian Cancer Hospital are shown in Table S1. The primers of the five key genes are shown in Table S2. The internal reference gene was 18S-ribosomal RNA (rRNA). RTIII All-in-One Mix and dsDNase (Monad Biotech Co., Ltd., Shanghai, China) were used to reverse-transcribe 1 µg of total RNA into complementary DNA. The qRT-PCR validation was conducted on the StepOnePlus Real-Time PCR System (Applied Biosystems, Thermo Fisher Scientific Co., Ltd., Waltham, MA, USA) using Hieff® qPCR SYBR® Green Master Mix, High Rox (Yeasen Biotechnology Co., Ltd., Shanghai, China). The reaction conditions were as follows: 95 ℃ for 10 minutes, followed by 41 cycles of 95 ℃ for 15 seconds and 60 ℃ for 1 minute, and a final cycle at 95 ℃ for 15 seconds.

Statistical analysis

The data processing and analysis in this study were performed using R software (version 4.2.2). The continuous variables are presented as the mean ± standard deviation. The Wilcoxon rank-sum test was employed for comparisons between two groups, while the Kruskal-Wallis test was used for comparisons involving three or more groups. Unless otherwise specified, the correlation coefficients between molecules were computed by Spearman correlation analysis. All statistical P values are reported as two-tailed, with statistical significance defined as a P value <0.05.

Results

Data preprocessing

The study flow chart is shown in Figure 1. We first used the R package “sva” to address the batch effects in the NPC GSE12452, GSE61218, GSE64634, and GSE13597 datasets. This process led to the creation of the integrated GEO-Combined dataset. To evaluate the effectiveness of the batch effect removal, we compared the datasets before and after correction through distribution box plots and principal component analysis (PCA) plots (Figure S1A-S1D). The distribution box plots and PCA plots showed that the batch effects among the samples in the merged GEO-Combined dataset were effectively eliminated after correction (42).

Figure 1 Overall analysis flow chart. ARDEG, anoikis-related differentially expressed gene; ARG, anoikis-related gene; DEG, differentially expressed gene; GEO, Gene Expression Omnibus; GO, Gene Ontology; GSEA, gene set enrichment analysis; GSVA, gene set variation analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, least absolute shrinkage and selection operator; miRNA, microRNA; mRNA, messenger RNA; PPI, protein-protein interaction; RBP, RNA-binding protein; ROC, receiver operating characteristic; TF, transcription factor.

Differential expression of ARDEGs in the GEO and Test_Data datasets

To analyze the differential gene expression between the NPC and control groups in the NPC datasets, we performed a differential analysis using the “limma” package on both the Test_Data and GEO-Combined datasets. This analysis resulted in the identification of the DEGs between the NPC and control groups.

In the Test_Data dataset, a total of 20,813 genes were analyzed, of which 5,215 met the DEG criteria: 2,320 showed higher expression in the NPC group, and 2,895 showed lower expression. The volcano plot showing the differential analysis results for the Test_Data dataset is presented in Figure 2A. In the GEO-Combined dataset, a total of 11,836 genes were analyzed, of which 1,744 met the DEG criteria: 914 exhibited higher expression in the NPC group, and 830 showed lower expression. The volcano plot showing the differential analysis results for the GEO-Combined dataset is presented in Figure 2B.

Figure 2 Identification of ARDEGs. (A) Volcano plot depicting the analysis of the DEGs between the different (NPC/control) groups in the Test_Data dataset. (B) Volcano plot illustrating the analysis of the DEGs between the different (NPC/control) groups in the GEO-Combined dataset. (C) Venn diagram showcasing the overlap between the DEGs and ARGs in the Test_Data and GEO-Combined datasets. (D) Heatmap visualizing the expression patterns of the ARDEGs between the different (NPC/control) groups in the Test_Data dataset. (E) Heatmap illustrating the expression patterns of the ARDEGs between the different (NPC/control) groups in the GEO-Combined dataset. ARDEG, anoikis-related differentially expressed gene; ARG, anoikis-related gene; DEG, differentially expressed gene; GEO, Gene Expression Omnibus; NPC, nasopharyngeal carcinoma.

To identify the ARDEGs, we intersected the DEGs from the Test_Data and GEO-Combined datasets with the ARGs. This intersection yielded 104 ARDEGs (as detailed in tables available at https://cdn.amegroups.cn/static/public/tcr-2025-1263-2.xlsx, https://cdn.amegroups.cn/static/public/tcr-2025-1263-3.xlsx), and the results were visualized using a Venn diagram (Figure 2C). Based on the Venn diagram results, we ranked the 104 ARDEGs in descending order of their log fold change values between the NPC and control groups in the Test_Data and GEO-Combined datasets. Subsequently, we used the “pheatmap” package in R to create heatmaps of their expression (Figure 2D,2E).

GSEA

To ascertain the effect of gene expression levels on the development of NPC between the NPC and control groups in the Test_Data dataset, we conducted a GSEA. The significantly enriched pathways identified from the Test_Data dataset were visualized in ridge plots (Figure 3A). The results revealed that the genes between the NPC and control groups in the Test_Data dataset were notably enriched in pathways such as the hedgehog signaling pathway, anchoring fibril formation, MET promotes cell motility, and assembly of collagen fibrils and other multimeric structures (Figure 3B-3E). To ensure the clarity of the figure, only the top four pathways ranked by normalized enrichment score (NES) are displayed. The detailed list can be found in Table S3.

Figure 3 GSEA. (A) Ridge plots illustrating the top four enriched biological features identified in the GSEA for the genes between the different (NPC/control) groups in the Test_Data dataset. Enriched genes in the Test_Data dataset were prominently associated with pathways such as the hedgehog signaling pathway (B), anchoring fibril formation (C), MET promotes cell motility pathway (D), and assembly of collagen fibrils and other multimeric structures (E). FDR, false discovery rate; GSEA, gene set enrichment analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; MET, mesenchymal-epithelial transition; NES, normalized enrichment score; NPC, nasopharyngeal carcinoma.

GSVA

To explore differences in the hallmark gene sets (h.all.v7.4.symbols.gmt) of the MSigDB database between the NPC and control groups in the Test_Data dataset, we conducted a GSVA on all the genes in the dataset. Subsequently, we examined the expression differences of the top 20 pathways with an adjusted P<0.05 (Table S4) between the NPC and control groups. The findings were visually presented in a heatmap (Figure 4A) and group comparison plots (Figure 4B).

Figure 4 GSVA. Comprehensive numerical heatmap (A) and group comparison boxplots (B) showing the intricacies of the GSVA results in the NPC and control groups of the Test_Data dataset. The symbol “ns” is equivalent to P≥0.05, indicating no statistical significance. *, P<0.05; **, P<0.01. GSVA, gene set variation analysis; NPC, nasopharyngeal carcinoma.

The outcomes of the GSVA indicated that among the 20 pathways examined, 13 exhibited statistically significant differences (P<0.05) between the NPC and control groups. Specifically, these pathways included diverse BPs, such as hallmark estrogen response early, hallmark xenobiotic metabolism, hallmark cholesterol homeostasis, hallmark DNA repair, hallmark ultraviolet (UV) response up, hallmark G2/M phase checkpoint (G2M), hallmark interferon alpha response, hallmark notch signaling, hallmark E2 transcription factor targets (E2F), hallmark tumor necrosis factor alpha (TNFA) signaling via nuclear factor kappa B (NFKB), hallmark MYC proto-oncogene targets version 2 (MYC targets V2), hallmark epithelial mesenchymal transition, and hallmark angiogenesis.

The construction of the LASSO model and the screening of key genes

To identify the key genes with diagnostic value in the Test_Data dataset, we initially employed the RF algorithm to analyze the expression levels of the 104 ARDEGs in the NPC and control groups. Setting the seed to 234 and the number of decision trees to 1,000, we generated a decision tree error curve plot (Figure 5A). The results indicated that the error reached its minimum and stabilized when the number of decision trees was around 100.

Figure 5 Selection of the key genes and construction of the LASSO risk model. (A) Model training error plot for the RF algorithm. (B) Scatter plot of MeanDecreaseGini for the ARDEGs (displaying the top 30 in descending order). (C) Cross-validation error curve plot. (D) Diagnostic model plot for the LASSO regression model. (E) Variable trajectory plot for the LASSO regression model. (F) Forest plot of the key genes in the LASSO regression model. ARDEG, anoikis-related differentially expressed gene; coef, coefficient; LASSO, least absolute shrinkage and selection operator; RF, random forest.

Subsequently, we generated a scatter plot (Figure 5B) of the MeanDecreaseGini values of the top 30 ARDEGs arranged in descending order. MeanDecreaseGini is the average reduction in Gini impurity attributable to splits on a given gene across the trees of the forest, where a higher Gini impurity indicates a less pure node. Therefore, a larger MeanDecreaseGini suggested that a gene was more important for separating the NPC and control groups, indicating that it had a greater effect on the diagnosis of NPC.
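To make this quantity concrete, the following minimal Python sketch computes the Gini impurity of a node and the impurity decrease produced by a single split. The sample labels and the split are invented for illustration and are not taken from the study's data; in a random forest, decreases of this kind are accumulated for each gene and averaged over the trees, and the exact weighting depends on the implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node with 4 NPC and 4 control samples (illustrative only).
parent = ["NPC"] * 4 + ["control"] * 4
# Hypothetical split on one gene's expression threshold.
left = ["NPC", "NPC", "NPC", "NPC", "control"]
right = ["control", "control", "control"]

decrease = gini_decrease(parent, left, right)
```

A gene whose splits repeatedly yield large decreases of this kind receives a high MeanDecreaseGini and therefore ranks near the top of a plot such as Figure 5B.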

Afterwards, we conducted five rounds of 10-fold cross-validation and generated a cross-validation error curve (Figure 5C) to guide the selection of the optimal number of genes. The curve indicated that the model error was relatively small when the number of genes was 69 and that it tended to stabilize as more genes were added. Combining this information with the MeanDecreaseGini ranking, we selected the 69 ARDEGs that most strongly affected the diagnosis of NPC for further analysis (as detailed in Table S5).
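The resampling scheme itself (five rounds of 10-fold cross-validation) can be sketched as follows. This is only an illustration of how the train/test splits are formed; the sample count, the seed, and the function name `repeated_kfold` are assumptions made for the example, and the sketch does not reproduce the feature-selection step that the cross-validation error curve is based on.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)  # arbitrary seed, chosen only for reproducibility
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin into n_folds nearly equal folds.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != fold for i in f]
            yield repeat, fold, train_idx, test_idx


# Hypothetical cohort size; the real number of samples is not assumed here.
splits = list(repeated_kfold(n_samples=43))
```

Each sample appears in exactly one test fold per round, so averaging the error over all 50 splits gives a more stable estimate than a single 10-fold run.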

Subsequently, leveraging the 69 ARDEGs identified through the RF algorithm, we proceeded to construct a LASSO (43) risk model by regression analysis. The results of the LASSO regression analysis were visually presented by generating a LASSO regression model plot (Figure 5D) and LASSO variable trajectory plot (Figure 5E). The results revealed that the LASSO risk model comprises a total of five ARDEGs; that is, PLAUR, PTGS2, SERPINE1, CHI3L1, and ITGAV. These genes were identified as key genes for our subsequent investigation, and a forest plot depicting these key genes was generated (Figure 5F).

Finally, we applied the RiskScore formula to calculate a risk score for each sample from the expression levels of the key genes in both the Test_Data and GEO-Combined datasets. The RiskScore calculation formula is expressed as follows:

$\begin{array}{l} \mathrm{RiskScore} = 0.064 \times \mathit{PLAUR} + 0.256 \times \mathit{PTGS2} + 3.187 \times \mathit{SERPINE1} \\ + 0.061 \times \mathit{CHI3L1} + 1.499 \times \mathit{ITGAV} \end{array}$ [2]
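As a worked illustration of Eq. [2], the short Python sketch below evaluates the risk score for one sample. The coefficients are those reported in the formula; the expression values of the example sample and the function name `risk_score` are hypothetical and serve only to show the arithmetic.

```python
# Coefficients as reported in Eq. [2].
COEFFICIENTS = {
    "PLAUR": 0.064,
    "PTGS2": 0.256,
    "SERPINE1": 3.187,
    "CHI3L1": 0.061,
    "ITGAV": 1.499,
}


def risk_score(expression):
    """Weighted sum of the key-gene expression values for one sample."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical normalized expression values for a single sample.
sample = {"PLAUR": 1.0, "PTGS2": 2.0, "SERPINE1": 1.0, "CHI3L1": 3.0, "ITGAV": 2.0}
score = risk_score(sample)  # 0.064 + 0.512 + 3.187 + 0.183 + 2.998
```

Because the coefficients for SERPINE1 and ITGAV are much larger than the others, these two genes dominate the score when expression values are on a comparable scale.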

Diagnostic performance of the LASSO risk model and expression validation of the key genes

The R package “pROC” was used to construct ROC curves from the LASSO risk scores of the Test_Data and GEO-Combined datasets to assess the diagnostic efficacy of the LASSO risk model (Figure 6A,6B). The ROC analysis revealed that the LASSO risk score exhibited high diagnostic accuracy for NPC in the Test_Data dataset (Figure 6A; AUC =1). In the GEO-Combined dataset, the LASSO risk score showed a certain level of accuracy in diagnosing NPC (Figure 6B; AUC =0.855).
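For readers less familiar with the AUC, the following Python sketch computes it directly from its probabilistic definition: the probability that a randomly chosen NPC sample has a higher risk score than a randomly chosen control sample, with ties counted as one half. The scores and labels are invented for illustration, and the function is a plain reimplementation rather than the “pROC” routine used in the study.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores: label 1 = NPC, label 0 = control.
scores = [6.9, 5.8, 4.1, 5.2, 3.0, 4.5]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 7 of the 9 NPC/control pairs are ordered correctly
```

Under this definition, an AUC of 1 means that every NPC sample scored higher than every control sample, whereas an AUC of 0.5 corresponds to chance-level ranking.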

Figure 6 Diagnostic performance of the LASSO risk model and expression validation of the key genes. ROC curves showing the diagnostic performance of the LASSO risk scores in the Test_Data dataset (A) and GEO-Combined dataset (B). Correlation heatmaps illustrating the relationships among key genes in the Test_Data (C) and GEO-Combined (D) datasets. Group comparison plots displaying the expression differences of the key genes between the different (NPC/control) groups in the Test_Data (E) and GEO-Combined (F) datasets. CHI3L1 (G), PTGS2 (H), SERPINE1 (I), PLAUR (J), and ITGAV (K) expression in NPC and adjacent normal tissues was evaluated by qRT-PCR. The symbol “ns” is equivalent to P≥0.05, indicating no statistical significance. *, P<0.05; **, P<0.01; ***, P<0.001. AUC, area under the curve; CI, confidence interval; FPR, false positive rate; GEO, Gene Expression Omnibus; LASSO, least absolute shrinkage and selection operator; NPC, nasopharyngeal carcinoma; qRT-PCR, quantitative real-time polymerase chain reaction; ROC, receiver operating characteristic; TPR, true positive rate.

Subsequently, we conducted a correlation analysis of the expression of the five key genes (PLAUR, PTGS2, SERPINE1, CHI3L1, and ITGAV) in the Test_Data and GEO-Combined datasets, and generated correlation heatmaps (Figure 6C,6D). The heatmaps showed significant positive correlations among most of the key genes. In the Test_Data dataset, the correlation between PTGS2 and ITGAV was the strongest (r=0.891, P<0.001). In the GEO-Combined dataset, the strongest correlation was observed between PLAUR and SERPINE1 (r=0.561, P<0.001).
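The correlation coefficient r for a pair of genes can be computed as in the sketch below. It assumes the Pearson coefficient, which is an assumption on our part because the text does not state whether Pearson or Spearman correlation was used; the expression vectors are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values of two genes across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(gene_a, gene_b)
```

Applying such a function to every pair of key genes yields the symmetric matrix that a correlation heatmap displays.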

To validate the differences in the expression of the five key genes between the NPC and control groups, we compared their expression levels in the Test_Data and GEO-Combined datasets using the Wilcoxon rank-sum test. The results were visualized as grouped violin plots (Figure 6E,6F), which indicated statistically significant differences (P<0.01) in the expression of the five key genes between the NPC and control groups in both the Test_Data dataset (Figure 6E) and the GEO-Combined dataset (Figure 6F). Additionally, the expression trends of the five key genes were consistent between the two datasets. Using qRT-PCR, we found that the expression levels of CHI3L1, PTGS2, and SERPINE1 were significantly higher in the cancer tissues than in the normal tissues (all P<0.05) (Figure 6G-6I). However, the expression levels of PLAUR and ITGAV did not differ significantly between the cancer and normal tissues (Figure 6J,6K).
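The Wilcoxon rank-sum test used for these group comparisons can be sketched in a few lines of Python. The version below is a simplified illustration: it uses the normal approximation without tie or continuity corrections, so its P values may differ slightly from those of statistical packages that use exact distributions for small samples. The expression values are invented.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (U statistic for group_a, two-sided P value, normal approximation)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value


# Hypothetical expression values of one gene in NPC and control samples.
npc = [5.1, 6.3, 7.2, 6.8, 5.9]
control = [3.2, 4.1, 4.8, 3.9, 5.0]
u_stat, p_value = rank_sum_test(npc, control)
```

Because the test relies only on ranks, it makes no assumption that expression values are normally distributed, which is one common reason for preferring it to a t-test for this kind of data.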

Genomic and functional analysis of key genes (GO and KEGG enrichment analyses)

To further explore the biological functions of the five key genes (i.e., PLAUR, PTGS2, SERPINE1, CHI3L1, and ITGAV), we conducted GO and KEGG enrichment analyses. The specific results are detailed in Table S6 (for the GO enrichment analysis) and Table S7 (for the KEGG enrichment analysis).

The results indicated that the five key genes were predominantly enriched in BPs such as the negative regulation of the apoptotic signaling pathway and the regulation of transforming growth factor beta production. Additionally, they were enriched in CCs such as the specific granule, secretory granule lumen, cytoplasmic vesicle lumen, and vesicle lumen. The enriched MFs included protease binding, insulin-like growth factor I binding, opsonin binding, and fibroblast growth factor binding.

The GO and KEGG enrichment analysis results were visualized in a bubble plot (Figure S2A). Simultaneously, the BPs, CCs, MFs, and biological pathways (KEGG) were depicted in network plots based on the GO and KEGG enrichment analyses (Figure S2B-S2E). The connections indicate the corresponding molecules and annotations for each entry, with larger nodes representing a higher number of molecules encompassed by the respective entry.

PPI network

A PPI analysis of the five key genes (i.e., PLAUR, PTGS2, SERPINE1, CHI3L1, and ITGAV) was performed using the STRING database. A PPI network (Figure S3A) comprising these five key genes was then constructed.

Subsequently, a functional similarity analysis of the five key genes was conducted. Using the R package “GOSemSim”, we computed the semantic similarity among the GO terms, sets of GO terms, gene products, and gene clusters. The results revealed the functional similarity among the five key genes, particularly highlighting that PLAUR had the highest semantic similarity values with other crucial genes. These findings were visually represented in a boxplot in Figure S3B, illustrating the functional relationships among the key genes.

To analyze the genomic locations of the five key genes on human chromosomes, we employed the “RCircos” package to perform the positional annotation of these genes (Figure S3C). As the figure shows, these key genes were predominantly situated on chromosomes 1, 2, 7, and 19. Notably, chromosome 1 had the highest concentration, hosting a total of two key genes. The proximity of these key genes on the chromosomes suggests a close genomic-level association, particularly for those positioned on chromosome 1. To visualize the molecular structure, we used the AlphaFold website to analyze and showcase the protein structures of the five key genes (Figure S3D-S3H).

Network of molecular interactions among the key genes

To examine the interactions among the five key genes (i.e., PLAUR, PTGS2, SERPINE1, CHI3L1, and ITGAV) and other molecules, we first used the ENCORI database to predict the associated miRNAs. We then filtered the mRNA-miRNA pairs using a stringent criterion of pancancerNum >8. The resulting mRNA-miRNA interaction network, visualized in Cytoscape (Figure S4A), included five mRNAs and 87 miRNAs, totaling 111 interaction pairs (for further details, see table available at https://cdn.amegroups.cn/static/public/tcr-2025-1263-4.xlsx).

Next, we searched the CHIPBase database to identify the TFs that bind to these genes. A stringent filtering criterion was applied, retaining only the mRNA-TF interactions for which the sum of the upstream and downstream sample counts was >4. The resulting mRNA-TF interaction network (Figure S4B) comprised five mRNAs and 80 TFs, forming 152 interaction pairs (see table available at https://cdn.amegroups.cn/static/public/tcr-2025-1263-5.xlsx).

Additionally, ENCORI was used to predict the RBP interactions based on the criterion of a clustersNum >3. The resulting mRNA-RBP network (Figure S4C) comprises five mRNAs and 86 RBPs, yielding 121 interaction pairs (see table available at https://cdn.amegroups.cn/static/public/tcr-2025-1263-6.xlsx).

Finally, we used the CTD database to identify the drugs that might interact with the key genes based on the criterion of an mRNA-drug interaction reference count >2. The mRNA-drug interaction network (Figure S4D) included five mRNAs and 57 drugs, resulting in 71 interaction pairs (see table available at https://cdn.amegroups.cn/static/public/tcr-2025-1263-7.xlsx).
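The four networks in this section were all obtained in the same way: predicted interaction pairs were retained only if a support count exceeded a threshold, and the retained pairs were then summarized. A minimal Python sketch of this filtering step is given below. The record layout, the partner names, and the counts are hypothetical; only the field name pancancerNum and the threshold of 8 come from the text, and the actual export format of each database may differ.

```python
def filter_pairs(pairs, field, threshold):
    """Keep interaction records whose support count strictly exceeds the threshold."""
    return [pair for pair in pairs if pair[field] > threshold]


def network_summary(pairs, partner_key):
    """Count distinct mRNAs, distinct partners, and interaction pairs."""
    return {
        "mRNAs": len({pair["mRNA"] for pair in pairs}),
        "partners": len({pair[partner_key] for pair in pairs}),
        "pairs": len(pairs),
    }


# Hypothetical mRNA-miRNA records; miRNA names and counts are invented.
records = [
    {"mRNA": "PLAUR", "miRNA": "miR-A", "pancancerNum": 10},
    {"mRNA": "PLAUR", "miRNA": "miR-B", "pancancerNum": 8},
    {"mRNA": "PTGS2", "miRNA": "miR-A", "pancancerNum": 12},
    {"mRNA": "ITGAV", "miRNA": "miR-C", "pancancerNum": 3},
]

kept = filter_pairs(records, "pancancerNum", 8)  # strict ">" drops the count of 8
summary = network_summary(kept, "miRNA")
```

The same two functions would apply to the TF, RBP, and drug networks by changing the field name and threshold, which makes the reported node and edge counts straightforward to recompute from the supplementary tables.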

Discussion

This study sought to identify and validate the ARDEGs in NPC. To achieve this objective, we compared the RNA sequencing data derived from NPC tissues to that derived from normal nasopharyngeal tissues, and used the “limma” package to identify the DEGs. These DEGs were then cross-referenced with the GeneCards database to identify the ARGs. Further validation was conducted by qRT-PCR on a larger cohort comprising NPC and normal tissue samples. Notably, this study identified 104 ARDEGs and significantly enriched pathways, such as the hedgehog signaling pathway and the mesenchymal-epithelial transition (MET) factor promotes cell motility pathway, and constructed a LASSO risk model with high diagnostic accuracy based on the five key genes (i.e., PLAUR, PTGS2, SERPINE1, CHI3L1, and ITGAV). This model had an AUC of 1 in the Test_Data dataset and an AUC of 0.855 in the GEO-Combined dataset, which supported its robustness. Our findings were further validated using another 55 samples, in which the expression levels of CHI3L1, PTGS2, and SERPINE1 were significantly higher in the cancer tissues than in the normal tissues (all P<0.05).

The GSEA revealed that the five key genes were mainly enriched in the hedgehog signaling pathway, anchoring fibril formation, the MET factor promotes cell motility pathway, and the assembly of collagen fibrils and other multimeric structures. A previous study showed that the hedgehog signaling pathway is essential for embryo development and tissue patterning in the human body (44). This pathway is silenced in most adult tissues; however, its aberrant activation has been documented in a variety of malignancies (45), and it has been reported to be activated in a ligand-dependent manner, contributing to carcinogenesis and cancer progression (46). Hedgehog signaling is reactivated in various types of cancer, and this contributes to cancer progression by facilitating proliferation, invasion, and cell survival (47). Another study showed that anchoring fibrils, which attach the epidermis to the dermis, are composed of type VII collagen encoded by COL7A1, and that mutations in COL7A1 impair their formation (48). Hwang et al. found that MET has significant roles in malignant tumor progression (49). Therefore, it is possible that the five key genes identified in our study are related to the development and metastasis of NPC.

To validate the differences in the expression of the five key genes between the NPC and control groups, we analyzed the expression levels of these key genes in the Test_Data and GEO-Combined datasets. The expression trends of the five key genes were consistent in both datasets. This result was consistent with the qRT-PCR validation of the 55 samples, in which the expression of the five genes was higher in the tumor group than in the normal tissue group. The differences were statistically significant for CHI3L1, PTGS2, and SERPINE1 (all P<0.05). However, the expression of PLAUR and ITGAV in the validation samples did not differ significantly between the tumor and normal groups. This discrepancy may be attributed to potential RNA degradation in long-term stored clinical samples, which could introduce bias in qRT-PCR detection (50). In addition, existing studies have shown that PLAUR and ITGAV expression correlates with NPC molecular subtypes and malignancy grades (51,52). Our cohort may have contained a higher proportion of low-expression subtypes, potentially obscuring statistical significance. It would be valuable to conduct subtype-stratified validation in future studies.

CHI3L1 belongs to glycoside hydrolase family 18 (53). It has been shown to be overexpressed in numerous human cancers and animal tumor models (54-57). Another study showed that CHI3L1 activated the Akt pathway and promoted the proliferative ability of NPC cells (58). One study found that SERPINE1 is a potent promoter of tumor progression (59). Research has demonstrated that the TF TEL2 inhibits the metastasis of NPC by downregulating SERPINE1, while the upregulation of SERPINE1 is associated with enhanced metastatic potential in NPC (60). Another study reported comparable findings, with PTGS2 showing high levels of staining in head and neck squamous cell carcinoma (HNSCC). Further, the knockdown of PTGS2 was shown to significantly suppress the proliferation of NPC cells (61). Recently, a study showed that miR-26a-5p binds to the 3'-untranslated region of PTGS2, thus reducing PTGS2 protein levels and further inhibiting NPC development (62). Taken together, these reports are consistent with our findings and support their credibility.

Our study also identified an mRNA-TF interaction network for the five key genes. The network comprised five mRNAs (PLAUR, PTGS2, SERPINE1, CHI3L1, and ITGAV) and 80 TFs, forming a total of 152 mRNA-TF interaction pairs. Many studies have identified TFs that are relevant to tumorigenesis and metastasis. For example, Valencia et al. found that heterozygous Coffin-Siris syndrome (CSS)-associated SMARCB1 mutations result in dominant gene regulatory and morphologic changes during induced pluripotent stem cell (iPSC)-neuronal differentiation (63), while Radko-Juettner et al. showed that cancer results from the DCAF5-mediated degradation of SWI/SNF complexes (64). However, the specific mechanisms of action of these TFs in NPC need to be further investigated.

Our study had some limitations. For example, while the sample size of our study was adequate for initial identification and validation, this investigation focused on NPC as a holistic entity; future studies should both expand the cohort size and perform stratified validation experiments to confirm the generalizability of our results. Additionally, functional studies need to be conducted to elucidate the precise roles of the identified ARDEGs in NPC progression and anoikis resistance. Further, future research should be performed to explore the potential therapeutic implications of targeting these genes and pathways, including the development of novel treatments aimed at enhancing anoikis sensitivity in NPC cells.

Conclusions

This study identified and validated critical ARDEGs in NPC. Our findings provide valuable insights into the molecular mechanisms underlying NPC, and highlight some potential biomarkers and therapeutic targets. Future research should focus on further functional studies and clinical validation to translate these findings into effective diagnostic and therapeutic strategies for NPC.

Acknowledgments

We would like to sincerely thank the founders of the public databases, including GEO, GeneCards, MSigDB, KEGG, GO, STRING, ENCORI, CHIPBase, and CTD, for providing open access. The authors also appreciate the great support from Dr. Ligen Yu (Nanyang Technological University, Singapore) in improving the quality of this paper.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1263/rc

Data Sharing Statement: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1263/dss

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1263/prf

Funding: This study was supported in part by grants from the Fujian Clinical Research Center for Radiation and Therapy of Digestive, Respiratory, and Genitourinary Malignancies (No. 2021Y2014), the Joint Funds for the Innovation of Science and Technology, Fujian Province (No. 2024Y9597), and the National Clinical Key Specialty Construction Program.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1263/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All the patients involved in this study provided written informed consent, and the study was approved by the Biomedical Ethics Committee of Fujian Cancer Hospital (No. K2024-338-01). The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Bray F, Ferlay J, Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394-424. [Crossref] [PubMed]
Bossi P, Chan AT, Licitra L, et al. Nasopharyngeal carcinoma: ESMO-EURACAN Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 2021;32:452-65. [Crossref] [PubMed]
Wen X, Liu X, Mao YP, et al. Long non-coding RNA DANCR stabilizes HIF-1α and promotes metastasis by interacting with NF90/NF45 complex in nasopharyngeal carcinoma. Theranostics 2018;8:5676-89. [Crossref] [PubMed]
Wen X, Tang X, Li Y, et al. Microarray Expression Profiling of Long Non-Coding RNAs Involved in Nasopharyngeal Carcinoma Metastasis. Int J Mol Sci 2016;17:1956. [Crossref] [PubMed]
Ding T, Zhang Y, Ren Z, et al. EBV-Associated Hub Genes as Potential Biomarkers for Predicting the Prognosis of Nasopharyngeal Carcinoma. Viruses 2023;15:1915. [Crossref] [PubMed]
Xu Y, Huang X, Ye W, et al. Comprehensive analysis of key genes associated with ceRNA networks in nasopharyngeal carcinoma based on bioinformatics analysis. Cancer Cell Int 2020;20:408. [Crossref] [PubMed]
Liu C, Ni C, Li C, et al. Lactate-related gene signatures as prognostic predictors and comprehensive analysis of immune profiles in nasopharyngeal carcinoma. J Transl Med 2024;22:1116. [Crossref] [PubMed]
Liu Y, Xie Y, Wang Y. Exploring lipid metabolism-associated gene biomarkers and their regulatory mechanisms in nasopharyngeal carcinoma. Cancer Biomark 2025;42:18758592241301683. [Crossref] [PubMed]
Tan Y, Zhou J, Liu K, et al. Novel prognostic biomarkers in nasopharyngeal carcinoma unveiled by mega-data bioinformatics analysis. Front Oncol 2024;14:1354940. [Crossref] [PubMed]
Taddei ML, Giannoni E, Fiaschi T, et al. Anoikis: an emerging hallmark in health and diseases. J Pathol 2012;226:380-93. [Crossref] [PubMed]
Wang B, Wang W, Wang H, et al. Microarray Analysis of Novel Genes Involved in Nasopharyngeal Carcinoma. Bull Exp Biol Med 2021;170:658-64. [Crossref] [PubMed]
An F, Zhang Z, Xia M. Functional analysis of the nasopharyngeal carcinoma primary tumor-associated gene interaction network. Mol Med Rep 2015;12:4975-80. [Crossref] [PubMed]
Wang Y, Zou Y, Chen X, et al. Relevance of pyroptosis-associated genes in nasopharyngeal carcinoma diagnosis and subtype classification. J Gene Med 2024;26:e3653. [Crossref] [PubMed]
Barrett T, Troup DB, Wilhite SE, et al. NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res 2007;35:D760-5. [Crossref] [PubMed]
Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 2007;23:1846-7. [Crossref] [PubMed]
Sengupta S, den Boon JA, Chen IH, et al. Genome-wide expression profiling reveals EBV-associated inhibition of MHC class I expression in nasopharyngeal carcinoma. Cancer Res 2006;66:7999-8006. [Crossref] [PubMed]
Fan C, Wang J, Tang Y, et al. Upregulation of long non-coding RNA LOC284454 may serve as a new serum diagnostic biomarker for head and neck cancers. BMC Cancer 2020;20:917. [Crossref] [PubMed]
Bo H, Gong Z, Zhang W, et al. Upregulated long non-coding RNA AFAP1-AS1 expression is associated with progression and poor prognosis of nasopharyngeal carcinoma. Oncotarget 2015;6:20404-18. [Crossref] [PubMed]
Bose S, Yap LF, Fung M, et al. The ATM tumour suppressor gene is down-regulated in EBV-associated nasopharyngeal carcinoma. J Pathol 2009;217:345-52. [Crossref] [PubMed]
Leek JT, Johnson WE, Parker HS, et al. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 2012;28:882-3. [Crossref] [PubMed]
Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47. [Crossref] [PubMed]
Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545-50. [Crossref] [PubMed]
Liberzon A, Birger C, Thorvaldsdóttir H, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 2015;1:417-25. [Crossref] [PubMed]
Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7. [Crossref] [PubMed]
Gruber HE, Hoelscher GL, Ingram JA, et al. Genome-wide analysis of pain-, nerve- and neurotrophin -related gene expression in the degenerating human annulus. Mol Pain 2012;8:63. [Crossref] [PubMed]
Liu Y, Zhao H. Variable importance-weighted Random Forests. Quant Biol 2017;5:338-51.
Engebretsen S, Bohlin J. Statistical predictions with glmnet. Clin Epigenetics 2019;11:123. [Crossref] [PubMed]
Cai W, van der Laan M. Nonparametric bootstrap inference for the targeted highly adaptive least absolute shrinkage and selection operator (LASSO) estimator. Int J Biostat 2020; Epub ahead of print. [Crossref]
Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 2010;33:1-22.
Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol 2010;5:1315-6. [Crossref] [PubMed]
Yu G. Gene Ontology Semantic Similarity Analysis Using GOSemSim. Methods Mol Biol 2020;2117:207-15. [Crossref] [PubMed]
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27-30. [Crossref] [PubMed]
Yu G, Wang LG, Han Y, et al. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 2012;16:284-7. [Crossref] [PubMed]
Szklarczyk D, Gable AL, Lyon D, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 2019;47:D607-13. [Crossref] [PubMed]
Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003;13:2498-504. [Crossref] [PubMed]
Varadi M, Anyango S, Deshpande M, et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 2022;50:D439-44. [Crossref] [PubMed]
Li JH, Liu S, Zhou H, et al. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 2014;42:D92-7. [Crossref] [PubMed]
Zhou KR, Liu S, Sun WJ, et al. ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data. Nucleic Acids Res 2017;45:D43-50. [Crossref] [PubMed]
Davis AP, Grondin CJ, Johnson RJ, et al. Comparative Toxicogenomics Database (CTD): update 2021. Nucleic Acids Res 2021;49:D1138-43. [Crossref] [PubMed]
Chen Y, Huang X, Zhu K, et al. LIMD2 is a Prognostic and Predictive Marker in Patients With Esophageal Cancer Based on a ceRNA Network Analysis. Front Genet 2021;12:774432. [Crossref] [PubMed]
Zhang X, Li X, Wang C, et al. Identification of markers for predicting prognosis and endocrine metabolism in nasopharyngeal carcinoma by miRNA-mRNA network mining and machine learning. Front Endocrinol (Lausanne) 2023;14:1174911. [Crossref] [PubMed]
Adamer MF, Brüningk SC, Tejada-Arranz A, et al. reComBat: batch-effect removal in large-scale multi-source gene-expression data integration. Bioinform Adv 2022;2:vbac071. [Crossref] [PubMed]
Sauerbrei W, Royston P, Binder H. Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Stat Med 2007;26:5512-28. [Crossref] [PubMed]
Xin M, Ji X, De La Cruz LK, et al. Strategies to target the Hedgehog signaling pathway for cancer therapy. Med Res Rev 2018;38:870-913. [Crossref] [PubMed]
Ruch JM, Kim EJ. Hedgehog signaling pathway and cancer therapeutics: progress to date. Drugs 2013;73:613-23. [Crossref] [PubMed]
Alizadeh H, Akbarabadi P, Dadfar A, et al. A comprehensive overview of ovarian cancer stem cells: correlation with high recurrence rate, underlying mechanisms, and therapeutic opportunities. Mol Cancer 2025;24:135. [Crossref] [PubMed]
Onishi H, Katano M. Hedgehog signaling pathway as a therapeutic target in various types of cancer. Cancer Sci 2011;102:1756-60. [Crossref] [PubMed]
Turczynski S, Titeux M, Tonasso L, et al. Targeted Exon Skipping Restores Type VII Collagen Expression and Anchoring Fibril Formation in an In Vivo RDEB Model. J Invest Dermatol 2016;136:2387-95. [Crossref] [PubMed]
Hwang S, Kim HE, Min M, et al. Epigenetic Silencing of SPINT2 Promotes Cancer Cell Motility via HGF-MET Pathway Activation in Melanoma. J Invest Dermatol 2015;135:2283-91. [Crossref] [PubMed]
Röder B, Frühwirth K, Vogl C, et al. Impact of long-term storage on stability of standard DNA for nucleic acid-based methods. J Clin Microbiol 2010;48:4260-2. [Crossref] [PubMed]
Luo Q, Long J, Hu L, et al. EBV Reactivation-associated gene signature predicts poor prognosis in nasopharyngeal carcinoma. J Transl Med 2025;23:616. [Crossref] [PubMed]
Ding Y, Pan Y, Liu S, et al. Elevation of MiR-9-3p suppresses the epithelial-mesenchymal transition of nasopharyngeal carcinoma cells via down-regulating FN1, ITGB1 and ITGAV. Cancer Biol Ther 2017;18:414-24. [Crossref] [PubMed]
Zhao T, Su Z, Li Y, et al. Chitinase-3 like-protein-1 function and its role in diseases. Signal Transduct Target Ther 2020;5:201. [Crossref] [PubMed]
Bergmann OJ, Johansen JS, Klausen TW, et al. High serum concentration of YKL-40 is associated with short survival in patients with acute myeloid leukemia. Clin Cancer Res 2005;11:8644-52. [Crossref] [PubMed]
Schmidt H, Johansen JS, Gehl J, et al. Elevated serum level of YKL-40 is an independent prognostic factor for poor survival in patients with metastatic melanoma. Cancer 2006;106:1130-9. [Crossref] [PubMed]
Cintin C, Johansen JS, Christensen IJ, et al. High serum YKL-40 level after surgery for colorectal carcinoma is related to short survival. Cancer 2002;95:267-74. [Crossref] [PubMed]
Jensen BV, Johansen JS, Price PA. High levels of serum HER-2/neu and YKL-40 independently reflect aggressiveness of metastatic breast cancer. Clin Cancer Res 2003;9:4423-34.
Li D, Fan G, Zhou Y. Chitinase 3 like-1 activates the Akt pathway, inducing NF-κB-dependent release of pro-inflammatory cytokines and promoting the proliferative ability in nasopharyngeal carcinoma cells. Cytokine 2024;179:156631. [Crossref] [PubMed]
Wehbe S, Gallwas J, Gründker C. Inhibition of Plasminogen Activator Inhibitor-1 (PAI-1) by Tiplaxtinin Reduces Aggressiveness of Cervical Carcinoma Cells. Anticancer Res 2025;45:1793-805. [Crossref] [PubMed]
Sang Y, Chen MY, Luo D, et al. TEL2 suppresses metastasis by down-regulating SERPINE1 in nasopharyngeal carcinoma. Oncotarget 2015;6:29240-53. [Crossref] [PubMed]
Mizokami H, Okabe A, Choudhary R, et al. Enhancer infestation drives tumorigenic activation of inactive B compartment in Epstein-Barr virus-positive nasopharyngeal carcinoma. EBioMedicine 2024;102:105057. [Crossref] [PubMed]
Cai B, Qu X, Kan D, et al. miR-26a-5p suppresses nasopharyngeal carcinoma progression by inhibiting PTGS2 expression. Cell Cycle 2022;21:618-29. [Crossref] [PubMed]
Valencia AM, Collings CK, Dao HT, et al. Recurrent SMARCB1 Mutations Reveal a Nucleosome Acidic Patch Interaction Site That Potentiates mSWI/SNF Complex Chromatin Remodeling. Cell 2019;179:1342-1356.e23. [Crossref] [PubMed]
Radko-Juettner S, Yue H, Myers JA, et al. Targeting DCAF5 suppresses SMARCB1-mutant cancer by stabilizing SWI/SNF. Nature 2024;628:442-9. [Crossref] [PubMed]

(English Language Editor: L. Huleatt)

Cite this article as: Huang C, Peng Y, Zheng X, Cai S, Lin Z, Zheng Y, Zheng W, Peng F, Xu Y. Identification and validation of anoikis-related differentially expressed genes in nasopharyngeal carcinoma. Transl Cancer Res 2025;14(7):4429-4446. doi: 10.21037/tcr-2025-1263
