The identification and prediction of lung adenocarcinoma prognosis using a novel gene signature associated with DNA replication
Highlight box
Key findings
• DNA replication is the cellular process most susceptible to DNA damage and thus to the risk of carcinogenesis. Classification based on DNA replication-related genes may serve as a good prognostic indicator of lung adenocarcinoma (LUAD).
What is known, and what is new?
• LUAD can be evaluated using DNA replication-related genes.
• This study identified LUAD subtypes and biomarkers associated with DNA replication.
What is the implication, and what should change now?
• Our results offer new perspectives for the diagnosis and therapeutic strategies of LUAD, potentially advancing clinical approaches to this disease.
Introduction
Lung cancer is the most prevalent and lethal type of cancer, and lung adenocarcinoma (LUAD) accounts for approximately 60% of non-small cell lung cancer cases (1,2). Patients with LUAD usually undergo treatments such as surgery, chemotherapy, radiation therapy, and molecular targeted therapy. However, due to rapid tumor growth and resistance to existing treatments, only about 30% of LUAD patients survive beyond five years (1,3). Consequently, it is critical to enhance understanding of the mechanisms underlying LUAD.
In addition to curative surgeries like lobectomy or segmental lung resection with or without adjuvant chemotherapy or radiotherapy, more precise treatments like epidermal growth factor receptor (EGFR)-targeted therapy and immunotherapy have enhanced the survival of LUAD patients (4). Moreover, Kirsten rat sarcoma 2 viral oncogene homolog (KRAS) mutant lung cancer is the most prevalent form of LUAD (3). However, the prognosis of LUAD patients continues to be unfavorable, and the 5-year overall survival (OS) rate of LUAD patients is still only approximately 16% (1-4). Based on the most recent tumor-node-metastasis (TNM) staging system, the 5-year recurrence rates vary from around 20% in stage I to 50% in stage III (5,6). Thus, it is essential to extend our understanding of the potential mechanisms driving LUAD progression to improve patient prognosis.
Genome instability is a characteristic feature of cancer, and DNA replication is the cellular process most susceptible to it (7). High levels of DNA damage can lead to replication stress, causing genome instability (7). A novel and promising therapeutic strategy focusing on enhancing the integration of damaged deoxynucleoside triphosphates (dNTPs) in cancer cells has been suggested. Targeting NUDT1 (also known as MTH1), a protein that prevents the misincorporation of oxidized dNTPs during replication and is essential in cancer cells but dispensable in normal cells, can lead to the selective killing of cancer cells (7-9). Understanding the molecular mechanisms underlying replication stress is essential to understanding tumorigenesis. We performed a systematic analysis to assess the expression levels of DNA replication-related genes in LUAD, examined the relationship between DNA replication and the tumor immune microenvironment, and evaluated its prognostic significance. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2024-2536/rc).
Methods
The Cancer Genome Atlas (TCGA)-LUAD dataset
The TCGA-LUAD dataset was obtained from the TCGA database (https://portal.gdc.cancer.gov/), and the LUAD RNA-sequencing data were analyzed. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
Identification of prognostic genes in LUAD
A univariate Cox analysis was conducted to identify the genes associated with the prognosis of LUAD patients. The top 20 genes, ranked by P value, are shown. Using the R package clusterProfiler, a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was conducted to identify the signaling pathways associated with the prognostic genes in LUAD.
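Pathway over-representation analyses of this kind are commonly based on a hypergeometric test. The following minimal Python sketch illustrates that calculation only; it is not the authors' R workflow, and the function name and all gene counts are hypothetical values chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric P value: P(X >= n_overlap).

    n_background: number of genes in the background set
    n_pathway:    number of background genes annotated to the pathway
    n_selected:   number of selected (e.g., prognostic) genes
    n_overlap:    number of selected genes that fall in the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 background genes, 5 in the pathway,
# 4 selected genes, 2 of which belong to the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(f"enrichment P value: {p_value:.4f}")
```

In practice, the P values obtained for many pathways would also be adjusted for multiple testing.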
Identification of tumor subtypes based on the DNA replication-related pathways in LUAD
ConsensusClusterPlus (version 1.54.0) was used to perform a consensus clustering analysis of the TCGA-LUAD dataset to investigate the expression of the DNA replication-related genes and to identify LUAD subtypes. The number of clusters (k) was varied from 2 to 6, and a heatmap was generated using the R package pheatmap (version 1.0.12). A Kaplan-Meier analysis was conducted to compare the OS between the subgroups.
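As an illustration of the Kaplan-Meier step, the product-limit estimator can be sketched in a few lines of Python. This is a simplified sketch and not the authors' R code; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up durations
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators.
for time, prob in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
    print(time, round(prob, 3))
```

Applying the estimator to each subgroup separately gives the curves that a log-rank test or Cox model would then compare.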
Identification of DNA replication-related differentially expressed genes (DEGs) in LUAD
The DESeq2 R package was used to identify the DEGs between the different LUAD clusters. A P value <0.05 and |log2 fold change| >1 were considered significant. For the visualization of the DEGs, the ggplot2 R package was used to create volcano plots. A heatmap of the DEGs was generated using the pheatmap R package (version 1.0.12). To analyze the Gene Ontology (GO) and KEGG terms, the clusterProfiler package was used.
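The significance criteria above amount to a simple filter on the differential expression results. A minimal Python sketch follows, assuming the results are available as a mapping from gene to (log2 fold change, P value); the function name, gene labels, and values are hypothetical.

```python
def select_degs(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and downregulated DEGs.

    results: dict mapping gene name -> (log2_fold_change, p_value)
    A gene is kept when p_value < p_cutoff and |log2_fold_change| > lfc_cutoff.
    """
    up, down = [], []
    for gene, (log2fc, p_value) in results.items():
        if p_value < p_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Hypothetical results table (gene names and numbers are made up).
toy_results = {
    "GENE_A": (2.3, 0.001),   # upregulated, significant
    "GENE_B": (-1.8, 0.010),  # downregulated, significant
    "GENE_C": (0.4, 0.0001),  # significant P value but small fold change
    "GENE_D": (3.1, 0.200),   # large fold change but not significant
}
up_genes, down_genes = select_degs(toy_results)
print("up:", up_genes, "down:", down_genes)
```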
Immune activity between the two DNA replication-related clusters in LUAD
An assessment of immune activity was conducted by CIBERSORT using the genes related to DNA replication. Using eight immune checkpoint-related genes (i.e., CD274, PDCD1, PDCD1LG2, CTLA4, LAG3, HAVCR2, TIGIT, and SIGLEC15), the two DNA replication clusters were compared in terms of immune activity. Heatmaps and boxplots were generated using the R packages pheatmap and ggplot2. The infiltration of immune cells and the activation of immune pathways were compared using the Wilcoxon test. A P value <0.05 was considered statistically significant.
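For readers unfamiliar with the Wilcoxon rank-sum comparison between two groups, a minimal Python sketch is given below. It uses the normal approximation without tie or continuity correction, so its P values can differ from those of R's wilcox.test, which uses an exact test for small samples; the function names and the sample values are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic of x, approximate two-sided P value).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical immune-cell scores in two clusters.
u_stat, p_value = rank_sum_test([0.10, 0.15, 0.20], [0.30, 0.35, 0.40])
print(u_stat, round(p_value, 4))
```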
Development of the DNA replication-related gene prognostic model
A prognostic model was constructed in the TCGA-LUAD cohort using the “glmnet” R package to evaluate the prognostic value of the DNA replication-related genes. The penalty parameter (lambda) was selected according to the minimum criterion, and the variables with non-zero coefficients were retained. The risk score was calculated using the following formula: Risk score = Σ (the expression level of each gene × the corresponding coefficient). Patients in the TCGA-LUAD dataset were classified into high- and low-risk subgroups based on the median risk score. Using the “survival” R package, we compared the OS between the high- and low-risk groups. A Cox proportional hazards analysis was conducted to determine the hazard ratios (HRs) with 95% confidence intervals (CIs).
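The risk score and the median split can be sketched as follows. This is a minimal Python illustration; the gene labels, coefficients, and expression values are hypothetical and are not the fitted values of the model.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of (coefficient x expression level) over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify_by_median(scores):
    """Label each patient 'high' if the score exceeds the median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values (for illustration only).
coefficients = {"geneA": 0.5, "geneB": -0.2}
patients = {
    "p1": {"geneA": 2.0, "geneB": 1.0},
    "p2": {"geneA": 4.0, "geneB": 0.0},
    "p3": {"geneA": 1.0, "geneB": 5.0},
    "p4": {"geneA": 3.0, "geneB": 2.5},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify_by_median(scores)
print(scores)
print(groups)
```

Patients whose score equals the median are assigned to the low-risk group here; that tie-breaking rule is an assumption of this sketch.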
Statistical analysis
A survival analysis was performed to determine the independent prognostic factors for LUAD, with a significance threshold of P<0.05. After adjusting for multiple comparisons, two-tailed P values below 0.05 were considered statistically significant.
Results
Identification of KEGG pathways and prognostic genes in LUAD
A univariate Cox analysis was conducted to identify the genes related to prognosis in LUAD patients, and a total of 2,412 prognostic genes were identified. The top 20 genes are depicted according to their P values (Figure 1A). A KEGG enrichment analysis was then conducted using these prognostic genes. The KEGG enrichment analysis indicated that pathways such as metabolic pathways, DNA replication, proteasome, and the cell cycle may play significant roles in LUAD (Figure 1B). Given the close association between DNA replication and cancer, we then evaluated the functional implications of DNA replication status in patients with LUAD.

LUAD subtypes based on the DNA replication-related genes
DNA replication has been found to be closely associated with tumor development (7-9), but further research is needed to determine the role of DNA replication in LUAD. A consensus clustering analysis was carried out using the identified DNA replication-related genes (i.e., FEN1, MCM5, POLD2, MCM4, MCM6, SSBP1, POLE2, RFC2, MCM2, PCNA, POLA2, MCM7, RFC3, POLE4, and RPA3) to further stratify the TCGA-LUAD patient cohort.
A consensus clustering analysis and a principal component analysis of the TCGA-LUAD patient cohort were conducted, and two clusters were distinguished (Figure 2A,2B). The expression profiles of the 15 DNA replication-related genes in the two risk groups of TCGA-LUAD patients, the high-risk group (G1) and the low-risk group (G2), are shown in a heatmap (Figure 2C). A statistically significant difference in OS was found between clusters 1 and 2 (HR: 1.532, 95% CI: 1.136–2.064, P=0.005; Figure 2D).

Identification of the underlying mechanisms between the two risk groups of LUAD patients
To examine the distinct mechanisms underlying the two risk groups, we identified the DEGs using the following criteria: P<0.05 and |log2 fold change| >1. The results were depicted in a volcano plot [Group 1 (G1) vs. Group 2 (G2); Figure 3A]. The expression patterns of the top 50 DEGs demonstrated diverse trends across the two clusters as shown in the heatmap (Figure 3B).

To further explore the biological significance of these DEGs, we performed GO and KEGG enrichment analyses. The results of the GO analysis revealed that the upregulated genes were mainly related to the hallmarks of cancer initiation and progression (e.g., chromosome segregation, DNA replication, the cell cycle checkpoint, and DNA helicase activity) (Figure 3C), while the downregulated genes were mainly related to leukocyte activation involved in inflammatory macrophage activation, and passive transmembrane transporter activity (Figure 3D). Similarly, the results of the KEGG analysis showed that the upregulated genes were primarily associated with the cell cycle, p53 signaling pathway, DNA replication, and base excision repair (Figure 3E), while the downregulated genes were primarily associated with arachidonic acid metabolism, linoleic acid metabolism, and complement and coagulation cascades (Figure 3F). Many studies have shown that chromosome segregation, DNA replication, and the cell cycle checkpoint are tumor markers (10,11). These results suggest that the proliferation and migration abilities of the tumor cells tend to be stronger in the G1 LUAD subtype than in the G2 LUAD subtype.
Immune activity between two DNA replication-related groups in LUAD
Numerous studies have reported a significant link between DNA replication and immune activity across various cancers (6-8). Initially, a Spearman correlation analysis was conducted to assess the relationship between the expression levels of DNA replication-related genes and immune scores. Several DNA replication-related genes (e.g., FEN1, MCM5, POLD2, MCM4, MCM6, SSBP1, POLE2, RFC2, MCM2, PCNA, POLA2, MCM7, RFC3, POLE4, and RPA3) were found to be strongly correlated with various immune cell types. We then compared the immune activity between the two DNA replication-related clusters in LUAD patients. The boxplots revealed significant differences in the immune cells, including the B cells, endothelial cells, natural killer (NK) cells, cluster of differentiation (CD)4+ T cells, and CD8+ T cells, between the G1 LUAD samples and G2 LUAD samples (Figure 4A).

Additionally, the boxplots showed that 5 of the 10 immune checkpoint inhibitor (ICI)-related genes (i.e., CD274, LAG3, PDCD1, PDCD1LG2, and SIGLEC15) were more highly expressed in the G1 LUAD samples than in the G2 LUAD samples (Figure 4B). These findings suggest a strong association between DNA replication processes and immune activity.
The tumor stemness of the two LUAD subtypes
Cancer stem cells (CSCs) are crucial for tumor initiation, recurrence, spread, and resistance to chemotherapy. There was a significant disparity in the CSC scores between the G1 and G2 risk groups (Figure 5A). Based on these findings, DNA replication-related classification could serve as a potential predictor of drug resistance.

Patients with elevated tumor immune dysfunction and exclusion (TIDE) scores have been observed to have a low survival rate following immune checkpoint blockade (ICB) treatment. The G1 patients had higher TIDE scores than the G2 patients (Figure 5B), suggesting that patients in the G1 risk group may respond insufficiently to ICB therapy and have a poor prognosis.
Construction of a prognostic model based on DNA replication-related genes
Least absolute shrinkage and selection operator (LASSO) and Cox regression analyses were conducted, and 15 genes associated with DNA replication in LUAD were identified. A gene signature comprising these 15 genes was created based on the optimal lambda value (Figure 6A). Using this gene signature, TCGA-LUAD patients were stratified into high- and low-risk groups (Figure 6B). The survival analysis indicated that patients classified as low-risk had better OS than those classified as high-risk (HR: 1.833, 95% CI: 1.361–2.468, P<0.001; Figure 6B). The accuracy of the prognostic model was evaluated using receiver operating characteristic (ROC) curves, which yielded areas under the curve (AUCs) of 0.629 for 1 year, 0.634 for 3 years, and 0.582 for 5 years (Figure 6C), suggesting that the model has modest prognostic potential.
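To make the AUC values concrete, the AUC can be read as the probability that a randomly chosen patient who died has a higher risk score than a randomly chosen patient who survived. The Python sketch below computes this for a plain binary outcome; it is a simplification that ignores censoring, unlike time-dependent ROC analysis, and the function name and data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: risk scores
    labels: 1 for patients with the event, 0 for patients without it
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect discrimination.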

Discussion
In this study, a total of 2,412 prognostic genes were obtained, and the DNA replication-related pathways closely associated with LUAD were identified by a KEGG enrichment analysis. The patients from the TCGA-LUAD cohort were divided into high- and low-risk groups based on the 15 DNA replication-related genes (i.e., FEN1, MCM5, POLD2, MCM4, MCM6, SSBP1, POLE2, RFC2, MCM2, PCNA, POLA2, MCM7, RFC3, POLE4, and RPA3). The upregulated genes were mainly related to the hallmarks of cancer (e.g., chromosome segregation, DNA replication, the cell cycle checkpoint, and DNA helicase activity), while the downregulated genes were mainly related to leukocyte activation involved in inflammatory macrophage activation, and passive transmembrane transporter activity. The levels of immune cells, including the B cells, endothelial cells, NK cells, CD4+ T cells, and CD8+ T cells, differed significantly between the G1 and G2 LUAD samples, and five of the 10 ICI-related genes were more highly expressed in the G1 LUAD samples than in the G2 LUAD samples. The tumor stemness of the two LUAD subtypes also differed significantly. Furthermore, a six-gene (FEN1, MCM5, POLD2, MCM4, SSBP1, and POLE4) prognostic model was constructed to predict the prognosis of LUAD patients.
We conducted a systematic analysis to investigate the role of DNA replication in LUAD. Genome instability is a characteristic feature of cancer, and DNA replication is the cellular process most vulnerable to it (7). High levels of DNA damage can cause replication stress, which leads to the genome instability that has previously been found to be associated with pre-cancerous and cancerous cells (7). A novel and promising therapeutic strategy that focuses on enhancing the integration of damaged dNTPs in cancer cells has been suggested. Cancer cells can be selectively killed by targeting NUDT1 (also known as MTH1), a protein that prevents the misincorporation of oxidized dNTPs during replication (7-9). Understanding the molecular mechanisms underlying replication stress is essential to understanding tumorigenesis. However, the role of DNA replication in LUAD requires further study.
A positive correlation was found between DNA replication-related gene expression and prognosis among LUAD patients, and a six-gene (FEN1, MCM5, POLD2, MCM4, SSBP1, and POLE4) prognostic model was constructed to predict the prognosis of LUAD patients. A previous study reported that flap endonuclease 1 (FEN1) promotes tumor progression and confers resistance to cisplatin in non-small cell lung cancer cells (10). Minichromosome maintenance complex component 5 (MCM5) aggravates the HDAC1-mediated malignant progression of lung cancer (11). Research has shown that DNA polymerase delta subunit 2 (POLD2) is activated by E2F1 to promote the proliferation of triple-negative breast cancer (12). Minichromosome maintenance complex component 4 (MCM4) acts as a biomarker for LUAD prognosis (13). The downregulation of mitochondrial single-stranded DNA binding protein (SSBP1) increases radiosensitivity in non-small cell lung cancer cells (14). DNA polymerase epsilon subunit 4 (POLE4), in complex with POLE3, acts as a histone H3-H4 chaperone that maintains chromatin integrity during DNA replication (15).
Also, the interplay between DNA replication and immune checkpoint regulation is particularly relevant in the context of cancer. Tumor cells often exploit DNA replication stress to evade immune detection, leading to poor prognoses and treatment outcomes. For example, cancer stem cells (CSCs) can exhibit increased DNA repair capabilities and immune evasion, which contribute to their radioresistance and therapeutic failure (16). Moreover, the accumulation of cytosolic DNA as a result of replication stress can trigger innate immune responses, highlighting a complex relationship in which DNA replication not only influences immune cell activity but also shapes the immune landscape within tumors (17). Thus, investigating the effects of DNA replication on immune cell function and checkpoint regulation is crucial for advancing our understanding of immune responses in health and disease, particularly in cancer immunotherapy (18,19).
This study highlights the significance of a comprehensive bioinformatics analysis in understanding LUAD. By identifying key genes associated with prognosis, the developed six-gene prognostic model can be a valuable tool for clinicians. It has the potential to enhance patient stratification and inform treatment strategies, ultimately aiming to improve outcomes for LUAD patients. Focusing on DNA replication-related genes as prognostic markers indeed represents a novel and insightful approach to LUAD research. This emphasis addresses the critical need for improved patient stratification, which can significantly enhance personalized treatment strategies.
There are some limitations in this study. First, the AUCs of our model (0.582–0.634) were modest compared with those of some previously reported signatures. For example, a previous study identified a 2-transcription factor signature (1-year AUC =0.73, 2-year AUC =0.60, and 3-year AUC =0.61) based on the TCGA database (20). In the research of Yang et al., genes related to fatty acid synthesis and metabolism were used to construct a prognostic model for hepatocellular carcinoma, and the ROC curves indicated that the AUC for the high-risk group exceeded 0.7 at the 1-, 3-, and 5-year marks (21). Second, further in vitro and in vivo research needs to be conducted to investigate the specific role of DNA replication in LUAD. A prospective study may uncover the mechanisms by which DNA replication-related genes contribute to LUAD progression, which could provide new strategies for the treatment of LUAD.
Conclusions
There is a close relationship between the DNA replication-related genes and the tumor classification of LUAD patients. Our innovative signature incorporating DNA replication-related genes was found to be a good prognostic predictor of LUAD. Our findings may provide novel insights into the diagnosis and treatment of LUAD.
Acknowledgments
We appreciate the unrestricted use of TCGA databases.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2024-2536/rc
Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2024-2536/prf
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2024-2536/coif). P.C. has received research funding from AstraZeneca, Amgen, Boehringer Ingelheim, Merck, Novartis, Roche, and Takeda; speaker’s honoraria from AstraZeneca, Gilead, Janssen, Merck, Novartis, Roche, Pfizer, Thermo Fisher, and Takeda; support for attending meetings from AstraZeneca, Eli Lilly, Daiichi Sankyo, Janssen, Gilead, Merck, Novartis, Pfizer, and Takeda; and personal fees for participating in advisory boards from AstraZeneca, Boehringer Ingelheim, Chugai, Janssen, Pfizer, Novartis, MSD, Takeda, and Roche, all outside the submitted work. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Testa U, Castelli G, Pelosi E. Alk-rearranged lung adenocarcinoma: From molecular genetics to therapeutic targeting. Tumori 2024;110:88-95.
2. Chen Q, Zheng X, Cheng W, et al. Landscape of targeted therapies for lung squamous cell carcinoma. Front Oncol 2024;14:1467898.
3. Moldvay J, Tímár J. KRASG12C mutant lung adenocarcinoma: unique biology, novel therapies and new challenges. Pathol Oncol Res 2023;29:1611580.
4. Wu YL, Tsuboi M, He J, et al. Osimertinib in Resected EGFR-Mutated Non-Small-Cell Lung Cancer. N Engl J Med 2020;383:1711-23.
5. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49.
6. Drosten M, Barbacid M. Targeting KRAS mutant lung cancer: light at the end of the tunnel. Mol Oncol 2022;16:1057-71.
7. MacGilvary N, Cantor SB. Positioning loss of PARP1 activity as the central toxic event in BRCA-deficient cancer. DNA Repair (Amst) 2024;144:103775.
8. Taiyab A, Ashraf A, Sulaimani MN, et al. Role of MTH1 in oxidative stress and therapeutic targeting of cancer. Redox Biol 2024;77:103394.
9. Ding Y, Liu Q. Targeting the nucleic acid oxidative damage repair enzyme MTH1: a promising therapeutic option. Front Cell Dev Biol 2024;12:1334417.
10. He L, Luo L, Zhu H, et al. FEN1 promotes tumor progression and confers cisplatin resistance in non-small-cell lung cancer. Mol Oncol 2017;11:640-54.
11. Zhang LL, Li Q, Zhong DS, et al. MCM5 Aggravates the HDAC1-Mediated Malignant Progression of Lung Cancer. Front Cell Dev Biol 2021;9:669132.
12. Zhang Z. POLD2 is activated by E2F1 to promote triple-negative breast cancer proliferation. Front Oncol 2022;12:981329.
13. Tan Y, Ding L, Li G. MCM4 acts as a biomarker for LUAD prognosis. J Cell Mol Med 2023;27:3354-62.
14. Wang Y, Hu L, Zhang X, et al. Downregulation of Mitochondrial Single Stranded DNA Binding Protein (SSBP1) Induces Mitochondrial Dysfunction and Increases the Radiosensitivity in Non-Small Cell Lung Cancer Cells. J Cancer 2017;8:1400-9.
15. Bellelli R, Belan O, Pye VE, et al. POLE3-POLE4 Is a Histone H3-H4 Chaperone that Maintains Chromatin Integrity during DNA Replication. Mol Cell 2018;72:112-126.e5.
16. Meyer F, Engel AM, Krause AK, et al. Efficient DNA Repair Mitigates Replication Stress Resulting in Less Immunogenic Cytosolic DNA in Radioresistant Breast Cancer Stem Cells. Front Immunol 2022;13:765284.
17. Sugimura N, Kubota E, Mori Y, et al. Reovirus combined with a STING agonist enhances anti-tumor immunity in a mouse model of colorectal cancer. Cancer Immunol Immunother 2023;72:3593-608.
18. Sayson SL, Fan JN, Ku CL, et al. DNAJA3 regulates B cell development and immune function. Biomed J 2024;47:100628.
19. Murayama T, Nakayama J, Jiang X, et al. Targeting DHX9 Triggers Tumor-Intrinsic Interferon Response and Replication Stress in Small Cell Lung Cancer. Cancer Discov 2024;14:468-91.
20. Zhengdong A, Xiaoying X, Shuhui F, et al. Identification of fatty acids synthesis and metabolism-related gene signature and prediction of prognostic model in hepatocellular carcinoma. Cancer Cell Int 2024;24:130.
21. Yang Y, Ye X, Zhang H, et al. A novel transcription factor-based signature to predict prognosis and therapeutic response of hepatocellular carcinoma. Front Genet 2022;13:1068837.
(English Language Editor: L. Huleatt)