Identification of lysine crotonylation-driven molecular clusters and immune dysregulation in HBV-related hepatocellular carcinoma via bioinformatics and machine learning
Highlight box
Key findings
• Two distinct lysine crotonylation-related molecular clusters were identified, and a predictive model was established to assess the likelihood that patients infected with hepatitis B virus (HBV) will develop hepatocellular carcinoma (HCC).
What is known and what is new?
• The lysine crotonylation-related genes are associated with HBV-HCC.
• We developed and externally validated a predictive model based on lysine crotonylation-related genes.
What is the implication, and what should change now?
• A predictive model based on lysine crotonylation predicts prognosis and immune dysregulation of HBV-HCC patients.
Introduction
Chronic infection with the hepatitis B virus (HBV) is a major risk factor for mortality associated with cirrhosis and liver carcinoma (1). An estimated 248 million individuals worldwide live with chronic HBV infection, which causes over 600,000 deaths each year through related complications (2). Notably, chronic HBV infection accounts for nearly half of hepatocellular carcinoma (HCC) cases globally and contributes to one-third of HCC-related deaths worldwide (3,4). This epidemiological burden displays significant geographical heterogeneity (prevalence: 0.1–35%) and is particularly concentrated in HBV-endemic regions such as sub-Saharan Africa and East Asia (1,5,6). In China, despite a 13% reduction in the age-standardized incidence rate (from 6.58 to 5.73 per 100,000 population) between 1990 and 2021, the absolute number of HBV-HCC cases nearly doubled (from 63,118 to 118,665) (7), underscoring the persistent public health challenge. The insidious progression of HBV-HCC frequently results in diagnosis at an advanced stage, where limited therapeutic options contribute to a 5-year survival rate below 18% (8). The combination of liver partition and portal vein ligation, or of transarterial chemoembolization and portal vein embolization, has been used for staged hepatectomy in patients with HBV-HCC (9). One study suggested that beta-actin (ACTB) can serve as a prognostic biomarker for lenvatinib treatment in HCC: in early-stage HCC patients, lower levels of ACTB were correlated with a better response to lenvatinib (10). However, biomarkers that predict which patients are most likely to develop HCC following HBV infection remain to be identified. This clinical reality calls for the urgent development of predictive markers with improved sensitivity and specificity.
Emerging evidence implicates lysine crotonylation (Kcr), a dynamic post-translational modification (PTM) mediated by writer/reader/eraser protein systems (11), as a critical regulator of oncogenic metabolism. While global Kcr suppression characterizes HCC progression (12), specific non-histone modifications demonstrate paradoxical tumor-promoting effects. For instance, SEPT2-K74 crotonylation activates AKT signaling to drive metastasis (13), and distinct crotonyltransferase signatures correlate with poor HCC prognosis (14). Mechanistically, lysine crotonylation modulates key pathways including pyruvate metabolism, tricarboxylic acid (TCA) cycle regulation, and immune evasion (15,16). However, the landscape of lysine crotonylation-related genes (LCRGs) in HBV-HCC remains unexplored, representing a critical knowledge gap in virus-driven oncogenesis.
The integration of machine learning (ML) into biomedical research represents a paradigm shift, particularly in personalized medicine and computer-aided diagnosis. Its capacity for analyzing complex biological datasets has revolutionized multiple domains, including genomic exploration (17), where ML algorithms excel in clustering microarray data and deciphering RNA sequencing patterns. This analytical power extends to clinical decision-making, as demonstrated by ML models capable of stratifying cirrhosis patients’ HCC risks with enhanced precision (18), potentially enabling earlier interventions. Notably, the multilayer perceptron architecture has shown exceptional utility, both in predicting post-hepatectomy HCC recurrence probability (19) and in differentiating HBV-HCC from liver cirrhosis through serum peptidome analysis (20). These technological breakthroughs align with the emerging paradigm of precision oncology, where multi-omics integration enables biomarker discovery and personalized therapeutic strategies (21,22). Furthermore, ML-driven biomarker discovery has identified novel genetic signatures associated with the pathogenesis of HBV-HCC (23,24), offering targets for both diagnostic development and therapeutic innovation. Nevertheless, the synergistic potential of ML and crotonylation biology in HBV-HCC remains untapped.
To address these gaps, we present an integrative multi-omics investigation combining the weighted gene co-expression network analysis (WGCNA) algorithm with ensemble ML algorithms. Our systematic approach encompasses: (I) transcriptomic profiling of LCRG dysregulation in HBV-HCC tissues versus adjacent non-tumoral liver tissues from HBV-infected patients; (II) identification of cluster-specific gene modules through WGCNA; (III) development and validation of a predictive nomogram using four ML paradigms; (IV) comprehensive evaluation of model performance through clinical impact curves and external validation; and (V) prognostic characterization of diagnostic biomarkers across HCC subtypes. To our knowledge, this study establishes the first evidence-based framework linking lysine crotonylation dynamics with HBV-HCC, while demonstrating the clinical utility of ML-driven LCRG signatures. Figure 1 presents the study flowchart. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-728/rc).
Methods
Data collection and preparation
The raw expression profile datasets (GSE55092, GSE121248, and GSE47197) were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). GSE55092 (81 non-tumoral liver tissues from HBV patients and 39 HBV-HCC tumor tissues) and GSE121248 (37 non-tumoral liver tissues from HBV patients and 70 HBV-HCC tumor tissues) were merged and designated as the primary training set. GSE47197, which includes 63 non-tumoral liver tissues from HBV patients and 61 HBV-HCC tumor tissues, was used as the external validation set. Additional descriptive details are available in Table S1. To transform the probe expression matrix into a gene expression matrix, we employed Strawberry Perl (version 5.30.0.1) in conjunction with platform annotation files. The two training datasets were then merged, and batch effects were mitigated using the “ComBat” method from the “sva” package in the R programming environment. The efficacy of batch effect removal was evaluated through principal component analysis (PCA). Following a previous study (25), 18 LCRGs were selected for further analysis. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
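The probe-to-gene conversion described above can be illustrated with a minimal Python sketch. The text does not state how several probes annotated to the same gene were combined, so averaging is an assumption here, and the probe IDs and values are hypothetical. This is not the authors' Perl script.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expr, probe_to_gene):
    """Average the rows of all probes annotated to the same gene symbol.

    probe_expr: dict mapping probe ID -> list of expression values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene symbol (from a platform annotation)
    Averaging duplicate probes is an assumption, not a rule stated in the paper.
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        gene = probe_to_gene.get(probe_id)
        if gene:  # skip probes that have no gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


# Hypothetical probe IDs and expression values, for illustration only
probe_expr = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 6.0],
    "p3": [1.0, 1.5],
    "p4": [9.0, 9.0],  # unannotated probe, dropped below
}
probe_to_gene = {"p1": "EP300", "p2": "EP300", "p3": "KAT8"}
gene_expr = collapse_probes(probe_expr, probe_to_gene)
```

Other collapsing rules (for example, keeping the probe with the highest mean signal) are equally common, so the choice should be reported alongside the annotation file version.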
Identification of differentially-expressed genes (DEGs)
The “limma” package was first employed to identify DEGs within the training dataset, using thresholds of |log2 fold change (FC)| >1 and adjusted P<0.05. Differentially expressed LCRGs (DE-LCRGs) were then identified with the same package at a threshold of P<0.05. Box plots were constructed with the “ggpubr” package, and heat maps were generated with the “pheatmap” package. Correlations among the DE-LCRGs were analyzed with the R package “corrplot”.
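As a rough illustration of the DEG thresholds, the following Python sketch adjusts P values and applies the two cutoffs. The adjustment method is not named in the text, so Benjamini-Hochberg is assumed here. The gene labels and statistics are hypothetical, and the moderated t-statistics that limma computes are not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, fc_cutoff=1.0, alpha=0.05):
    """Keep genes with |log2FC| > fc_cutoff and adjusted P < alpha."""
    adj = benjamini_hochberg(p_values)
    return [g for g, fc, q in zip(genes, log2_fc, adj) if abs(fc) > fc_cutoff and q < alpha]


# Hypothetical genes and statistics, for illustration only
genes = ["g1", "g2", "g3", "g4"]
log2_fc = [2.1, -1.6, 0.4, 1.2]
p_values = [0.001, 0.01, 0.02, 0.30]
adjusted = benjamini_hochberg(p_values)
degs = select_degs(genes, log2_fc, p_values)
```

In this toy input, g3 fails the fold-change cutoff and g4 fails the adjusted P cutoff, so only g1 and g2 are retained.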
Immune cell infiltration and correlation analyses
The CIBERSORT algorithm (26) was utilized to examine the immunological features within the training dataset, and we evaluated the correlation coefficient between the DE-LCRGs and the relevant immune cell features. Spearman correlation analysis was performed, with a P value of less than 0.05 denoting a significant correlation. Ultimately, the findings were represented visually using the R package “corrplot”.
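For readers unfamiliar with the statistic, a minimal Python sketch of the Spearman coefficient between one gene and one immune cell fraction is given below. The input values are hypothetical, and the P value calculation is omitted.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical expression values and CIBERSORT-style fractions, for illustration only
gene_expression = [1.2, 2.5, 3.1, 4.8, 5.0]
cell_fraction = [0.30, 0.22, 0.18, 0.10, 0.05]
rho = spearman_rho(gene_expression, cell_fraction)
```

Because the coefficient depends only on ranks, it is insensitive to the scale difference between expression values and estimated cell fractions.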
Unsupervised clustering analysis
The “ConsensusClusterPlus” package was employed for unsupervised cluster analysis based on the gene expression profiles of the DE-LCRGs. The 109 samples from HBV-HCC patients were partitioned into clusters using a k-means algorithm with 1,000 iterations. To determine the optimal number of clusters, we examined cumulative distribution function (CDF) curves, consensus matrices, and cluster consensus scores, requiring scores above 0.8. The maximum number of subtypes evaluated was k=9. Additionally, we used PCA to visualize the distribution of the identified clusters. We then visualized the expression patterns of DE-LCRGs across clusters using the “pheatmap” package, compared them with the “wilcox.test” function, and analyzed variations in immune infiltration across the clusters using the CIBERSORT algorithm.
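The consensus index and CDF used to choose the number of clusters can be sketched in Python as follows. The resampled clustering runs are given as precomputed, hypothetical label assignments rather than actual k-means output, so this illustrates the bookkeeping only. It is not the ConsensusClusterPlus implementation.

```python
from itertools import combinations


def consensus_matrix(n_samples, runs):
    """Fraction of runs in which each sample pair was clustered together.

    runs: list of dicts mapping sample index -> cluster label for one resampled run;
    a sample missing from a dict was not drawn in that run.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    return {
        (i, j): together[i][j] / sampled[i][j]
        for i, j in combinations(range(n_samples), 2)
        if sampled[i][j] > 0
    }


def consensus_cdf(consensus, threshold):
    """Empirical CDF: share of sample pairs with consensus index <= threshold."""
    values = list(consensus.values())
    return sum(v <= threshold for v in values) / len(values)


# Three hypothetical resampled runs over four samples, for illustration only
runs = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 0, 1: 0, 2: 1},  # sample 3 not drawn in this run
    {0: 1, 1: 1, 2: 0, 3: 1},
]
consensus = consensus_matrix(4, runs)
cdf_at_half = consensus_cdf(consensus, 0.5)
```

A clean partition pushes consensus values toward 0 or 1, which makes the CDF nearly flat across intermediate consensus values.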
Gene set variation analysis (GSVA)
The “GSVA” package was employed to perform GSVA enrichment analysis (27), allowing for the assessment of variability in the enriched gene sets among the LCRG clusters. The gene matrix transposed (gmt) files, “c2.cp.kegg.v2022.1.Hs.symbols.gmt” and “c5.all.v2022.1.Hs.symbols.gmt”, were obtained from the MSigDB database for the GSVA analysis. A P value of less than 0.05 was deemed statistically significant in this analysis.
WGCNA
The “WGCNA” package was used to construct a co-expression network for the LCRG clusters and to pinpoint gene modules (28). A suitable soft threshold was first identified to construct a weighted adjacency matrix. Module eigengenes represent the overall gene expression profiles that define each module. Additionally, module significance (MS) indicates the relationship between modules and disease status, while gene significance (GS) relates to the correlation between individual genes and clinical phenotypes.
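The weighted adjacency step can be illustrated with a short Python sketch in which each gene pair receives the weight |r|^β. An unsigned network based on Pearson correlation is assumed, since the network type is not stated. The gene labels and values are hypothetical, and β=7 is the power reported in the Results.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta):
    """Unsigned weighted adjacency (an assumption): a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(expr)
    adjacency = {}
    for a in genes:
        for b in genes:
            if a != b:
                adjacency[(a, b)] = abs(pearson(expr[a], expr[b])) ** beta
    return adjacency


def connectivity(adjacency, gene):
    """Sum of adjacency weights between one gene and all other genes."""
    return sum(w for (a, _), w in adjacency.items() if a == gene)


# Hypothetical genes and expression values, for illustration only
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [4.0, 1.0, 3.0, 2.0],
}
adjacency = soft_threshold_adjacency(expr, beta=7)
k_a = connectivity(adjacency, "A")
```

Raising the correlation to a power suppresses weak correlations (here |r|=0.4 shrinks to about 0.0016) while leaving strong ones nearly intact, which is what lets the network approximate scale-free topology.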
Development of a predictive model utilizing various ML algorithms
In the present study, we developed four ML models, including random forest (RF), support vector machine (SVM), generalized linear model (GLM), and least absolute shrinkage and selection operator (LASSO), employing the “caret” package. To evaluate the models, we used the “DALEX” package to create residual distribution maps and reverse cumulative residual distributions. Model performance was also represented through receiver operating characteristic (ROC) curves produced using the “pROC” package, where area under the curve (AUC) values approaching 1 signified high accuracy in the training set. Based on these assessments of predictive accuracy, we determined the most effective ML model and subsequently identified five key genes deemed essential for predicting HBV-HCC.
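The area under the ROC curve used to compare the models can be computed directly from labels and predicted scores. The Python sketch below uses the rank-based definition, in which ties count as half. The labels and scores are hypothetical and do not come from the study's models.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = HBV-HCC, 0 = HBV control) and model scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Here eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 would indicate no discrimination.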
Nomogram model construction and independent validation analysis
Nomograms are commonly used in predicting cancer prognosis because they condense complex statistical models into a single numerical value representing the probability of an event occurring (29). We developed both a logistic nomogram and an interactive online dynamic nomogram for clinical prediction, utilizing the “rms” package (30). The predictive performance of these nomograms was evaluated through a range of metrics, including calibration curves, decision curve analysis (DCA) curves, clinical impact curves, and ROC curves (31). Additionally, the external validation cohort GSE47197 was employed to assess the model’s robustness and diagnostic accuracy. The area under the curve (AUC) value was calculated using the “pROC” package, with an AUC value >0.7 deemed indicative of predictive utility. To analyze the expression levels of predictive genes, we utilized the “ggplot2” package to compare data between patients diagnosed with HBV-HCC and the control group (HBV).
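Decision curve analysis rests on the net benefit at a given threshold probability. The following Python sketch shows that quantity for a model and for the treat-all reference strategy. The labels and predicted probabilities are hypothetical and are not outputs of the nomogram.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy in which every patient is classified as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = HBV-HCC) and predicted probabilities
labels = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.6, 0.7, 0.2, 0.1, 0.3, 0.4]
nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
```

A decision curve plots these values over a range of thresholds. A model is clinically useful where its curve lies above both the treat-all line and the treat-none line (net benefit of zero).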
Examination of predictive gene expression patterns and their prognostic relevance in HCC
RNA-sequencing data processed with STAR, along with the associated clinical metadata for The Cancer Genome Atlas (TCGA)-LIHC cohort, were sourced from the GDC portal (https://portal.gdc.cancer.gov). To investigate the association between predictive gene expression and patient survival in HCC, we used the “survival” package to test the proportional hazards assumption and to conduct survival regression analysis. The findings were visualized with the “survminer” and “ggplot2” packages. P<0.05 was considered statistically significant.
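The text describes survival regression with a proportional hazards check. The Python sketch below does not reproduce that regression. It shows the simpler Kaplan-Meier estimator that commonly underlies survival curves of this kind, which is a substitution made here for illustration. The follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times (months) and event indicators, for illustration only
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
```

Censored patients stay in the risk set up to their last follow-up time and are not counted as deaths, which is why the patient censored at 8 months still contributes to the denominator at that time.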
Statistical analysis
Statistical analyses were performed using R software (version 4.4.3) and Strawberry Perl (version 5.30.0.1). Group comparisons were conducted using both Wilcoxon and Student’s t-tests, with statistical significance set at P<0.05. Significance levels were denoted as *, P<0.05; **, P<0.01; and ***, P<0.001.
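The significance notation can be expressed as a small Python helper. Returning an empty string for non-significant results is an assumption, since the text defines only the three starred levels.

```python
def significance_stars(p_value):
    """Map a P value to the star notation defined above."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # assumption: no symbol for non-significant results


# Hypothetical P values, for illustration only
example_labels = [significance_stars(p) for p in (0.0004, 0.003, 0.04, 0.2)]
```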
Results
Analysis of LCRGs expression patterns in HBV-HCC patients
A total of 16 DE-LCRGs were identified between the HBV-HCC and HBV control samples in our training dataset (Figure 2A,2B). Notably, genes such as EP300, KAT2A, KAT2B, HDAC1, HDAC2, HDAC8, TAF1, YEATS2, and DPF2 were over-expressed in HBV-HCC patients. Conversely, CREBBP, KAT8, SIRT2, SIRT3, HDAC2, MLLT3, and KAT6A showed decreased expression levels compared to control individuals. To further investigate the potential roles of these DE-LCRGs in HBV-HCC, we conducted a correlation analysis (Figure 2C). This analysis revealed robust positive correlations between HDAC2 and YEATS2, CREBBP and KAT8, as well as KAT2B and SIRT3. On the other hand, significant negative correlations were identified between YEATS2 and KAT2B (Figure 2D). Overall, these findings indicate that the abnormal expression of LCRGs may be involved in the pathogenesis of HBV-HCC.
Immune infiltration analysis
To compare the proportions of immune cell types between HBV-HCC and HBV control samples in the training dataset, the CIBERSORT algorithm was employed. A bar chart was used to depict the distribution of 22 distinct immune cell categories across each sample (Figure 3A). Patients with HBV-HCC exhibited an increased presence of CD8 T cells, follicular helper T cells, regulatory T cells (Tregs), resting natural killer (NK) cells, M0 macrophages, activated dendritic cells, and resting mast cells. Conversely, there was a marked reduction in the populations of memory B cells, plasma cells, resting CD4 memory T cells, activated CD4 memory T cells, gamma delta T cells, M1 macrophages, activated mast cells, and neutrophils in HBV-HCC when compared to the HBV control liver tissues (Figure 3B). Notably, strong correlations were identified between DE-LCRGs and various critical immune cell types, particularly M0 macrophages, activated mast cells, resting mast cells, neutrophils, plasma cells, and resting CD4 memory T cells (Figure 3C). These findings offer insights into the immune cell dynamics associated with HBV-HCC patients.
Identification of LCRGs clusters in HBV-HCC
We conducted a consensus clustering analysis involving 16 DE-LCRGs to gain insights into their expression profiles among liver tumor tissues from patients with HBV-HCC. The optimal clustering solution was identified as k=2, evidenced by the minimal variation observed in the CDF curve within the consensus index range of 0.2 to 0.6 (Figure 4A,4B). A comprehensive assessment of the area under the CDF curve for k values between 2 and 9 demonstrated notable differences among consecutive CDF curves (Figure 4C). At k=2, each subtype achieved the highest concordance score (Figure 4D). This clustering analysis categorized the 109 HBV-HCC patient samples into two distinct clusters: Cluster 1, which included 83 patients, and Cluster 2, comprising 26 patients. Additionally, PCA effectively differentiated between these two clusters, highlighting the utility of unsupervised clustering methods in analyzing samples from HBV-HCC patients (Figure 4E).
Analysis of DE-LCRGs clusters in HBV-HCC
Distinct expression patterns of DE-LCRGs were observed between the two identified clusters. Specifically, Cluster 1 had higher expression levels of KAT2B, SIRT2, SIRT3, HDAC8, and TAF1. In contrast, Cluster 2 showed increased levels of EP300, HDAC1, HDAC2, MLLT3, YEATS2, and DPF2 (Figure 5A,5B). A bar chart was used to depict the distribution of 22 distinct immune cell categories across the two clusters in the training dataset (Figure 5C). Cluster 1 was characterized by a higher abundance of naive B cells, activated NK cells, and resting mast cells. Conversely, Cluster 2 showed greater infiltration of M0 macrophages (Figure 5D).
Biological function and pathway analyses of two distinct clusters
The GSVA enrichment analysis for Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) was used to assess the pathway activities and biological functions of each cluster. The GSVA GO enrichment analysis showed that Cluster 1 was significantly enriched in regulating translation processes, including synapse modulation, synaptic translation, negative regulation of ubiquitin protein transferase activity, and the eukaryotic translation initiation factor 3 complex. In contrast, Cluster 2 showed significant enrichment in the negative regulation of blood pressure, negative regulation of fatty acid biosynthesis, and oxaloacetate transport (Figure 6A). Furthermore, the GSVA KEGG pathway enrichment results indicated that Cluster 1 showed increased activity in epithelial cell signaling during Helicobacter pylori infection, as well as in the GNRH signaling pathway and the B cell receptor signaling pathway. In contrast, Cluster 2 showed substantial enrichment in primary bile acid biosynthesis, tyrosine metabolism, and steroid hormone biosynthesis (Figure 6B).
Analysis of WGCNA for Cluster 1 and Cluster 2
We used the “WGCNA” package to build a co-expression network for the DE-LCRG-based clusters and to identify genes specific to Clusters 1 and 2. We selected a soft thresholding power (β) of 7 based on scale independence and mean connectivity, which helped identify co-expressed gene modules (Figure 7A). Using the dynamic tree cut method, we identified eight distinct co-expression modules, which are shown in a heat map of the topological overlap matrix (Figure 7B,7C). The “MEturquoise” and “MEyellow” modules showed the highest correlations with Clusters 1 and 2, with correlation coefficients of r=0.71 (P=3e−18) and r=0.46 (P=4e−7), respectively (Figure 7D). A scatter plot showed a strong correlation between the genes in the “MEturquoise” module and the “MEyellow” module (Figure 7E). As a result, these two modules, comprising 103 cluster-specific genes, were chosen for further analysis.
Identification of cluster-specific DEGs
A total of 619 DEGs were identified in the training dataset and shown in volcano plots (Figure 8A), and the heatmap displayed the distribution of the top 50 significantly different DEGs between the adjacent normal liver samples and HBV-HCC samples (Figure 8B). A Venn diagram identified 21 cluster-specific DEGs from the overlap of 619 DEGs linked to HBV-HCC patients and 103 genes related to DE-LCRGs-associated clusters (Figure 8C). Notably, in the HBV-HCC groups, except for one gene without significance, all the remaining 20 genes were significantly downregulated compared to control samples (Figure 8D), and the heatmap displayed the distribution of 20 significantly cluster-specific DEGs (Figure 8E).
ML predictive models
Four ML models (RF, SVM, LASSO, and GLM) were trained based on 20 significantly cluster-specific DEGs. Among the four models, the RF model had the lowest residual error (Figure 9A,9B). The top 10 significant features for each model were ranked based on the root mean square error (RMSE) (Figure 9C). ROC analysis of the training cohort confirmed the RF model’s exceptional predictive capabilities, achieving the highest AUC of 0.973, which was higher than that of SVM (0.965), LASSO (0.939), and GLM (0.892) (Figure 9D). These results indicate that the RF model is the most effective algorithm for distinguishing between different patient groups. From the RF model, five genes (GCDH, GPT2, F9, FOXM1, and CDC20) emerged as the most prominent predictors, positioning them as potential biomarkers for further exploration into the pathogenesis of HBV-HCC.
Construction of the nomogram
To better estimate the risk of HBV-HCC, we constructed a nomogram model based on the five predictive markers from the RF model (Figure 10A). The calibration curves showed that the predictions of the nomogram model agreed closely with the observed outcomes (Figure 10B). The clinical applicability was further supported by DCA, which revealed significant net benefits across a spectrum of threshold probabilities (Figure 10C). Additionally, the clinical impact curve underscored the model’s predictive capability (Figure 10D). ROC analysis of the individual genes yielded the following AUC values: 0.879 for GCDH, 0.792 for GPT2, 0.832 for F9, 0.920 for FOXM1, and 0.928 for CDC20 (Figure 10E). The nomogram model itself achieved an AUC of 0.943 [95% confidence interval (CI): 0.905–0.981] (Figure 10F), indicating excellent predictive accuracy. We confirmed that the expression levels of three predictive genes (GCDH, GPT2, and F9) were significantly lower, while those of FOXM1 and CDC20 were notably elevated, in patients diagnosed with HBV-HCC (Figure 10G).
Model validation
To further validate the five-gene model, we assessed its reliability and reproducibility within the validation dataset GSE47197. A nomogram model incorporating the five predictive markers was constructed, and its predictive precision was corroborated through the calibration curve, DCA, and clinical impact curve (Figure 11A-11D). ROC analysis demonstrated AUC values above 0.70 for the individual predictive genes, while the integrative predictive model combining these genes attained an AUC of 0.901 (95% CI: 0.839–0.962), exceeding the performance of any single gene (Figure 11E,11F). Additionally, we confirmed that three predictive genes (GCDH, GPT2, and F9) showed decreased expression, whereas FOXM1 and CDC20 showed increased expression, in patients with HBV-HCC (Figure 11G).
Examination of predictive genes expression and prognostic implications in HCC
To further uncover genes potentially linked to the development of HCC, we analyzed the mRNA expression of five predictive genes in TCGA-LIHC (Figure 12A-12E). Furthermore, we assessed the correlation between the expression levels of these genes and the survival outcomes of HCC patients. Our results revealed that reduced expression levels of GCDH, GPT2, and F9, alongside increased expression of FOXM1 and CDC20, were significantly correlated with poorer prognoses in the TCGA-LIHC cohort (P<0.05) (Figure 12F-12J).
Discussion
HCC is a major global health burden, with HBV infection being a significant risk factor. HBV infection involves a complex interaction among viral, host, and environmental factors (2). Although recent studies have highlighted the role of Kcr modifications in HCC, the relationship between HBV-HCC and LCRGs remains insufficiently understood. Furthermore, ML methods and nomograms based on LCRG-related clusters have not been used in the prediction of HBV-HCC. Here, we used a series of integrated bioinformatics analyses and ML methods to investigate molecular classification and immune profiling, and to construct a predictive model based on five LCRG cluster-related genes for diagnosing HBV-HCC patients.
We first performed a comprehensive analysis of LCRG expression patterns in HBV-HCC tumor samples compared to HBV liver samples. Sixteen of the eighteen LCRGs were differentially expressed, with significant synergistic or antagonistic relationships, highlighting the importance of LCRGs in the development and progression of HBV-induced HCC. Notably, chromatin remodelers (EP300, KAT2A/B, HDAC1/2/8) and transcriptional regulators (TAF1, YEATS2, DPF2) were upregulated, whereas metabolic modulators (CREBBP, KAT8, SIRT2/3) and differentiation regulators (MLLT3, KAT6A) were downregulated. These findings indicate that aberrant histone crotonylation may influence HCC aggressiveness (14), suggesting that HBV-driven hepatocarcinogenesis may involve epigenetic reprogramming through LCRG dysregulation. Our findings also showed that these genes exhibit distinct expression profiles linked to immunological responses. It has been reported that HBV infection disrupts both innate and adaptive immunity, thereby creating a microenvironment that promotes tumor growth (32). While tumor-infiltrating leukocytes (TILs) influence HCC prognosis, their functional significance differs between HBV- and HCV-related HCC. Specifically, in HBV-HCC, plasma cells and dendritic cells are linked to survival, whereas in HCV-HCC, monocytes play a similar role (33). It has been reported that HBV-specific CD8+ T cells exhibit dual roles: while essential for viral clearance, their excessive activation may exacerbate hepatic inflammation and carcinogenesis (34). Additionally, CD8+ T cells and M0 macrophages are indicators of recurrence-free survival in HBV-HCC, while neutrophils are relevant in HCV-HCC (33). Intriguingly, adaptive NK cells with attenuated antitumor activity were enriched in HBV-HCC patients, potentially compromising immune surveillance (35). The HBV-HCC microenvironment favors expansion of immunosuppressive elements, particularly Tregs.
Our data corroborate previous findings that Treg infiltration correlates with elevated viral loads and poor prognosis (36), likely through promoting CD8+ T cell exhaustion (37). The atypical infiltration of dendritic cells, macrophages, NK cells, T cells, and neutrophils documented in this study confirms results from previous reports that persistent viral replication existed in patients with HBV-HCC (38-40). Based on these findings, we hypothesize that dysregulation of HBV-induced LCRGs may disrupt the activation of immune cells, which could contribute to the TILs seen in HBV-HCC patients. This hypothesis warrants experimental validation to determine whether LCRGs modulation could reverse immune dysfunction in HBV-HCC.
The molecular clustering of DE-LCRGs revealed two distinct subgroups (Cluster 1 and Cluster 2) with different expression patterns and immunological characteristics. Cluster 1, characterized by elevated expression of KAT2B, SIRT2, SIRT3, HDAC8, and TAF1, exhibited a higher abundance of naive B cells, activated NK cells, and resting mast cells. It has been reported that HBV disrupts liver metabolism, promoting HCC development, with distinct metabolic profiles showing upregulation of steroid hormone biosynthesis, bile acid metabolism, and sphingolipid metabolism, activating MAPK/mTOR signaling and reprogramming lipid metabolism in HCC cells (41). This is consistent with the Cluster 2 results of the GSVA KEGG analysis performed in the present study, although further research is needed to explore this relationship.
The application of ML in HBV-HCC research has significantly facilitated biomarker discovery and mechanistic understanding (24,42,43). Among four rigorously evaluated algorithms (RF, SVM, LASSO, GLM), the RF model demonstrated superior predictive performance (AUC =0.973), consistent with its established advantages in processing complex biomedical datasets through ensemble decision tree architecture (44). Our RF model identified five pivotal genes (GCDH, GPT2, F9, FOXM1, CDC20) with robust predictive utility in HBV-HCC. GCDH is a mitochondrial flavoprotein enzyme that catalyzes the dehydrogenation and decarboxylation of glutaryl-CoA, transforming it into crotonyl-CoA while releasing carbon dioxide (45). Our multi-omics analysis revealed reduced GCDH expression in HBV-HCC tissues, correlating with unfavorable clinical outcomes. A previous study established GCDH’s tumor-suppressive role via metabolic regulation in glioblastoma (15). Additionally, GCDH has been found to inhibit the progression of HCC by blocking the pentose phosphate pathway and glycolysis through crotonylation, leading to the senescence of HCC cells (46). Our work extends these findings by linking its downregulation specifically to HBV-HCC related crotonylation. GPT2 encodes a mitochondrial alanine transaminase that catalyzes the transamination between alanine and 2-oxoglutarate, producing pyruvate and glutamate (47). It has been reported that abnormal GPT2 expression decreases α-ketoglutarate, enhancing TCA cycle anaplerosis and promoting cell survival and growth, thereby connecting the Warburg effect to oncogenesis via pyruvate metabolism (48,49). The down-regulation of the F9 gene predicts unfavorable outcome in HCC, and it was also involved in HBV-HCC progression (50,51). FOXM1 overexpression is linked to malignant characteristics and poor prognosis in HBV-HCC, serving as an independent risk factor for patient recurrence and survival post-surgery (52). 
It has been reported that miR-3677-3p, which is upregulated in HBV-HCC, enhances tumor progression and sorafenib resistance by targeting FBXO31, which stabilizes FOXM1 and promotes HCC development (53). Moreover, CDC20, a regulatory protein in the cell cycle, may play a role in early diagnosis, tumor stage, and poor outcomes of HBV-HCC (54). Additionally, the robustness of this five-gene signature was validated in the external cohort GSE47197 (AUC =0.901), supporting its generalizability to an independent dataset. To facilitate clinical translation, we developed a diagnostic nomogram integrating these biomarkers. The calibration curves demonstrated strong concordance between predicted and observed outcomes, while DCA substantiated its clinical net benefit across probability thresholds. Thus, our findings suggest that the established model is a dependable tool for HBV-HCC diagnosis. The precise functions of GCDH, GPT2, F9, FOXM1, and CDC20 in HBV-HCC through Kcr are yet to be fully elucidated. Our network analysis suggests that Kcr-mediated epigenetic regulation may lie at the center of these molecular observations, proposing a novel mechanistic framework for HBV-HCC progression that warrants experimental validation.
Nevertheless, our current study has some limitations. The predictive model genes were identified retrospectively from GEO datasets of HBV-HCC, and their clinical utility needs prospective validation. Future comprehensive clinical or experimental studies are necessary to confirm the link between DE-LCRGs and the pathogenesis and prognosis of HBV-HCC. Moreover, we recognize that the lack of multi-etiology HCC cohorts (such as HCV-induced or alcohol-associated HCC) restricts clinical generalizability. Future studies should validate these signatures in cohorts with diverse HCC etiologies to evaluate the differential predictive utility of LCRGs.
Conclusions
Our current study developed a novel LCRGs-associated predictive model to predict the prognosis of HBV-HCC. In addition, we identified two unique molecular clusters based on LCRGs, indicating the potential roles of lysine crotonylation in the pathogenesis of HBV-HCC.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-728/rc
Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-728/prf
Funding: This study was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-728/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Ott JJ, Stevens GA, Groeger J, et al. Global epidemiology of hepatitis B virus infection: new estimates of age-specific HBsAg seroprevalence and endemicity. Vaccine 2012;30:2212-9. [Crossref] [PubMed]
- Schweitzer A, Horn J, Mikolajczyk RT, et al. Estimations of worldwide prevalence of chronic hepatitis B virus infection: a systematic review of data published between 1965 and 2013. Lancet 2015;386:1546-55. [Crossref] [PubMed]
- Xie Y. Hepatitis B Virus-Associated Hepatocellular Carcinoma. Adv Exp Med Biol 2017;1018:11-21. [Crossref] [PubMed]
- Ringelhan M, Heikenwalder M, Protzer U. Direct effects of hepatitis B virus-encoded proteins and chronic infection in liver cancer development. Dig Dis 2013;31:138-51. [Crossref] [PubMed]
- Amarapurkar DN, Mada K, Kapoor D. Management of Chronic Hepatitis B Infection in India. J Assoc Physicians India 2015;63:43-52.
- Abdelhamed W, El-Kassas M. Hepatitis B virus as a risk factor for hepatocellular carcinoma: There is still much work to do. Liver Res 2024;8:83-90. [Crossref] [PubMed]
- Long J, Cui K, Wang D, et al. Burden of Hepatocellular Carcinoma and Its Underlying Etiologies in China, 1990-2021: Findings From the Global Burden of Disease Study 2021. Cancer Control 2024;31:10732748241310573. [Crossref] [PubMed]
- Liu Y, Veeraraghavan V, Pinkerton M, et al. Viral Biomarkers for Hepatitis B Virus-Related Hepatocellular Carcinoma Occurrence and Recurrence. Front Microbiol 2021;12:665201. [Crossref] [PubMed]
- Gavriilidis P, Pawlik TM, Meirson T, et al. Associating liver partition and portal vein ligation or combined transarterial chemo-embolisation and portal vein embolisation for staged hepatectomy for HBV-related hepatocellular carcinoma. Hepatobiliary Surg Nutr 2023;12:272-5. [Crossref] [PubMed]
- Dong W, Zou M, Sheng J, et al. ACTB may serve as a predictive marker for the efficacy of lenvatinib in patients with HBV-related early-stage hepatocellular carcinoma following partial hepatectomy: a retrospective cohort study. J Gastrointest Oncol 2023;14:2479-99. [Crossref] [PubMed]
- Zhao H, Han Y, Zhou P, et al. Protein lysine crotonylation in cellular processions and disease associations. Genes Dis 2024;11:101060. [Crossref] [PubMed]
- Li Z, Li J, Li F, et al. Potential functions and mechanisms of lysine crotonylation modification (Kcr) in tumorigenesis and lymphatic metastasis of papillary thyroid cancer (PTC). J Transl Med 2024;22:874. [Crossref] [PubMed]
- Zhang XY, Liu ZX, Zhang YF, et al. SEPT2 crotonylation promotes metastasis and recurrence in hepatocellular carcinoma and is associated with poor survival. Cell Biosci 2023;13:63. [Crossref] [PubMed]
- Wan J, Liu H, Ming L. Lysine crotonylation is involved in hepatocellular carcinoma progression. Biomed Pharmacother 2019;111:976-82. [Crossref] [PubMed]
- Yuan H, Wu X, Wu Q, et al. Lysine catabolism reprograms tumour immunity through histone crotonylation. Nature 2023;617:818-26. [Crossref] [PubMed]
- Liao M, Sun X, Zheng W, et al. LINC00922 decoys SIRT3 to facilitate the metastasis of colorectal cancer through up-regulation the H3K27 crotonylation of ETS1 promoter. Mol Cancer 2023;22:163. [Crossref] [PubMed]
- Zou S, Wu Z. A narrative review of the application of machine learning in venous thromboembolism. Vascular 2024;32:698-704. [Crossref] [PubMed]
- Singal AG, Mukherjee A, Elmunzer BJ, et al. Machine learning algorithms outperform conventional regression models in predicting development of hepatocellular carcinoma. Am J Gastroenterol 2013;108:1723-30. [Crossref] [PubMed]
- Liu R, Wu S, Yu HY, et al. Prediction model for hepatocellular carcinoma recurrence after hepatectomy: Machine learning-based development and interpretation study. Heliyon 2023;9:e22458. [Crossref] [PubMed]
- Wang N, Cao Y, Song W, et al. Serum peptide pattern that differentially diagnoses hepatitis B virus-related hepatocellular carcinoma from liver cirrhosis. J Gastroenterol Hepatol 2014;29:1544-50. [Crossref] [PubMed]
- Chen S, Zhang Z, Wang Y, et al. Using Quasispecies Patterns of Hepatitis B Virus to Predict Hepatocellular Carcinoma With Deep Sequencing and Machine Learning. J Infect Dis 2021;223:1887-96. [Crossref] [PubMed]
- Zhu D, Tulahong A, Abuduhelili A, et al. Machine Learning Prognostic Model for Post-Radical Resection Hepatocellular Carcinoma in Hepatitis B Patients. J Hepatocell Carcinoma 2025;12:353-65. [Crossref] [PubMed]
- Jia C, Chen J, Wang X, et al. Machine learning and experimental screening of chromatin regulator signatures and potential drugs in hepatitis B related hepatocellular carcinoma. J Biomol Struct Dyn 2025;43:2335-49. [Crossref] [PubMed]
- Kucukakcali Z, Akbulut S, Colak C. Machine Learning-based Prediction of HBV-related Hepatocellular Carcinoma and Detection of Key Candidate Biomarkers. Medeni Med J 2022;37:255-63. [Crossref] [PubMed]
- Yang B, Wen F, Cui Y. Integrative transcriptome analysis identifies a crotonylation gene signature for predicting prognosis and drug sensitivity in hepatocellular carcinoma. J Cell Mol Med 2024;28:e70083. [Crossref] [PubMed]
- Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453-7. [Crossref] [PubMed]
- Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7. [Crossref] [PubMed]
- Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559. [Crossref] [PubMed]
- Iasonos A, Schrag D, Raj GV, et al. How to build and interpret a nomogram for cancer prognosis. J Clin Oncol 2008;26:1364-70. [Crossref] [PubMed]
- Ohori M, Gondo T, Hamada R. Nomogram as predictive model in clinical practice. Gan To Kagaku Ryoho 2009;36:901-6.
- Weng Y, Zhou T, Ye H. Development and assessment of a mortality risk prediction nomogram model for pneumocystis disease in ICU within 28 days. Sci Rep 2025;15:2410. [Crossref] [PubMed]
- Zheng Q, Sun Q, Yao H, et al. Single-cell landscape identifies the immunophenotypes and microenvironments of HBV-positive and HBV-negative liver cancer. Hepatol Commun 2024;8:e0364. [Crossref] [PubMed]
- Hsiao YW, Chiu LT, Chen CH, et al. Tumor-Infiltrating Leukocyte Composition and Prognostic Power in Hepatitis B- and Hepatitis C-Related Hepatocellular Carcinomas. Genes (Basel) 2019;10:630. [Crossref] [PubMed]
- Cho HJ, Cheong JY. Role of Immune Cells in Patients with Hepatitis B Virus-Related Hepatocellular Carcinoma. Int J Mol Sci 2021;22:8011. [Crossref] [PubMed]
- Rennert C, Tauber C, Fehrenbach P, et al. Adaptive Subsets Limit the Anti-Tumoral NK-Cell Activity in Hepatocellular Carcinoma. Cells 2021;10:1369. [Crossref] [PubMed]
- You M, Gao Y, Fu J, et al. Epigenetic regulation of HBV-specific tumor-infiltrating T cells in HBV-related HCC. Hepatology 2023;78:943-58. [Crossref] [PubMed]
- He J, Miao R, Chen Y, et al. The dual role of regulatory T cells in hepatitis B virus infection and related hepatocellular carcinoma. Immunology 2024;171:445-63. [Crossref] [PubMed]
- Li TY, Yang Y, Zhou G, et al. Immune suppression in chronic hepatitis B infection associated liver disease: A review. World J Gastroenterol 2019;25:3527-37. [Crossref] [PubMed]
- Sun R, Li J, Lin X, et al. Peripheral immune characteristics of hepatitis B virus-related hepatocellular carcinoma. Front Immunol 2023;14:1079495. [Crossref] [PubMed]
- Lim CJ, Lee YH, Pan L, et al. Multidimensional analyses reveal distinct immune microenvironment in hepatitis B virus-related hepatocellular carcinoma. Gut 2019;68:916-27. [Crossref] [PubMed]
- Zhu Y, Zhao Y, Ning Z, et al. Metabolic self-feeding in HBV-associated hepatocarcinoma centered on feedback between circulation lipids and the cellular MAPK/mTOR axis. Cell Commun Signal 2024;22:280. [Crossref] [PubMed]
- Zhang S, Jiang C, Jiang L, et al. Construction of a diagnostic model for hepatitis B-related hepatocellular carcinoma using machine learning and artificial neural networks and revealing the correlation by immunoassay. Tumour Virus Res 2023;16:200271. [Crossref] [PubMed]
- Xiong Y, Qiao W, Wang Q, et al. Construction and validation of a machine learning-based nomogram to predict the prognosis of HBV associated hepatocellular carcinoma patients with high levels of hepatitis B surface antigen in primary local treatment: a multicenter study. Front Immunol 2024;15:1357496. [Crossref] [PubMed]
- Rigatti SJ. Random Forest. J Insur Med 2017;47:31-9. [Crossref] [PubMed]
- Ribeiro JV, Lucas TG, Bross P, et al. Potential complementation effects of two disease-associated mutations in tetrameric glutaryl-CoA dehydrogenase is due to inter subunit stability-activity counterbalance. Biochim Biophys Acta Proteins Proteom 2020;1868:140269. [Crossref] [PubMed]
- Lao Y, Cui X, Xu Z, et al. Glutaryl-CoA dehydrogenase suppresses tumor progression and shapes an anti-tumor microenvironment in hepatocellular carcinoma. J Hepatol 2024;81:847-61. [Crossref] [PubMed]
- Cui M, Peng J, Zhou Y, et al. Exosomal GPT2 derived from triple-negative breast cancer cells promotes metastasis by activating BTRC. Thorac Cancer 2023;14:2018-25. [Crossref] [PubMed]
- Kim M, Gwak J, Hwang S, et al. Mitochondrial GPT2 plays a pivotal role in metabolic adaptation to the perturbation of mitochondrial glutamine metabolism. Oncogene 2019;38:4729-38. [Crossref] [PubMed]
- Smith B, Schafer XL, Ambeskovic A, et al. Addiction to Coupling of the Warburg Effect with Glutamine Catabolism in Cancer Cells. Cell Rep 2016;17:821-36. [Crossref] [PubMed]
- Li L, Guo M, Xia Y, et al. Study on F9 gene expression downregulation and its clinical value in hepatocellular carcinoma. Zhonghua Gan Zang Bing Za Zhi 2023;31:716-22. [Crossref] [PubMed]
- Chen W, Desert R, Ge X, et al. The Matrisome Genes From Hepatitis B-Related Hepatocellular Carcinoma Unveiled. Hepatol Commun 2021;5:1571-85. [Crossref] [PubMed]
- Xia L, Huang W, Tian D, et al. Upregulated FoxM1 expression induced by hepatitis B virus X protein promotes tumor metastasis and indicates poor prognosis in hepatitis B virus-related hepatocellular carcinoma. J Hepatol 2012;57:600-12. [Crossref] [PubMed]
- He H, Zhou J, Cheng F, et al. MiR-3677-3p promotes development and sorafenib resistance of hepatitis B-related hepatocellular carcinoma by inhibiting FOXM1 ubiquitination. Hum Cell 2023;36:1773-89. [Crossref] [PubMed]
- Qiang R, Zhao Z, Tang L, et al. Identification of 5 Hub Genes Related to the Early Diagnosis, Tumour Stage, and Poor Outcomes of Hepatitis B Virus-Related Hepatocellular Carcinoma by Bioinformatics Analysis. Comput Math Methods Med 2021;2021:9991255. [Crossref] [PubMed]

