Combination of Naples prognostic score and pathological factors for evaluating the long-term prognosis of patients with late-onset colorectal cancer: a multicenter machine learning study
Highlight box
Key findings
• A machine learning (ML) nomogram integrating the Naples prognostic score (NPS), platelet-to-lymphocyte ratio (PLR), and clinicopathological factors could effectively predict the long-term overall survival in patients with late-onset colorectal cancer (LOCRC).
• Top predictors: elevated PLR [hazard ratio (HR) =1.358; P=0.02] was identified as a risk factor for LOCRC, while the NPS group, Union for International Cancer Control stage, alpha-fetoprotein level, carcinoembryonic antigen level, carbohydrate antigen 19-9 level, tumor differentiation, pT stage, pN stage, and perineural invasion (HR =0.390, 0.269, 0.544, 0.728, 0.632, 0.452, 0.109, 0.602, and 0.674, respectively; P<0.05), were identified as protective factors.
• Model performance: least absolute shrinkage and selection operator + random survival forest algorithm achieved a high concordance index, with values of 0.872, 0.768, and 0.737 in the training, internal validation, and external validation cohorts, respectively.
What is known and what is new?
• NPS and clinicopathological variables are individually linked to the outcomes of patients with colorectal cancer (CRC). However, Cox regression has limitations in handling the complex interactions present in this condition.
• This study uniquely integrated NPS, PLR, and clinicopathological data into an ML-based nomogram for patients with LOCRC. PLR was demonstrated to be an independent risk factor, while NPS group and clinicopathological factors were identified as protective factors.
What is the implication, and what should change now?
• The nomogram is a cost-effective tool for personalized risk stratification in older adult patients with CRC, suggesting the heightened importance of nutritional status (i.e., NPS) and PLR in the prognosis of patients with LOCRC. Future research should focus on validating the model in multicenter, large-sample cohorts and clarifying the related underlying mechanisms.
Introduction
Late-onset colorectal cancer (LOCRC) is a type of colorectal cancer (CRC) arising in individuals typically aged 50 years or older (1). LOCRC is associated with distinct prognostic, molecular, and pathological features as compared to CRC in younger patients (2,3), and the mortality risk of LOCRC increases with age (4,5).
According to the Global Cancer Observatory (GLOBOCAN) data from 2022, LOCRC accounts for 10.5% of CRC cases and 9.7% of related deaths worldwide, with poorer outcomes linked to diagnostic delays and inconsistent treatment availability (Figure S1A,S1B) (6,7).
Several studies have demonstrated that indicators of nutritional and inflammatory status, such as the Naples prognostic score (NPS), are valuable predictors of patient outcomes. These parameters can be easily derived from routine blood examinations. Inflammation, in particular, plays a pivotal role in cancer initiation, progression, and stage advancement (8-10).
Notably, patients with LOCRC have an increased prevalence of malnutrition, inflammation, and frailty. These conditions collectively represent nutrition-related inflammatory factors that may substantially influence disease progression and survival outcomes. However, the current Union for International Cancer Control (UICC) tumor-node-metastasis (TNM) staging system (11) does not incorporate these host-related factors, resulting in limited prognostic accuracy, particularly in older adult patients, whose outcomes are often influenced by nutritional status (12). Integrating these parameters into prognostic models through machine learning (ML) approaches could improve risk stratification and enhance predictive precision in this population. Furthermore, recent studies have underscored the prognostic relevance of inflammatory, immune, and nutritional markers in cancer (9). Additionally, ongoing research continues to identify novel biomarkers for the early detection and prevention of CRC (10,13,14).
As demonstrated in previous studies, the NPS, consisting of serum albumin (Alb) level, cholesterol (CHOL) level, neutrophil-to-lymphocyte ratio (NLR), and lymphocyte-to-monocyte ratio (LMR), is an independent prognostic factor for CRC, exhibiting an inverse correlation with patient survival outcomes (8). Similarly, the platelet-to-lymphocyte ratio (PLR) and UICC stage exhibit an inverse correlation with patient prognosis (15-19).
Combining the NPS, which integrates inflammatory and nutritional markers, with independent prognostic factors such as the PLR and UICC stage yields a multidimensional assessment that correlates with survival outcomes. Validating this combined approach in patients with LOCRC may facilitate risk-stratified care and ultimately improve outcomes in this vulnerable population. Therefore, in this study, we sought to validate the value of NPS in combination with clinicopathological factors for predicting the prognosis of LOCRC through use of ML-based predictive models. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1795/rc) (20).
Methods
Patients
A retrospective analysis was conducted on 1,558 patients with LOCRC (aged 50 years or older) with histologically confirmed adenocarcinoma who underwent surgery for CRC at The First Affiliated Hospital of Kunming Medical University. Patients were divided into a training set (n=1,090) and an internal validation set (n=468) at a 7:3 ratio. An independent external validation cohort (n=420) was also included, consisting of patients with LOCRC and histologically confirmed adenocarcinoma who underwent CRC surgery at The Third People’s Hospital of Honghe Prefecture. Data from September 2014 to September 2024 were extracted. Patients were excluded if preoperative inflammation index data were missing or biomarker data (Alb, CHOL, NLR, and LMR) were incomplete. The study flowchart for patient inclusion and exclusion is displayed in Figure 1.
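The 7:3 division described above can be illustrated with a minimal sketch. The function name, the seed, and the choice to round the training size down are assumptions made for illustration; the source does not state how the random split was implemented.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient identifiers and cut them at the requested fraction.

    Rounding the training size down is an assumption; it happens to
    reproduce the 1,090/468 split reported for 1,558 patients.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is repeatable
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 0..1557 stand in for the 1,558 patients.
train_ids, internal_ids = split_train_validation(range(1558))
print(len(train_ids), len(internal_ids))
```

Fixing the seed keeps the partition reproducible, which matters when several algorithms are compared on the same training and validation sets.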
This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments and was approved by the Ethics Committee of The First Affiliated Hospital of Kunming Medical University (No. 2024-L-124) and the Ethics Committee of The Third People’s Hospital of Honghe Prefecture (No. 2024-KYXM-29). The requirement for informed consent was waived due to the retrospective nature of the analysis.
Data collected
Data were collected from the hospital record system, including clinicopathological parameters [i.e., gender, age, mismatch repair (MMR) status, body mass index (BMI), pathological tumor types, degree of tumor tissue differentiation, tumor size, pT stage, pN stage, UICC stage, vascular invasion, and perineural invasion] and inflammatory markers and immune-nutritional indices [i.e., Alb, CHOL, NLR, LMR, PLR, preoperative alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), and carbohydrate antigen 19-9 (CA19-9)].
Continuous variables, including Alb, CHOL, NLR, LMR, PLR, and tumor size, were analyzed via X-tile software version 3.6.1 to determine the optimal cutoff values, which were 3.6 g/dL, 141.5 mg/dL, 3.3, 2.5, 251.8, and 2 cm, respectively (see Figure S2 for details).
The NPS was calculated based on a combination of Alb, CHOL, NLR, and LMR (8,21). We modified the NPS cutoff values as follows: the score for Alb concentration ≥3.6 g/dL was 0, while that for <3.6 g/dL was 1; the score for CHOL concentration >141.5 mg/dL was 0, while that for ≤141.5 mg/dL was 1; the score for NLR ≤3.3 was 0, while that for >3.3 was 1; and the score for LMR >2.5 was 0, while that for ≤2.5 was 1. The scores of the four indicators were summed to give an indicator score (range, 0–4), and patients were categorized into three groups as follows: NPS 0, an indicator score of 0; NPS 1, an indicator score of 1 or 2; and NPS 2, an indicator score of 3 or 4. This approach aligns with previous studies that stratified NPS into clinically meaningful tiers (8,21). The flowchart for NPS classification is displayed in Figure S3. The prognostic relevance of the PLR and clinicopathological factors was also analyzed.
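The scoring rule above can be written as a short sketch. The cutoffs are those stated in the text; the function names are hypothetical, and the sketch assumes that the indicator score is the sum of the four component scores.

```python
def nps_indicator_score(alb_g_dl, chol_mg_dl, nlr, lmr):
    """Sum the four component scores using the cohort-specific cutoffs."""
    score = 0
    score += 1 if alb_g_dl < 3.6 else 0       # Alb <3.6 g/dL scores 1
    score += 1 if chol_mg_dl <= 141.5 else 0  # CHOL <=141.5 mg/dL scores 1
    score += 1 if nlr > 3.3 else 0            # NLR >3.3 scores 1
    score += 1 if lmr <= 2.5 else 0           # LMR <=2.5 scores 1
    return score


def nps_group(indicator_score):
    """Map the indicator score (0 to 4) to NPS group 0, 1, or 2."""
    if indicator_score == 0:
        return 0
    if indicator_score <= 2:
        return 1
    return 2


# Illustrative values only: low albumin and low LMR give a score of 2.
print(nps_group(nps_indicator_score(3.2, 180.0, 2.0, 2.1)))
```

Values lying exactly on a cutoff follow the inequalities given in the text, so an Alb of 3.6 g/dL scores 0 while an LMR of 2.5 scores 1.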
Follow-up
Patients in the training, internal validation, and external validation cohorts were followed up until September 2024 or patient death.
Statistical analysis
Survival analysis was performed via the Kaplan-Meier (KM) method with the log-rank test, and independent prognostic factors were identified through Cox proportional hazards regression analysis. Univariate and multivariable Cox regression analyses with a stepwise selection approach were conducted to evaluate the prognostic factors for overall survival (OS), and results are expressed as hazard ratios (HRs) with 95% confidence intervals (CIs). A P value <0.05 was considered statistically significant.
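For readers unfamiliar with the KM method named above, the product-limit estimate can be sketched in a few lines. This is a generic textbook illustration on made-up data, not the analysis code used in the study; the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored both leave the risk set
    return curve


# Toy data: five patients, two of them censored.
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up but never reduce the survival estimate themselves.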
Statistical analyses and visualization were conducted with SPSS version 27.0 (IBM Corp., Armonk, NY, USA) and R version 4.1.3 (The R Foundation for Statistical Computing, Vienna, Austria). Survival figures and graphs were drawn via GraphPad Prism software version 8.0.2 (Dotmatics, Boston, MA, USA).
Establishment of the ML model
ML algorithms were employed to construct predictive models, and their performance was evaluated with the concordance index (C-index), the area under the curve (AUC), and decision curve analysis (DCA). The resulting nomogram was developed for patients with resectable LOCRC to predict 1-, 3-, and 5-year survival probabilities, addressing the unmet need for personalized postoperative surveillance strategies in older adult patients.
Ten independent prognostic survival factors (NPS, PLR, UICC staging, AFP, CEA, CA19-9, tumor differentiation, pT stage, pN stage, and perineural invasion) were incorporated into the predictive modeling to estimate the long-term prognosis of LOCRC.
To ensure robust model development and reduce algorithm-specific bias, several ML algorithms were evaluated in parallel under identical training, internal validation, and external validation settings. This comparative framework facilitated the identification of the most stable and accurate prognostic model.
The ML algorithms applied included random survival forest (RSF) (22), least absolute shrinkage and selection operator (LASSO) (23), stepwise Cox regression (24), generalized boosted regression modeling (GBM) (25), supervised principal components (Super-PC) (26), survival support vector machine (survival-SVM) (27), partial least squares regression for Cox (plsRcox) (28), and Cox-Boost (29). These algorithms were selected for their established utility in survival analysis and frequent application in oncological prognostic modeling. Collectively, they represent a diverse set of statistical and ML approaches capable of handling censored data, event status, and time-to-event outcomes, enabling assessment of both linear and nonlinear relationships, variable shrinkage and selection, and ensemble-based predictive performance. Incorporating this algorithmic diversity ensured methodological robustness and comparability with prior studies in cancer prognosis (30-32).
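Because the algorithms above are compared by their C-index, a minimal sketch of Harrell's concordance index may help clarify what that number measures. It assumes that a higher risk score means a worse expected outcome and that the study's C-index follows this standard definition, which the source does not state explicitly; the function name is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i has an observed death
    strictly before patient j's follow-up time. Tied risk scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data in which risk scores perfectly order the observed deaths.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the values reported for the training and validation cohorts.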
The cohort was divided into high- and low-risk groups based on the median model-derived risk score, and KM survival curves were plotted accordingly. Time-dependent receiver operating characteristic (ROC) curves for 1-, 3-, and 5-year survival were generated with the timeROC package in R. Nomograms and calibration curves were constructed via the rms R package, and clinical utility was further evaluated through DCA (33-35).
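The median-based risk grouping described above can be sketched as follows. Assigning scores equal to the median to the low-risk group is an assumption made for illustration; the source does not say how ties were handled, and the function name is hypothetical.

```python
import statistics


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = statistics.median(risk_scores)
    # Scores equal to the median are labeled 'low' (assumed convention).
    return ["high" if score > cutoff else "low" for score in risk_scores]


# Illustrative model-derived risk scores for four patients.
print(split_by_median([0.1, 0.4, 0.6, 0.9]))
```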
Results
Clinical characteristics
This study included 1,558 patients, comprising 936 males (60.1%) and 622 females (39.9%). The most common diagnosis was adenocarcinoma (n=1,454, 93.3%), followed by mixed adenocarcinoma (n=66, 4.2%) and mucinous adenocarcinoma (n=38, 2.4%). Among the patients, 1,385 (88.9%) had a PLR <251.8, while 173 (11.1%) had a PLR ≥251.8. NPS groups 0, 1, and 2 included 971 (62.3%), 502 (32.2%), and 85 (5.5%) patients, respectively. The distribution of patients with LOCRC according to UICC staging was as follows: stage I, 23.7% (n=370); stage II, 37.0% (n=576); stage III, 34.0% (n=529); and stage IV, 5.3% (n=83) (Table 1).
Table 1
| Clinicopathological characteristics | Total | OS: 0 | OS: 1 | χ2 | P |
|---|---|---|---|---|---|
| Age | | | | | |
| LOCRC (≥50 years) | 1,558 (100.0) | 1,261 (80.9) | 297 (19.1) | | |
| Gender | | | | 8.07 | 0.004** |
| Male | 936 (60.1) | 736 (78.6) | 200 (21.4) | | |
| Female | 622 (39.9) | 525 (84.4) | 97 (15.6) | | |
| BMI, kg/m2 | | | | 1.56 | 0.45 |
| <18.5 | 131 (8.4) | 101 (77.1) | 30 (22.9) | | |
| ≥18.5, <24 | 897 (57.6) | 726 (80.9) | 171 (19.1) | | |
| ≥24 | 530 (34.0) | 434 (81.9) | 96 (18.1) | | |
| NPS | | | | 32.84 | <0.001*** |
| G0 | 971 (62.3) | 826 (85.1) | 145 (14.9) | | |
| G1 | 502 (32.2) | 379 (75.5) | 123 (24.5) | | |
| G2 | 85 (5.5) | 56 (65.9) | 29 (34.1) | | |
| MMR | | | | 5.91 | 0.02* |
| pMMR | 1,329 (85.3) | 1,089 (81.9) | 240 (18.1) | | |
| dMMR | 229 (14.7) | 172 (75.1) | 57 (24.9) | | |
| Alb, g/dL | | | | 18.14 | <0.001*** |
| ≥3.6 | 1,388 (89.1) | 1,144 (82.4) | 244 (17.6) | | |
| <3.6 | 170 (10.9) | 117 (68.8) | 53 (31.2) | | |
| CHOL, mg/dL | | | | 20 | <0.001*** |
| >141.5 | 1,278 (82.0) | 1,061 (83.0) | 217 (17.0) | | |
| ≤141.5 | 280 (18.0) | 200 (71.4) | 80 (28.6) | | |
| NLR | | | | 6.18 | 0.01* |
| ≤3.3 | 1,293 (83.0) | 1,061 (82.1) | 232 (17.9) | | |
| >3.3 | 265 (17.0) | 200 (75.5) | 65 (24.5) | | |
| LMR | | | | 18.19 | <0.001*** |
| >2.5 | 1,325 (85.0) | 1,096 (82.7) | 229 (17.3) | | |
| ≤2.5 | 233 (15.0) | 165 (70.8) | 68 (29.2) | | |
| PLR | | | | 6.09 | 0.01* |
| <251.8 | 1,385 (88.9) | 1,133 (81.8) | 252 (18.2) | | |
| ≥251.8 | 173 (11.1) | 128 (74.0) | 45 (26.0) | | |
| AFP, ng/mL | | | | 13.17 | <0.001*** |
| <7 | 1,488 (95.5) | 1,216 (81.7) | 272 (18.3) | | |
| ≥7 | 70 (4.5) | 45 (64.3) | 25 (35.7) | | |
| CEA, ng/mL | | | | 44.46 | <0.001*** |
| <5 | 937 (60.1) | 809 (86.3) | 128 (13.7) | | |
| ≥5 | 621 (39.9) | 452 (72.8) | 169 (27.2) | | |
| CA19-9, ng/mL | | | | 52.01 | <0.001*** |
| <30 | 1,251 (80.3) | 1,057 (84.5) | 194 (15.5) | | |
| ≥30 | 307 (19.7) | 204 (66.4) | 103 (33.6) | | |
| Tumor size (cm) | | | | 2.12 | 0.14 |
| <2 | 387 (24.8) | 323 (83.5) | 64 (16.5) | | |
| ≥2 | 1,171 (75.2) | 938 (80.1) | 233 (19.9) | | |
| Tumor site | | | | 8 | 0.02* |
| Right-side colon cancer | 258 (16.6) | 194 (75.2) | 64 (24.8) | | |
| Left-side colon cancer | 446 (28.6) | 374 (83.9) | 72 (16.1) | | |
| Rectal cancer | 854 (54.8) | 693 (81.1) | 161 (18.9) | | |
| Histological type | | | | 11.59 | 0.003** |
| Adenocarcinoma | 1,454 (93.3) | 1,181 (81.2) | 273 (18.8) | | |
| Mucinous adenocarcinoma | 38 (2.4) | 23 (60.5) | 15 (39.5) | | |
| Mixed adenocarcinoma | 66 (4.2) | 57 (86.4) | 9 (13.6) | | |
| Tumor differentiation | | | | 20.06 | <0.001*** |
| High | 586 (37.6) | 485 (82.8) | 101 (17.2) | | |
| Medium | 897 (57.6) | 730 (81.4) | 167 (18.6) | | |
| Low/undifferentiated | 75 (4.8) | 46 (61.3) | 29 (38.7) | | |
| pT stage | | | | 85.73 | <0.001*** |
| pT1 | 85 (5.5) | 83 (97.6) | 2 (2.4) | | |
| pT2 | 372 (23.9) | 334 (89.8) | 38 (10.2) | | |
| pT3 | 812 (52.1) | 658 (81.0) | 154 (19.0) | | |
| pT4 | 289 (18.5) | 186 (64.4) | 103 (35.6) | | |
| pN stage | | | | 77.2 | <0.001*** |
| pN0 | 971 (62.3) | 846 (87.1) | 125 (12.9) | | |
| pN1 | 387 (24.8) | 290 (74.9) | 97 (25.1) | | |
| pN2 | 200 (12.8) | 125 (62.5) | 75 (37.5) | | |
| pM stage | | | | 146.7 | <0.001*** |
| No | 1,475 (94.7) | 1,236 (83.8) | 239 (16.2) | | |
| Yes | 83 (5.3) | 25 (30.1) | 58 (69.9) | | |
| UICC stage | | | | 191.7 | <0.001*** |
| Stage I | 370 (23.7) | 340 (91.9) | 30 (8.1) | | |
| Stage II | 576 (37.0) | 499 (86.6) | 77 (13.4) | | |
| Stage III | 529 (34.0) | 397 (75.0) | 132 (25.0) | | |
| Stage IV | 83 (5.3) | 25 (30.1) | 58 (69.9) | | |
| Vascular invasion | | | | 32.9 | <0.001*** |
| Negative | 1,242 (79.7) | 1,041 (83.8) | 201 (16.2) | | |
| Positive | 316 (20.3) | 220 (69.6) | 96 (30.4) | | |
| Perineural invasion | | | | 47.25 | <0.001*** |
| Negative | 1,144 (73.4) | 973 (85.1) | 171 (14.9) | | |
| Positive | 414 (26.6) | 288 (69.6) | 126 (30.4) | | |
Data are presented as number (%). 0, alive; 1, dead. *, P<0.05; **, P<0.01; ***, P<0.001. AFP, alpha-fetoprotein; Alb, albumin; BMI, body mass index; CA19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; CHOL, cholesterol; dMMR, MMR-deficient; LMR, lymphocyte-to-monocyte ratio; LOCRC, late-onset colorectal cancer; M, metastasis; MMR, mismatch repair; N, node; NLR, neutrophil-to-lymphocyte ratio; NPS, Naples prognostic score; OS, overall survival; PLR, platelet-to-lymphocyte ratio; pMMR, MMR-proficient; T, tumor; UICC, Union for International Cancer Control.
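The χ2 statistics in Table 1 can be checked with a short Pearson chi-square sketch. Omitting a continuity correction is an assumption; under that assumption, the gender counts from Table 1 give a value close to the reported 8.07. The function name is hypothetical.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: male, female; columns: alive (0), dead (1), as in Table 1.
gender_counts = [[736, 200], [525, 97]]
print(round(pearson_chi2(gender_counts), 2))
```

The same function accepts the larger tables, such as the four-row UICC stage breakdown, without modification.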
Univariate and multivariate analyses for the association of NPS, PLR, and clinicopathological factors with OS prognosis in LOCRC
The univariate analysis revealed significant correlations between OS and the following variables: NPS, gender, PLR, AFP, CEA, CA19-9, Alb, CHOL, NLR, LMR, tumor differentiation, histological type, pT stage, pN stage, pM stage, UICC stage, vascular invasion, and perineural invasion. These variables were further evaluated via multivariate analysis, and an elevated PLR (PLR ≥251.8) was identified as an independent prognostic risk factor, predictive of poorer OS (HR =1.358, 95% CI: 1.045–1.763; P=0.02). Conversely, NPS groups 0 and 1 were associated with improved survival (HR =0.390, 95% CI: 0.257–0.593; P<0.001). The following variables also demonstrated significant prognostic value: UICC stage (HR =0.269, 95% CI: 0.116–0.622; P=0.002), AFP (HR =0.544, 95% CI: 0.358–0.826; P=0.004), CEA (HR =0.728, 95% CI: 0.567–0.936; P=0.01), CA19-9 (HR =0.632, 95% CI: 0.487–0.822; P<0.001), tumor differentiation (HR =0.452, 95% CI: 0.293–0.697; P<0.001), pT stage (HR =0.109, 95% CI: 0.023–0.524; P=0.006), pN stage (HR =0.602, 95% CI: 0.439–0.823; P=0.002), and perineural invasion (HR =0.674, 95% CI: 0.524–0.866; P=0.002). NPS, PLR, and clinicopathological factors were identified as independent predictors of OS in patients with LOCRC (Table 2 and Figure 2).
Table 2
| Variable | Univariate HR (95% CI) | Univariate P value | Multivariate HR (95% CI) | Multivariate P value | |
|---|---|---|---|---|---|
| NPS group | |||||
| G2 | 0.59 (0.394–0.885) | 0.01* | 0.686 (0.454–1.035) | 0.07 | |
| G1 | 0.35 (0.235–0.522) | <0.001*** | 0.390 (0.257–0.593) | <0.001*** | |
| G0 | Ref. | ||||
| Gender | |||||
| Female | 1.47 (1.15–1.87) | 0.002** | |||
| Male | Ref. | ||||
| BMI, kg/m2 | |||||
| ≥18.5, <24 | 1.246 (0.827–1.878) | 0.29 | |||
| ≥24 | 1.003 (0.781–1.288) | 0.98 | |||
| <18.5 | Ref. | ||||
| MMR | |||||
| dMMR | 0.859 (0.643–1.148) | 0.30 | |||
| pMMR | Ref. | ||||
| Alb, g/dL | |||||
| <3.6 | 0.532 (0.395–0.716) | <0.001*** | |||
| ≥3.6 | Ref. | ||||
| CHOL, mg/dL | |||||
| ≤141.5 | 0.548 (0.424–0.708) | <0.001*** | |||
| >141.5 | Ref. | ||||
| NLR | |||||
| ≤3.3 | 0.684 (0.519–0.900) | 0.007** | |||
| >3.3 | Ref. | ||||
| LMR | |||||
| ≤2.5 | 0.544 (0.414–0.731) | <0.001*** | |||
| >2.5 | Ref. | ||||
| PLR | |||||
| ≥251.8 | 1.283 (1.002–1.643) | 0.049* | 1.358 (1.045–1.763) | 0.02* | |
| <251.8 | Ref. | ||||
| AFP, ng/mL | |||||
| ≥7 | 0.477 (0.316–0.718) | <0.001*** | 0.544 (0.358–0.826) | 0.004** | |
| <7 | Ref. | ||||
| CEA, ng/mL | |||||
| ≥5 | 0.442 (0.351–0.556) | <0.001*** | 0.728 (0.567–0.936) | 0.01* | |
| <5 | Ref. | ||||
| CA19-9, ng/mL | |||||
| ≥30 | 0.376 (0.296–0.477) | <0.001*** | 0.632 (0.487–0.822) | <0.001*** | |
| <30 | Ref. | ||||
| Tumor size (cm) | |||||
| ≥2 | 0.934 (0.734–1.172) | 0.56 | |||
| <2 | Ref. | ||||
| Tumor site | |||||
| Left-side colon cancer | 1.137 (0.850–1.520) | 0.39 | |||
| Rectal cancer | 0.833 (0.631–1.099) | 0.20 | |||
| Right-side colon cancer | Ref. | ||||
| Histological type | |||||
| Mucinous adenocarcinoma | 0.920 (0.472–1.792) | 0.81 | |||
| Mixed adenocarcinoma | 2.543 (1.111–5.819) | 0.03* | |||
| Adenocarcinoma | Ref. | ||||
| Tumor differentiation | |||||
| Medium | 0.324 (0.214–0.490) | <0.001*** | 0.452 (0.293–0.697) | <0.001*** | |
| Low/undifferentiated | 0.393 (0.265–0.584) | <0.001*** | 0.513 (0.342–0.769) | <0.001*** | |
| High | Ref. | ||||
| pT stage | |||||
| pT2 | 0.060 (0.015–0.245) | <0.001*** | 0.109 (0.023–0.524) | 0.006** | |
| pT3 | 0.283 (0.195–0.410) | <0.001*** | 0.384 (0.197–0.749) | 0.005** | |
| pT4 | 0.511 (0.389–0.656) | <0.001*** | 0.695 (0.538–0.899) | 0.006** | |
| pT1 | Ref. | ||||
| pN stage | |||||
| pN1 | 0.261 (0.186–0.348) | <0.001*** | 0.602 (0.439–0.823) | 0.002** | |
| pN2 | 0.539 (0.398–0.729) | <0.001*** | |||
| pN0 | Ref. | ||||
| pM stage | |||||
| Yes | 0.150 (0.113–0.200) | <0.001*** | |||
| No | Ref. | ||||
| Vascular invasion | |||||
| Positive | 0.415 (0.325–0.529) | <0.001*** | |||
| Negative | Ref. | ||||
| Perineural invasion | |||||
| Positive | 0.502 (0.399–0.632) | <0.001*** | 0.674 (0.524–0.866) | 0.002** | |
| Negative | Ref. | ||||
| UICC stage | |||||
| Stage II | 0.076 (0.049–0.118) | <0.001*** | 0.269 (0.116–0.622) | 0.002** | |
| Stage III | 0.117 (0.083–0.165) | <0.001*** | 0.137 (0.081–0.232) | <0.001*** | |
| Stage IV | 0.245 (0.180–0.335) | <0.001*** | 0.343 (0.234–0.503) | <0.001*** | |
| Stage I | Ref. | ||||
*, P<0.05; **, P<0.01; ***, P<0.001. AFP, alpha-fetoprotein; Alb, albumin; BMI, body mass index; CA19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; CHOL, cholesterol; CI, confidence interval; dMMR, MMR-deficient; HR, hazard ratio; LMR, lymphocyte-to-monocyte ratio; LOCRC, late-onset colorectal cancer; M, metastasis; MMR, mismatch repair; N, node; NLR, neutrophil-to-lymphocyte ratio; NPS, Naples prognostic score; OS, overall survival; PLR, platelet-to-lymphocyte ratio; pMMR, MMR-proficient; Ref., reference; T, tumor; UICC, Union for International Cancer Control.
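The relationship between an HR, its 95% CI, and the P value in Table 2 can be illustrated with a small sketch. It assumes Wald-type intervals that are symmetric on the log scale, which is the usual Cox regression output but is not stated in the source. Because the tabulated values are rounded, the recovered quantities are approximate. The function name is hypothetical.

```python
import math


def wald_summary(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-hazard coefficient, its standard error, the Wald z
    statistic, and the two-sided P value from a reported HR and 95% CI."""
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return beta, se, z, p


# Multivariate PLR row of Table 2: HR 1.358 (1.045-1.763).
beta, se, z, p = wald_summary(1.358, 1.045, 1.763)
print(round(beta, 3), round(se, 3), round(p, 3))
```

For the PLR row, the recovered P value is close to the reported P=0.02, which suggests the interval and P value are mutually consistent under this assumption.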
Model training for the long-term prognosis of LOCRC
Patients with LOCRC were randomly divided into a training set (n=1,090) and an internal validation set (n=468) at a 7:3 ratio, and a predictive model was constructed based on the identified independent prognostic factors. In Figure 3A, the four columns represent the C-index values of each algorithm in the training, internal validation, and external validation cohorts and their overall mean. The C-index of the LASSO + RSF model was 0.872 for the training set, 0.768 for the internal validation set, and 0.737 for the external validation set (Figure 3A). Among all evaluated models, the LASSO + RSF combination demonstrated the highest predictive performance and accurately estimated patient prognosis. Consequently, we selected the LASSO + RSF combination as the foundation for subsequent modeling.
ML was used to establish a prediction nomogram model for LOCRC according to the multivariate analysis results from the training set (Figure 3B). The nomogram integrated 10 prognostic variables (NPS, PLR, UICC staging, pT stage, pN stage, AFP, CEA, CA19-9, tumor differentiation, and perineural invasion) to predict 1-, 3-, and 5-year survival probabilities. Each prognostic factor was assigned a score between 0 and 100. The model provided rapid clinical risk stratification by visually translating individual patient characteristics into quantitative survival estimates. Importantly, higher total scores were generally associated with a better prognosis; however, in the pT and CEA subgroups, higher scores were associated with a worse prognosis.
Based on the results of the LASSO + RSF analysis, we constructed a prognostic risk model for OS and categorized patients into high- and low-risk groups based on risk scores. The survival probabilities of patients in the high-risk group were significantly worse than those of patients in the low-risk group (P<0.001), indicating that our risk model could accurately predict the prognosis of patients. The KM survival curves are provided in Figure 3C.
The AUC was used to assess the predictive power of the risk model. The AUC values of the risk model in the training set were 0.88, 0.91, and 0.89 at 1, 3, and 5 years, respectively, demonstrating high predictive accuracy (Figure 3D).
DCA demonstrated that the model provided greater net benefit than did PLR alone for predicting survival at 1, 3, and 5 years across clinically relevant threshold probabilities (Figure 3E).
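The net benefit plotted in a decision curve can be sketched for a binary outcome at a fixed time horizon. This simplified form ignores censoring and is only an illustration of the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt); it is not the survival-specific procedure used to produce Figure 3E. Function names and data are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(outcome)
    pairs = list(zip(predicted_risk, outcome))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy predicted risks and observed events (1 = event by the horizon).
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
events = [1, 1, 0, 1, 0]
print(net_benefit(risks, events, 0.5), net_benefit_treat_all(events, 0.5))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, the net benefit of treating no one.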
Calibration curves demonstrated excellent agreement between predicted and observed survival rates at 1-, 3-, and 5-year timepoints (Figure 3F). In addition to its favorable decision curve performance, the model exhibited robust prognostic accuracy and clinical utility across all time points.
Internal validation of the model for the long-term prognosis of LOCRC
We randomly selected 468 patients for internal validation, and the C-index of the LASSO + RSF model was 0.768 (Figure 3A), partly validating the efficacy of our risk modeling approach.
The AUC values of the risk model in the internal validation set were 0.80, 0.79, and 0.77 at 1, 3, and 5 years, respectively, indicating high predictive accuracy (Figure 4A).
The KM curve demonstrated a significant difference between the high- and low-risk groups, and the survival probabilities of patients in the low-risk group were significantly better than those of patients in the high-risk group (P<0.001; Figure 4B). This suggested additional validation for the efficacy of our risk modeling approach.
The calibration curve showed that the predicted OS at 1-, 3-, and 5-year timepoints was highly consistent with the observed outcomes, indicating that the model was well-calibrated and robust (Figure 4C).
External validation of the model for the long-term prognosis of LOCRC
We validated the findings using an external dataset (n=420), and the C-index of the optimal predictive model was 0.737 (Figure 3A).
In the ROC analysis, the AUC values for the external validation set were 0.81, 0.79, and 0.76 at the 1-, 3-, and 5-year timepoints, respectively, indicating high predictive accuracy (Figure 5A).
KM analysis revealed significant survival differences between the high- and low-risk groups. The survival probabilities of patients in the high-risk group were significantly worse than those of patients in the low-risk group (P<0.001; Figure 5B).
The calibration curves suggested strong agreement between predicted and observed OS rates at 1-, 3-, and 5-year timepoints, indicating the model’s robust construction and predictive accuracy (Figure 5C). These results collectively confirmed the model’s clinical utility for risk assessment in this population.
Discussion
LOCRC is a prevalent malignancy of the gastrointestinal tract, and in China, LOCRC accounts for 11.8% of CRC cases and 9.5% of deaths (Figure S4A,S4B) (6,7). This clinical burden highlights the necessity of developing efficient prognostic models that incorporate routinely available clinicopathological variables to improve risk stratification in LOCRC populations.
In this study, NPS, PLR, and clinicopathological factors were independent prognostic factors for LOCRC. Perioperative nutritional-inflammatory biomarkers, easily quantified through routine blood tests, are clinically significant prognostic factors for survival outcomes in patients with LOCRC. The PLR is a validated prognostic indicator in gastrointestinal malignancies (19). Previous studies have suggested that PLR is a significant prognostic indicator for CRC (15,36). In our study, elevated PLR (PLR ≥251.8) independently predicted worse OS in patients with LOCRC (HR =1.358, 95% CI: 1.045–1.763; P=0.02; Figure 2B), which is consistent with previous studies that demonstrated elevated PLR to be an indicator of thrombocytosis and/or lymphocytopenia (37-39). Systemic inflammation promotes tumor progression through platelet-mediated mechanisms, including angiogenesis and immune evasion, as well as cytokine-driven thrombocytosis induced by interleukin-6 (IL-6) and interleukin-1 (IL-1). Concurrent CD4+ lymphocytopenia further compromises tumor immune surveillance, leading to poorer clinical outcomes (40-42).
The NPS, which integrates nutritional, inflammatory, and immune markers, is being increasingly used to predict outcomes in patients with colon cancer (43), CRC (44), or renal cancer (45). The NPS, by incorporating all currently established markers, including Alb, CHOL, NLR, and LMR, offers optimal accuracy for prognosing patients with CRC. Other studies applying NPS to the prognostic prediction of patients with CRC have shown that patients with an NPS of 0 or 1 experience better survival outcomes (8,21). In our study of LOCRC, NPS groups 0, 1, and 2 had 971 (62.3%), 502 (32.2%), and 85 (5.5%) patients, respectively. Survival analysis identified NPS to be an independent prognostic factor for LOCRC. In the multivariate analysis, with NPS group 0 as the reference, the difference in OS was not significant for NPS group 2 (P=0.07) but was significant for NPS group 1 (P<0.001). An NPS of 0 indicates Alb ≥3.6 g/dL, CHOL >141.5 mg/dL, NLR ≤3.3, and LMR >2.5, reflecting a better nutritional and inflammatory status, and is associated with improved survival outcomes (8,36,46).
It has been well established that the prognosis of patients with CRC is correlated with UICC staging, pT stage, pN stage, preoperative AFP, CEA, and CA19-9 levels, tumor differentiation, and perineural invasion. Poor outcomes for CRC are frequently observed among older adult patients and those with advanced-stage disease, lymph node metastasis, perineural invasion, or elevated preoperative tumor markers (AFP ≥7 ng/mL, CEA ≥5 ng/mL, and CA19-9 ≥30 ng/mL). In our study, patients with LOCRC and advanced-stage disease, lymph node metastasis, perineural invasion, or elevated AFP (≥7 ng/mL), CEA (≥5 ng/mL), or CA19-9 (≥30 ng/mL) levels had worse survival outcomes, consistent with previous findings (11,47-50).
Nomogram visualization allows for a more accurate prediction of disease probability, thereby facilitating better clinical decision-making (4,11,13,47,50). In our study, we combined independent prognostic factors identified by multivariable Cox regression analysis (including NPS, PLR, UICC staging, pT stage, pN stage, tumor differentiation, perineural invasion, and AFP, CEA, and CA19-9 levels) to construct a clinical prediction model. Several studies have demonstrated the superior utility of combinatorial ML models in renal cell carcinoma, prostate cancer, and older adult patients with hepatocellular carcinoma (30-32). The model in our study demonstrated strong predictive performance in training, internal validation, and external validation cohorts (with C-index values of 0.872, 0.768, and 0.737, respectively), with good consistency for long-term survival prognosis in patients with LOCRC. These results are consistent with those of previous studies and suggest the model’s potential utility for improving clinical decision-making.
The model is cost-effective and clinically applicable, utilizing readily available data from routine clinical and pathological parameters as well as preoperative blood tests. This accessibility may assist clinicians in decision-making processes. However, further validation through larger external datasets is required to confirm its reliability.
Limitations
Our study had several limitations: (I) a relatively small sample size for both training and validation; (II) potentially compromised survival analysis due to a short follow-up duration; (III) reduced statistical power due to a restricted sample size; (IV) inability of static preoperative NPS assessment to reflect dynamic interactions between inflammation, nutrition, and tumor progression; and (V) a lack of molecular biomarker data [particularly for Kirsten rat sarcoma (KRAS), v-raf murine sarcoma viral oncogene homolog B1 (BRAF), and microsatellite instability].
Future studies should address these limitations through (I) larger multicenter prospective designs; (II) extended follow-up for long-term prognosis evaluation; (III) serial biomarker measurements to assess temporal prognostic impacts; and (IV) incorporation of molecular profiling in LOCRC assessment.
Conclusions
The ML model integrating NPS, PLR, and clinicopathological factors effectively predicts the prognosis of patients with LOCRC, offering a practical, cost-effective tool for risk stratification and personalized treatment optimization.
Acknowledgments
We thank The First Affiliated Hospital of Kunming Medical University and The Third People’s Hospital of Honghe Prefecture for providing the data for this study. We also acknowledge the contributions of the research team, data analysts, and medical staff in patient care and data collection.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1795/rc
Data Sharing Statement: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1795/dss
Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1795/prf
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1795/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was a retrospective analysis approved by the Ethics Committee of The First Affiliated Hospital of Kunming Medical University (No. 2024-L-124) and the Ethics Committee of The Third People’s Hospital of Honghe Prefecture (No. 2024-KYXM-29), which exempted the requirement for informed consent from the patients.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Gao XH, Li J, Liu LJ, et al. Trends, clinicopathological features, surgical treatment patterns and prognoses of early-onset versus late-onset colorectal cancer: A retrospective cohort study on 34067 patients managed from 2000 to 2021 in a Chinese tertiary center. Int J Surg 2022;104:106780.
- Li W, Liu J, Lan Y, et al. Development and validation of survival prediction tools in early and late onset colorectal cancer patients. Sci Rep 2025;15:12864.
- Okagawa Y, Seto K, Yoshida K, et al. Clinicopathological features of early-onset colorectal cancer in Japanese patients: a single-center retrospective study. BMC Gastroenterol 2025;25:156.
- Yu C, Zhang Y. Establishment of prognostic nomogram for elderly colorectal cancer patients: a SEER database analysis. BMC Gastroenterol 2020;20:347.
- Burnett-Hartman AN, Powers JD, Chubak J, et al. Treatment patterns and survival differ between early-onset and late-onset colorectal cancer patients: the patient outcomes to advance learning network. Cancer Causes Control 2019;30:747-55.
- Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74:229-63.
- Wang S, Zheng R, Li J, et al. Global, regional, and national lifetime risks of developing and dying from gastrointestinal cancers in 185 countries: a population-based systematic analysis of GLOBOCAN. Lancet Gastroenterol Hepatol 2024;9:229-37.
- Ohkuma M, Takano Y, Goto K, et al. Significance of Naples prognostic score for postoperative complications after colorectal cancer surgery. Surg Today 2025;55:1481-7.
- Yamamoto T, Kawada K, Obama K. Inflammation-Related Biomarkers for the Prediction of Prognosis in Colorectal Cancer Patients. Int J Mol Sci 2021;22:8002.
- Nøst TH, Alcala K, Urbarova I, et al. Systemic inflammation markers and cancer incidence in the UK Biobank. Eur J Epidemiol 2021;36:841-8.
- Geng L, Wang L, Jiang X, et al. A prognostic nomogram based on desmoplastic reaction/tumor deposit modified lymph node staging in colorectal cancer. J Gastrointest Oncol 2025;16:485-502.
- Zhuang P, Chen JX, Xia B, et al. Impact of PG-SGA-assessed malnutrition stratification on clinical outcomes in advanced gastrointestinal malignancies: a retrospective cohort study. BMC Gastroenterol 2025;25:766.
- Li KJ, Zhang ZY, Wang K, et al. Prognostic scoring system using inflammation- and nutrition-related biomarkers to predict prognosis in stage I-III colorectal cancer patients. World J Gastroenterol 2025;31:104588.
- Chu B, Chen Y, Pan J. Prognostic significance of systemic immune inflammation index for ovarian cancer: An updated systematic review and meta-analysis. J Ovarian Res 2025;18:41.
- Coser RB, Nahas CSR, Cassenote AJF, et al. Response Prediction to Neoadjuvant Chemoradiotherapy in Rectal Cancer Based on Systemic Inflammatory Markers (NLR, PLR, and LMR). J Gastrointest Cancer 2025;56:134.
- Allahyari A, Fallah F, Bahrami Taqanaki P, et al. Evaluating the neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR) as prognostic and treatment response biomarkers in stage IV colorectal cancer patients. Oncology in Clinical Practice 2024;21:200-5.
- Ding Y, Liu Z, Li J, et al. Predictive effect of the systemic inflammation response index (SIRI) on the efficacy and prognosis of neoadjuvant chemoradiotherapy in patients with locally advanced rectal cancer. BMC Surg 2024;24:89.
- Piringer G, Ponholzer F, Thaler J, et al. Prediction of survival after neoadjuvant therapy in locally advanced rectal cancer - a retrospective analysis. Front Oncol 2024;14:1374592.
- Ma L, Yang F, Guo W, et al. Prognostic role of platelet-to-lymphocyte ratio in patients with rectal cancer undergoing resection: a systematic review and meta-analysis. Front Oncol 2024;14:1415443.
- Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594.
- Sugimoto A, Fukuoka T, Shibutani M, et al. Prognostic significance of the Naples prognostic score in colorectal cancer patients undergoing curative resection: a propensity score matching analysis. BMC Gastroenterol 2023;23:88.
- Zhang Q, Xu R, Zhen W, et al. Competing risk and random survival forest models for predicting survival in post-resection elderly stage I-III colorectal cancer patients. Sci Rep 2025;15:24269.
- Ranstam J, Cook JA. LASSO regression. British Journal of Surgery 2018;105:1348.
- Lu S, Zhang X, Zheng X, et al. LASSO-Cox model in the prognostic evaluation of radiochemotherapy efficacy for lymph node metastatic nasopharyngeal carcinoma. Front Oncol 2025;15:1606967.
- Ridgeway G. Generalized Boosted Models: A guide to the gbm package. Available online: https://mirror.niser.ac.in/cran/web/packages/gbm/vignettes/gbm.pdf
- Aktürk Hayat E, Türe M, Şenol Ş. An Alternative Dimension Reduction Approach to Supervised Principal Components Analysis in High Dimensional Survival Data. Turkiye Klinikleri J Biostat 2016;8:21-9.
- Xu MQ, Jiang ZS, Liao WY, et al. Predicting Postoperative Recurrence Using a Support Vector Machine for Patients With Esophageal Squamous Cell Carcinoma: Machine Learning Modeling Development and Validation Study. JMIR Cancer 2025;11:e68027.
- Yamaguchi K, Abdelbaky S, Yu L, et al. PLASMA: Partial LeAst Squares for Multiomics Analysis. Cancers (Basel) 2025;17:287.
- Hamidi O, Amini P, Tapak L, et al. Prediction of Distant Metastasis of Lymph-Node-Negative Primary Breast Cancer From Gene Expression Profiling Using Cox-Boost Regression Model. Cancer Inform 2024;23:11769351241297493.
- Zhu B, Mo Z, Bao Y, et al. Integrated transcriptome analysis and combinatorial machine learning to construct a homeostatic model of acetylation for ccRCC and validate the key gene GCNT4. Cancer Cell Int 2025;25:236.
- Zhang J, Pan J, Lin J, et al. Prognostic and Predictive Value of Machine Learning-Based Biomarker and Pathomics Signatures in Patients With Prostate Cancer. Cancer Sci 2025;116:2893-906.
- Cai C, Zhu H, Li B, et al. Prognostic Analysis of Elderly Patients with Hepatocellular Carcinoma: an Exploration and Machine Learning Model Prediction Based on Age Stratification and Surgical Approach. J Hepatocell Carcinoma 2025;12:747-64.
- Janssens ACJW, Martens FK. Reflection on modern methods: Revisiting the area under the ROC Curve. Int J Epidemiol 2020;49:1397-403.
- Harrell FE Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Cham: Springer; 2015:582.
- Vickers AJ, Holland F. Decision curve analysis to evaluate the clinical benefit of prediction models. Spine J 2021;21:1643-8.
- Li K, Chen Y, Zhang Z, et al. Preoperative pan-immuno-inflammatory values and albumin-to-globulin ratio predict the prognosis of stage I-III colorectal cancer. Sci Rep 2025;15:11517.
- Altıntaş YE, Bilici A, Yıldız Ö, et al. The Role of Pre-Treatment Inflammatory Biomarkers in Predicting Tumor Response to Neoadjuvant Chemoradiotherapy in Rectal Cancer. Medicina (Kaunas) 2025;61:865.
- Xu N, Zhang JX, Zhang JJ, et al. The prognostic value of the neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR) in colorectal cancer and colorectal anastomotic leakage patients: a retrospective study. BMC Surg 2025;25:57.
- Wang X, Pu X, Wu X, et al. Prognostic implication of dynamic platelet count in lung cancer patients with thrombocytosis: a retrospective analysis. PeerJ 2025;13:e19551.
- Liao K, Zhang X, Liu J, et al. The role of platelets in the regulation of tumor growth and metastasis: the mechanisms and targeted therapy. MedComm (2020) 2023;4:e350.
- Gan J, Zhang X, Guo J. The role of platelets in tumor immune evasion and metastasis: mechanisms and therapeutic implications. Cancer Cell Int 2025;25:258.
- Garcia-Leon MJ, Liboni C, Mittelheisser V, et al. Platelets favor the outgrowth of established metastases. Nat Commun 2024;15:3297.
- Li X, Cheng C, Huo X, et al. Clinical significance of the modified Naples prognostic score in patients with stage II-III colon cancer undergoing curative resection: a retrospective study from the real world. Front Oncol 2024;14:1403666.
- Park SH, Woo HS, Hong IK, et al. Impact of Postoperative Naples Prognostic Score to Predict Survival in Patients with Stage II-III Colorectal Cancer. Cancers (Basel) 2023;15:5098.
- Aytaç İ, Güven Aytaç B, Kilci O, et al. Naples Prognostic Score for Graft Functions After Renal Transplantation: A Retrospective Analysis. Ann Transplant 2023;28:e942007.
- Gu J, Deng S, Jiang Z, et al. Modified Naples prognostic score for evaluating the prognosis of patients with obstructive colorectal cancer. BMC Cancer 2023;23:941.
- Tang M, Wang H, Cao Y, et al. Nomogram for predicting occurrence and prognosis of liver metastasis in colorectal cancer: a population-based study. Int J Colorectal Dis 2021;36:271-82.
- Ammendola S, Turri G, Marconi I, et al. The presence of poorly differentiated clusters predicts survival in stage II colorectal cancer. Virchows Arch 2021;478:241-8.
- Que Y, Wu R, Li H, et al. A prediction nomogram for perineural invasion in colorectal cancer patients: a retrospective study. BMC Surg 2024;24:80.
- Xu R, Chi H, Zhang Q, et al. Enhancing the diagnostic accuracy of colorectal cancer through the integration of serum tumor markers and hematological indicators with machine learning algorithms. Clin Transl Oncol 2025;27:299-308.

