Enhanced prognostic prediction of cancer-specific mortality in elderly bladder cancer patients post-radical cystectomy: an XGBoost model study

Gaowei Li; Kang Xia

doi:10.21037/tcr-24-2023

Original Article

Gaowei Li¹ , Kang Xia²

¹Department of Oncology, Chongqing Hospital of The First Affiliated Hospital of Guangzhou University of Chinese Medicine (Chongqing Beibei Hospital of Traditional Chinese Medicine), Chongqing, China; ²Department of Urology, Chongqing Hospital of The First Affiliated Hospital of Guangzhou University of Chinese Medicine (Chongqing Beibei Hospital of Traditional Chinese Medicine), Chongqing, China

Contributions: (I) Conception and design: Both authors; (II) Administrative support: Both authors; (III) Provision of study materials or patients: Both authors; (IV) Collection and assembly of data: K Xia; (V) Data analysis and interpretation: K Xia; (VI) Manuscript writing: Both authors; (VII) Final approval of manuscript: Both authors.

Correspondence to: Kang Xia, MD. Department of Urology, Chongqing Hospital of The First Affiliated Hospital of Guangzhou University of Chinese Medicine (Chongqing Beibei Hospital of Traditional Chinese Medicine), No. 380 Jiangjun Road, Beibei District, Chongqing 400700, China. Email: 15025405080@163.com.

Background: Tumor stage, surgery and age are positively correlated with cancer-specific mortality (CSM) in patients diagnosed with bladder cancer (BCa). In light of the successful application of machine learning to big data in many fields outside of medicine, we aimed to establish machine learning models and validate whether they could improve our ability to predict CSM in elderly BCa patients after radical cystectomy (RC).

Methods: Data on eligible patients diagnosed with BCa were obtained from the Surveillance, Epidemiology, and End Results database (2000–2021) and divided into training and validation cohorts in a ratio of 7:3. First, risk factors for CSM were identified by Cox regression analysis. Then, iterative testing and tuning through automated hyperparameter optimization and ten-fold cross-validation were performed to generate stable extreme gradient boosting (XGBoost) models with optimal performance. Receiver operating characteristic (ROC) curves, area under the curve (AUC), calibration curves and confusion matrices were used to evaluate the performance of the XGBoost model.

Results: A total of 11,763 patients were included, of whom 5,788 died from BCa. Compared with the other machine learning models, the final XGBoost model showed high accuracy and precision in predicting CSM in BCa patients (6-month CSM: AUC =0.799, 12-month CSM: AUC =0.756, 36-month CSM: AUC =0.746, and 60-month CSM: AUC =0.745). The accuracy, precision, recall and F1 score confirmed the superior performance of the XGBoost model. The importance scores for clinical characteristics and the Shapley Additive Explanations plots highlighted the key factors: chemotherapy, tumor stage, marital status, and tumor size were the top four factors in all models.

Conclusions: Our study confirmed the feasibility and high performance of the XGBoost model in predicting CSM in elderly BCa patients after RC. Machine learning has the potential to contribute to accurate prediction of cancer prognosis.

Keywords: Bladder cancer (BCa); machine learning; extreme gradient boosting (XGBoost); radical cystectomy (RC); cancer-specific mortality (CSM)

Submitted Oct 20, 2024. Accepted for publication Jan 09, 2025. Published online Mar 27, 2025.

Highlight box

Key findings

• Enhanced prognostic prediction: the study presents an extreme gradient boosting (XGBoost) model that markedly enhances the prediction of cancer-specific mortality (CSM) in elderly bladder cancer (BCa) patients following radical cystectomy (RC), with high accuracy across diverse timeframes.

What is known and what is new?

• Traditional methods predict BCa outcomes based on clinical and pathological features, often inadequate for elderly patients due to their complex health profiles.

• This study introduces a machine learning approach, specifically the XGBoost model, to predict CSM in elderly BCa patients post-RC, thereby addressing a previously overlooked demographic.

What is the implication, and what should change now?

• The XGBoost model has the potential to enhance clinical decision-making by providing more accurate CSM predictions for elderly BCa patients post-RC.

• Clinicians are advised to consider integrating machine learning models into their practice. Further research is required for external validation and inclusion of additional variables like lifestyle factors and quality of life, which are not currently captured in the Surveillance, Epidemiology, and End Results database.

Introduction

Bladder cancer (BCa) is one of the most prevalent malignancies worldwide, particularly among the elderly population (1,2). In 2018, there were approximately 550,000 new cases and 200,000 deaths worldwide, and these numbers are increasing annually (3,4). The most important risk factor for BCa is smoking, which accounts for 50% of all cases, followed by occupational exposure to aromatic amines and ionizing radiation (5,6). Radical cystectomy (RC) is the standard surgical treatment for muscle-invasive BCa (7). However, elderly patients face a higher risk of cancer-specific mortality (CSM) post-surgery (8). Traditional prognostic methods primarily rely on clinical and pathological features such as tumor stage, but these methods often fall short in accurately predicting outcomes for elderly patients (9). The advent of big data and enhanced computational power has led to the development of machine learning methods, which have demonstrated remarkable potential in medical prognostication by processing complex, multidimensional data to improve prediction accuracy (10).

Machine learning algorithms, such as logistic regression, random forests, and neural networks, have been applied to the prognostication of various cancers, yielding significant results (11-13). These models analyze extensive patient data to identify key prognostic factors and generate personalized predictions. Despite the proven efficacy of machine learning in BCa prognostication, research focusing specifically on elderly patients remains limited. Elderly patients present unique challenges due to physiological decline and multiple comorbidities, which can complicate accurate prognostication. Therefore, it is necessary to establish an accurate CSM prediction model for elderly BCa patients after RC using machine learning algorithms to provide a scientific basis for clinical decision-making.

This study leverages data from the Surveillance, Epidemiology, and End Results (SEER) database, which provides a comprehensive range of information on patients, including demographic details, tumor characteristics, treatment details, and survival outcomes. By training and validating diverse machine learning models, including extreme gradient boosting (XGBoost), logistic regression, support vector machine, and random forest, we expected to identify the model with optimal performance and evaluate its effectiveness in predicting CSM at 6, 12, 36, and 60 months of follow-up. The objective was to establish and validate the potential of machine learning models to enhance the ability to predict CSM in elderly BCa patients after RC. This study contributes to the development of clinical artificial intelligence models, optimizes the long-term follow-up for elderly BCa patients after RC, and provides deeper insight into their prognosis. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-2023/rc).

Methods

Population selection

Figure 1 illustrates our study design and analytical workflow. The data for this study were sourced from the SEER database [SEER research data, 17 registries, Nov 2023 Sub (2000–2021)], which is publicly accessible. We included patients who were pathologically diagnosed with BCa between 2000 and 2021. Cases diagnosed by death certificate and autopsy were excluded. Cases with incomplete information were also excluded. The primary endpoint of interest was CSM in elderly BCa patients after RC.

Figure 1 The flowchart described the process of conducting the study and statistical analysis. AUC, area under the curve; CSM, cancer specific mortality; ROC, receiver operating characteristic; XGBoost, extreme gradient boosting.

This study utilized anonymized data, in accordance with the Declaration of Helsinki (as revised in 2013). Therefore, patient informed consent and Institutional Review Board (IRB) approval were not required.

Variable selection using Cox regression analysis

When collecting raw data from the SEER database, the following variables were included: age, sex, race, year of diagnosis, summary stage, histologic type, surgery on primary site, radiation recode, chemotherapy recode, time from diagnosis to treatment in days recode, tumor size, marital status, median household income, survival month, cause of death, vital status. According to the World Health Organization (WHO) age classification, age was categorized as follows: 0–17 years as minors, 18–44 years as young adults, 45–59 years as middle-aged, and 60 years and older as elderly. Time from diagnosis to treatment in days recode was classified into “<1 month”, “1 to 3 months”, and “over 3 months”. Tumor size was classified into “0–29 mm”, “30–59 mm”, and “60+ mm”.
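The categorization described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the 30-days-per-month conversion and the handling of boundary values are assumptions that the text does not specify, and the labels use ASCII hyphens in place of the en dashes in the text.

```python
def categorize_age(age_years: int) -> str:
    """Map age in years to the age groups quoted in the text."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 17:
        return "minor"
    if age_years <= 44:
        return "young adult"
    if age_years <= 59:
        return "middle-aged"
    return "elderly"


def categorize_tumor_size(size_mm: float) -> str:
    """Map tumor size in millimetres to the three size groups in the text."""
    if size_mm < 0:
        raise ValueError("tumor size must be non-negative")
    if size_mm < 30:
        return "0-29 mm"
    if size_mm < 60:
        return "30-59 mm"
    return "60+ mm"


def categorize_time_to_treatment(days: int, days_per_month: int = 30) -> str:
    """Map days from diagnosis to treatment to the three delay groups.

    Assumption: one month is treated as 30 days, and exactly three months
    falls in the "1 to 3 months" group.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    if days < days_per_month:
        return "<1 month"
    if days <= 3 * days_per_month:
        return "1 to 3 months"
    return "over 3 months"
```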

First, patients aged 60 years and older who had undergone RC and whose cause of death was BCa were selected. The clinical characteristics were then subjected to univariate and multivariate Cox analysis. Finally, the variables that were statistically significant in the multivariate Cox analysis were incorporated into a machine learning model to predict the CSM of BCa patients after RC at 6, 12, 36 and 60 months.
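For reference, the Cox proportional hazards model behind this variable selection can be written in its standard textbook form. The notation is not taken from the article, and the Wald-type confidence interval shown is an assumption about how the reported intervals were obtained.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j),
\qquad
95\%\ \mathrm{CI}_j = \exp\!\left(\hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j)\right)
```

Here $h_0(t)$ is the baseline hazard, $x_1, \dots, x_p$ are the indicator variables for each non-reference category, and a hazard ratio above 1 indicates a higher hazard of CSM relative to the reference category.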

XGBoost model

The advantage of the XGBoost algorithm over other machine learning algorithms is its high computational efficiency and prediction accuracy: it trains rapidly on large-scale data through parallel processing and distributed computing, while using techniques such as regularization to limit overfitting. Its flexibility makes it suitable for a variety of tasks (classification, regression, ranking, etc.), and it can handle missing values and adapt to various data types.

Published studies have demonstrated the superiority of machine learning algorithms over traditional statistical models (14-16), with the XGBoost model performing better (17,18). Therefore, we chose to develop the XGBoost model and also validate our choice by comparing different machine learning algorithms (logistic regression, support vector machine, and random forest).

All patients were randomly divided into training and validation cohorts in a ratio of 7:3. Prior to training the model, the survival status of the patients was converted into numerical values by label encoding (0 = survival, 1 = death), and Table S1 shows the label encoding of the variables used in the XGBoost model.
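A minimal sketch of the random 7:3 split and of a horizon-specific binary label is given below. The function names and the seed are hypothetical. The article does not state how patients whose follow-up ended before a given horizon were handled, so returning `None` for them here is an assumption, not the authors' documented procedure.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=42):
    """Shuffle the records and split them into training and validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def label_at_horizon(survival_months, died_of_bca, horizon_months):
    """Encode the outcome at a follow-up horizon: 1 = death from BCa, 0 = survival.

    Assumption: a patient whose follow-up ended before the horizon without a
    BCa death has an unknown status at that horizon and gets None.
    """
    if died_of_bca and survival_months <= horizon_months:
        return 1
    if survival_months >= horizon_months:
        return 0
    return None


# With the cohort size reported in the article, a 0.7 fraction yields
# 8,234 training and 3,529 validation patients.
train, validation = split_train_validation(range(11763))
print(len(train), len(validation))
```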

Model performance evaluation

The receiver operating characteristic (ROC) curve, area under the curve (AUC), calibration curve and confusion matrix were used to evaluate the performance of the XGBoost model. Accuracy, precision, recall and F1 score were the main parameters derived from the confusion matrix.
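To make these metrics concrete, the sketch below computes them from scratch in plain Python. The function names are hypothetical, and the pairwise AUC is the standard rank-based definition rather than the routine used by the authors' R packages.

```python
def classification_metrics(labels, predictions):
    """Accuracy, precision, recall and F1 score from binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative case.

    Tied scores count as half a win. This is quadratic in the sample size,
    which is fine for illustration but slow for large cohorts.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Accuracy, precision, recall and F1 depend on the probability threshold used to turn scores into predictions, whereas the AUC does not.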

We also present a visualization of how much each feature contributes to the model’s prediction results through a feature importance plot (19). It helps us to understand the decision-making process of the model, identifying the characteristics that have significant impact on the prediction, and also reveals which characteristics have less impact. In addition, we demonstrate the contribution of variables to the results using the Shapley Additive Explanations (SHAP) plot, which is a method for interpreting the predictions of machine learning models. It is used to quantify the contribution of each feature to the model predictions. By calculating the average marginal effect of each feature on the model output, SHAP values provide a fair and interpretable way to assess the importance of features.
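The "average marginal effect" behind SHAP values can be illustrated with an exact Shapley computation on a toy model. Everything in this sketch is hypothetical: the feature names, the risk increments and the `toy_risk` function are invented for illustration and are not taken from the fitted models. SHAP implementations for tree models use a faster tree-specific algorithm instead of this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: weighted average marginal contribution of each
    feature over all coalitions of the remaining features."""
    n = len(features)
    result = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        result[feature] = total
    return result


def toy_risk(present):
    """Hypothetical risk score for a set of present risk factors (made-up numbers)."""
    risk = 0.10
    if "no_chemotherapy" in present:
        risk += 0.20
    if "regional_stage" in present:
        risk += 0.30
    if "widowed" in present:
        risk += 0.05
    if "no_chemotherapy" in present and "regional_stage" in present:
        risk += 0.10  # interaction term, shared equally between the two features
    return risk


features = ["no_chemotherapy", "regional_stage", "widowed"]
print(shapley_values(toy_risk, features))
```

The values sum to the difference between the prediction with all features present and the baseline with none, which is the additivity property that makes the per-feature contributions interpretable.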

Statistical analysis

All analyses were performed using R 4.2.3 (R Foundation for Statistical Computing, Vienna, Austria). All statistical tests were two-sided, with a P value <0.05 considered statistically significant.

Results

Baseline characteristics

In total, 11,763 patients were included in this study, of whom 5,788 (49.21%) died from BCa. The clinicopathological characteristics of elderly BCa patients after RC (including primary cohort, training cohort, and validation cohort) are summarized in Table 1. All the patients were 60 years and older. The majority of patients were male (n=8,584, 72.97%) and white (n=10,435, 88.71%). Localized (n=4,635, 39.40%) and regional (n=6,007, 51.07%) stages were most common. Transitional cell carcinoma (n=10,643, 90.48%) was the most prevalent histologic type. Only a few patients (n=467, 3.97%) received radiation therapy, while 5,557 (47.24%) patients received chemotherapy; 84.71% (n=9,964) of patients received treatment within 1 month after diagnosis. Tumor size was 30–59 mm in most patients (n=5,726, 48.68%). The majority of patients (n=7,841, 66.66%) were married and 78.13% (n=9,191) of patients had a median household income of $50,000–$100,000. The cumulative incidence curve is shown in Figure S1.

Table 1

Baseline characteristics of patients in the training cohort and the validation cohort

Variables	Primary cohort (n=11,763)	Training cohort (n=8,234, 70%)	Validation cohort (n=3,529, 30%)
Sex
Male	8,584 (72.97)	5,967 (72.47)	2,617 (74.16)
Female	3,179 (27.03)	2,267 (27.53)	912 (25.84)
Race
White	10,435 (88.71)	7,312 (88.80)	3,123 (88.50)
Black	639 (5.43)	455 (5.53)	184 (5.21)
Other	689 (5.86)	467 (5.67)	222 (6.29)
Stage
In situ	161 (1.37)	123 (1.49)	38 (1.08)
Localized	4,635 (39.40)	3,240 (39.35)	1,395 (39.53)
Regional	6,007 (51.07)	4,217 (51.21)	1,790 (50.72)
Distant	960 (8.16)	654 (7.94)	306 (8.67)
Histologic type
Transitional cell carcinoma	10,643 (90.48)	7,460 (90.60)	3,183 (90.20)
Adenocarcinoma	509 (4.33)	363 (4.41)	146 (4.14)
Squamous cell carcinoma	231 (1.96)	161 (1.96)	70 (1.98)
Neuroendocrine carcinoma	281 (2.39)	188 (2.28)	93 (2.64)
Other epithelial tumors	99 (0.84)	62 (0.75)	37 (1.05)
Radiation therapy
Yes	467 (3.97)	320 (3.89)	147 (4.17)
No/unknown	11,296 (96.03)	7,914 (96.11)	3,382 (95.83)
Chemotherapy
Yes	5,557 (47.24)	3,889 (47.23)	1,668 (47.27)
No/unknown	6,206 (52.76)	4,345 (52.77)	1,861 (52.73)
Time from diagnosis to treatment
<1 month	9,964 (84.71)	6,955 (84.47)	3,009 (85.26)
1–3 months	1,484 (12.62)	1,059 (12.86)	425 (12.04)
Over 3 months	315 (2.68)	220 (2.67)	95 (2.69)
Tumor size
<30 mm	3,231 (27.47)	2,234 (27.13)	997 (28.25)
30–59 mm	5,726 (48.68)	4,041 (49.08)	1,685 (47.75)
60+ mm	2,806 (23.85)	1,959 (23.79)	847 (24.00)
Marital status
Married	7,841 (66.66)	5,462 (66.33)	2,379 (67.41)
Separated	77 (0.65)	50 (0.61)	27 (0.77)
Divorced	1,100 (9.35)	773 (9.39)	327 (9.27)
Widowed	1,483 (12.61)	1,056 (12.82)	427 (12.10)
Unmarried	1,262 (10.73)	893 (10.85)	369 (10.46)
Median household income
<$50,000	604 (5.13)	426 (5.17)	178 (5.04)
$50,000–100,000	9,191 (78.13)	6,405 (77.79)	2,786 (78.95)
$100,000+	1,968 (16.73)	1,403 (17.04)	565 (16.01)
Year of diagnosis
2000–2005	1,654 (14.06)	1,172 (14.23)	482 (13.66)
2006–2010	2,285 (19.43)	1,601 (19.44)	684 (19.38)
2011–2015	3,041 (25.85)	2,116 (25.70)	925 (26.21)
2016–2021	4,783 (40.66)	3,345 (40.62)	1,438 (40.75)

Univariable and multivariable Cox regression analysis

Univariate Cox regression analysis was performed to identify the variables that had a significant effect on the CSM of the patients, including sex, race, tumor stage, histologic type, radiation therapy, chemotherapy, time from diagnosis to treatment, tumor size, marital status, and median household income (Table 2).

Table 2

Univariate and multivariate Cox analysis of characteristics extracted from SEER database

Variables	Univariate HR [95% CI]	Univariate P	Multivariate HR [95% CI]	Multivariate P
Sex
Male	Reference		Reference
Female	1.198 [1.132–1.268]	<0.001	0.987 [0.928–1.049]	0.67
Race
White	Reference		Reference
Black	1.235 [1.108–1.377]	<0.001	1.051 [0.940–1.174]	0.38
Other	0.880 [0.785–0.987]	<0.001	0.873 [0.778–0.980]	<0.001
Stage
In situ	Reference		Reference
Localized	1.996 [1.333–2.990]	<0.001	2.315 [1.545–3.468]	<0.001
Regional	6.268 [4.195–9.364]	<0.001	7.311 [4.890–10.929]	<0.001
Distant	13.21 [8.797–19.837]	<0.001	16.087 [10.696–24.194]	<0.001
Histologic type
Transitional cell carcinoma	Reference		Reference
Adenocarcinoma	1.624 [1.450–1.819]	<0.001	1.014 [0.902–1.139]	0.82
Squamous cell carcinoma	1.195 [1.003–1.423]	<0.001	0.873 [0.732–1.041]	0.13
Neuroendocrine carcinoma	1.377 [1.175–1.613]	<0.001	1.641 [1.399–1.925]	<0.001
Other epithelial tumors	1.437 [1.114–1.854]	<0.001	1.335 [1.034–1.724]	<0.001
Radiation therapy
Yes	Reference		Reference
No/unknown	0.527 [0.472–0.588]	<0.001	0.726 [0.649–0.813]	<0.001
Chemotherapy
Yes	Reference		Reference
No/unknown	1.473 [1.397–1.553]	<0.001	1.829 [1.731–1.933]	<0.001
Time from diagnosis to treatment
<1 month	Reference		Reference
1–3 months	1.182 [1.098–1.273]	<0.001	1.064 [0.987–1.146]	0.10
Over 3 months	1.281 [1.108–1.482]	<0.001	1.053 [0.909–1.219]	0.49
Tumor size
<30 mm	Reference		Reference
30–59 mm	1.382 [1.295–1.475]	<0.001	1.271 [1.190–1.357]	<0.001
60+ mm	1.966 [1.828–2.115]	<0.001	1.586 [1.472–1.709]	<0.001
Marital status
Married	Reference		Reference
Separated	1.001 [0.714–1.403]	0.99	0.858 [0.611–1.203]	0.37
Divorced	1.169 [1.071–1.276]	<0.001	1.129 [1.033–1.234]	<0.001
Widowed	1.636 [1.522–1.758]	<0.001	1.386 [1.283–1.497]	<0.001
Unmarried	1.048 [0.960–1.145]	0.30	1.012 [0.926–1.106]	0.79
Median household income
<$50,000	Reference		Reference
$50,000–100,000	0.939 [0.838–1.053]	0.28	0.922 [0.822–1.033]	0.16
$100,000+	0.866 [0.762–0.986]	<0.001	0.922 [0.810–1.050]	0.22

CI, confidence interval; HR, hazard ratio; SEER, Surveillance, Epidemiology, and End Results.

We then performed multivariate Cox regression analysis to adjust for confounding factors and identify independent risk factors for CSM (Table 2). The results showed that sex, time from diagnosis to treatment, and median household income were not independent factors. White and Black patients had a higher risk of CSM than patients of other races. The risk of CSM increased significantly with increasing tumor stage. Neuroendocrine carcinoma had the highest risk of CSM of all histologic types. Patients who did not receive radiation therapy had a significantly lower risk of CSM, whereas patients who did not receive chemotherapy had a significantly higher risk. The risk of CSM increased with tumor size. Widowed patients had the highest risk of CSM among the marital status categories.

XGBoost model development and performance evaluation

Based on the independent risk factors identified by multivariate Cox regression analysis, we built an XGBoost machine learning model to predict patients’ CSM at 6, 12, 36, and 60 months. The primary patient cohort was divided into training and validation cohorts in a 7:3 ratio. To generate a stable XGBoost model with optimal performance, we used the train function from the caret package (version 6.0-94) for automated hyperparameter optimization, combined with iterative testing and tuning using ten-fold cross-validation in the training set (Table S2). The performance of the XGBoost model was visualized by plotting ROC curves at the different follow-up times (Figure 2), and the corresponding AUC values were calculated. The XGBoost model performed well in predicting CSM in BCa patients after RC (Table S3): 6-month (train set: AUC =0.810; validation set: AUC =0.799), 12-month (train set: AUC =0.790; validation set: AUC =0.756), 36-month (train set: AUC =0.761; validation set: AUC =0.746), and 60-month (train set: AUC =0.773; validation set: AUC =0.745). The calibration curves are shown in Figure S2.
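The tuning procedure can be sketched generically as a grid search scored by ten-fold cross-validation. This is not the internals of caret's train function: the helper names are hypothetical, `fit_and_score` stands in for fitting a model on the training indices and scoring it (for example by AUC) on the held-out fold, and the parameter names in the usage note are illustrative, since the tuned grid in Table S2 is not reproduced here.

```python
import random
from itertools import product


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


def grid_search_cv(param_grid, fit_and_score, n_samples, k=10, seed=0):
    """Return the parameter combination with the best mean cross-validated score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = [fit_and_score(params, training, held_out)
                  for training, held_out in k_fold_indices(n_samples, k, seed)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

A call might look like `grid_search_cv({"max_depth": [2, 4, 6], "eta": [0.05, 0.1, 0.3]}, fit_and_score, n_samples=8234)`, where every fold is drawn from the training cohort only so that the validation cohort stays untouched during tuning.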

Figure 2 ROC curves for XGBoost model evaluation: train data (A,C,E,G), test data (B,D,F,H). AUC, area under the curve; ROC, receiver operating characteristic curve; XGBoost, extreme gradient boosting.

The accuracy, precision, recall and F1 score of the XGBoost model were then evaluated using the confusion matrix (Figure S3): 6-month CSM predicted by XGBoost model (accuracy: 0.817, precision: 0.847, recall: 0.920, F1 score: 0.882), 12-month CSM predicted by XGBoost model (accuracy: 0.779, precision: 0.786, recall: 0.960, F1 score: 0.864), 36-month CSM predicted by XGBoost model (accuracy: 0.735, precision: 0.742, recall: 0.928, F1 score: 0.825), 60-month CSM predicted by XGBoost model (accuracy: 0.720, precision: 0.735, recall: 0.859, F1 score: 0.792). Overall, our XGBoost model showed highly efficient and accurate prediction performance.
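As a consistency check, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the values reported above. The sketch below does this; the function name is hypothetical, and only the precision, recall and F1 figures come from the text.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# horizon in months -> (precision, recall, reported F1) as given in the text
reported = {
    6: (0.847, 0.920, 0.882),
    12: (0.786, 0.960, 0.864),
    36: (0.742, 0.928, 0.825),
    60: (0.735, 0.859, 0.792),
}

for horizon, (precision, recall, reported_f1) in reported.items():
    recomputed = f1_from_precision_recall(precision, recall)
    print(f"{horizon}-month: recomputed F1 = {recomputed:.3f}, reported = {reported_f1:.3f}")
```

The recomputed values agree with the reported F1 scores to within rounding of the three-decimal inputs.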

In addition, we visualized the importance score of the clinical features in the XGBoost model (Figure 3). The results showed that the top 4 risk factors were chemotherapy, tumor stage, marital status, and tumor size. Chemotherapy had the greatest impact on the development of CSM in the short-term prognostic model (6-month model: Figure 3A), while tumor stage had the highest importance score in the remaining XGBoost models (12-month model: Figure 3B, 36-month model: Figure 3C, 60-month model: Figure 3D). We also visualized the XGBoost model in the form of a SHAP plot. As shown in Figure 4 (A: 6-month model, B: 12-month model, C: 36-month model, D: 60-month model), the clinical features in the SHAP plot are organized in descending order of importance. For each feature, each point represents a single patient. The position of a point along the x-axis represents the impact that the feature had on the model’s output for that specific patient. The higher a feature is positioned in the SHAP plot, the more important it is for the model. Among them, tumor stage and chemotherapy were the most important contributing factors, followed by marital status, tumor size, radiation therapy, histologic type, and race.

Figure 3 Importance score of each clinical feature in the XGBoost prognostic model (ranked by their importance). XGBoost, extreme gradient boosting.

Figure 4 The SHAP plot of the XGBoost model (A: 6-month model; B: 12-month model; C: 36-month model; D: 60-month model). SHAP, Shapley Additive Explanations; XGBoost, extreme gradient boosting.

Discussion

BCa is widely recognized as one of the most prevalent malignant tumors of the urinary system and ranks as the tenth most common malignancy globally (4). The high rates of incidence, recurrence, and progression of BCa contribute to the substantial morbidity experienced by patients (4,7). As the global population ages, the proportion of BCa cases observed in elderly individuals rises every year. Moreover, BCa in the elderly often presents at an advanced stage, influenced by economic, educational, and other determinants. Even after RC, elderly BCa patients continue to experience a high incidence of CSM attributable to their significant comorbidities (2). Few studies have specifically addressed CSM in this patient population. This study provides enhanced prognostic prediction of CSM after RC in elderly BCa patients by applying the XGBoost model. The modeling is based on an in-depth analysis of a large-scale dataset from the SEER database, which revealed key clinical and pathological factors that have a significant impact on long-term patient survival.

Previous studies have demonstrated that machine learning models outperform conventional models in cancer prognosis prediction (20,21). In this study, Cox regression was used for variable selection, and a high-performance CSM prediction model was then constructed with the XGBoost algorithm. Compared with the other machine learning models, the XGBoost model had the best AUC values for predicting patients’ CSM at 6, 12, 36 and 60 months. The results in the validation set confirmed the good predictive ability of our model.

In this study, we found that chemotherapy plays a crucial role in elderly BCa patients undergoing RC. In addition, tumor stage and marital status, as independent risk factors, had a significant impact on the prediction of CSM. Receiving chemotherapy appeared to significantly reduce the risk of CSM in elderly BCa patients after RC, whereas the effect of receiving radiotherapy was 0. It has been shown that RC combined with chemotherapy is associated with greater pathologic downstaging in all age groups, including elderly patients (22). For muscle-invasive BCa, it has been shown that neoadjuvant chemotherapy should be given whenever the patient can tolerate cisplatin-based chemotherapy (23,24). A meta-analysis published in 2021 showed an 18% relative reduction in the risk of CSM with cisplatin-based adjuvant chemotherapy and a 6% absolute improvement in 5-year survival (25). Therefore, in conjunction with the NCCN guidelines (26), cisplatin-based combination adjuvant chemotherapy should be considered for elderly patients treated with RC who are medically fit enough to receive cisplatin therapy.

Tumor stage is an important factor in all CSM prediction models and is positively associated with the risk of CSM. The impact of marital status on CSM was the third highest in all predictive models. The results showed that being widowed has a strong positive association with CSM in elderly patients, which is consistent with published studies (27-29). Epidemiologic studies have shown that psychosocial factors and social support play an important role in the relationship between marital status and survival, and that widowed patients have less emotional support and social attention than married patients, which may contribute to their increased risk of CSM (30,31).

A meta-analysis published in 2022 indicated that traditional risk prediction models have a high risk of bias and their applicability is uncertain (32). The machine learning employed in this study utilizes distinct data processing methods compared to traditional statistical analysis. Traditional statistical methods involve pre-designed models that accept only theoretically relevant or significant univariate parameters, while machine learning can identify patterns by processing a vast number of variables from extensive datasets. Subsequently, these patterns are encoded into a mathematical model and utilized for further validation with additional data (33). Machine learning applications can reveal concealed patterns that are not detectable through traditional statistical analysis (13,34). Moreover, machine learning algorithms can analyze diverse types of information (e.g., demographic data, radiographic data, and laboratory results), rendering them more scalable than traditional predictive models (35). Through the integration of the SEER database with the potent tool of machine learning, we assessed four machine learning algorithms and demonstrated the superior predictive performance of the XGBoost model.

Although this study is based on a large-scale dataset from the SEER database, the model has not been validated on an external dataset. In addition, due to the limitations of the SEER database, we were unable to include some potential factors that may affect prognosis, such as smoking and drinking habits. Furthermore, a published study (36) has shown the importance of quality of life in patients treated with RC; because the SEER database does not yet contain data on quality of life, we were unable to conduct further research and analysis on this point. Future studies should address these limitations by including a wider range of variables and performing external validation.

Conclusions

In summary, this study validated the effectiveness and superiority of the XGBoost model in predicting CSM after RC in elderly BCa patients. Our findings provide clinicians with a powerful tool to help them better assess and manage this patient population. Future work will focus on further optimization and external validation of the model to ensure its ability to generalize across different patient populations.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-2023/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-2023/prf

Funding: None.

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-2023/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Ferro M, Chiujdea S, Vartolomei MD, et al. Advanced Age Impacts Survival After Radical Nephroureterectomy for Upper Tract Urothelial Carcinoma. Clinical Genitourinary Cancer 2024;22:27-37.
2. Bizzarri FP, Scarciglia E, Russo P, et al. Elderly and bladder cancer: The role of radical cystectomy and orthotopic urinary diversion. Urologia 2024;91:500-4.
3. Bray F, Ferlay J, Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394-424.
4. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin 2024;74:12-49.
5. Abbas NF, Aoude MR, Kourie HR, et al. Uncovering the epidemiology of bladder cancer in the Arab world: A review of risk factors, molecular mechanisms, and clinical features. Asian J Urol 2024;11:406-22.
6. Powles T, Bellmunt J, Comperat E, et al. Bladder cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann Oncol 2022;33:244-58.
7. Richters A, Aben KKH, Kiemeney LALM. The global burden of urinary bladder cancer: an update. World J Urol 2020;38:1895-904.
8. Wang G, Lam KM, Deng Z, et al. Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques. Comput Biol Med 2015;63:124-32.
9. Klén R, Salminen AP, Mahmoudian M, et al. Prediction of complication related death after radical cystectomy for bladder cancer with machine learning methodology. Scand J Urol 2019;53:325-31.
10. Su H, Xue X, Wang Y, et al. Competitive Risk Model for Specific Mortality Prediction in Patients with Bladder Cancer: A Population-Based Cohort Study with Machine Learning. J Oncol 2022;2022:9577904.
11. Li C, Liu M, Zhang Y, et al. Novel models by machine learning to predict prognosis of breast cancer brain metastases. J Transl Med 2023;21:404.
12. Bhambhvani HP, Zamora A, Shkolyar E, et al. Development of robust artificial neural networks for prediction of 5-year survival in bladder cancer. Urol Oncol 2021;39:193.e7-193.e12.
13. Hasnain Z, Mason J, Gill K, et al. Machine learning models for predicting post-cystectomy recurrence and survival in bladder cancer patients. PLoS One 2019;14:e0210976.
14. Liang Q, Xu X, Ding S, et al. Prediction of successful weaning from renal replacement therapy in critically ill patients based on machine learning. Ren Fail 2024;46:2319329.
15. Foroughi M, Arzehgar A, Seyedhasani SN, et al. Application of machine learning for antibiotic resistance in water and wastewater: A systematic review. Chemosphere 2024;358:142223.
16. Balian J, Sakowitz S, Verma A, et al. Machine learning based predictive modeling of readmissions following extracorporeal membrane oxygenation hospitalizations. Surg Open Sci 2024;19:125-30.
17. Cao Y, Forssten MP, Sarani B, et al. Development and Validation of an XGBoost-Algorithm-Powered Survival Model for Predicting In-Hospital Mortality Based on 545,388 Isolated Severe Traumatic Brain Injury Patients from the TQIP Database. J Pers Med 2023;13:1401.
18. Yuan W, Xiao M, Wang R, et al. XGBoost in the Prediction of 28-Day Mortality in Critical Elderly Patients with Hip Fracture: A MIMIC-IV Cohort Study. Altern Ther Health Med 2024;30:432-6.
19. Quinn KN, Wilber H, Townsend A, et al. Chebyshev Approximation and the Global Geometry of Model Predictions. Phys Rev Lett 2019;122:158302.
20. Song X, Liu X, Liu F, et al. Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis. Int J Med Inform 2021;151:104484.
21. Kourou K, Exarchos TP, Exarchos KP, et al. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 2015;13:8-17.
Kohut-Jackson A, Orf J, Barresi D, et al. Age related trends in the utilization of neoadjuvant chemotherapy for muscle invasive bladder cancer. Urol Oncol 2024;42:160.e25-31. [Crossref] [PubMed]
Lopez-Beltran A, Cookson MS, Guercio BJ, et al. Advances in diagnosis and treatment of bladder cancer. BMJ 2024;384:e076743. [Crossref] [PubMed]
Hussain SA, Palmer DH, Lloyd B, et al. A study of split-dose cisplatin-based neo-adjuvant chemotherapy in muscle-invasive bladder cancer. Oncol Lett 2012;3:855-9. [Crossref] [PubMed]
Adjuvant Chemotherapy for Muscle-invasive Bladder Cancer. A Systematic Review and Meta-analysis of Individual Participant Data from Randomised Controlled Trials. Eur Urol 2022;81:50-61. [Crossref] [PubMed]
Flaig TW, Spiess PE, Abern M, et al. NCCN Guidelines® Insights: Bladder Cancer, Version 3.2024. J Natl Compr Canc Netw 2024;22:216-25. [Crossref] [PubMed]
Li Y, Wu G, Zhang Y, et al. Effects of marital status on survival of retroperitoneal liposarcomas stratified by age and sex: A population-based study. Cancer Med 2023;12:1779-90. [Crossref] [PubMed]
Sammon JD, Morgan M, Djahangirian O, et al. Marital status: a gender-independent risk factor for poorer survival after radical cystectomy. BJU Int 2012;110:1301-9. [Crossref] [PubMed]
Niu Q, Lu Y, Wu Y, et al. The effect of marital status on the survival of patients with bladder urothelial carcinoma: A SEER database analysis. Medicine (Baltimore) 2018;97:e11378. [Crossref] [PubMed]
Antoni MH, Lutgendorf SK, Cole SW, et al. The influence of bio-behavioural factors on tumour biology: pathways and mechanisms. Nat Rev Cancer 2006;6:240-8. [Crossref] [PubMed]
Goldzweig G, Andritsch E, Hubert A, et al. Psychological distress among male patients and male spouses: what do oncologists need to know? Ann Oncol 2010;21:877-83. [Crossref] [PubMed]
Sarrió-Sanz P, Martinez-Cayuelas L, Lumbreras B, et al. Mortality prediction models after radical cystectomy for bladder tumour: A systematic review and critical appraisal. Eur J Clin Invest 2022;52:e13822. [Crossref] [PubMed]
Scott IA. Demystifying machine learning: a primer for physicians. Intern Med J 2021;51:1388-400. [Crossref] [PubMed]
Kim JY, Kong HJ, Kim SH, et al. Machine learning-based preoperative datamining can predict the therapeutic outcome of sleep surgery in OSA subjects. Sci Rep 2021;11:14911. [Crossref] [PubMed]
Ngiam KY, Khor IW. Big data and machine learning algorithms for health-care delivery. Lancet Oncol 2019;20:e262-73. [Crossref] [PubMed]
Palermo G, Bizzarri FP, Scarciglia E, et al. The mental and emotional status after radical cystectomy and different urinary diversion orthotopic bladder substitution versus external urinary diversion after radical cystectomy: A propensity score-matched study. Int J Urol 2024;31:1423-8. [Crossref] [PubMed]

Cite this article as: Li G, Xia K. Enhanced prognostic prediction of cancer-specific mortality in elderly bladder cancer patients post-radical cystectomy: an XGBoost model study. Transl Cancer Res 2025;14(3):1902-1914. doi: 10.21037/tcr-24-2023
