Development of prediction models for liver metastasis in colorectal cancer based on machine learning: a population-level study
Original Article

Yuncan Xing#, Guanhua Yu#, Zheng Jiang, Zheng Wang

Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Contributions: (I) Conception and design: G Yu, Y Xing; (II) Administrative support: Z Jiang, Z Wang; (III) Provision of study materials or patients: G Yu; (IV) Collection and assembly of data: Y Xing; (V) Data analysis and interpretation: Y Xing; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

Correspondence to: Zheng Wang, PhD. Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 17 Panjiayuan Nanli, Chaoyang District, Beijing 100021, China. Email: wangzheng961601@163.com.

Background: Liver metastasis (LM) is a key consideration in treatment decisions for patients with colorectal cancer (CRC). The aim of our study was to develop and validate machine learning-based prediction models for LM in CRC.

Methods: We selected patients diagnosed with CRC from 2010 to 2015 from the Surveillance, Epidemiology, and End Results (SEER) database. Four machine learning methods, eXtreme gradient boost (XGB), decision tree (DT), random forest (RF), and support vector machine (SVM), were used to develop predictive models. Receiver operating characteristic (ROC) curves, decision curve analysis (DCA) curves, and calibration curves were used to evaluate model performance. The SHapley Additive exPlanation (SHAP) technique was used for visual analysis to aid interpretation of the model outputs.

Results: A total of 51,632 patients with CRC were selected from the SEER database. The ROC curves showed excellent accuracy for the machine learning models. In both the training and validation cohorts, calibration curves for the likelihood of LM demonstrated a high degree of concordance between model predictions and actual observations. The DCA indicated that each machine learning model yielded a net benefit over both the treat-none and treat-all strategies. Carcinoembryonic antigen (CEA) and N stage were identified as the most significant risk factors for LM based on the SHAP summary plots of the RF and XGB models.

Conclusions: XGB and RF were the best-performing of the four machine learning algorithms, and CEA and N stage were identified as the most important risk factors for LM.

Keywords: Colorectal cancer (CRC); machine learning; prediction model; liver metastasis (LM)


Submitted Jul 13, 2024. Accepted for publication Sep 30, 2024. Published online Nov 18, 2024.

doi: 10.21037/tcr-24-1194


Highlight box

Key findings

• Machine learning-based models can predict liver metastasis (LM) in colorectal cancer (CRC) patients. Among the four algorithms, eXtreme gradient boost and random forest were the optimal models, and carcinoembryonic antigen and N stage were identified as the most significant LM-related risk factors.

What is known and what is new?

• LM is a key consideration in treatment decisions for patients with CRC. Machine learning has proven useful for detecting key features in complex datasets and has been applied in cancer research to develop predictive models, resulting in more effective and precise decision making.

• This study was based on standard population-level clinical data from a large sample, and relatively advanced machine learning methods were used to construct predictive models for LM in patients with CRC.

What is the implication, and what should change now?

• Identifying patients with CRC who are at high risk of developing LM may help improve prognosis through targeted management of this population.


Introduction

Colorectal cancer (CRC) is the third most prevalent malignant tumor worldwide. The latest data indicate that there were an estimated more than 1.9 million new cases of CRC and 930,000 deaths globally in 2020 (1). CRC poses a severe risk to human life and health. Liver metastasis (LM) is the main cause of death in patients with CRC. The liver is the most common site of hematogenous metastasis of CRC, and LM substantially worsens the prognosis of CRC (2,3). Previous studies have indicated that approximately 45–60% of patients with CRC develop LM during the course of their disease, with a median overall survival of 6.9 months. In addition, 15–25% of patients have synchronous LM at the time of CRC diagnosis, with a 5-year survival rate of less than 10% (4-8). The treatment standards for LM differ significantly from those for localized CRC, and liver surgery, radiofrequency ablation, microwave ablation, or stereotactic radiotherapy may be effective for LM (9). Therefore, it is crucial to predict LM in CRC patients.

Conventional statistical approaches are limited in their ability to process complex information, which is a significant disadvantage when analyzing data from large databases with numerous parameters. Machine learning methods are now applied in medical research, including image recognition, diagnosis, risk factor screening, and prediction of treatment outcomes (10-12). Their advantages are that they can detect nonlinear relationships in the data and refine the model to improve prediction accuracy. Machine learning algorithms, including eXtreme gradient boost (XGB), decision tree (DT), random forest (RF), and support vector machine (SVM), have proven useful for detecting key features in complex datasets and have been applied in cancer prevention and management to develop predictive models, resulting in more effective and precise decision making (13,14). Machine learning models based on computed tomography radiomic features may provide useful biomarkers for identifying patients at high risk of developing LM (15,16).

Nevertheless, to the best of our knowledge, no population-based study has been conducted to determine the feasibility of machine learning models for LM prediction. Consequently, the aim of this study was to construct machine learning models for accurately predicting the risk of LM in patients with CRC and to evaluate their viability using a population-level database, in order to facilitate personalized clinical decision-making before clinical management is initiated. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1194/rc).


Methods

Database source

The Surveillance, Epidemiology, and End Results (SEER) database is the largest and most widely recognized publicly available cancer registry database, encompassing nearly 48% of cancer patients in the United States. In our study, CRC data from 18 registries for the period between January 1, 2010 and December 31, 2015 were collected from the SEER database using SEER*Stat version 8.3.9. Patients’ informed consent was not required for access to and use of the SEER database because the data were anonymized and deidentified before release. In addition, ethics committee approval was not required for our investigation, as the data are publicly accessible. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Study population

CRC cases were identified according to the 3rd Edition of the International Classification of Diseases for Oncology (ICD-O-3). Data on age at diagnosis, race, gender, primary site, grade, histologic type, T stage, N stage, tumor size, carcinoembryonic antigen (CEA), tumor deposits, perineural invasion, and metastasis status were collected. Grade is determined by pathologic examination and reflects the degree of differentiation of the tumor; a lower grade indicates a higher degree of differentiation. Tumor deposits are microscopic or macroscopic tumor nodules in the lymphatic drainage area of the primary tumor; the absence of tumor deposits was recorded as negative. Likewise, the absence of perineural invasion was recorded as negative. Patients with unclear data for any of the aforementioned variables were excluded from the study. Patients were grouped according to the presence or absence of LM: the LM group consisted of CRC patients with LM, while the non-liver metastasis (NLM) group was composed of patients without LM. The patients were then randomly divided in a 7:3 ratio into training and validation sets. Categorical data were compared using Fisher’s exact test and the chi-square test.
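To illustrate the 7:3 division, the following minimal Python sketch performs a random split stratified by LM status. Stratification, the helper name `stratified_split`, and the random seed are assumptions made for illustration; the text does not state how the random split was implemented. The group sizes are those reported in Table 1.

```python
import random

def stratified_split(labels, train_parts=7, total_parts=10, seed=0):
    """Split indices into training/validation sets, class by class.

    Hypothetical helper: each outcome class is shuffled and cut
    separately so both sets keep the same LM proportion.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Integer arithmetic avoids floating-point rounding at the cut point.
        cut = len(indices) * train_parts // total_parts
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return train_idx, valid_idx

# 1 = LM, 0 = NLM; group sizes taken from Table 1.
labels = [1] * 5886 + [0] * 45746
train_idx, valid_idx = stratified_split(labels)

n_train_lm = sum(labels[i] for i in train_idx)
n_valid_lm = sum(labels[i] for i in valid_idx)
print(len(train_idx), len(valid_idx), n_train_lm, n_valid_lm)
```

Under these assumptions the split yields 36,142 training and 15,490 validation patients, the same totals as reported in the Results. This agreement is arithmetic only and does not confirm the procedure actually used.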

Model development and performance

Four machine learning methods, XGB, DT, RF, and SVM, were applied to build predictive models on the training set, and the models were then evaluated using the validation set. Sensitivity and specificity were assessed using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). To verify the clinical benefit, we subsequently plotted decision curve analysis (DCA) curves for the models. The SHapley Additive exPlanation (SHAP) method is a framework built on additive feature attribution methods and was used for visual analysis to explain the impact of individual features on the predictive models (17). In a SHAP summary plot, the closer a feature is to the top of the figure, the greater its total influence on the model output. A SHAP value can be understood intuitively as the feature’s contribution to the outcome value. A positive SHAP value indicates that the feature increases the predicted outcome value, while a negative SHAP value indicates that it decreases the predicted outcome value. The higher the SHAP value of a feature, the greater the likelihood of LM occurrence. This method outputs the importance ranking of the features along with their relationship to the outcome.
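The additive attribution idea behind SHAP can be illustrated with an exact Shapley value calculation on a toy risk function. The sketch below is plain Python and enumerates all feature subsets; the function `toy_risk`, its coefficients, and the three feature names are hypothetical and are not taken from the fitted models. Practical SHAP implementations use faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values for value_fn, which maps a frozenset of
    'present' features to a model output."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for k in range(n):  # size of the coalition without `feature`
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = frozenset(subset)
                gain = value_fn(present | {feature}) - value_fn(present)
                total += weight * gain
        phi[feature] = total
    return phi

def toy_risk(present):
    """Hypothetical LM risk score with an interaction term."""
    risk = 0.05  # baseline risk when no risk factor is present
    if "CEA" in present:
        risk += 0.30
    if "N_stage" in present:
        risk += 0.20
    if "CEA" in present and "N_stage" in present:
        risk += 0.10  # interaction, shared equally by both features
    return risk

features = ["CEA", "N_stage", "grade"]
phi = shapley_values(toy_risk, features)
baseline = toy_risk(frozenset())
full = toy_risk(frozenset(features))
print(phi, baseline, full)
```

In this toy case the attributions sum to the difference between the full prediction and the baseline (the additivity property), and a feature that never changes the output (`grade` here) receives a value of zero.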

Statistical analysis

Numerical variables are presented as mean and standard deviation and were compared using the independent-samples t-test. Categorical variables are presented as numbers and percentages and were compared using the chi-squared test. All analyses were performed using R software (version 3.5.3) and Python software (version 3.7.11).
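As a worked illustration of the chi-squared comparison, the sketch below computes the Pearson statistic for a 2×2 table in plain Python, using the CEA counts by LM status from Table 1. The helper name `chi_square_2x2` is hypothetical, and omitting the continuity correction is an assumption; the text does not specify the software routine or its options.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and the p value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Rows: CEA negative, CEA positive; columns: NLM, LM (counts from Table 1).
stat, p_value = chi_square_2x2(28484, 1074, 17262, 4812)
print(stat, p_value)
```

The statistic far exceeds 3.841, the critical value for one degree of freedom at the 0.05 level, which agrees with the P value of <0.001 reported for CEA in Table 1.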


Results

Basic characteristics

A total of 187,974 patients with CRC were identified (Figure 1). After application of the inclusion and exclusion criteria, 51,632 cases from the SEER database were included in this study. The baseline demographic and disease characteristics of these patients by metastasis status are shown in Table 1. The LM group included 5,886 patients, while the NLM group included 45,746 patients. After the data were randomly divided into training and validation sets in a 7:3 ratio, 36,142 patients were allocated to the training set and 15,490 patients to the validation set (Tables S1,S2).

Figure 1 The process of patient selection. SEER, Surveillance, Epidemiology, and End Results; CEA, carcinoembryonic antigen.

Table 1 Clinical characteristics of colorectal cancer patients

Characteristic | Liver metastasis (−), N=45,746 | Liver metastasis (+), N=5,886 | Total, N=51,632 | P value
Age at diagnosis (years), n (%) | | | | <0.001
   0–20 | 20 (0.04) | 1 (0.02) | 21 (0.04) |
   21–40 | 1,816 (3.97) | 332 (5.64) | 2,148 (4.16) |
   41–60 | 15,456 (33.79) | 2,471 (41.98) | 17,927 (34.72) |
   61–80 | 21,281 (46.52) | 2,557 (43.44) | 23,838 (46.17) |
   >80 | 7,173 (15.68) | 525 (8.92) | 7,698 (14.91) |
Race, n (%) | | | | <0.001
   Black | 5,240 (11.45) | 914 (15.53) | 6,154 (11.92) |
   White | 35,655 (77.94) | 4,426 (75.20) | 40,081 (77.63) |
   Other | 4,851 (10.60) | 546 (9.28) | 5,397 (10.45) |
Sex, n (%) | | | | <0.001
   Female | 22,377 (48.92) | 2,642 (44.89) | 25,019 (48.46) |
   Male | 23,369 (51.08) | 3,244 (55.11) | 26,613 (51.54) |
Primary site, n (%) | | | | <0.001
   Ascending colon | 7,569 (16.55) | 833 (14.15) | 8,402 (16.27) |
   Cecum | 8,478 (18.53) | 1,211 (20.57) | 9,689 (18.77) |
   Descending colon | 2,174 (4.75) | 327 (5.56) | 2,501 (4.84) |
   Hepatic flexure | 1,751 (3.83) | 198 (3.36) | 1,949 (3.77) |
   Rectosigmoid junction | 3,764 (8.23) | 537 (9.12) | 4,301 (8.33) |
   Rectum | 8,028 (17.55) | 590 (10.02) | 8,618 (16.69) |
   Sigmoid colon | 9,328 (20.39) | 1,575 (26.76) | 10,903 (21.12) |
   Splenic flexure | 1,179 (2.58) | 188 (3.19) | 1,367 (2.65) |
   Transverse colon | 3,475 (7.60) | 427 (7.25) | 3,902 (7.56) |
Grade, n (%) | | | | <0.001
   1 | 3,382 (7.39) | 211 (3.58) | 3,593 (6.96) |
   2 | 33,768 (73.82) | 4,110 (69.83) | 37,878 (73.36) |
   3 | 7,112 (15.55) | 1,266 (21.51) | 8,378 (16.23) |
   4 | 1,484 (3.24) | 299 (5.08) | 1,783 (3.45) |
Histologic type, n (%) | | | | <0.001
   Adenocarcinoma | 33,780 (73.84) | 4,841 (82.25) | 38,621 (74.80) |
   Adenocarcinoma in adenomatous polyp | 2,858 (6.25) | 193 (3.28) | 3,051 (5.91) |
   Squamous cell carcinoma | 18 (0.04) | 6 (0.10) | 24 (0.05) |
   Other | 9,090 (19.87) | 846 (14.37) | 9,936 (19.24) |
T stage, n (%) | | | | <0.001
   1 | 4,362 (9.54) | 73 (1.24) | 4,435 (8.59) |
   2 | 7,369 (16.11) | 156 (2.65) | 7,525 (14.57) |
   3 | 26,649 (58.25) | 3,453 (58.66) | 30,102 (58.30) |
   4 | 7,366 (16.10) | 2,204 (37.44) | 9,570 (18.54) |
N stage, n (%) | | | | <0.001
   0 | 25,557 (55.87) | 957 (16.26) | 26,514 (51.35) |
   1 | 13,181 (28.81) | 2,260 (38.40) | 15,441 (29.91) |
   2 | 7,008 (15.32) | 2,669 (45.34) | 9,677 (18.74) |
Tumor size (cm), n (%) | | | | <0.001
   0.1–2.0 | 5,859 (12.81) | 188 (3.19) | 6,047 (11.71) |
   2.1–4.0 | 15,649 (34.21) | 1,598 (27.15) | 17,247 (33.40) |
   4.1–6.0 | 13,748 (30.05) | 2,324 (39.48) | 16,072 (31.13) |
   6.1–8.0 | 6,404 (14.00) | 1,106 (18.79) | 7,510 (14.55) |
   8.1–10.0 | 2,657 (5.81) | 444 (7.54) | 3,101 (6.01) |
   >10.0 | 1,429 (3.12) | 226 (3.84) | 1,655 (3.21) |
CEA, n (%) | | | | <0.001
   Negative | 28,484 (62.27) | 1,074 (18.25) | 29,558 (57.25) |
   Positive | 17,262 (37.73) | 4,812 (81.75) | 22,074 (42.75) |
Tumor deposits, n (%) | | | | <0.001
   Negative | 40,079 (87.61) | 3,672 (62.39) | 43,751 (84.74) |
   Positive | 5,667 (12.39) | 2,214 (37.61) | 7,881 (15.26) |
Perineural invasion, n (%) | | | | <0.001
   Negative | 40,432 (88.38) | 3,971 (67.47) | 44,403 (86.00) |
   Positive | 5,314 (11.62) | 1,915 (32.53) | 7,229 (14.00) |

CEA, carcinoembryonic antigen.

Model performance

The training set included 32,022 NLM patients and 4,120 LM patients. The validation set included 13,724 NLM patients and 1,766 LM patients. The XGB, DT, RF, and SVM models were built on the training set using 12 variables. Figure 2 shows the performance evaluation results in detail. The AUCs of XGB, DT, RF, and SVM in the training set were 0.87, 0.83, 0.84, and 0.80, respectively. In the validation set, the AUCs of XGB, DT, RF, and SVM were 0.85, 0.83, 0.84, and 0.71, respectively. The AUCs indicated that the XGB, DT, and RF models may have high predictive value and could be considered for potential applications. Calibration curves for the probability of LM in both the training and validation cohorts are shown in Figure S1. They indicated that XGB and DT had excellent agreement between prediction and actual observation compared with RF and SVM. DCA demonstrated that the four models provided greater net benefit in deciding which patients should be referred for specialized oncological care than treating all patients or treating none (Figure S2). In both the training set and the validation set, when the threshold probability was about 0.05–0.4, the net benefit of XGB, RF, and DT was higher than that of the treat-all or treat-none strategy. However, SVM had a smaller net benefit than the other three models within the same range of threshold probabilities.
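The net benefit compared in a decision curve can be sketched as follows, using the standard definition: net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The plain-Python example below uses ten hypothetical patients with made-up predicted probabilities; the function names are illustrative, and whether the study's DCA used exactly this formulation is an assumption.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical toy data: 2 LM patients (1) and 8 NLM patients (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.80, 0.60, 0.30, 0.10, 0.10, 0.05, 0.05, 0.02, 0.02, 0.01]

threshold = 0.2
nb_model = net_benefit(y_true, y_prob, threshold)
nb_all = net_benefit_treat_all(y_true, threshold)
nb_none = 0.0  # treating no one has zero net benefit by definition
print(nb_model, nb_all, nb_none)
```

Under this definition the treat-all curve falls to zero when the threshold equals the outcome prevalence, which is 5,886/51,632 (about 11.4%) in this cohort, so a model adds value only where its curve lies above both reference strategies.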

Figure 2 The ROC curves of the XGB, DT, RF and SVM models in the (A) training set and (B) validation set. ROC, receiver operating characteristic; XGB, eXtreme gradient boost; DT, decision tree; RF, random forest; SVM, support vector machine.
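The AUC summarized by the ROC curves can be read as the probability that a randomly chosen LM patient receives a higher predicted score than a randomly chosen NLM patient. The plain-Python sketch below computes it by direct pairwise comparison; the helper name `auc_score` and the toy scores are hypothetical, and library routines compute the same quantity more efficiently.

```python
def auc_score(y_true, y_score):
    """AUC by pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 3 LM patients (1) and 4 NLM patients (0).
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1]

auc = auc_score(y_true, y_score)
print(auc)  # 9 of 12 positive-negative pairs are ranked correctly: 0.75
```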

Overall, compared with the DT and SVM models, the XGB and RF models demonstrated superior performance in the ROC curves, DCA curves, and calibration curves.

LM influencing factor assessment in the RF and XGB models

SHAP summary plots of the RF and XGB models were generated to identify and highlight the contribution of each feature to the prediction models. In the RF model, CEA, N stage, tumor deposits, perineural invasion, T stage, and tumor size were identified as significant risk factors for predicting the probability of LM (Figure 3). CEA, N stage, and tumor deposits ranked first, second, and third, respectively, among these LM-affecting variables. CEA was the most important indicator for LM. Positive CEA levels were associated with an elevated risk of LM. Advanced N stages (including N1 and N2) were positively associated with LM, as were positive tumor deposits. Moreover, perineural invasion was correlated with LM. The risk of LM was positively correlated with advanced T stages (particularly T4) and larger tumor size.

Figure 3 The SHAP summary plot of the top 12 features of the RF model. A higher SHAP value for a feature indicates a greater probability of liver metastasis. For each patient, a dot represents the feature’s attribution value in the model. Red dots correspond to higher feature values, while blue dots indicate lower values. CEA, carcinoembryonic antigen; SHAP, SHapley Additive exPlanation; RF, random forest.

In the XGB model, CEA, N stage, and T stage were the top three risk factors for predicting LM in patients with CRC (Figure 4). As in the RF model, CEA was the most influential risk factor for the probability of LM in the XGB model. Positive CEA was strongly associated with LM, whereas negative CEA was associated with a reduced probability of LM. Advanced N stages (including N1 and N2) were associated with an increased probability of LM in CRC patients. Furthermore, advanced T stages were positively correlated with the probability of LM in the XGB model.

Figure 4 The SHAP summary plot of the top 12 features of the XGB model. A higher SHAP value for a feature corresponds to an increased probability of liver metastasis. For each patient, a dot represents the attribution value of each feature in the model. The color of the dot ranges from red, indicating higher feature values, to blue, representing lower feature values. CEA, carcinoembryonic antigen; SHAP, SHapley Additive exPlanation; XGB, eXtreme gradient boost.

Discussion

As the most frequent distant metastasis event in CRC, LM significantly worsens the prognosis of CRC. Therefore, it is imperative to analyze the factors that influence LM in CRC and to construct high-quality models to predict the risk of LM. However, no precise machine learning-based prediction model for LM existed prior to this investigation. Our study is the first to use a population-based database to construct and validate machine learning models and to analyze their feasibility in predicting LM in CRC patients. Using the prediction models developed in this study, physicians could estimate the risk of LM in CRC patients, make better clinical decisions, and deliver more appropriate treatments.

Numerous studies have investigated risk factors for LM and developed prediction models. Song et al. developed radiomic models based on magnetic resonance imaging to predict preoperative histopathological growth patterns of LM in CRC (18). Moreover, based on parameters of enhanced abdominal computed tomography, Luo et al. used elastic net and random survival forest algorithms to develop a radiomic signature for predicting disease-free survival in cases of CRC with LM (19). However, the aforementioned models are based on digital images, which makes them difficult to interpret simply and causally and to implement in clinical practice. Hao et al. developed and validated a prognostic nomogram with high accuracy and excellent discrimination for predicting metachronous LM in CRC patients (20). Xiao et al. developed an LM risk score that was strongly associated with LM based on ResNet-50 and digital hematoxylin and eosin images. The integrated nomogram could identify stage I–III CRC patients with an elevated risk of LM after primary colectomy; thus, it could serve as a potential tool for selecting the optimal treatment to improve the prognosis of stage I–III CRC patients (21).

Compared with prior research on LM prediction, this study has the advantage of being a real-world risk assessment study based on 51,632 samples that compares four machine learning algorithms. Among the four algorithms, the XGB and RF models showed better generalizability. Moreover, by leveraging the benefits of machine learning, the analysis can incorporate multiple kinds of indicators, such as demographic and pathologic characteristics, and can therefore provide a comprehensive analysis of the influencing factors. Further, the SHAP method is a reliable technique for clinical interpretation of the results of the XGB and RF models and has been demonstrated to be effective in a number of studies (22-24). On this basis, clinicians can provide CRC patients with reasonable, individualized suggestions and referral recommendations.

The study showed that CEA and N stage were the most important predictive factors in both the RF and XGB models. CEA values reflect tumor load, and CEA can alter the microenvironment of tumor cells and promote the expression of certain adhesion molecules and the survival of malignant cells. A greater number of metastatic lymph nodes indicates a later tumor stage, a higher probability of metastasis, and a worse prognosis. CEA and N stage have been shown to be independent predictors of LM in CRC patients (25,26), which is consistent with our findings. Surprisingly, factors such as tumor grade played a smaller predictive role in our models. This may be because the vast majority of the included cases were grade 2–3, so the effect of grade on the model output was not apparent. Therefore, a better predictive model is needed to predict LM in the future.

Our study has several limitations. First, owing to its retrospective nature, the influence of confounding factors cannot be completely ruled out. Second, as our study is based on a North American population, its applicability to other populations needs to be verified by further studies involving external populations. Third, the outcome variable of this study was only the presence or absence of LM in CRC patients, and it was impossible to distinguish between synchronous and metachronous LM. We hope that further research will address these limitations.


Conclusions

Machine learning-based models can feasibly predict LM in CRC patients. Among the four algorithms, XGB and RF were the optimal machine learning models, and CEA and N stage were identified as the most significant LM-related risk factors.


Acknowledgments

Funding: None.


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1194/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1194/prf

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1194/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Morgan E, Arnold M, Gini A, et al. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut 2023;72:338-44. [Crossref] [PubMed]
  2. Ajithkumar P, Vasantharajan SS, Pattison S, et al. Exploring Potential Epigenetic Biomarkers for Colorectal Cancer Metastasis. Int J Mol Sci 2024;25:874. [Crossref] [PubMed]
  3. He K, Wang Z, Luo M, et al. Metastasis organotropism in colorectal cancer: advancing toward innovative therapies. J Transl Med 2023;21:612. [Crossref] [PubMed]
  4. Zeng K, Peng J, Xing Y, et al. A positive feedback circuit driven by m(6)A-modified circular RNA facilitates colorectal cancer liver metastasis. Mol Cancer 2023;22:202. [Crossref] [PubMed]
  5. Li S, Yang M, Teng S, et al. Chromatin accessibility dynamics in colorectal cancer liver metastasis: Uncovering the liver tropism at single cell resolution. Pharmacol Res 2023;195:106896. [Crossref] [PubMed]
  6. Ding Y, Han X, Zhao S, et al. Constructing a prognostic model for colorectal cancer with synchronous liver metastases after preoperative chemotherapy: a study based on SEER and an external validation cohort. Clin Transl Oncol 2024; Epub ahead of print. [Crossref] [PubMed]
  7. Ros J, Salva F, Dopazo C, et al. Liver transplantation in metastatic colorectal cancer: are we ready for it? Br J Cancer 2023;128:1797-806. [Crossref] [PubMed]
  8. Yuan M, Zhang X, Yue F, et al. CircNOLC1 Promotes Colorectal Cancer Liver Metastasis by Interacting with AZGP1 and Sponging miR-212-5p to Regulate Reprogramming of the Oxidative Pentose Phosphate Pathway. Adv Sci (Weinh) 2023;10:e2205229. [Crossref] [PubMed]
  9. Eng C, Yoshino T, Ruíz-García E, et al. Colorectal cancer. Lancet 2024;404:294-310. [Crossref] [PubMed]
  10. Feuerriegel S, Frauen D, Melnychuk V, et al. Causal machine learning for predicting treatment outcomes. Nat Med 2024;30:958-68. [Crossref] [PubMed]
  11. Lonsdale H, Gray GM, Ahumada LM, et al. Machine Vision and Image Analysis in Anesthesia: Narrative Review and Future Prospects. Anesth Analg 2023;137:830-40. [Crossref] [PubMed]
  12. Oikonomou EK, Khera R. Machine learning in precision diabetes care and cardiovascular risk prediction. Cardiovasc Diabetol 2023;22:259. [Crossref] [PubMed]
  13. Hassan MM, Hassan MM, Yasmin F, et al. A comparative assessment of machine learning algorithms with the Least Absolute Shrinkage and Selection Operator for breast cancer detection and prediction. Decision Analytics Journal 2023;7:100245. [Crossref]
  14. Zhang B, Shi H, Wang H. Machine Learning and AI in Cancer Prognosis, Prediction, and Treatment Selection: A Critical Approach. J Multidiscip Healthc 2023;16:1779-91. [Crossref] [PubMed]
  15. Stüber AT, Coors S, Schachtner B, et al. A Comprehensive Machine Learning Benchmark Study for Radiomics-Based Survival Analysis of CT Imaging Data in Patients With Hepatic Metastases of CRC. Invest Radiol 2023;58:874-81. [Crossref] [PubMed]
  16. Saber R, Henault D, Messaoudi N, et al. Radiomics using computed tomography to predict CD73 expression and prognosis of colorectal cancer liver metastases. J Transl Med 2023;21:507. [Crossref] [PubMed]
  17. Luo H, Xiang C, Zeng L, et al. SHAP based predictive modeling for 1 year all-cause readmission risk in elderly heart failure patients: feature selection and model interpretation. Sci Rep 2024;14:17728. [Crossref] [PubMed]
  18. Song C, Li W, Cui J, et al. Pre-operative prediction of histopathological growth patterns of colorectal cancer liver metastasis using MRI-based radiomic models. Abdom Radiol (NY) 2024; Epub ahead of print. [Crossref] [PubMed]
  19. Luo X, Deng H, Xie F, et al. Prognostication of colorectal cancer liver metastasis by CE-based radiomics and machine learning. Transl Oncol 2024;47:101997. [Crossref] [PubMed]
  20. Hao M, Li H, Wang K, et al. Predicting metachronous liver metastasis in patients with colorectal cancer: development and assessment of a new nomogram. World J Surg Oncol 2022;20:80. [Crossref] [PubMed]
  21. Xiao C, Zhou M, Yang X, et al. Accurate Prediction of Metachronous Liver Metastasis in Stage I-III Colorectal Cancer Patients Using Deep Learning With Digital Pathological Images. Front Oncol 2022;12:844067. [Crossref] [PubMed]
  22. Li W, Liu W, Hussain Memon F, et al. An External-Validated Prediction Model to Predict Lung Metastasis among Osteosarcoma: A Multicenter Analysis Based on Machine Learning. Comput Intell Neurosci 2022;2022:2220527. [Crossref] [PubMed]
  23. van den Bosch T, Warps AK, de Nerée Tot Babberich MPM, et al. Predictors of 30-Day Mortality Among Dutch Patients Undergoing Colorectal Cancer Surgery, 2011-2016. JAMA Netw Open 2021;4:e217737. [Crossref] [PubMed]
  24. Li W, Song Y, Chen K, et al. Predictive model and risk analysis for diabetic retinopathy using machine learning: a retrospective cohort study in China. BMJ Open 2021;11:e050989. [Crossref] [PubMed]
  25. Shao Y, Li Y, Li F, et al. Multifactorial risk prediction analysis of liver metastasis in colorectal cancer: incorporating programmed cell death ligand 1 combined positive score and other factors. J Gastrointest Surg 2024;28:1294-301. [Crossref] [PubMed]
  26. Seow-En I, Koh YX, Zhao Y, et al. Predictive modeling algorithms for liver metastasis in colorectal cancer: A systematic review of the current literature. Ann Hepatobiliary Pancreat Surg 2024;28:14-24. [Crossref] [PubMed]
Cite this article as: Xing Y, Yu G, Jiang Z, Wang Z. Development of prediction models for liver metastasis in colorectal cancer based on machine learning: a population-level study. Transl Cancer Res 2024;13(11):5943-5952. doi: 10.21037/tcr-24-1194
