Original Article
Development and validation of an interpretable model based on ultrasound radiomics for predicting Ki-67 expression levels in breast cancer
Abstract
Background: Ki-67 is a critical proliferation marker in breast cancer, but its preoperative assessment is limited by the invasiveness and sampling bias of core needle biopsy. This study aimed to develop and validate non-invasive models for predicting Ki-67 status in breast cancer from conventional ultrasound radiomics features, clinical features, or their combination.
Methods: We retrospectively analyzed 558 patients with breast cancer who underwent two-dimensional (2D) ultrasound and Ki-67 testing. Of these, 398 patients from Zhejiang Cancer Hospital formed the training set, and 160 patients from Lishui Central Hospital formed the external validation set. Using a 14% threshold, patients were divided into a Ki-67 low-expression group and a Ki-67 high-expression group. Clinical parameters, conventional ultrasound characteristics, and 2D ultrasound images of the tumor’s maximum cross-section were collected. Radiomics features were extracted from the delineated regions of interest (ROIs) with the PyRadiomics package. Univariate analysis, least absolute shrinkage and selection operator (LASSO) regression, and multivariate logistic regression were used to identify independent predictors. Three models were developed: a clinical model, a radiomics model, and a combined clinical-radiomics model. A nomogram was constructed from the combined model. Models were evaluated by receiver operating characteristic (ROC) curve analysis [area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), precision, recall, and F1 score], calibration curves, and decision curve analysis (DCA). In addition, SHapley Additive exPlanations (SHAP) were used to interpret the model.
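As an illustration of how the threshold-based metrics listed above relate to one another, the following minimal Python sketch derives them from a 2x2 confusion matrix, treating high Ki-67 expression as the positive class. The function name and the counts are hypothetical and are not taken from the study; the sketch also makes explicit that precision equals PPV and recall equals sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Summarize a 2x2 confusion matrix (positive class = high Ki-67).

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives, and false negatives at one decision threshold.
    """
    sensitivity = tp / (tp + fn)   # same quantity as recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # same quantity as precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision (PPV) and recall (sensitivity).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "precision": ppv,
        "recall": sensitivity,
        "f1": f1,
    }


# Hypothetical counts for illustration only; not the study's data.
metrics = classification_metrics(tp=80, fp=20, tn=40, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the F1 score ignores true negatives, it can exceed accuracy when the positive class is well captured, so the two values are best read together rather than interchangeably.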
Results: In the training (n=398) and external validation (n=160) sets, multivariate logistic regression identified age [odds ratio (OR) =0.971, P=0.026], maximum lesion diameter (OR =1.051, P<0.001), microcalcification (OR =1.548, P=0.109), and posterior echo (OR =0.358, P=0.001) as independent predictors of Ki-67 expression in breast cancer. LASSO with 5-fold cross-validation selected three radiomics features (two texture, one shape). The combined clinical-radiomics model achieved AUCs of 0.731 [95% confidence interval (CI): 0.670–0.792] and 0.709 (95% CI: 0.614–0.804) in the training and validation sets, with accuracies of 0.643 and 0.750 and F1 scores of 0.728 and 0.840, respectively. Calibration curves showed good agreement between predicted and observed probabilities, and decision curve analysis indicated a clinical net benefit. A nomogram was constructed to visualize the model and estimate individual probabilities. SHAP analysis revealed that radiomics features (e.g., original_glcm_MaximumProbability) and clinical features (microcalcification and posterior echo) contributed most; positive microcalcification increased the likelihood of high Ki-67 expression, whereas posterior echo attenuation decreased it.
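To make the reported odds ratios easier to read, the minimal Python sketch below shows how an OR rescales the odds of high Ki-67 expression when a single predictor changes and the others are held fixed. The ORs are those given above; the dictionary keys, the function names, the baseline probability of 0.5, and the 10-unit change in diameter are illustrative assumptions, since the abstract reports neither the model intercept nor the measurement units.

```python
# Odds ratios for two of the clinical predictors reported in the abstract.
ODDS_RATIOS = {
    "max_diameter": 1.051,        # per one-unit increase (unit not stated)
    "microcalcification": 1.548,  # present vs. absent
}


def odds_multiplier(predictor, delta):
    """Factor by which the odds change when `predictor` shifts by `delta`."""
    return ODDS_RATIOS[predictor] ** delta


def updated_probability(baseline_prob, predictor, delta):
    """Convert a baseline probability to odds, rescale, and convert back."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_multiplier(predictor, delta)
    return new_odds / (1.0 + new_odds)


# A 10-unit increase in maximum lesion diameter multiplies the odds by 1.051**10.
diameter_factor = odds_multiplier("max_diameter", 10)

# Hypothetical baseline probability of 0.5; microcalcification switched to present.
p_with_calcification = updated_probability(0.5, "microcalcification", 1)

print(f"odds factor for +10 units of diameter: {diameter_factor:.2f}")
print(f"probability with microcalcification: {p_with_calcification:.2f}")
```

Under these assumptions, a 10-unit increase in diameter raises the odds by a factor of about 1.64, and the presence of microcalcification moves a 0.5 baseline probability to about 0.61. A nomogram performs the same kind of conversion graphically, summing the contributions of all predictors before mapping the total to a probability.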
Conclusions: Ultrasound-derived radiomics features provide incremental value for predicting Ki-67 expression in breast cancer. The combined clinical-radiomics model showed moderate discriminative performance in both the training and external validation sets, and its predictions were interpreted with SHAP. It has the potential to serve as a non-invasive preoperative tool that complements core needle biopsy and supports clinical decision-making.

