Multi-parameter magnetic resonance imaging (MRI) deep learning radiomics predicts complete response after induction immunochemotherapy in locally advanced nasopharyngeal carcinoma

Bifa Zhu; Liru Zhu; Kaihua Chen; Ling Li; Xiaodong Zhu

doi:10.21037/tcr-2025-1945

Original Article

Bifa Zhu^1,2,3# , Liru Zhu^1,2,3# , Kaihua Chen^1,2,3# , Ling Li^1,2,3 , Xiaodong Zhu^1,2,3,4

¹Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, China; ²Guangxi Clinical Medicine Research Center of Nasopharyngeal Carcinoma, Nanning, China; ³Guangxi Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor, Nanning, China; ⁴Department of Oncology, Affiliated Wuming Hospital of Guangxi Medical University, Nanning, China

Contributions: (I) Conception and design: B Zhu, X Zhu; (II) Administrative support: X Zhu; (III) Provision of study materials or patients: B Zhu, K Chen; (IV) Collection and assembly of data: L Zhu, B Zhu; (V) Data analysis and interpretation: B Zhu, L Li; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^#These authors contributed equally to this work as co-first authors.

Correspondence to: Ling Li, PhD. Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, No. 71 Hedi Road, Qingxiu District, Nanning 530021, China; Guangxi Clinical Medicine Research Center of Nasopharyngeal Carcinoma, Nanning, China; Guangxi Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor, Nanning, China. Email: lingli159@163.com; Xiaodong Zhu, PhD. Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, No. 71 Hedi Road, Qingxiu District, Nanning 530021, China; Guangxi Clinical Medicine Research Center of Nasopharyngeal Carcinoma, Nanning, China; Guangxi Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor, Nanning, China; Department of Oncology, Affiliated Wuming Hospital of Guangxi Medical University, Nanning, China. Email: zhuxdonggxmu@126.com.

Background: Concurrent chemoradiotherapy (CCRT) following induction chemotherapy (IC) remains the conventional treatment regimen for patients with locally advanced nasopharyngeal carcinoma (LANPC). However, the complete response (CR) rate after IC is limited, and there is considerable heterogeneity in patient responses. The introduction of immunotherapy has shown potential to enhance treatment efficacy; nevertheless, there is a lack of reliable biological markers that can effectively predict whether induction immunochemotherapy will result in CR. Multiparametric magnetic resonance imaging (MRI) offers a non-invasive method to provide comprehensive information regarding tumor structure and function. This study aims to develop and validate a multimodal fusion model that integrates traditional MRI-based radiomics features with deep learning radiomics features to predict the early achievement of CR in patients with LANPC undergoing induction immunochemotherapy, thereby establishing a foundation for personalized treatment decisions.

Methods: We conducted a retrospective analysis of clinical and imaging data from 230 biopsy-confirmed LANPC patients who underwent induction immunochemotherapy at Guangxi Medical University Cancer Center between January 2021 and December 2024. The patients were randomly allocated into training (n=184) and testing (n=46) cohorts. Regions of interest (ROIs) for the lesions were delineated across multiple sequences, including T1-weighted imaging (T1), T2-weighted imaging (T2), and contrast-enhanced T1-weighted imaging (CE-T1). Traditional and deep learning radiomics features were extracted, followed by feature selection to identify the most discriminative features. Utilizing machine learning algorithms, we developed four types of models: clinical, traditional radiomics, deep learning radiomics, and multimodal fusion models. Model performance was evaluated through receiver operating characteristic (ROC) curve analysis, area under the curve (AUC), and decision curve analysis (DCA).

Results: The multimodal fusion model exhibited superior predictive performance in the testing cohort [AUC =0.844, 95% confidence interval (CI): 0.695–0.992], significantly outperforming both the traditional radiomics fusion model (AUC =0.721; 95% CI: 0.540–0.901) and the deep learning radiomics fusion model (AUC =0.725; 95% CI: 0.566–0.885).

Conclusions: The multimodal fusion model effectively predicts early CR in LANPC patients following induction immunochemotherapy, demonstrating significant potential for clinical application.

Keywords: Locally advanced nasopharyngeal carcinoma (LANPC); radiomics; deep learning; induction immunochemotherapy; treatment response

Submitted Sep 04, 2025. Accepted for publication Nov 25, 2025. Published online Feb 05, 2026.

Highlight box

Key findings

• We developed a multimodal fusion model integrating traditional and deep learning radiomics features from multi-parameter magnetic resonance imaging (MRI) to predict complete response (CR) after induction immunochemotherapy in locally advanced nasopharyngeal carcinoma (LANPC). The model achieved an area under the curve (AUC) of 0.844 in the testing cohort, with 0.919 specificity and 0.870 accuracy, significantly outperforming single-modality models.

What is known and what is new?

• Standard treatment for LANPC includes induction chemotherapy (IC) followed by concurrent chemoradiotherapy, with immunotherapy recently showing improved CR rates. However, reliable biomarkers to predict immunotherapy response are lacking, and programmed cell death ligand 1 (PD-L1) expression remains controversial in nasopharyngeal carcinoma (NPC).

• While radiomics and deep learning have been used separately for cancer prediction, this is the first study to integrate both approaches using multi-parameter MRI [T1-weighted imaging (T1), T2-weighted imaging (T2), and contrast-enhanced T1-weighted imaging (CE-T1)] specifically to predict early CR following induction immunochemotherapy in LANPC.

What is the implication, and what should change now?

• This non-invasive model can identify early whether patients are sensitive to induction immunochemotherapy, enabling clinicians to optimize management strategies. For high responders, treatment intensity can be moderately reduced to improve quality of life; for low responders or non-responders, treatment plans should be intensified in advance to improve the CR rate. The model supports personalized treatment decisions for LANPC patients.

• Future prospective multicenter validation studies should expand sample sizes and incorporate cervical lymph node features to enhance predictive performance and confirm clinical utility.

Introduction

Nasopharyngeal carcinoma (NPC) is a malignant tumor that originates from the epithelial cells of the nasopharynx. The incidence of NPC is significantly higher in Southeast Asia and southern China than the global average (1). Approximately 70% of NPC patients present with locally advanced disease at initial diagnosis (2). For locally advanced nasopharyngeal carcinoma (LANPC), current guidelines recommend induction chemotherapy (IC) followed by concurrent chemoradiotherapy (CCRT) as the standard regimen, which significantly improves the 5-year survival rate to 87.9% (3). Furthermore, research indicates that patients with LANPC who achieve a complete response (CR) after induction therapy exhibit significantly lower recurrence and higher survival rates (4). Nevertheless, the CR rate for IC alone remains low at 4.7%, and 20% of patients develop recurrence or metastasis (5,6). Therefore, it is essential to explore strategies that more effectively improve early CR and long-term prognosis in LANPC (7).

In recent years, immunotherapy has proven effective against various solid tumors, with substantial research advances in NPC (8). Immunotherapy combined with chemotherapy has become the standard treatment for advanced NPC (9), and immunotherapy is anticipated to play a significant role in the future treatment of LANPC. Many clinical studies are exploring the prospects of immunotherapy in LANPC by incorporating it into the standard regimens at different stages. The CONTINUUM study was the first to confirm that adding programmed death 1 (PD-1) monoclonal antibodies significantly improves survival rates in LANPC (10). The BEACON study demonstrated a statistically significant improvement in the CR rate (30.5% vs. 16.7%; P=0.0006) when tislelizumab was combined with standard IC in high-risk LANPC, achieving its first primary endpoint (11). However, immunotherapy does not benefit all patients. There is currently a lack of research focused on identifying which patients with LANPC benefit from combined immunotherapy and on predicting the likelihood of CR following induction immunochemotherapy. Therefore, new efficacy biomarkers and predictive models for induction immunochemotherapy must be developed to identify responders and optimize treatment plans for LANPC. With the development of artificial intelligence, numerous studies have reported using radiomics combined with deep learning features to construct feature fusion models that predict treatment response or prognosis in tumors. However, most of these studies rely on single-sequence imaging data, and few have focused on predicting the effectiveness of immunotherapy combined with chemotherapy for LANPC.

This study aims to develop a multimodal fusion model based on multi-parameter magnetic resonance imaging (MRI) features, incorporating traditional and deep learning radiomics, to precisely predict the early treatment response to induction immunochemotherapy in LANPC. The findings of this study will enhance the prediction of long-term outcomes for LANPC patients undergoing induction immunochemotherapy. Furthermore, the multimodal fusion model will provide imaging support for optimizing treatment strategies. We present this article in accordance with the TRIPOD reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1945/rc).

Methods

Patients

This study was approved by the Ethics Review Committee of Guangxi Medical University Cancer Hospital (No. KY2024886). Because this was a retrospective analysis, the requirement for informed consent was waived. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. We retrospectively analyzed 299 patients with LANPC who received induction immunochemotherapy at our center between January 2021 and December 2024. The inclusion criteria were as follows: (I) pathologically confirmed NPC; (II) stage III/IVA according to the American Joint Committee on Cancer (AJCC) 8th edition; (III) induction immunochemotherapy for 2–3 cycles; (IV) age ≥18 years and a Karnofsky performance status (KPS) ≥70. The exclusion criteria were as follows: (I) incomplete clinical or imaging data; (II) fewer than 2 cycles of induction immunochemotherapy; (III) presence of a second primary tumor. The patient selection flowchart is illustrated in Figure 1. Ultimately, a total of 230 patients were enrolled and randomly allocated to the training (n=184) and testing (n=46) cohorts at an 8:2 ratio. Specific details of the treatment plans for all patients are available at https://cdn.amegroups.cn/static/public/tcr-2025-1945-1.csv.
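The 8:2 random allocation can be illustrated with a minimal sketch. The function name `split_cohort`, the integer patient identifiers, and the fixed seed are assumptions made for illustration; the source does not describe the software or seed used for allocation.

```python
import random


def split_cohort(patient_ids, train_fraction=0.8, seed=42):
    """Randomly split patient identifiers into training and testing lists.

    The seed is a hypothetical choice that only makes the sketch reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 230 enrolled patients, represented here by placeholder integer IDs.
train_ids, test_ids = split_cohort(range(230))
print(len(train_ids), len(test_ids))
```

With 230 patients and a training fraction of 0.8, this yields cohorts of 184 and 46 patients, matching the cohort sizes reported above.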

Figure 1 Flow chart of the patient selection.

MRI acquisition and preprocessing

All enrolled patients underwent both plain and contrast-enhanced MRI using a Siemens 1.5T MRI scanner prior to the initiation of induction immunochemotherapy. The primary scanning sequences included axial T1-weighted imaging (T1), T2-weighted imaging (T2), and contrast-enhanced T1-weighted imaging (CE-T1). Digital imaging and communications in medicine (DICOM) format images were downloaded from the picture archiving and communication system (PACS). All images underwent N4 bias field correction using the SimpleITK library, followed by Z-score normalization, and were finally resampled to a voxel size of 1 mm × 1 mm × 1 mm (x, y, z) (12). The magnetic resonance scanner specifications and scanning parameters are detailed in Table S1.
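The arithmetic behind two of these preprocessing steps can be sketched in plain Python. This is not the authors' SimpleITK pipeline: it only illustrates Z-score normalization of intensities and how the voxel grid changes when resampling to 1 mm isotropic spacing. The function names, the use of the population standard deviation, and the example volume size and spacing are all hypothetical.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Rescale intensities to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant image carries no contrast; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def resampled_size(size, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Voxel counts per axis after resampling while keeping the physical extent."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(size, spacing, new_spacing)
    )


# Hypothetical volume: 256 x 256 x 30 voxels at 0.9 x 0.9 x 5.0 mm spacing.
print(resampled_size((256, 256, 30), (0.9, 0.9, 5.0)))
print(zscore([100.0, 200.0, 300.0]))
```

Resampling anisotropic slices to an isotropic grid mainly adds interpolated slices along z, which is worth keeping in mind when interpreting texture features computed in three dimensions.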

Tumor response assessment

Tumor response was evaluated using nasopharyngeal and neck contrast-enhanced MRI 3 weeks after the completion of two cycles of induction immunochemotherapy. According to the Response Evaluation Criteria In Solid Tumors (RECIST) 1.1 criteria, tumor responses were categorized as follows: CR, partial response (PR), progressive disease (PD), and stable disease (SD) (13). In this study, patients were divided into two groups: the CR group and the non-CR group, which encompassed PR, SD, and PD. Two radiologists independently assessed the tumor responses; in cases of discrepancy, a third senior radiologist facilitated discussions to achieve consensus.

Tumor segmentation

We utilized ITK-SNAP to manually segment the regions of interest (ROIs) slice-by-slice from the axial CE-T1 image. Subsequently, automatic registration of the T1 and T2 sequences was conducted, using the CE-T1 as the reference. Two radiologists, with 5 and 10 years of experience in diagnosing NPC, independently performed the tumor segmentation without knowledge of the tumor response status. A specialized radiation oncologist refined the registration results to ensure optimal alignment. Any uncertainties were addressed by another senior radiologist with 20 years of experience in NPC diagnosis. Additionally, we randomly selected 30 patients for re-segmentation of the ROIs 2 weeks later. Intraclass correlation coefficients (ICCs) were calculated to evaluate intra- and inter-observer variability, with features exhibiting ICC <0.75 being excluded (14). The segmentation workflow is illustrated in Figure 2A.
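The ICC-based reproducibility filter can be sketched as follows. The source does not state which ICC form was used; this example assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name and the small ratings table are illustrative only.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater.

    Assumes at least two subjects, two raters, and non-degenerate variance.
    """
    n = len(ratings)        # number of subjects (here: re-segmented patients)
    k = len(ratings[0])     # number of raters or repeated segmentations
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative values of one feature: six subjects, four raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
keep_feature = icc >= 0.75  # threshold stated in the text
print(round(icc, 3), keep_feature)
```

In practice the function would be applied once per feature, and only features at or above the 0.75 threshold would proceed to feature selection.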

Figure 2 Workflow diagram for deep learning radiomics analysis. (A) Tumor segmentation: MRI preprocessing, performing ROI delineation, and extracting the maximum area of the ROI; (B) feature extraction: traditional radiomics and ResNet18 deep learning radiomics feature extraction; (C) feature selection: Student’s t-test, Spearman correlation coefficient, and LASSO to select the most significant features; (D) model evaluation: assess the predictive performance in the testing cohort through ROC curves, calibration curves, and decision curve analysis. CE-T1, contrast-enhanced T1-weighted imaging; Grad-CAM, gradient-weighted class activation mapping; LASSO, least absolute shrinkage and selection operator; MSE, mean square error; MRI, magnetic resonance imaging; ROC, receiver operating characteristic; ROI, regions of interest; T1, T1-weighted imaging; T2, T2-weighted imaging.

Feature extraction

We used the PyRadiomics package to extract 1,015 traditional radiomics features from each of the three MRI sequences (T1, T2, and CE-T1), in accordance with the Imaging Biomarker Standardization Initiative (IBSI) guidelines, resulting in a total of 3,045 features per patient (15). We employed the ResNet-18 convolutional neural network (CNN), pre-trained on the ImageNet dataset, to extract deep learning radiomics features. The last fully connected layer of the CNN was removed and replaced with an average pooling layer for feature extraction. Each sequence yielded 512 features, for a total of 1,536 deep learning radiomics features per patient. Guided gradient-weighted class activation mapping (Grad-CAM) is a technique that generates visual maps highlighting important regions in the final convolutional layer of the CNN (15). We applied this method to highlight the specific sub-regions associated with the generation of deep learning features. The feature extraction process is shown in Figure 2B.

Feature selection

We excluded radiomics features with ICC <0.75. All traditional radiomics and deep learning radiomics features were standardized using Z-score. Feature selection comprised three steps. First, we employed Student’s t-test to filter features. Second, we assessed the monotonic (rank) correlation between features using the Spearman correlation coefficient to eliminate redundant features. If the correlation between any pair of features exceeded 0.9, we retained only one for further analysis (16). Third, we applied the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation to identify the λ value corresponding to the minimum binomial deviance and selected the most significant features (17). The feature selection process is shown in Figure 2C.
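The redundancy-removal step can be sketched in plain Python. The source states only that one feature of each highly correlated pair was retained; keeping the first feature encountered is an assumption of this sketch, as are the function names and the toy feature values.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_redundant(features, threshold=0.9):
    """Greedy filter: keep a feature only if |rho| <= threshold with all kept ones."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy example: feature "b" is a monotonic transform of "a" and is dropped.
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 1, 4, 2, 3],
}
print(drop_redundant(toy))
```

Because the filter is greedy, the surviving set depends on feature order; a variant could instead keep whichever feature of a pair has the smaller t-test P value.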

Model building and evaluation

We used univariate and multivariate Cox proportional hazards regression analyses to assess the relationship between clinical characteristics and treatment response. Independent clinical factors were selected to construct a clinical prediction model. Following the screening of radiomics features from each sequence, we developed single-sequence traditional radiomics models (T1, T2, and CE-T1). Subsequently, we combined and filtered features from the three sequences to create a traditional radiomics fusion model. In a similar manner, we constructed single-sequence deep learning radiomics models (T1, T2, and CE-T1), as well as the deep learning radiomics fusion model. Ultimately, we integrated clinical features along with multi-sequence traditional and deep learning radiomics features to formulate the multimodal fusion model. The extreme gradient boosting (XGBoost) algorithm was used to establish these prediction models. For each model, the data were divided into training and test sets in an 8:2 ratio. The training set was subjected to five-fold cross-validation, and model performance was evaluated on the test set. Within each cross-validation training fold, we employed the synthetic minority over-sampling technique (SMOTE) to balance CR and non-CR samples. Hyperparameters of the multimodal fusion model were: booster = “gbtree”, objective = “binary:logistic”, eval_metric = “auc”, eta =0.03, max_depth =5, min_child_weight =5, subsample =0.6, colsample_bytree =0.6, lambda =1, alpha =1. This is a strongly regularized setting intended to limit overfitting. The hyperparameters of the other models are listed in Table S2. The performance of the ten models was evaluated using the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score for both the training and testing sets. Furthermore, we evaluated the calibration of the three fusion models using calibration curves and assessed their clinical value using decision curve analysis (DCA). The model evaluation process is shown in Figure 2D.
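As a sketch of the setup described above, the reported hyperparameters can be written as a Python dictionary, using XGBoost's parameter name `min_child_weight`, together with a plain five-fold index generator. Whether the authors' folds were stratified or seeded is not stated, so `k_fold_indices` and its seed are assumptions; no XGBoost or SMOTE call is made here.

```python
import random

# Hyperparameters of the multimodal fusion model as reported in the text.
xgb_params = {
    "booster": "gbtree",
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "eta": 0.03,
    "max_depth": 5,
    "min_child_weight": 5,
    "subsample": 0.6,
    "colsample_bytree": 0.6,
    "lambda": 1,   # L2 regularization
    "alpha": 1,    # L1 regularization
}


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled, unstratified k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


# 184 training-cohort patients; oversampling such as SMOTE would be applied
# to train_idx only, so that synthetic samples never leak into val_idx.
for train_idx, val_idx in k_fold_indices(184):
    print(len(train_idx), len(val_idx))
```

Restricting oversampling to the training part of each fold keeps the validation folds at the true CR prevalence, which is what the cross-validated AUC should reflect.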

Statistical analysis

Differences in clinical variables between the training and test sets were analyzed using Fisher’s exact test or the chi-squared test for categorical data, and the Mann-Whitney U test for continuous data. Univariate analysis was conducted to identify significant clinical predictors. The DeLong test was utilized to assess differences in AUC among the models. A P value of less than 0.05 was considered statistically significant. Statistical analyses and graphic production were performed using R (version 4.1.2).

Results

Clinical characteristics of the patients

This study retrospectively screened 299 patients with LANPC who received induction immunochemotherapy. After applying the exclusion criteria, 230 eligible patients remained in the analysis. Patients were randomly assigned to the training cohort (184 cases) and the testing cohort (46 cases) in an 8:2 ratio. Demographic and clinical data are summarized in Table 1. There was no significant difference in CR rates between the two cohorts following induction immunochemotherapy, with rates of 22% in the training cohort and 20% in the testing cohort (P=0.90). The baseline characteristics of the two cohorts were well balanced, with no significant differences observed (P>0.05).

Table 1

Baseline characteristics of the training and testing cohort

Characteristics	Training cohort (n=184)	Testing cohort (n=46)	P value
Age (years)	45.0 (35.0–52.0)	48.5 (41.25–55)	0.055
Gender			0.90
Female	36 [20]	10 [22]
Male	148 [80]	36 [78]
History			0.30
No	158 [86]	36 [78]
Yes	26 [14]	10 [22]
Smoke			>0.99
No	124 [67]	31 [67]
Yes	60 [33]	15 [33]
T category			0.39
T1	7 [4]	1 [2]
T2	38 [21]	15 [33]
T3	89 [48]	20 [43]
T4	50 [27]	10 [22]
N category			0.43
N0	5 [3]	1 [2]
N1	31 [17]	4 [9]
N2	76 [41]	18 [39]
N3	72 [39]	23 [50]
Overall stage			0.71
III	76 [41]	17 [37]
IVA	108 [59]	29 [63]
Response to therapy			0.90
CR	40 [22]	9 [20]
Non-CR	144 [78]	37 [80]
EBV DNA			0.68
Negative	48 [26]	14 [30]
Positive	136 [74]	32 [70]
BMI (kg/m²)	22.83 (20.76–24.99)	22.55 (20.78–23.88)	0.70
WBC (×10⁹/L)	6.84 (5.88–8.25)	6.82 (5.36–8.4)	0.52
LYM (×10⁹/L)	1.77 (1.46–2.24)	1.79 (1.4–2.44)	0.85
PLT (×10⁹/L)	263.5 (226.75–309)	285 (227.75–343.25)	0.30
PLR	148.37 (116.93–183.02)	152.97 (106.36–201.11)	0.63
LDH (U/L)	164 (140.75–193.25)	165 (143.5–189.5)	0.98

Categorical data are reported as number of patients [percentage]; continuous data are reported as median (interquartile range). BMI, body mass index; CR, complete response; EBV, Epstein-Barr virus; LDH, lactate dehydrogenase; LYM, lymphocyte; N, node; PLR, platelet lymphocyte ratio; PLT, platelet; T, tumor; WBC, white blood cell.

Feature selection

Features with ICC <0.75 were excluded from the analysis. We integrated traditional and deep learning radiomics features derived from three sequences (T1, T2, and CE-T1), resulting in a total of 4,128 features. Initially, we applied Student’s t-test to screen all features, retaining 275 features with a P value of less than 0.05. Subsequently, we calculated Spearman’s rank correlation coefficient to assess the correlations among features and removed highly correlated redundant features, leaving 85 features. Finally, we applied the LASSO with 10-fold cross-validation to identify the λ value corresponding to the minimum binomial deviance, selecting the 25 most significant features, which included 18 radiomics features and 7 deep learning features (Figure 3). We used the XGBoost algorithm to construct a multimodal fusion model based on these features. Both the traditional and deep learning fusion models employed the same feature selection methodology, while the single-sequence models (traditional-T1, traditional-T2, traditional-CE-T1, deep-learning-T1, deep-learning-T2, and deep-learning-CE-T1) employed only LASSO for feature selection. Detailed feature selection results can be found in Figures S1-S9.

We employed Grad-CAM to visualize the final convolutional layer of the ResNet18 CNN. As shown in Figure 4, the highly activated key subregions (red) differ across the three sequences in representative cases: the T1 and T2 sequences exhibited diffuse moderate activation (yellow) within the tumor core, whereas the CE-T1 sequence showed localized high-activation regions (red) in the tumor core. These differing activation patterns suggest that the CNN emphasizes different tumor subregions depending on the sequence.

Figure 3 Feature selection of the multimodal fusion model using the least absolute shrinkage and selection operator algorithm. (A) Coefficients of LASSO 10-fold cross validation; (B) MSE of LASSO 10-fold validation; (C) Spearman correlation coefficients between features in the final subset, including 25 features. LASSO, least absolute shrinkage and selection operator; MSE, mean square error.

Figure 4 Feature heatmaps of a patient via the guided Grad-CAM. (A) T1-weighted imaging; (B) T2-weighted imaging; (C) contrast-enhanced T1-weighted imaging. The red area shows where the model pays most attention. Grad-CAM, gradient-weighted class activation mapping.

Construction and evaluation of different models

Table 2 and Figure 5 illustrate the performance metrics of the various models. The clinical model exhibited AUC values of 0.572 and 0.566 in the training and testing cohorts, respectively, indicating the lowest performance. For the traditional radiomics models, AUC values in the training cohort for the T1, T2, and CE-T1 sequences were 0.724, 0.731, and 0.734, respectively. The corresponding AUC values in the testing cohort were 0.707, 0.703, and 0.692. The traditional radiomics fusion model achieved AUC values of 0.751 and 0.721 in the training and testing cohorts, respectively, thereby outperforming the single-sequence models. In the deep learning radiomics models, the training cohort AUC values for the T1, T2, and CE-T1 sequences were 0.749, 0.700, and 0.706, respectively, with testing cohort AUC values of 0.694, 0.685, and 0.691. Likewise, the deep learning radiomics fusion model demonstrated training and testing AUC values of 0.768 and 0.725, respectively, also surpassing the single-sequence models. Among the single-sequence models, traditional radiomics outperformed deep learning radiomics for every sequence in the testing cohort (for example, AUC 0.707 vs. 0.694 for T1), although the T1 deep learning model had the higher AUC in the training cohort (0.749 vs. 0.724). In the comparison of multi-sequence fusion models, the deep learning radiomics fusion model had a slightly higher AUC than the traditional radiomics fusion model. However, the DeLong test indicated no statistically significant difference between the deep learning radiomics fusion model and the other models (P>0.05). A detailed comparison of the DeLong test results among different models is provided in Table S3. Sensitivity, specificity, PPV, and NPV for all models are presented in Table 2.

Table 2

Performance of different models

Cohort	Model	AUC	ACC	SENS	SPEC	PPV	NPV	F1 score
Training cohort	Clinical	0.572	0.457	0.775	0.368	0.254	0.855	0.383
	Traditional-T1	0.724	0.457	0.900	0.333	0.273	0.923	0.419
	Traditional-T2	0.731	0.484	0.825	0.389	0.273	0.889	0.410
	Traditional-CE-T1	0.734	0.739	0.600	0.778	0.429	0.875	0.500
	Traditional fusion	0.751	0.641	0.750	0.611	0.349	0.897	0.476
	Deep-learning-T1	0.749	0.728	0.575	0.771	0.411	0.867	0.479
	Deep-learning-T2	0.699	0.576	0.750	0.528	0.306	0.884	0.435
	Deep-learning-CE-T1	0.706	0.674	0.600	0.694	0.353	0.862	0.444
	Deep-learning fusion	0.768	0.679	0.725	0.667	0.377	0.897	0.496
	Multimodal fusion	0.862	0.832	0.675	0.875	0.600	0.906	0.635
Testing cohort	Clinical	0.566	0.369	0.889	0.243	0.222	0.900	0.356
	Traditional-T1	0.707	0.543	1.000	0.432	0.300	1.000	0.462
	Traditional-T2	0.703	0.565	1.000	0.459	0.310	1.000	0.474
	Traditional-CE-T1	0.692	0.760	0.556	0.810	0.417	0.882	0.476
	Traditional fusion	0.721	0.609	0.778	0.568	0.304	0.913	0.438
	Deep-learning-T1	0.694	0.630	0.778	0.594	0.318	0.917	0.452
	Deep-learning-T2	0.685	0.609	0.889	0.541	0.320	0.952	0.471
	Deep-learning-CE-T1	0.691	0.696	0.889	0.649	0.381	0.960	0.533
	Deep-learning fusion	0.725	0.326	0.111	0.378	0.042	0.636	0.060
	Multimodal fusion	0.844	0.870	0.667	0.919	0.667	0.919	0.667

ACC, accuracy; AUC, area under the curve; CE-T1, contrast-enhanced T1-weighted imaging; NPV, negative predictive value; PPV, positive predictive value; SENS, sensitivity; SPEC, specificity; T1, T1-weighted imaging; T2, T2-weighted imaging.
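The threshold-dependent metrics in Table 2 all follow from a 2 × 2 confusion matrix. As a worked sketch, the counts below (6 true positives, 3 false positives, 34 true negatives, 3 false negatives) are inferred from the reported testing-cohort rates for the multimodal fusion model and the 9 CR / 37 non-CR split in Table 1; they are not reported by the authors, and the function name is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "ACC": accuracy,
        "SENS": sensitivity,
        "SPEC": specificity,
        "PPV": ppv,
        "NPV": npv,
        "F1": f1,
    }


# Inferred counts for the multimodal fusion model in the testing cohort (n=46).
metrics = binary_metrics(tp=6, fp=3, tn=34, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With only 9 CR cases in the testing cohort, each misclassified responder shifts sensitivity by about 0.11, which helps explain the wide confidence intervals around the reported estimates.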

Figure 5 Evaluation of predictive performances for different models on patient response to induction immunochemotherapy. (A) ROC curves of different models in training cohort; (B) ROC curves of different models in testing cohort; (C) calibration curves of different models in the testing cohort. The gray dotted line represents an ideal prediction; (D) decision curve analysis for different models in the testing cohort. The x-axis is the threshold probability and the y-axis is the net benefit. AUC, area under the curve; CI, confidence interval; ROC, receiver operating characteristic.

We constructed a multimodal fusion model that integrates clinical features with both traditional and deep learning radiomics features. The multimodal fusion model achieved an AUC of 0.862, a sensitivity of 0.675, a specificity of 0.875, and an accuracy of 0.832 in the training set. In the test set, it recorded an AUC of 0.844, a sensitivity of 0.667, a specificity of 0.919, and an accuracy of 0.870 (Table 2, Figure 5A,5B). Compared with the other models, the multimodal fusion model demonstrated superior performance. Moreover, the DeLong test showed a statistically significant difference between the multimodal fusion model and the clinical model (P<0.05; Table S3). The calibration curve, shown in Figure 5C, assessed the consistency between model predictions and actual outcomes, indicating that the multimodal fusion model exhibited the best calibration. The DCA, presented in Figure 5D, evaluated the clinical utility of the predictive models and indicated that the multimodal fusion model provided the greatest net benefit for clinical decision-making. We also visualized the multimodal fusion model using a nomogram, as shown in Figure 6.
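For readers unfamiliar with DCA, the quantity plotted on the y-axis can be sketched as follows. Net benefit at a threshold probability p_t is TP/n − (FP/n) × p_t/(1 − p_t). The counts and the threshold of 0.2 used below are hypothetical and serve only to show the calculation; they are not values from Figure 5D.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of acting on a model's positive calls at a threshold probability."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(n_positive, n, threshold):
    """Reference strategy: treat every patient as a predicted responder."""
    return net_benefit(n_positive, n - n_positive, n, threshold)


# Hypothetical example: 46 patients, 9 true responders, and a model that
# flags 9 patients, of whom 6 are true responders, at a threshold of 0.2.
model_nb = net_benefit(tp=6, fp=3, n=46, threshold=0.2)
all_nb = net_benefit_treat_all(n_positive=9, n=46, threshold=0.2)
print(round(model_nb, 4), round(all_nb, 4))
```

A model is clinically useful at a given threshold only if its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.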

Figure 6 Nomogram for assessing tumor response to induction immunochemotherapy in the testing cohort. DL, deep learning.

Discussion

In this study, we developed and validated a multimodal fusion model that integrates MRI-based traditional and deep learning radiomics features to predict the response to induction immunochemotherapy in LANPC. The multimodal fusion model demonstrated higher AUC, accuracy, and specificity than the clinical and single-modality models in predicting early treatment response to induction immunochemotherapy in LANPC patients.

A recent prospective clinical trial (18) confirmed that incorporating immunotherapy into IC significantly enhances treatment efficacy in patients with LANPC. Yu et al. (5) reported that the CR rate was merely 4.8% in patients undergoing IC, whereas the CR rate for those receiving induction immunochemotherapy can reach as high as 34.4%. In our study, the CR rate following induction immunochemotherapy in patients with LANPC was 21.3%. Although this rate is considerably higher than that reported for IC alone, it is slightly lower than the rates reported in prior studies of induction immunochemotherapy (11). We attribute this discrepancy primarily to the timing of MRI evaluations, as most patients were assessed before their third immunotherapy cycle rather than prior to radiation therapy. Consequently, the actual CR rate in this study may be underestimated.

Achieving CR after induction therapy is a crucial clinical indicator, reflecting the efficacy of the initial treatment and guiding subsequent management strategies. A retrospective study by Lee et al. (19) revealed a strong association between the CR rate following induction therapy in NPC patients and both progression-free survival (PFS) and overall survival (OS). Yu et al. (5) reported that patients achieving CR exhibited significantly improved OS compared to those with PR, SD or PD, with respective 5-year OS rates of 100%, 88.4%, and 61.5% (P=0.005). These findings underscore the importance of accurately predicting early immunotherapy responses, particularly the achievement of CR, for prognostic assessment. Furthermore, such predictions can assist clinicians in making informed decisions regarding the continuation or adjustment of immunotherapy.

In clinical practice, the programmed cell death ligand 1 (PD-L1) score is frequently utilized to guide immunotherapy selection. Several studies (20,21) have demonstrated that PD-L1 expression levels in lung cancer are closely correlated with the efficacy of PD-1/PD-L1 therapy. However, to date, no suitable biomarkers have been identified to predict the early therapeutic responses to immunotherapy in NPC. Although immunotherapy is recommended for advanced NPC, the predictive value of PD-L1 expression in this context remains controversial (22,23). Some prospective randomized controlled trials involving recurrent or metastatic NPC have found no significant correlation between PD-L1 expression and objective response rate or PFS (24,25). Recently, Mai et al. (26) demonstrated that MRI-based radiomics signatures outperform PD-L1 expression in predicting immunotherapy response in NPC. Because our institution, like many others, does not routinely assess PD-L1 expression, we did not explore the relationship between CR rate and PD-L1 levels in this study.

With the rapid development of artificial intelligence in medicine, radiomics and deep learning are increasingly used for disease diagnosis and the assessment of treatment efficacy. Integrating traditional and deep learning radiomics features enhances the extraction of imaging information for diagnosis, treatment response, and prognosis prediction (27,28). Liu et al. (29) and Gao et al. (30) found that deep learning models based on multimodal imaging can efficiently predict pathological complete response (pCR) in cancer patients following neoadjuvant therapy. Tao et al. (31) suggested that a multimodal fusion model combining traditional and deep learning radiomics can accurately predict the chemoradiotherapy sensitivity of hypopharyngeal squamous cell carcinoma (HPSCC) patients using pre-treatment apparent diffusion coefficient (ADC) images. To date, no study has integrated traditional and deep learning radiomics features to predict the early response of LANPC to induction immunochemotherapy. Inspired by previous research, we developed a multimodal fusion model that integrates traditional and deep learning radiomics features extracted from multiple sequences (T1, T2, and CE-T1) and trained it using the XGBoost algorithm. The predictive performance of the multimodal fusion model was significantly superior to that of the radiomics-only and deep learning-only models, and multi-sequence models outperformed single-sequence models. The multimodal model also showed greater performance stability and clinical relevance. In the training set, the radiomics-all model showed a significant advantage over the clinical model (P=0.002), but this advantage disappeared in the test set (P=0.20), suggesting overfitting. In contrast, the multimodal model maintained consistent superiority across both sets, supporting its robustness. Furthermore, this study constructed a nomogram that incorporates clinical features, traditional radiomics features, and deep learning radiomics features to visualize the multimodal fusion model. Taken together, the multimodal fusion model developed in this study demonstrates superior predictive performance for treatment response and may serve as a non-invasive and effective tool for predicting LANPC immunotherapy efficacy.
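The feature-level fusion and the training-versus-test comparison discussed above can be illustrated with a short sketch. This is not the pipeline used in this study: the function names (`fuse_features`, `auc`, `generalization_gap`), the feature names, the labels, and the scores are all hypothetical, and the area under the ROC curve (AUC) is assumed here as the performance metric. In practice, the scores would come from a classifier such as XGBoost fitted on the fused feature vectors.

```python
from typing import Dict, Sequence


def fuse_features(*blocks: Dict[str, float]) -> Dict[str, float]:
    """Feature-level fusion: merge per-patient feature blocks into one vector."""
    fused: Dict[str, float] = {}
    for block in blocks:
        overlap = fused.keys() & block.keys()
        if overlap:
            raise ValueError(f"duplicate feature names: {sorted(overlap)}")
        fused.update(block)
    return fused


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a CR case outscores a non-CR case.

    Tied scores count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one CR and one non-CR case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def generalization_gap(train_auc: float, test_auc: float) -> float:
    """Training AUC minus test AUC; a large positive gap hints at overfitting."""
    return train_auc - test_auc


# Hypothetical patient: feature names and values are invented for illustration.
patient = fuse_features(
    {"clinical_T_stage": 3.0},         # clinical block
    {"rad_CE_T1_glcm_contrast": 0.42}, # traditional radiomics block
    {"dl_T1_feat_17": -1.3},           # deep learning radiomics block
)

# Hypothetical labels (1 = CR, 0 = non-CR) and model output scores.
y_train = [1, 0, 1, 0, 0, 1]
s_train = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7]
y_test = [1, 0, 0, 1]
s_test = [0.6, 0.7, 0.2, 0.8]

train_auc = auc(y_train, s_train)
test_auc = auc(y_test, s_test)
gap = generalization_gap(train_auc, test_auc)
print(f"train AUC={train_auc:.2f}, test AUC={test_auc:.2f}, gap={gap:.2f}")
```

A model whose advantage is present in training but shrinks on held-out data would show a large gap in this kind of check, whereas a robust model would show similar values in both sets.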

Predicting the early efficacy of induction therapy has important clinical value. Distinguishing early between patients who are sensitive and those who are resistant to induction immunochemotherapy can help tailor the treatment plan for subsequent CCRT. For example, patients predicted not to achieve complete response (non-CR) may be resistant to immunochemotherapy; for these patients, omitting immunotherapy during CCRT could be considered to reduce adverse drug reactions. In addition, radiation dose standards should be stricter and treatment response should be monitored more frequently so that the treatment plan can be adjusted in time, thereby improving patient outcomes.

This study has several limitations. First, immunotherapy for LANPC is not yet included in the relevant clinical guidelines, so only a limited number of patients with LANPC receive it, primarily in large teaching hospitals. Consequently, our sample size is restricted. We aim to expand the sample size and conduct multicenter studies to further validate our findings. Second, because of the relatively short follow-up period, we were unable to analyze PFS and OS; our study therefore concentrated solely on early efficacy prediction. We plan to track patients' long-term survival data in future investigations. Third, because the MRI evaluation was conducted before the third immunochemotherapy cycle, the CR rate of induction therapy in LANPC may be underestimated. This underestimation may affect the accuracy of the labels used in model training and thus the overall performance of the model. In future studies, we will prospectively enroll patients who complete MRI assessment before radiotherapy to improve the model's accuracy and reliability. Lastly, our study focused on the primary tumor site of NPC and did not include radiomics features of cervical metastatic lymph nodes. In future research, we will incorporate features of cervical lymph nodes to explore potential biomarkers more comprehensively, which may enhance the predictive performance of our immunotherapy efficacy model for LANPC.

Conclusions

In summary, the multimodal fusion model, which integrates traditional radiomics features derived from MRI with deep learning radiomics features, demonstrates a strong capability in predicting the early response of LANPC patients to induction immunochemotherapy. To our knowledge, this is the first multimodal MRI approach designed to predict early treatment response to induction immunochemotherapy in patients with LANPC. It may contribute to more precise decision-making in immunotherapy for this patient population.

Acknowledgments

We sincerely thank Prof. Xiaodong Zhu and all the members of the research group for their support and help.

Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1945/rc

Data Sharing Statement: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1945/dss

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1945/prf

Funding: This work was supported by the Key Research and Development Program Project of Guangxi Zhuang Autonomous Region (No. GuikeAB23026020), the Joint Project on Regional High-Incidence Diseases Research of Guangxi Natural Science Foundation (No. 2023GXNSFBA026012), the Guangxi Science and Technology Program (No. AD25069077), and the Independent Project of Key Laboratory of Early Prevention & Treatment for Regional High-Incidence-Tumor (Nos. GKE-ZZ202306 and GKE-ZZ202230).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-1945/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Review Committee of Guangxi Medical University Cancer Hospital (No. KY2024886) and individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Chen YP, Chan ATC, Le QT, et al. Nasopharyngeal carcinoma. Lancet 2019;394:64-80.
2. Pan JJ, Ng WT, Zong JF, et al. Prognostic nomogram for refining the prognostication of the proposed 8th edition of the AJCC/UICC staging system for nasopharyngeal cancer in the era of intensity-modulated radiotherapy. Cancer 2016;122:3307-15.
3. Chen YP, Tang LL, Yang Q, et al. Induction Chemotherapy plus Concurrent Chemoradiotherapy in Endemic Nasopharyngeal Carcinoma: Individual Patient Data Pooled Analysis of Four Randomized Trials. Clin Cancer Res 2018;24:1824-33.
4. Zhang Y, Chen L, Hu GQ, et al. Final Overall Survival Analysis of Gemcitabine and Cisplatin Induction Chemotherapy in Nasopharyngeal Carcinoma: A Multicenter, Randomized Phase III Trial. J Clin Oncol 2022;40:2420-5.
5. Yu YF, Lu GZ, Wang RJ, et al. Additional PD-1 inhibitor improves complete response to induction chemotherapy in locally advanced nasopharyngeal carcinoma. Front Immunol 2024;15:1415246.
6. Lian CL, Zhou R, Zhou Y, et al. Assessment of Response to Different Induction Chemotherapy Regimens in Locally Advanced Nasopharyngeal Carcinoma. Drug Des Devel Ther 2023;17:551-62.
7. Peng H, Chen L, Li WF, et al. Tumor response to neoadjuvant chemotherapy predicts long-term survival outcomes in patients with locoregionally advanced nasopharyngeal carcinoma: A secondary analysis of a randomized phase 3 clinical trial. Cancer 2017;123:1643-52.
8. Liang YL, Liu X, Shen LF, et al. Adjuvant PD-1 Blockade With Camrelizumab for Nasopharyngeal Carcinoma: The DIPPER Randomized Clinical Trial. JAMA 2025;333:1589-98.
9. Huang H, Yao Y, Deng X, et al. Immunotherapy for nasopharyngeal carcinoma: Current status and prospects (Review). Int J Oncol 2023;63:97.
10. Liu X, Zhang Y, Yang KY, et al. Induction-concurrent chemoradiotherapy with or without sintilimab in patients with locoregionally advanced nasopharyngeal carcinoma in China (CONTINUUM): a multicentre, open-label, parallel-group, randomised, controlled, phase 3 trial. Lancet 2024;403:2720-31.
11. Mai HQ, Liu SL, Chen QY, et al. Tislelizumab versus placebo combined with induction chemotherapy followed by concurrent chemoradiotherapy and adjuvant tislelizumab or placebo for locoregionally advanced nasopharyngeal carcinoma: Interim analysis of a multicenter, randomized, placebo-controlled, double-blind, phase 3 trial. J Clin Oncol 2024;42:6001.
12. Shi S, Jiang T, Liu H, et al. Habitat Radiomics Based on MRI for Predicting Metachronous Liver Metastasis in Locally Advanced Rectal Cancer: a Two center Study. Acad Radiol 2025;32:3370-83.
13. Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009;45:228-47.
14. Zheng Y, Qiu B, Liu S, et al. A transformer-based deep learning model for early prediction of lymph node metastasis in locally advanced gastric cancer after neoadjuvant chemotherapy using pretreatment CT images. EClinicalMedicine 2024;75:102805.
15. Fayyaz AM, Abdulkadir SJ, Talpur N, et al. Grad-CAM (Gradient-weighted Class Activation Mapping): A systematic literature review. Comput Biol Med 2025;198:111200.
16. Wu Y, Zhang W, Liang X, et al. Habitat radiomics analysis for progression free survival and immune-related adverse reaction prediction in non-small cell lung cancer treated by immunotherapy. J Transl Med 2025;23:393.
17. Chu F, Liang T, Chen CLP, et al. Compact Broad Learning System Based on Fused Lasso and Smooth Lasso. IEEE Trans Cybern 2024;54:435-48.
18. Yao Y, Ouyang Q, Wang S, et al. Incorporation of PD-1 blockade into induction chemotherapy improved tumor response in patients with locoregionally advanced nasopharyngeal carcinoma in a retrospective patient cohort. Oral Oncol 2024;154:106867.
19. Lee SJ, Kim YS, Kay CS, et al. The effect of adjuvant chemotherapy and early tumor regression on the outcome of nasopharyngeal cancer patients treated with concurrent chemoradiotherapy. Oral Oncol 2021;113:105130.
20. Sholl LM, Awad M, Basu Roy U, et al. Programmed Death Ligand-1 and Tumor Mutation Burden Testing of Patients With Lung Cancer for Selection of Immune Checkpoint Inhibitor Therapies: Guideline From the College of American Pathologists, Association for Molecular Pathology, International Association for the Study of Lung Cancer, Pulmonary Pathology Society, and LUNGevity Foundation. Arch Pathol Lab Med 2024;148:757-74.
21. Shi Y, Lei Y, Liu L, et al. Integration of comprehensive genomic profiling, tumor mutational burden, and PD-L1 expression to identify novel biomarkers of immunotherapy in non-small cell lung cancer. Cancer Med 2021;10:2216-31.
22. Mai HQ, Chen QY, Chen D, et al. Toripalimab or placebo plus chemotherapy as first-line treatment in advanced nasopharyngeal carcinoma: a multicenter randomized phase 3 trial. Nat Med 2021;27:1536-43.
23. Wang FH, Wei XL, Feng J, et al. Efficacy, Safety, and Correlative Biomarkers of Toripalimab in Previously Treated Recurrent or Metastatic Nasopharyngeal Carcinoma: A Phase II Clinical Trial (POLARIS-02). J Clin Oncol 2021;39:704-12.
24. Yang Y, Qu S, Li J, et al. Camrelizumab versus placebo in combination with gemcitabine and cisplatin as first-line treatment for recurrent or metastatic nasopharyngeal carcinoma (CAPTAIN-1st): a multicentre, randomised, double-blind, phase 3 trial. Lancet Oncol 2021;22:1162-74.
25. Mai HQ, Chen QY, Chen D, et al. Toripalimab Plus Chemotherapy for Recurrent or Metastatic Nasopharyngeal Carcinoma: The JUPITER-02 Randomized Clinical Trial. JAMA 2023;330:1961-70.
26. Mai H, Li L, Xin X, et al. Prediction of immunotherapy response in nasopharyngeal carcinoma: a comparative study using MRI-based radiomics signature and programmed cell death ligand 1 expression score. Eur Radiol 2025;35:4403-14.
27. Wang W, Liang H, Zhang Z, et al. Comparing three-dimensional and two-dimensional deep-learning, radiomics, and fusion models for predicting occult lymph node metastasis in laryngeal squamous cell carcinoma based on CT imaging: a multicentre, retrospective, diagnostic study. EClinicalMedicine 2024;67:102385.
28. Bao T, Li X, Deng Y, et al. Comparing radiomics, deep learning, and fusion models for predicting occult pleural dissemination in patients with non-small cell lung cancer: a retrospective multicenter study. BMC Cancer 2025;25:1670.
29. Liu Y, Wang Y, Hu X, et al. Multimodality deep learning radiomics predicts pathological response after neoadjuvant chemoradiotherapy for esophageal squamous cell carcinoma. Insights Imaging 2024;15:277.
30. Gao Y, Ventura-Diaz S, Wang X, et al. An explainable longitudinal multi-modal fusion model for predicting neoadjuvant therapy response in women with breast cancer. Nat Commun 2024;15:9613.
31. Tao H, Yang X, Chen M, et al. Classification of chemoradiotherapy sensitivity in hypopharyngeal squamous cell carcinoma based on deep-learning and radiomics feature fusion. Transl Cancer Res 2025;14:5142-54.

Cite this article as: Zhu B, Zhu L, Chen K, Li L, Zhu X. Multi-parameter magnetic resonance imaging (MRI) deep learning radiomics predicts complete response after induction immunochemotherapy in locally advanced nasopharyngeal carcinoma. Transl Cancer Res 2026;15(2):111. doi: 10.21037/tcr-2025-1945
