Association of serum vitamins and carotenoids with breast cancer status among adult women in NHANES 2017–2018: a cross-sectional study
Original Article

Bohan Hu, Ji Liang

Fudan University Global Health Institute, School of Public Health, Fudan University, Key Laboratory of Health Technology Assessment of the Ministry of Health, Shanghai, China

Contributions: (I) Conception and design: B Hu; (II) Administrative support: B Hu; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: B Hu; (V) Data analysis and interpretation: B Hu; (VI) Manuscript writing: Both authors; (VII) Final approval of manuscript: Both authors.

Correspondence to: Ji Liang, PhD, MPH. Fudan University Global Health Institute, School of Public Health, Fudan University, Key Laboratory of Health Technology Assessment of the Ministry of Health, 130 Dong’an Road, Xuhui District, Shanghai 200032, China. Email: jliang@shmu.edu.cn.

Background: Breast cancer remains a major public health burden among women worldwide. Circulating vitamins and carotenoids may be related to oxidative stress, inflammation, and other biological processes relevant to breast cancer, but serum-based evidence remains inconsistent. The aim of this study is to examine the associations between serum vitamins and carotenoids and breast cancer status among women in National Health and Nutrition Examination Survey (NHANES) 2017–2018.

Methods: This cross-sectional study included women aged ≥20 years from NHANES 2017–2018. Breast cancer status was defined by self- or proxy-reported physician diagnosis in the Medical Conditions Questionnaire. Men, women aged <20 years, participants with other cancers or unknown cancer status, and those with excessive missing data on key covariates were excluded. A total of 2,521 women were included, comprising 91 with breast cancer and 2,430 without breast cancer. Serum vitamin C was measured by ultra-performance liquid chromatography-electrochemical detection (UPLC-ECD), whereas retinol, tocopherols, and carotenoids were measured by high-performance liquid chromatography-photodiode array detection (HPLC-PDA). The analytes included vitamin C, retinol, α-tocopherol, γ-tocopherol, total lycopene, α-carotene, α-cryptoxanthin, β-cryptoxanthin, and lutein and zeaxanthin. Baseline characteristics were described using standardized mean differences (SMDs), and associations were evaluated using Firth penalized logistic regression with adjustment for demographic, socioeconomic, metabolic, lifestyle, reproductive, and gynecologic covariates. Restricted cubic spline and sensitivity analyses were performed. Machine learning models were further used to explore the joint discriminative patterns of these biomarkers and related covariates.

Results: Compared with women without breast cancer, those with breast cancer were older [66.00 (11.73) vs. 49.42 (17.27) years; SMD =1.123], more likely to be widowed/divorced/separated (57.1% vs. 26.0%), non-Hispanic White (52.7% vs. 30.9%), hypertensive (60.4% vs. 38.8%), and hyperglycemic (56.0% vs. 42.2%). In the fully adjusted model, α-carotene, α-cryptoxanthin, β-cryptoxanthin, and lutein and zeaxanthin were inversely associated with breast cancer status. The most consistent association was observed for α-cryptoxanthin [Q2 vs. Q1: odds ratio (OR) =0.52, 95% confidence interval (CI): 0.30–0.89; Q3 vs. Q1: OR =0.38, 95% CI: 0.20–0.71; Q4 vs. Q1: OR =0.27, 95% CI: 0.12–0.56; P-trend <0.001]. Lutein and zeaxanthin were also inversely associated (Q3 vs. Q1: OR =0.39, 95% CI: 0.20–0.72; Q4 vs. Q1: OR =0.31, 95% CI: 0.15–0.60; P-trend <0.001). Restricted cubic spline analyses showed significant overall and nonlinear associations for α-cryptoxanthin (P-overall <0.001; P-nonlinear =0.005) and lutein and zeaxanthin (P-overall =0.002; P-nonlinear =0.03). Supplementary machine learning analyses also showed modest discriminative ability, with the best-performing model achieving an area under the receiver operating characteristic curve (ROC-AUC) of 0.824 (95% CI: 0.765–0.884).

Conclusions: Higher serum concentrations of α-cryptoxanthin, α-carotene, β-cryptoxanthin, and lutein and zeaxanthin were inversely associated with self-reported breast cancer status, with α-cryptoxanthin showing the most consistent association. Given the cross-sectional design and self-reported outcome, the findings should be interpreted cautiously.

Keywords: Breast cancer; vitamins and carotenoids; α-cryptoxanthin; lutein and zeaxanthin; machine learning


Submitted Oct 17, 2025. Accepted for publication Apr 09, 2026. Published online May 27, 2026.

doi: 10.21037/tcr-2025-aw-2276


Highlight box

Key findings

• Higher serum concentrations of α-cryptoxanthin, α-carotene, β-cryptoxanthin, and lutein and zeaxanthin were inversely associated with self-reported breast cancer status among adult women in NHANES 2017–2018.

• α-cryptoxanthin showed the most consistent inverse association across multivariable models, and restricted cubic spline analysis suggested a nonlinear concentration-response pattern.

• Supplementary machine learning analyses showed that serum vitamins, carotenoids, and related covariates jointly had modest discriminative ability for breast cancer status.

What is known and what is new?

• Vitamins and carotenoids have antioxidant-related functions and may be linked to biological pathways relevant to breast cancer, but previous epidemiologic findings have been inconsistent.

• This study simultaneously evaluated multiple serum vitamins and carotenoids in a standardized NHANES framework and combined multivariable regression, spline analysis, sensitivity analyses, and supplementary machine learning.

What is the implication, and what should change now?

• The findings provide hypothesis-generating evidence on serum carotenoid patterns associated with breast cancer status.

• Because of the cross-sectional design and self-reported outcome, the results should be interpreted cautiously and require confirmation in prospective studies.


Introduction

Breast cancer is the most commonly diagnosed cancer among women worldwide (1) and remains a major public health challenge because of its high incidence and substantial disease burden (2). Identifying factors associated with breast cancer status is important for understanding disease-related biological patterns and for generating hypotheses for prevention-oriented research. Among the factors of interest, circulating micronutrients with antioxidant-related functions, including vitamins and carotenoids, have attracted increasing attention because of their potential links with oxidative stress, inflammation, immune regulation, and cell differentiation (3).

Several vitamins and carotenoids may plausibly be related to breast carcinogenesis through multiple biological pathways. Vitamin C, vitamin E, and carotenoids are involved in antioxidant defense, modulation of reactive oxygen species, and maintenance of cellular redox balance (4-6). In addition, some of these compounds may influence inflammatory responses, immune function, and differentiation-related signaling pathways, all of which have been implicated in tumor development and progression (7). From this perspective, abnormal circulating levels of vitamins and carotenoids may be associated with biological processes relevant to breast cancer.

However, the relationship between serum vitamins/carotenoids and breast cancer is likely to be bidirectional and should be interpreted cautiously. On the one hand, altered vitamin or carotenoid status may be related to mechanisms involved in carcinogenesis (3,8). On the other hand, breast cancer status itself may also influence circulating biomarker levels through treatment-related effects, changes in dietary intake, and post-diagnosis behavioral modification (9,10). Therefore, observed differences in serum concentrations may reflect not only exposure-related variation but also disease-related or survivorship-related changes, particularly in cross-sectional settings where temporal sequence cannot be established.

Previous epidemiologic studies exploring the relationship between vitamins or carotenoids and breast cancer have reported inconsistent findings (3,11). Such inconsistencies may be attributable to differences in study design, study populations, exposure assessment methods, covariate adjustment strategies, and the specific analytes selected for analysis (3). In particular, many previous studies have focused on dietary intake rather than circulating biomarkers (12,13), or have examined only one or a few micronutrients at a time. By contrast, serum-based analyses may better reflect the integrated result of intake, absorption, metabolism, and physiological status at the time of measurement, and simultaneous evaluation of multiple related analytes within the same standardized survey framework may help characterize their comparative association patterns more clearly.

Although prospective cohort studies and case-control studies are better suited to evaluating temporal relationships, cross-sectional biomarker analyses may still provide useful supplementary evidence when interpreted appropriately. Rather than establishing causality, such analyses can characterize concurrent serum concentration patterns associated with breast cancer status, assess whether multiple analytes show consistent or divergent association signals within the same dataset, and provide hypothesis-generating evidence for future longitudinal or mechanistic studies (14). In addition, nationally collected datasets with standardized laboratory measurements offer an opportunity to compare multiple serum vitamins and carotenoids under a common analytic framework (15).

Therefore, using data from female participants aged 20 years and older in NHANES 2017–2018, this study aimed to examine the associations between serum vitamins and carotenoids and self-reported breast cancer status. Specifically, we compared serum concentrations between women with and without breast cancer, evaluated their associations using multivariable models, and explored potential concentration-response relationships for selected analytes. In addition, we used supplementary machine learning analyses to assess whether serum vitamins, carotenoids, and related covariates jointly exhibited distinguishable patterns according to breast cancer status. We present this article in accordance with the STROBE reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2276/rc).


Methods

Study population

This study included female participants aged 20 years or older from the 2017–2018 cycle of the National Health and Nutrition Examination Survey (NHANES). This cycle was selected because serum vitamin and carotenoid data were available. NHANES is a nationwide cross-sectional survey of the non-institutionalized U.S. population. It uses a complex, multistage probability sampling design and assesses health and nutritional status through questionnaires, physical examinations, and laboratory tests. The NHANES protocol was approved by the Ethics Review Board of the National Center for Health Statistics, and written informed consent was obtained from all participants. Breast cancer status was based on self-report or proxy report, collected through the Medical Conditions Questionnaire using the Computer-Assisted Personal Interviewing (CAPI) system. Participants who reported having breast cancer were assigned to the breast cancer group, and those who did not report breast cancer were assigned to the non-breast cancer group. Information on clinically verified diagnosis, diagnosis timing, and incident versus long-term survivor status was not available in the present dataset. Male participants, female participants younger than 20 years, those with other cancers or unknown cancer status, and those with excessive missing data for important covariates were excluded. In total, 91 women with self-reported breast cancer and 2,430 women without self-reported breast cancer were included in the analysis. The participant selection process is shown in Figure 1. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Figure 1 Flowchart of screening the study population. NHANES, National Health and Nutrition Examination Survey.

Detection of serum vitamins and carotenoids

NHANES collects blood, urine, and other specimens from respondents at the Mobile Examination Center (MEC). Serum specimens are processed and stored under appropriate frozen conditions (−30 ℃). They are then shipped for testing to the Division of Laboratory Sciences, National Center for Environmental Health, Centers for Disease Control and Prevention in Atlanta, Georgia.

Vitamin C was analyzed using ultra-performance liquid chromatography-electrochemical detection (UPLC-ECD). Vitamin A (retinol), vitamin E (tocopherols), and carotenoids were analyzed using high-performance liquid chromatography-photodiode array detection (HPLC-PDA). For the analytes included in this study, the laboratory minimum detection limits were 300.0 µg/L for vitamin C (ascorbic acid), 10.0 µg/L for retinol, 400.0 µg/L for α-tocopherol, 110.0 µg/L for γ-tocopherol, 10.0 µg/L for total lycopene, 7.0 µg/L for trans-lycopene, 2.0 µg/L for α-carotene, 9.0 µg/L for α-cryptoxanthin, 11.0 µg/L for β-cryptoxanthin, 2.4 µg/L for trans-β-carotene, and 8.0 µg/L for lutein and zeaxanthin. Values below the minimum detection limit were replaced by the minimum detection limit divided by the square root of two (16).
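The substitution rule above can be sketched in a few lines of Python. This assumes that concentrations and detection limits are expressed in the same unit. The function name and the dictionary of limits are illustrative only; the three limits shown are copied from the list above. Missing values (None) pass through unchanged so that they can be handled separately, for example by multiple imputation.

```python
import math

# Illustrative subset of the detection limits listed above (µg/L).
LOD_UG_PER_L = {
    "alpha_carotene": 2.0,
    "alpha_cryptoxanthin": 9.0,
    "beta_cryptoxanthin": 11.0,
}


def impute_below_lod(values, lod):
    """Replace values below the detection limit with LOD / sqrt(2).

    values: list of floats, with None for missing measurements.
    lod: detection limit in the same unit as `values`.
    Missing values are returned unchanged.
    """
    fill = lod / math.sqrt(2)
    return [v if v is None or v >= lod else fill for v in values]


# Example: impute_below_lod([1.0, 2.0, 5.0, None], LOD_UG_PER_L["alpha_carotene"])
```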

Covariates

Covariates included age, race, education level, marital status, ratio of family income to poverty, smoking, drinking, body mass index (BMI), age at menarche, number of pregnancies, oral contraceptive use, menopausal hormone therapy (HRT) use, history of hysterectomy, history of bilateral oophorectomy, sedentary time, sleep disorders, hypertension, hyperglycemia, and the Dietary Inflammatory Index (DII). Race was categorized into five groups: Mexican American, other Hispanic, non-Hispanic White, non-Hispanic Black, and other race including multi-racial. Education level was divided into five categories: less than 9th grade, 9–11th grade (including 12th grade with no diploma), high school graduate/General Educational Development (GED) or equivalent, some college or Associate of Arts (AA) degree, and college graduate or above. Marital status was classified as married/living with partner, widowed/divorced/separated, or never married. Smoking status was categorized as never smoker, former smoker, or current smoker, and drinking status as never drinker, former drinker, or current drinker. Daily sedentary time was recorded in minutes and is presented in hours per day in Table 1. Sleep disorders were defined based on self-reported questionnaire responses. Hypertension was defined as mean systolic blood pressure ≥140 mmHg, mean diastolic blood pressure ≥90 mmHg, or use of antihypertensive medications. Hyperglycemia was defined as glycated hemoglobin ≥5.7% or use of hypoglycemic medications. The DII was calculated from dietary intake data collected at the MEC as the weighted sum of standardized inflammatory scores for 28 dietary components. It reflects the potential inflammatory effect of an individual’s dietary pattern.
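The covariate rules above can be written out as a short sketch. The hypertension and hyperglycemia thresholds are taken from the definitions in this section. The DII part follows the commonly described construction:

- compute each intake's z-score against a global reference mean and SD;
- convert the z-score to a percentile and center it on zero;
- multiply by a component-specific inflammatory effect score;
- sum across components.

The reference values in EXAMPLE_REFERENCE are hypothetical placeholders, not the published DII parameters. The exact DII implementation used in this study is not described here.

```python
import math


def centered_percentile(intake, global_mean, global_sd):
    """z-score against a reference, mapped to a standard-normal percentile centered on 0."""
    z = (intake - global_mean) / global_sd
    percentile = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * percentile - 1.0  # range (-1, 1)


def dii(intakes, reference):
    """Sum over components of effect score x centered percentile.

    intakes: {component: daily intake}
    reference: {component: (global_mean, global_sd, inflammatory_effect_score)}
    Components without an intake value are skipped.
    """
    return sum(
        effect * centered_percentile(intakes[name], mean, sd)
        for name, (mean, sd, effect) in reference.items()
        if name in intakes
    )


def has_hypertension(mean_sbp, mean_dbp, on_antihypertensives):
    """Mean SBP >= 140 mmHg, mean DBP >= 90 mmHg, or antihypertensive use."""
    return mean_sbp >= 140 or mean_dbp >= 90 or bool(on_antihypertensives)


def has_hyperglycemia(hba1c_percent, on_glucose_lowering):
    """HbA1c >= 5.7% or use of hypoglycemic medication."""
    return hba1c_percent >= 5.7 or bool(on_glucose_lowering)


# Hypothetical reference values for illustration only (not published DII parameters).
EXAMPLE_REFERENCE = {"fiber": (20.0, 5.0, -0.5), "saturated_fat": (30.0, 10.0, 0.4)}
```

For example, dii({"fiber": 25.0}, EXAMPLE_REFERENCE) is negative. An above-reference intake of a component with a negative (anti-inflammatory) effect score lowers the index.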

Statistical analysis

Multiple imputation was used to handle missing values; missing data for key variables are summarized in Table S1. Baseline characteristics of the breast cancer and non-breast cancer groups were compared using standardized mean differences (SMDs) rather than P values, because the marked imbalance in group sizes could make P values misleading. Continuous variables are presented as mean [standard deviation (SD)] or median [M (P25, P75)], as appropriate, and categorical variables are presented as n (%). Serum concentrations of vitamins and carotenoids were compared between the two groups using the Wilcoxon rank-sum test, and the median, P25, P75, Z value, and P value were reported. Box plots were drawn for analytes with significant between-group differences to visualize their distributions. Before multivariable analysis, correlations among the included analytes were examined and a correlation coefficient matrix was plotted. For highly correlated pairs, one variable was excluded from subsequent analyses to reduce multicollinearity.

Serum concentrations of the included vitamins and carotenoids were categorized into quartiles (Q1–Q4), with Q1 as the reference group. Associations between serum vitamins/carotenoids and breast cancer status were evaluated using Firth penalized logistic regression, which was chosen to reduce bias and improve the stability of parameter estimation in the presence of a relatively small number of breast cancer cases and marked outcome imbalance. Three models were constructed: Model 1 was unadjusted; Model 2 was adjusted for demographic variables, including age, education level, marital status, race, and ratio of family income to poverty; and Model 3 was further adjusted for BMI, hypertension, hyperglycemia, drinking, smoking, sleep disorders, sedentary time, DII, age at menarche, number of pregnancies, history of hysterectomy, history of bilateral oophorectomy, HRT use, and oral contraceptive use. Odds ratios (ORs), 95% confidence intervals (95% CIs), and P values were reported. Trend tests were additionally performed by entering the quartile categories as ordinal variables in the Firth logistic regression models.
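Firth penalized logistic regression can be sketched in NumPy as follows. The sketch uses the modified score X'(y - pi + h(1/2 - pi)), where h holds the leverages of the weighted hat matrix, and solves it with Fisher-scoring steps. It illustrates the technique and is not the software the authors used. It returns Wald standard errors, whereas dedicated implementations often report profile penalized-likelihood CIs. It also omits the step-halving that such implementations typically add.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression via Fisher scoring on the modified score.

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns (coefficients, Wald standard errors) at convergence.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^-1
        # Leverages: diagonal of W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: X'(y - pi + h * (1/2 - pi))
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        largest = np.max(np.abs(step))
        if largest > max_step:  # cap very large early steps
            step = step * (max_step / largest)
        beta = beta + step
        if largest < tol:
            break
    # Wald standard errors from the information matrix at the final estimate
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = pi * (1.0 - pi)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * w[:, None]))))
    return beta, se


# Usage: ORs and Wald 95% CIs for the non-intercept columns
# beta, se = firth_logistic(X, y)
# odds_ratios = np.exp(beta[1:])
# ci_low, ci_high = np.exp(beta[1:] - 1.96 * se[1:]), np.exp(beta[1:] + 1.96 * se[1:])
```

With a single binary covariate, the Firth estimate equals the log odds ratio obtained after adding 0.5 to every cell of the 2×2 table. This gives a convenient check. Even with a zero cell, where ordinary maximum likelihood diverges, the estimate stays finite.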

To further explore potential dose-response relationships, restricted cubic spline analyses were performed for variables showing significant associations in the multivariable analyses after adjustment for all covariates. Overall and nonlinear P values were used to assess the shape and statistical significance of the concentration-response relationships.
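The restricted cubic spline terms can be generated as shown below. This sketch uses Harrell's parameterization: a linear term plus k - 2 nonlinear terms, each scaled by the squared distance between the outer knots. Under this parameterization, the fitted curve is linear beyond the outer knots.

The source does not state the number or placement of knots. The 5th, 35th, 65th, and 95th percentiles in the usage comment are a common default for four knots and are an assumption here. In a fitted model, P-overall would jointly test all spline coefficients, and P-nonlinear would test only the nonlinear terms.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization).

    Returns an (n, k - 1) matrix: the linear term x followed by k - 2
    nonlinear terms, each divided by (t_k - t_1)^2. Any linear combination
    of the columns is linear below the first knot and above the last knot.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("at least 3 knots are required")

    def pos_cube(u):
        # Truncated power function (u)_+^3
        return np.maximum(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2
    last_gap = t[-1] - t[-2]
    columns = [x]
    for j in range(k - 2):
        term = (pos_cube(x - t[j])
                - pos_cube(x - t[-2]) * (t[-1] - t[j]) / last_gap
                + pos_cube(x - t[-1]) * (t[-2] - t[j]) / last_gap)
        columns.append(term / scale)
    return np.column_stack(columns)


# Usage (knot placement is an assumption, see above):
# knots = np.percentile(concentration, [5, 35, 65, 95])
# spline_terms = rcs_basis(concentration, knots)  # then add covariates and fit
```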

Two sensitivity analyses were conducted to assess the robustness of the main findings. First, propensity score matching (PSM) was performed using all covariates, including age, race, education level, marital status, ratio of family income to poverty, smoking, drinking, BMI, age at menarche, number of pregnancies, oral contraceptive use, HRT use, history of hysterectomy, history of bilateral oophorectomy, sedentary time, sleep disorders, hypertension, hyperglycemia, and DII. After matching, the associations between serum vitamins/carotenoids and breast cancer status were re-evaluated. Second, a weighted analysis based on the NHANES complex sampling design was conducted as an additional sensitivity analysis. Given that the primary aim of this study was to assess cross-sectional associations rather than to generate nationally representative estimates for the U.S. population, the weighted analysis was treated as a sensitivity analysis rather than the primary analysis.
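The matching step can be sketched as greedy 1:1 nearest-neighbor matching without replacement on the logit of the propensity score. The source does not report the matching ratio, caliper, or matching order. The values used here (1:1, a caliper of 0.2 SD of the logit, highest scores first) are assumptions for illustration. The propensity scores themselves would come from a model of breast cancer status on the listed covariates.

```python
import numpy as np


def greedy_match(ps, is_case, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement on logit(ps).

    ps: propensity scores strictly between 0 and 1.
    is_case: boolean array, True for breast cancer cases.
    Returns a list of (case_index, control_index) pairs; cases with no
    control within the caliper are left unmatched.
    """
    ps = np.asarray(ps, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    logit = np.log(ps / (1.0 - ps))
    caliper = caliper_sd * np.std(logit, ddof=1)
    cases = np.flatnonzero(is_case)
    available = np.flatnonzero(~is_case)
    pairs = []
    # Visit cases from the highest propensity score downward
    for i in cases[np.argsort(-logit[cases], kind="stable")]:
        if available.size == 0:
            break
        distances = np.abs(logit[available] - logit[i])
        best = int(np.argmin(distances))
        if distances[best] <= caliper:
            pairs.append((int(i), int(available[best])))
            available = np.delete(available, best)  # without replacement
    return pairs
```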

To further assess whether serum vitamins, carotenoids, and related covariates jointly exhibited a distinguishable pattern according to breast cancer status, four machine learning models were developed and compared: logistic regression, support vector machine (SVM), XGBoost, and Gaussian Naive Bayes (GaussianNB). These models were selected to represent linear discriminative, nonlinear discriminative, tree-based, and generative approaches, respectively. Model performance was evaluated in terms of discrimination, performance under low-prevalence conditions, calibration, and enrichment of high-risk strata. The corresponding metrics were the area under the receiver operating characteristic (ROC) curve (ROC-AUC), area under the precision-recall curve (PR-AUC), Brier score, calibration intercept, calibration slope, and top 5% lift. 95% CIs were calculated for all performance metrics. ROC curves and risk-stratified plots were used to visualize the discriminative ability of the models. Given the cross-sectional nature of the data, this machine learning analysis was intended as a supplementary pattern-recognition analysis rather than a temporal prediction analysis.
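Three of the metrics listed above are simple enough to sketch directly in NumPy:

- ROC-AUC, via its rank (Mann-Whitney) interpretation;
- the Brier score;
- top-fraction lift.

The definition of lift used here is an assumption. It is the event rate in the top fraction of predicted risks divided by the overall event rate, with the top group size rounded to the nearest integer. The calibration intercept and slope require a logistic recalibration fit and are not shown.

```python
import numpy as np


def roc_auc(y, score):
    """ROC-AUC as P(score of a random case > score of a random non-case), ties = 1/2."""
    y = np.asarray(y, dtype=bool)
    s = np.asarray(score, dtype=float)
    pos, neg = s[y], s[~y]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def brier_score(y, prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(prob, dtype=float)
    return float(np.mean((p - y) ** 2))


def top_fraction_lift(y, prob, fraction=0.05):
    """Event rate in the top `fraction` of predicted risk divided by the overall event rate."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(prob, dtype=float)
    k = max(1, int(round(fraction * y.size)))
    top = np.argsort(-p, kind="stable")[:k]
    return float(y[top].mean() / y.mean())
```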

IBM SPSS Statistics 26.0, R 4.2.2, and Python 3.12.4 were used for statistical analysis.


Results

Basic information

A total of 2,521 women were included in this study: 91 with breast cancer and 2,430 without breast cancer. Given the substantial imbalance in group size, baseline characteristics were compared using SMDs rather than P values. Detailed baseline characteristics are presented in Table 1. Compared with women without breast cancer, those with breast cancer were older [66.00 (11.73) vs. 49.42 (17.27) years, SMD =1.123]. They were more often widowed/divorced/separated (57.1% vs. 26.0%) and less often married/living with a partner (38.5% vs. 55.8%) or never married (4.4% vs. 18.2%). They were also more likely to be non-Hispanic White (52.7% vs. 30.9%). The breast cancer group had higher proportions of hypertension (60.4% vs. 38.8%), hyperglycemia (56.0% vs. 42.2%), former drinking (29.7% vs. 19.0%), hysterectomy (36.3% vs. 20.4%), and bilateral oophorectomy (20.9% vs. 10.4%), and a lower proportion of current drinkers (53.8% vs. 66.3%). By contrast, between-group differences in BMI, smoking status, sleep disorders, sedentary time, DII, age at menarche, number of pregnancies, HRT use, and oral contraceptive use were relatively small (all SMD <0.20).

Table 1

Basic information of the study population

Variable Non-breast cancer (n=2,430) Breast cancer (n=91) SMD
Age, years 49.42±17.27 66.00±11.73 1.123
Education level 0.247
   Less than 9th grade 213 (8.8) 3 (3.3)
   9–11th grade (includes 12th grade with no diploma) 247 (10.2) 11 (12.1)
   High school graduate/GED or equivalent 549 (22.6) 20 (22.0)
   Some college or AA degree 830 (34.2) 31 (34.1)
   College graduate or above 591 (24.3) 26 (28.6)
Marital status 0.725
   Married/living with partner 1,355 (55.8) 35 (38.5)
   Widowed/divorced/separated 633 (26.0) 52 (57.1)
   Never married 442 (18.2) 4 (4.4)
Race 0.484
   Mexican American 333 (13.7) 10 (11.0)
   Other Hispanic 255 (10.5) 4 (4.4)
   Non-Hispanic White 751 (30.9) 48 (52.7)
   Non-Hispanic Black 602 (24.8) 18 (19.8)
   Other race—including multi-racial 489 (20.1) 11 (12.1)
Ratio of family income to poverty 2.45±1.60 2.80±1.65 0.220
BMI, kg/m2 30.31±7.97 29.36±6.39 0.131
Hypertension 0.443
   N 1,487 (61.2) 36 (39.6)
   Y 943 (38.8) 55 (60.4)
Hyperglycemia 0.279
   N 1,404 (57.8) 40 (44.0)
   Y 1,026 (42.2) 51 (56.0)
Drinking 0.276
   Never drinker 358 (14.7) 15 (16.5)
   Former drinker 461 (19.0) 27 (29.7)
   Current drinker 1,611 (66.3) 49 (53.8)
Smoking 0.172
   Never smoker 1,684 (69.3) 64 (70.3)
   Former smoker 384 (15.8) 18 (19.8)
   Current smoker 362 (14.9) 9 (9.9)
Sleep disorders 0.121
   N 1,713 (70.5) 59 (64.8)
   Y 717 (29.5) 32 (35.2)
Sedentary time, hours/day 5.36±3.26 5.50±2.99 0.044
DII 1.32±1.79 1.44±1.65 0.070
Age at menarche, years 12.70±1.82 12.78±1.98 0.041
Number of pregnancies 2.93±2.16 3.13±2.04 0.094
Had a hysterectomy 0.358
   N 1,935 (79.6) 58 (63.7)
   Y 495 (20.4) 33 (36.3)
Had both ovaries removed 0.291
   N 2,177 (89.6) 72 (79.1)
   Y 253 (10.4) 19 (20.9)
HRT use 0.096
   N 2,039 (83.9) 73 (80.2)
   Y 391 (16.1) 18 (19.8)
Oral contraceptive use 0.019
   N 1,607 (66.1) 61 (67.0)
   Y 823 (33.9) 30 (33.0)

Continuous variables are presented as mean ± SD, and categorical variables are presented as n (%). AA, Associate of Arts; BMI, body mass index; DII, Dietary Inflammatory Index; GED, General Educational Development; HRT, menopausal hormone therapy; N, no; SD, standard deviation; SMD, standardized mean difference; Y, yes.

Comparison of serum vitamins and carotenoids between groups

As shown in Table 2, the serum concentrations [M (P25, P75)] of vitamin C, retinol, α-tocopherol, γ-tocopherol, total lycopene, trans-lycopene, α-carotene, α-cryptoxanthin, β-cryptoxanthin, lutein and zeaxanthin, and trans-β-carotene in the breast cancer group (BC, n=91) were 990.00 (592.50, 1,355.00), 50.30 (41.65, 63.80), 1,340.00 (1,045.00, 1,530.00), 180.00 (112.50, 242.50), 32.00 (21.10, 48.10), 17.80 (10.65, 26.65), 2.89 (1.31, 5.67), 1.86 (1.30, 2.64), 6.15 (3.67, 10.85), 15.30 (10.01, 24.50), and 15.50 (8.84, 29.70) µg/dL, respectively. In the non-breast cancer group (N-BC, n=2,430), the corresponding serum concentrations [M (P25, P75)] of these analytes were 963.00 (615.00, 1,280.00), 46.90 (39.30, 56.80), 1,160.00 (966.00, 1,430.00), 170.00 (115.00, 230.00), 34.60 (23.60, 47.30), 18.70 (12.60, 25.60), 3.46 (1.67, 7.05), 2.54 (1.77, 3.64), 7.38 (4.47, 14.20), 18.00 (12.20, 26.40), and 15.95 (8.62, 29.70) µg/dL, respectively.

Table 2

Comparison of serum concentrations of vitamins and carotenoids between the breast cancer and non-breast cancer groups

Vitamin or carotenoid BC (n=91), M (P25, P75), μg/dL N-BC (n=2,430), M (P25, P75), μg/dL Z P
Vitamin C 990.00 (592.50, 1,355.00) 963.00 (615.00, 1,280.00) −0.48 0.63
Retinol 50.30 (41.65, 63.80) 46.90 (39.30, 56.80) −2.76 0.006*
Alpha-tocopherol 1,340.00 (1,045.00, 1,530.00) 1,160.00 (966.00, 1,430.00) −3.02 0.003*
Gamma-tocopherol 180.00 (112.50, 242.50) 170.00 (115.00, 230.00) −0.13 0.90
Total lycopene 32.00 (21.10, 48.10) 34.60 (23.60, 47.30) −0.82 0.41
Trans-lycopene 17.80 (10.65, 26.65) 18.70 (12.60, 25.60) −0.88 0.38
Alpha-carotene 2.89 (1.31, 5.67) 3.46 (1.67, 7.05) −1.71 0.09
Alpha-cryptoxanthin 1.86 (1.30, 2.64) 2.54 (1.77, 3.64) −4.88 <0.001*
Beta-cryptoxanthin 6.15 (3.67, 10.85) 7.38 (4.47, 14.20) −2.47 0.01*
Lutein and zeaxanthin 15.30 (10.01, 24.50) 18.00 (12.20, 26.40) −1.88 0.06
Trans-beta-carotene 15.50 (8.84, 29.70) 15.95 (8.62, 29.70) −0.05 0.96

*, P<0.05. BC, breast cancer; M, median; N-BC, non-breast cancer.

No statistically significant differences were observed in the concentrations of vitamin C, γ-tocopherol, total lycopene, trans-lycopene, α-carotene, lutein and zeaxanthin, and trans-β-carotene between the breast cancer group and the non-breast cancer group (P>0.05). In contrast, there were statistically significant differences in the distribution of retinol, α-tocopherol, α-cryptoxanthin, and β-cryptoxanthin concentrations between the two groups (P<0.05).
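The Z values in Table 2 correspond to a normal approximation of the rank-sum test. The sketch below assigns average ranks to ties and uses the usual tie-corrected variance. It applies no continuity correction. Its sign convention (negative when the first group tends to rank lower) may differ from that of the software used in the study, so its values need not match Table 2 in sign.

```python
import numpy as np


def rank_sum_z(x, y):
    """Normal-approximation Z for the Wilcoxon rank-sum test, with tie correction.

    Z is computed from the rank sum of `x`; a negative value means `x` tends
    to rank lower than `y`. No continuity correction is applied.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    data = np.concatenate([x, y])
    n1, n2 = x.size, y.size
    n = n1 + n2
    # Average ranks (1-based): tied values share the mean of their ordinal ranks
    order = np.argsort(data, kind="mergesort")
    ordinal = np.empty(n)
    ordinal[order] = np.arange(1, n + 1)
    _, inverse, counts = np.unique(data, return_inverse=True, return_counts=True)
    ranks = (np.bincount(inverse, weights=ordinal) / counts)[inverse]
    w = ranks[:n1].sum()
    expected = n1 * (n + 1) / 2.0
    tie_term = (counts ** 3 - counts).sum() / (n * (n - 1))
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    return float((w - expected) / np.sqrt(variance))
```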

As shown in Figure 2, median concentrations of α-cryptoxanthin and β-cryptoxanthin were higher in the non-breast cancer group than in the breast cancer group. Conversely, median concentrations of α-tocopherol and retinol were higher in the breast cancer group than in the non-breast cancer group. These patterns are consistent with the rank-sum test results in Table 2.

Figure 2 Box plots comparing serum concentrations of selected vitamins and carotenoids. CI, confidence interval.

Correlation analysis

Before the multivariable analysis, correlations among the serum concentrations of the 11 vitamins and carotenoids were examined, and a correlation matrix was plotted to identify highly correlated variables. As shown in Figure 3, most analytes were positively correlated with one another, whereas γ-tocopherol was negatively correlated with some of the others. Total lycopene was highly correlated with trans-lycopene (correlation coefficient =0.97), and trans-β-carotene with α-carotene (correlation coefficient =0.77). Therefore, trans-lycopene and trans-β-carotene were excluded from subsequent analyses.
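The correlation-based screening can be sketched as follows. Variables are visited in priority order, and a variable is dropped if its absolute correlation with an already-retained variable exceeds a threshold. The source does not state the correlation type or cutoff. Pearson correlation and a threshold of 0.75 are assumptions, chosen only so that both reported pairs (0.97 and 0.77) would be flagged.

```python
import numpy as np


def drop_correlated(data, names, threshold=0.75):
    """Keep variables in column order, skipping any whose |r| with a kept one exceeds threshold.

    data: (n, p) array with one column per analyte, ordered by priority
    (earlier columns are retained in preference to later ones).
    names: list of p column names.
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)  # Pearson r
    keep = []
    for j in range(len(names)):
        if all(abs(r[i, j]) <= threshold for i in keep):
            keep.append(j)
    return [names[j] for j in keep]
```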

Figure 3 Correlation plot of 11 vitamins and carotenoids.

Multivariable logistic regression

Multivariable Firth penalized logistic regression analyses were performed to examine the associations between serum vitamins and carotenoids and breast cancer status. Three models were constructed. Model 1 was unadjusted. Model 2 was adjusted for demographic variables, including age, education level, marital status, race, and ratio of family income to poverty. Model 3 was further adjusted for BMI, hypertension, hyperglycemia, drinking, smoking, sleep disorders, sedentary time, DII, age at menarche, number of pregnancies, history of hysterectomy, history of bilateral oophorectomy, HRT use, and oral contraceptive use. The detailed results are presented in Table 3.
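The quartile exposure coding used in Table 3 can be sketched as follows:

- cut points at the sample 25th, 50th, and 75th percentiles, with half-open intervals [lower, upper);
- indicator columns for Q2 to Q4, with Q1 as the reference;
- a single ordinal score (1 to 4) for the trend test.

The percentile interpolation rule is an assumption (the default linear rule of np.percentile), so cut points may differ slightly from those produced by other software. Either design matrix could then be passed, together with covariates, to a penalized logistic regression routine.

```python
import numpy as np


def quartile_codes(x):
    """Quartile index 1-4 using half-open intervals [lower, upper), plus the cut points."""
    x = np.asarray(x, dtype=float)
    cuts = np.percentile(x, [25, 50, 75])
    # side="right": a value equal to a cut point falls in the higher quartile
    return np.searchsorted(cuts, x, side="right") + 1, cuts


def quartile_design(q, covariates=None):
    """Intercept, Q2-Q4 indicators (Q1 is the reference), then optional covariates."""
    q = np.asarray(q)
    columns = [np.ones(q.size)] + [(q == level).astype(float) for level in (2, 3, 4)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    return np.column_stack(columns)


def trend_design(q, covariates=None):
    """Intercept and the quartile entered as an ordinal score (used for P-trend)."""
    q = np.asarray(q, dtype=float)
    columns = [np.ones(q.size), q]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    return np.column_stack(columns)
```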

Table 3

Multivariable Firth penalized logistic regression analyses of the associations between serum vitamins/carotenoids and breast cancer status

Antioxidant Quartile Serum concentration interval, μg/dL BC (n) Non-BC (n) Model 1: OR (95% CI), P value, P trend Model 2: OR (95% CI), P value, P trend Model 3: OR (95% CI), P value, P trend
Vitamin C Q1 <610.00 24 606 Ref. 0.61 Ref. 0.39 Ref. 0.31
Q2 [610.00, 960.00) 20 610 0.83 (0.45–1.51) 0.54 0.97 (0.52–1.82) 0.93 0.93 (0.49–1.75) 0.83
Q3 [960.00, 1,280.00) 18 599 0.76 (0.41–1.41) 0.39 0.72 (0.38–1.37) 0.32 0.71 (0.36–1.38) 0.31
Q4 ≥1,280.00 29 615 1.19 (0.69–2.07) 0.54 0.81 (0.45–1.46) 0.49 0.76 (0.41–1.41) 0.38
Alpha-carotene Q1 <1.68 28 601 Ref. 0.046 Ref. 0.04 Ref. 0.03
Q2 [1.68, 3.41) 21 608 0.75 (0.42–1.32) 0.31 0.56 (0.3–1.02) 0.058 0.54 (0.29–0.99) 0.05
Q3 [3.41, 6.95) 28 604 1 (0.58–1.7) 0.99 0.87 (0.49–1.54) 0.64 0.82 (0.45–1.48) 0.51
Q4 ≥6.95 14 617 0.5 (0.25–0.93) 0.03 0.41 (0.2–0.81) 0.01 0.38 (0.18–0.79) 0.009
Alpha-cryptoxanthin Q1 <1.73 44 578 Ref. <0.001 Ref. <0.001 Ref. <0.001
Q2 [1.73, 2.51) 22 611 0.48 (0.28–0.8) 0.004 0.56 (0.32–0.96) 0.03 0.52 (0.3–0.89) 0.02
Q3 [2.51, 3.57) 15 618 0.33 (0.18–0.57) <0.001 0.43 (0.22–0.78) 0.005 0.38 (0.2–0.71) 0.002
Q4 ≥3.57 10 623 0.22 (0.1–0.42) <0.001 0.31 (0.15–0.62) 0.001 0.27 (0.12–0.56) <0.001
Beta-cryptoxanthin Q1 <4.41 38 591 Ref. 0.001 Ref. 0.03 Ref. 0.02
Q2 [4.41, 7.30) 21 609 0.54 (0.31–0.92) 0.02 0.63 (0.35–1.1) 0.11 0.59 (0.33–1.03) 0.07
Q3 [7.30, 13.80) 19 612 0.49 (0.28–0.84) 0.01 0.65 (0.35–1.16) 0.14 0.6 (0.32–1.09) 0.10
Q4 ≥13.80 13 618 0.34 (0.17–0.61) <0.001 0.43 (0.21–0.85) 0.01 0.39 (0.18–0.79) 0.009
Gamma-tocopherol Q1 <115.00 22 603 Ref. 0.48 Ref. 0.16 Ref. 0.16
Q2 [115.00, 170.00) 20 610 0.9 (0.49–1.66) 0.74 1.23 (0.65–2.32) 0.53 1.34 (0.7–2.56) 0.38
Q3 [170.00, 230.00) 23 606 1.04 (0.58–1.88) 0.90 1.65 (0.89–3.09) 0.11 1.79 (0.95–3.4) 0.07
Q4 ≥230.00 26 611 1.16 (0.66–2.08) 0.61 1.49 (0.81–2.76) 0.20 1.54 (0.82–2.93) 0.18
Lutein and zeaxanthin Q1 <12.00 34 592 Ref. 0.007 Ref. 0.001 Ref. <0.001
Q2 [12.00, 17.80) 24 608 0.69 (0.4–1.17) 0.17 0.68 (0.39–1.18) 0.17 0.61 (0.34–1.07) 0.08
Q3 [17.80, 26.40) 17 614 0.49 (0.27–0.87) 0.01 0.45 (0.24–0.83) 0.01 0.39 (0.2–0.72) 0.003
Q4 ≥26.40 16 616 0.46 (0.25–0.82) 0.008 0.36 (0.18–0.67) 0.001 0.31 (0.15–0.6) <0.001
Total lycopene Q1 <23.10 30 596 Ref. 0.25 Ref. 0.95 Ref. 0.97
Q2 [23.10, 34.20) 21 613 0.69 (0.39–1.2) 0.19 0.87 (0.48–1.56) 0.64 0.89 (0.49–1.6) 0.70
Q3 [34.20, 47.30) 18 612 0.59 (0.32–1.05) 0.08 0.84 (0.45–1.53) 0.57 0.86 (0.45–1.59) 0.63
Q4 ≥47.30 22 609 0.72 (0.41–1.25) 0.25 1.03 (0.56–1.86) 0.93 1.02 (0.55–1.86) 0.95
Retinol Q1 <39.20 14 610 Ref. 0.001 Ref. 0.77 Ref. 0.91
Q2 [39.20, 47.00) 18 613 1.27 (0.63–2.59) 0.50 0.91 (0.45–1.9) 0.81 0.89 (0.44–1.86) 0.76
Q3 [47.00, 57.00) 24 609 1.69 (0.89–3.35) 0.11 0.88 (0.44–1.81) 0.73 0.9 (0.45–1.86) 0.78
Q4 ≥57.00 35 598 2.5 (1.37–4.79) 0.002 0.87 (0.45–1.76) 0.70 0.91 (0.46–1.86) 0.79
Alpha-tocopherol Q1 <970.00 17 612 Ref. 0.005 Ref. 0.33 Ref. 0.32
Q2 [970.00, 1,170.00) 13 599 0.79 (0.38–1.61) 0.52 0.54 (0.25–1.13) 0.10 0.55 (0.25–1.15) 0.11
Q3 [1,170.00, 1,440.00) 29 617 1.67 (0.93–3.11) 0.09 0.78 (0.42–1.5) 0.45 0.79 (0.41–1.52) 0.47
Q4 ≥1,440.00 32 602 1.89 (1.06–3.48) 0.03 0.6 (0.32–1.18) 0.14 0.6 (0.31–1.19) 0.14

Model 1, unadjusted; Model 2, adjusted for demographic variables, including age, education level, marital status, race, and ratio of family income to poverty; Model 3, further adjusted for BMI, hypertension, hyperglycemia, drinking, smoking, sleep disorders, sedentary time, DII, age at menarche, number of pregnancies, history of hysterectomy, history of bilateral oophorectomy, HRT use, and oral contraceptive use. For each model, the P trend value is shown in the Q1 (reference) row. BC, breast cancer; BMI, body mass index; CI, confidence interval; DII, Dietary Inflammatory Index; HRT, menopausal hormone therapy; OR, odds ratio.

In Model 1, higher serum concentrations of alpha-carotene, alpha-cryptoxanthin, beta-cryptoxanthin, and lutein and zeaxanthin were associated with lower odds of breast cancer, whereas higher concentrations of retinol and alpha-tocopherol were associated with higher odds of breast cancer. In contrast, no significant associations were observed for vitamin C, gamma-tocopherol, or total lycopene.
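Because Model 1 is unadjusted, its point estimates can be checked by hand. For a logistic model whose only predictors are the indicators of one categorical exposure (here, the quartile dummies plus an intercept), Firth's penalized estimate coincides with the odds ratio computed after adding 0.5 to every cell of the case/non-case table. The minimal sketch below applies this to the lutein and zeaxanthin counts in Table 3. It assumes Model 1 was specified exactly this way; the function name `corrected_or` is illustrative, and the shortcut reproduces point estimates only, not the confidence intervals, which for Firth models are commonly based on the profile penalized likelihood.

```python
def corrected_or(cases_q, noncases_q, cases_ref, noncases_ref, k=0.5):
    """Odds ratio for quartile q versus the reference quartile after adding k to every cell.

    With k = 0.5 this equals the Firth estimate for a model that contains only
    an intercept and the quartile indicators (no other covariates).
    """
    odds_q = (cases_q + k) / (noncases_q + k)
    odds_ref = (cases_ref + k) / (noncases_ref + k)
    return odds_q / odds_ref


# Lutein and zeaxanthin: (BC, N-BC) counts by quartile from Table 3; Q1 is the reference.
lutein = {"Q1": (34, 592), "Q2": (24, 608), "Q3": (17, 614), "Q4": (16, 616)}

for q in ("Q2", "Q3", "Q4"):
    print(q, round(corrected_or(*lutein[q], *lutein["Q1"]), 2))
# Q2 0.69
# Q3 0.49
# Q4 0.46
```

Applied to the retinol counts, the same calculation gives 1.27, 1.69, and 2.50 for Q2 to Q4, matching the unadjusted column of Table 3. The adjusted Models 2 and 3 have no such closed form and must be fitted iteratively.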

After adjustment for demographic variables in Model 2, the inverse associations for alpha-carotene, alpha-cryptoxanthin, beta-cryptoxanthin, and lutein and zeaxanthin remained. For alpha-carotene, participants in Q4 had lower odds of breast cancer than those in Q1 (OR =0.41, 95% CI: 0.20–0.81, P=0.01; P-trend =0.04). For alpha-cryptoxanthin, the associations were more pronounced, with significantly lower odds observed in Q2, Q3, and Q4 compared with Q1 (Q2: OR =0.56, 95% CI: 0.32–0.96, P=0.03; Q3: OR =0.43, 95% CI: 0.22–0.78, P=0.005; Q4: OR =0.31, 95% CI: 0.15–0.62, P=0.001; P-trend <0.001). For beta-cryptoxanthin, only the highest quartile remained significantly associated with lower odds of breast cancer (Q4: OR =0.43, 95% CI: 0.21–0.85, P=0.01; P-trend =0.03). For lutein and zeaxanthin, significant inverse associations were observed in Q3 and Q4 (Q3: OR =0.45, 95% CI: 0.24–0.83, P=0.01; Q4: OR =0.36, 95% CI: 0.18–0.67, P=0.001; P-trend =0.001). By contrast, the positive associations of retinol and alpha-tocopherol were attenuated after demographic adjustment and were no longer statistically significant.

In the fully adjusted Model 3, the overall pattern remained largely unchanged. Alpha-carotene remained inversely associated with breast cancer status, with significant associations observed in Q2 (OR =0.54, 95% CI: 0.29–0.99, P<0.05) and Q4 (OR =0.38, 95% CI: 0.18–0.79, P=0.009), and a significant trend across quartiles (P-trend =0.03). Alpha-cryptoxanthin showed the most stable inverse association across models; compared with Q1, the ORs for breast cancer were 0.52 (95% CI: 0.30–0.89) in Q2, 0.38 (95% CI: 0.20–0.71) in Q3, and 0.27 (95% CI: 0.12–0.56) in Q4 (all P<0.05; P-trend <0.001). For beta-cryptoxanthin, only Q4 was significantly associated with lower odds of breast cancer (OR =0.39, 95% CI: 0.18–0.79, P=0.009; P-trend =0.02). Lutein and zeaxanthin also remained inversely associated with breast cancer status, with significant associations in Q3 (OR =0.39, 95% CI: 0.20–0.72, P=0.003) and Q4 (OR =0.31, 95% CI: 0.15–0.60, P<0.001), and a significant dose-response trend (P-trend <0.001). Vitamin C, gamma-tocopherol, total lycopene, retinol, and alpha-tocopherol were not significantly associated with breast cancer status after full adjustment.

Overall, after sequential adjustment for demographic and other covariates, alpha-carotene, alpha-cryptoxanthin, beta-cryptoxanthin, and lutein and zeaxanthin remained inversely associated with breast cancer status, with alpha-cryptoxanthin showing the most consistent association across the three models.
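The adjusted models in Table 3 use Firth penalized logistic regression, which has to be fitted iteratively. The sketch below shows the standard modified Newton-Raphson scheme for Firth's bias reduction: the ordinary logistic score is augmented by the term h_i(0.5 − p_i), where h_i are the diagonal elements of the weighted hat matrix. It is a minimal numpy illustration, not the authors' code: the function name `firth_logistic`, the step cap `max_step`, and the Wald standard errors are illustrative choices, and dedicated software (for example the R package logistf) additionally uses step-halving and, by default, profile penalized-likelihood confidence intervals.

```python
import numpy as np


def firth_logistic(X, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth bias-reduced logistic regression by modified Newton-Raphson.

    X: (n, k) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns (beta, se), with Wald standard errors from the Fisher information.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))   # (X'WX)^-1
        # Diagonal of the hat matrix W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: X'(y - p + h * (0.5 - p))
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        largest = np.max(np.abs(step))
        if largest > max_step:                              # cap very large steps
            step *= max_step / largest
        beta = beta + step
        if largest < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * w[:, None]))))
    return beta, se


# Perfectly separated toy data: ordinary maximum likelihood diverges, Firth stays finite.
X = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1])
beta, se = firth_logistic(X, y)
print(np.exp(beta[1]))  # about 49, i.e. (3.5/0.5) / (0.5/3.5)
```

In an adjusted model, the covariates are simply added as extra columns of `X`; the odds ratio for a quartile is then `exp(beta[j])` for its indicator column. The toy example illustrates why the Firth correction is attractive here: with only 91 cases spread over quartiles and many covariates, sparse cells can make ordinary maximum likelihood estimates unstable.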

Restricted cubic spline analysis

Restricted cubic spline analyses were further performed for alpha-carotene, alpha-cryptoxanthin, beta-cryptoxanthin, and lutein and zeaxanthin after adjustment for all covariates, in order to explore the potential dose-response relationships between their serum concentrations and breast cancer status. As shown in Figure 4, the overall association was statistically significant for alpha-cryptoxanthin (P-overall <0.001) and lutein and zeaxanthin (P-overall =0.002), whereas no statistically significant overall association was observed for alpha-carotene (P-overall =0.21) or beta-cryptoxanthin (P-overall =0.09). In addition, significant nonlinear associations were detected for alpha-cryptoxanthin (P-nonlinear =0.005) and lutein and zeaxanthin (P-nonlinear =0.03). For both variables, the fitted curves showed that the OR decreased rapidly with increasing concentration in the lower range and then gradually leveled off at higher concentrations. By contrast, neither alpha-carotene (P-nonlinear =0.18) nor beta-cryptoxanthin (P-nonlinear =0.15) showed evidence of a statistically significant nonlinear association after full covariate adjustment. Overall, the restricted cubic spline analyses suggested that alpha-cryptoxanthin and lutein and zeaxanthin had clearer concentration-response relationships with breast cancer status, whereas the dose-response evidence for alpha-carotene and beta-cryptoxanthin remained limited.

Figure 4 Adjusted restricted cubic spline curves for serum alpha-carotene, alpha-cryptoxanthin, beta-cryptoxanthin, and lutein and zeaxanthin in relation to breast cancer status. CI, confidence interval; OR, odds ratio.
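A restricted cubic spline enters the regression as a small set of derived columns. The sketch below builds the commonly used Harrell-style basis: a linear term plus k − 2 nonlinear terms, constrained so that the fitted curve is linear below the first and above the last knot. The knot number and positions used for Figure 4 are not reported in this section; the four knots at the 5th, 35th, 65th, and 95th percentiles in the usage lines are an assumption (a common default), and `rcs_basis` is an illustrative name. In such a model, P-nonlinear corresponds to a joint test that the coefficients of the nonlinear columns are zero, and P-overall to a joint test that all spline columns are zero.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis: columns [x, f_1(x), ..., f_{k-2}(x)].

    Each f_j combines truncated cubics so that the fitted curve is cubic between
    knots and linear below the first and above the last knot.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos_cube(u):
        return np.where(u > 0, u, 0.0) ** 3          # truncated power (u)_+^3

    scale = (t[-1] - t[0]) ** 2                        # normalisation used by Harrell
    cols = [x]
    for j in range(k - 2):
        f = (pos_cube(x - t[j])
             - pos_cube(x - t[-2]) * (t[-1] - t[j]) / (t[-1] - t[-2])
             + pos_cube(x - t[-1]) * (t[-2] - t[j]) / (t[-1] - t[-2]))
        cols.append(f / scale)
    return np.column_stack(cols)


# Hypothetical usage: simulated concentrations, 4 knots at common default percentiles.
rng = np.random.default_rng(0)
conc = rng.lognormal(mean=3.0, sigma=0.5, size=500)
knots = np.percentile(conc, [5, 35, 65, 95])
B = rcs_basis(conc, knots)                             # shape (500, 3)
```

The resulting columns would be placed in the design matrix alongside the covariates of the fully adjusted model, so that the same penalized logistic fit yields the spline curve.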

Sensitivity analysis

Sensitivity analysis based on PSM

To further assess the robustness of the main findings, PSM was performed using all covariates: age, race, education level, marital status, ratio of family income to poverty, smoking, drinking, BMI, age at menarche, number of pregnancies, oral contraceptive use, HRT use, history of hysterectomy, history of bilateral oophorectomy, sedentary time, sleep disorders, hypertension, hyperglycemia, and DII. After matching, 91 breast cancer cases and 364 controls without breast cancer (a 1:4 ratio) were included in the analysis. Detailed balance test results are provided in Table S2; the balance diagnostics showed that covariate balance between the two groups was substantially improved after matching.
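The 91 cases and 364 controls correspond to a 1:4 matching ratio. The sketch below shows one common way to obtain such a sample, greedy nearest-neighbour matching without replacement on the logit of the propensity score, together with the standardized mean difference (SMD) typically reported in balance tables. The matching algorithm, case ordering, and any caliper used by the authors are not described in this section, so these are assumptions; the propensity scores themselves would come from a logistic model of case status on the covariates listed above, and the function names are illustrative.

```python
import numpy as np


def greedy_match(ps, is_case, ratio=4, caliper=None):
    """Greedy 1:ratio nearest-neighbour matching without replacement.

    Distances are computed on the logit of the propensity score. Cases are
    processed from highest to lowest score; `caliper` (logit scale) is optional.
    Returns {case index: [matched control indices]}.
    """
    ps = np.asarray(ps, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    logit = np.log(ps / (1.0 - ps))
    cases = np.flatnonzero(is_case)
    available = list(np.flatnonzero(~is_case))
    matches = {}
    for i in cases[np.argsort(-logit[cases])]:
        chosen = []
        for _ in range(ratio):
            if not available:
                break
            dist = np.abs(logit[available] - logit[i])
            best = int(np.argmin(dist))
            if caliper is not None and dist[best] > caliper:
                break
            chosen.append(int(available.pop(best)))   # remove the control from the pool
        matches[int(i)] = chosen
    return matches


def smd(x_case, x_control):
    """Standardized mean difference for a continuous covariate (pooled SD)."""
    x_case = np.asarray(x_case, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_case.var(ddof=1) + x_control.var(ddof=1)) / 2.0)
    return (x_case.mean() - x_control.mean()) / pooled_sd
```

After matching, `smd` would be computed for each covariate in the matched sample; an absolute SMD below 0.1 is a widely used rule of thumb for adequate balance.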

As shown in Table 4, the inverse associations of alpha-cryptoxanthin and lutein and zeaxanthin with breast cancer status remained significant after PSM. For alpha-cryptoxanthin, compared with Q1, the odds of breast cancer were significantly lower in Q3 (OR =0.40, 95% CI: 0.21–0.77, P=0.006) and Q4 (OR =0.27, 95% CI: 0.12–0.56, P<0.001), with a significant dose-response trend (P-trend <0.001). For lutein and zeaxanthin, significant inverse associations were observed in Q3 (OR =0.43, 95% CI: 0.22–0.84, P=0.01) and Q4 (OR =0.35, 95% CI: 0.17–0.72, P=0.004), with a significant trend across quartiles (P-trend =0.004). No significant associations were observed for vitamin C, alpha-carotene, beta-cryptoxanthin, gamma-tocopherol, total lycopene, retinol, or alpha-tocopherol after matching. These findings were consistent with the main analysis for alpha-cryptoxanthin and lutein and zeaxanthin, whereas the inverse associations for alpha-carotene and beta-cryptoxanthin were not retained after matching.

Table 4

Associations of serum vitamins and carotenoids with breast cancer status after propensity score matching

Vitamin and carotenoid Group Serum concentration, μg/dL BC (n) N-BC (n) OR (95% CI) P P-trend
Vitamin C Q1 <630.00 25 89 Ref. Ref. 0.27
Q2 [630.00, 1,040.00) 24 87 0.99 (0.52–1.86) 0.97
Q3 [1,040.00, 1,380.00) 22 94 0.82 (0.42–1.59) 0.55
Q4 ≥1,380.00 20 94 0.69 (0.34–1.38) 0.29
Alpha-carotene Q1 <1.69 28 86 Ref. Ref. 0.17
Q2 [1.69, 3.40) 21 92 0.70 (0.36–1.34) 0.28
Q3 [3.40, 6.53) 24 90 0.79 (0.41–1.52) 0.48
Q4 ≥6.53 18 96 0.55 (0.26–1.14) 0.11
Alpha-cryptoxanthin Q1 <1.42 34 79 Ref. Ref. <0.001
Q2 [1.42, 2.12) 25 87 0.63 (0.34–1.16) 0.14
Q3 [2.12, 3.14) 18 98 0.40 (0.21–0.77) 0.006
Q4 ≥3.14 14 100 0.27 (0.12–0.56) <0.001
Beta-cryptoxanthin Q1 <3.59 25 89 Ref. Ref. 0.24
Q2 [3.59, 5.94) 25 88 0.98 (0.52–1.85) 0.95
Q3 [5.94, 11.70) 22 92 0.82 (0.42–1.59) 0.55
Q4 ≥11.70 19 95 0.66 (0.30–1.39) 0.27
Gamma-tocopherol Q1 <106.00 17 96 Ref. Ref. 0.17
Q2 [106.00, 163.00) 21 89 1.29 (0.64–2.65) 0.48
Q3 [163.00, 241.00) 28 89 1.84 (0.94–3.68) 0.08
Q4 ≥241.00 25 90 1.59 (0.79–3.24) 0.19
Lutein and zeaxanthin Q1 <11.60 32 81 Ref. Ref. 0.004
Q2 [11.60, 17.40) 25 89 0.68 (0.37–1.24) 0.21
Q3 [17.40, 27.50) 18 95 0.43 (0.22–0.84) 0.01
Q4 ≥27.50 16 99 0.35 (0.17–0.72) 0.004
Total lycopene Q1 <20.20 22 90 Ref. Ref. 0.85
Q2 [20.20, 32.10) 25 90 1.11 (0.58–2.13) 0.75
Q3 [32.10, 45.10) 19 93 0.84 (0.42–1.70) 0.63
Q4 ≥45.10 25 91 1.12 (0.57–2.20) 0.74
Retinol Q1 <43.20 28 85 Ref. Ref. 0.26
Q2 [43.20, 51.10) 22 92 0.72 (0.38–1.35) 0.30
Q3 [51.10, 63.60) 19 94 0.61 (0.31–1.18) 0.15
Q4 ≥63.60 22 93 0.67 (0.34–1.30) 0.24
Alpha-tocopherol Q1 <1,075.00 26 88 Ref. Ref. 0.68
Q2 [1,075.00, 1,300.00) 16 91 0.62 (0.30–1.22) 0.17
Q3 [1,300.00, 1,575.00) 29 91 1.06 (0.57–1.99) 0.85
Q4 ≥1,575.00 20 94 0.74 (0.36–1.50) 0.40

BC, breast cancer; CI, confidence interval; N-BC, non-breast cancer; OR, odds ratio.

Sensitivity analysis based on NHANES weighting

A weighted analysis based on the NHANES sampling design was further conducted as a second sensitivity analysis. As shown in Table 5, several associations identified in the primary unweighted models were attenuated after weighting. For alpha-cryptoxanthin, although the associations were no longer statistically significant in the weighted analysis, the effect estimates across Q2–Q4 remained below 1.00, indicating an inverse association in the same direction as in the main analysis. For lutein and zeaxanthin, participants in Q3 had significantly lower odds of breast cancer than those in Q1 (OR =0.19, 95% CI: 0.06–0.62, P=0.009), whereas the association for Q4 did not reach statistical significance. In addition, for total lycopene, Q2 was associated with lower odds of breast cancer compared with Q1 (OR =0.43, 95% CI: 0.22–0.84, P=0.02), although there was no trend across quartiles (P-trend =0.17). No statistically significant weighted associations were observed for vitamin C, alpha-carotene, beta-cryptoxanthin, gamma-tocopherol, retinol, or alpha-tocopherol.

Table 5

Weighted associations of serum vitamins and carotenoids with breast cancer status in NHANES

Antioxidant Group Serum concentration interval, μg/dL Weighted BC (%) Weighted N-BC (%) OR (95% CI) P value P-trend
Vitamin C Q1 <610.00 2.77 97.23 Ref. 0.64
Q2 [610.00, 960.00) 4.75 95.25 1.92 (0.80–4.57) 0.13
Q3 [960.00, 1,280.00) 2.69 97.31 0.82 (0.21–3.18) 0.75
Q4 ≥1,280.00 5.77 94.23 1.05 (0.51–2.12) 0.90
Alpha-carotene Q1 <1.68 3.09 96.91 Ref. 0.39
Q2 [1.68, 3.41) 3.88 96.12 0.83 (0.38–1.83) 0.62
Q3 [3.41, 6.95) 5.82 94.18 1.53 (0.66–3.56) 0.30
Q4 ≥6.95 3.37 96.63 0.55 (0.14–2.23) 0.38
Alpha-cryptoxanthin Q1 <1.73 6.56 93.44 Ref. 0.19
Q2 [1.73, 2.51) 3.96 96.04 0.70 (0.27–1.81) 0.44
Q3 [2.51, 3.57) 2.61 97.39 0.34 (0.06–1.87) 0.20
Q4 ≥3.57 2.54 97.46 0.32 (0.06–1.78) 0.18
Beta-cryptoxanthin Q1 <4.41 4.10 95.90 Ref. 0.35
Q2 [4.41, 7.30) 4.90 95.10 1.27 (0.51–3.15) 0.58
Q3 [7.30, 13.80) 3.74 96.26 0.67 (0.20–2.32) 0.51
Q4 ≥13.80 2.99 97.01 0.59 (0.16–2.21) 0.41
Gamma-tocopherol Q1 <115.00 3.71 96.29 Ref. 0.053
Q2 [115.00, 170.00) 3.44 96.56 1.34 (0.48–3.75) 0.55
Q3 [170.00, 230.00) 4.26 95.74 2.53 (0.99–6.45) 0.052
Q4 ≥230.00 4.63 95.37 2.21 (0.90–5.42) 0.08
Lutein and zeaxanthin Q1 <12.00 4.87 95.13 Ref. 0.20
Q2 [12.00, 17.80) 4.87 95.13 0.54 (0.14–2.17) 0.36
Q3 [17.80, 26.40) 2.36 97.64 0.19 (0.06–0.62) 0.009
Q4 ≥26.40 3.77 96.23 0.36 (0.11–1.23) 0.10
Total lycopene Q1 <23.10 5.97 94.03 Ref. 0.17
Q2 [23.10, 34.20) 2.06 97.94 0.43 (0.22–0.84) 0.02
Q3 [34.20, 47.30) 3.12 96.88 0.93 (0.34–2.54) 0.89
Q4 ≥47.30 5.40 94.60 1.44 (0.60–3.45) 0.39
Retinol Q1 <39.20 2.41 97.59 Ref. 0.86
Q2 [39.20, 47.00) 2.22 97.78 0.55 (0.16–1.89) 0.32
Q3 [47.00, 57.00) 4.04 95.96 0.66 (0.18–2.36) 0.49
Q4 ≥57.00 6.72 93.28 0.83 (0.17–4.07) 0.80
Alpha-tocopherol Q1 <970.00 1.92 98.08 Ref. 0.31
Q2 [970.00, 1,170.00) 2.69 97.31 0.91 (0.22–3.74) 0.89
Q3 [1,170.00, 1,440.00) 5.75 94.25 1.09 (0.38–3.15) 0.86
Q4 ≥1,440.00 5.27 94.73 0.67 (0.31–1.46) 0.29

BC, breast cancer; CI, confidence interval; N-BC, non-breast cancer; NHANES, National Health and Nutrition Examination Survey; OR, odds ratio.

Overall, the weighted analysis suggested attenuation of several associations identified in the primary models; however, the inverse effect direction for alpha-cryptoxanthin remained consistent with the main analysis, and the association for lutein and zeaxanthin was partially retained. These findings indicate that the main results were generally directionally robust, although some associations were weakened after incorporating the complex survey weighting and should therefore be interpreted with caution.
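The "Weighted BC (%)" column in Table 5 is a weighted proportion within each quartile, that is, 100 × Σ(w·y) / Σw over the participants in that quartile. The minimal sketch below computes such point estimates from per-participant sampling weights, assuming left-closed quartile intervals as in the tables (for example [12.00, 17.80)). Which NHANES weight the authors used is not stated in this section; for measures obtained in the mobile examination center, the examination weight (WTMEC2YR in 2017–2018) is the usual choice unless an analyte was measured in a subsample with its own weight. Design-based standard errors, which also require the strata and primary sampling units, are deliberately not reproduced here, and the function names are illustrative.

```python
import numpy as np


def assign_quartile(x, cuts):
    """Quartile labels 1-4 for left-closed intervals [c_(i-1), c_i) given three cut points."""
    return np.digitize(np.asarray(x, dtype=float), cuts, right=False) + 1


def weighted_percent_by_group(y, w, groups):
    """Weighted percentage of y == 1 within each group: 100 * sum(w*y) / sum(w)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    groups = np.asarray(groups)
    return {int(g): 100.0 * np.sum(w[groups == g] * y[groups == g]) / np.sum(w[groups == g])
            for g in np.unique(groups)}


# Hypothetical usage with the lutein and zeaxanthin cut points from Table 5.
lutein = [10.5, 12.0, 20.1, 30.2, 15.0]
q = assign_quartile(lutein, [12.00, 17.80, 26.40])   # -> [1, 2, 3, 4, 2]
```

With real data, `y` would be the breast cancer indicator, `w` the chosen sampling weight, and `groups` the quartile labels; the weighted N-BC percentage is simply 100 minus the returned value.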

Joint discriminative pattern of serum vitamins, carotenoids, and covariates

To further assess whether serum vitamins, carotenoids, and related covariates jointly showed a distinguishable pattern according to breast cancer status, four machine learning models (logistic regression, SVM, XGBoost, and GaussianNB) were developed and compared. Detailed performance metrics are summarized in Table 6. All four models discriminated breast cancer cases from non-cases to some extent, with AUCs ranging from 0.760 to 0.824, suggesting that the included variables jointly carried discriminative information related to breast cancer status. Among the models, logistic regression showed the most favorable overall performance, SVM and XGBoost showed intermediate performance, and GaussianNB performed least well.

Table 6

Performance comparison of machine learning models for discriminating breast cancer

Model AUC PR-AUC Brier score Calibration intercept Calibration slope Top 5% lift
Logistic 0.824 (0.765 to 0.884) 0.165 (0.083 to 0.334) 0.0325 (0.0218 to 0.0438) 1.719 (0.358 to 3.210) 1.584 (1.159 to 2.102) 5.903 (2.344 to 8.688)
SVM 0.780 (0.691 to 0.860) 0.135 (0.066 to 0.266) 0.0333 (0.0223 to 0.0451) 2.486 (0.250 to 4.553) 1.808 (1.122 to 2.464) 5.165 (2.213 to 8.931)
XGBoost 0.774 (0.701 to 0.843) 0.133 (0.063 to 0.289) 0.0331 (0.0221 to 0.0446) 0.945 (−0.484 to 2.352) 1.337 (0.902 to 1.813) 4.427 (1.422 to 7.556)
GaussianNB 0.760 (0.680 to 0.832) 0.084 (0.051 to 0.142) 0.0337 (0.0229 to 0.0456) 0.995 (−0.528 to 2.576) 1.335 (0.891 to 1.842) 1.476 (0.000 to 3.557)

Data are presented as value (95% CI). AUC, area under the curve; CI, confidence interval; GaussianNB, Gaussian Naive Bayes; PR-AUC, area under the precision-recall curve; SVM, support vector machine.
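The point estimates in Table 6 follow standard definitions, sketched below for predicted probabilities `p` and observed 0/1 outcomes `y`. These are illustrative implementations under stated assumptions, not the authors' pipeline: AUC is computed as the Mann-Whitney probability, the Brier score as the mean squared error of the probabilities, top 5% lift as the event rate in the highest-risk 5% divided by the overall event rate, the calibration slope from a logistic regression of `y` on logit(`p`), and the calibration intercept as calibration-in-the-large (logit(`p`) entered as an offset). How the authors defined the calibration intercept, handled ties at the 5% cut-off, and derived the 95% CIs is not stated in this section, and PR-AUC is omitted.

```python
import numpy as np


def _expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def auc_mann_whitney(y, p):
    """ROC AUC: probability that a random case has a higher predicted risk than a random non-case (ties count 1/2)."""
    y = np.asarray(y, dtype=bool)
    p = np.asarray(p, dtype=float)
    diff = p[y][:, None] - p[~y][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def brier(y, p):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y, dtype=float)) ** 2))


def top_fraction_lift(y, p, frac=0.05):
    """Event rate among the top `frac` of predicted risks divided by the overall event rate."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    n_top = max(1, int(round(frac * len(p))))
    top = np.argsort(-p, kind="stable")[:n_top]
    return float(y[top].mean() / y.mean())


def calibration_slope(y, p, n_iter=50):
    """Intercept a and slope b of the recalibration model y ~ a + b * logit(p), by Newton-Raphson."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    X = np.column_stack([np.ones_like(p), np.log(p / (1.0 - p))])
    coef = np.zeros(2)
    for _ in range(n_iter):
        mu = _expit(X @ coef)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])
        coef = coef + np.linalg.solve(hess, grad)
    return coef  # [a, b]; b = 1 is ideal, b < 1 suggests too-extreme and b > 1 too-moderate predictions


def calibration_in_the_large(y, p, n_iter=50):
    """Intercept a of y ~ a + offset(logit(p)); a = 0 means predicted and observed risks agree on average."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    offset = np.log(p / (1.0 - p))
    a = 0.0
    for _ in range(n_iter):
        mu = _expit(a + offset)
        a += np.sum(y - mu) / np.sum(mu * (1.0 - mu))
    return float(a)
```

One common way to obtain intervals like those in Table 6 is to resample participants with replacement, recompute each metric, and take the 2.5th and 97.5th percentiles; whether the authors did this is an assumption.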

The comparisons across AUC, PR-AUC, Brier score, and top 5% lift consistently favored the logistic regression model, and the ROC curves in Figure 5 showed a similar ranking across models. Moreover, as shown in Figure 6, the observed breast cancer event rate generally increased across strata of increasing model-assigned risk, particularly for the logistic regression, SVM, and XGBoost models, indicating that the joint information from serum vitamins, carotenoids, and covariates could separate participants into strata with different breast cancer status distributions.

Figure 5 ROC curves of different machine learning models. AUC, area under the curve; CI, confidence interval; GaussianNB, Gaussian Naive Bayes; ROC, receiver operating characteristic; SVM, support vector machine.
Figure 6 Risk-stratified observed breast cancer event rates across different machine learning models. CI, confidence interval; GaussianNB, Gaussian Naive Bayes; SVM, support vector machine.
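Figure 6 groups participants by model-assigned risk and compares observed event rates across strata. A minimal sketch of that computation follows; the number of strata (five near-equal groups here), the tie handling, and the function name are assumptions, since the stratification used for the figure is not described in this section.

```python
import numpy as np


def event_rate_by_risk_stratum(y, p, n_strata=5):
    """Observed event rate in near-equal-sized strata of predicted risk, lowest risk first."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p, kind="stable")          # participants sorted by predicted risk
    strata = np.array_split(order, n_strata)      # consecutive, near-equal groups
    return [float(y[idx].mean()) for idx in strata]


rates = event_rate_by_risk_stratum(
    y=[0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
    p=[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10],
)
print(rates)  # [0.0, 0.0, 0.0, 0.5, 1.0]
```

An event rate that rises from the lowest to the highest stratum, as in this toy example, is the pattern described for the logistic regression, SVM, and XGBoost models.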

Taken together, these findings indicate that the serum vitamins, carotenoids, and covariates included in this study jointly formed a meaningful discriminative pattern for breast cancer status. Given the cross-sectional nature of the data, these results should not be interpreted as evidence of temporal prediction; instead, they provide supplementary pattern-level support for the robustness of the main association analyses.


Discussion

In this cross-sectional analysis of women in NHANES 2017–2018, we found that higher serum concentrations of alpha-carotene, alpha-cryptoxanthin, beta-cryptoxanthin, and lutein and zeaxanthin were associated with lower odds of self-reported breast cancer status after covariate adjustment, with alpha-cryptoxanthin showing the most consistent inverse association across models. In the fully adjusted model, alpha-cryptoxanthin remained significantly associated with lower odds of breast cancer status across Q2–Q4, while alpha-carotene and lutein and zeaxanthin also showed significant inverse associations, and beta-cryptoxanthin remained significant only in the highest quartile. By contrast, the positive crude associations observed for retinol and alpha-tocopherol were attenuated after covariate adjustment, suggesting that those initial associations were likely influenced by demographic or clinical differences between groups. Restricted cubic spline analyses further showed clearer concentration-response patterns for alpha-cryptoxanthin and lutein and zeaxanthin, with inverse nonlinear associations that were more pronounced in the lower concentration range. Supplementary machine learning analyses suggested that the biomarkers and related covariates jointly showed some discriminative pattern consistent with the main association findings.

α-Cryptoxanthin is a naturally occurring carotenoid found in a variety of fruits and vegetables (17). Like other carotenoids, it has a long polyene chain of conjugated carbon-carbon double bonds, which confers light-absorbing and antioxidant properties (16). This structure allows it to neutralize free radicals and to absorb and dissipate their energy, thereby limiting free radical attack on DNA, proteins, and lipids and protecting cells from oxidative stress-related damage (18). A metabolomic analysis further showed that carotenoids are associated with metabolites involved in immune regulation, redox balance, membrane signaling, and β-oxidation, and that higher carotenoid exposure is associated with a lower risk of breast cancer (19).

Previous studies on the relationship between α-cryptoxanthin and breast cancer are scarce, but α-cryptoxanthin, largely through its antioxidant function, has been linked to a range of other health outcomes. α-Cryptoxanthin can induce mammalian phase 2 proteins that protect cells from damage by oxidants and electrophiles, which may contribute to cancer prevention (20). A multi-ethnic cohort study found a significant inverse association between serum α-cryptoxanthin levels and non-Hodgkin lymphoma (NHL) (21), and a case-control study from Texas reported an inverse association between α-cryptoxanthin and prostate cancer risk (22). Another study using NHANES data suggested that the antioxidant effect of α-cryptoxanthin may contribute to a lower risk of metabolic dysfunction-associated fatty liver disease (MAFLD) (23). Other studies have reported inverse associations of α-cryptoxanthin with sleep disorders (24) and chronic obstructive pulmonary disease (25). Antioxidants including α-cryptoxanthin may help suppress inflammatory cytokines in older women (26), and α-cryptoxanthin has also been associated with improvement in cervical dysplasia (27), suggesting that it may benefit women’s health more broadly. The biological mechanisms by which α-cryptoxanthin might influence breast cancer require further experimental and clinical investigation.

Lutein and zeaxanthin are non-provitamin A xanthophyll carotenoids widely present in dark green leafy vegetables, corn, and other plant foods, as well as in egg yolk. Their polyene chain of multiple conjugated double bonds enables them to quench singlet oxygen and scavenge reactive oxygen species, thereby limiting oxidative damage to lipids, proteins, and DNA (16,28). In addition to this direct antioxidant activity, accumulating evidence suggests that these xanthophylls also modulate redox-sensitive signaling pathways. Experimental studies in breast cancer cells have shown that lutein can suppress proliferation and promote apoptosis by modulating survival signaling linked to the antioxidant defense response, accompanied by increased caspase-3 activity and reduced Bcl-2 and PARP expression (29).

Previous epidemiologic studies on the relationship between lutein and zeaxanthin and breast cancer have yielded generally supportive, though not entirely uniform, results. In a large prospective analysis of circulating carotenoids, higher blood levels of lutein and zeaxanthin were associated with lower breast cancer risk, suggesting that circulating carotenoid status may capture biologically relevant exposure more directly than diet alone (30). More recently, a Korean case-control study reported that higher dietary carotenoid intake, particularly lutein/zeaxanthin, was associated with a lower risk of breast cancer (12). A 2025 case-control analysis also supported an inverse association between dietary intake of specific carotenoids, especially lycopene and lutein/zeaxanthin, and breast cancer risk (13). However, earlier meta-analytic evidence suggested that while total carotenoid exposure and some individual carotenoids were inversely related to breast cancer risk, the association for dietary lutein/zeaxanthin alone was less consistent across studies, possibly due to differences in dietary assessment, bioavailability, food matrices, population characteristics, menopausal status, and residual confounding (31). This inconsistency is not unexpected, because circulating biomarkers may better reflect absorption, metabolism, and short- to medium-term physiological status than questionnaire-based dietary estimates (8,30).

As a provitamin A carotenoid, α-carotene may exert anticancer effects not only through direct antioxidant activity, but also through its metabolic conversion to retinoid-related compounds that participate in the regulation of cell differentiation, proliferation, and apoptosis (30). Recent metabolomic evidence further indicates that circulating carotenoids are associated with metabolites involved in redox balance, immune regulation, membrane signaling, and β-oxidation, supporting the view that carotenoid-rich exposure may influence several biological processes related to breast cancer development rather than acting through a single pathway (19). In this context, α-carotene may contribute to a less pro-oxidant and less pro-inflammatory internal milieu, thereby potentially inhibiting tumor initiation or early progression. A 2024 meta-analysis of circulating carotenoids reported that higher circulating α-carotene, together with several other carotenoids, was associated with a lower risk of breast cancer, supporting the relevance of biomarker-based exposure assessment (11). Similarly, a case-control study nested in the Cancer Prevention Study II Nutrition Cohort found that higher plasma α-carotene was associated with lower risk of invasive breast cancer (32). However, a recent large case-control study indicated that associations for α-carotene were weaker or less consistent than those observed for some other carotenoids, implying that the magnitude of association may vary across populations and study designs (13).

β-Cryptoxanthin is a xanthophyll carotenoid with provitamin A activity, and like other carotenoids, it can quench reactive oxygen species and reduce oxidative injury to DNA, proteins, and membrane lipids (30,33). A 2024 meta-analysis of circulating carotenoids reported that higher circulating β-cryptoxanthin was associated with lower breast cancer risk overall (11). Earlier prospective data from the SU.VI.MAX cohort also suggested that higher plasma β-cryptoxanthin concentrations were inversely associated with both overall cancer risk and breast cancer risk (34). In Chinese women, higher dietary β-cryptoxanthin intake was associated with lower breast cancer risk in a case-control study, and a 2016 reinterpretation of pooled analyses likewise concluded that β-cryptoxanthin may have a protective effect against breast cancer overall (35,36). However, the evidence has not been fully consistent. A 2025 case-control study found weaker or less consistent associations for β-cryptoxanthin than for some other carotenoids, suggesting heterogeneity across populations (13). Such discrepancies may arise from variation in smoking status, dietary patterns, food sources, assay methods, and the limited correlation between dietary intake and circulating concentration (11,30). Nevertheless, β-cryptoxanthin has shown inverse associations with risk in other cancer-related settings, including colorectal cancer and cervical intraepithelial neoplasia, lending broader support to its potential anticarcinogenic role (37,38). Overall, the current literature suggests that β-cryptoxanthin is a plausible protective carotenoid in breast cancer, but more high-quality prospective studies are needed to clarify dose-response patterns and subtype-specific associations.

The sensitivity analyses provided additional context for interpreting the robustness of the main findings. After PSM using all covariates, the inverse associations for alpha-cryptoxanthin and lutein and zeaxanthin remained significant, which supports the relative stability of these two findings. In contrast, the weighted NHANES analysis attenuated several associations identified in the primary unweighted models. Nevertheless, the inverse effect estimates for alpha-cryptoxanthin remained directionally consistent across quartiles, and the association for lutein and zeaxanthin was partially retained. These results suggest that the main findings are not entirely driven by a single modeling approach, but they also indicate that some associations are sensitive to weighting and should therefore be interpreted with caution.

There are several strengths in this study. First, it used standardized laboratory measurements from NHANES, which allowed simultaneous evaluation of multiple serum vitamins and carotenoids within the same analytic framework. Second, compared with studies based only on dietary intake, serum biomarker analysis may better reflect the integrated result of intake, absorption, metabolism, and physiological status at the time of measurement. Third, the use of sequentially adjusted Firth logistic regression models, restricted cubic spline analyses, and two sensitivity analyses helped provide a more comprehensive assessment of the robustness of the observed associations. Fourth, the supplementary machine learning analyses offered an additional perspective on whether the included biomarkers and covariates jointly formed distinguishable patterns according to breast cancer status.

Several limitations should also be acknowledged. First, because this was a cross-sectional study, temporal sequence could not be established and causal inference is not possible. Reverse causality is a major concern, since serum biomarker levels may have been influenced by the disease itself, treatment exposure, or post-diagnosis behavioral changes. Second, breast cancer status was based on self-report rather than clinical verification, and the dataset did not allow distinction between newly diagnosed cases and long-term survivors, which may have introduced outcome misclassification and further complicated interpretation of serum biomarker patterns. Third, the number of women with breast cancer was relatively small (n=91), which may have limited statistical power for some analyses and contributed to imprecise quartile-specific estimates. Fourth, some important breast cancer-related factors, particularly family history and genetic susceptibility markers, were unavailable in the present dataset and therefore could not be included in adjustment or supplementary modeling, leaving the possibility of residual confounding. Fifth, although we added a weighted NHANES analysis as a sensitivity analysis, the main findings were derived from unweighted models because the primary aim was to assess cross-sectional associations rather than generate nationally representative estimates; accordingly, generalizability to the broader U.S. population should be interpreted with caution. Finally, serum biomarkers were measured at a single time point and may not reflect longer-term exposure patterns.

Overall, the present study provides supplementary biomarker-based evidence on the concurrent serum patterns of vitamins and carotenoids in relation to self-reported breast cancer status within a standardized survey framework. These findings enrich the existing literature by allowing comparison of multiple analytes in the same dataset, but should be interpreted cautiously given the cross-sectional design and the possibility of reverse causality.


Conclusions

In this cross-sectional analysis of women in NHANES 2017–2018, higher serum concentrations of alpha-cryptoxanthin, alpha-carotene, beta-cryptoxanthin, and lutein and zeaxanthin showed inverse associations with self-reported breast cancer status after covariate adjustment, among which alpha-cryptoxanthin showed the most consistent association. Restricted cubic spline analysis and sensitivity analyses provided additional support for selected findings, and supplementary machine learning analyses suggested that these biomarkers and covariates jointly carried discriminative information related to breast cancer status. However, given the cross-sectional design, self-reported outcome, potential reverse causality, residual confounding, and attenuation of some associations in the weighted analysis, these findings should be interpreted cautiously. Further prospective studies with clearer temporal information and more comprehensive covariate assessment are needed to clarify whether these serum patterns reflect etiologically relevant exposures, disease-related changes, or both.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the STROBE reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2276/rc

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2276/prf

Funding: None.

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2025-aw-2276/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Kim J, Harper A, McCormack V, et al. Global patterns and trends in breast cancer incidence and mortality across 185 countries. Nat Med 2025;31:1154-62.
  2. Gu H, Wang R, Beeraka NM, et al. Global burden and trends of breast cancer: GLOBOCAN 2022 estimates of incidence and mortality in 185 countries. Chin Med J (Engl) 2026;139:404-14.
  3. Forma A, Grunwald A, Zembala P, et al. Micronutrient Status and Breast Cancer: A Narrative Review. Int J Mol Sci 2024;25:4968.
  4. Zhang T, Yi X, Li J, et al. Vitamin E intake and multiple health outcomes: an umbrella review. Front Public Health 2023;11:1035674.
  5. Chen Z, Huang Y, Cao D, et al. Vitamin C Intake and Cancers: An Umbrella Review. Front Nutr 2021;8:812394.
  6. Didier AJ, Stiene J, Fang L, et al. Antioxidant and Anti-Tumor Effects of Dietary Vitamins A, C, and E. Antioxidants (Basel) 2023;12:632.
  7. Han X, Zhao R, Wang Y, et al. Dietary Vitamin A Intake and Circulating Vitamin A Concentrations and the Risk of Three Common Cancers in Women: A Meta-Analysis. Oxid Med Cell Longev 2022;2022:7686405.
  8. Sui J, Guo J, Pan D, et al. The Efficacy of Dietary Intake, Supplementation, and Blood Concentrations of Carotenoids in Cancer Prevention: Insights from an Umbrella Meta-Analysis. Foods 2024;13:1321.
  9. Flore G, Deledda A, Lombardo M, et al. Effects of Functional and Nutraceutical Foods in the Context of the Mediterranean Diet in Patients Diagnosed with Breast Cancer. Antioxidants (Basel) 2023;12:1845.
  10. Buro AW, Nguyen T, Abaskaron M, et al. Lifestyle interventions with dietary strategies after breast cancer diagnosis: a systematic review. Breast Cancer Res Treat 2024;206:1-18.
  11. Dehnavi MK, Ebrahimpour-Koujan S, Lotfi K, et al. The Association between Circulating Carotenoids and Risk of Breast Cancer: A Systematic Review and Dose-Response Meta-Analysis of Prospective Studies. Adv Nutr 2024;15:100135.
  12. Park SH, Lee J, Jung SY, et al. Association between dietary carotenoid intake and breast cancer risk: a case-control study among Korean women. Int J Food Sci Nutr 2024;75:496-508.
  13. Darouei B, Bohn T, Vahid F, et al. Dietary carotenoids and breast cancer risk: evidence from a large population-based incident case-control study. Nutr Metab (Lond) 2025;22:107.
  14. Pérez-Guerrero EE, Guillén-Medina MR, Márquez-Sandoval F, et al. Methodological and Statistical Considerations for Cross-Sectional, Case-Control, and Cohort Studies. J Clin Med 2024;13:4005.
  15. Johnson AF, Lamontagne N, Bhupathiraju SN, et al. Workshop summary: building an NHANES for the future. Am J Clin Nutr 2024;119:1075-81.
  16. Medina-García M, Baeza-Morales A, Martínez-Peinado P, et al. Carotenoids and Their Interaction with the Immune System. Antioxidants (Basel) 2025;14:1111.
  17. Perveen R, Suleria HA, Anjum FM, et al. Tomato (Solanum lycopersicum) Carotenoids and Lycopenes Chemistry; Metabolism, Absorption, Nutrition, and Allied Health Claims--A Comprehensive Review. Crit Rev Food Sci Nutr 2015;55:919-29.
  18. Baeza-Morales A, Medina-García M, Martínez-Peinado P, et al. The Antitumour Mechanisms of Carotenoids: A Comprehensive Review. Antioxidants (Basel) 2024;13:1060.
  19. Peng C, Zeleznik OA, Shutta KH, et al. A Metabolomics Analysis of Circulating Carotenoids and Breast Cancer Risk. Cancer Epidemiol Biomarkers Prev 2022;31:85-96.
  20. Fahey JW, Stephenson KK, Dinkova-Kostova AT, et al. Chlorophyll, chlorophyllin and related tetrapyrroles are significant inducers of mammalian phase 2 cytoprotective genes. Carcinogenesis 2005;26:1247-55.
  21. Ollberding NJ, Maskarinec G, Conroy SM, et al. Prediagnostic circulating carotenoid levels and the risk of non-Hodgkin lymphoma: the Multiethnic Cohort. Blood 2012;119:5817-23.
  22. Chang S, Erdman JW Jr, Clinton SK, et al. Relationship between plasma carotenoids and prostate cancer. Nutr Cancer 2005;53:127-34.
  23. Guo P, Yu J. Association of multiple serum minerals and vitamins with metabolic dysfunction-associated fatty liver disease in US adults: National Health and Nutrition Examination Survey 2017-2018. Front Nutr 2024;11:1335831.
  24. Tang L, Liu M, Mu J, et al. Association between circulating antioxidants and sleep disorders: comprehensive results from NHANES 2017-2018. Food Funct 2024;15:6657-72.
  25. Huang Q, Peng Z, Li S, et al. Association between carotenoids and the prevalence of chronic obstructive pulmonary disease in the United States. Heart Lung 2024;65:93-100.
  26. Walston J, Xue Q, Semba RD, et al. Serum antioxidants, inflammation, and total mortality in older women. Am J Epidemiol 2006;163:18-26.
  27. Goodman MT, Kiviat N, McDuffie K, et al. The association of plasma micronutrients with the risk of cervical dysplasia in Hawaii. Cancer Epidemiol Biomarkers Prev 1998;7:537-44.
  28. Sanlier N, Yildiz E, Ozler E. An Overview on the Effects of Some Carotenoids on Health: Lutein and Zeaxanthin. Curr Nutr Rep 2024;13:828-44.
  29. Kavalappa YP, Gopal SS, Ponesakki G. Lutein inhibits breast cancer cell growth by suppressing antioxidant and cell survival signals and induces apoptosis. J Cell Physiol 2021;236:1798-809.
  30. Eliassen AH, Hendrickson SJ, Brinton LA, et al. Circulating carotenoids and risk of breast cancer: pooled analysis of eight prospective studies. J Natl Cancer Inst 2012;104:1905-16.
  31. Hu F, Wang Yi B, Zhang W, et al. Carotenoids and breast cancer risk: a meta-analysis and meta-regression. Breast Cancer Res Treat 2012;131:239-53.
  32. Wang Y, Gapstur SM, Gaudet MM, et al. Plasma carotenoids and breast cancer risk in the Cancer Prevention Study II Nutrition Cohort. Cancer Causes Control 2015;26:1233-44.
  33. Ávila-Román J, García-Gil S, Rodríguez-Luna A, et al. Anti-Inflammatory and Anticancer Effects of Microalgal Carotenoids. Mar Drugs 2021;19:531.
  34. Pouchieu C, Galan P, Ducros V, et al. Plasma carotenoids and retinol and overall and breast cancer risk: a nested case-control study. Nutr Cancer 2014;66:980-8.
  35. Bae JM. Reinterpretation of the results of a pooled analysis of dietary carotenoid intake and breast cancer risk by using the interval collapsing method. Epidemiol Health 2016;38:e2016024.
  36. Huang JP, Zhang M, Holman CD, et al. Dietary carotenoids and risk of breast cancer in Chinese women. Asia Pac J Clin Nutr 2007;16:437-42.
  37. Lu MS, Fang YJ, Chen YM, et al. Higher intake of carotenoid is associated with a lower risk of colorectal cancer in Chinese adults: a case-control study. Eur J Nutr 2015;54:619-28.
  38. Schiff MA, Patterson RE, Baumgartner RN, et al. Serum carotenoids and risk of cervical intraepithelial neoplasia in Southwestern American Indian women. Cancer Epidemiol Biomarkers Prev 2001;10:1219-22.
Cite this article as: Hu B, Liang J. Association of serum vitamins and carotenoids with breast cancer status among adult women in NHANES 2017–2018: a cross-sectional study. Transl Cancer Res 2026;15(5):402. doi: 10.21037/tcr-2025-aw-2276