Safety net hospital risk model demonstrates stronger, population-specific applicability in characterizing lung cancer risk
Original Article


Adriana A. Rodriguez Alvarez1, Benjamin Crosby1, Sarah Singh2, Janice Weinberg3, Nicole Byrne1, Aniket Vazirani4, Kei Suzuki5,6

1Department of Clinical Research, Boston University Chobanian and Avedisian School of Medicine, Boston, MA, USA; 2Department of Surgery, University of California Davis, Sacramento, CA, USA; 3Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; 4Department of Surgery, Boston Medical Center, Boston, MA, USA; 5Department of Thoracic Surgery, Inova Fairfax Medical Campus, Falls Church, VA, USA; 6Clinical Administration, Inova Fairfax Medical Campus, Falls Church, VA, USA

Contributions: (I) Conception and design: J Weinberg, N Byrne, AA Rodriguez Alvarez; (II) Administrative support: K Suzuki, A Vazirani, N Byrne; (III) Provision of study materials or patients: K Suzuki, B Crosby, S Singh; (IV) Collection and assembly of data: B Crosby, S Singh; (V) Data analysis and interpretation: N Byrne, J Weinberg, AA Rodriguez Alvarez; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Kei Suzuki, MD, MS. Department of Thoracic Surgery, Inova Fairfax Medical Campus, Falls Church, VA, USA; Clinical Administration, Inova Fairfax Medical Campus, 8095 Innovation Park Dr., Building B, 3rd Floor, Falls Church, VA 22031, USA. Email: Kei.suzuki@inova.org.

Background: Determining lung cancer (LC) risk using personalized risk stratification may improve screening effectiveness. While the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) is a well-established stratification model for LC screening, it was derived from a predominantly Caucasian population and its effectiveness in a safety net hospital (SNH) population is unknown. We have developed a model more tailored to the SNH population and compared its performance to the PLCO model in a SNH setting.

Methods: A retrospective dataset was compiled from patients screened for LC at an SNH from 2015 to 2019. Descriptive statistics were calculated using the following variables: age, sex, race, education, body mass index (BMI), smoking history, personal cancer history, family LC history, chronic obstructive pulmonary disease (COPD), and emphysema. Variable distributions were compared using t- and chi-square tests. LC risk scores were calculated using the SNH and PLCO models and categorized as low (scores <0.65%), moderate (0.65–1.49%), and high (>1.5%). Linear regression was applied to evaluate the relationship between the models and covariates.

Results: Of 896 individuals, 38 were diagnosed with LC. The data reflected the SNH patient demographics: patients were predominantly African American (53.5%), current smokers (69.9%), and had emphysema (70.1%). Among the non-LC cohort, the SNH model most frequently categorized patients as low risk, while the PLCO model most frequently classified patients as moderate risk. Among the LC cohort, there was no significant difference between the models in mean scores or risk stratification. The SNH model showed 92.1% sensitivity and 96.8% specificity, while the PLCO model showed 89.4% sensitivity and 26.1% specificity. Emphysema demonstrated a strong association in the SNH model (P<0.001), while race showed no association.

Conclusions: The SNH model demonstrated greater specificity for characterizing LC risk in an SNH population. The results demonstrate the importance of study sample representation when identifying risk factors in a stratification model.

Keywords: Diversity population; emphysema; low-dose computed tomography (LDCT); Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial model (PLCO model); safety net hospital model (SNH model)


Submitted Dec 15, 2023. Accepted for publication Mar 14, 2024. Published online Apr 12, 2024.

doi: 10.21037/tcr-23-2304


Highlight box

Key findings

• Study sample representation is important when identifying risk factors within a stratification model, and it affects the generalizability of that model.

What is known and what is new?

• Although the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) serves as a reputable stratification model for lung cancer (LC) screening, it was originally developed using data primarily from a Caucasian demographic.

• The applicability and effectiveness of the PLCO model within a safety net hospital (SNH) population remain uncertain. The SNH model demonstrated greater specificity for characterizing LC risk in an SNH population.

What is the implication, and what should change now?

• This study points to the influence of study sample representation when identifying risk factors, and the methodology used to derive the SNH model may offer guidance for other SNHs to improve their own LC risk prediction accuracy. Future studies should assess these models in a larger sample of patients with LC drawn from a wider and geographically different population.


Introduction

Lung cancer (LC) is the leading cause of cancer deaths globally (1). In the United States (US), LC disproportionately affects African American individuals, who experience higher incidence, diagnosis at more advanced stages, and lower survival rates (2). The National Lung Screening Trial (NLST) showed that low-dose computed tomography (LDCT) as a screening tool for LC significantly improved detection and survival rates (2-5). However, the study cohort consisted of 91% White and 4.5% African American participants (2). In addition, under current screening criteria, only 50% of those who will develop LC are eligible for LDCT monitoring (5). The United States Preventive Services Task Force (USPSTF) offers recommendations for identifying individuals at risk of LC and advocates LDCT screening for them. Its criteria for LC screening appear to disproportionately favor White and male individuals (2,6,7). Neglecting to consider racial and sex differences in LC risk can lead to inadequate screening for females and for minorities such as African Americans (2,6,7). A systematic review in 2012 revealed that approximately 20% of individuals undergoing each screening round had positive results requiring follow-up, and only 1% of the study population had LC (8). This highlights the potential need for additional markers of LC risk to improve eligibility criteria.

Furthermore, other challenges associated with LDCT LC screening include its high costs, patient stress associated with false-positive results, and excessive radiation exposure among patients with minimal LC risk. Refining screening specificity may help to reduce these concerns (1,8). Accurate LC risk prediction models would therefore be more cost-effective than NLST-like criteria and more sensitive in identifying high-risk individuals who go on to develop LC, so that fewer tests would be required to prevent LC mortality (1,9-12).

The Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial m2012 (PLCOm2012) model is a well-established risk prediction model that estimates 6-year LC risk using various risk factors; it showed promise after demonstrating an increase in sensitivity, specificity, and positive predictive value for risk characterization (3,11). In 2013, PLCO risk factors were expanded using data from the NLST to include race/ethnicity, educational attainment, body mass index (BMI), and history of emphysema (10,13). Although this expansion of LC risk factors improved specificity in detecting LC, its applicability may be limited because NLST patient demographics differ significantly from those of safety net hospitals (SNHs) across characteristics such as race/ethnicity, educational attainment, BMI, and incidence of chronic diseases such as emphysema. A retrospective study in 2022 showed that PLCO has greater sensitivity than the USPSTF criteria in predicting LC among African American individuals, women, and men. However, that study was conducted in populations already diagnosed with LC (7). Further research is needed to enhance sensitivity in identifying individuals at higher risk of LC for effective LC screening.

Although several LC risk prediction models have been created using nationally representative study cohorts, external validation and direct comparisons between models have been limited due to a lack of data or methodological restrictions (10,13-21). As such, it remains unclear how these preexisting models perform in diverse safety-net institutions or the general US population (13). In anticipation of limitations in applying the PLCO model to SNH patients, a tailored LC risk model was developed and recently published (SNH model) (13). That study determined the strongest predictors for Lung Imaging Reporting and Data System (Lung-RADS) category 1, Lung-RADS category 4, and LC within a diverse and underrepresented screening population. Additionally, it supported the inclusion of additional risk factors like COPD in screening criteria and advocated for expanding the USPSTF guidelines to include younger patients with fewer pack-years (13).

The current study aims to compare the performance of the PLCO and SNH models among the safety-net patients at our institution, both in terms of overall detection and the influence of specific patient risk factors, with the objective of assessing the benefits and limitations of generalizing risk models across hospital populations with differing patient demographics. We present this article in accordance with the STROBE reporting checklist (available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-2304/rc).


Methods

Study design and subject population

This retrospective cross-sectional study collected and examined de-identified data from the Boston Medical Center (BMC) database of patients who received LC screening with LDCT between 2015 and 2019. We used this database to apply both the PLCO and SNH models. The SNH model has been implemented exclusively within the BMC safety-net patient sample, while the PLCO model has been used throughout the literature. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Institutional Review Board (IRB), Integrated Network for Subject Protection in Research II (INSPIR II) (H-35216, approved August 3rd, 2021), and individual consent for this retrospective analysis was waived.

PLCOm2012 risk assessment was calculated using the following variables: age, race/ethnicity, BMI, smoking status, smoking history, history of chronic obstructive pulmonary disease (COPD), emphysema/chronic bronchitis, personal history of cancer, family history of LC, and the highest degree of educational attainment (10,13). Through previous work using the same population, we established the following variables for the SNH model: age, history of COPD, history of emphysema and its severity (mild/moderate/severe), family history of LC, and pack-year history (13). Diagnosis of emphysema was confirmed from chest computed tomography (CT) scan readouts.

Defining risk classification for the probability models (SNH and PLCO models)

The SNH and PLCO models were applied to the BMC dataset to obtain an overall risk score for each patient per model. Patients were then stratified into the following risk classifications: low risk was defined as scores <0.65%, moderate risk as 0.65–1.49%, and high risk as >1.5% (11). This risk classification breakdown mirrors parameters utilized by PLCO authorship, as specifically referenced from the PLCOm2012 Lung Cancer Risk Calculator (LCRC) app (11,22). This score yielded the probability of LC in 6 years (11,13).
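The stratification step described above can be sketched in a few lines. This is a minimal illustration rather than the study's code: the function name `classify_risk` is hypothetical, and because the stated cut-offs leave the interval between 1.49% and 1.5% unassigned, the sketch assumes that scores from 0.65% up to (but not including) 1.5% are moderate and that scores of 1.5% or more are high.

```python
def classify_risk(score_pct: float) -> str:
    """Map a 6-year LC risk score (in percent) to a risk class.

    Assumed boundaries (not stated explicitly in the text):
      low      : score < 0.65
      moderate : 0.65 <= score < 1.5
      high     : score >= 1.5
    """
    if score_pct < 0.65:
        return "low"
    if score_pct < 1.5:
        return "moderate"
    return "high"


# Hypothetical scores, for illustration only (not study data).
example_scores = [0.12, 0.65, 1.49, 7.2]
example_classes = [classify_risk(s) for s in example_scores]
```

Applying the same function to the scores from both models keeps the comparison between SNH and PLCO on identical cut-offs.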

Statistical analysis

The study cohort was stratified by whether patients were diagnosed with LC. Descriptive analysis used demographic variables and covariates to examine the patient makeup of the sample. Student t-test and Pearson chi-square test were used to compare the distributions for continuous and categorical variables.

The primary objective of the study was to compare the performance of the PLCO and the SNH models. This was accomplished by evaluating the models’ scores within each group (LC and non-LC) using descriptive statistics. Student t-test was run on the mean scores to determine whether the models differed in each group. After the scores were calculated, risk was classified as low, moderate, or high. The risk classification was described using counts (total number of scores in each risk class) and their percentages within each model for both groups.

Sensitivity, specificity, and positive and negative predictive values were calculated from the risk classifications to determine which model was best tailored to the population under review. Sensitivity was defined as the probability of being in the moderate or high-risk group among patients with LC. Similarly, specificity was defined as the probability of being in the low-risk group among patients without LC. Positive predictive value was defined as the probability that a patient with a moderate or high-risk classification actually has LC, while negative predictive value was defined as the probability that a patient with a low-risk classification actually does not have LC.
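These four definitions can be written compactly. As a notational sketch (the symbols TP, FN, TN, and FP are introduced here for illustration and are not used in the original analysis), let TP and FN be the numbers of patients with LC classified as moderate/high risk and as low risk, respectively, and let FP and TN be the numbers of patients without LC classified as moderate/high risk and as low risk, respectively:

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP},\\[4pt]
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}.
\end{align*}
```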

Univariate linear and logistic regression were conducted to evaluate the association between the difference in SNH and PLCO scores and the individual covariates. Multivariate regression was applied to control for confounders (age, sex, race, BMI, education, emphysema, COPD, personal history of cancer, family history of LC, smoking status, and pack-years). In this analysis, the coefficient indicated whether the difference between the models (SNH and PLCO) increased or decreased with the covariate. A positive coefficient was defined as an increase in the difference between the models. Hence, a variable with a positive coefficient showed an association in the SNH model but not in the PLCO model. On the other hand, a negative coefficient was defined as a decrease in the difference; therefore, variables with negative coefficients had a similar influence on predicted LC risk in both models. For all the analyses, results were considered statistically significant if P<0.05. All statistical analyses were performed using RStudio version 1.3.1093 (RStudio, PBC).
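To make the interpretation of the coefficient concrete, the following sketch regresses the per-patient score difference (SNH minus PLCO) on a single covariate using ordinary least squares. It is only an illustration under stated assumptions: the data are invented, the 0–3 coding of emphysema severity is assumed, the helper `simple_ols` is hypothetical, and the sketch covers only the univariate linear case, not the logistic or multivariate analyses.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical per-patient values, NOT study data.
emphysema = [0, 1, 1, 2, 3]  # assumed coding: 0 = none ... 3 = severe
snh_score = [0.01, 0.04, 0.05, 0.09, 0.14]
plco_score = [0.02, 0.03, 0.03, 0.04, 0.05]

# Outcome of the regression: difference between the two models' scores.
diff = [s - p for s, p in zip(snh_score, plco_score)]
intercept, slope = simple_ols(emphysema, diff)

# A positive slope means the SNH-PLCO difference grows with the covariate;
# a negative slope means the two models' scores move closer together.
difference_increases = slope > 0
```

In this invented example the slope is positive, which under the definitions above would be read as the covariate carrying more weight in the SNH model than in the PLCO model.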


Results

Study cohort

In total, 3,055 individuals with LDCT were identified from the BMC database. Of these, 896 had complete data regarding age, sex, race, emphysema, COPD, pack years, and smoking status, and were included in the analysis. LC was diagnosed in 38 of these participants.

BMC patients were most commonly male (59.0%), high school graduates or holders of a general education development (GED) credential (42.9%), African American (53.5%), overweight (mean BMI, 28.7 kg/m2), and current everyday smokers (59.2%) with mild emphysema (58.6%). In addition, 54.1% of the patients had COPD, 17.5% had a personal history of cancer, and 12.7% had a family history of LC; the mean smoking history was 31.4 pack-years (Table 1).

Table 1

Demographic characteristics of BMC population

| Demographic variables | Overall (n=896) | LC diagnosis (n=38) | Non-LC (n=858) | P |
| --- | --- | --- | --- | --- |
| Age (years) | 63.3 (6.3) | 66.6 (6.5) | 63.2 (6.3) | 0.003* |
| Sex | | | | 0.15 |
|    Female | 367 (41.0) | 20 (52.6) | 347 (40.4) | |
|    Male | 529 (59.0) | 18 (47.4) | 511 (59.6) | |
| Education | | | | 0.24 |
|    Did not attend school | 26 (2.9) | 5 (13.2) | 21 (2.4) | |
|    8th grade or less | 43 (4.8) | | 43 (5.0) | |
|    Some high school | 250 (27.9) | 9 (23.7) | 241 (28.1) | |
|    Graduated high school/GED | 384 (42.9) | 17 (44.7) | 367 (42.8) | |
|    Some college/vocational technical education | 101 (11.3) | 4 (10.5) | 97 (11.3) | |
|    Graduated college/post-graduated | 88 (9.8) | 3 (7.9) | 85 (9.9) | |
|    Other | 4 (0.4) | | 4 (0.5) | |
| Race | | | | 0.50 |
|    White | 374 (41.7) | 15 (39.5) | 359 (41.8) | |
|    Black/African American | 479 (53.5) | 19 (50.0) | 460 (53.6) | |
|    Hispanic or Latino | 20 (2.2) | 3 (7.9) | 17 (2.0) | |
|    Asian | 22 (2.5) | 1 (2.6) | 21 (2.4) | |
|    Middle Eastern | 1 (0.1) | | 1 (0.1) | |
| BMI mean (kg/m2) | 28.7 (6.7) | 28.47 (5.3) | 28.66 (6.8) | 0.80 |
| Smoking status | | | | 0.16 |
|    Never smoker | 35 (3.9) | | 35 (4.1) | |
|    Former smoker | 235 (26.2) | 9 (23.7) | 226 (26.3) | |
|    Current some day smoker | 96 (10.7) | 3 (7.9) | 93 (10.8) | |
|    Current every day smoker | 530 (59.2) | 26 (68.4) | 504 (58.7) | |
| Pack years | 31.4 (24.0) | 44.35 (28.6) | 30.6 (24.5) | 0.005* |
| Emphysema | | | | 0.013* |
|    None | 268 (29.9) | 3 (7.9) | 265 (30.9) | |
|    Mild | 525 (58.6) | 30 (78.9) | 495 (57.7) | |
|    Moderate | 61 (6.8) | 1 (2.6) | 60 (7.0) | |
|    Severe | 42 (4.7) | 4 (10.5) | 38 (4.4) | |
| COPD | 485 (54.1) | 30 (78.9) | 455 (53.0) | <0.001* |
| Personal history of cancer | 157 (17.5) | 7 (18.4) | 150 (17.5) | 0.88 |
| Family history of LC | 114 (12.7) | 8 (21.1) | 106 (12.4) | 0.20 |
| Lung-RADS categories | | | | <0.001* |
|    Category 1 | 152 (17.0) | 0 (0.0) | 152 (17.7) | |
|    Category 2 | 574 (64.1) | 9 (23.7) | 565 (65.9) | |
|    Category 3 | 102 (11.4) | 4 (10.5) | 98 (11.4) | |
|    Category 4 | 66 (7.4) | 25 (65.8) | 41 (4.8) | |

Data are reported as n (%) for categorical variables and mean (SD) for continuous variables; percentages may not add up to 100 due to rounding. *, P value is statistically significant if P<0.05. The percentages within the non-LC group may not sum to 100% because two subjects had no recorded Lung-RADS category. Lung-RADS Categories 4A, 4B, and 4X have been consolidated into a single category (Category 4) to streamline the analysis. BMC, Boston Medical Center; LC, lung cancer; GED, general education development; BMI, body mass index; COPD, chronic obstructive pulmonary disease; SD, standard deviation; Lung-RADS, Lung Imaging Reporting and Data System.

Patients in the LC group were predominantly female (52.6%), while those in the non-LC group were predominantly male (59.6%). Additionally, LC subjects were older (mean age 66.6 vs. 63.2 years, P=0.003), had more pack-years (P=0.005), had more severe emphysema (P=0.013), and more often had a diagnosis of COPD (P<0.001) relative to the non-LC group. There was no difference in education (P=0.24), race (P=0.50), BMI (P=0.80), smoking status (P=0.16), personal history of cancer (P=0.88), or family history of LC (P=0.20) between the LC and non-LC groups. Within the LC group, Lung-RADS Category 4 predominated. Conversely, within the non-LC group, Lung-RADS Category 2 was the most prevalent, followed by Category 1 (P<0.001) (Table 1).

Probability models (PLCO vs. SNH)

The SNH model had a broader score range than the PLCO model. Those with LC had a higher mean score (7.2%) compared to the non-LC group (0.02%) in the SNH model. Between models, SNH yielded a higher mean score among LC patients (SNH 7.2% vs. PLCO 6.6%, P=0.82) and a lower mean score among non-LC patients (SNH 0.02% vs. PLCO 0.03%, P<0.001) (Table 2).

Table 2

LC vs. non-LC risk score comparison between probability models

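| Variables | LC: PLCO model | LC: SNH model | LC: P | Non-LC: PLCO model | Non-LC: SNH model | Non-LC: P |
| --- | --- | --- | --- | --- | --- | --- |
| Risk score | | | | | | |
|    Minimum (%) | 0.13 | 0.12 | | 0 | 0.06 | |
|    Maximum (%) | 22 | 75 | | 39 | 40 | |
|    Mean score (SD) (%) | 6.6 (0.06) | 7.2 (0.15) | 0.82 | 0.03 (0.04) | 0.02 (0.02) | <0.001* |
| Risk classification count | | | 0.19 | | | <0.001* |
|    Low risk | 4 (10.50) | 3 (7.80) | | 224 (26.10) | 831 (96.00) | |
|    Moderate risk | 29 (76.31) | 33 (86.80) | | 611 (71.21) | 25 (2.90) | |
|    High risk | 5 (13.15) | 2 (5.20) | | 23 (2.60) | 2 (0.23) | |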

Risk score is reported as the minimum and maximum score, in percent, for each LC probability model and group (LC and non-LC). Additionally, the mean score and standard deviation of each LC probability model are reported. Risk classification is reported as number and percentage per probability model and group. *, P value is statistically significant if P<0.05. LC, lung cancer; PLCO, Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial; SNH, safety net hospital; SD, standard deviation.

Both probability models most frequently classified patients with LC as moderate risk. Among patients without LC, the SNH model yielded low risk as the most common risk classification (96.00%), whereas the majority of non-LC patients were categorized as moderate risk by the PLCO model (71.21%). In addition, there was a statistically significant difference between the models in the risk classification (P<0.001) (Table 2). The SNH model had a specificity of 96.8%, a sensitivity of 92.1%, a positive predictive value of 56%, and a negative predictive value of 99.6%. The PLCO model had a specificity of 26.1%, a sensitivity of 89.4%, a positive predictive value of 5%, and a negative predictive value of 98%.
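The performance figures above follow directly from the risk classification counts in Table 2. The short sketch below recomputes them, treating a moderate or high-risk classification as a positive result, as defined in the Methods. The function and variable names are illustrative assumptions rather than the study's code; the counts are those listed in Table 2.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts from Table 2, ordered as (low, moderate, high).
table2_counts = {
    "SNH": {"LC": (3, 33, 2), "non-LC": (831, 25, 2)},
    "PLCO": {"LC": (4, 29, 5), "non-LC": (224, 611, 23)},
}


def metrics_for(model: str) -> dict:
    """Treat moderate/high risk as test-positive and low risk as test-negative."""
    lc_low, lc_mod, lc_high = table2_counts[model]["LC"]
    non_low, non_mod, non_high = table2_counts[model]["non-LC"]
    return screening_metrics(
        tp=lc_mod + lc_high,    # LC patients flagged moderate/high
        fn=lc_low,              # LC patients flagged low
        tn=non_low,             # non-LC patients flagged low
        fp=non_mod + non_high,  # non-LC patients flagged moderate/high
    )


for model in ("SNH", "PLCO"):
    values = metrics_for(model)
    print(model, {name: f"{100 * v:.1f}%" for name, v in values.items()})
```

The recomputed values agree with the reported ones to within about 0.1 percentage point, which is consistent with rounding.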

Regression analysis summary of the difference between models (SNH-PLCO)

In the LC group, age, sex, race, BMI, and emphysema had positive coefficients; however, only emphysema was statistically significant (P<0.001) (Table 3). After adjusting for all variables, emphysema remained statistically significant (P<0.001) (Table 4). On the other hand, the presence of COPD, a personal history of cancer, a family history of LC, and greater smoking pack-years were associated with a reduction in the difference between the two models (Table 3). Of these, only pack-years reached statistical significance (P=0.01), indicating that the two models yielded more similar LC predictions when the individual had a high pack-year history. After adjusting for all variables, pack-years was no longer statistically significant (P=0.38).

Table 3

Univariate regression analysis of the difference between models (SNH-PLCO) and the variables

| Variables | LC: Coefficient (B) | LC: Std. error | LC: P | Non-LC: Coefficient (B) | Non-LC: Std. error | Non-LC: P |
| --- | --- | --- | --- | --- | --- | --- |
| Age | 0.0011 | 0.0043 | 0.81 | −0.0021 | 0.00018 | <0.001* |
| Sex (male vs. female) | 0.011 | 0.056 | 0.85 | −0.0022 | 0.0025 | 0.38 |
| Race (Asian vs. other races) | 0.028 | 0.039 | 0.47 | 0.0012 | 0.0018 | 0.51 |
| BMI | 0.0082 | 0.0051 | 0.12 | 0.00084 | 0.00018 | <0.001* |
| Education | 0.02 | 0.021 | 0.35 | 0.0022 | 0.001 | 0.03* |
| Emphysema | 0.15 | 0.03 | <0.001* | −0.00023 | 0.0017 | 0.89 |
| COPD | −0.076 | 0.067 | 0.27 | 0.00086 | 0.0024 | 0.72 |
| Personal history of cancer | −0.095 | 0.07 | 0.18 | −0.02 | 0.0031 | <0.001* |
| Family history of LC | −0.065 | 0.067 | 0.34 | 0.01 | 0.0037 | 0.006* |
| Smoking status | 0.037 | 0.032 | 0.26 | −0.0026 | 0.0012 | 0.03* |
| Pack years | −0.0022 | 0.00091 | 0.01* | −0.00005 | <0.0001 | <0.001* |
| Quit time years | −0.00016 | 0.0043 | 0.98 | 0.00027 | 0.00029 | 0.35 |

A positive coefficient was defined as the increase in difference between the models. Hence, the variable that obtained a positive coefficient demonstrated the association it had in the SNH model, but not with the PLCO model. On the other hand, a negative coefficient was defined as the decrease in difference; therefore, variables with these results had the same association of influence to predict the risk of LC within the models. *, P value is statistically significant if P<0.05. SNH, safety net hospital; PLCO, Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial; LC, lung cancer; std., standard; BMI, body mass index; COPD, chronic obstructive pulmonary disease.

Table 4

Multivariate regression analysis summary of the difference between models (SNH-PLCO) and the variables

| Variables | LC: Coefficient (B) | LC: Std. error | LC: P | Non-LC: Coefficient (B) | Non-LC: Std. error | Non-LC: P |
| --- | --- | --- | --- | --- | --- | --- |
| Age | 0.00018 | 0.0039 | 0.96 | −0.002 | 0.00018 | <0.001* |
| Sex | −0.034 | 0.05 | 0.51 | −0.0014 | 0.0022 | 0.54 |
| Race | 0.038 | 0.039 | 0.34 | 0.0023 | 0.0017 | 0.17 |
| BMI | 0.0072 | 0.0048 | 0.15 | 0.00065 | 0.00016 | <0.001* |
| Education | 0.028 | 0.019 | 0.14 | 0.0013 | 0.00091 | 0.17 |
| Emphysema | 0.15 | 0.033 | <0.001* | 0.0025 | 0.0015 | 0.10 |
| COPD | −0.015 | 0.066 | 0.82 | 0.0037 | 0.0023 | 0.10 |
| Personal history of cancer | −0.039 | 0.059 | 0.52 | −0.017 | 0.0029 | <0.001* |
| Family history of LC | −0.02 | 0.059 | 0.73 | 0.079 | 0.0034 | 0.001* |
| Smoking status | 0.018 | 0.027 | 0.52 | −0.0044 | 0.0011 | <0.001* |
| Pack years | −0.0008 | 0.0009 | 0.38 | −6.058e−04 | <0.0001 | <0.001* |

A positive coefficient was defined as the increase in difference between the models. Hence, the variable that obtained a positive coefficient demonstrated the association it had in the SNH model, but not with the PLCO model. On the other hand, a negative coefficient was defined as the decrease in difference; therefore, variables with these results had the same association of influence to predict the risk of LC within the models. *, P value is statistically significant if P<0.05. SNH, safety net hospital; PLCO, Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial; LC, lung cancer; std., standard; BMI, body mass index; COPD, chronic obstructive pulmonary disease.

Among the non-LC cohort, several patient variables were associated with closer agreement between the SNH and PLCO risk scores (Table 3), with statistical significance observed for age (P<0.001), personal history of cancer (P<0.001), and pack-years (P<0.001). When controlling for all the variables, age (P<0.001), personal history of cancer (P<0.001), and pack-years (P<0.001) maintained their statistical significance. Given the negative coefficient values derived from univariate regression analysis of these three variables, these results indicate that both models performed similarly among patients of a younger age, without a personal history of cancer, and with fewer smoking pack-years. In contrast, the difference between models increased with high BMI (P<0.001), other education (P=0.03), and family history of LC (P=0.006) (Table 3). After controlling for all the variables, BMI (P<0.001) and family history of LC (P=0.001) continued to show statistical significance with respect to an associated difference between the models (Table 4). Thus, the SNH model was more accurate in predicting non-LC when the patient had a low BMI and no family history of LC.


Discussion

In this study, we compared the performance of the SNH LC risk prediction model to the well-established PLCO model in an SNH setting. The SNH model performed better than the PLCO model, particularly in its ability to assign low risk to non-LC patients. Although the overall results were not particularly revealing given that the SNH model was developed from the same study database used in this study, further analysis revealed that the covariables drove this difference in performance. A notable point of comparison is the marked difference in specificity between the SNH and PLCO models (96.8% and 26.1%, respectively), which reflects PLCO’s limited ability to identify low LC risk in a safety net population. This difference in specificity is also reflected in the risk classification breakdown among patients without LC: SNH identified 96% of patients as low risk and a combined 3.1% as moderate and high risk, compared to 26.1% as low risk and a combined 73.8% as moderate and high risk by PLCO.

Further analysis elucidated not only an association between demographic features of the SNH patient database and risk model variables but also the difference in SNH and PLCO model performances overall. Emphysema was significantly associated with risk assessment among those who were diagnosed with LC in the SNH model, and this same association was not observed in the PLCO model despite both risk models including emphysema as a risk factor. Emphysema is much more prevalent in the SNH patient database than in the NLST database that was used to derive the PLCO model (64.9% vs. 7.7%). It is important to note that this association was not seen among patients who were not diagnosed with LC and is therefore not a result of the higher rate of emphysema in the overall SNH database. This suggests a strong association between emphysema and LC. Various theories have been postulated to explain this association, including the hypothesis that inflammation and scarring from emphysema/COPD may increase the risk of LC onset (23). Another hypothesis is the potential presence of a shared risk factor affecting this geographical area that has contributed to both emphysema and LC risk. These hypotheses suggest that emphysema may be a surrogate marker of LC risk (23). Future research should further investigate emphysema and its role as a surrogate marker.

While emphysema was strongly associated with LC risk classification in the SNH model, patient race and ethnicity had no influence on this model. This observation is particularly interesting given that African American and American Indian race/ethnicity were deemed LC risk factors by the PLCO model. It is mirrored by a marked difference in racial and ethnic diversity between the NLST and SNH patient databases, with the NLST patient database comprising a predominantly Caucasian population (90.9% White and 4.5% Black patients), and the SNH patient database comprising a more diverse population (39.9% White and 38.2% Black patients) (13). As such, in a patient database with greater racial/ethnic diversity, the race/ethnicity of minority patient populations was not correlated with LC risk. These observations come at a time when the medical community is reckoning with the use of race/ethnicity as an approximation of health, as demonstrated by the American Medical Association’s 2020 policies committed to “ending the practice of using race as proxy for biology in medical education, research and clinical practice” (24), and instead prioritizing focus on the known health risk factors of racism and social determinants of health. As markers of disease risk are studied in medicine, adequate patient representation is critical to avoid an inflated risk assessment among populations that comprise a minority of study samples. Pasquinelli et al. also affirm the significance of employing broader prediction models in racially diverse populations to address disparities in LC screening and outcomes (2).

A strength of the study was the similarity in demographics (race, sex, and education) between the groups (LC and non-LC), which yielded a more comparable sample. Another highly important strength, as mentioned in the previous paragraph, was the diversity of the study population, which makes the results more generalizable.

A study limitation was the small sample size of the LC group. This was expected given that the target population was individuals screened with LDCT and given the significant amount of missing values in the overall dataset. Moreover, our small sample size is consistent with existing literature, which suggests that although a substantial portion of screened individuals have positive screening results, only a small percentage of the study population is diagnosed with LC (6). Although the LDCT screening rate is on the rise (Massachusetts 9.4–18.0%), the overall national rate remains low (6.0%) (4,25). Future studies should focus on increasing the study sample size to strengthen study power and to better understand LC risk probability models among communities with varying patient demographics and environmental risk factors. Another limitation was the use of the same study population that was used to develop the SNH model, leading to selection bias. This was because the database was restricted to a single hospital in Massachusetts. However, the SNH model demonstrated the importance of developing a tailored LC risk probability model to identify high-risk individuals in a diverse population. Lastly, patients in the non-LC group who remained undiagnosed during the study period are still susceptible to developing malignancies in the future. It is also worth considering that the study duration and follow-up time might not fully capture the possibility of transitioning to the LC group over time. Further studies could apply this model in a different population to observe its accuracy.


Conclusions

The SNH model showed greater specificity in predicting LC risk among the SNH population than the PLCO model. SNH performance was particularly enhanced by consideration of emphysema severity, while PLCO usage of race/ethnicity as an LC risk factor did not strengthen its risk characterization. Therefore, emphysema should be considered an important risk factor in probability risk models for LC in the safety net population. However, the models’ mean probability scores in the LC group did not differ statistically from each other. The results from this study point to the influence of study sample representation when identifying risk factors, and the methodology used to derive the SNH model may offer guidance for other SNHs to improve their own LC risk prediction accuracy. Future studies should assess these models in a larger sample of patients with LC drawn from a wider and geographically different population.


Acknowledgments

Meeting presentation: abstract oral presentation at CHEST 2022 on October 16–19, 2022, Nashville, TN, USA.

Funding: None.


Footnote

Reporting Checklist: The authors have completed the STROBE reporting checklist. Available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-2304/rc

Data Sharing Statement: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-2304/dss

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-2304/prf

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-2304/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Institutional Review Board (IRB), Integrated Network for Subject Protection in Research II (INSPIR II) (H-35216, approved August 3rd, 2021) and individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Tammemägi MC, Ten Haaf K, Toumazis I, et al. Development and Validation of a Multivariable Lung Cancer Risk Prediction Model That Includes Low-Dose Computed Tomography Screening Results: A Secondary Analysis of Data From the National Lung Screening Trial. JAMA Netw Open 2019;2:e190204.
  2. Pasquinelli MM, Tammemägi MC, Kovitz KL, et al. Risk Prediction Model Versus United States Preventive Services Task Force Lung Cancer Screening Eligibility Criteria: Reducing Race Disparities. J Thorac Oncol 2020;15:1738-47.
  3. Lebrett MB, Balata H, Evison M, et al. Analysis of lung cancer risk model (PLCO(M2012) and LLP(v2)) performance in a community-based lung cancer screening programme. Thorax 2020;75:661-8.
  4. Richards TB, Soman A, Thomas CC, et al. Screening for Lung Cancer - 10 States, 2017. MMWR Morb Mortal Wkly Rep 2020;69:201-6.
  5. National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395-409.
  6. Pasquinelli MM, Tammemägi MC, Kovitz KL, et al. Addressing Sex Disparities in Lung Cancer Screening Eligibility: USPSTF vs PLCOm2012 Criteria. Chest 2022;161:248-56.
  7. Williams RM, Kareff SA, Sackstein P, et al. Race & sex disparities related to low-dose computed tomography lung cancer screening eligibility criteria: A lung cancer cases review. Lung Cancer 2022;169:55-60.
  8. Bach PB, Mirkin JN, Oliver TK, et al. Benefits and harms of CT screening for lung cancer: a systematic review. JAMA 2012;307:2418-29.
  9. Kovalchik SA, Tammemagi M, Berg CD, et al. Targeting of low-dose CT screening according to the risk of lung-cancer death. N Engl J Med 2013;369:245-54.
  10. Tammemägi MC, Katki HA, Hocking WG, et al. Selection criteria for lung-cancer screening. N Engl J Med 2013;368:728-36.
  11. Tammemägi MC, Church TR, Hocking WG, et al. Evaluation of the lung cancer risks at which to screen ever- and never-smokers: screening rules applied to the PLCO and NLST cohorts. PLoS Med 2014;11:e1001764.
  12. Cressman S, Peacock SJ, Tammemägi MC, et al. The Cost-Effectiveness of High-Risk Lung Cancer Screening and Drivers of Program Efficiency. J Thorac Oncol 2017;12:1210-22.
  13. Singh S, Pavesi F, Steiling K, et al. Risk Factors for Lung Cancer in an Underrepresented Safety-Net Screening Cohort. Clin Lung Cancer 2022;23:e165-70.
  14. Ten Haaf K, Jeon J, Tammemägi MC, et al. Risk prediction models for selection of lung cancer screening candidates: A retrospective validation study. PLoS Med 2017;14:e1002277.
  15. Collins GS, Moons KG. Comparing risk prediction models. BMJ 2012;344:e3186.
  16. Altman DG, Vergouwe Y, Royston P, et al. Prognosis and prognostic research: validating a prognostic model. BMJ 2009;338:b605.
  17. Siontis GC, Tzoulaki I, Siontis KC, et al. Comparisons of established risk prediction models for cardiovascular disease: systematic review. BMJ 2012;344:e3318.
  18. Vergouwe Y, Steyerberg EW, Eijkemans MJ, et al. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol 2005;58:475-83.
  19. Collins GS, Ogundimu EO, Altman DG. Sample size considerations for the external validation of a multivariable prognostic model: a resampling study. Stat Med 2016;35:214-26.
  20. D'Amelio AM Jr, Cassidy A, Asomaning K, et al. Comparison of discriminatory power and accuracy of three lung cancer risk models. Br J Cancer 2010;103:423-9.
  21. Li K, Hüsing A, Sookthai D, et al. Selecting High-Risk Individuals for Lung Cancer Screening: A Prospective Evaluation of Existing Risk Models and Eligibility Criteria in the German EPIC Cohort. Cancer Prev Res (Phila) 2015;8:777-85.
  22. Tammemägi MC, Borondy-Kitts A. PLCOm2012 LCRC. Available online: https://apps.apple.com/us/app/plcom2012-lcrc/id1553760241
  23. Brenner DR, McLaughlin JR, Hung RJ. Previous lung diseases and lung cancer risk: a systematic review and meta-analysis. PLoS One 2011;6:e17479.
  24. American Medical Association. The AMA’s strategic plan to embed racial justice and advance health equity. 2022. Accessed September 29, 2022. Available online: https://www.ama-assn.org/about/leadership/ama-s-strategic-plan-embed-racial-justice-and-advance-health-equity?gclid=CjwKCAjw4c-ZBhAEEiwAZ105ReaJ9Oczz22kwvKrkZrpTl4uyWKDOo2b1Wrrb6Ecmen5h9PiXqy4phoC3REQAvD_BwE
  25. Lung.org. Massachusetts Ranks Among Best States for Screening, Early-Stage Diagnosis, and Surgery for Lung Cancer According to New Report. 2022. Accessed 30 March 2022. Available online: https://www.lung.org/media/press-releases/solc-ma-2022
Cite this article as: Rodriguez Alvarez AA, Crosby B, Singh S, Weinberg J, Byrne N, Vazirani A, Suzuki K. Safety net hospital risk model demonstrates stronger, population-specific applicability in characterizing lung cancer risk. Transl Cancer Res 2024;13(4):1596-1605. doi: 10.21037/tcr-23-2304
