Translation is not exportation: why the European Prospective Investigation into Cancer and Nutrition rewrites the 4Kscore story
Blood-based prostate cancer biomarkers are typically celebrated in a clinically meaningful, yet statistically forgiving, context: men already being considered for prostate biopsy. In that enriched setting, disease prevalence is higher, diagnostic verification is imminent, and the translational value is straightforward: reducing unnecessary biopsies while maintaining detection of clinically significant disease. The 4Kscore (OPKO Diagnostics), which integrates four kallikrein markers [total prostate-specific antigen (PSA), free PSA, intact PSA, and human kallikrein-related peptidase 2 (hK2)] with age and clinical variables, was developed explicitly for this short-horizon decision problem. Used as intended, as a reflex test following an elevated PSA result, it addresses a practical question: not “who might develop prostate cancer someday?” but “who requires a biopsy now?” In biopsy-referral cohorts, its discriminatory performance has been strong; a large meta-analysis reported an area under the receiver operating characteristic curve (AUC) of approximately 0.81 for high-grade prostate cancer (1).
Against this backdrop, the analysis by Smith-Byrne et al. within the European Prospective Investigation into Cancer and Nutrition (EPIC) represents more than another validation study (2). It is a deliberate stress test of mission creep, an exercise that is both uncomfortable and highly valuable for translation, and its significance lies in compelling the field to confront that creep with empirical evidence rather than optimism.
EPIC addresses a more demanding question than most commercial algorithms were designed to answer: what happens when a biopsy-optimized model is moved upstream into an unselected population and repurposed for long-horizon prediction of aggressive outcomes years before diagnosis, under heterogeneous screening and referral pathways? Using baseline blood samples from 1,658 prostate cancer cases and 1,658 matched controls across five European countries, with a median interval of 8.6 years between blood draw and diagnosis, the investigators compared the performance of the 4Kscore with that of a simpler model incorporating total PSA and age. The principal findings are notable for what they do not demonstrate. The 4Kscore did not outperform the simpler PSA-based model for predicting high-grade disease [AUC, 0.69 vs. 0.75; difference, −0.06; 95% confidence interval (CI): −0.09 to −0.03], advanced-stage disease (AUC, 0.71 vs. 0.77; difference, −0.06; 95% CI: −0.10 to −0.02), or aggressive prostate cancer, and it performed significantly worse for overall prostate cancer (AUC, 0.75 vs. 0.82; difference, −0.07; 95% CI: −0.09 to −0.06) (2). The reported intervals for the high-grade and advanced-stage differences also exclude zero, so on those endpoints the 4Kscore was not merely non-superior; its discrimination was lower. In a field long attracted to incremental “PSA-plus” advances, these results establish important boundaries. They challenge a tacit assumption that often drives mission creep in biomarker deployment: that success as a biopsy-referral triage tool necessarily translates into value as a population-level screening enhancer, or that biomarker performance is a stable property transferable across time horizons, populations, and clinical pathways. The EPIC findings suggest the opposite. Biomarker performance is an emergent property, contingent on who is tested, when testing occurs, how outcomes are verified, and which clinical decisions the test is intended to inform.
The most consequential translational interpretation, therefore, is not that the “4Kscore fails”, but that “transportability fails”. Underperformance in a new setting does not necessarily refute kallikrein biology; rather, it more plausibly reflects a mismatch between the biological signal and the mapping that translates that signal into actionable risk. Four-kallikrein models were tuned to estimate the probability of detecting Gleason score ≥7 prostate cancer at biopsy among men with elevated PSA and clinical suspicion, an endpoint defined within a specific and relatively constrained verification regime. EPIC simultaneously alters the decision problem along three dimensions. First, the target shifts from biopsy-detected disease to long-horizon risk of aggressive, advanced-stage, or fatal outcomes. Second, the population shifts from biopsy-referred men to broadly healthy individuals at baseline, many of whom fall within a low-PSA regime (median PSA among controls approximately 0.81 ng/mL). Third, outcome verification becomes heterogeneous across countries and over time, shaped by variation in screening intensity, access to care, clinician thresholds, evolving guideline recommendations, and the gradual adoption of magnetic resonance imaging (MRI)-based triage. A model trained to extract incremental information in the “PSA already elevated” region is being asked to discriminate risk years earlier, where the biological signal is plausibly weaker and interindividual physiological variation is plausibly stronger. In this regime, limited incremental value is not paradoxical but predictable. Additional kallikrein markers, such as intact PSA and hK2, may contribute little beyond total PSA, either because the variance they capture is not tightly coupled to aggressive carcinogenesis at early stages or because the early malignant signal is obscured by competing benign processes and the stochasticity of prolonged preclinical phases.
It should also be acknowledged that the observed results may partly reflect a genuine biological ceiling: additional kallikrein markers may carry inherently limited incremental prognostic information for long-horizon aggressive outcomes, independent of modeling or recalibration considerations. What EPIC ultimately challenges is not kallikrein biology itself, but the assumption that a deployed mapping from biomarkers to clinical decisions remains valid when the underlying clinical question has fundamentally changed.
This distinction between biomarker biology and model transportability has direct implications for future research and clinical translation. If the underlying biology is invalid, the biomarker should be abandoned. If the mapping is misspecified for a new time horizon or clinical pathway, the appropriate response is recalibration or redevelopment for the intended use case. EPIC, by design, cannot fully disentangle these alternatives, but its findings are more consistent with a transportability problem than with invalid biology. These results underscore that performance in a biopsy setting should not be treated as a surrogate for performance in population screening, and that the translational burden lies in re-establishing evidence when the clinical horizon and pathway change.
EPIC also foregrounds a second translational failure mode that is often obscured in methods sections: pathway-dependent verification. In biopsy cohorts, ascertainment is relatively standardized and temporally proximate—men undergo biopsy, and outcomes are observed with minimal delay through a fairly uniform mechanism. In contrast, in population cohorts, endpoints are observed only when men enter the diagnostic pathway, and that entry is nonrandom. It depends on screening intensity, healthcare access, patient preferences, clinician thresholds, and the adoption of MRI-integrated workflows. These pathway differences can distort apparent test performance because observed cases are filtered through a heterogeneous detection process. A biomarker that effectively refines biopsy selection within a standardized pathway may not perform similarly when the pathway itself is the dominant source of heterogeneity in who is diagnosed, when diagnosis occurs, and at what stage. Translational readers should therefore interpret long-horizon discrimination metrics from population cohorts as informative but not self-sufficient; such metrics reflect how a model ranks risk within a particular historical detection ecosystem, not how it would function within a contemporary, protocolized screening algorithm. Translation occurs within pathways, not within receiver operating characteristic curves.
Several practical constraints further clarify what EPIC does—and does not—test. Key clinical variables integral to the commercial 4Kscore—including digital rectal examination findings, prior biopsy history, family history, and, depending on implementation, ethnicity (1,3)—were unavailable. This forced evaluation of the model outside its intended clinical context; reported AUCs may therefore represent a lower bound rather than the upper limit of discriminative performance. Pre-analytical factors may also attenuate performance: EPIC used citrated plasma rather than serum, and concordance for intact PSA in a small validation subset was modest (r=0.74; n=25). Additionally, tumor stage was available for only 62% of cases and Gleason grade for 84%, introducing potential misclassification for stage-specific endpoints (2). These limitations do not invalidate EPIC’s findings; rather, they define their scope. The study is best interpreted as a stringent, real-world stress test of model behavior when removed from its intended decision environment and applied to long-horizon outcome prediction under limited pathway context and heterogeneous verification.
Even if EPIC had produced only an overall negative result, it would remain valuable. Biomarker research is disproportionately driven by positive findings, whereas negative boundary conditions, which are essential for credible translation, are underreported. EPIC is more constructive than that, however, because it identifies a clinically plausible setting in which kallikrein-based refinement may retain utility: younger men with moderately elevated PSA. Among men younger than 60 years with PSA ≥2 ng/mL, the 4Kscore demonstrated statistically significant improvements over PSA alone for both high-grade prostate cancer (AUC difference, +0.11; 95% CI: 0.05–0.18) and prostate cancer-specific mortality (AUC difference, +0.21; 95% CI: 0.02–0.40) (2). Subgroup findings warrant caution, particularly in observational cohorts, given their susceptibility to multiplicity, model dependence, and overestimation; the wide interval around the mortality estimate underlines the point. They remain hypothesis-generating and require prospective confirmation before informing clinical practice. Nonetheless, the observed pattern is biologically coherent. In younger men, where benign prostatic hyperplasia prevalence and volume-driven PSA inflation are lower, an elevated PSA likely reflects a greater proportion of cancer-relevant biology, rendering incremental kallikrein signals more interpretable rather than obscured by age-related noise. A recent multiethnic cohort study similarly reported improved 4Kscore discrimination restricted to men with PSA >2 ng/mL (4), lending independent support. Additional data from the active surveillance setting have shown that the 4Kscore may also carry prognostic value for disease progression independent of clinical and MRI parameters (5), further supporting context-dependent clinical utility. The translational implication is therefore not blanket adoption, but a testable hypothesis regarding clinical positioning.
Kallikrein-based refinement may be most rationally deployed in a narrow zone of genuine management uncertainty—men who are “too young to ignore, too early to biopsy”—where improved risk stratification could justify earlier MRI, targeted biopsy, or intensified surveillance, while sparing low-yield procedures in men whose refined risk remains modest.
This niche is translationally consequential because it aligns with how diagnostic pathways are actually evolving. In many settings, the practical sequence is no longer PSA followed by biopsy, but rather PSA combined with clinical assessment followed by MRI, and then targeted and/or systematic biopsy (6), with substantial variation in when MRI is triggered and how biopsy decisions are made after imaging. In this context, the central question is not whether the 4Kscore “outperforms PSA” in isolation, but where—if anywhere—it improves the diagnostic pathway, and what it replaces or complements. Does it function as an MRI triage tool, identifying men who should proceed to imaging and those who can safely defer? Does it add value after MRI by refining biopsy decisions in equivocal or low-suspicion imaging scenarios? Does it reduce low-yield procedures without delaying the detection of lethal disease? These are translational questions because they are actionable, measurable, and patient-relevant. EPIC does not answer them directly, but it cautions against assuming that a biopsy-optimized model can be repurposed as a long-horizon screening enhancer without recalibration and without evidence generated within the intended clinical pathway.
EPIC also illustrates why overreliance on discrimination alone constitutes a category error. The AUC is a ranking metric; it does not indicate whether using a test improves outcomes, reduces harm, or increases efficiency. What clinicians and health systems require are pathway-anchored, action-oriented measures: net benefit across prespecified thresholds, the number of biopsies avoided per clinically significant cancer missed, downstream impacts on MRI utilization and targeted biopsy rates, and—crucially—effects on clinically meaningful endpoints such as metastatic progression or cancer-specific mortality. AUC can improve while decision benefit remains negligible if thresholds are poorly chosen, risk shifts occur far from action boundaries, or downstream pathways already capture most of the available signal. For long-horizon use cases, calibration is equally essential: a model that ranks risk acceptably but miscalibrates absolute risk over 5–10 years can mislead patients, clinicians, and policymakers. EPIC’s subgroup findings should therefore be framed as actionable hypotheses that justify prospective, pathway-embedded testing with prespecified thresholds, rather than as established clinical directives.
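The gap between ranking and decision value can be made concrete with a small sketch. The Python below is illustrative only: the ten risks and outcomes are invented toy values, not EPIC data, and the helper names `auc` and `net_benefit` are hypothetical. It computes the AUC as the proportion of case/non-case pairs ranked correctly, and the net benefit at a single threshold using the standard decision-curve formula (true positives per person minus false positives per person weighted by the threshold odds).

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (case, non-case) pairs in which the case scores higher; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(risks: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Net benefit of acting (e.g. biopsy) when predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for r, y in zip(risks, labels) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, labels) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy cohort: 1 = clinically significant cancer, 0 = none.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.22, 0.25, 0.40, 0.60]

# Same ranking, inflated absolute risks (a monotone transform, assumed for illustration).
inflated = [r ** 0.5 for r in risks]

threshold = 0.20  # assumed action threshold, not taken from any guideline
prevalence = sum(labels) / len(labels)
treat_all = prevalence - (1 - prevalence) * threshold / (1 - threshold)

print("AUC, original risks:", auc(risks, labels))
print("AUC, inflated risks:", auc(inflated, labels))
print("Net benefit, original:", net_benefit(risks, labels, threshold))
print("Net benefit, inflated:", net_benefit(inflated, labels, threshold))
print("Net benefit, biopsy everyone:", treat_all)
```

In this toy cohort both sets of risks share an AUC of 20/21, yet at the 20% threshold the net benefit falls from 0.275 to 0.175 once risks are inflated, because five unaffected men rather than one cross the action boundary. Discrimination is unchanged; the decision consequences are not.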
From a health-system perspective, EPIC’s message is sobering but clarifying. Broad application of relatively costly reflex panels across all PSA ranges and age groups is unlikely to be efficient if incremental long-horizon information beyond PSA is minimal in unselected populations. In contrast, targeted use in subgroups where benefit appears most plausible—particularly younger men with moderately elevated PSA—may be economically defensible, given the greater potential years of life lost from missed aggressive disease and the higher degree of management uncertainty. The value proposition of a biomarker is not an intrinsic property of the assay; it is a function of deployment strategy and pathway design.
What should the field do next if it takes EPIC seriously? First, biopsy performance should no longer be treated as a proxy for screening performance. Long-horizon endpoints require horizon-specific models—at minimum recalibration, and in many cases re-derivation trained directly on aggressive and fatal outcomes over clinically relevant time frames—with explicit external validation and calibration reporting. Second, evaluation should be pathway-embedded by design, using contemporary comparators (including MRI-integrated workflows), prespecified thresholds, and patient-important endpoints, rather than relying on discrimination alone. Third, subgroup analyses should be biologically grounded and prespecified. If age and PSA strata materially alter the signal-to-noise ratio, models should incorporate this structure intentionally rather than retrofitting narratives post hoc. Randomized screening efforts that incorporate kallikrein panels into structured algorithms, such as the ongoing ProScreen trial in Finland, will be particularly informative (7). Finally, if ceiling effects limit the performance of limited-marker panels, integration with orthogonal risk layers—such as polygenic risk scores, longitudinal PSA kinetics, and emerging proteomic platforms—may be necessary to move beyond incrementalism (8-10).
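As a minimal illustration of what "at minimum recalibration" can mean, the sketch below applies recalibration-in-the-large, a standard intercept-only update that shifts every predicted risk by a constant on the logit scale until the mean prediction matches the event rate observed in the new setting. It is not the method used in EPIC or in the 4Kscore; the input risks, the 5% target rate, and the function name `recalibrate_in_the_large` are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def recalibrate_in_the_large(
    risks: Sequence[float], observed_rate: float, tol: float = 1e-10
) -> Tuple[float, List[float]]:
    """Find the logit-scale shift that makes the mean predicted risk equal observed_rate.

    Uses bisection, which works because the mean risk increases monotonically
    with the shift. Returns (shift, recalibrated risks).
    """
    if not 0 < observed_rate < 1:
        raise ValueError("observed_rate must lie strictly between 0 and 1")
    logits = [logit(p) for p in risks]
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        mean_risk = sum(expit(z + mid) for z in logits) / len(logits)
        if mean_risk < observed_rate:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2
    return shift, [expit(z + shift) for z in logits]


# Hypothetical risks from a model tuned in a high-prevalence biopsy clinic.
clinic_risks = [0.10, 0.20, 0.30, 0.40, 0.50]

# Assumed event rate in a lower-risk, long-horizon setting.
shift, adjusted = recalibrate_in_the_large(clinic_risks, observed_rate=0.05)

print("logit shift:", round(shift, 3))
print("recalibrated risks:", [round(r, 3) for r in adjusted])
```

An intercept shift corrects the average level of risk but leaves the ranking, and therefore the AUC, untouched. If the markers themselves carry less information at the new horizon, only re-derivation on the relevant outcomes can address that.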
Smith-Byrne et al. provide a translational correction rather than a translational defeat. When applied outside the biopsy clinic to an unselected population cohort characterized by long lag times and heterogeneous verification, a biopsy-optimized kallikrein algorithm does not outperform PSA plus age for long-horizon prediction of clinically relevant outcomes (2). This finding does not invalidate kallikrein biology; rather, it constrains the portability of a specific mapping from biomarkers to clinical action. The constructive takeaway is not to abandon refinement, but to reposition it: toward selective risk stratification at well-defined decision points—where baseline risk is sufficiently high, uncertainty is genuine, and pathway integration can be empirically tested—rather than toward a universal screening upgrade. The broader lesson for the biomarker field is both simple and overdue: translation is not the act of exporting a model; it is the act of rebuilding evidence for a new decision context.
Acknowledgments
None.
Footnote
Provenance and Peer Review: This article was commissioned by the editorial office, Translational Cancer Research. The article has undergone external peer review.
Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0340/prf
Funding: None.
Conflicts of Interest: The author has completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-2026-1-0340/coif). The author has no conflicts of interest to declare.
Ethical Statement: The author is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Zappala SM, Scardino PT, Okrongly D, et al. Clinical performance of the 4Kscore Test to predict high-grade prostate cancer at biopsy: A meta-analysis of US and European clinical validation study results. Rev Urol 2017;19:149-55.
2. Smith-Byrne K, Fensom GK, Noor U, et al. Evaluation of the 4Kscore Test in Relation to Subsequent Risk of Aggressive Prostate Cancer in the European Prospective Investigation into Cancer and Nutrition. Cancer Epidemiol Biomarkers Prev 2025;34:2058-67.
3. Vickers AJ, Vertosick EA, Sjoberg DD. Value of a Statistical Model Based on Four Kallikrein Markers in Blood, Commercially Available as 4Kscore, in All Reasonable Prostate Biopsy Subgroups. Eur Urol 2018;74:535-6.
4. Darst BF, Chou A, Wan P, et al. The Four-Kallikrein Panel Is Effective in Identifying Aggressive Prostate Cancer in a Multiethnic Population. Cancer Epidemiol Biomarkers Prev 2020;29:1381-8.
5. Hougen HY, Reis IM, Han S, et al. Evaluating 4Kscore's role in predicting progression on active surveillance for prostate cancer independently of clinical information and PIRADS score. Prostate Cancer Prostatic Dis 2025;28:180-6.
6. Cornford P, van den Bergh RCN, Briers E, et al. EAU-EANM-ESTRO-ESUR-ISUP-SIOG Guidelines on Prostate Cancer-2024 Update. Part I: Screening, Diagnosis, and Local Treatment with Curative Intent. Eur Urol 2024;86:148-63.
7. Auvinen A, Tammela TLJ, Mirtti T, et al. Prostate Cancer Screening With PSA, Kallikrein Panel, and MRI: The ProScreen Randomized Trial. JAMA 2024;331:1452-9.
8. McHugh JK, Bancroft EK, Saunders E, et al. Assessment of a Polygenic Risk Score in Screening for Prostate Cancer. N Engl J Med 2025;392:1406-17.
9. Sjoberg DD, Vickers AJ, Assel M, et al. Twenty-year Risk of Prostate Cancer Death by Midlife Prostate-specific Antigen and a Panel of Four Kallikrein Markers in a Large Population-based Cohort of Healthy Men. Eur Urol 2018;73:941-8.
10. Nordström T, Discacciati A, Bergman M, et al. Prostate cancer screening using a combination of risk-prediction, MRI, and targeted prostate biopsies (STHLM3-MRI): a prospective, population-based, randomised, open-label, non-inferiority trial. Lancet Oncol 2021;22:1240-9.

