Original Article

Integrating artificial intelligence in renal cell carcinoma: evaluating ChatGPT’s performance in educating patients and trainees

J. Patrick Mershon1, Tasha Posid1, Keyan Salari2, Richard S. Matulewicz3, Eric A. Singer1, Shawn Dason1

1Division of Urologic Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA; 2Department of Urology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; 3Urology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA

Contributions: (I) Conception and design: S Dason, JP Mershon; (II) Administrative support: EA Singer, S Dason, T Posid; (III) Provision of study materials or patients: EA Singer, S Dason, K Salari, RS Matulewicz; (IV) Collection and assembly of data: JP Mershon, T Posid; (V) Data analysis and interpretation: JP Mershon, T Posid; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Shawn Dason, MD. Division of Urologic Oncology, The Ohio State University Comprehensive Cancer Center, 2121 Kenny Road, Columbus, OH 43210, USA. Email: shawn.dason@osumc.edu.

Background: OpenAI’s ChatGPT is a large language model-based artificial intelligence (AI) chatbot that can be used to answer unique, user-generated questions without direct training on specific content. Large language models have significant potential in urologic education. We reviewed the primary data surrounding the use of large language models in urology. We also reported findings of our primary study assessing the performance of ChatGPT in renal cell carcinoma (RCC) education.

Methods: For our primary study, we utilized three professional society guidelines addressing RCC to generate fifteen content questions. These questions were input into ChatGPT 3.5. ChatGPT's responses, along with pre- and post-content assessment questions regarding ChatGPT, were then presented to evaluators. Evaluators consisted of four urologic oncologists and four non-clinical staff members. Medline was reviewed for additional studies pertaining to the use of ChatGPT in urologic education.

Results: We found that all assessors rated ChatGPT highly on the accuracy and usefulness of information provided with overall mean scores of 3.64 [±0.62 standard deviation (SD)] and 3.58 (±0.75) out of 5, respectively. Clinicians and non-clinicians did not differ in their scoring of responses (P=0.37). Completing content assessment improved confidence in the accuracy of ChatGPT’s information (P=0.01) and increased agreement that it should be used for medical education (P=0.007). Attitudes towards use for patient education did not change (P=0.30). We also review the current state of the literature regarding ChatGPT use for patient and trainee education and discuss future steps towards optimization.

Conclusions: ChatGPT has significant potential utility in medical education if it can continue to provide accurate and useful information. We have found it to be a useful adjunct to expert human guidance both for medical trainee and, less so, for patient education. Further work is needed to validate ChatGPT before widespread adoption.

Keywords: Artificial intelligence (AI); renal neoplasm; renal cell carcinoma (RCC); health education


Submitted Dec 05, 2023. Accepted for publication Apr 10, 2024. Published online May 21, 2024.

doi: 10.21037/tcr-23-2234


Highlight box

Key findings

• Clinicians and laypeople both rated ChatGPT highly on accuracy and usefulness based on responses to basic questions about small renal masses and renal cell carcinoma.

What is known and what is new?

• ChatGPT and other artificial intelligence (AI) tools are increasingly used by patients seeking information and decision-making help with new medical diagnoses. The accuracy and utility of this information are unknown and likely subject-specific.

• Reviewing ChatGPT responses improved confidence in clinical information provided by the program and increased agreement that it should be used for medical education.

What is the implication, and what should change now?

• Clinician awareness of AI capabilities is critical to guide patients and medical trainees in its safe and effective use in urology. Ongoing assessments of safety and accuracy are needed to justify its use in urology and in medical applications in general.


Introduction

In November 2022, OpenAI announced public access to a new online program, ChatGPT, which has rapidly become a cultural sensation. ChatGPT is an example of a “limited”, or “narrow”, artificial intelligence (AI) that can be used to answer unique and novel questions across a variety of subject matters without direct training. Limited or narrow AI are programs with a specific set of constraints and output types that can only handle tasks within their programming or training parameters. General AI has not yet been achieved but would be capable of learning and solving problems of any kind or format, on the level of a human mind and without concrete limits to its abilities. Concerns exist about ChatGPT’s long-term applications in a variety of fields, including undergraduate education, scientific and creative writing, and medical contexts (1,2). Many of the alarms being raised about ChatGPT concern its ability to mislead readers about the source of submitted material and the possibility of dangerous misinformation reaching a vulnerable audience (3-5). One particularly alarming trend has been ChatGPT’s tendency to generate entirely fabricated literature citations to support its claims, which it presents as fact (6). These concerns have prompted an explosion of ethical and philosophical debates about the role of AI within medicine and in patient-facing applications (7-9).

Despite the risks of using AI in medicine, its potential benefits cannot be ignored and will force the field to wrestle with how and when (not if) to appropriately develop and employ this tool (7,10). A clear role for AI exists within medical education, a rapidly changing field that needs to respond quickly to new data to keep up with technological and scientific progress (11-13). Early perspectives on the use of ChatGPT for medical education have been cautiously optimistic overall, noting its inconsistent accuracy on technical information but also a high level of personalization and adaptability that could make it an excellent support tool for educators (14,15).

Another key area of study is AI use in patient education (10). AI-based decision aids for patients have been in existence for over 20 years, typically in more primitive forms than the now extremely user-friendly ChatGPT. ChatGPT is still in its infancy and much of the literature surrounding its value to patients is opinion-based and limited, but there has been a recent explosion in research attempting to evaluate its capabilities and limitations within the medical field. In the past year alone there have been assessments of its ability to answer questions regarding cirrhosis and hepatocellular carcinoma (16), total hip arthroplasty (17), obstructive sleep apnea (18), diabetes (19), and many other topics. Specific to urology, men’s health (20), pediatric urology (21), and prostate cancer (22,23) have been investigated, though notably no work has yet been done within kidney cancer specifically.

In this pilot study we attempt to establish, in general and primarily qualitative terms, the guideline concordance and subjective utility of ChatGPT’s responses to typical questions regarding the management of renal masses and renal cell carcinoma (RCC). We also attempt to identify the audience that would benefit most from interacting with ChatGPT to learn about renal malignancies and other urologic disease states. Finally, we review the current state of the literature regarding patient and trainee education using ChatGPT and other AI tools.


Methods

Assessment development

Our first task was to develop a standardized set of questions regarding the diagnosis and management of renal masses and RCC that could reasonably be answered by ChatGPT. We reviewed three major professional society guidelines on renal mass and RCC diagnosis and management that were published before ChatGPT’s training data cutoff. These guidelines were chosen to fairly assess the ability of ChatGPT to process and synthesize the data available to it rather than to test its ability to extrapolate beyond its accessible knowledge. For this reason, the American Society of Clinical Oncology (ASCO) 2022 guideline was initially reviewed but not included because it was published after the training data cutoff. Using the guidelines below, we generated topic points with relative consensus among professional organizations to provide an objective basis for assessing the accuracy of ChatGPT’s responses.

  • Management of Small Renal Masses: American Society of Clinical Oncology Clinical Practice Guideline (24);
  • NCCN Clinical Practice Guidelines in Oncology (NCCN Guidelines®) - Kidney Cancer (25);
  • Renal Mass and Localized Renal Cancer: Evaluation, Management, and Follow-up: AUA Guideline (26).

The primary author (J.P.M.) generated an initial list of questions, which was then reviewed by two urologic oncologists (S.D., E.A.S.) to reach a consensus list of fifteen content questions to submit to ChatGPT 3.5 in February 2023; the goal was a mix of patient- and medical trainee-level questions. J.P.M., S.D. and E.A.S. additionally reviewed the Canadian Urological Association guideline to ensure the questions appropriately addressed international variations in small renal mass (SRM) management recommendations (27). The final list was input sequentially into the program and the responses were recorded in a single document. To assess any potential bias of reviewers and to record changes in attitude after assessment, we obtained pre- and post-content assessment information regarding familiarity with and attitudes towards ChatGPT or other AI tools. The entire assessment tool is included in Appendix 1.
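The questions in this study were entered manually through the ChatGPT web interface. For groups looking to repeat or extend this workflow programmatically, a minimal sketch is shown below using the OpenAI Python SDK; the model name, the example questions, and the output file are illustrative assumptions and were not part of the original protocol.

```python
# Illustrative sketch only: this study submitted questions manually via the
# ChatGPT web interface. This shows one way to automate a similar workflow
# with the OpenAI Python SDK; the model name and questions are placeholders.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Hypothetical examples standing in for the fifteen consensus content questions.
questions = [
    "What is a small renal mass and how is it usually found?",
    "What are the treatment options for a 3 cm renal mass?",
]

with open("chatgpt_responses.txt", "w") as outfile:
    for question in questions:
        # Each question is sent as its own conversation so earlier answers do
        # not influence later responses.
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": question}],
        )
        answer = response.choices[0].message.content
        outfile.write(f"Q: {question}\nA: {answer}\n\n")
```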

Assessment

We created a survey through Qualtrics (Qualtrics, Provo, UT, USA) with pre- and post-assessment questions as well as the fifteen content questions and answers (please see Appendix 1). The pre- and post-assessment questions asked about familiarity with ChatGPT and then asked assessors to select the learner groups that would benefit most from its use, with choices of patients, medical students, residents, and attending physicians. Four fellowship-trained urologic oncologists (S.D., E.A.S., R.S.M., K.S.) utilized the assessment tool independently. After this initial review we extended the survey to four non-clinical staff members to obtain more qualitative data regarding a layperson’s impression of the information. Each of these staff members holds a Bachelor of Arts (BA) or Bachelor of Science (BS), but none has any specific medical training or educates residents or patients, and they were chosen at random from the available research staff at The Ohio State University.

For the assessment portion, each of the fifteen questions was included alongside ChatGPT’s response to that specific input, copied verbatim with no editing. For each question, the assessor was asked to rate the accuracy of the response on a Likert scale of 1 to 5 (1 being entirely inaccurate and 5 being entirely accurate). Reviewers compared the responses to the three guideline statements listed above as an objective benchmark. There was also an option to indicate dangerous or nonsensical information that could represent harm to patients or trainees. Assessors were then asked how useful the response was for an audience with a novice-level understanding equivalent to a typical patient with a new diagnosis of an SRM or RCC. Responses to this question were on a Likert scale of 1 to 5 (1 being not at all useful and 5 being extremely useful). Without an objective benchmark for this item, reviewers were asked to use their best judgment given their experience with the reading and comprehension levels of typical patients and medical trainees. Pre- and post-assessment questions asked about overall subjective impressions of ChatGPT’s accuracy and usefulness in medical and patient education, with free-text boxes added to allow for qualitative discussion of the content responses and for assessors to describe their overall impression of the tool.

Statistical analysis

Data were collected from the Qualtrics database in raw format. Mean scores and standard deviations were calculated for each question. Differences between pre- and post-assessment responses were analyzed using paired-sample t-tests, with P<0.05 considered significant. Clinician and non-clinician scores were compared using an independent-samples t-test. Selected quotations from reviewers’ qualitative impressions were included in the final manuscript at the discretion of the primary author to best represent the reviewers’ impressions of the tool.
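As a concrete illustration of the analysis described above, the sketch below computes means and standard deviations and runs a paired-sample t-test (pre- vs. post-assessment responses) and an independent-samples t-test (clinician vs. non-clinician ratings) with SciPy. The score arrays are hypothetical placeholders, not the study data.

```python
# Minimal sketch of the statistical analysis described above.
# All score arrays are hypothetical placeholders, not the study data.
import numpy as np
from scipy import stats

# Pre- and post-assessment Likert scores (1-5) for the same eight reviewers (paired).
pre_scores = np.array([3, 3, 3, 3, 3, 4, 3, 3])
post_scores = np.array([4, 3, 4, 4, 4, 4, 4, 4])

# Mean accuracy ratings per reviewer in each group (independent samples).
clinician_accuracy = np.array([3.9, 3.7, 4.0, 3.8])
nonclinician_accuracy = np.array([3.5, 3.2, 3.6, 3.4])

# Descriptive statistics: mean and sample standard deviation.
print(f"Pre mean {pre_scores.mean():.2f} ± {pre_scores.std(ddof=1):.2f}")
print(f"Post mean {post_scores.mean():.2f} ± {post_scores.std(ddof=1):.2f}")

# Paired-sample t-test for the change between pre- and post-assessment responses.
t_paired, p_paired = stats.ttest_rel(post_scores, pre_scores)
print(f"Paired t-test: t={t_paired:.2f}, P={p_paired:.3f}")

# Independent-samples t-test comparing clinician and non-clinician ratings.
t_ind, p_ind = stats.ttest_ind(clinician_accuracy, nonclinician_accuracy)
print(f"Independent t-test: t={t_ind:.2f}, P={p_ind:.3f}")

# As in the study, P values below 0.05 are treated as statistically significant.
```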


Results

Pre-assessment questions: familiarity with ChatGPT did not differ between clinicians (mean 2.75/5) and non-clinicians (mean 2.5/5). Of those who had heard of it, 5/8 respondents said they had heard about it from social media and 3/8 from colleagues in healthcare. Before completing the assessment questions, clinicians and non-clinicians had overall neutral assessments of the accuracy of information provided by ChatGPT (mean score 3±0 for clinicians, 3.25±0.5 for non-clinicians), with a lower opinion of the accuracy of its clinical information (mean score 2.625±1.06, Figure 1). Similarly, clinicians were neutral regarding the use of ChatGPT for medical (mean 3±0.81) and patient (mean 3.25±0.95) education.

Figure 1 Change in responses to qualitative questions regarding ChatGPT before and after completing the survey tool and reviewing ChatGPT responses to prompts. Assessors were asked about the accuracy and trustworthiness of responses and then asked to rate its utility in both medical and patient education. Error bars represent +/− SEM. Asterisks denote statistically significant differences. SEM, standard error of the mean.

We found that all assessors rated ChatGPT highly on the accuracy and usefulness of information provided in response to the generated questions, with scores differing slightly by question (Figures S1,S2). Clinicians rated content answers with mean scores of 3.85 [±0.42 standard deviation (SD)] out of 5 for accuracy and 3.8 (±0.62) out of 5 for usefulness (Figure 2). Non-clinicians gave slightly lower scores with mean scores of 3.43 (±0.77) for accuracy and 3.35 (±0.90) for usefulness but these groups were not significantly different in their assessments (P=0.37 for accuracy, P=0.21 for usefulness).

Figure 2 Perceptions of ChatGPT responses to the prompt questions between clinicians and non-clinicians. Scores of 1–5 were assigned on a Likert scale rating the accuracy (compared to guideline statements) and the usefulness of each prompt answer, with a score of 5 representing “very accurate” or “very useful”. Error bars represent +/− SEM. SEM, standard error of the mean.

Completing the content assessment improved clinician confidence in the accuracy of information generated by ChatGPT (mean improvement of 0.75 on a 5-point Likert scale, P=0.01) and in the accuracy of clinical information provided (mean improvement of 1.5, P=0.01, Figure 1). Both clinicians and non-clinicians were more likely to agree that ChatGPT or another AI tool could be useful in medical education after reviewing answers (mean improvement of 0.75 and 1.25 for clinicians and non-clinicians, respectively, P=0.007). There was no significant change in attitude towards ChatGPT use in patient education (mean improvement 0.5, P=0.30).

All assessors were skeptical of the utility of ChatGPT as an education tool before completing the content assessment questions. One relevant quote stated, “Patients should always seek a provider and ask questions to their provider about what the AI said.” After the content assessment, reviewers were overall positive in their impressions of the tool: “This technology feels very beneficial to audiences without vast medical knowledge (i.e., medical students and patients). However, I do feel that for certain audiences the medical jargon and/or quantity of information may confuse or worry them. I think it is a great supplemental tool when one doesn’t have immediate access to a physician to gain an individual perspective.” Another commenter noted: “It was much more accurate and useful than I anticipated.” However, one concern raised was the source of the content retrieved, with multiple reviewers wishing it could provide citations: “Question 9 cited studies with no actual references, authors, or links.”


Discussion

Most of our content assessors were aware of ChatGPT through social media but had limited personal experience with the AI tool. Initially, assessor opinions were either skeptical or neutral regarding its information accuracy and its utility in medical and patient education. This is consistent with a large global survey of over 450 urologic clinicians, in which providers overall reported using ChatGPT for research and academic pursuits but not for patient care (5). However, during the content assessment, both clinicians and non-clinicians rated answer quality quite high in terms of both accuracy and usefulness to an audience with limited clinical knowledge equivalent to a typical patient (Figure 2). Reviewing the answers improved perceptions of the tool and its capabilities as well as its utility in educating novices (Figure 1). This study provides critical data assessing ChatGPT’s accuracy and utility in directly answering medical questions at the level of a trainee or patient regarding RCC. However, this is only a pilot study, and much work remains to be done in this critical area.

AI-supported healthcare tools have been in use for decades with varying degrees of efficacy and sophistication; recently, Jayakumar et al. developed an AI-enabled patient decision tool for total knee replacement that improved decision quality, patient perception of shared decision making and treatment satisfaction without changing surgical intervention rates (28). Multiple AI chatbots have shown a remarkable ability to formulate patient-friendly responses that can minimize the likelihood of confusion and alarm often experienced by patients searching online about their conditions (29).

The literature surrounding ChatGPT itself and its value to patients is limited, but there has been a recent push to evaluate its capabilities and limitations. As a whole, these studies have lauded the capabilities of the new technology. Gabriel et al. [2023] directly compared ChatGPT responses to a human-generated patient information handout about radical prostatectomy and found 79% to be concordant and comparable, with 93% containing pertinent and accurate information (23). Durairaj et al. [2023] asked expert rhinoplasty surgeons to rate answers to typical patient questions from both a fellow surgeon and from ChatGPT 3.5 and found that the AI tool outperformed the surgeons with higher ratings in accuracy, completeness, and overall quality (30). In a head-to-head comparison, the AI-generated answer was the preferred response over 80% of the time. Similar results were shown with general medical questions posed by patients online, with licensed healthcare professionals preferring ChatGPT responses to physician responses 79% of the time, with nearly 10-fold higher rates of empathetic responses (31).

ChatGPT, however, did not perform as well on all assessments; Musheyev et al. [2023] found that ChatGPT 3.5 provided moderately high-quality information when responding to patient questions about multiple urologic malignancies, but with only moderate understandability and low actionability, and it actually underperformed compared with other AI chatbots (32). Similar results were seen in a broad survey of the most common urologic conditions, with ChatGPT responses rated moderate on validated tools by two urologists (33). Coskun et al. [2023] evaluated prostate cancer patient information and found that ChatGPT performed only moderately well, with a mean score of 3.62 on a 5-point Likert scale of general quality (22). The authors concluded that the current version of the technology should be viewed with caution and could be further optimized before deployment as a patient education tool.

Education represents another key area of development within the AI sphere. Many articles discuss ChatGPT and other AI tools as a double-edged sword within education as a whole, with rapid information delivery balanced by issues of possible plagiarism and the offloading of important intellectual work (10,13,34). Many others focus on its flaws: fabricating scientific papers and other sources that do not exist and occasionally answering confidently with false information (4,6,12,29). Notably, we did not see this occur with our limited number of content-based questions. Its current failure to cite sources and its occasional generation of misinformation are extremely valid concerns, but they appear to be primarily technical challenges that could be addressed with ongoing programming improvements (6). Future iterations of this technology should be even more powerful, with the ability to rapidly trawl the internet for all currently available data on a specific topic and provide more up-to-date information than a human could possibly generate (35). The currently available version of ChatGPT 3.5 does not have access to an up-to-date version of the internet and is time-locked at early 2022, limiting its real-time accuracy.

In the realm of medical education specifically, AI chatbots can (almost always) give accurate technical information and can do so with a high level of personalization and adaptability (14,36). One notable example of this capability is the ability to generate and run interactive medical simulations in text form for learners; Scherr et al. [2023] used ChatGPT 3.5 to create Advanced Cardiac Life Support (ACLS) and intensive care unit (ICU) scenarios with opportunities for improvisation and real-time feedback, with the goal of enhancing medical student readiness for clinical clerkships (15). ChatGPT’s achievement of a passing score on the United States Medical Licensing Examination (USMLE) Step 1 generated headlines, and its ability to logically justify its answers prompted many to argue for its immense utility as a study and training tool for medical students learning how to tackle both clinical and exam problems (36,37). Most literature on this topic highlights the need for caution but focuses on our opportunity to shape the implementation of this technology (12,15). Specific recommendations include AI literacy training and increased focus on source evaluation and evidence-based medicine, with heightened effort to teach empathy and good communication skills as the technology evolves (13).

The recent focus on AI is in many ways just one facet of the rapid technological advancement impacting medicine, with telehealth being the previous iteration. Particularly during the coronavirus disease 2019 (COVID-19) pandemic, this technology allowed physicians to continue to achieve patient care goals safely and effectively. Patients on the whole welcomed the change, with a majority satisfied with its use for their care and not feeling depersonalized by it (38). Younger and more tech-savvy patients are even more likely to embrace these technological changes in medicine, and physicians should take note (30). While adoption of telehealth was difficult for some, its utility was immense, and its introduction into healthcare has expanded our options for interacting with and treating patients in ways that meet their needs. AI will be a similarly impactful tool, but the burden is on physicians to use it safely and well.

Our pilot study is limited primarily by its size, with only four urologic oncologists assessing the tool’s accuracy against an objective benchmark of relevant guidelines. Additionally, our assessment tool was not validated (as no validated tools for assessing AI responses currently exist). The possibility of bias given reviewers’ knowledge of the information source is real but was unavoidable in this version of our study. Future work could consider blinding reviewers to the source of information to generate a more objective assessment of information quality, as has been done in several small early studies (30). This work was intended primarily to generate initial, more qualitative impressions of the tool from the perspectives of providers and laypeople to support further hypothesis generation and study. We intend to expand upon this work within our department to include more reviewers and more sophisticated assessment metrics and to query other common topics within urology. It should also be noted that improvement of the AI underlying ChatGPT may date our results, and repeating the analysis frequently with the most up-to-date version of the software will be extremely important. Since this work was completed, ChatGPT 4.0 and countless other AI tools have been released and will need similar assessments prior to widespread adoption in a medical context.


Conclusions

We found that clinicians and lay assessors consistently rated ChatGPT highly on the accuracy and usefulness of information provided in response to questions regarding the management of SRMs and RCC. Completing content assessment improved confidence in the accuracy of ChatGPT’s information and increased agreement that it should be used for medical education. These results are an early, informal evaluation of the capabilities of evolving AI tools but show great promise for this new technology.

Understanding how to leverage ChatGPT and other AI tools effectively and safely will be critical in the coming years in medicine, as in many other fields (10). This is an extremely difficult area to study, but the involvement of physicians and sub-specialty trained surgeons is critical to help shape AI into a positive force for patients and trainees alike (5,31,39). Just as clinicians have had to adjust their education and counseling strategies with the advent of “Dr. Google”, AI proliferation will fundamentally shift how patients and trainees interact with medical information, and we need to prepare ourselves for a new era (13). This pilot study is a first step towards understanding the power and pitfalls of this new tool and will facilitate ongoing study of this critical topic.


Acknowledgments

Funding: This work was supported by grants from the National Cancer Institute (No. 2P30CA016058, No. K08 CA259452) and from a Memorial Sloan Kettering Cancer Center Grant (No. P30 CA008748) to R.S.M.


Footnote

Data Sharing Statement: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-2234/dss

Peer Review File: Available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-2234/prf

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-2234/coif). E.A.S. serves as an unpaid editorial board member of Translational Cancer Research from January 2023 to December 2024. R.S.M. reports receiving grant support from the National Cancer Institute (No. 2P30CA016058, No. K08 CA259452) and from Memorial Sloan Kettering (No. P30 CA008748). S.D. reports receiving consulting fees from Bristol Myers Squibb and educational funding from Intuitive Surgical. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Stokel-Walker C. AI bot ChatGPT writes smart essays - should professors worry? Nature 2022; Epub ahead of print. [Crossref]
  2. Biswas S. ChatGPT and the Future of Medical Writing. Radiology 2023;307:e223312. [Crossref] [PubMed]
  3. Gao CA, Howard FM, Markov NS, et al. Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv 2022;2022.12.23.521610. doi: 10.1101/2022.12.23.521610.
  4. Brewster J, Arvanitis L, Sadeghi M. The next great misinformation superspreader: how ChatGPT could spread toxic misinformation at unprecedented scale. Newsweek LLC; NewsGuard; 2023.
  5. Eppler M, Ganjavi C, Ramacciotti LS, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. [Crossref] [PubMed]
  6. Weiser B. Here's What Happens When Your Lawyer Uses ChatGPT. The New York Times; 2023.
  7. Shen Y, Heacock L, Elias J, et al. ChatGPT and Other Large Language Models Are Double-edged Swords. Radiology 2023;307:e230163. [Crossref] [PubMed]
  8. Xu L, Sanders L, Li K, et al. Chatbot for Health Care and Oncology Applications Using Artificial Intelligence and Machine Learning: Systematic Review. JMIR Cancer 2021;7:e27850. [Crossref] [PubMed]
  9. Cacciamani GE, Collins GS, Gill IS. ChatGPT: standard reporting guidelines for responsible use. Nature 2023;618:238. [Crossref] [PubMed]
  10. Gabrielson AT, Odisho AY, Canes D. Harnessing Generative Artificial Intelligence to Improve Efficiency Among Urologists: Welcome ChatGPT. J Urol 2023;209:827-9. [Crossref] [PubMed]
  11. Suárez A, Adanero A, Díaz-Flores García V, et al. Using a Virtual Patient via an Artificial Intelligence Chatbot to Develop Dental Students' Diagnostic Skills. Int J Environ Res Public Health 2022;19:8735. [Crossref] [PubMed]
  12. Boscardin CK, Gin B, Golde PB, et al. ChatGPT and Generative Artificial Intelligence for Medical Education: Potential Impact and Opportunity. Acad Med 2024;99:22-7. [Crossref] [PubMed]
  13. Jamal A, Solaiman M, Alhasan K, et al. Integrating ChatGPT in Medical Education: Adapting Curricula to Cultivate Competent Physicians for the AI Era. Cureus 2023;15:e43036. [Crossref] [PubMed]
  14. Mogali SR. Initial impressions of ChatGPT for anatomy education. Anat Sci Educ 2024;17:444-7. [Crossref] [PubMed]
  15. Scherr R, Halaseh FF, Spina A, et al. ChatGPT Interactive Medical Simulations for Early Clinical Education: Case Study. JMIR Med Educ 2023;9:e49877. [Crossref] [PubMed]
  16. Yeo YH, Samaan JS, Ng WH, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol 2023;29:721-32. [Crossref] [PubMed]
  17. Mika AP, Martin JR, Engstrom SM, et al. Assessing ChatGPT Responses to Common Patient Questions Regarding Total Hip Arthroplasty. J Bone Joint Surg Am 2023;105:1519-26. [Crossref] [PubMed]
  18. Campbell DJ, Estephan LE, Mastrolonardo EV, et al. Evaluating ChatGPT responses on obstructive sleep apnea for patient education. J Clin Sleep Med 2023;19:1989-95. [Crossref] [PubMed]
  19. Sharma S, Pajai S, Prasad R, et al. A Critical Review of ChatGPT as a Potential Substitute for Diabetes Educators. Cureus 2023;15:e38380. [Crossref] [PubMed]
  20. Shah YB, Ghosh A, Hochberg AR, et al. Comparison of ChatGPT and Traditional Patient Education Materials for Men's Health. Urol Pract 2024;11:87-94. [Crossref] [PubMed]
  21. Caglar U, Yildiz O, Meric A, et al. Evaluating the performance of ChatGPT in answering questions related to pediatric urology. J Pediatr Urol 2024;20:26.e1-5. [Crossref] [PubMed]
  22. Coskun B, Ocakoglu G, Yetemen M, et al. Can ChatGPT, an Artificial Intelligence Language Model, Provide Accurate and High-quality Patient Information on Prostate Cancer? Urology 2023;180:35-58. [Crossref] [PubMed]
  23. Gabriel J, Shafik L, Alanbuki A, et al. The utility of the ChatGPT artificial intelligence tool for patient education and enquiry in robotic radical prostatectomy. Int Urol Nephrol 2023;55:2717-32. [Crossref] [PubMed]
  24. Finelli A, Ismaila N, Bro B, et al. Management of Small Renal Masses: American Society of Clinical Oncology Clinical Practice Guideline. J Clin Oncol 2017;35:668-80. [Crossref] [PubMed]
  25. Motzer RJ, Jonasch E, Agarwal N, et al. Kidney Cancer, Version 3.2022, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw 2022;20:71-90. [Crossref] [PubMed]
  26. Campbell SC, Clark PE, Chang SS, et al. Renal Mass and Localized Renal Cancer: Evaluation, Management, and Follow-Up: AUA Guideline: Part I. J Urol 2021;206:199-208. [Crossref] [PubMed]
  27. Richard PO, Violette PD, Bhindi B, et al. Canadian Urological Association guideline: Management of small renal masses - Full-text. Can Urol Assoc J 2022;16:E61-75. [Crossref] [PubMed]
  28. Jayakumar P, Moore MG, Furlough KA, et al. Comparison of an Artificial Intelligence-Enabled Patient Decision Aid vs Educational Material on Decision Quality, Shared Decision-Making, Patient Experience, and Functional Outcomes in Adults With Knee Osteoarthritis: A Randomized Clinical Trial. JAMA Netw Open 2021;4:e2037107. [Crossref] [PubMed]
  29. Hopkins AM, Logan JM, Kichenadasse G, et al. Artificial intelligence chatbots will revolutionize how cancer patients access information: ChatGPT represents a paradigm-shift. JNCI Cancer Spectr 2023;7:pkad010. [Crossref] [PubMed]
  30. Durairaj KK, Baker O, Bertossi D, et al. Artificial Intelligence Versus Expert Plastic Surgeon: Comparative Study Shows ChatGPT "Wins" Rhinoplasty Consultations: Should We Be Worried? Facial Plast Surg Aesthet Med 2023; Epub ahead of print. [Crossref] [PubMed]
  31. Ayers JW, Poliak A, Dredze M, et al. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern Med 2023;183:589-96. [Crossref] [PubMed]
  32. Musheyev D, Pan A, Loeb S, et al. How Well Do Artificial Intelligence Chatbots Respond to the Top Search Queries About Urological Malignancies? Eur Urol 2024;85:13-6. [Crossref] [PubMed]
  33. Szczesniewski JJ, Tellez Fouz C, Ramos Alba A, et al. ChatGPT and most frequent urological diseases: analysing the quality of information and potential risks for patients. World J Urol 2023;41:3149-53. [Crossref] [PubMed]
  34. O'Connor S. Open artificial intelligence platforms in nursing education: Tools for academic progress or abuse? Nurse Educ Pract 2023;66:103537. [Crossref] [PubMed]
  35. Adams D, Chuah KM. Artificial Intelligence-Based Tools in Research Writing: Current Trends and Future Potentials. Artificial Intelligence in Higher Education 2022:169-84.
  36. Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health 2023;2:e0000198. [Crossref] [PubMed]
  37. Gilson A, Safranek CW, Huang T, et al. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Med Educ 2023;9:e45312. Erratum in: JMIR Med Educ 2024;10:e57594. [Crossref] [PubMed]
  38. Amparore D, Campi R, Checcucci E, et al. Patients' perspective on the use of telemedicine for outpatient urological visits: Learning from the COVID-19 outbreak. Actas Urol Esp (Engl Ed) 2020;44:637-8. [Crossref] [PubMed]
  39. Javid M, Reddiboina M, Bhandari M. Emergence of artificial generative intelligence and its potential impact on urology. Can J Urol 2023;30:11588-98. [PubMed]
Cite this article as: Mershon JP, Posid T, Salari K, Matulewicz RS, Singer EA, Dason S. Integrating artificial intelligence in renal cell carcinoma: evaluating ChatGPT’s performance in educating patients and trainees. Transl Cancer Res 2024;13(11):6246-6254. doi: 10.21037/tcr-23-2234
