
FEBRUARY 15, 2024

ChatGPT Gets a Failing Grade for Some Medical Queries


Originally published by our sister publication Pharmacy Practice News

By Marcus A. Banks

The artificial intelligence chatbot ChatGPT, one of the most widely used AI engines, often answers questions about medications incorrectly, according to a poster (8-021) by drug information specialists presented at the ASHP Midyear 2023 Clinical Meeting & Exhibition, in Anaheim, Calif.


“Healthcare professionals and patients should be cautious about using ChatGPT as an authoritative source for medication-related information,” said lead author Sara Grossman, PharmD, an associate professor of pharmacy practice at Long Island University (LIU), in New York City.

Dr. Grossman and her colleagues presented ChatGPT with 39 medication-related questions that have definitive answers in the published professional literature. The chatbot delivered responses that were accurate, complete and relevant for only 10 of the 39 questions. For 11 of the questions, ChatGPT provided no direct answer at all, offering only general background information instead.

That left 28 questions for which the bot offered a response. The researchers asked ChatGPT to provide references in support of these recommendations, but it did so for only eight of the 28 questions. What’s more, made-up references appeared in every one of those eight bibliographies.


“They looked very real,” said study author Tina Zerilli, PharmD, an associate professor of pharmacy practice at LIU, noting that the fabricated references included PubMed identification (PMID) numbers and proper citation formats. But when the researchers attempted to retrieve the putative sources, they discovered the sources did not exist.

The consequences of such errors could be severe. One question posed to ChatGPT was whether a drug interaction exists between the COVID-19 antiviral nirmatrelvir-ritonavir (Paxlovid, Pfizer) and the blood pressure–lowering medication verapamil.

The bot said no. The answer is yes.

“These medications have the potential to interact with one another, and combined use may result in excessive lowering of blood pressure,” Dr. Grossman said. Any pharmacist who chose to rely entirely on ChatGPT’s counsel in this situation could inadvertently set patients up for this preventable and potentially fatal side effect.

Dr. Grossman and her colleagues did not present this study as a broadside against all uses of AI in pharmacy practice, noting that tools such as ChatGPT will continue to evolve. But as of now, ChatGPT is not equipped to adequately answer medication questions, the research team found.

And they were not alone. Another poster presented at the ASHP meeting, from researchers at Iwate Medical University, in Morioka, Japan, found that ChatGPT’s answers to questions about common side effects from drugs are often incorrect when compared with validated sources (8-023).

Results Not Surprising

“I have tested ChatGPT on many questions patients ask about pharmacogenomics, and most of the time the answers were wrong. I completely agree with the finding of this study,” Adrijana Kekic, PharmD, a pharmacogenomics clinical specialist and an associate program director of education for outpatient pharmacy at Mayo Clinic in Arizona, told Pharmacy Practice News.

The core challenge with systems such as ChatGPT, Dr. Kekic added, is that they are closed; there is no way to evaluate the data that go into building the provided answers. Dr. Kekic sees greater hope for open-source medical information tools such as Meditron, in which both the data sources and language processing methods are public and available to be improved continually.

“This is a system that uses PubMed abstracts, medical guidelines and other valid sources to build its model, and we know exactly what goes into it,” Dr. Kekic said. Open systems such as Meditron will probably provide better results than a closed system such as ChatGPT, if they are not doing so already, she explained.

Similar Findings

Research cited in a recent New York Times article indicates that chatbots invent information at a troubling rate, posing serious concerns for applications in legal, medical and business settings (bit.ly/3vieTpy). The frequency of erroneous information, which AI researchers call “hallucinations,” varied among the AI companies tested. OpenAI had the lowest rate, at approximately 3%. AI systems from Meta approached 5%. The Claude 2 AI system developed by Anthropic, an OpenAI rival, exceeded 8%. A Google system, PaLM chat, had the highest rate, at 27%.

Vectara, founded by former Google employees, conducted the research assessing AI accuracy. The company found that even in situations designed to prevent it from happening, chatbots regularly invent information.

ChatGPT hit several accuracy potholes in 2023. In March, the AI program cited a half-dozen fake court cases in a 10-page legal brief that a lawyer then submitted to a federal judge in Manhattan.


The sources reported no relevant financial disclosures.