Can GPT-4 Be a Saviour in the Medical Field?

Last updated August 14, 2023

GPT-4’s capabilities make it a suitable player to assist in healthcare. However, is it completely reliable?


Published on August 14, 2023

by Vandana Nair


While OpenAI’s models have made their way into almost every domain, there is one field where LLMs, if used correctly, could have the greatest impact by directly affecting lives: medicine. Earlier this year, ChatGPT reportedly cleared all three parts of the United States Medical Licensing Examination (USMLE), and we even saw ChatGPT help save a dog’s life through an accurate diagnosis. However, we have not yet seen many practical applications in the medical field. Do GPT-4’s capabilities make it a suitable player there?

Massive Potential

In March this year, OpenAI and Microsoft released a paper titled “Capabilities of GPT-4 on Medical Challenge Problems”. In it, GPT-4 showed impressive language understanding and generation abilities in medicine. The study evaluates GPT-4’s performance on medical competency exams and benchmark datasets, even though the model was not specialised for medicine.

The researchers assessed GPT-4 on official USMLE practice materials and on the MultiMedQA benchmark datasets. GPT-4 exceeded the USMLE passing score by over 20 points, outperforming earlier models (including GPT-3.5) and even models fine-tuned on medical knowledge. GPT-4 also shows improved probability calibration: the confidence it assigns to an answer more closely matches how often such answers are actually correct, which matters when deciding how far to trust a given response. The study further explores how GPT-4 can explain its medical reasoning, customise explanations and create hypothetical scenarios, showcasing its potential for medical education and practice. The findings highlight GPT-4’s capabilities while acknowledging challenges around accuracy and safety in real-world applications.
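To make “calibration” concrete, here is a minimal sketch of one common way to measure it, equal-width binned expected calibration error (ECE). This article does not describe the exact method the study used, so treat this as a generic illustration, not the paper’s procedure. The function name `expected_calibration_error`, the 10-bin default and the toy numbers are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width binned expected calibration error (ECE).

    Hypothetical helper for illustration. Groups predictions by stated
    confidence, then averages the gap between each bin's mean confidence
    and its observed accuracy, weighted by how many predictions fall into
    that bin. 0.0 means perfectly calibrated.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, ok in zip(confidences, correct):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence {p} is outside [0, 1]")
        # Bin index for p; a confidence of exactly 1.0 goes into the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(p for p, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Invented toy data: an overconfident model that always reports 90%
    # confidence but is right on only 6 of 10 questions.
    overconfident = expected_calibration_error([0.9] * 10, [True] * 6 + [False] * 4)
    # A model that reports 80% confidence and is right on 4 of 5 questions.
    calibrated = expected_calibration_error([0.8] * 5, [True] * 4 + [False])
    print(f"overconfident ECE: {overconfident:.2f}")
    print(f"calibrated ECE:    {calibrated:.2f}")
```

Run as a script, this reports an ECE of about 0.30 for the overconfident toy model and about 0.00 for the calibrated one. A lower ECE means a model’s stated confidence is a more trustworthy signal, which is the property the study credits GPT-4 with improving; it says nothing on its own about how often the model is right.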

Compared with its predecessors, GPT-4 does much better on official medical exams such as the USMLE, improving on GPT-3.5 by more than 30 percentage points. While GPT-3.5 came close to the passing threshold (answering about 60% of multiple-choice questions correctly), GPT-4 cleared it by a wide margin.

Alignment and Safety In Place

When an earlier version of GPT-4, referred to as the base model, was compared with the released GPT-4, the base model performed slightly better, by about 3-5%, on some of the tests. This suggests that making the model safer and better at following instructions may have cost it a little raw performance. The researchers suggested that future work could look for a better balance between accuracy and safety, for example by refining the training process or by using specialised medical data.

Where does Med-PaLM fit in?

The research above did not compare GPT-4 with models such as Med-PaLM and Flan-PaLM 540B, as those models were not publicly available at the time of the study.

Google recently unveiled its multimodal healthcare LLM, Med-PaLM M, a large multimodal generative model that encodes and interprets biomedical data. Its scope is broader than that of GPT-4 as currently available: it can handle several types of medical data, such as clinical language, medical images and genomics, and can perform a wide range of tasks. The model can generalise to new medical tasks and perform multimodal reasoning without task-specific training, and it can recognise and explain medical conditions in images using only natural-language instructions and prompts.

Never Fool-Proof

Meanwhile, GPT-4’s applications are not as diverse as Med-PaLM M’s. Although GPT-4 was announced with multimodal features, these are not yet available to users. There have also been negative findings on GPT-4’s capabilities in medical diagnosis: some of its outputs were problematic and biased, raising concerns that its tendency to embed societal biases may limit its suitability for aiding clinical decisions.

Hallucination also remains a problem, with GPT-4 still producing incorrect information, including fabricated medical citations: in one study, over 20% of the medical journal articles it cited turned out to be fake.

21% of medical journal articles cited by GPT-4 were found to be fake; GPT-3.5 cited an estimated 98% fake articles. Narrower topics had more fake articles than broader topics. Despite its promise, ChatGPT is currently not a reliable source of medical data. https://t.co/DCTIkT1OkZ
— JAMA Network Open (@JAMANetworkOpen) August 9, 2023

While GPT-4 may not yet be reliable enough to assist with diagnosis, there are other tasks where it can help. Hospitals are looking to AI to relieve doctor burnout, and applications that write notes for electronic health records or draft empathetic messages to patients can smooth the process. Transcribing conversations between doctors and patients, then turning them into a physician’s summary for the electronic health record, is one of the strongest use cases in the medical field. Given its current limitations, however, GPT-4 still has a long way to go before it can be fully adopted in medicine.


Vandana Nair

With a rare blend of engineering, MBA and journalism degrees, Vandana Nair brings a unique combination of technical know-how, business acumen and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses and AI technologies ensures that there’s always a fresh and insightful perspective to her reporting.
