Large Language Models Are Few-Shot Health Learners (arxiv.org)
I find the healthcare applications of this stuff so interesting.
On the one hand, there are SO many reasons using LLMs to help people make health decisions should be an utterly terrible idea, to the point of immorality:
- They hallucinate
- They can't do mathematical calculations
- They're incredibly good at being convincing, no matter what junk they are outputting
And yet, despite being very aware of these limitations, I've already found myself using them for medical advice (for pets so far, not yet for humans). And the advice I got seemed useful, and helped kick off additional research and useful conversations with veterinary staff.
Plenty of people have very limited access to useful medical advice.
There are plenty of medical topics which people find embarrassing, and would prefer to - at least initially - talk to a chatbot than to their own doctor.
Do the benefits outweigh the risks? As with pretty much every ethical question involving LLMs, there are no obviously correct answers here.
Whatever else its ills, the bot actually will pay attention to the tokens you're submitting to it to formulate its answer. That puts it well ahead of a majority of the doctors I've seen over the years.
I say this without snark: it is simply true. I should also mention that a good quarter of the medical care folks who have assisted me have gone above and beyond in exceptional ways. It is a field of extremes.
Most doctors/vets I've seen recently are just massively overbooked. You wait 3 hours, then you have 4 minutes of conversation time for one (out of multiple) ailments before you're booted out the door. It's like you're on an assembly line and the workers can't even keep up.
Why are you waiting 3 hours? Are you going to an urgent care or ER?
Well...
- I waited ~8 hours in an ER with my mother, who had horrible gut pain that turned out to be a ruptured appendix, before they finally took her in, with an urgent emergency referral from an urgent care center we visited earlier.
- I waited 2 hours in a specialist's office waiting room, and I arrived on time. No explanation...
- We waited about an hour and a half at the vet, in a room by ourselves with our dog. Again, we were on time, but they were apologetic when the vet finally came in.
These are the more extraordinary circumstances, but definitely not the only ones (especially at the vet).
ERs are designed to handle emergencies but people use them for many other reasons. From ignorance, to lack of money, to just not having a PCP. Another big one is that for low income families with government healthcare, taking your kid to the ER on Sunday afternoon with a runny nose doesn’t cost any more than waiting till Monday morning to see their pediatrician.
But the primary bottleneck at an ER is usually not a lack of physicians. It's more often a lack of rooms and/or nurses because patients are being boarded there, or are just still there waiting on labs.
And waiting 2 hours with an appointment at an office is definitely not the norm.
From the time that my appendix ruptured (after being told at the GP that it was a virus and I should go home) to the time that I had my first operation of three, they waited 38 hours. This was in the UK, 5ish years ago.
Most people cannot see a doctor for weeks. So after that it's urgent care or the ER.
You can generally get a sick visit with your PCP measured in days not weeks if you have one. Usually just a few days.
The OP wrote something about multiple ailments which implied non-emergent conditions.
I mean we book appointments many weeks ahead. That is to be expected.
Attention is all you need, doctors.
> they can't do mathematical calculations
Tell me you never taught service courses for pre-meds without telling me you never taught service courses for pre-meds ;)
> They hallucinate, They're incredibly good at being convincing, no matter what junk they are outputting
Describes about a third of the doctors I've interacted with, tbh.
> And the advice I got seemed useful, and helped kick off additional research and useful conversations with veterinary staff.
It's similar to "Dr. Google". Possible to misuse. But also, there's nothing magical about the medical guild initiation process. Lots of people are smart enough to learn and understand the bits of knowledge they need to accurately self-diagnose and understand tradeoffs of treatment options, then use a medical professional as a consultant to fill in the gaps and validate mental models.
Unfortunately, most medical professionals aren't willing to engage with patients in that mode and would rather misdiagnose than work with an educated patient. (My brother-in-law -- a medical doctor, and a fairly accomplished one at that -- has been chided for using "Dr Google" at an urgent care before.)
> Do the benefits outweigh the risks? As with pretty much every ethical question involving LLMs, there are no obviously correct answers here.
At the end of the day, it doesn't matter. At least in the US, you won't have access to any meaningful treatment without going through the guild anyways.
I don't think that using LLMs for medical diagnosis is a good idea, but it's important to admit when the status quo is so thoroughly hollowed out of any moral or practical justification that even terrible ideas are better than the alternative of leaving things as they are.
> Lots of people are smart enough to learn and understand the bits of knowledge they need to accurately self-diagnose and understand tradeoffs of treatment options, then use a medical professional as a consultant to fill in the gaps and validate mental models.
This is incredibly dangerous. Lots of people are smart enough to research questions about their condition/care to discuss with their medical professional, but they should absolutely not be self-diagnosing. It is very reasonable to ask "I read about X, what do you think?" but you should not be self-diagnosing anything (and, by the way, even physicians cannot do this for themselves).
This is like saying lots of doctors are smart enough to learn and understand the bits of knowledge they need to accurately train LLMs and put them in charge of [life threatening system].
> But also, there's nothing magical about the medical guild initiation process.
You're right, it's not magical. It's just 10+ years of medical training.
Doctors may have 10 years of medical training but they have very little time to apply that knowledge to any particular patient.
If you come to a doctor’s appointment with zero research then you will not be able to push back if your doctor attempts to misdiagnose you. It will be a unidirectional conversation.
If you have prepared for your appointment then the following conversation is more likely to happen:
Patient: I have symptoms X and Y
Doctor: You probably have condition A
Patient: But I don’t have Z, is it really likely that I have A?
Doctor: It’s also possible that you have condition B
In a perfect world, patients would get hour long appointments and doctors would explore the entire fault tree. For rich people this may actually be reality. But for us proles, every minute we get with a doctor is precious so we’d better study up so we can use them as medical oracles.
As stated, being informed is encouraged. Self-diagnosis is not for anyone to do.
I think another issue here is your expectations out of a medical visit may be unrealistic. Physicians aren’t supposed to arrive at the correct diagnosis from the initial visit (for most things). We start with a suspected diagnosis and differential and refine it with investigations and multiple visits for temporality/evolution.
Note that in your hypothetical that probably and possible are not mutually exclusive. It’s entirely possible patient A’s right upper quadrant pain is a gallbladder cancer but it is also probably gallstones even if you tell me the pain isn’t triggered by fatty meals. Just because a preliminary diagnosis is stated as probable it doesn’t mean other potential causes aren’t being simultaneously investigated with that ultrasound. I also don’t need to be telling the patient about all of the potential possibilities from the get go as it may cause anxiety, this is a patient-specific judgement call.
> In a perfect world, patients would get hour long appointments and doctors would explore the entire fault tree.
Honestly, outside of counseling type visits or complex oncology I’m not sure what I would spend an hour talking about. Why do you feel we need to explore the entire fault tree in a single visit with missing investigations?
As a hypothetical: 50 y/o male patient comes in with first time rectal bleeding, I’ll ask a few questions and perform a physical exam but regardless of the fault tree or why this happened, this patient is getting a colonoscopy. Until we’ve excluded cancer and inflammatory bowel disease further discussion is moot.
You forgot to explain why it is so dangerous for people to self diagnose
The human body is way more complex than you think, even if you take this warning into account. Being confidently wrong about your own health based on random tidbits you know and ignoring the vast amount of knowledge you don't have is incredibly dangerous.
You are arguing against a strawman of your own making
No, I'm telling you why it's dangerous for people who don't have training and a decade's worth of education to self-diagnose. You're just deflecting because you don't like how obvious the answer is.
I assumed it was obvious like “only a fool has himself as a lawyer.”
Would you do your own code review?
It’s impossible to be objective regarding your own health. It’s an ethics violation and sanctionable for physicians to do so for themselves.
Yes, I review my own code all the time. Right before commit, I read through the diff carefully. Then of course my team reviews the code further.
The same approach works for my health: I MUST review and evaluate my health, because it's just not reasonable to expect every single human in the world to go to a doctor every other week. If I come to suspect I have a serious illness, I take it to the next level of review: a doctor. You are painting a very dogmatic, black-and-white picture that cannot include this kind of nuanced approach.
You’re arguing over semantics and seem to be focusing on minor ailments which is obviously not the point I was making.
Evaluating your health =/= reaching a diagnosis (or self-diagnosis). By all means, you should be conducting self-assessments and patients can absolutely diagnose/manage minor ailments. No one is suggesting you need to see a doctor for every ache, cold, fever or headache.
Part of our job in most patient encounters is providing education on when to escalate care/return for reassessment so you are clearly not expected to go to a doctor every other week.
What is dangerous is, as in the rectal bleeding example I gave, that one may Google their symptoms and “self-diagnose” hemorrhoids, missing (consciously or subconsciously) that concurrent colon cancer is not uncommon (especially these days) and that they should be seeing a doctor to assess their risk and plan further investigations.
This is a recent example that happened to a young physician, whose delay in seeking care upstaged their cancer to stage IV.
> You are painting a very dogmatic, black-and-white picture that cannot include this kind of nuanced approach
Not really, I’m obviously speaking generally on a message board and not writing a position statement. I was also clearly talking in the context of potentially serious symptoms.
> Then of course my team reviews the code further.
This being the operative part of that. I would hope no one is pushing unreviewed commits to a production environment which is essentially what self-diagnosis is, except to your body.
With the wrong doctors it's downright dangerous to see a doctor.
I'd be happy with summarizing and aggregating of health and longevity articles/papers to have a concise digest of strategies.
Case in point, I'm a big fan of Andrew Huberman (https://www.youtube.com/@hubermanlab). He's quite prolific and his presentations pack a lot of data. Just taking all of that in would require a lot of time. Being able to have it condensed and indexed would be wonderful.
Plenty of others like him (e.g., Rhonda Patrick, Peter Attia, etc.). High-quality stuff, but there's literally not enough time to take all of it in.
However useful his advice might be, Andrew Huberman isn't a doctor (of medicine).
Summarizing academic research is almost entirely unrelated to the practice of medicine. Medical diagnosis and treatment are different from more typical uses of LLMs in lots of important ways.
Yeah, I was kind of tangential there, but it's strongly related.
Health and longevity is addressed from the other side of medicine. For example, a doctor could diagnose and prescribe medicine for one's type 2 diabetes, but in many of those cases that need is removed by following healthy practices (e.g., not being fat).
But back to the OP -- it seems like well-crafted LLMs could be idiot-savant helpers to guide doctors and ease their load.
> Summarizing academic research is almost entirely unrelated to the practice of medicine.
Do you mean basic science research? Evaluating academic medical research is considered a core competency for physicians.
https://www.royalcollege.ca/ca/en/canmeds/canmeds-framework/...
> Do you mean basic science research?
No, I mean actual diagnosis and treatment.
> Evaluating academic medical research is considered a core competency for physicians.
But it's a very different activity from diagnosis and treatment, which look much more like sequential decision-making and hypothesis-testing than like question-answering.
Those bullet points make LLMs sound just like most human doctors.
It seems to me that you could hardcode the answers to these riddles: exactly match symptoms with illnesses, sort them by likelihood, and then propose (again hardcoded) tests to gather further data.
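As a rough sketch of that hardcoded approach (every condition, symptom, prior, and test below is invented purely for illustration, not taken from any medical source):

```python
# Toy sketch: match reported symptoms against a fixed symptom->illness table,
# rank the illnesses by a crude likelihood score, and attach pre-written
# follow-up test proposals. All entries here are invented examples.

ILLNESSES = {
    "common cold": {"symptoms": {"runny nose", "cough", "sore throat"},
                    "prior": 0.30, "tests": ["none; recheck in a week"]},
    "influenza": {"symptoms": {"fever", "cough", "body aches"},
                  "prior": 0.10, "tests": ["rapid flu swab"]},
    "appendicitis": {"symptoms": {"abdominal pain", "fever", "nausea"},
                     "prior": 0.01, "tests": ["abdominal ultrasound", "CBC"]},
}

def rank(reported):
    """Return (illness, score, proposed tests) sorted by descending score."""
    scored = []
    for name, info in ILLNESSES.items():
        overlap = len(reported & info["symptoms"]) / len(info["symptoms"])
        score = overlap * info["prior"]  # crude stand-in for "likelihood"
        if score > 0:
            scored.append((name, score, info["tests"]))
    return sorted(scored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    for name, score, tests in rank({"fever", "cough", "runny nose"}):
        print(f"{name:12s} score={score:.3f}  proposed tests: {tests}")
```

Of course, the hard part is that real presentations rarely match a fixed table this cleanly, which is presumably where an LLM's flexibility with messy natural-language descriptions comes in.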
It also seems capable of anonymizing a large chunk of medical data that we would not want to share normally. Who knows, perhaps it could even be a means of payment.
I have been saying this for months about deep learning in general (and now the new hype around LLMs) in high-risk situations such as medical, legal, and financial advice, and even transportation. The only common use case which makes sense is summarization, and even then a human expert ends up reviewing the output before posting it anyway.
> There are plenty of medical topics which people find embarrassing, and would prefer to - at least initially - talk to a chatbot than to their own doctor.
I don't think you would trust an AI chatbot alone to tell you how many pills of a medication to take instead of going to a human doctor, especially when these AI models risk hallucinating terrible advice and their output is unexplainable and as transparent as a black box. The same goes for 'full self-driving'.
I don't think one would trust these deep learning-based AI systems in very high-risk situations unless they are highly transparent and can thoroughly explain themselves rather than regurgitating what they have been trained on.
It is like trusting an AI to pilot a Boeing 737 Max end-to-end with zero human pilots on board. No one would board a plane that has a black-box AI piloting it. (Autopilot is not the same thing.)
People take pills from known criminals, with high risk of fentanyl OD, just for fun.
Yes, I think people would indeed take pills prescribed by AI, just make it a robot wearing a lab coat.
Also pilots! I mean, pilots kill themselves and a planeload of people more often than you think. Of course people would accept a black-box AI that works.
The main difference is that humans can be held accountable for these things, whereas an AI system cannot be held accountable because it is not a human.
Accepting unchecked AI systems at scale as the future is plain fantasy in the view of regulators, especially in very high-risk industries, which is why it makes no sense for anyone to trust these systems without human oversight.
At least for legal there is far more potential than just summarization. Harvey is already producing legal documents with error rates lower than humans.
> Harvey is already producing legal documents with error rates lower than humans.
It is mostly useful and safer for human legal professionals and experts, since they have the expertise to check the output, but risky and unsafe for those with little to no legal knowledge.
A user who is a non-legal expert could get into serious trouble if the AI hallucinates output that is contradictory or harms them legally more than it helps them or even both. That is the evergreen risk.
Either way, someone will have to check over the AI's output for that risk, and that is for human legal professionals to do, hence why those with no legal expertise still pay human lawyers to check these legal documents.
I agree, I've found combining LLMs with Google has worked well for research. I use it for all sorts of random things, usually starting with search, then hopping to ChatGPT or Bard when I can't understand the results, then back to search when I know what to look for again.
> found myself using them for medical advice (for pets so far, not yet for humans)
Which model did you use?
GPT-4
I like running "can my dog eat avocado?" through smaller models to see what happens.
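For anyone who wants to poke at the same question with a small local model, a minimal sketch with the Hugging Face transformers text-generation pipeline could look like this (the model name is just an arbitrary small example, not one mentioned above):

```python
# Minimal sketch: ask a small local model the dog/avocado question and
# print whatever it generates. Output quality will vary wildly by model.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")  # arbitrary small model

result = generator(
    "Question: Can my dog eat avocado?\nAnswer:",
    max_new_tokens=40,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```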
For parrots, definitely not. Avocado will kill them.
At this point is it fair to ask: in which spaces that we describe with natural language have LLMs not been few-shot learners?
What is a "learner"?
It seems that time and time again, transformers are the swiss army knife of learning systems. And specifically LLMs are proving to be like chameleons. In some ways that shouldn't be surprising. Some say that math is a universal language after all, and we seem to agree that math is unreasonably effective at describing reality.
Do you reckon there's pharma people right now wondering how to make LLMs push their drugs?
Take fine-tuning trainers to "conferences", perhaps?
Will they try to make their own?
What a next few years...
Definitely ongoing right now
I find ChatGPT to be very helpful for working with programming languages that I'm less comfortable using (shell, Python). I know enough to evaluate correct code in these languages, but producing it from scratch is more difficult, which seems like a sweet spot for carefully using ChatGPT for code.
As a physician, I would not be surprised if the medical use of these tools ends up having similar value.
I think the key here is that experts can take better advantage of tools like these because they have more ability to see when it's going off the rails. If you're a brand new programmer, you might be stumped if ChatGPT "hallucinates" a function which doesn't exist within an API. But an experienced developer can pick up on the problem pretty quickly and either correct for it or know they need to pursue more traditional routes to solve the problem.
I recently used ChatGPT because my Google searches were failing to help me remember the name of the standard for securely sharing passwords between systems. My searches kept turning up end-user password management topics. ChatGPT got me to SCIM after one question and one correction.
I could absolutely see a doctor using something like ChatGPT to supplement their memory the way I did. I don't think anyone recommends that doctors just trust ChatGPT, but rather that they use it as a supplementary tool for their own expertise. Even if it's outside of their specific medical domain, it could help them get a basis for having a conversation with one of their specialist colleagues.
Unlike with search or random chatbots, the bar to beat in medicine (UpToDate) is much higher; when that happens I agree (in terms of diagnosis and management).