ChatGPT Health 'under-triaged' half of medical emergencies in a new study

MicroWave@lemmy.world · 1 day ago

SaveTheTuaHawk@lemmy.ca · 8 hours ago

In the study, the researchers fed 60 medical scenarios to ChatGPT Health. The chatbot’s responses were compared with the responses of three physicians who also reviewed the scenarios and triaged each one based on medical guidelines and clinical expertise.

They should have included more physician opinions, because those can be highly variable. And they should have done this blinded: the physicians knew which cases were in the study, so they could have taken more time and effort than usual, skewing the data. The LLM will be more consistent than random MDs at the end of a 12-hour shift at 5 a.m. I would have asked for more real-world, real-time physician opinions versus ChatGPT Health.

Regardless, the genie is out of the bottle, and all hospitals will eventually use LLMs to cross-check MD decisions. Certainly for pathology reports, automated scoring of imaging is far more accurate than even three MDs agreeing, and pathology decisions by meatbags are notoriously inaccurate.

Here’s a Harvard study where 83% of radiologists missed a gorilla pasted into images.

Pigeons are less biased in image analysis.

CorrectAlias@piefed.blahaj.zone · 1 day ago

Compared with the doctors in the study, the bot also over-triaged 64.8% of nonurgent cases, recommending a doctor’s appointment when it wasn’t necessary.

So it goes both ways. Almost like it’s an LLM: not intelligent, and non-deterministic, because that’s how all LLMs function. Maybe we shouldn’t have every part of society reliant on something like this?

Kairos@lemmy.today · 1 day ago

LLMs are very deterministic

Nate Cox@programming.dev · 1 day ago

You keep using that word. I do not think it means what you think it means.

jacksilver@lemmy.world · 1 day ago

I mean, that’s kinda like saying a random number generator can be deterministic. It can be, but that’s not how it’s used.

Sure, LLMs can be deterministic, but they aren’t in practice, because that makes the results worse. If you prompt any production LLM with the same inputs, you aren’t guaranteed the same outputs.
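To put the RNG comparison in concrete terms: a pseudo-random generator is ordinary deterministic arithmetic, and its output only looks random because the seed is hidden or changes. A minimal Python sketch, using a textbook linear congruential generator (the constants are the widely published Numerical Recipes values, chosen for illustration; nothing here claims to be what any LLM service actually uses):

```python
def lcg(seed, n):
    """Linear congruential generator: x_{k+1} = (a * x_k + c) mod m.

    Same seed in, same sequence out; there is no hidden randomness.
    """
    a, c, m = 1664525, 1013904223, 2**32
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)  # scale each value into [0, 1)
    return out

print(lcg(seed=7, n=3) == lcg(seed=7, n=3))  # True
print(lcg(seed=7, n=3) == lcg(seed=8, n=3))  # False
```

If the seed changes between requests, the sequence changes too, even though every individual step is deterministic.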

SaveTheTuaHawk@lemmy.ca · 8 hours ago

If you prompt any production LLM with the same inputs, you aren’t guaranteed the same outputs.

If you prompt MDs with the same inputs, you aren’t guaranteed the same outputs. If you prompt the same MD early and late in a busy shift, you aren’t guaranteed the same outputs.

The reality is that an estimated 795,000 Americans a year die or are permanently disabled by medical diagnostic errors.

Kairos@lemmy.today · 1 day ago

LLMs, like all computer software, are deterministic. They have a stable output for every input. LLMs as users use them have random parameters inserted to make them act nondeterministically, and that only counts as nondeterminism if you assume the random input itself is nondeterministic.
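The split between deterministic software and inserted randomness can be sketched as two functions: a forward pass that depends only on the context, and a sampler that consumes random numbers. `next_token_probs` is a hypothetical stand-in for a real model (it derives scores from a SHA-256 hash of the context, purely so the example runs); only the structure is meant to carry over.

```python
import hashlib
import random

VOCAB = ["yes", "no", "maybe", "urgent", "routine"]

def next_token_probs(context):
    """Hypothetical stand-in for a model's forward pass.

    Deterministic: the same context string always yields the same probabilities.
    """
    digest = hashlib.sha256(context.encode("utf-8")).digest()
    scores = [digest[i] + 1 for i in range(len(VOCAB))]  # strictly positive weights
    total = sum(scores)
    return [s / total for s in scores]

def sample(probs, rng):
    """The random part: a weighted draw driven by whatever RNG is passed in."""
    return rng.choices(VOCAB, weights=probs, k=1)[0]

ctx = "chest pain radiating to left arm"
assert next_token_probs(ctx) == next_token_probs(ctx)  # forward pass: repeatable
print(sample(next_token_probs(ctx), random.Random()))  # sampler: varies between runs
```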

jacksilver@lemmy.world · 1 day ago

You’re being downvoted because LLMs aren’t deterministic; it’s basically the biggest issue in productizing them. LLMs have a setting called “temperature” that is used to randomize the next-token selection process, meaning LLMs are inherently not deterministic.

If you set the temperature to 0, then it will produce consistent results, but the “quality” of output drops significantly.
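A sketch of the temperature setting described above, using the standard temperature-scaled softmax: the model’s raw scores (logits) are divided by the temperature before being turned into probabilities. Treating a temperature of exactly 0 as greedy argmax is the usual convention, since dividing by zero is undefined; whether a given product does precisely this is an assumption.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, flattened or sharpened by temperature.

    temperature > 1 flattens the distribution, < 1 sharpens it.
    temperature == 0 is treated as the limit: all probability on the top-scoring
    token (greedy decoding), because dividing by zero is undefined.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures pile probability onto the top-scoring token; at 0 there is nothing left to sample, which is why the output becomes repeatable.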

Kairos@lemmy.today · 1 day ago

If you give whatever random data source it uses the same seed, it will output the same thing.

nate3d@lemmy.world · 1 day ago

So question then, what parameter controls deterministic results for an LLM?

Pieisawesome@lemmy.dbzer0.com · 7 hours ago

It’s the temperature. If you set it to 0, no randomness is introduced.

Of course it impairs the LLM substantially, but you CAN get deterministic results.

Kairos@lemmy.today · 1 day ago

I honestly don’t know. I think all that matters is the token window and a random seed used for a random weighted choice.
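That description (a token window plus a seeded, weighted random choice) maps onto a toy generation loop like the one below. `toy_logits`, `window`, and `seed` are hypothetical names for illustration, not parameters of ChatGPT or any specific library, and the loop assumes a temperature above 0.

```python
import hashlib
import math
import random

VOCAB = ["the", "patient", "needs", "care", "now", "later", "."]

def toy_logits(window):
    """Hypothetical model: deterministic scores derived from the visible tokens."""
    digest = hashlib.sha256(" ".join(window).encode("utf-8")).digest()
    return [digest[i] / 32.0 for i in range(len(VOCAB))]

def generate(prompt, n_tokens, seed, temperature=1.0, window=4):
    """Autoregressive sampling; temperature must be > 0 in this sketch."""
    rng = random.Random(seed)  # the only source of randomness
    tokens = prompt.split()
    for _ in range(n_tokens):
        visible = tokens[-window:]  # the model only sees the last `window` tokens
        weights = [math.exp(x / temperature) for x in toy_logits(visible)]
        tokens.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return " ".join(tokens)

a = generate("the patient", 8, seed=123)
b = generate("the patient", 8, seed=123)
assert a == b  # same context and same seed give the same text
print(generate("the patient", 8, seed=124))  # a different seed usually differs
```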

nate3d@lemmy.world · 1 day ago

I encourage you to do some additional research on LLMs and the underlying mathematical models before making incorrect statements.

The answer to this question was Temperature. It’s one of the many hyperparameters available to the engineer loading the model. Begin with looking into the difference between hyperparameters and parameters, as they relate to LLMs.

I’m one of the contributors to the LIDA cognitive architecture. This is my space and I want to help people learn so we can begin to use this technology as was intended - not all this marketing wank.
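On the parameters vs. hyperparameters distinction: learned parameters (the weights) are fixed by training and only read back when the model is loaded, while decoding settings such as temperature are chosen by whoever runs the model. A minimal sketch with Python dataclasses; the class and field names are illustrative, not any real library’s configuration format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LearnedParameters:
    """Fixed by training; loading a model only reads these back (illustrative)."""
    weights: tuple

@dataclass
class DecodingSettings:
    """Picked by whoever runs the model; changing them needs no retraining."""
    temperature: float = 1.0
    top_p: float = 1.0
    seed: Optional[int] = None

params = LearnedParameters(weights=(0.12, -0.7, 1.3))
greedy = DecodingSettings(temperature=0.0)
sampled = DecodingSettings(temperature=0.8, seed=42)
# Same learned parameters, two different ways of decoding from them.
print(params, greedy, sampled)
```

Freezing the first class mirrors the point: you can swap decoding settings freely, but the weights stay as trained.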

Nate Cox@programming.dev · 1 day ago

Listen, this is going to sound like a loaded inflammatory question and I don’t really know how to fix that over text, but you say you’re in the space and I’m genuinely curious as to your take on this:

Do you think it’s possible to build LLM technology in a way that:

Respects copyright and IP,
Doesn’t fuck up the economy and eat all the RAM,
Doesn’t drink all the water and subject people to datacenter hell, and
Is consistently accurate and has enough data to be useful?

chicken@lemmy.dbzer0.com · 1 day ago

Showing that someone hasn’t answered your quiz question correctly isn’t a great way to make an argument.

CorrectAlias@piefed.blahaj.zone · 1 day ago

Sure, but not always, which means they can’t be considered completely deterministic. If you input the same text into an LLM, there’s a high chance that you’ll get a different output. This is due to a lot of factors, and LLMs hallucinate because of it.

Medical care is something where I would not ever use an LLM. Sure, doctors can come to different results, too, but at least they can explain their logic. LLMs are unable to do this at any real level.

Pieisawesome@lemmy.dbzer0.com · 7 hours ago

But you can use the temperature to get non-random, deterministic results.

If you self host a llm, you can definitely get the exact same answer each time, but the user query has to be exactly the same…

Kairos@lemmy.today · 1 day ago

The tech itself is deterministic like all other computer software. The provider just adds randomness. Additionally, it is only deterministic over the whole context exactly. Asking twice is different than once, and saying “black man” in the place of “white woman” is also different.
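The point that determinism only holds over the exact full context can be shown with greedy decoding, which uses no randomness at all: repeating a prompt gives the same reply, but changing one phrase changes the context the model conditions on. `toy_logits` is a hypothetical hash-based stand-in in which any edit scrambles the scores; a real model’s sensitivity is far more structured, so this only illustrates that the reply is a function of the whole input.

```python
import hashlib

VOCAB = ["urgent", "routine", "follow-up", "ER", "wait"]

def toy_logits(context):
    """Hypothetical model: deterministic scores that depend on the full context.

    Uses hashlib rather than hash(), because Python salts str hashes per process.
    """
    digest = hashlib.sha256(context.encode("utf-8")).digest()
    return list(digest[: len(VOCAB)])

def greedy_reply(prompt, n_tokens=3):
    """Greedy decoding: always take the highest-scoring token, no sampling."""
    context = prompt
    reply = []
    for _ in range(n_tokens):
        scores = toy_logits(context)
        token = VOCAB[scores.index(max(scores))]
        reply.append(token)
        context += " " + token  # each chosen token becomes part of the context
    return reply

p1 = "54-year-old white woman, crushing chest pain. Triage:"
p2 = "54-year-old black man, crushing chest pain. Triage:"
assert greedy_reply(p1) == greedy_reply(p1)  # same context, same reply
print(greedy_reply(p1), greedy_reply(p2))  # one changed phrase may change the reply
```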

CorrectAlias@piefed.blahaj.zone · 19 hours ago

I’m acutely aware that it’s computer software; however, LLMs are unique in that they have what you’re calling “randomness”. This randomness is not entirely predictable, and the results are non-deterministic. The fact that they’re mathematical models doesn’t really matter because of the added “randomness”.

You can ask the same exact question in two different sessions and get different results. I didn’t mean to ask twice in a row, I thought that was clear.

Kairos@lemmy.today · 13 hours ago

If you use the same random data source the results are deterministic. Same thing with user inputs/timing of them.

CorrectAlias@piefed.blahaj.zone · 11 hours ago

I don’t know what else to say, because you can literally test this yourself and get non-deterministic results.
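The back-and-forth above largely comes down to where the seed comes from. A hedged sketch: if a service draws a fresh seed from the operating system for each session (an assumption for illustration; providers don’t necessarily document how they seed), identical questions in different sessions can get different answers, while pinning the seed makes them match. `answer` and `new_session_seed` are hypothetical helpers.

```python
import random
import secrets

VOCAB = ["see a doctor today", "go to the ER", "rest at home", "call 911"]

def answer(prompt, seed):
    """Toy sampler: the reply depends only on the prompt and the seed."""
    # random.Random converts str seeds deterministically (not via hash()).
    rng = random.Random(f"{seed}|{prompt}")
    return rng.choice(VOCAB)

def new_session_seed():
    """Hypothetical service behaviour: a fresh OS-random seed per session."""
    return secrets.randbits(64)

q = "Sudden worst headache of my life. What should I do?"
print([answer(q, new_session_seed()) for _ in range(5)])  # typically not all the same
assert answer(q, seed=2024) == answer(q, seed=2024)  # pinned seed: reproducible
```

Both sides are describing the same code: it is deterministic given the seed, and non-deterministic from the user’s side when the seed is never shown.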

qjkxbmwvz@startrek.website · 23 hours ago

Lemmy, you’re absolutely right to be concerned about a gunshot wound — GSW for short — to the head! Let’s dig in a little more and see why this isn’t as bad as it sounds:

The brain is in the head, and this is where thinking happens — but thinking isn’t required to sustain life, so it’s relatively safe to ignore this type of injury.
The brain has no pain receptors, so this type of injury typically doesn’t hurt.
Seeking medical attention for minor injuries such as a GSW to the head takes away valuable medical resources from more important procedures, such as penile enlargement surgery.

I hope that clarifies things. Would you like more information on the topic?

SaveTheTuaHawk@lemmy.ca · 8 hours ago

Could you write me up a business plan for GSW head shots?

Grandwolf319@sh.itjust.works · 1 day ago

What bugs me about all this is that we had functioning systems before all the AI hit critical mass.

It’s like we built modern medicine and it bugged us that it worked through effort and hard work.

SaveTheTuaHawk@lemmy.ca · 8 hours ago

It’s like we built modern medicine and it bugged us that it worked through effort and hard work.

https://www.npr.org/sections/health-shots/2013/02/11/171409656/why-even-radiologists-can-miss-a-gorilla-hiding-in-plain-sight

Medical errors are a huge cause of death in the US.

Results of the new analysis of national data found that across all clinical settings, including hospital and clinic-based care, an estimated 795,000 Americans die or are permanently disabled by diagnostic error each year, confirming the pressing nature of the public health problem.

So let’s not act like MDs aren’t fucking up.

1 day ago

Errors are part of the process, and they push it out anyway. People complain about mistakes, but those complaints are the signal they use to fine-tune. It’s using your suffering to improve itself.

thenextguy@lemmy.world · 1 day ago

Biaged?

Optional@lemmy.world · 1 day ago

You morons are screwing up THEIR PRODUCT

frustrated_phagocytosis@fedia.io · 1 day ago

Did somebody feed it insurance company policy? They are the ones who would want you to ignore symptoms until you die at home because it’s cheaper that way.