‘Embarrassingly simple’ probe finds AI in medical image diagnosis ‘worse than random’

ylai ( @ylai@lemmy.ml ) · 5 months ago

‘Embarrassingly simple’ probe finds AI in medical image diagnosis ‘worse than random’

howrar ( @howrar@lemmy.ca ) · 5 months ago

We have models that are specifically made to be good at these kinds of tasks. Why would you choose the ones that aren’t and then make generalizing claims about how AI sucks in this domain?

spaduf ( @spaduf@slrpnk.net ) · edit-2 5 months ago

Yeah this is probably just straight up misinformation. By no means is a diagnosis going to be made by a generalist multimodal LLM. Diagnosis is a literally a binary classification (although that is an oversimplification) and on medical CV you are optimizing on that directly.

snooggums ( @snooggums@midwest.social ) · edit-2 5 months ago

They did not use a LLM.

In a recent experiment, they set out to determine how reliable LMMs are in medical diagnosis — asking both general and more specific diagnostic questions — as well as whether models were even being evaluated correctly for medical purposes.

Curating a new dataset and asking state-of-the-art models questions about X-rays, MRIs and CT scans of human abdomens, brain, spine and chests, they discovered “alarming” drops in performance.

Thorry84 ( @Thorry84@feddit.nl ) · 5 months ago

You’ve quoted them stating they used LLMs while claiming they did not use a LLM? What am I missing here?

everett ( @everett@lemmy.ml ) · 5 months ago

What am I missing here?

“L” “M” “M”

spaduf ( @spaduf@slrpnk.net ) · 5 months ago

Which in this context just means multimodal LLM, correct?

blindsight ( @blindsight@beehaw.org ) · 5 months ago

Correct.

large language models (LLM) vs. large multi-modal models (LMM)

Regardless, they both use an LLM as the main driver. Multi modal just means that the LLM is interfaced with generative and/or predictive AIs for other types of content like images, sound, video, etc.

This is using a generalist tool for a specialized job. I’d expect the limit for LMMs is telling you if your picture is a heart or a kidney… Maybe. With low accuracy. Diagnosing? lol, hell no.