•  snooggums   ( @snooggums@midwest.social ) 
    English
    6
    5 months ago

    They did not use an LLM.

    In a recent experiment, they set out to determine how reliable LMMs are in medical diagnosis — asking both general and more specific diagnostic questions — as well as whether models were even being evaluated correctly for medical purposes.

    Curating a new dataset and asking state-of-the-art models questions about X-rays, MRIs and CT scans of human abdomens, brain, spine and chests, they discovered “alarming” drops in performance.