large language models (LLM) vs. large multi-modal models (LMM)
Regardless, they both use an LLM as the main driver. Multimodal just means that the LLM is interfaced with generative and/or predictive AI models for other types of content like images, sound, video, etc.
This is using a generalist tool for a specialized job. I’d expect the limit for LMMs to be telling you whether your picture is a heart or a kidney… Maybe. With low accuracy. Diagnosing? lol, hell no.
You’ve quoted them stating they used LLMs while claiming they did not use an LLM? What am I missing here?
“L” “M” “M”
Which in this context just means multimodal LLM, correct?
Correct.