Large language models, explained with a minimum of math and jargon

sizeoftheuniverse ( @sizeoftheuniverse@programming.dev ) · 1 year ago

redcalcium ( @redcalcium@lemmy.institute ) · 1 year ago

We love this example because it illustrates just how difficult it will be to fully understand LLMs. The five-member Redwood team published a 25-page paper explaining how they identified and validated these attention heads. Yet even after they did all that work, we are still far from having a comprehensive explanation for why GPT-2 decided to predict Mary as the next word.

The current approach to ML model development has the same vibe as people writing a block of code that somehow works and then adding a comment like "no idea why but it works, modify at your own risk".