I asked Google Bard whether it thought Web Environment Integrity was a good or bad idea. Surprisingly, not only did it respond that it was a bad idea, it even went on to urge Google to drop the proposal.

    • That’s not entirely true.

      LLMs are trained to predict the next word given the context, yes. But in order to do that, they develop an internal model that minimizes error across a wide range of contexts, and an emergent feature of this process is that the model DOES perform more than pure compression of the training data.

      For example, GPT-3 is able to calculate addition and subtraction problems that didn’t appear in the training dataset. This would suggest that the model learned how to perform addition and subtraction, likely because it was easier or more efficient than storing all of the examples from the training data separately.

      This is a simple-to-measure example, but it’s enough to suggest that LLMs are able to extrapolate from the training data and do more than just stitch relevant parts of the dataset together.
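
      To make the "rule versus stored examples" point concrete, here is a toy sketch in Python. It is not how GPT-3 works internally; the sample size, the seed, and the function names are all made up for illustration. It only shows that a memorized table fails on unseen problems while a short carry rule covers all of them.

```python
import random

def add_with_carry(a: str, b: str) -> str:
    """Schoolbook addition on decimal digit strings, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    digits, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# Hypothetical "training set": a small random sample of 3-digit problems.
rng = random.Random(0)
train = {(rng.randint(100, 999), rng.randint(100, 999)) for _ in range(2000)}
lookup = {(a, b): a + b for a, b in train}  # pure memorization

def memorized_add(a: int, b: int):
    """Return the stored answer, or None for a problem never seen."""
    return lookup.get((a, b))

# Fresh problems, almost none of which are in the table.
held_out = [(rng.randint(100, 999), rng.randint(100, 999)) for _ in range(1000)]
table_acc = sum(memorized_add(a, b) == a + b for a, b in held_out) / len(held_out)
rule_acc = sum(
    add_with_carry(str(a), str(b)) == str(a + b) for a, b in held_out
) / len(held_out)
print(f"lookup table: {table_acc:.1%}, carry rule: {rule_acc:.1%}")
```

      There are 810,000 ordered pairs of 3-digit numbers, so a table of 2,000 of them covers almost nothing, while the rule needs only single-digit sums and a carry.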

        • GPT-3 is pretty bad at it compared to alternatives (although it’s hard to compete with calculators in that field), but if it were just repeating the training dataset it would be way worse. From the study I’ve linked in my other comment (https://arxiv.org/pdf/2005.14165.pdf):

          On addition and subtraction, GPT-3 displays strong proficiency when the number of digits is small, achieving 100% accuracy on 2 digit addition, 98.9% at 2 digit subtraction, 80.2% at 3 digit addition, and 94.2% at 3-digit subtraction. Performance decreases as the number of digits increases, but GPT-3 still achieves 25-26% accuracy on four digit operations and 9-10% accuracy on five digit operations, suggesting at least some capacity to generalize to larger numbers of digits.

          To spot-check whether the model is simply memorizing specific arithmetic problems, we took the 3-digit arithmetic problems in our test set and searched for them in our training data in both the forms “<NUM1> + <NUM2> =” and “<NUM1> plus <NUM2>”. Out of 2,000 addition problems we found only 17 matches (0.8%) and out of 2,000 subtraction problems we found only 2 matches (0.1%), suggesting that only a trivial fraction of the correct answers could have been memorized. In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table.
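
          The spot-check quoted above can be sketched roughly like this in Python. The paper does not publish its search code, so this is only an assumption about the general shape: `count_overlap`, the tiny `corpus` string, and the problem list are hypothetical, and the naive substring match would need word-boundary handling on real data.

```python
def count_overlap(problems, corpus, op_symbol, op_word):
    """Count problems that appear verbatim in the corpus in either form."""
    hits = 0
    for a, b in problems:
        symbolic = f"{a} {op_symbol} {b} ="   # "<NUM1> + <NUM2> ="
        spelled = f"{a} {op_word} {b}"        # "<NUM1> plus <NUM2>"
        # Naive substring search; "1200 plus 3000" would also match
        # "200 plus 300", so real data needs boundary checks.
        if symbolic in corpus or spelled in corpus:
            hits += 1
    return hits

# Hypothetical stand-in for the training data.
corpus = "... 123 + 456 = 579 ... what is 200 plus 300 ..."
problems = [(123, 456), (200, 300), (111, 222), (987, 654)]

hits = count_overlap(problems, corpus, "+", "plus")
print(f"{hits} of {len(problems)} problems found ({hits / len(problems):.1%})")
```

          A low hit rate on the real test set is what supports the claim that correct answers were mostly not memorized.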

        • LLMs do replicate a small subset of human cognition, but not the full scope. This can result in human-like behavior, but it’s important to be aware of the limitations.

          The biggest limitation is the misalignment in goals. LLMs won’t perform a very deep analysis of their input because they don’t need to. Their goal isn’t honest discussion, a pursuit of truth, or even having a coherent set of beliefs about the world. Their only goal is to sound plausible. And, as it turns out, it’s not too hard to just bullshit your way through the Turing test.

      • graham1 (@graham1@gekinzuku.com), 11 months ago

        Large language models literally do subspace projections on text to break it into contextual chunks, and then memorize the chunks. That’s how they’re defined.

        Source: the paper that defined the transformer architecture and the formulas for large language models, which has been cited 85,000 times in academic sources alone: https://arxiv.org/abs/1706.03762

        • Hey, that comment’s a bit off the mark. Transformers don’t just memorize chunks of text, they’re way more sophisticated than that. They use attention mechanisms to figure out what parts of the text are important and how they relate to each other. It’s not about memorizing, it’s about understanding patterns and relationships. The paper you linked doesn’t say anything about these models just regurgitating information.
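
          For reference, the core operation in the linked paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Below is a minimal pure-Python sketch of just that formula. The toy vectors are made up, and it leaves out the learned projection matrices, multiple heads, and masking that a real transformer adds on top.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy example: one query that lines up with the first and third keys.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[4.0, 0.0]]

outputs, weights = attention(queries, keys, values)
print([round(w, 3) for w in weights[0]])
```

          The weights are computed fresh from the input each time, so the output is a context-dependent blend of the values rather than a stored chunk being looked up.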