For example, things like this:
Some models might be able to answer this precise question correctly, but they will still fail at many simple primary-school-level math questions.
More examples in my comment here.
And one more screenshot here.
I also agree with your predictions and find the answers funny 😅😅