I find it amusing that everyone is answering the question with the assumption that the premise of OP’s question is correct. You’re all hallucinating the same way that an LLM would.
LLMs are rarely trained on a single data source exclusively. All the major ones have been trained on huge mixed datasets that include Reddit, research papers, books, letters, government documents, Wikipedia, GitHub, and much more.
Example datasets:
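To make the "mixture of sources" point concrete, here is a toy sketch of how a weighted blend of corpora might be sampled from during training. The source names and weights below are purely illustrative, not any real model's published recipe:

```python
import random

# Illustrative source weights -- real training mixtures are far larger,
# and their exact proportions are rarely disclosed.
sources = {
    "web_crawl": 0.60,
    "books": 0.12,
    "wikipedia": 0.05,
    "github_code": 0.10,
    "papers": 0.08,
    "forums": 0.05,
}

def sample_source(rng: random.Random) -> str:
    """Pick which corpus the next training document comes from, by weight."""
    names = list(sources)
    weights = [sources[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in sources}
for _ in range(10_000):
    counts[sample_source(rng)] += 1

# The empirical proportions approximate the configured weights,
# so no single source dominates beyond its assigned share.
```

The point of the weighting is that even a dominant source like web crawl data is deliberately diluted with books, code, and reference material, which is why attributing a model's behavior to any one source is usually a mistake.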