I was thinking about this after a discussion at work about large language models (LLMs) - the initial scrape of the internet before ChatGPT became publicly usable was probably the last truly high-quality scrape of human-made content any model will get. The second ChatGPT went public, the data pool became tainted with people publishing information from it. Future language models will have increasingly large percentages of their data tainted by AI-generated content, skewing the results away from how humans actually write. To get actual human content, they may need to turn to transcriptions of audio recordings or phone calls for training, and even that wouldn’t be quite correct because people write differently than they speak.

I sort of wonder if eventually people will start being influenced in how they choose to write based on seeing this AI content. If teachers use AI-generated texts in school lessons, especially at lower levels, will that affect how kids end up writing and formatting their work? It’s weird to think about the wider implications of how this AI stuff will ultimately impact society.

What are your predictions? Is there a future where AI can get a clean, human-made scrape? Are we doomed to start writing like AIs?

  • You are right in assuming there will be a symbiosis between AI-generated text and human-generated text, but jumping from there to assuming that we will be using solely AI-generated text is wrong, in my opinion.

    AI-generated content is not good enough on its own, despite what OpenAI’s marketing team wants you to think. No quality content is made by simply prompting ChatGPT. Not just in writing, but in any field of knowledge, actually. Using ChatGPT without some level of domain knowledge and fact-checking on the subject you are prompting about is a sure way to get screwed, as some lawyer in the USA will tell you.

    But going back to writing specifically, what we will see at first is actually an improvement in the overall quality of human-generated writing, with AI offering a solution to the mechanical and usually boring side of writing good content, such as eloquence, syntax, clarity, etc.

    Then, what we will also begin to notice is the more frequent use of what I like to call shitstorming.

    Shitstorming consists of prompting an LLM to bring up ideas, drafts and opinions on subjects you want to write about and have some understanding of. What you will receive in response will be biased, somewhat lacking content, which will either inspire you to modify and refactor it in a way that makes sense, or make you so angry that you will have to write something better in response to it. Writer’s block will become a thing of the past.

    There are other aspects and nuances to this symbiosis, but to avoid going longer on an already long post, I would conclude by saying that this evolution will be a loop that will keep improving LLMs, while also improving human writing simply because we will continuously look for ways to make the content better and more original.

    The bad side is that, for those who don’t know how to use the tool, the amount of lacking content and standardized communication will indeed flood the internet, but this will only serve to contrast with original content to the point where we will immediately tell the two apart, much like we do with advertising nowadays.