It is almost inevitable that AI will go insane. Or at least, AI in its current state. If you didn’t know, AIs (more appropriately called LLMs, or large language models) generate content by repeatedly predicting the next word (strictly speaking, the next token) based on structures and patterns learned from their training data. In this way, they are able to produce human-like text and seemingly understand it.
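To make "predicting the next word" concrete, here is a minimal sketch of a bigram predictor, which only counts which word tends to follow which. It is a toy stand-in and not how a real LLM is built; the function names `train_bigrams` and `predict_next` and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never followed by anything."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text; a real model learns from vastly more data.
model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))    # "cat" follows "the" twice, "mat" only once
print(predict_next(model, "slept"))  # nothing ever follows "slept" in the text
```

The predictor can only echo patterns that were present in its training text, which is why the origin of that text matters so much.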

An AI can also be refined so that its predictions become more accurate and coherent, but this process only works because the data it is trained on was written by humans. As AIs produce more and more content, more of the internet will be filled with non-human text. This contaminates the very data that future AIs are trained on, which makes it difficult for AI to improve.

As such, there is an inherent paradox at the core of their advancement. If nothing is done and AI is trained only on the human-written data that already exists, there is a theoretical maximum to how much it can improve. Yet if AI is given more data, that data can no longer be purely human-produced, and training on it leads to a downward spiral in the AI’s coherence and understanding, making the AI seem insane.
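The downward spiral can be illustrated with a toy simulation. The sketch below assumes a deliberately crude "model" that just learns word frequencies from its corpus and then writes a new corpus of the same size by sampling from them; each generation trains only on the previous generation's output. Real LLM training is far more complex, and the names `next_generation` and `simulate` are hypothetical.

```python
import random

def next_generation(corpus, rng):
    """'Train' on the corpus (its empirical word frequencies) and
    generate a new corpus of the same size by sampling from it."""
    return rng.choices(corpus, k=len(corpus))

def simulate(corpus, generations, seed=0):
    """Track how many distinct words survive when each generation
    is trained only on the previous generation's output."""
    rng = random.Random(seed)
    diversity = [len(set(corpus))]
    for _ in range(generations):
        corpus = next_generation(corpus, rng)
        diversity.append(len(set(corpus)))
    return diversity

# Hypothetical "human" corpus with 50 distinct words.
human_corpus = [f"word{i}" for i in range(50)]
diversity = simulate(human_corpus, generations=30)
print(diversity)
```

A word that fails to be sampled in one generation can never come back, so the count of distinct words can only stay the same or shrink. That one-way loss of variety is the toy version of the spiral described above.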

Of course, this is one of the reasons why there is a race to develop a tool that can accurately detect whether content is AI-generated. After all, if you can filter AI-generated content out of the training data, there is no problem with giving the AI more data. However, as you may have noticed, AI is very bad at determining whether an AI has created something. If you ask an AI whether it produced a piece of content, then most of the time it will take credit for it, regardless of the truth. If it continues to prove impossible to build a detector that can tell AI-generated text from human-produced text, then there is no way to solve this problem.

All of this leaves AI with two possible futures: one in which it becomes nearly useless (because it will eventually be unable to improve), or one in which it goes insane.

Comments (2)

freeBread 2 points

There are still other places where training data can be gathered from. Aren't there?

tanuki 2 points

My point is that although there is a ton of data that can be used, the data that we have currently will eventually run out, even if that takes many, many years. This is a very long-term thought experiment, and even if the problem of distinguishing between AI-generated content and human-produced content is never solved, this still probably wouldn't affect the advancement of AI within either of our lifetimes.