#AI #LLM output is gunking up the web, especially for less-represented languages. Spammers are creating garbage English-language content using LLMs, then translating it into *multiple languages* at the same time using machine translation, presumably to generate clickbait ad revenue in several languages at once.
In English, such gunk accounts for some 9% of total sampled web content. But in languages with less representation on the Internet, the figures could be much higher. In Malay, it’s something like 26%, and in Swahili it’s nearly HALF of everything found on the web.
Paper [pdf]: “A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism”