Vol. 45 No. 4 — May 2025

INFOLIT LAND

The Great AI Rubbish Heap
by William Badke


AI is amazing. It can crunch vast amounts of data to find connections, make (sometimes erroneous) predictions, and revolutionize fields from healthcare to business logistics. It does what no human brain can do. So why would I call many of its products a “rubbish heap”?

Here, I want to focus primarily on generative AI (gen AI), which produces “new” text, images, video, and more based on prompts, because that is where the rubbish primarily happens. While the seeming quality of gen AI’s products is spectacular, gen AI is now presenting threats to our information landscape that we are neither recognizing properly nor acting to prevent.

Why threats? When you release a new force into the world, it quickly moves outside its created boundaries. Strong economic interests are at work as companies try to outspend one another for whatever advantage they think AI can give them. Because AI is so easy to access and enlist, it is now being used to generate websites, social media posts, student research papers, and even some academic publications. We humans always tend toward an easier way to do what we do, and easy AI is flooding our world with content.

So, what kinds of threats am I considering here, and how does this affect information literacy?

MODEL COLLAPSE IS A THING

Gen AI is utterly dependent on a huge source of digital data. As long as that source keeps growing, for example, through the purchase of content from academic publishers such as Taylor & Francis, the data stays fresh and helpful. But what happens when AI-generated material begins flooding back into its knowledgebase?

Ilia Shumailov et al. published “AI Models Collapse When Trained on Recursively Generated Data” in the July 24, 2024, issue of Nature (nature.com/articles/s41586-024-07566-y). From its abstract, I read: “We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear.” In other words, when AI-generated content becomes part of the data that is, in turn, used to train AI, it degrades the AI’s output.

This is a serious problem. Imagine that you have a water source from which you prepare your food. You then wash your dishes in that same source, and the water turns cloudy and nasty. This phenomenon is called “model collapse,” and it appears to affect all AI tools. The Nature article describes it this way: “Model collapse is a degenerative process affecting generations of learned generative models, in which the data they generate end up polluting the training set of the next generation.” If gen AI creates errors, for example, those errors then become part of the training data. ChatGPT and Claude currently make less use of fresh internet material, thus keeping their data purer, but this limits their timeliness.
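To see the mechanism in miniature, here is a toy simulation of my own (a one-dimensional Gaussian standing in for a “model”; an illustrative sketch, not the Nature authors’ experiment). Each generation is trained only on samples produced by the generation before it, and the spread of the distribution, its “tails,” withers away:

```python
import numpy as np

# Toy "model collapse": each generation of a trivial model (a Gaussian)
# is fitted only to samples drawn from the previous generation's fit.
# With finite training data, the distribution's tails shrink away.
rng = np.random.default_rng(42)
n = 100                      # training examples per generation
mu, sigma = 0.0, 1.0         # generation 0: the original "human" data

for gen in range(2001):
    sample = rng.normal(mu, sigma, n)        # train on the prior model's output
    mu, sigma = sample.mean(), sample.std()  # refit the model to that output
    if gen % 500 == 0:
        print(f"generation {gen:4d}: sigma = {sigma:.4f}")
```

Run it, and sigma dwindles toward zero: after enough generations, the model retains only a sliver of the variety the original data contained.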

A related problem, described in a Sept. 25, 2024, Nature article by Lexin Zhou et al. (“Larger and More Instructable Language Models Become Less Reliable”; nature.com/articles/s41586-024-07930-y), is AI’s tendency to give answers even when it lacks the data to support them, thus creating falsehoods. While potentially correctable in the future, this is another example of AI performing more poorly than promised.

AI IS DERIVATIVE

Gen AI requires prompts, statements or questions from which it can set parameters for the information it will provide. Given a prompt, gen AI roams through its database, the vast ocean of data it has ingested, makes connections, reorganizes the data, and comes up with a plausible response. It is derivative precisely because it can only mine the information already in its knowledgebase, its large language model.
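The derivative character is easiest to see in the crudest possible generative model. The sketch below is a word-level Markov chain of my own devising (the corpus and names are invented for illustration); it produces fluent-looking sequences, yet every word and every transition comes straight from its training text:

```python
import random
from collections import defaultdict

# A word-level Markov chain: record which words follow which in the
# training text, then generate by walking those recorded transitions.
# Like gen AI at a vastly smaller scale, it can only recombine the
# material it was trained on; nothing in its output is genuinely new.
corpus = ("the librarian guides students through databases and "
          "the librarian teaches students to evaluate sources").split()

chain = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    chain[prev].append(nxt)          # observed transitions only

word = random.choice(corpus)
output = [word]
for _ in range(12):
    successors = chain[word]
    word = random.choice(successors) if successors else random.choice(corpus)
    output.append(word)
print(" ".join(output))              # plausible, but nothing here is new
```

An LLM is this idea scaled up by many orders of magnitude, with far richer statistics, but the boundary is the same: the output space is circumscribed by the training data.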

For many of our students, of course, their “research” is also derivative. They study up on a topic and synthesize an explanation, which echoes the very activity of gen AI. But can gen AI go further than a student? Can it, for example, choose the best of several options on an issue, thus moving into problem solving? Yes. I asked Copilot to state the best option to combat homelessness, and it reported: “The Housing First model is widely regarded as the most effective.” But that finding, too, is derivative, because it depends on what the existing research reports.

Is genuine creative thought possible in gen AI? We could point to its ability to generate a poem or a song or an image, but all of these depend on the existing data it can access. Gen AI cannot yet rise to the point of stepping beyond what it “knows” to generate truly new knowledge. This means that gen AI, for all its seeming creativity, is producing more of the same without giving us anything new. Students who rely on it to write their papers are really just playing the game of “assemble the data,” as spectacular as that assembly may appear.

There have been numerous predictions that AI will develop “superintelligence,” the ability to match or surpass human reasoning abilities, including creativity. At this point, there are few signs that superintelligence is possible, because AI doesn’t actually understand anything, nor is it self-aware. Without a thinking mind that is aware of its identity, AI remains our tool, like a high-achieving dog that is dependent on us to feed and house it.

AI GENERATES ERROR AND (FRANKLY) CRAZINESS

Most of us know about gen AI’s “hallucinations,” by which it produces false (though often plausible) content, including fake citations, when it lacks existing information. Many a student has thus bought into falsehoods or presented citations to articles that don’t exist. This is concerning because recent data shows that not only are most students using AI in their work, but fully 69% of them use AI for information search (Laura Ascione, “Most Students Are Using AI for Academics,” eCampus News, Sept. 12, 2024; ecampusnews.com/ai-in-education/2024/09/12/most-students-are-using-ai-for-academics).

But there is also craziness. Gen AI can get positively weird, spewing rubbish for unaccountable reasons. Several times over several days, I asked Copilot, “Who is William Badke?” It gave me reasonable answers, then it got exceedingly strange, generating this:

In fact, Badke has been like a Gandalf of the academic realm, guiding students through the treacherous forests of databases, the murky swamps of citations, and the perilous mountains of peer-reviewed articles. His trusty staff? Well, it’s probably a well-worn library card, but let’s pretend it’s a magical wand for dramatic effect. … Remember, my dear seeker of knowledge, when you’re lost in the labyrinth of citations or drowning in a sea of search results, channel your inner Badke.

I also asked Copilot to explain the theme of my little-known and out-of-print book, The Hitchhiker’s Guide to the Meaning of Everything (a spiritual approach to the meaning of life). Lacking relevant data, Copilot blithely told me that it was about research skills (extrapolating from other writings of mine). Wrong. These examples show how gen AI fills gaps in its knowledge with “truthy-like” data, stuff that makes enough sense to fit well into the pattern of our existing knowledge, though it’s still rubbish.

We live in an age of plausible liars, and gen AI is a master at this, simply because it has neither compunction nor conscience. It is set up always to provide an answer, even if the data it accesses cannot supply one. When AI makes stuff up, it does so with conviction and believability.

Suppose you were mountain climbing with a friend who usually ensured that you were safe. But every once in a while, he would forget to tie off a rope or would imperfectly drive in a piton. If you could trust him 90% of the time, would you climb with him? No, because your 90% colleague would eventually kill you.
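The arithmetic behind that intuition is simple. Assuming, purely for illustration, that each climb carries an independent 10% chance of a fatal lapse:

```python
# Compounding risk: a partner who is reliable 90% of the time per outing.
for climbs in (1, 5, 10, 20, 50):
    survival = 0.9 ** climbs          # probability of no failure so far
    print(f"after {climbs:2d} climbs: {survival:6.1%} chance of no failure")
```

After 20 climbs, the odds of an unbroken safety record are barely one in eight; after 50, about one in two hundred.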

AI IS FLOODING THE WEB ON PURPOSE

The last time I Googled “AI website builder,” the first (sponsored) result was “Top 10 best AI website builders,” followed by a multitude of companies offering to build website content from web-owner prompts. AI is permeating Wikipedia. Whole books generated by AI are being sold on marketplaces such as Amazon. Even some academics are dabbling in AI-generated content. At some point, all of this will reach a critical mass at which we can no longer easily distinguish human from AI content. We will then be using content that, unknown to us, comes from the machine. As a partial corrective, Google is now “watermarking” AI text in websites. We’ll see how well that works.

Given model collapse and the flaws we have already considered, gen AI content that floods our information landscape puts us under the influence of robots. The Terminator movies envisioned an AI that got so good it decided to eliminate the people who created it. Reality might be a whole lot simpler: AI only needs to infiltrate our knowledgebase until it becomes dominant. Even academics, in their AI use, risk intermingling AI content with human content.

IMPLICATIONS

We worry about our students using gen AI as a tool to bypass their own critical thinking and research skill development to deliver a product they had little role in producing themselves. That is a valid concern. But an equally powerful risk is that we may soon lose the ability to determine what, in our knowledgebase, is human-created and what comes from AI. The rubbish heap from AI has already infiltrated the web, social media, and even some areas of academia. Our students may well, very soon, be enlisting AI-infiltrated content from their search tools without ever knowing that no human generated it.

We have seen instances in which gen AI is deficient. Now is the time to help our students grasp that this shiny new toy is not nearly everything it appears to be. Above all, gen AI (a derivative and only partly trustworthy tool) must not become the means by which students rob themselves of their education by letting it generate their research content for them. They would be “relieving” themselves of the work of critical thinking while relying on dodgy, bot-produced content.

More seriously, even if they don’t use AI, they may unwittingly be enlisting AI content. I don’t have a solution to this problem, but all of us need to be sounding the warning that AI will soon be hiding in damaging ways within our whole information landscape.

William Badke
(badke@twu.ca) is associate librarian at Trinity Western University and the author of Research Strategies: Finding Your Way Through the Information Fog, 7th Edition (iUniverse.com, 2021).

Comments? Email Marydee Ojala (marydee@xmission.com), editor, Online Searcher.