FEATURE
Agentic AI: What Librarians Need to Know About What’s Coming Next
by Amy Affelt
It was a startlingly bold statement and a harbinger of a brave new world to come when, in a May 2023 interview with CNBC, Bill Gates declared, “Whoever wins the personal agent, that’s the big thing, because you will never go to a productivity site again, you’ll never go to Amazon again” (cnbc.com/2023/05/22/bill-gates-predicts-the-big-winner-in-ai-smart-assistants.html). This prompted the article’s shocking headline: “Bill Gates Says A.I. Could Kill Google Search and Amazon as We Know Them.” Although the implication is that Google’s and Amazon’s days are numbered in a world of AI, what Gates really meant is that agentic AI will replace human searchers and do the work for us.
What Is Agentic AI?
Agentic AI represents the evolution of AI from interactive chatbots, which had us frantically studying the art of prompting in order to craft the queries and interactions that would yield the best results, to “sophisticated ‘co-pilots’ and ‘agents’ capable of helping people with personal and professional tasks.” How will this work? To begin, it is key to understand the evolution of AI. Before ChatGPT came into common public use, AI tools were represented by digital assistants such as Siri and Alexa and those sometimes-annoying “May I help you?” pop-ups on websites. ChatGPT was, for many people, their first foray into generative AI (gen AI). Gen AI is different in that it “generates” something new; it is not merely answering a question or returning the types of search results that we see after a basic Google search. As a November 2024 Bloomberg article explains, gen AI carries out tasks such as writing a story or drawing a picture through a process in which the user enters prompts for guidance on what to create and the system produces something new based on the vast amounts of information on which it was trained (bloomberg.com/news/articles/2024-11-15/how-openai-s-chatgpt-and-its-generative-ai-rivals-are-evolving?sref=jJkH0oaz). Ideally, the more the tool attempts these projects, the better the results it produces over time. However, “[t]he search results we see from generative AI are best understood as a waypoint rather than a destination,” according to a January 2025 article by Mat Honan in MIT Technology Review (technologyreview.com/2025/01/06/1108679/ai-generative-search-internet-breakthroughs).
In a tip of the hat to the old Greyhound bus slogan, agentic AI encourages users to “leave the driving to us,” with the AI’s results being the destination. According to the MIT Technology Review article, the search itself diminishes in importance, becoming merely the path that pulls together disparate information to form the results. Agentic AI technology relies on a chain-of-thought process: after reading a prompt, the AI agent pauses behind the scenes, works through intermediate reasoning steps, and decides on what it considers to be the best response. As this becomes more sophisticated, rather than “doing search and giving answers,” according to Google’s CEO Sundar Pichai, “Sometimes it’s going to be actions.”
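To make that “pause behind the scenes” concrete, here is a minimal sketch, in Python, of how a chain-of-thought step might be wired up. The llm() helper is hypothetical and stands in for whatever model API is being used; nothing here is drawn from a specific product.

```python
# A minimal, hypothetical sketch of a chain-of-thought step.
# llm() is a placeholder for a call to whatever model API is in use;
# it is not a real library function.

def llm(prompt: str) -> str:
    raise NotImplementedError("Wire this up to a model provider of choice.")

def answer_with_chain_of_thought(question: str) -> str:
    # The "pause behind the scenes": ask the model to reason first,
    # keeping the intermediate steps internal rather than showing them.
    reasoning = llm(
        f"Question: {question}\n"
        "Work through this step by step and list the intermediate steps."
    )
    # Then produce a final answer conditioned on that internal reasoning.
    return llm(
        f"Question: {question}\n"
        f"Intermediate reasoning:\n{reasoning}\n"
        "Based only on the reasoning above, give the single best final answer."
    )
```

The key design point is the two-pass structure: the user sees only the final answer, while the intermediate reasoning exists solely to improve it.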
Google’s Project Astra
Google DeepMind’s Project Astra (deepmind.google/technologies/project-astra) is developing these kinds of actions for Google’s gen AI tools, integrating Google Search, Google Lens, and Google Maps to formulate results. In one demonstration, Google held a phone camera up to a London bus and asked Astra if that bus would go to a particular named destination (businessinsider.com/google-gemini-2-0-ai-agents-2024-12). These types of interactions and results could make Project Astra a killer app, according to a December 2024 MIT Technology Review article (technologyreview.com/2024/12/11/1108493/googles-new-project-astra-could-be-generative-ais-killer-app). The Astra app remembers previous conversations and the previous 10 minutes of video; the MIT Technology Review reporter was shown a promo video in which “Astra tells the person giving the demo where she had left her glasses, having spotted them on a desk a few seconds earlier.”
Some of the most remarkable Astra results rely on the camera as conduit, as demonstrated by Astra product manager Bibo Xu to MIT Technology Review reporter Will Douglas Heaven. Xu held a phone camera over a recipe in a cookbook, and Astra promptly read the list of ingredients and recalled them. When Xu chided Astra for missing a few, Astra replied by naming the two that had been left out. In contrast with the hallucinations created by gen AI, this felt “more like coaching a child than butting heads with broken software,” according to Heaven. Taking this project several steps further and demonstrating Astra’s memory and recall capabilities, Xu pointed her phone at a row of wine bottles and asked which would go best with the chicken curry from the recipe mentioned earlier. Astra chose a Rioja vintage and gave supporting information for the choice. Xu asked for the cost of the bottle, and Astra, in true chain-of-thought fashion, paused, looked up prices online, and reported back.
In the January 2025 MIT Technology Review article, Honan shares several additional Astra applications. For travel planning, machine learning of individual preferences and previous interactions could lead to an all-inclusive, comprehensive experience that could be repeated again and again for multiple trips over time. For example, an AI agent could conceivably respond to very basic prompts by searching for and booking flights based on parameters such as preferred airline, home airport, and number of stops. It could reserve hotel rooms based on the number of rooms needed, location and floor choice, and preferred traveler program memberships. It could go to your preferred car rental agency and book a reservation based on your preference for certain types of vehicles, whether or not you want to purchase additional insurance or pay ahead for fuel, whether you need a car seat or ski rack, and myriad other specifications. Finally, after learning the types of restaurants you tend to frequent (fast casual, upscale, mom-and-pop, chain, etc.), the typical number of guests in your party, and your usual mealtimes, it could make dinner reservations. If it had the ability to auto-pilot this task and perform it repeatedly until successful, this might eliminate the competition for tables that ensues when trying to snag a time at an extremely popular restaurant.
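As a thought experiment, the standing preferences Honan describes could be encoded as a profile that an agent consults on every booking. The sketch below is illustrative only; every field name is invented, and no real booking API is implied.

```python
from dataclasses import dataclass, field

# A hypothetical traveler profile an agent might consult on every trip.
# All field names are invented for illustration.

@dataclass
class TravelerProfile:
    preferred_airline: str = "Any"
    home_airport: str = "ORD"
    max_stops: int = 1
    hotel_rooms: int = 1
    hotel_floor: str = "high"
    loyalty_programs: list[str] = field(default_factory=list)
    rental_vehicle_type: str = "midsize SUV"
    rental_extras: list[str] = field(default_factory=lambda: ["ski rack"])
    dining_styles: list[str] = field(default_factory=lambda: ["fast casual"])
    usual_party_size: int = 2

profile = TravelerProfile(
    preferred_airline="United",
    loyalty_programs=["Marriott Bonvoy"],
)
# An agent would translate these standing preferences into search and
# booking parameters, so that a prompt as simple as "book my usual trip
# to Denver" carries all of the detail above.
```

The point is that the "very basic prompts" in the paragraph above work only because the detail lives somewhere the agent can reach it.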
Honan also touches on a unique potential public health capability. He states that it might be possible for an AI agent to “monitor the sewage output of your home for certain diseases, and order tests and treatments in response.” While such a process would involve many steps and external dependencies, with the right type of detailed programming and oversight, this is an exciting possibility.
Have you ever insisted, “It was doing it the whole way here!” in a car dealership service bay when it is impossible to replicate those odd noises that have been emanating from your car for weeks? It often feels as if, after you finally snag a service appointment at the dealership, your car falls silent the moment the mechanic is listening. Enter agentic AI. Honan muses that it might be possible for the AI in your vehicle to record the sound and book that often-elusive service appointment.
In a similar vein, many of us believe that while it may be entirely possible to diagnose and fix mechanical problems on our own, extraneous circumstances such as lost documentation or confusing instructions (even when they are YouTube videos) undermine the attempt. Honan hypothesizes that agentic AI might be able to locate owners’ manuals or other instructions online and create customized repair videos for users. The interactive nature of these tools would allow the user to ask for clarification at different steps of the repair or how-to process and, if the resulting video is still confusing, work with the AI to start over and create new videos until the instructions are clear enough for the repair to be completed.
Other Google Initiatives
An October 2024 article in The Information details Google’s Project Jarvis, another vehicle in the world of “leave the driving to us” (theinformation.com/articles/google-preps-ai-that-takes-over-computers?rc=xfewmz). The Jarvis agent actually takes over the user’s computer to gather research, purchase products, and conduct other online transactions on their behalf. It begins by taking frequent screenshots of whatever is on the screen of the user’s computer. It then interprets what it sees and begins to click onscreen buttons or type in text fields. According to Google product developers, the main objective of Jarvis is to help people with everyday tasks, such as returning a pair of shoes, which was the example provided by Pichai at a 2024 Google developers conference. This Jarvis-fueled version of Gemini would search your email for the receipt, read the confirmation email to locate the order number, fill out the return form, and schedule the pickup with the delivery service (youtube.com/watch?v=zRY_T-hBp74).
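Reports like these suggest a simple perception-action loop: observe the screen, decide on a step, act, and observe again. The sketch below is a guess at that shape; none of these helper functions correspond to a real Jarvis or Gemini API.

```python
import time

# A guessed shape for the screenshot-driven loop described above.
# Every helper is a hypothetical stub, not a real product API.

def capture_screenshot() -> bytes:
    raise NotImplementedError

def interpret(screenshot: bytes, goal: str) -> dict:
    """Ask a model for the next step, e.g., {'kind': 'click', 'x': 0, 'y': 0}."""
    raise NotImplementedError

def click(x: int, y: int) -> None:
    raise NotImplementedError

def type_text(x: int, y: int, text: str) -> None:
    raise NotImplementedError

def run_agent(goal: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        shot = capture_screenshot()      # observe: grab the current screen
        action = interpret(shot, goal)   # decide: model picks the next step
        if action["kind"] == "done":
            break
        if action["kind"] == "click":    # act: drive the mouse or keyboard
            click(action["x"], action["y"])
        elif action["kind"] == "type":
            type_text(action["x"], action["y"], action["text"])
        time.sleep(1)  # let the page respond before re-observing

# run_agent("Return the shoes from my latest order and schedule a pickup")
```

The max_steps cap matters: an agent that misreads the screen should give up rather than click forever.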
Google’s agentic AI initiatives will also have an impact on the direct work of librarians and information professionals. We need to be aware of the capabilities and limitations of deep research, which, in response to a research query, crafts a “multi-step research plan,” performs repeated internet searches on the topic, and prepares a report of key findings with links to sources. The user can ask for edits, as well as an expansion or tweaking of individual sections. Once approved, the resulting report can then be exported to Google Docs (theverge.com/2024/12/11/24318217/google-gemini-advanced-deep-research-launch). According to Business Insider, the deep research feature searches and retrieves relevant information, using what it finds to launch new searches based on what it has learned (businessinsider.com/google-gemini-2-0-ai-agents-2024-12).
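That description of “new searches based on what it has learned” implies an iterative loop along the following lines. This is speculation about the general technique, not Google’s implementation; the search() and summarize() helpers are hypothetical placeholders.

```python
# A speculative sketch of an iterative "deep research" loop.
# search() and summarize() are hypothetical placeholders, not real APIs.

def search(query: str) -> list[dict]:
    """Return results as {'title': ..., 'url': ..., 'snippet': ...} dicts."""
    raise NotImplementedError

def summarize(findings: list[dict], topic: str) -> tuple[str, list[str]]:
    """Return (report_text, follow_up_queries) distilled from the findings."""
    raise NotImplementedError

def deep_research(topic: str, rounds: int = 3) -> str:
    findings: list[dict] = []
    queries = [topic]
    report = ""
    for _ in range(rounds):
        for q in queries:
            findings.extend(search(q))
        # Each round, the summary suggests new queries based on what was learned.
        report, queries = summarize(findings, topic)
        if not queries:
            break  # nothing left to chase down
    return report  # key findings, with links back to the sources
```

For librarians, the loop is also where the limitations live: the quality of the final report depends entirely on what each round of searching happened to surface.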
We are starting to hear rumblings that podcasts are the new news. According to Pew Research Center, 1 in 5 Americans listens to a podcast every day. Of those listeners, only 20% stated that the podcasts they listen to come from a news organization, but two-thirds stated that they have heard the news discussed on a podcast, and 87% believe that the news they hear on podcasts is “mostly accurate.” Librarians who provide news updates and alerts might want to experiment with Google’s NotebookLM (notebooklm.google), which harnesses the power of AI to create podcasts that can be used as research materials. NotebookLM allows the user to upload files in varying formats, including PDF and audio. The information contained in the files is then summarized by Gemini 1.5, which also provides links to the underlying sources, along with exact quotes from them, to help users gain confidence in the output. Clicking on “audio overview” creates a “deep dive discussion” (aka a podcast). To search the podcast content as part of a research project, you can upload the NotebookLM audio file to YouTube, generate a transcript text file, and do a keyword search using CTRL + F.
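If you would rather search a saved transcript locally than rely on CTRL + F in a browser, a few lines of Python perform the same keyword pass. The filename below is just an example, not a NotebookLM export format.

```python
# Keyword search over a saved podcast transcript, an alternative to CTRL + F.
# "overview_transcript.txt" is an example filename, not a standard export.

def find_keyword(path: str, keyword: str) -> None:
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if keyword.lower() in line.lower():
                print(f"line {number}: {line.strip()}")

find_keyword("overview_transcript.txt", "agentic AI")
```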
Developments by Other Companies
Google is a juggernaut in the agentic AI space, but upstarts are developing competing tools as well. Anthropic’s Claude (anthropic.com/claude) uses similar technology to take screenshots of user actions and browse the web based on what it sees, by “clicking buttons and typing,” according to a Bloomberg article (bloomberg.com/news/articles/2024-10-22/anthropic-s-new-ai-tool-analyzes-your-screen-and-acts-on-your-behalf?sref=jJkH0oaz). Although Anthropic admits that the tool struggles with “scrolling, dragging and zooming” (anthropic.com/news/3-5-models-and-computer-use), the Bloomberg article summarizes a company-shared video demonstration involving an example of personal planning. Given a single initial prompt stating that someone wanted to go on a hike with a friend to watch the sunrise over the Golden Gate Bridge, “Anthropic’s AI agent was able to search on Google to find hikes, map a route, check the sunrise time and send a calendar invite with details including what kind of clothing to wear.”
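What the hike demo implies is that one prompt gets decomposed into an ordered plan of sub-tasks before anything is executed. Here is a guess at that decomposition; the step names and the execute() dispatcher are invented for illustration and have nothing to do with Anthropic’s actual internals.

```python
# A guessed decomposition of the single hike prompt into ordered sub-tasks.
# The step names and execute() dispatcher are invented for illustration.

PROMPT = ("Plan a sunrise hike with a friend near the Golden Gate Bridge "
          "and put it on our calendars.")

PLAN = [
    ("search", "hikes with Golden Gate Bridge sunrise views"),
    ("map", "route from trailhead parking to the viewpoint"),
    ("lookup", "sunrise time for the chosen date"),
    ("calendar", "invite with meetup time, route link, and clothing notes"),
]

def execute(step: str, detail: str) -> None:
    # A real agent would route each step to a browser or calendar action;
    # here we simply show the ordering that one prompt expands into.
    print(f"{step:>8}: {detail}")

for step, detail in PLAN:
    execute(step, detail)
```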
Perplexity Pages (perplexity.ai/hub/blog/perplexity-pages) is a competitor of Google in that it generates a webpage from user prompts: the user enters text describing the research topic and the target audience (general readers, subject experts, etc.), and Perplexity’s AI searches for the information, creates the webpage by breaking the information down into sections, and adds some citations to sources and visuals. The text cannot be edited and errors cannot be corrected, but the visuals can be changed (theverge.com/2024/5/30/24167986/perplexity-ai-research-pages-school-report). In a review on The Verge, Emilia David finds that in several test rounds, the chosen target audience determined whether or not the text contained jargon, as well as the types of websites used to cite the information. David says, “Pages does the surface-level googling and writing for you, but it isn’t research.” That review was published in May 2024, so it is possible that a year later, Perplexity Pages will have substantially improved.
OpenAI’s ChatGPT has entered the camera-conduit agentic arena as well. In a live-streamed event in December 2024, OpenAI unveiled a tool that allows ChatGPT to “process and speak to users about what it observes in video feeds in real time” (bloomberg.com/news/articles/2024-12-12/openai-s-chatgpt-will-respond-to-video-feeds-in-real-time?sref=jJkH0oaz). By watching the feed from the user’s smartphone camera, it has the potential to serve as a personal communications aide (after viewing a message in an opened app, the user could ask for suggestions on how to reply) or an instruction manual (“This is my coffee maker. How do I make coffee?”). ChatGPT is also beta testing a feature that would allow users to schedule generative tasks to be completed later, such as a prompt to “write a kid-friendly joke to deliver around bedtime at 8:00 p.m.” (bloomberg.com/news/articles/2025-01-14/chatgpt-will-soon-be-able-to-remind-you-to-walk-the-dog?sref=jJkH0oaz). In late January 2025, OpenAI unveiled the Operator tool, which partners with websites such as OpenTable and Instacart to autonomously make restaurant reservations and order groceries (nytimes.com/2025/01/23/technology/openai-operator-launch.html).
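A scheduled generative task of the sort described (the 8:00 p.m. joke) boils down to pairing a clock with a model call. Here is a minimal sketch using Python’s standard-library scheduler, with llm() again standing in for a hypothetical model call; no real ChatGPT scheduling API is implied.

```python
import sched
import time
from datetime import datetime, timedelta

# A minimal sketch of a scheduled generative task. llm() is a hypothetical
# stand-in for a model call; no real ChatGPT scheduling API is implied.

def llm(prompt: str) -> str:
    raise NotImplementedError("Stand-in for a model call.")

def deliver_joke() -> None:
    print(llm("Write a kid-friendly joke suitable for bedtime."))

scheduler = sched.scheduler(time.time, time.sleep)

# Target 8:00 p.m. local time today, or tomorrow if that has already passed.
target = datetime.now().replace(hour=20, minute=0, second=0, microsecond=0)
if target <= datetime.now():
    target += timedelta(days=1)

scheduler.enterabs(target.timestamp(), priority=1, action=deliver_joke)
scheduler.run()  # blocks until 8:00 p.m., then generates and prints the joke
```

Note that the joke is generated at delivery time, not in advance, which is what distinguishes a scheduled generative task from a plain reminder.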
Conclusion
Agentic AI is a promising frontier, but pitfalls and limitations abound, and all of these tools are nascent and continually evolving. Keeping on top of the latest AI developments is important for librarians and information professionals, not only so that we can experiment with tools that may assist in our work, but also so that we are familiar with the resources our requesters have tried. That way, we can best guide them in selecting and using these tools and in evaluating source credibility and information integrity. Because of our expertise in providing nuanced, in-depth research and analysis based on rock-solid, gold-standard sources, we can confidently tell our patrons that they will always receive highly analytical expert research and evaluation if they leave the driving to us, rather than to an AI agent.