LLM Agents: When Large Language Models Do Stuff For You
Brad Nikkel
You can do a lot with chatbots—especially after several volleys back and forth of refining your prompts, sometimes spelling things out in ridiculous detail just to nudge your chatbot of choice toward your goal (usually some information you’re after).
But this question-reply cycle can grow tedious, and not everyone has the desire or the time to learn meticulous prompt engineering techniques, which is why some folks are trying to get large language models (LLMs) to handle most of this grunt work for them via (somewhat) set-it-and-forget-it autonomous “agents” that not only return the information we want but can also actually do things too.
Several experimental LLM agent projects have sprouted up recently and gained serious momentum. So much so that Andrej Karpathy—former director of Artificial Intelligence (AI) at Tesla and now back with OpenAI, where he started—kicked off a recent LLM agent-focused hackathon at the AGI House with an intriguing forecast of agents’ potential, reckoning that Artificial General Intelligence (AGI) will likely involve some flavor of agent framework, possibly even “organizations or civilizations of digital entities.”
Bold, perhaps, but Karpathy’s vision of LLM agents’ promise isn’t precisely an outlier; sharing similar sentiments, Silicon Valley and venture capital firms are throwing gobs of resources at LLM agents.
To get a glimpse of what Karpathy and others are so excited about, we’ll explore several LLM agent projects, but let’s first hash out what exactly an LLM agent is (if you already know, jump ahead).
What’s with all this “Agent” Talk?
When you think of an agent, you might envision a video game character, a robot, a human, an insect, or something else that exhibits agency (i.e., a capacity to act). LLM agents, however, are closer to written versions of digital assistants like Alexa or Siri.
Before a flashback to Alexa or Siri’s lackluster “assistance” causes you to click away, hang with me. What has some folks so pumped about LLM agents’ potential is that they can tackle more multi-pronged, longer-term objectives than digital assistants currently handle (as you’ll learn in a bit, they can, for example, write mediocre programs and research reports, learn to play Minecraft, or—Hydra-like—spawn more LLM agents).
Since it’s a relatively new concept, folks are using many different terms for LLM agents, including “generative agents,” “task-driven agents,” “autonomous agents,” “intelligent agents,” “AutoGPTs,” and more. And some think all this “agent” talk is way off. Cognitive scientist Alison Gopnik, for example, argues that, rather than so-called “intelligent agents,” LLMs amount to cultural technology, a category into which she also stuffs writing, libraries, and search engines.
And if the term variability isn’t confusing enough, current definitions are also in flux, but it’s tough to do better than OpenAI AI safety researcher Lilian Weng's definition of LLM agents as:
“Agent = LLM + memory + planning skills + tool use”
Lilian Weng
While there’s not yet a strong consensus on terms or definitions, for this article, at least, let’s sidestep this minefield of nuance and, borrowing from Weng, roughly define an LLM agent as an LLM-based entity that can manipulate its environment in some way (e.g., a computer, web browser, other agents, etc.) via tools (e.g., terminal, REPL, API calls, etc.) toward some goal (e.g., book a flight, write a business plan, research some topic, etc.) with significant autonomy over a long time horizon (i.e., with minimal human input, an agent decomposes a broad goal into small sub-tasks and decides when that goal is satisfied).
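The working definition above can be sketched as a simple loop: the agent asks an LLM to decompose a goal into sub-tasks, works through them, and decides on its own when the goal is satisfied. In this toy version, `fake_llm` is a stand-in with canned replies (a real agent would call an actual model), and the names and prompts are purely illustrative.

```python
def fake_llm(prompt: str) -> str:
    # A stand-in "model": returns a canned plan, then signals completion.
    if "plan" in prompt:
        return "search flights; compare prices; book cheapest"
    if "done?" in prompt:
        return "yes"
    return "ok"

def run_agent(goal: str, llm=fake_llm, max_steps: int = 10) -> list[str]:
    """Decompose `goal` into sub-tasks and work through them until done."""
    sub_tasks = llm(f"plan: break '{goal}' into sub-tasks").split("; ")
    log = []
    for task in sub_tasks[:max_steps]:
        log.append(llm(f"execute: {task}"))       # tool calls would go here
        if llm(f"done? goal={goal!r} log={log}") == "yes":
            break                                 # agent decides it's finished
    return log

print(run_agent("book a flight"))
```

The point isn't the canned replies; it's the control flow: the human supplies one broad goal, and the loop, not the human, drives every intermediate step.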
LLM Agents’ Key Components
We now have an idea of what LLM agents are, but how exactly do we get from an LLM to an LLM agent? To do this, LLMs need two key tweaks.
First, LLM agents need a form of memory that extends their limited context window, letting them “reflect” on past actions to guide future efforts. This often involves vector databases or multiple LLMs (e.g., one LLM as a decision-maker, another as a memory bank).
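A minimal sketch of that memory idea: past observations are stored as vectors, and only the most relevant ones are retrieved to fit inside the context window. The bag-of-words “embedding” below is a deliberately crude stand-in for a real embedding model; class and method names are my own, not any particular framework's.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real agent would use a learned model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class Memory:
    def __init__(self):
        self.entries = []                        # (text, vector) pairs

    def store(self, text: str):
        self.entries.append((text, embed(text)))

    def recall(self, query: str, k: int = 2) -> list[str]:
        # Return the k stored memories most similar to the query.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = Memory()
memory.store("the user prefers morning flights")
memory.store("the user's budget is 300 dollars")
memory.store("weather in Paris was rainy last week")
print(memory.recall("which flights should I book for the user?"))
```

Only the two flight-relevant memories come back; the rainy-weather entry is left out, which is exactly the trick for squeezing a long history into a short context window.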
Next, the LLM needs to be able to do more than yammer on all day.
When a crow carves a twig into a hook to snag tough-to-get bugs from curved crannies, we consider that crow an intelligent agent because it’s manipulating its environment to achieve a goal. Similarly, to count as an intelligent agent, an LLM must create and employ tools to manipulate its environment.
The common LLM-agent version of a crow’s carved twig is API calls. By calling existing programs, LLMs can retrieve additional information, execute actions, or—realizing no current options suffice—generate their own tool (e.g., code up a new function, test it, and run it).
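That last step is worth sketching: an agent consults a registry of existing tools and, when nothing fits, writes a new function, sanity-checks it, and registers it for later use. Here the “LLM” that writes the tool is faked with a hard-coded string, and all tool names are hypothetical.

```python
TOOLS = {
    # Stand-ins for real API calls an agent might already have access to.
    "get_time": lambda: "12:00",
    "search": lambda q: f"results for {q!r}",
}

def fake_llm_write_tool(task: str) -> str:
    # Pretend the LLM generated Python source for the unmet task.
    return "def add(a, b):\n    return a + b"

def handle(task: str, *args):
    if task in TOOLS:
        return TOOLS[task](*args)
    # No existing tool suffices: generate one, test it, register it, use it.
    source = fake_llm_write_tool(task)
    scope = {}
    exec(source, scope)                  # compile the generated function
    new_tool = scope[task]
    assert new_tool(1, 2) == 3           # sanity-check before trusting it
    TOOLS[task] = new_tool               # now available for future calls
    return new_tool(*args)

print(handle("search", "LLM agents"))    # existing tool
print(handle("add", 2, 3))               # newly generated tool
```

Real agent frameworks wrap this pattern in far more safety checks (executing model-generated code blindly is risky), but the shape is the same: look up a tool, or make one.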
Beyond allowing LLMs to manipulate their environment, tools also help overcome two inherent LLM weaknesses. They let LLMs access more up-to-date information than what they were trained on (e.g., ChatGPT’s now extinct Bing plugin), and they let LLMs delegate tasks they’re ill-equipped to handle to more capable programs (e.g., ChatGPT’s Wolfram plugin for math).
With our working definition of LLM agents and an overview of their essential components, let’s check out some recent LLM agent projects.
Academic LLM Agent Projects
Though some of the independent LLM agent projects we’ll discuss later arrived on the scene basking in the hype, university groups are also cranking out interesting research on agents. Let’s check out a few of these first because some of what they explored recurs in independent projects.
From Language, “The Sims” Emerges
What happens when you craft a few dozen agents, toss them into a virtual, interactive environment reminiscent of "The Sims," and let them interact with one another via natural language?
Park et al. tested this. With each LLM agent (which they dubbed “generative agents”) modeled with an observe-plan-reflect loop, what emerged were surprisingly “believable proxies of human behavior,” including information diffusion (e.g., knowledge of who was running for mayor), relationship formation (i.e., remembering inter-agent conversations), and coordination (e.g., planning a party).
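The observe-plan-reflect loop can be sketched in a few lines. This is an illustrative skeleton, not the paper's actual code: each agent keeps a running memory stream, plans against its most recent memory, and periodically condenses memories into reflections (the canned replies here stand in for LLM prompts).

```python
class GenerativeAgent:
    def __init__(self, name: str):
        self.name = name
        self.memory: list[str] = []              # the agent's memory stream

    def observe(self, event: str):
        # Record raw perceptions of the environment and other agents.
        self.memory.append(f"observed: {event}")

    def plan(self) -> str:
        # A real agent would prompt an LLM with relevant memories here.
        last = self.memory[-1] if self.memory else "nothing"
        action = f"react to '{last}'"
        self.memory.append(f"planned: {action}")
        return action

    def reflect(self) -> str:
        # Periodically distill the memory stream into higher-level insights.
        insight = f"{self.name} has {len(self.memory)} memories"
        self.memory.append(f"reflected: {insight}")
        return insight

agent = GenerativeAgent("Klaus")
agent.observe("Isabella is planning a party")
agent.plan()
print(agent.reflect())
```

Behaviors like party planning and gossip about the mayoral race emerge from many such agents running this loop concurrently and feeding each other's utterances back in as observations.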