Intelligent agents guided by LLMs
Update: Trending on Hacker News, follow the discussion here.
I’ve built a small library for building agents that are controlled by large language models (LLMs), heavily inspired by LangChain.
You can find that small library with all the code on GitHub.
The goal was to get a better grasp of how such an agent works and understand it all in very few lines of code.
LangChain is great, but it already has a few more files and abstraction layers, so I thought it would be nice to build the most important parts of a simple agent from scratch.
How the agent works
The agent works like this:
- It gets instructed by a prompt which tells it the basic way to solve a task using tools
- Tools are custom-built components which the agent can use
- So far, I’ve implemented the ability to execute Python code in a REPL, to use Google search, and to search Hacker News
- The agent runs in a loop of Thought, Action, Observation, Thought, …
- The Thought and Action are the parts which are generated by an LLM
- The Observation is generated by using a tool (for example the print outputs of Python or the text result of a Google search)
- The LLM gets the new information appended to the prompt in each loop cycle and thus can act on that information
- Once the agent has enough information, it provides the final answer
The prompt
The prompt for the agent is a template with a handful of placeholders which get filled in at runtime.
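A sketch of such a template (the exact wording lives in the repository; the curly-brace placeholders are explained in the list below):

```python
# Sketch of a prompt template in the Thought/Action/Observation style.
# The exact wording in the repository may differ.
PROMPT_TEMPLATE = """Today is {today} and you can use tools to get new information.
Answer the question as best as you can using the following tools:

{tool_description}

Use the following format:

Question: the input question you must answer
Thought: comment on what you want to do next
Action: the action to take, exactly one element of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat several times)
Thought: I now know the final answer
Final Answer: your final answer to the original input question

Begin!

Question: {question}
Thought: {previous_responses}
"""
```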
A couple of things are passed into the prompt:
- Today’s date: I’ve added that one, because the LLM would otherwise sometimes argue that ’this makes no sense, because the date is in the future’ when it got new information
- The description of tools: each tool has a name and a description of how it can be used - this makes the LLM aware of the tools so it is able to use them
- The tool names: this is repeated to indicate to the LLM that it should always choose exactly one of the tools
- The question: this is the user input - the thing you are interested in
- The previous responses: as the agent reasons and uses tools, we add the gained information here in a loop
Tool usage
A tool is just a little Python class which implements the method `use(input_text: str) -> str` and has a name and a description.
The name and description help the LLM understand what the tool can do, and the `use` method is what actually gets executed in the Observation step.
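One way to express that contract is a small base class - this is a sketch, and the repository may structure it differently:

```python
class ToolInterface:
    """A tool is defined by a name, a description and a use() method."""

    name: str = ""
    description: str = ""

    def use(self, input_text: str) -> str:
        # Concrete tools override this; the return value becomes the Observation.
        raise NotImplementedError
```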
For example, the essential part of the search tool is just such a subclass whose `use` method calls a search API.
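Here is a sketch, under the assumption that the Google search is backed by SerpAPI (the actual backend and the exact description text in the repository may differ):

```python
import os

from serpapi import GoogleSearch  # assumption: the search goes through SerpAPI


class GoogleSearchTool(ToolInterface):
    name = "Google Search"
    description = (
        "Get an answer to a question by searching Google. "
        "The input should be a plain-text search query."
    )

    def use(self, input_text: str) -> str:
        # Run the query and return the snippets of the top organic results
        # as one string -- this text becomes the Observation for the LLM.
        results = GoogleSearch(
            {"q": input_text, "api_key": os.environ["SERPAPI_API_KEY"]}
        ).get_dict()
        snippets = [
            r["snippet"]
            for r in results.get("organic_results", [])
            if "snippet" in r
        ]
        return " ".join(snippets[:3])
```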
The agent loop
The main loop of the agent is very simple.
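Here is a sketch of such a loop, assuming an `Agent` class that holds the LLM callable, the tools and the prompt template (the attribute and method names are illustrative, not necessarily the ones used in the repository):

```python
import datetime


class Agent:
    def __init__(self, llm, tools, prompt_template: str, max_loops: int = 15):
        self.llm = llm              # callable: (prompt, stop) -> generated text
        self.tools = tools          # list of ToolInterface instances
        self.prompt_template = prompt_template
        self.max_loops = max_loops
        self.tool_by_names = {tool.name: tool for tool in tools}

    def run(self, question: str) -> str:
        previous_responses: list[str] = []
        for _ in range(self.max_loops):
            # Fill the template with everything we know so far.
            prompt = self.prompt_template.format(
                today=datetime.date.today(),
                tool_description="\n".join(
                    f"{tool.name}: {tool.description}" for tool in self.tools
                ),
                tool_names=", ".join(tool.name for tool in self.tools),
                question=question,
                previous_responses="\n".join(previous_responses),
            )
            # The LLM produces the next Thought / Action / Action Input.
            generated, tool, tool_input = self.decide_next_action(prompt)
            if tool == "Final Answer":
                return tool_input
            if tool not in self.tool_by_names:
                raise ValueError(f"Unknown tool: {tool}")
            # Actually run the tool instead of letting the LLM make up the result.
            observation = self.tool_by_names[tool].use(tool_input)
            # Append the real Observation and prime the next Thought.
            generated += f"\nObservation: {observation}\nThought:"
            previous_responses.append(generated)
        return "Sorry, no final answer was found within the loop limit."
```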
In each loop, we add the previous responses to the prompt template, so the LLM has context.
The agent then decides its next action, which I’ll talk about in a second.
If it found the `Final Answer`, the loop ends; otherwise, if it selected a valid tool, we use the tool and retrieve an observation from it.
We then append the observed value to the text generated by the LLM and add the next `Thought:` to trigger the LLM to reason again.
Finally, we add the generated reasoning, the tool observation and the starting of a thought to the previous responses, so it is used in the next loop.
The trick to avoid hallucination
So how exactly is the tool selected by the LLM?
This actually involves a little trick!
The method which does this is essentially the following.
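Continuing the `Agent` sketch from above (a sketch rather than the verbatim method):

```python
class Agent:
    # ... continuing the Agent sketch from above

    def decide_next_action(self, prompt: str) -> tuple[str, str, str]:
        # Stop generation as soon as the model starts to write "Observation:",
        # because that part has to come from a real tool run, not from the LLM.
        generated = self.llm(prompt, stop=["Observation:"])
        tool, tool_input = self._parse(generated)
        return generated, tool, tool_input
```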
The prompt which we have generated in our loop is passed to the LLM, so it follows along and generates the next thought, action and action input.
However, it normally wouldn’t stop there, but rather keep generating more text.
So the trick is that we pass a `stop` pattern, which in this case is `Observation:` - once that shows up in its output, the LLM has already produced a Thought, an Action and an Action Input, and is about to hallucinate the `Observation:` itself :D
We don’t want it to hallucinate its tool usage, but rather actually use the tool, so that’s why we stop it there.
Instead, we execute the tool and append the actual `Observation:` to the prompt, which is then passed to the LLM again in the next loop.
That way we can intersperse actual information into the hallucination train.
This `stop` parameter is a normal parameter of the OpenAI API by the way, so there is nothing special to implement there.
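For completeness, the `llm` callable used in the sketches above could be backed by the OpenAI chat completions endpoint like this (the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY to be set in the environment


def chat_llm(prompt: str, stop: list[str] | None = None) -> str:
    # Thin wrapper so the Agent can call the LLM as `self.llm(prompt, stop=...)`.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        stop=stop,
    )
    return response.choices[0].message.content
```

With that, the agent from the sketch can be wired up as `Agent(llm=chat_llm, tools=[GoogleSearchTool()], prompt_template=PROMPT_TEMPLATE)`.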
The `_parse` method is just a small utility which extracts the `Action:` and `Action Input:` from the text the LLM generated, so we can figure out which tool to use and what to send to the tool.
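A sketch of such a parsing helper, continuing the `Agent` class from above (the regular expression is illustrative, not necessarily the one used in the repository):

```python
import re


class Agent:
    # ... continuing the Agent sketch from above

    def _parse(self, generated: str) -> tuple[str, str]:
        # If the model already reached a final answer, report that directly.
        if "Final Answer:" in generated:
            return "Final Answer", generated.split("Final Answer:")[-1].strip()
        # Otherwise pull out the tool name and the tool input.
        match = re.search(
            r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", generated, re.DOTALL
        )
        if not match:
            raise ValueError(f"Could not parse the LLM output: {generated!r}")
        tool = match.group(1).strip()
        tool_input = match.group(2).strip().strip('"')
        return tool, tool_input
```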
That’s all there is for a small agent which can search and execute code guided by LLMs.
Examples
To be up front: it’s far from perfect and fails a lot, but it’s still fun to watch.
Here are some samples where it worked quite well:


Check it out yourself here on GitHub.