As amazing as large language models (LLMs) are, they have some major challenges. They often “hallucinate,” making assertions that sound believable but aren’t actually true. This leads to mistakes in arithmetic, difficulty combining multiple skills, and broken commonsense reasoning chains. Furthermore, many of the most impressive capabilities of LLMs only emerge in very large models. These enormous parameter counts and data requirements make LLMs difficult and expensive to train, run, and keep up to date.
In an effort to overcome these limitations, several methods are currently being explored, including:
- Chain of thought prompting: Asking the model to generate a chain of thought through prompting (e.g. “think step-by-step”) significantly improves its ability to perform complex reasoning.
- Retrieval Augmented Generation (RAG): Adding relevant information to the model’s context window as part of the prompt produces better, more factual results.
- Leveraging external tools: Enabling models to decide which tools to use, and when, improves accuracy and performance.
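The first two techniques operate purely at the prompt level. A minimal sketch makes the distinction concrete (the prompt templates below are illustrative assumptions, not any particular library’s format):

```python
# Sketch: how chain-of-thought and RAG each modify the prompt sent to a model.
# The exact wording of these templates is an illustrative assumption.

def build_cot_prompt(question: str) -> str:
    """Chain of thought: ask the model to reason step by step before answering."""
    return f"{question}\nLet's think step-by-step."

def build_rag_prompt(question: str, retrieved_passages: list[str]) -> str:
    """RAG: prepend retrieved passages so the model can ground its answer."""
    context = "\n".join(retrieved_passages)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer using only the context above."
    )
```

Either prompt is then sent to the model unchanged; no fine-tuning is required for these two techniques.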
A survey of these methodologies collectively refers to them as Augmented Language Models (ALMs), a term we will adopt as well.
Chain of thought prompting and RAG have been well covered recently, so we’d like to dive a little more deeply into leveraging external tools.
A simple way to enhance the performance of LLMs is to give them the ability to use external tools such as search engines, calculators, or Wikipedia lookups. This is done by training a model to emit API calls to tools; the result from each tool is then provided back to the model.
API calls can take many forms. To illustrate, here are some examples from the Toolformer paper of an LLM calling a calculator tool and a Wikipedia lookup tool.
Out of 1400 participants, 400 (or [Calculator(400 / 1400)→ 0.29] 29%) passed the test.
The Brown Act is California’s law [WikiSearch(“Brown Act”) → The Ralph M. Brown Act is an act of the California State Legislature that guarantees the public’s right to attend and participate in meetings of local legislative bodies.] that requires legislative bodies, like city councils, to hold their meetings open to the public.
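To make the mechanism concrete, here is a minimal sketch (not Toolformer’s actual implementation) of how a pending Calculator call embedded in generated text could be parsed, executed, and filled in with its result:

```python
import re

# Sketch: fulfill Toolformer-style "[Calculator(expr)→ ]" calls found in model
# output by evaluating the expression and splicing the result back in.
# This is an illustration, not the paper's implementation.

CALL_PATTERN = re.compile(r"\[Calculator\(([^)]*)\)→\s*\]")

def calculator(expression: str) -> str:
    # Evaluate a simple arithmetic expression with builtins disabled;
    # assumes trusted input for this sketch.
    value = eval(expression, {"__builtins__": {}})
    return str(round(value, 2))

def fulfill_calls(text: str) -> str:
    """Replace each pending Calculator call with its computed result."""
    def replace(match: re.Match) -> str:
        expr = match.group(1)
        return f"[Calculator({expr})→ {calculator(expr)}]"
    return CALL_PATTERN.sub(replace, text)

completed = fulfill_calls(
    "Out of 1400 participants, 400 (or [Calculator(400 / 1400)→ ] 29%) passed."
)
```

In Toolformer itself the model is fine-tuned to emit these call markers mid-generation; generation pauses at the call, the tool runs, and decoding resumes with the result in context.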
ALMs With Tools In Use Today
One of the best-known examples of external tool use is ChatGPT Plugins, where users can make plugins (external tools), including a search engine and a code interpreter, available to the model during a chat session. Similarly, Google’s Bard Extensions gives its model access to a user’s documents as well as several Google services, such as Google Flights and Hotels, YouTube, and Maps, for use in the chat session.
OpenAI has also brought the ability to leverage external tools to developers via OpenAI functions in its API, for integration into their own applications. Developers define their tools via function specifications provided to the model, and the model can invoke those tools with a function call. For more detail, see OpenAI’s cookbook entry How to call functions with chat models.
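The flow looks roughly like the sketch below. The function-specification format follows OpenAI’s functions API, but the network call is stubbed out: the `get_current_weather` tool and the model’s reply are illustrative assumptions.

```python
import json

# Sketch of the OpenAI function-calling flow with the API call stubbed out.
# The spec format follows OpenAI's "functions" parameter; the tool and the
# model message below are illustrative assumptions.

functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    }
]

def get_current_weather(location: str) -> dict:
    return {"location": location, "temperature_c": 21}  # stubbed tool

AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}

# In a real application this message comes back from the chat completions API
# when the model decides to call one of the declared functions.
model_message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_current_weather",
        "arguments": '{"location": "Boston"}',
    },
}

call = model_message["function_call"]
result = AVAILABLE_TOOLS[call["name"]](**json.loads(call["arguments"]))
# `result` is then sent back to the model in a {"role": "function"} message
# so it can compose a final natural-language answer.
```

Note that the model only names the function and supplies JSON arguments; executing the function and returning its output to the model is always the application’s responsibility.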
Several open-source models now ship with enhanced tool-usage capabilities. Functionary, Invocer, and Trelis have fine-tuned Llama 2 models for function calling modeled after OpenAI’s functions.
Open-source frameworks for building LLM-powered applications (Haystack, LangChain, LlamaIndex, etc.) have started to make it easier to leverage external tools/functions.
Bringing together the “reasoning” capabilities realized via chain of thought prompting with the ability to leverage external tools brings us to what some call “agents.” As explained in the paper “ReAct: Synergizing Reasoning and Acting in Language Models,” an agent leverages the model’s “reasoning” capability to decide which tools to use, and in what order, to achieve a task.
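A ReAct-style agent alternates Thought, Action, and Observation steps until it finishes. The sketch below shows that loop with a scripted `fake_model` standing in for a real LLM; the Thought/Action/Finish line format and both tools are illustrative assumptions, not the paper’s implementation.

```python
# Sketch of a ReAct-style agent loop: the model emits Thought/Action lines,
# the harness executes each Action and feeds back an Observation, repeating
# until the model emits Finish[...]. The scripted fake_model is a stand-in
# for a real LLM; tools and line format are illustrative assumptions.

def search(query: str) -> str:
    return "The Brown Act guarantees public access to local government meetings."

def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))  # trusted input assumed

TOOLS = {"Search": search, "Calculator": calculator}

SCRIPTED_STEPS = iter([
    "Thought: I should look up the Brown Act.\nAction: Search[Brown Act]",
    "Thought: I have enough information.\nFinish[It guarantees public access to meetings.]",
])

def fake_model(prompt: str) -> str:
    return next(SCRIPTED_STEPS)

def run_agent(question: str, model=fake_model, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(prompt)
        prompt += step + "\n"
        if "Finish[" in step:
            return step.split("Finish[", 1)[1].rstrip("]")
        # Parse "Action: Tool[input]" and execute the chosen tool.
        action = step.split("Action: ", 1)[1]
        tool_name, tool_input = action.split("[", 1)
        observation = TOOLS[tool_name](tool_input.rstrip("]"))
        prompt += f"Observation: {observation}\n"
    return "No answer within step budget."

answer = run_agent("What does the Brown Act do?")
```

The key point is that tool selection and ordering are delegated to the model’s generated text; the harness only parses and executes whatever Action the model chooses next.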
Microsoft has explored the concept of multiple “agents” (each of which could be an LLM, a tool, or a human) interacting to solve tasks in “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation.”
Image source: AutoGen paper