import os
import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
1 Introduction
Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using LLMs in isolation is often not enough in practice to create a truly powerful or useful business application - the real power comes when you can combine them with other sources of computation, services or knowledge. LangChain is an intuitive open-source Python framework created to simplify the development of useful applications using large language models (LLMs) such as those from OpenAI or Hugging Face.
In this article, we will give an overview of the LangChain framework and then look in more detail at 3 key components: Models, Prompts and Parsers.
2 LangChain Overview & Key Components
2.1 Principles
The LangChain development team believes that the strongest and most distinctive LLM applications won’t just reference a language model, they’ll also be:
Data-aware: connect a language model to other sources of data
Agentic: allow a language model to interact with its environment
These concepts serve as the foundation for the LangChain framework.
2.2 Modules
The fundamental abstractions that serve as the foundation for any LLM-powered programme are known as LangChain modules. LangChain offers standardised, expandable interfaces for each module. Additionally, LangChain offers third-party integrations and complete implementations for commercial use.
The modules are (from least to most complex):
Models: Supported model types and integrations.
Prompts: Prompt management, optimization, and serialization.
Memory: Memory refers to state that is persisted between calls of a chain/agent.
Indexes: Language models become much more powerful when combined with application-specific data - this module contains interfaces and integrations for loading, querying and updating external data.
Chains: Chains are structured sequences of calls (to an LLM or to a different utility).
Agents: An agent is a Chain in which an LLM, given a high-level directive and a set of tools, repeatedly decides an action, executes the action and observes the outcome until the high-level directive is complete.
Callbacks: Callbacks let you log and stream the intermediate steps of any chain, making it easy to observe, debug, and evaluate the internals of an application.
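These modules are designed to compose. A chain, for example, is at heart just a structured sequence of calls; here is a plain-Python sketch of that idea, with toy functions standing in for a real LLM call and a post-processing utility (this is an illustration of the concept, not LangChain code):

```python
def make_chain(*steps):
    # Compose steps left to right: the output of one step feeds the next
    def chain(x):
        for step in steps:
            x = step(x)
        return x
    return chain

# Toy steps standing in for a real LLM call and a utility
uppercase = str.upper
add_exclaim = lambda s: s + "!"

pipeline = make_chain(uppercase, add_exclaim)
print(pipeline("ahoy"))  # AHOY!
```

LangChain's real chains add prompt templating, memory and callbacks around this same core pattern of sequenced calls.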
2.3 Use Cases
LangChain provides ready-to-go, built-in implementations for the following common LLM use cases:
Autonomous Agents: Autonomous agents are long-running agents that take many steps in an attempt to accomplish an objective. Examples include AutoGPT and BabyAGI.
Agent Simulations: Putting agents in a sandbox and observing how they interact with each other and react to events can be an effective way to evaluate their long-range reasoning and planning abilities.
Personal Assistants: One of the primary LangChain use cases. Personal assistants need to take actions, remember interactions, and have knowledge about your data.
Question Answering: Another common LangChain use case. Answering questions over specific documents, only utilizing the information in those documents to construct an answer.
Chatbots: Language models love to chat, making this a very natural use of them.
Querying Tabular Data: Recommended reading if you want to use language models to query structured data (CSVs, SQL, dataframes, etc).
Code Understanding: Recommended reading if you want to use language models to analyze code.
Interacting with APIs: Enabling language models to interact with APIs is extremely powerful. It gives them access to up-to-date information and allows them to take actions.
Extraction: Extract structured information from text.
Summarization: Compressing longer documents. A type of Data-Augmented Generation.
Evaluation: Generative models are hard to evaluate with traditional metrics. One promising approach is to use language models themselves to do the evaluation.
3 OpenAI Setup
For our examples we will be using OpenAI ChatGPT models, so let's load the required libraries and configuration.
The OpenAI API library needs to be configured with an account's secret key, which is available on the OpenAI website.
You can either set it as the OPENAI_API_KEY environment variable before using the library:

export OPENAI_API_KEY='sk-...'

Or set openai.api_key to its value:

import openai
openai.api_key = "sk-..."
4 Using OpenAI without LangChain
In earlier articles we looked at how to use the OpenAI API directly to access the ChatGPT model, so let's recap how that's done without a framework like LangChain.
We'll define a helper function to make it easier to use prompts and examine the generated outputs. get_completion is a function that simply accepts a prompt and returns the completion for that prompt. We will use OpenAI's gpt-3.5-turbo model.
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message["content"]

get_completion("What is 1+1?")
'As an AI language model, I can tell you that the answer to 1+1 is 2.'
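The messages list in get_completion holds a single user turn, but the OpenAI chat format supports whole conversations; here is a sketch of the structure (the message contents are illustrative):

```python
# The OpenAI chat format: a list of {"role", "content"} dicts, one per turn
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 1+1?"},
    {"role": "assistant", "content": "1+1 is 2."},
    {"role": "user", "content": "And what is that doubled?"},
]
# get_completion sends a single-turn version of this list; a multi-turn
# helper would accept the whole list instead of one prompt string
roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```

This is the same structure LangChain's chat models build for you behind the scenes.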
5 Use Case Example - Translating Customer Emails
Let's imagine we have a use case where we get multiple emails from customers in different languages. If our primary language is English, it might be useful for us to convert all customer emails into English.
Let's have a bit of fun along the way, and create a customer email about a product in the 'English Pirate' language.
5.1 Email Transformation using ChatGPT API
First we will use the ChatGPT API to do the task without LangChain.
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse,\
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""
Let’s say we want to transform this into American English, in a calm and respectful tone. We can define a style for our transformation thus:
style = """American English \
in a calm and respectful tone
"""
Now, as in previous articles, we manually construct a prompt for our LLM from these two parts:
prompt = f"""Translate the text \
that is delimited by triple backticks
into a style that is {style}.
text: ```{customer_email}```
"""
print(prompt)
Translate the text that is delimited by triple backticks
into a style that is American English in a calm and respectful tone
.
text: ```
Arrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse,the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!
```
Now let’s get the transformation from ChatGPT:
response = get_completion(prompt)
response
'I am quite upset that my blender lid came off and caused my smoothie to splatter all over my kitchen walls. Additionally, the warranty does not cover the cost of cleaning up the mess. Would you be able to assist me, please? Thank you kindly.'
5.2 Email Transformation using LangChain
Let's see how we can do the same using LangChain.
First we need to load LangChain's OpenAI chat model class, which is essentially a wrapper around the OpenAI API.
from langchain.chat_models import ChatOpenAI
# To control the randomness and creativity of the generated
# text by an LLM, use temperature = 0.0
chat = ChatOpenAI(temperature=0.0)
chat
ChatOpenAI(verbose=False, callbacks=None, callback_manager=None, client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, model_name='gpt-3.5-turbo', temperature=0.0, model_kwargs={}, openai_api_key=None, openai_api_base=None, openai_organization=None, openai_proxy=None, request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None)
Create a Prompt Template
LangChain allows us to create a template object for the prompt; in doing so we create something we can more easily re-use.
template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""
from langchain.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_template(template_string)
prompt_template.messages[0].prompt
PromptTemplate(input_variables=['style', 'text'], output_parser=None, partial_variables={}, template='Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```\n', template_format='f-string', validate_template=True)
prompt_template.messages[0].prompt.input_variables
['style', 'text']
Using this syntax for the template, the object knows there are 2 input variables.
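We can mimic how input variables are inferred from an f-string style template using Python's standard string.Formatter (a hypothetical illustration of the mechanism, not LangChain's actual implementation):

```python
import string

# Parse an f-string style template and collect its placeholder names
template = ("Translate the text that is delimited by triple backticks "
            "into a style that is {style}. text: {text}")
fields = [name for _, name, _, _ in string.Formatter().parse(template)
          if name is not None]
print(fields)  # ['style', 'text']
```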
We can now define the style and combine this with the template to create the prompt in a more structured way than before.
customer_style = """American English \
in a calm and respectful tone
"""
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""
customer_messages = prompt_template.format_messages(
    style=customer_style,
    text=customer_email)
print(type(customer_messages))
print(type(customer_messages[0]))
<class 'list'>
<class 'langchain.schema.HumanMessage'>
print(customer_messages[0])
content="Translate the text that is delimited by triple backticks into a style that is American English in a calm and respectful tone\n. text: ```\nArrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse, the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!\n```\n" additional_kwargs={} example=False
Let's now get the model response.
# Call the LLM to translate to the style of the customer message
customer_response = chat(customer_messages)
print(customer_response.content)
I'm really frustrated that my blender lid flew off and made a mess of my kitchen walls with smoothie. To add to my frustration, the warranty doesn't cover the cost of cleaning up my kitchen. Can you please help me out, friend?
The advantage of using LangChain this way is that we can reuse this approach with just a few changes.
Let’s imagine a different customer message we want to transform.
service_reply = """Hey there customer, \
the warranty does not cover \
cleaning expenses for your kitchen \
because it's your fault that \
you misused your blender \
by forgetting to put the lid on before \
starting the blender. \
Tough luck! See ya!
"""
service_style_pirate = """\
a polite tone \
that speaks in English Pirate\
"""
service_messages = prompt_template.format_messages(
    style=service_style_pirate,
    text=service_reply)
print(service_messages[0].content)
Translate the text that is delimited by triple backticks into a style that is a polite tone that speaks in English Pirate. text: ```Hey there customer, the warranty does not cover cleaning expenses for your kitchen because it's your fault that you misused your blender by forgetting to put the lid on before starting the blender. Tough luck! See ya!
```
service_response = chat(service_messages)
print(service_response.content)
Ahoy there, me hearty customer! I be sorry to inform ye that the warranty be not coverin' the expenses o' cleaning yer galley, as 'tis yer own fault fer misusin' yer blender by forgettin' to put the lid on afore startin' it. Aye, tough luck! Farewell and may the winds be in yer favor!
As you build more sophisticated applications using prompts and LLMs, prompts can become longer and more detailed. Prompt templates help you re-use good prompts efficiently. LangChain also conveniently provides pre-defined prompts for common operations such as text summarisation, question answering and connecting to databases, to speed up development.
Output Parsers
LangChain also supports output parsing. When you build a complex application using an LLM, you often instruct the LLM to generate its output in a certain format - for example using specific keywords to separate different parts of the response. One such format, 'Chain of Thought Reasoning' (ReAct), uses keywords such as Thought, Action and Observation to encourage the model to take more time thinking through a problem, which tends to lead to better outputs and solutions, as we learned in a previous article. Using LangChain can help ensure we are using some of the best and most up-to-date methods for LLM prompting - much like the PyCaret library does for conventional machine learning.
Let's look at an example, starting with defining what we would like the LLM output to look like. Say we want to extract information from a product review and output it in a particular JSON format:
{"gift": False,
 "delivery_days": 5,
 "price_value": "pretty affordable!"}
Let’s also define a customer review text and a prompt template we want to use that will help generate that JSON output.
customer_review = """\
This leaf blower is pretty amazing. It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""
review_template = """\
For the following text, extract the following information:
gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.
delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.
price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.
Format the output as JSON with the following keys:
gift
delivery_days
price_value
text: {text}
"""
from langchain.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_template(review_template)
print(prompt_template)
input_variables=['text'] output_parser=None partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['text'], output_parser=None, partial_variables={}, template='For the following text, extract the following information:\n\ngift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.\n\ndelivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.\n\nprice_value: Extract any sentences about the value or price,and output them as a comma separated Python list.\n\nFormat the output as JSON with the following keys:\ngift\ndelivery_days\nprice_value\n\ntext: {text}\n', template_format='f-string', validate_template=True), additional_kwargs={})]
Let's now generate the JSON response:
messages = prompt_template.format_messages(text=customer_review)
chat = ChatOpenAI(temperature=0.0)
response = chat(messages)
print(response.content)
{
"gift": true,
"delivery_days": 2,
"price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}
So this looks like JSON, but is it? Let's check the type:
type(response.content)
str
Because it's a string and not a dictionary, we can't index into it to get the values.
# We will get an error by running this line of code
# because response.content is a string, not a dictionary
response.content.get('gift')
AttributeError: 'str' object has no attribute 'get'
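Without a parser we would have to convert the string ourselves, for example with the standard json module. Note that JSON uses lowercase true/false, and raw LLM output often needs extra cleanup first; a minimal sketch with made-up content:

```python
import json

# A clean JSON string (real LLM output may include extra text or fences)
raw = '{"gift": true, "delivery_days": 2, "price_value": ["worth it"]}'
parsed = json.loads(raw)  # str -> dict
print(parsed.get("gift"))  # True
```

LangChain's output parsers wrap up exactly this kind of conversion, plus the prompt instructions needed to get well-formed output in the first place.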
Parse the LLM output string into a Python dictionary
So we can use LangChain’s parser to help with this.
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser
For each of the parts of the JSON we want, we can define a response schema. These tell the library what we want to parse and how.
gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased \
as a gift for someone else? \
Answer True if yes, \
False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days \
did it take for the product \
to arrive? If this \
information is not found, \
output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any \
sentences about the value or \
price, and output them as a \
comma separated Python list.")

response_schemas = [gift_schema,
                    delivery_days_schema,
                    price_value_schema]
Now that we have defined the schemas for each of the parts we want, LangChain can generate the format instructions we need to append to our prompt to produce the desired output. The output parser essentially tells you what kind of prompt you need to send to the LLM.
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()
Let’s have a look at the format instructions for the prompt our parser has generated to use for our LLM.
print(format_instructions)
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":
```json
{
"gift": string // Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
"delivery_days": string // How many days did it take for the product to arrive? If this information is not found, output -1.
"price_value": string // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
```
Let’s now put these format instructions together with the prompt template and submit it to the LLM.
review_template_2 = """\
For the following text, extract the following information:
gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.
delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.
price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.
text: {text}
{format_instructions}
"""
prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = prompt.format_messages(text=customer_review,
                                  format_instructions=format_instructions)
print(messages[0].content)
For the following text, extract the following information:
gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
delivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.
price_value: Extract any sentences about the value or price,and output them as a comma separated Python list.
text: This leaf blower is pretty amazing. It has four settings:candle blower, gentle breeze, windy city, and tornado. It arrived in two days, just in time for my wife's anniversary present. I think my wife liked it so much she was speechless. So far I've been the only one using it, and I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":
```json
{
"gift": string // Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
"delivery_days": string // How many days did it take for the product to arrive? If this information is not found, output -1.
"price_value": string // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
```
response = chat(messages)
Let’s see what response we got for our prompt:
print(response.content)
```json
{
"gift": true,
"delivery_days": "2",
"price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}
```
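The structured parser's job is roughly to strip the markdown fence and parse the JSON inside; an illustrative re-implementation of that step (not LangChain's actual source):

```python
import json
import re

# Example LLM output wrapped in a ```json fence (content is illustrative)
raw = '```json\n{"gift": true, "delivery_days": "2"}\n```'
# Strip the fence, then parse the JSON body into a dict
match = re.search(r"```json\s*(.*?)\s*```", raw, re.DOTALL)
data = json.loads(match.group(1))
print(data)
```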
Now we can use the output parser we created earlier to produce a dict. Notice it's of type dict, not string, so we can extract the different values.
output_dict = output_parser.parse(response.content)
output_dict
{'gift': True,
'delivery_days': '2',
'price_value': ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]}
type(output_dict)
dict
output_dict.get('delivery_days')
'2'
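Once parsed into a dict, the extracted fields can drive ordinary program logic. A hypothetical sketch of downstream routing based on these fields (the rules and labels are made up for illustration):

```python
def route_review(review: dict) -> str:
    # Hypothetical handling of the parsed review fields
    if review.get("gift") is True:
        return "gift-campaign"   # flag gift purchases for follow-up offers
    if int(review.get("delivery_days", -1)) > 3:
        return "logistics"       # investigate slow deliveries
    return "standard"

print(route_review({"gift": True, "delivery_days": "2", "price_value": []}))
# gift-campaign
```

This is the payoff of structured parsing: the LLM's free-text output becomes data your application can branch on.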
6 Acknowledgements
I'd like to express my thanks for the wonderful LangChain for LLM Application Development course by DeepLearning.AI, which I completed, and to acknowledge the use of some images and other materials from the course in this article.