VectorStore-backed memory¶
LangChain's support for the Cassandra vector store enables another interesting use case: a chat memory buffer that injects the most relevant past exchanges into the prompt, rather than the most recent ones (as most other memory types do). This makes it possible to retrieve related context from arbitrarily far back in the chat history.
All you need is to instantiate a Cassandra vector store and wrap it in a VectorStoreRetrieverMemory, a memory type provided by LangChain.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate
from langchain.vectorstores.cassandra import Cassandra
As usual, a database connection is needed to access Cassandra. The following assumes that a vector-search-capable Astra DB instance is available. Adjust as needed.
from cqlsession import getCQLSession, getCQLKeyspace
cqlMode = 'astra_db' # 'astra_db'/'local'
session = getCQLSession(mode=cqlMode)
keyspace = getCQLKeyspace(mode=cqlMode)
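The cqlsession helper module is not reproduced here; as a reference, a minimal getCQLSession for the Astra DB case might look like the following sketch, in which the environment-variable names for the secure-connect bundle and the Astra DB token are illustrative assumptions:
import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

def getCQLSession(mode='astra_db'):
    if mode == 'astra_db':
        # Hypothetical variable names: point these to your own
        # secure-connect bundle and Astra DB application token.
        cluster = Cluster(
            cloud={'secure_connect_bundle': os.environ['ASTRA_DB_SECURE_BUNDLE_PATH']},
            auth_provider=PlainTextAuthProvider(
                'token',
                os.environ['ASTRA_DB_APPLICATION_TOKEN'],
            ),
        )
    else:
        # 'local' mode: a Cassandra node reachable on localhost
        cluster = Cluster(['127.0.0.1'])
    return cluster.connect()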
Both an LLM and an embedding function are required.
Below is the logic to instantiate the LLM and embeddings of choice. We chose to leave it in the notebooks for clarity.
import os
from llm_choice import suggestLLMProvider
llmProvider = suggestLLMProvider()
# (Alternatively set llmProvider to 'GCP_VertexAI', 'OpenAI', 'Azure_OpenAI' ... manually if you have credentials)
if llmProvider == 'GCP_VertexAI':
    from langchain.llms import VertexAI
    from langchain.embeddings import VertexAIEmbeddings
    llm = VertexAI()
    myEmbedding = VertexAIEmbeddings()
    print('LLM+embeddings from Vertex AI')
elif llmProvider == 'OpenAI':
    os.environ['OPENAI_API_TYPE'] = 'open_ai'
    from langchain.llms import OpenAI
    from langchain.embeddings import OpenAIEmbeddings
    llm = OpenAI(temperature=0)
    myEmbedding = OpenAIEmbeddings()
    print('LLM+embeddings from OpenAI')
elif llmProvider == 'Azure_OpenAI':
    os.environ['OPENAI_API_TYPE'] = 'azure'
    os.environ['OPENAI_API_VERSION'] = os.environ['AZURE_OPENAI_API_VERSION']
    os.environ['OPENAI_API_BASE'] = os.environ['AZURE_OPENAI_API_BASE']
    os.environ['OPENAI_API_KEY'] = os.environ['AZURE_OPENAI_API_KEY']
    from langchain.llms import AzureOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    llm = AzureOpenAI(temperature=0, model_name=os.environ['AZURE_OPENAI_LLM_MODEL'],
                      engine=os.environ['AZURE_OPENAI_LLM_DEPLOYMENT'])
    myEmbedding = OpenAIEmbeddings(model=os.environ['AZURE_OPENAI_EMBEDDINGS_MODEL'],
                                   deployment=os.environ['AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT'])
    print('LLM+embeddings from Azure OpenAI')
else:
    raise ValueError('Unknown LLM provider.')
LLM+embeddings from OpenAI
Create the store¶
table_name = 'vstore_memory_' + llmProvider
cassVStore = Cassandra(
    session=session,
    keyspace=keyspace,
    table_name=table_name,
    embedding=myEmbedding,
)
# just in case this demo runs multiple times
cassVStore.clear()
Create the retriever and the memory¶
From the vector store, a "retriever" is spawned. You'll keep the number of items to fetch intentionally very small for demonstration purposes.
Next, the retriever is wrapped in a VectorStoreRetrieverMemory:
retriever = cassVStore.as_retriever(search_kwargs={'k': 3})
semanticMemory = VectorStoreRetrieverMemory(retriever=retriever)
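Here k=3 means that up to three past fragments will be fetched and injected into the prompt. As an aside, if your LangChain and vector-store versions support it, a maximal-marginal-relevance retriever could be used instead to diversify the fetched fragments (a sketch, not used in the rest of this demo):
retriever = cassVStore.as_retriever(
    search_type='mmr',
    search_kwargs={'k': 3},
)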
Create a fake "past conversation". Note how the topic of the discussion wanders to fixing one's PC in the last few exchanges:
pastExchanges = [
    (
        {"input": "Hello, what is the biggest mammal?"},
        {"output": "The blue whale."},
    ),
    (
        {"input": "... I cannot swim. Actually I hate swimming!"},
        {"output": "I see."},
    ),
    (
        {"input": "I like mountains and beech forests."},
        {"output": "That's good to know."},
    ),
    (
        {"input": "Yes, too much water makes me uneasy."},
        {"output": "Ah, how come?"},
    ),
    (
        {"input": "I guess I am just not a seaside person"},
        {"output": "I see. How may I help you?"},
    ),
    (
        {"input": "I need help installing this driver"},
        {"output": "First download the right version for your operating system."},
    ),
    (
        {"input": "Good grief ... my keyboard does not work anymore!"},
        {"output": "Try plugging it in your PC first."},
    ),
]
Insert these exchanges into the memory:
for exI, exO in pastExchanges:
    semanticMemory.save_context(exI, exO)
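Under the hood, each exchange is persisted as a single document whose page_content joins the input and output lines (this format is visible in the outputs further down). To peek at what was stored, one could for instance run a direct similarity search on the store:
docs = cassVStore.similarity_search("swimming", k=1)
print(docs[0].page_content)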
Given a conversation input, the load_memory_variables method performs a semantic search and comes up with the most relevant items from the memory, regardless of their position in the history:
QUESTION = "Can you suggest me a sport to try?"
print(semanticMemory.load_memory_variables({"prompt": QUESTION})["history"])
input: ... I cannot swim. Actually I hate swimming!
output: I see.
input: I guess I am just not a seaside person
output: I see. How may I help you?
input: I like mountains and beech forests.
output: That's good to know.
Usage in a conversation chain¶
This semantic memory element can be used within a full conversation chain.
In the following, you'll create a custom prompt and a ConversationChain out of it, attaching the latter to the vector-store-powered memory seen above:
semanticMemoryTemplateString = """The following is a between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.
The AI can use information from parts of the previous conversation (only if they are relevant):
{history}
Current conversation:
Human: {input}
AI:"""
memoryPrompt = PromptTemplate(
    input_variables=["history", "input"],
    template=semanticMemoryTemplateString,
)
conversationWithVectorRetrieval = ConversationChain(
    llm=llm,
    prompt=memoryPrompt,
    memory=semanticMemory,
    verbose=True,
)
Run the chain with the sports question:
conversationWithVectorRetrieval.predict(input=QUESTION)
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a conversation between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.
The AI can use information from parts of the previous conversation (only if they are relevant):

input: ... I cannot swim. Actually I hate swimming!
output: I see.
input: I guess I am just not a seaside person
output: I see. How may I help you?
input: I like mountains and beech forests.
output: That's good to know.

Current conversation:
Human: Can you suggest me a sport to try?
AI:

> Finished chain.
' Sure, I can suggest some sports for you to try. Depending on your preferences, you could try hiking, running, biking, or even rock climbing. Do any of these sound interesting to you?'
Notice how new exchanges are automatically added to the memory:
conversationWithVectorRetrieval.predict(input="Would I like a swim in a mountain lake?")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a conversation between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.
The AI can use information from parts of the previous conversation (only if they are relevant):

input: I like mountains and beech forests.
output: That's good to know.
input: Can you suggest me a sport to try?
response: Sure, I can suggest some sports for you to try. Depending on your preferences, you could try hiking, running, biking, or even rock climbing. Do any of these sound interesting to you?
input: ... I cannot swim. Actually I hate swimming!
output: I see.

Current conversation:
Human: Would I like a swim in a mountain lake?
AI:

> Finished chain.
" That depends on your preferences. If you don't like swimming, then a swim in a mountain lake may not be the best activity for you. However, if you enjoy the outdoors and the beauty of nature, then a swim in a mountain lake could be a great experience."
... so that now the most relevant items for the same question have changed:
semanticMemory.retriever.get_relevant_documents(QUESTION)
[Document(page_content='input: Can you suggest me a sport to try?\nresponse: Sure, I can suggest some sports for you to try. Depending on your preferences, you could try hiking, running, biking, or even rock climbing. Do any of these sound interesting to you?', metadata={}),
 Document(page_content="input: Would I like a swim in a mountain lake?\nresponse: That depends on your preferences. If you don't like swimming, then a swim in a mountain lake may not be the best activity for you. However, if you enjoy the outdoors and the beauty of nature, then a swim in a mountain lake could be a great experience.", metadata={}),
 Document(page_content='input: ... I cannot swim. Actually I hate swimming!\noutput: I see.', metadata={})]
A counterexample¶
What would happen with a simpler memory element, one that simply retrieves a fixed number of the most recent interactions?
Create and populate an instance of LangChain's ConversationTokenBufferMemory, limiting it to a maximum token length of 80 (roughly equivalent to the three fragments set for the semanticMemory object):
from langchain.memory import ConversationTokenBufferMemory
from langchain.memory import ChatMessageHistory
baseHistory = ChatMessageHistory()
recencyBufferMemory = ConversationTokenBufferMemory(
    chat_memory=baseHistory,
    max_token_limit=80,
    llm=llm,
)
for exI, exO in pastExchanges:
    recencyBufferMemory.save_context(exI, exO)
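Unlike the semantic memory, this buffer prunes the oldest messages as soon as the 80-token budget is exceeded, so only the tail of the conversation survives. A quick, approximate sanity check (token counts depend on the LLM's tokenizer; a sketch, not part of the original flow):
print(len(baseHistory.messages))
print(sum(llm.get_num_tokens(m.content) for m in baseHistory.messages))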
Time to ask the same sports question. This is what will get injected into the prompt this time:
print(recencyBufferMemory.load_memory_variables({"prompt": QUESTION})["history"])
AI: Ah, how come?
Human: I guess I am just not a seaside person
AI: I see. How may I help you?
Human: I need help installing this driver
AI: First download the right version for your operating system.
Human: Good grief ... my keyboard does not work anymore!
AI: Try plugging it in your PC first.
... and this is the (rather generic) answer you'd get:
conversationWithRecencyRetrieval = ConversationChain(
    llm=llm,
    prompt=memoryPrompt,
    memory=recencyBufferMemory,
    verbose=True,
)
conversationWithRecencyRetrieval.predict(input=QUESTION)
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a conversation between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.
The AI can use information from parts of the previous conversation (only if they are relevant):

AI: Ah, how come?
Human: I guess I am just not a seaside person
AI: I see. How may I help you?
Human: I need help installing this driver
AI: First download the right version for your operating system.
Human: Good grief ... my keyboard does not work anymore!
AI: Try plugging it in your PC first.

Current conversation:
Human: Can you suggest me a sport to try?
AI:

> Finished chain.
' Sure! What kind of sport are you interested in?'