LlamaIndex, Vector Store Quickstart¶
Create a Vector Store with LlamaIndex and CassIO, and build a powerful search engine and text generator, backed by Astra DB / Apache Cassandra®.
Table of contents:¶
- Setup (database, LLM+embeddings)
- Create vector store
- Insert documents into it
- Answer questions using Vector Search
- Remove documents
- Cleanup
NOTE: this uses Cassandra's "Vector Similarity Search" capability. Make sure you are connecting to a vector-enabled database for this demo.
Setup¶
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
This is the LlamaIndex class providing support for Astra DB / Cassandra:
from llama_index.vector_stores import CassandraVectorStore
A database connection is needed to access Cassandra. The following assumes that a vector-search-capable Astra DB instance is available. Adjust as needed.
# Ensure loading of Astra DB credentials into environment variables:
import os
from dotenv import load_dotenv
load_dotenv("../../../.env")
import cassio
cassio.init(
    database_id=os.environ["ASTRA_DB_ID"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    keyspace=os.environ.get("ASTRA_DB_KEYSPACE"),  # this is optional
)
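If the .env file is not found, the cell above fails with a KeyError on the required variables; an optional, more explicit check (a small sketch using the same variable names as above) could look like this:
# Optional: verify that the expected Astra DB credentials were loaded
# (same environment variable names as used above).
for variable_name in ("ASTRA_DB_ID", "ASTRA_DB_APPLICATION_TOKEN"):
    if variable_name not in os.environ:
        raise RuntimeError(f"Please set {variable_name} in your .env file.")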
Both an LLM and an embedding function are required.
Below is the logic to instantiate the LLM and embeddings of choice. We chose to leave it in the notebooks for clarity.
import os
from llm_choice import suggestLLMProvider

llmProvider = suggestLLMProvider()
# (Alternatively set llmProvider to 'GCP_VertexAI', 'OpenAI', 'Azure_OpenAI' ... manually if you have credentials)

if llmProvider == 'OpenAI':
    os.environ['OPENAI_API_TYPE'] = 'open_ai'
    from llama_index.llms import OpenAI
    from llama_index.embeddings import OpenAIEmbedding
    llm = OpenAI(temperature=0)
    myEmbedding = OpenAIEmbedding()
    vector_dimension = 1536
    print("LLM+embeddings from OpenAI")
elif llmProvider == 'GCP_VertexAI':
    from llama_index import LangchainEmbedding
    # LlamaIndex lets you plug in any LangChain LLM+embeddings:
    from langchain.llms import VertexAI
    from langchain.embeddings import VertexAIEmbeddings
    llm = VertexAI()
    lcEmbedding = VertexAIEmbeddings()
    vector_dimension = len(lcEmbedding.embed_query("This is a sample sentence."))
    # ... if you take care of wrapping the LangChain embedding like this:
    myEmbedding = LangchainEmbedding(lcEmbedding)
    print("LLM+embeddings from Vertex AI")
elif llmProvider == 'Azure_OpenAI':
    os.environ['OPENAI_API_TYPE'] = 'azure'
    os.environ['OPENAI_API_VERSION'] = os.environ['AZURE_OPENAI_API_VERSION']
    os.environ['OPENAI_API_BASE'] = os.environ['AZURE_OPENAI_API_BASE']
    os.environ['OPENAI_API_KEY'] = os.environ['AZURE_OPENAI_API_KEY']
    from llama_index import LangchainEmbedding
    # LlamaIndex lets you plug in any LangChain LLM+embeddings:
    from langchain.llms import AzureOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    llm = AzureOpenAI(temperature=0, model_name=os.environ['AZURE_OPENAI_LLM_MODEL'],
                      engine=os.environ['AZURE_OPENAI_LLM_DEPLOYMENT'])
    lcEmbedding = OpenAIEmbeddings(model=os.environ['AZURE_OPENAI_EMBEDDINGS_MODEL'],
                                   deployment=os.environ['AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT'])
    vector_dimension = len(lcEmbedding.embed_query("This is a sample sentence."))
    # ... if you take care of wrapping the LangChain embedding like this:
    myEmbedding = LangchainEmbedding(lcEmbedding)
    print('LLM+embeddings from Azure OpenAI')
else:
    raise ValueError('Unknown LLM provider.')

print(f"Vector dimension for this embedding model: {vector_dimension}")
LLM+embeddings from OpenAI
Vector dimension for this embedding model: 1536
The following cell ensures that the LLM and embedding function instantiated above will be used throughout LlamaIndex:
from llama_index import ServiceContext
from llama_index import set_global_service_context
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=myEmbedding,
    chunk_size=256,
)
set_global_service_context(service_context)
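As an optional sanity check (a sketch, assuming myEmbedding follows LlamaIndex's BaseEmbedding interface with its get_text_embedding method), you can verify that the model really returns vectors of the dimension declared above, since the same number will be used to create the vector store table:
# Optional: the embedding size should match `vector_dimension`.
probe_vector = myEmbedding.get_text_embedding("This is a sample sentence.")
assert len(probe_vector) == vector_dimension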
Create vector store¶
table_name = 'vs_ll_' + llmProvider
In the LlamaIndex abstractions, the CassandraVectorStore instance is best wrapped into the creation of a "storage context", which you'll momentarily use to create the index proper:
storage_context = StorageContext.from_defaults(
    vector_store=CassandraVectorStore(
        table=table_name,
        embedding_dimension=vector_dimension,
        insertion_batch_size=15,
    )
)
Insert documents¶
You'll want to be able to employ filtering on metadata in your question-answering process, so here is a simple way to associate a metadata dictionary to the ingested sources.
LlamaIndex supports a function that maps an input file name to a metadata dictionary, and will honour the latter when storing the source text along with the embedding vectors:
def my_file_metadata(file_name: str):
    if "snow" in file_name:
        return {"story": "snow_white"}
    elif "rapunzel" in file_name:
        return {"story": "rapunzel"}
    else:
        return {"story": "other"}
You can now load documents into the vector store, for example the PDF files found in a given directory. In fact, you can load the documents and instantiate the "index" object in a single step:
documents = SimpleDirectoryReader(
    'pdfs',
    file_metadata=my_file_metadata
).load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
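Optionally, you can take a quick look at what was ingested; this small sketch only assumes the documents list built above and the metadata attribute of LlamaIndex Document objects:
# Each PDF page becomes a separate "document", carrying the metadata
# assigned by my_file_metadata:
print(f"Loaded {len(documents)} documents.")
print(f"Metadata of the first one: {documents[0].metadata}")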
Re-opening a preexisting vector store¶
In most realistic cases, you want to access a vector store that was created and populated previously. To do so, you would create the CassandraVectorStore as you saw already, and then use the from_vector_store static method of VectorStoreIndex, obtaining an "index" that you can use exactly like the one you just created:
# This is how you would get an index for a pre-existing LlamaIndex vector store:
vector_store = CassandraVectorStore(
    table=table_name,
    embedding_dimension=vector_dimension,
)
index_from_preexisting = VectorStoreIndex.from_vector_store(
    vector_store=vector_store
)
# You can try using "index_from_preexisting" in place of
# "index" everywhere in the next cells ... e.g.:
# query_engine = index_from_preexisting.as_query_engine(...)
Answer questions¶
Everything is ready to start asking questions. This cell lets you search for the answer over the whole indexed corpus:
question_1 = "Who is the antagonist of the young lady?"
print(f"\nQuestion on the whole store: {question_1}\n ==> ", end='')
query_engine_all = index.as_query_engine(similarity_top_k=6)
response_from_all = query_engine_all.query(question_1)
print(response_from_all.response.strip())
Question on the whole store: Who is the antagonist of the young lady?
 ==> The step-mother is the antagonist of the young lady.
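If you are curious about which chunks backed this answer, the response object keeps the retrieved nodes together with their similarity scores; here is a minimal sketch, assuming the usual source_nodes attribute on LlamaIndex responses:
# Optional: inspect the retrieved chunks behind the answer above.
for node_with_score in response_from_all.source_nodes:
    print(f"story={node_with_score.node.metadata.get('story')}, score={node_with_score.score}")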
Metadata filtering¶
You can, instead, constrain the lookup with filters on the metadata.
Here is the same question, limited to one of the input documents through a filtering condition:
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters
print(f"\nQuestion on the 'red_hood' document: {question_1}\n ==> ", end='')
query_engine_doc1 = index.as_query_engine(
    similarity_top_k=6,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="story", value="snow_white"),
    ])
)
response_doc1 = query_engine_doc1.query(question_1)
print(response_doc1.response.strip())
Question on the 'snow_white' document: Who is the antagonist of the young lady?
 ==> The step-mother is the antagonist of the young lady.
And here is the very same process, this time limiting the sources to the other input PDF:
print(f"\nQuestion on the 'rapunzel' document: {question_1}\n ==> ", end='')
query_engine_doc2 = index.as_query_engine(
    similarity_top_k=6,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="story", value="rapunzel"),
    ])
)
response_doc2 = query_engine_doc2.query(question_1)
print(response_doc2.response.strip())
Question on the 'rapunzel' document: Who is the antagonist of the young lady?
 ==> The antagonist of the young lady is the witch.
MMR (Maximum-marginal-relevance) method¶
In many cases, using the MMR method enhances the quality of the answers.
In short, when retrieving the text chunks that will underpin the answer generation, this method picks chunks that are still relevant to the query but as diverse from each other as possible.
In this cell, you can see it in action combined with metadata filtering:
print(f"\nQuestion on 'snow_white', MMR method: {question_1}\n ==> ", end='')
query_engine_doc2_mmr = index.as_query_engine(
    similarity_top_k=6,
    vector_store_query_mode="mmr",
    vector_store_kwargs={
        "mmr_prefetch_k": 20,
    },
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="story", value="snow_white"),
    ])
)
response_doc2_mmr = query_engine_doc2_mmr.query(question_1)
print(response_doc2_mmr.response.strip())
Question on 'snow_white', MMR method: Who is the antagonist of the young lady?
 ==> The antagonist of the young lady is the queen.
Remove documents¶
Sometimes you need to remove a document from the store. This generally entails removal of a number of nodes (i.e. the individual chunks of text, each with its embedding vector, into which the input document is split at ingestion time).
This is made easy with the delete method of the vector store. Just keep in mind that, when indexing PDF files, LlamaIndex will treat each *page* of the input file as a separate document, so that you will erase one page at a time.
In the following, you will:
- ask a question to check the "baseline" result
- get a few "document IDs" relevant to the question
- use the delete method to remove those IDs from the vector store
- ask the same question and compare the answer you get
Ask a question to check the result:
q_removal_test = "Who is Mother Gothel?"
print(f"\nQuestion before removal: {q_removal_test}\n ==> ", end='')
response_before_deletion = query_engine_all.query(q_removal_test)
print(response_before_deletion.response.strip())
Question before removal: Who is Mother Gothel?
 ==> Mother Gothel is a character mentioned in the context information.
Use the retrieve query-engine primitive to get a "raw" list of best-matching documents:
from llama_index.indices.query.schema import QueryBundle
q_bundle = QueryBundle(query_str=q_removal_test)
query_engine_all_manydocs = index.as_query_engine(similarity_top_k=4)
nodes_with_scores = query_engine_all_manydocs.retrieve(q_bundle)
print(f"Found {len(nodes_with_scores)} nodes with their score.")
Found 4 nodes with their score.
Now delete a few documents (remember that there are in general several "nodes" for each "document", as the Counter here may highlight):
from collections import Counter

nodes_per_document = Counter(nws.node.ref_doc_id for nws in nodes_with_scores)
for document_id, node_count in nodes_per_document.most_common():
    print(f"Deleting doc={document_id} (came up in {node_count} node(s))")
    vector_store.delete(document_id)
print("Done deleting.")
Deleting doc=38a77cff-4cc8-47f2-91a3-1e65f17896d4 (came up in 2 node(s))
Deleting doc=8d377393-c97f-43c2-98df-36f32514a88c (came up in 1 node(s))
Deleting doc=cdd8808b-1da1-4ddd-8e68-d4d13edf9932 (came up in 1 node(s))
Done deleting.
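As an optional verification (a sketch reusing the retriever and query bundle defined above), you can re-run the same retrieval and confirm that fewer, or no, nodes now match the question:
# Optional: re-run the retrieval after the deletions to see the effect.
nodes_after_deletion = query_engine_all_manydocs.retrieve(q_bundle)
print(f"Nodes found after deletion: {len(nodes_after_deletion)}")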
Now repeat the question:
print(f"\nQuestion after removal: {q_removal_test}\n ==> ", end='')
response_after_deletion = query_engine_all.query(q_removal_test)
if response_after_deletion.response:
    print(response_after_deletion.response.strip())
else:
    print("(no answer received)")
Question after removal: Who is Mother Gothel?
 ==> I'm sorry, but based on the given context information, there is no mention of Mother Gothel.
(optional) Cleanup¶
You may want to clean up your database: in that case, simply run the following cell.
Warning: this will delete the vector store and everything you stored in it!
cassio.config.resolve_session().execute(f"DROP TABLE IF EXISTS {cassio.config.resolve_keyspace()}.{table_name};")
<cassandra.cluster.ResultSet at 0x7fe325bf9910>
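Alternatively, if you prefer to empty the table while keeping its schema in place (for instance, to re-ingest later without re-creating the store), a TRUNCATE would do; this is a sketch using the same cassio helpers as above:
# Alternative cleanup: remove all rows but keep the table definition.
session = cassio.config.resolve_session()
keyspace = cassio.config.resolve_keyspace()
session.execute(f"TRUNCATE TABLE {keyspace}.{table_name};")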