LlamaIndex, Vector Store Quickstart¶
Create a Vector Store with LlamaIndex and CassIO, and build a powerful search engine and text generator, backed by Astra DB / Apache Cassandra®.
Table of contents:¶
- Setup (database, LLM+embeddings)
- Create vector store
- Insert documents into it
- Answer questions using Vector Search
- Remove documents
- Cleanup
NOTE: this uses Cassandra's "Vector Similarity Search" capability. Make sure you are connecting to a vector-enabled database for this demo.
Setup¶
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext
This is the LlamaIndex class providing support for Astra DB / Cassandra:
from llama_index.vector_stores import CassandraVectorStore
A database connection is needed to access Cassandra. The following assumes that a vector-search-capable Astra DB instance is available. Adjust as needed.
# Ensure loading of Astra DB credentials into environment variables:
import os
from dotenv import load_dotenv
load_dotenv("../../../.env")
import cassio
cassio.init(
    database_id=os.environ["ASTRA_DB_ID"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    keyspace=os.environ.get("ASTRA_DB_KEYSPACE"),  # this is optional
)
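If the .env file is not found, the cell above fails with a KeyError on the required variables; an optional, more explicit check (a small sketch using the same variable names as above) could look like this:
# Optional: verify that the expected Astra DB credentials were loaded
# (same environment variable names as used above).
for variable_name in ("ASTRA_DB_ID", "ASTRA_DB_APPLICATION_TOKEN"):
    if variable_name not in os.environ:
        raise RuntimeError(f"Please set {variable_name} in your .env file.")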
Both an LLM and an embedding function are required.
Below is the logic to instantiate the LLM and embeddings of choice. We chose to leave it in the notebooks for clarity.
import os
from llm_choice import suggestLLMProvider

llmProvider = suggestLLMProvider()
# (Alternatively set llmProvider to 'GCP_VertexAI', 'OpenAI', 'Azure_OpenAI' ... manually if you have credentials)

if llmProvider == 'OpenAI':
    os.environ['OPENAI_API_TYPE'] = 'open_ai'
    from llama_index.llms import OpenAI
    from llama_index.embeddings import OpenAIEmbedding
    llm = OpenAI(temperature=0)
    myEmbedding = OpenAIEmbedding()
    vector_dimension = 1536
    print("LLM+embeddings from OpenAI")
elif llmProvider == 'GCP_VertexAI':
    from llama_index import LangchainEmbedding
    # LlamaIndex lets you plug in any LangChain LLM+embeddings:
    from langchain.llms import VertexAI
    from langchain.embeddings import VertexAIEmbeddings
    llm = VertexAI()
    lcEmbedding = VertexAIEmbeddings()
    vector_dimension = len(lcEmbedding.embed_query("This is a sample sentence."))
    # ... if you take care of wrapping the LangChain embedding like this:
    myEmbedding = LangchainEmbedding(lcEmbedding)
    print("LLM+embeddings from Vertex AI")
elif llmProvider == 'Azure_OpenAI':
    os.environ['OPENAI_API_TYPE'] = 'azure'
    os.environ['OPENAI_API_VERSION'] = os.environ['AZURE_OPENAI_API_VERSION']
    os.environ['OPENAI_API_BASE'] = os.environ['AZURE_OPENAI_API_BASE']
    os.environ['OPENAI_API_KEY'] = os.environ['AZURE_OPENAI_API_KEY']
    from llama_index import LangchainEmbedding
    # LlamaIndex lets you plug in any LangChain LLM+embeddings:
    from langchain.llms import AzureOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    llm = AzureOpenAI(temperature=0, model_name=os.environ['AZURE_OPENAI_LLM_MODEL'],
                      engine=os.environ['AZURE_OPENAI_LLM_DEPLOYMENT'])
    lcEmbedding = OpenAIEmbeddings(model=os.environ['AZURE_OPENAI_EMBEDDINGS_MODEL'],
                                   deployment=os.environ['AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT'])
    vector_dimension = len(lcEmbedding.embed_query("This is a sample sentence."))
    # ... if you take care of wrapping the LangChain embedding like this:
    myEmbedding = LangchainEmbedding(lcEmbedding)
    print('LLM+embeddings from Azure OpenAI')
else:
    raise ValueError('Unknown LLM provider.')

print(f"Vector dimension for this embedding model: {vector_dimension}")
LLM+embeddings from OpenAI
Vector dimension for this embedding model: 1536
The following cell ensures that the LLM and embedding function instantiated above will be used throughout LlamaIndex:
from llama_index import ServiceContext
from llama_index import set_global_service_context
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=myEmbedding,
    chunk_size=256,
)
set_global_service_context(service_context)
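As an optional sanity check (a sketch, assuming myEmbedding follows LlamaIndex's BaseEmbedding interface with its get_text_embedding method), you can verify that the model really returns vectors of the dimension declared above, since the same number will be used to create the vector store table:
# Optional: the embedding size should match `vector_dimension`.
probe_vector = myEmbedding.get_text_embedding("This is a sample sentence.")
assert len(probe_vector) == vector_dimension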
Create vector store¶
table_name = 'vs_ll_' + llmProvider
In the LlamaIndex abstractions, the CassandraVectorStore instance is best wrapped into the creation of a "storage context", which you'll momentarily use to create the index proper:
storage_context = StorageContext.from_defaults(
    vector_store=CassandraVectorStore(
        table=table_name,
        embedding_dimension=vector_dimension,
        insertion_batch_size=15,
    )
)
Insert documents¶
You'll want to be able to employ filtering on metadata in your question-answering process, so here is a simple way to associate a metadata dictionary to the ingested sources.
LlamaIndex supports a function that maps an input file name to a metadata dictionary, and will honour the latter when storing the source text along with the embedding vectors:
def my_file_metadata(file_name: str):
    if "snow" in file_name:
        return {"story": "snow_white"}
    elif "rapunzel" in file_name:
        return {"story": "rapunzel"}
    else:
        return {"story": "other"}
You can now load documents into the vector store, for example the PDF files found in a given directory. In fact, you can load the documents and instantiate the "index" object in a single step:
documents = SimpleDirectoryReader(
    'pdfs',
    file_metadata=my_file_metadata
).load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
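Optionally, you can take a quick look at what was ingested; this small sketch only assumes the documents list built above and the metadata attribute of LlamaIndex Document objects:
# Each PDF page becomes a separate "document", carrying the metadata
# assigned by my_file_metadata:
print(f"Loaded {len(documents)} documents.")
print(f"Metadata of the first one: {documents[0].metadata}")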
Re-opening a preexisting vector store¶
In most realistic cases, you want to access a vector store that was created and populated previously. To do so, you would create the CassandraVectorStore as you saw already, and then use the from_vector_store static method of VectorStoreIndex, obtaining an "index" that you can use exactly like the one you just created:
# This is how you would get an index for a pre-existing LlamaIndex vector store:
vector_store = CassandraVectorStore(
    table=table_name,
    embedding_dimension=vector_dimension,
)
index_from_preexisting = VectorStoreIndex.from_vector_store(
    vector_store=vector_store
)
# You can try using "index_from_preexisting" in place of
# "index" everywhere in the next cells ... e.g.:
# query_engine = index_from_preexisting.as_query_engine(...)
Answer questions¶
Everything is ready to start asking questions. This cell lets you search for the answer over the whole indexed corpus:
question_1 = "Who is the antagonist of the young lady?"
print(f"\nQuestion on the whole store: {question_1}\n ==> ", end='')
query_engine_all = index.as_query_engine(similarity_top_k=6)
response_from_all = query_engine_all.query(question_1)
print(response_from_all.response.strip())
Question on the whole store: Who is the antagonist of the young lady?
 ==> The step-mother is the antagonist of the young lady.
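If you are curious about which chunks backed this answer, the response object keeps the retrieved nodes together with their similarity scores; here is a minimal sketch, assuming the usual source_nodes attribute on LlamaIndex responses:
# Optional: inspect the retrieved chunks behind the answer above.
for node_with_score in response_from_all.source_nodes:
    print(f"story={node_with_score.node.metadata.get('story')}, score={node_with_score.score}")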
Metadata filtering¶
You can, instead, constrain the lookup with filters on the metadata.
Here is the same question, limited to one of the input documents through a filtering condition:
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters
print(f"\nQuestion on the 'red_hood' document: {question_1}\n ==> ", end='')
query_engine_doc1 = index.as_query_engine(
    similarity_top_k=6,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="story", value="snow_white"),
    ])
)
response_doc1 = query_engine_doc1.query(question_1)
print(response_doc1.response.strip())
Question on the 'snow_white' document: Who is the antagonist of the young lady?
 ==> The step-mother is the antagonist of the young lady.
And here is the very same process, this time limiting the sources to the other input PDF:
print(f"\nQuestion on the 'rapunzel' document: {question_1}\n ==> ", end='')
query_engine_doc2 = index.as_query_engine(
    similarity_top_k=6,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="story", value="rapunzel"),
    ])
)
response_doc2 = query_engine_doc2.query(question_1)
print(response_doc2.response.strip())
Question on the 'rapunzel' document: Who is the antagonist of the young lady?
 ==> The antagonist of the young lady is the witch.
MMR (Maximum-marginal-relevance) method¶
In many cases, using the MMR method enhances the quality of the answers.
In short, when retrieving the text chunks that will underpin the answer generation, this method picks chunks that are still relevant to the query but as diverse from each other as possible.
In this cell, you can see it in action combined with metadata filtering:
print(f"\nQuestion on 'snow_white', MMR method: {question_1}\n ==> ", end='')
query_engine_doc2_mmr = index.as_query_engine(
    similarity_top_k=6,
    vector_store_query_mode="mmr",
    vector_store_kwargs={
        "mmr_prefetch_k": 20,
    },
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="story", value="snow_white"),
    ])
)
response_doc2_mmr = query_engine_doc2_mmr.query(question_1)
print(response_doc2_mmr.response.strip())
Question on 'snow_white', MMR method: Who is the antagonist of the young lady?
 ==> The antagonist of the young lady is the queen.
Remove documents¶
Sometimes you need to remove a document from the store. This generally entails removal of a number of nodes (i.e. the individual chunks of text, each with its embedding vector, into which the input document is split at ingestion time).
This is made easy with the delete method of the vector store. Just keep in mind that, when indexing PDF files, LlamaIndex will treat each *page* of the input file as a separate document, so that you will erase one page at a time.
In the following, you will:
- ask a question to check the "baseline" result
- get a few "document IDs" relevant to the question
- use the delete method to remove those IDs from the vector store
- ask the same question and compare the answer you get
Ask a question to check the result:
q_removal_test = "Who is Mother Gothel?"
print(f"\nQuestion before removal: {q_removal_test}\n ==> ", end='')
response_before_deletion = query_engine_all.query(q_removal_test)
print(response_before_deletion.response.strip())
Question before removal: Who is Mother Gothel?
 ==> Mother Gothel is a character mentioned in the context information.
Use the retrieve query-engine primitive to get a "raw" list of best-matching documents:
from llama_index.indices.query.schema import QueryBundle
q_bundle = QueryBundle(query_str=q_removal_test)
query_engine_all_manydocs = index.as_query_engine(similarity_top_k=4)
nodes_with_scores = query_engine_all_manydocs.retrieve(q_bundle)
print(f"Found {len(nodes_with_scores)} nodes with their score.")
Found 4 nodes with their score.
Now delete a few documents (remember that there are in general several "nodes" for each "document", as the Counter here may highlight):
from collections import Counter

nodes_per_document = Counter(nws.node.ref_doc_id for nws in nodes_with_scores)
for document_id, node_count in nodes_per_document.most_common():
    print(f"Deleting doc={document_id} (came up in {node_count} node(s))")
    vector_store.delete(document_id)
print("Done deleting.")
Deleting doc=38a77cff-4cc8-47f2-91a3-1e65f17896d4 (came up in 2 node(s))
Deleting doc=8d377393-c97f-43c2-98df-36f32514a88c (came up in 1 node(s))
Deleting doc=cdd8808b-1da1-4ddd-8e68-d4d13edf9932 (came up in 1 node(s))
Done deleting.
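As an optional verification (a sketch reusing the retriever and query bundle defined above), you can re-run the same retrieval and confirm that fewer, or no, nodes now match the question:
# Optional: re-run the retrieval after the deletions to see the effect.
nodes_after_deletion = query_engine_all_manydocs.retrieve(q_bundle)
print(f"Nodes found after deletion: {len(nodes_after_deletion)}")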
Now repeat the question:
print(f"\nQuestion after removal: {q_removal_test}\n ==> ", end='')
response_after_deletion = query_engine_all.query(q_removal_test)
if response_after_deletion.response:
    print(response_after_deletion.response.strip())
else:
    print("(no answer received)")
Question after removal: Who is Mother Gothel?
 ==> I'm sorry, but based on the given context information, there is no mention of Mother Gothel.
(optional) Cleanup¶
You may want to clean up your database: in that case, simply run the following cell.
Warning: this will delete the vector store and everything you stored in it!
cassio.config.resolve_session().execute(f"DROP TABLE IF EXISTS {cassio.config.resolve_keyspace()}.{table_name};")
<cassandra.cluster.ResultSet at 0x7fe325bf9910>
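Alternatively, if you prefer to empty the table while keeping its schema in place (for instance, to re-ingest later without re-creating the store), a TRUNCATE would do; this is a sketch using the same cassio helpers as above:
# Alternative cleanup: remove all rows but keep the table definition.
session = cassio.config.resolve_session()
keyspace = cassio.config.resolve_keyspace()
session.execute(f"TRUNCATE TABLE {keyspace}.{table_name};")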