Further reading
The easiest way to see CassIO in action, as outlined in the "Start Here" page, is certainly to open a Colab example backed by an Astra DB instance.
However, you may want to switch to a different setup. This page outlines how this is accomplished in two separate respects: running the examples locally as Jupyter notebooks, and using a local (pre-release) Cassandra cluster instead of Astra DB.
Run with local Jupyter
There are several reasons one might prefer to launch the code locally: for example, it may be easier to evolve the notebooks into a full-fledged application; also, one can prefer a non-ephemeral setup, especially when planning to run several examples in a row.
In the following we assume you have fulfilled the pre-requisites listed on the previous page.
You should have basic familiarity with git
and the shell console.
Clone this repository
First, clone this repo on you machine (it spans both the website and the examples). In a directory of your choice, execute the following:
git clone https://github.com/CassioML/cassio-website.git
cd cassio-website
Note that the following commands are to be run in the cassio-website
directory.
DB
You need a .env
file which defines the database credentials and connection
parameters.
You can copy the provided .env.template
file and replace
the environment variables you see there
(essentially, the full path to the bundle file, the keyspace
name and your database secret token string and ID).
Using the Astra CLI
Alternatively, once you have the Database Administrator token,
you can use the Astra CLI to automate the rest (secure-connect bundle and .env
file).
First install Astra CLI.
Then configure it with:
astra setup
providing the Database Administrator token you created earlier
(the string starting with AstraCS:...
).
Finally, in the root directory of this repo, adjusting names if needed, launch
astra db create-dotenv cassio_db -k cassio_tutorials
This will download the bundle zipfile and create a .env
file
with all connection parameters you'll need later.
Pre-populate the database
Some of the provided code examples require pre-existing data on your database.
To populate the newly-created keyspace with the required data:
- download the newest (vector-search-compatible)
cqlsh
utility from this link; - extract the
cqlsh
archive to a location of your liking, e.g./home/user/myCqlsh
; - ensure you are still in the repo's root directory;
- source the environment file you prepared in the previous step with
. .env
; - launch the script that populates the database (adjust the path as needed):
/home/user/myCqlsh/cqlsh-astra/bin/cqlsh \
-b "$ASTRA_DB_SECURE_BUNDLE_PATH" \
-u token \
-p "$ASTRA_DB_APPLICATION_TOKEN" \
-k "$ASTRA_DB_KEYSPACE" \
-f setup/provision_db/write_sample_data.cql
Astra DB's in-browser console
Alternatively, you can also populate the database
without a local cqlsh
client, all from your browser.
Locate the CQL Console in you Astra DB instance, then:
- enter the command
USE cassio_tutorials;
and press Enter. Replace with your keyspace name if you called it differently. - Paste the contents of this file in the Console and wait for it to complete.
LLM Credentials
In this repo's root directory again, create a .api_keys
file where the secrets
necessary for your LLM of choice are defined. You can copy
the provided .api_keys.template
and adjust the values therein.
Check out the LLM Pre-requisites for a list of supported LLMs: each will require a different variable to be set here.
Automatic choice of LLM
The code examples generally rely on a helper function to determine
which LLM to use, based on which secrets are detected in this file.
You can define your preferred LLM
(e.g. in case you define more than one secret) by setting
the environment variable PREFERRED_LLM_PROVIDER
in .api_keys
.
Remember to "source" this file before launching notebooks or Python scripts:
. .api_keys
Framework-specific setup
Now, database and LLM are all set for running the examples locally.
For each framework, still, you will have to prepare a specific Python environment with the right dependencies. The instructions are given in the section of this docs specific to that framework: for example, here is how you start the LangChain examples locally.
Use a local Vector-capable Cassandra
At the time of writing, the Vector Similarity Search capabilities are not yet included in the distributions of Cassandra (see CEP-30 to track the status).
If you want to use the latest Cassandra that implements Vector Search, you can still do so by building the code yourself. The following instructions describe how to do that and get a single-node vector-capable Cassandra cluster running locally.
Warning
This is a development branch at the moment: it is not guaranteed to be stable. Please refrain from using it in production environments for a little while more.
These instructions require some knowledge of Java building tools.
Build and run
You need Java JDK 11. Go to a directory of your choice and execute:
# get the code
git clone https://github.com/datastax/cassandra.git
cd cassandra
git checkout vsearch
git pull
# setup usage of JDK 11
sudo update-alternatives --config java # ... and pick JDK 11
export CASSANDRA_USE_JDK11=true
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/ # adapt this command
# clean and build
ant realclean
ant jar
If the above succeeded, you can launch the cluster with:
sudo bin/cassandra -f -R
You may need to create a log directory if not present:
sudo mkdir -p /var/log/cassandra
CQL Console
Using the vector-aware CQL Console that can be downloaded here, open a CQL Console by launching:
# adjust the path as needed
/home/user/myCqlsh/cqlsh-astra/bin/cqlsh
Populate the database
Still in the CQL Console, create a keyspace for the examples by executing the following:
CREATE KEYSPACE IF NOT EXISTS cassio_tutorials
WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 };
You can check that the keyspace exists with:
DESC KEYSPACES;
Exit the CQL console (EXIT
+ Enter), go back to this repo's root dir (you should have cloned it earlier to /some/directory/cassio-website/
),
then run this script, which will populate the local database
with sample tables occasionally used by some of the examples you'll encounter:
# adjust the path as needed
/home/user/myCqlsh/cqlsh-astra/bin/cqlsh \
-k cassio_tutorials \
-f setup/provision_db/write_sample_data.cql
Use the local Cassandra in the code
Your local Cassandra is ready to support all examples.
The only remaining thing is to make sure the Session
object used in the
code is a connection to your local database: how exactly this is achieved
depends on the framework you use.
With Langchain, for example, you can simply use the provided
getCQLSession
and getCQLKeyspace
functions making sure you pass the parameter mode="local"
when calling them.
More generally, you can build a connection to the local Cassandra with Python code similar to the following:
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
the session
object, along with the keyspace name (a string),
can then be used e.g. in the cassio.init(...)
invocations,
or when instantiating individual objects, exactly in the same way
as you would a connection to Astra DB
(see the drivers documentation for more options).