pingcap / tidb-vector-python

TiDB Vector SDK for Python. Join our Discord: https://discord.gg/XzSW23Jg9p

Home Page: https://tidb.cloud/ai

tidb-vector-python

This is a Python client for TiDB Vector.

Currently, only TiDB Cloud Serverless clusters support the vector data type; see this blog for more information.

Installation

pip install tidb-vector

Usage

TiDB Vector supports the following distance functions:

  • L1Distance
  • L2Distance
  • CosineDistance
  • NegativeInnerProduct
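
For reference, the four metrics can be written out in plain Python. This is only an illustrative sketch of the underlying math, not how TiDB computes them; the helper names below are hypothetical and are not part of tidb-vector.

```python
import math


def l1_distance(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """Negated dot product, so that a larger inner product gives a smaller score."""
    return -sum(x * y for x, y in zip(a, b))
```

All four are "smaller is closer" scores, which is why nearest-neighbor queries sort by them in ascending order.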

The SDK supports the following ORMs and frameworks: SQLAlchemy, Django, Peewee, and the built-in TiDBVectorClient.

SQLAlchemy

Learn how to connect to TiDB Serverless in the TiDB Cloud documentation.

Define table with vector field

from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base
from tidb_vector.sqlalchemy import VectorType

Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'
    id = Column(Integer, primary_key=True)
    embedding = Column(VectorType(3))

Insert vector data

# `session` is assumed to be an open sqlalchemy.orm.Session bound to your TiDB engine
test = Test(embedding=[1, 2, 3])
session.add(test)
session.commit()

Get the nearest neighbors

from sqlalchemy import select
session.scalars(select(Test).order_by(Test.embedding.l2_distance([1, 2, 3.1])).limit(5))

Get the distance

session.scalars(select(Test.embedding.l2_distance([1, 2, 3.1])))

Get within a certain distance

session.scalars(select(Test).filter(Test.embedding.l2_distance([1, 2, 3.1]) < 0.2))
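
To make the semantics of the three queries above concrete, here is a pure-Python equivalent over an in-memory list. It is a sketch for intuition only: no TiDB or SQLAlchemy is involved, and the sample rows and function names are hypothetical.

```python
import math

# Hypothetical stand-in for the rows of the `test` table.
rows = [
    {"id": 1, "embedding": [1, 2, 3]},
    {"id": 2, "embedding": [1, 2, 3.5]},
    {"id": 3, "embedding": [9, 9, 9]},
]
query = [1, 2, 3.1]


def nearest(rows, query, k):
    """Like order_by(l2_distance(query)).limit(k): the k closest rows."""
    return sorted(rows, key=lambda r: math.dist(r["embedding"], query))[:k]


def distances(rows, query):
    """Like select(l2_distance(query)): one distance per row."""
    return [math.dist(r["embedding"], query) for r in rows]


def within(rows, query, threshold):
    """Like filter(l2_distance(query) < threshold): rows inside the radius."""
    return [r for r in rows if math.dist(r["embedding"], query) < threshold]
```

The database version does the same work server-side, so only the matching rows travel over the network.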

Django

To use a vector field in Django, use django-tidb.

Peewee

Define peewee table with vector field

from peewee import Model, MySQLDatabase
from tidb_vector.peewee import VectorField

# Option 1: using `pymysql` as the driver
connect_kwargs = {
    'ssl_verify_cert': True,
    'ssl_verify_identity': True,
}

# Option 2: using `mysqlclient` as the driver
# (keep only one of the two `connect_kwargs` definitions; as written, this one overrides Option 1)
connect_kwargs = {
    'ssl_mode': 'VERIFY_IDENTITY',
    'ssl': {
        # Root certificate default path
        # https://docs.pingcap.com/tidbcloud/secure-connections-to-serverless-clusters/#root-certificate-default-path
        'ca': '/etc/ssl/cert.pem'  # macOS
    },
}

db = MySQLDatabase(
    'peewee_test',
    user='xxxxxxxx.root',
    password='xxxxxxxx',
    host='xxxxxxxx.shared.aws.tidbcloud.com',
    port=4000,
    **connect_kwargs,
)

class TestModel(Model):
    class Meta:
        database = db
        table_name = 'test'

    embedding = VectorField(3)

Insert vector data

TestModel.create(embedding=[1, 2, 3])

Get the nearest neighbors

TestModel.select().order_by(TestModel.embedding.l2_distance([1, 2, 3.1])).limit(5)

Get the distance

TestModel.select(TestModel.embedding.cosine_distance([1, 2, 3.1]).alias('distance'))

Get within a certain distance

TestModel.select().where(TestModel.embedding.l2_distance([1, 2, 3.1]) < 0.5)

TiDB Vector Client

You can also use the built-in TiDBVectorClient directly, as integrations such as LangChain and LlamaIndex do, to interact with TiDB Vector. This approach removes the need to manage the underlying ORM yourself.

TiDBVectorClient is built on SQLAlchemy. To install it, run pip install tidb-vector[client].

Create a TiDBVectorClient instance:

from tidb_vector.integrations import TiDBVectorClient

TABLE_NAME = 'vector_test'
CONNECTION_STRING = 'mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_verify_cert=true&ssl_verify_identity=true'

tidb_vs = TiDBVectorClient(
    # the table which will store the vector data
    table_name=TABLE_NAME,
    # tidb connection string
    connection_string=CONNECTION_STRING,
    # the dimension of the vector, in this example, we use the ada model, which has 1536 dimensions
    vector_dimension=1536,
    # whether to drop and recreate the table if it already exists
    drop_existing_table=True,
)
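
Passwords that contain characters such as @ or / must be percent-encoded before they go into the connection string. A minimal sketch of a helper that assembles the string from its parts, assuming the same pymysql driver and SSL query parameters as above (build_connection_string is a hypothetical name, not part of tidb-vector):

```python
from urllib.parse import quote


def build_connection_string(user, password, host, database, port=4000):
    """Assemble a SQLAlchemy URL for TiDB, percent-encoding the credentials."""
    # safe="" makes quote() escape every reserved character, including "/" and "@".
    return (
        f"mysql+pymysql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
        "?ssl_verify_cert=true&ssl_verify_identity=true"
    )
```

The result can be passed as connection_string when creating the client.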

Bulk insert:

ids = [
    "f8e7dee2-63b6-42f1-8b60-2d46710c1971",
    "8dde1fbc-2522-4ca2-aedf-5dcb2966d1c6",
    "e4991349-d00b-485c-a481-f61695f2b5ae",
]
documents = ["foo", "bar", "baz"]
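# `text_to_embedding` is a placeholder for your own embedding function;
# it must return vectors whose length matches `vector_dimension`
embeddings = [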
    text_to_embedding("foo"),
    text_to_embedding("bar"),
    text_to_embedding("baz"),
]
metadatas = [
    {"page": 1, "category": "P1"},
    {"page": 2, "category": "P1"},
    {"page": 3, "category": "P2"},
]

tidb_vs.insert(
    ids=ids,
    texts=documents,
    embeddings=embeddings,
    metadatas=metadatas,
)
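
The text_to_embedding function used here is not defined in this README; in practice it would call an embedding model such as the ada model mentioned earlier. For a local smoke test without a model, a hypothetical stand-in like the one below can be used. It is an assumption for illustration only: the vectors are deterministic pseudo-random unit vectors and carry no semantic meaning.

```python
import hashlib
import math
import random

VECTOR_DIMENSION = 1536  # must match the vector_dimension passed to the client


def text_to_embedding(text, dim=VECTOR_DIMENSION):
    """Hypothetical stub: map a text to a reproducible unit vector of length dim."""
    # Derive a stable seed from the text so the same input always yields the same vector.
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    # Normalize to unit length so distance metrics behave consistently.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```

Because the stub is deterministic, querying with the text of an inserted document returns that document at distance zero, which is enough to check that inserts and queries are wired up correctly.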

Query:

tidb_vs.query(text_to_embedding("foo"), k=3)

# query with filter
tidb_vs.query(text_to_embedding("foo"), k=3, filter={"category": "P1"})

Bulk delete:

tidb_vs.delete(["f8e7dee2-63b6-42f1-8b60-2d46710c1971"])

# delete with filter
tidb_vs.delete(["f8e7dee2-63b6-42f1-8b60-2d46710c1971"], filter={"category": "P1"})

About

License: Apache License 2.0


Languages

Python 99.7%, Makefile 0.3%