Integration: pgvector
A Document Store for storing documents in pgvector and retrieving them from it
Table of Contents
- Pgvector Document Store for Haystack
Installation
pgvector is an extension for PostgreSQL that adds support for vector similarity search.
To quickly set up a PostgreSQL database with pgvector, you can use Docker:
docker run -d -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=postgres ankane/pgvector
For more information on how to install pgvector, visit the pgvector GitHub repository.
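After starting the container, it can be useful to confirm that something is listening on the published port before going further. The following is a minimal sketch using only the Python standard library; the helper name port_is_open is hypothetical, and an open port only shows that a TCP connection succeeds, not that PostgreSQL or the pgvector extension is ready.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5432, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 5432 is the port published by the docker run command above
    print("PostgreSQL port reachable:", port_is_open())
```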
Use pip to install pgvector-haystack:
pip install pgvector-haystack
Usage
Define the connection string to your PostgreSQL database in the PG_CONN_STR environment variable. For example:
export PG_CONN_STR="postgresql://postgres:postgres@localhost:5432/postgres"
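A malformed connection string is a common source of startup errors, so a quick sanity check can help. The sketch below splits the string into its parts with the standard library; the function name describe_conn_str is hypothetical and not part of pgvector-haystack, and the password is deliberately left out of the summary.

```python
import os
from urllib.parse import urlsplit


def describe_conn_str(conn_str: str) -> dict:
    """Split a PostgreSQL URI into its components, omitting the password."""
    parts = urlsplit(conn_str)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Falls back to the example value from this README if the variable is unset.
    example = "postgresql://postgres:postgres@localhost:5432/postgres"
    print(describe_conn_str(os.environ.get("PG_CONN_STR", example)))
```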
With pgvector-haystack installed and the environment variable set, initialize PgvectorDocumentStore:
from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
document_store = PgvectorDocumentStore(
    table_name="haystack_docs",
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
    search_strategy="hnsw",
)
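The vector_function setting above selects how two embeddings are compared, and embedding_dimension should match the length of the vectors your embedder produces. To make the metric concrete, here is a plain-Python sketch of what cosine similarity computes. It is for illustration only: in practice the comparison runs inside PostgreSQL through pgvector, not in Python.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Parallel vectors score 1, orthogonal vectors score 0.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

The result depends only on the direction of the vectors, not their length, which is why scaling one vector leaves the score unchanged.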
Writing Documents to PgvectorDocumentStore
To write documents to PgvectorDocumentStore, create an indexing pipeline.
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")
# file_paths must be defined beforehand: a list of paths to the text files you want to index
indexing.run({"converter": {"sources": file_paths}})
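The pipeline above expects file_paths to hold the paths of the files to index, but the snippet does not show where that list comes from. One possible way to build it is sketched below; the helper collect_text_files and the folder name "data" are hypothetical and should be adapted to your own layout.

```python
from pathlib import Path
from typing import List


def collect_text_files(folder: str, pattern: str = "*.txt") -> List[str]:
    """Return the sorted paths of files under `folder` (searched recursively) matching `pattern`."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder yields an empty list instead of an error
        return []
    return sorted(str(path) for path in root.rglob(pattern) if path.is_file())


# "data" is a placeholder folder name, not something this integration requires.
file_paths = collect_text_files("data")
print(f"Found {len(file_paths)} file(s) to index")
```

If the list is empty, it is worth checking the folder and pattern before running the pipeline.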
Retrieval from PgvectorDocumentStore
You can retrieve documents that are semantically similar to a given query using a simple pipeline that includes the PgvectorEmbeddingRetriever.
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.pgvector import PgvectorEmbeddingRetriever
from haystack import Pipeline
querying = Pipeline()
querying.add_component("embedder", SentenceTransformersTextEmbedder())
querying.add_component("retriever", PgvectorEmbeddingRetriever(document_store=document_store, top_k=3))
querying.connect("embedder", "retriever")
results = querying.run({"embedder": {"text": "my query"}})
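The pipeline returns a dictionary keyed by component name, and the retriever's output holds the matching documents under "documents". The sketch below shows one way to read it. The helper format_hits is hypothetical, and the stand-in objects exist only so the example runs without a database; in a real run, results comes from querying.run(...) above.

```python
from types import SimpleNamespace
from typing import List, Tuple


def format_hits(results: dict, component: str = "retriever") -> List[Tuple[float, str]]:
    """Extract (score, content) pairs, preserving the order returned by the retriever."""
    documents = results.get(component, {}).get("documents", [])
    return [(doc.score, doc.content) for doc in documents]


# Stand-ins that mimic the score and content attributes of retrieved documents.
fake_results = {
    "retriever": {
        "documents": [
            SimpleNamespace(content="best match", score=0.93),
            SimpleNamespace(content="second best", score=0.71),
        ]
    }
}

for score, content in format_hits(fake_results):
    print(f"{score:.2f}  {content}")
```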
You can also retrieve Documents based on keyword matching with the PgvectorKeywordRetriever.
from haystack_integrations.components.retrievers.pgvector import PgvectorKeywordRetriever
retriever = PgvectorKeywordRetriever(document_store=document_store, top_k=3)
results = retriever.run(query="my query")
Examples
You can find a code example showing how to use the Document Store and the Retriever under the examples/ folder of this repo.
License
pgvector-haystack is distributed under the terms of the Apache-2.0 license.