Integration: Weaviate
Use a Weaviate database with Haystack
Haystack 2.0
Installation
Use pip to install the Weaviate integration for Haystack:
pip install weaviate-haystack
Usage
Once installed, initialize your Weaviate database to use it with Haystack 2.x.
In this example, we use the temporary embedded version for simplicity. To use a self-hosted Docker container or Weaviate Cloud Service, take a look at the docs.
from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore
from weaviate.embedded import EmbeddedOptions
document_store = WeaviateDocumentStore(embedded_options=EmbeddedOptions())
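For a self-hosted instance, the store can point at a running server instead of the embedded one. The sketch below assumes a Weaviate container listening on http://localhost:8080 and reads the endpoint from a WEAVIATE_URL environment variable. That variable name and both helper functions are illustrative and not part of the integration; the only integration call used is the url parameter of WeaviateDocumentStore.

```python
import os


def weaviate_url(default="http://localhost:8080"):
    """Return the Weaviate endpoint.

    WEAVIATE_URL is a hypothetical environment variable chosen for this
    sketch; the default assumes a local Docker container on port 8080.
    """
    return os.environ.get("WEAVIATE_URL", default)


def make_document_store():
    """Create a WeaviateDocumentStore that talks to a running Weaviate server."""
    # Imported lazily so this module loads even if weaviate-haystack is missing.
    from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore

    return WeaviateDocumentStore(url=weaviate_url())
```

Calling make_document_store() requires weaviate-haystack to be installed and a server reachable at the resolved URL. A hosted cluster typically needs credentials as well, which this sketch leaves out.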
Writing Documents to WeaviateDocumentStore
To write documents to WeaviateDocumentStore, create an indexing pipeline.
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
# file_paths is assumed to be a list of paths to plain-text files
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
# Connect the converter's documents output to the writer's documents input
indexing.connect("converter", "writer")
indexing.run({"converter": {"sources": file_paths}})
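The snippet above uses file_paths without defining it. One way to build that list, assuming the text files live under a local data directory, is sketched below. The directory name and the collect_text_files helper are hypothetical and only serve this illustration.

```python
from pathlib import Path


def collect_text_files(root, pattern="*.txt"):
    """Return sorted string paths of files under `root` matching `pattern`.

    Subdirectories are searched recursively. A missing directory yields
    an empty list rather than an error.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(str(p) for p in root_path.rglob(pattern) if p.is_file())


# Hypothetical layout: plain-text files stored under ./data
file_paths = collect_text_files("data")
```

Sorting keeps the indexing order stable across runs, which makes repeated indexing easier to compare.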
License
weaviate-haystack is distributed under the terms of the Apache-2.0 license.
Haystack 1.x
Haystack supports the use of Weaviate as data storage for LLM pipelines with the WeaviateDocumentStore. You can run Weaviate locally yourself or use a hosted Weaviate database.
For details on the available methods and parameters of the WeaviateDocumentStore, check out the Haystack API Reference and Documentation.
Installation
pip install farm-haystack[weaviate]
Usage
To use Weaviate as the data storage for your Haystack LLM pipelines, you need a Weaviate instance running locally or a hosted one. Then, you can initialize a WeaviateDocumentStore:
from haystack.document_stores import WeaviateDocumentStore
document_store = WeaviateDocumentStore(host="http://localhost",
                                       port=8080,
                                       embedding_dim=768)
Writing Documents to WeaviateDocumentStore
To write documents to your WeaviateDocumentStore, create an indexing pipeline or use the write_documents() method.
For this step, you can use the available FileConverters and PreProcessors, as well as other Integrations that help you fetch data from other resources. Below is an example indexing pipeline that indexes your Markdown files into a Weaviate database. It indexes not only the contents of the files but also their embeddings, which makes vector search over the files possible.
Indexing Pipeline
from haystack import Pipeline
from haystack.document_stores import WeaviateDocumentStore
from haystack.nodes import EmbeddingRetriever, MarkdownConverter, PreProcessor
document_store = WeaviateDocumentStore(host="http://localhost",
                                       port=8080,
                                       embedding_dim=768)
converter = MarkdownConverter()
preprocessor = PreProcessor()
# The retriever computes an embedding for each document before it is stored
retriever = EmbeddingRetriever(document_store=document_store,
                               embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=converter, name="MarkdownConverter", inputs=["File"])
indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["MarkdownConverter"])
indexing_pipeline.add_node(component=retriever, name="Retriever", inputs=["PreProcessor"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["Retriever"])
indexing_pipeline.run(file_paths=["filename.md"])
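The write_documents() route mentioned earlier skips the pipeline entirely. A minimal sketch, assuming a Haystack 1.x document store that accepts dictionaries with content and meta keys: the write_texts helper and the source meta field are hypothetical names introduced for illustration.

```python
def write_texts(document_store, texts, source="manual"):
    """Write raw strings to a Haystack 1.x document store.

    Each string becomes a dictionary document with `content` and `meta`
    keys. The `source` meta field is an illustrative choice.
    """
    docs = [{"content": text, "meta": {"source": source}} for text in texts]
    document_store.write_documents(docs)
    return docs


# Usage, assuming `document_store` is the WeaviateDocumentStore created earlier:
# write_texts(document_store, ["Weaviate is a vector database."])
```

Documents written this way have no embeddings computed by a retriever. In Haystack 1.x, calling document_store.update_embeddings(retriever) afterwards is the usual way to add them, assuming a retriever like the one in the pipeline above.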
Using Weaviate in a Query Pipeline
Once you have documents in your WeaviateDocumentStore, it’s ready to be used in any Haystack pipeline. For example, below is a pipeline that uses a custom prompt designed to generate long answers to a query based on the retrieved documents.
from haystack import Pipeline
from haystack.document_stores import WeaviateDocumentStore
from haystack.nodes import AnswerParser, EmbeddingRetriever, PromptNode, PromptTemplate
document_store = WeaviateDocumentStore(host="http://localhost",
                                       port=8080,
                                       embedding_dim=768)
retriever = EmbeddingRetriever(document_store=document_store,
                               embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
prompt_template = PromptTemplate(prompt="""Given the provided Documents, answer the Query. Make your answer detailed and long\n
Query: {query}\n
Documents: {join(documents)}
Answer:
""",
                                 output_parser=AnswerParser())
prompt_node = PromptNode(model_name_or_path="gpt-4",
                         api_key="YOUR_OPENAI_KEY",
                         default_prompt_template=prompt_template)
query_pipeline = Pipeline()
# The retriever fetches the top documents, which the prompt node then uses as context
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
query_pipeline.run(query="What is Weaviate?", params={"Retriever": {"top_k": 5}})
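The run() call returns a result dictionary. The sketch below shows one way to read the generated text from it, assuming that with AnswerParser as the output parser the result holds an "answers" list whose items expose an answer attribute. The extract_answers helper is a hypothetical name, not part of Haystack.

```python
def extract_answers(result):
    """Return the answer strings from a Haystack 1.x pipeline result.

    Assumes `result["answers"]` is a list of objects with an `answer`
    attribute. A missing key yields an empty list.
    """
    return [item.answer for item in result.get("answers", [])]


# Usage, assuming `query_pipeline` is the pipeline built above:
# result = query_pipeline.run(query="What is Weaviate?",
#                             params={"Retriever": {"top_k": 5}})
# for text in extract_answers(result):
#     print(text)
```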