Function Calling and Multimodal QA with Gemini


by Tuana Celik: Twitter, LinkedIn, Tilde Thurium: Twitter, LinkedIn and Silvano Cerza: LinkedIn

📚 Check out the Gemini Models with Google Vertex AI Integration for Haystack article for a detailed run-through of this example.

This is a notebook showing how you can use Gemini with Haystack 2.0.

Gemini is Google’s family of multimodal models. You can read more about its capabilities here.

Install dependencies

As a prerequisite, you need to have a Google Cloud Project set up that has access to Gemini. Following that, you’ll only need to authenticate yourself in this Colab.

First things first: we need to install our dependencies.

(You can ignore the pip dependency error for cohere and tiktoken; it’s irrelevant for our purposes.)

!pip install --upgrade haystack-ai google-vertex-haystack trafilatura

To use Gemini you need to have a Google Cloud Platform account and be logged in using Application Default Credentials (ADCs). For more info see the official documentation.

Time to log in!

from google.colab import auth

auth.authenticate_user()

Remember to set the project_id variable to a valid project ID that you are authorized to use with Gemini. We’re going to use this one throughout the example!

You can find your project ID in the GCP resource manager, or locally by running gcloud projects list in your terminal. For more info on the gcloud CLI see the official documentation.

project_id = input("Enter your project ID:")
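
If you run this outside an interactive session, prompting with input() can get in the way. Here is a minimal sketch that reads the project ID from an environment variable first and only prompts as a fallback. The helper name resolve_project_id is hypothetical, and the variable name GOOGLE_CLOUD_PROJECT is an assumption for this sketch; use whatever name fits your setup.

```python
import os


def resolve_project_id(env=None, prompt=input):
    """Return a project ID from the environment, or ask the user for one.

    `env` and `prompt` are injectable so the helper can be exercised
    without a real environment or terminal.
    """
    env = os.environ if env is None else env
    # GOOGLE_CLOUD_PROJECT is an assumed variable name for this sketch.
    project_id = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project_id:
        project_id = prompt("Enter your project ID:").strip()
    if not project_id:
        raise ValueError("A non-empty project ID is required.")
    return project_id
```

You could then write project_id = resolve_project_id() in place of the input() call.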

Use gemini-1.5-flash

Answer Questions

Now that we have set everything up, we can create an instance of our Gemini component.

from haystack_integrations.components.generators.google_vertex import VertexAIGeminiGenerator

gemini = VertexAIGeminiGenerator(model="gemini-1.5-flash", project_id=project_id)

Let’s start by asking something simple.

This component expects a list of Parts as input to the run() method. Parts can be anything from a message, to images, or even function calls. For the most up-to-date reference, see the docstrings in the source code here.

result = gemini.run(parts = ["What is the most interesting thing you know?"])
for answer in result["replies"]:
    print(answer)

Answer Questions about Images

Let’s try something a bit different! gemini-1.5-flash can also work with images. Let’s see if we can have it answer questions about some robots 👇

We’re going to download some images for this example. 🤖

import requests
from haystack.dataclasses.byte_stream import ByteStream

URLS = [
    "https://raw.githubusercontent.com/silvanocerza/robots/main/robot1.jpg",
    "https://raw.githubusercontent.com/silvanocerza/robots/main/robot2.jpg",
    "https://raw.githubusercontent.com/silvanocerza/robots/main/robot3.jpg",
    "https://raw.githubusercontent.com/silvanocerza/robots/main/robot4.jpg"
]
images = [
    ByteStream(data=requests.get(url).content, mime_type="image/jpeg")
    for url in URLS
]
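
The snippet above hardcodes image/jpeg, which works because every URL points to a .jpg file. If you mix formats, one option is to guess the MIME type from the URL’s extension with the standard library. This is only a sketch: the helper name guess_image_mime_type is hypothetical, and the guess relies on the file extension, not on the downloaded bytes.

```python
import mimetypes


def guess_image_mime_type(url, default="image/jpeg"):
    """Guess a MIME type from the URL's file extension.

    Falls back to `default` when the extension is unknown or is not an image.
    """
    mime_type, _encoding = mimetypes.guess_type(url)
    if mime_type is None or not mime_type.startswith("image/"):
        return default
    return mime_type
```

It could then be passed along when building each stream, for example mime_type=guess_image_mime_type(url).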

Next, let’s run the VertexAIGeminiGenerator component on its own.

result = gemini.run(parts = ["What can you tell me about these robots?", *images])
for answer in result["replies"]:
    print(answer)

Did Gemini recognize all its friends? 👀

Function Calling with gemini-pro

With gemini-pro, we can also start introducing function calling! So let’s see how we can do that 👇

Let’s see if we can build a system that can run a get_current_weather function, based on a question asked in natural language.

First we create our function definition and tool.

For demonstration purposes, we’re simply creating a get_current_weather function that returns an object which will always tell us it’s ‘Sunny, and 21.8 degrees’. If it’s Celsius, that’s a good day! ☀️

def get_current_weather(location: str, unit: str = "celsius"):
  return {"weather": "sunny", "temperature": 21.8, "unit": unit}

Now we have to provide this function as a Tool to Gemini. So, first we need to create a FunctionDeclaration that explains this function to Gemini 👇

from vertexai.generative_models import Tool, FunctionDeclaration

get_current_weather_func = FunctionDeclaration(
    name="get_current_weather",
    description="Get the current weather in a given location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
            "unit": {
                "type": "string",
                "enum": [
                    "celsius",
                    "fahrenheit",
                ],
            },
        },
        "required": ["location"],
    },
)
tool = Tool([get_current_weather_func])
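
The parameters dictionary tells the model which arguments exist, but the model’s output is still worth checking before you call a local function with it. The sketch below checks a small subset of such a schema (required keys, unknown keys, string types, and enum values) in plain Python. The helper validate_arguments and the name weather_parameters are hypothetical and not part of Vertex AI or Haystack, and this is not a full JSON Schema validator.

```python
def validate_arguments(parameters, arguments):
    """Check `arguments` against a small subset of a JSON-schema-style dict.

    Only `required`, unknown keys, string types, and `enum` are checked.
    Returns a list of problems (empty if everything looks fine).
    """
    problems = []
    properties = parameters.get("properties", {})
    for name in parameters.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


# Same shape as the parameters passed to FunctionDeclaration in this notebook.
weather_parameters = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

print(validate_arguments(weather_parameters, {"location": "Berlin", "unit": "celsius"}))
```

An empty list means the arguments are safe to pass on; otherwise you could log the problems or send them back to the model.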

We’re also going to chat with Gemini this time, so we’re going to use another class for this.

We also need the Gemini Pro model to use functions; Gemini Pro Vision doesn’t support them.

Let’s create a VertexAIGeminiChatGenerator

from haystack_integrations.components.generators.google_vertex import VertexAIGeminiChatGenerator

gemini_chat = VertexAIGeminiChatGenerator(model="gemini-pro", project_id=project_id, tools=[tool])

from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user(content = "What is the temperature in celsius in Berlin?")]
res = gemini_chat.run(messages=messages)
res["replies"]

Look at that! We got a message with some interesting information now. We can use that information to call a real function locally.

Let’s do exactly that and pass the result back to Gemini.


weather = get_current_weather(**res["replies"][0].content)

messages += res["replies"] + [ChatMessage.from_function(content=weather, name="get_current_weather")]

res = gemini_chat.run(messages = messages)
res["replies"][0].content
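
With a single tool, calling get_current_weather directly is enough. Once several functions are registered, a lookup table from function name to local callable keeps the dispatch step tidy. The sketch below is one way to do that in plain Python. The names LOCAL_FUNCTIONS and call_local_function are hypothetical, and how you read the function name and arguments off the reply depends on the integration version, so they are passed in explicitly here.

```python
def get_current_weather(location: str, unit: str = "celsius"):
    # Same stub as in this notebook: always sunny.
    return {"weather": "sunny", "temperature": 21.8, "unit": unit}


# Hypothetical registry mapping tool names to local Python callables.
LOCAL_FUNCTIONS = {"get_current_weather": get_current_weather}


def call_local_function(name, arguments):
    """Look up `name` in the registry and call it with keyword arguments."""
    try:
        function = LOCAL_FUNCTIONS[name]
    except KeyError:
        raise ValueError(f"Model requested an unknown function: {name}") from None
    return function(**arguments)


print(call_local_function("get_current_weather", {"location": "Berlin", "unit": "celsius"}))
```

Raising on unknown names makes it obvious when the model asks for a function you never declared, instead of failing later with a less helpful error.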

Seems like the weather is nice and sunny, remember to put on your sunglasses. 😎

Build a full Retrieval-Augmented Generation Pipeline with gemini-1.5-flash

As a final exercise, let’s add the VertexAIGeminiGenerator to a full RAG pipeline. In the example below, we are building a RAG pipeline that does question answering on the web, using gemini-1.5-flash.

from haystack.components.fetchers.link_content import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack import Pipeline

fetcher = LinkContentFetcher()
converter = HTMLToDocument()
document_splitter = DocumentSplitter(split_by="word", split_length=50)
similarity_ranker = TransformersSimilarityRanker(top_k=3)
gemini = VertexAIGeminiGenerator(model="gemini-1.5-flash", project_id=project_id)

prompt_template = """
According to these documents:

{% for doc in documents %}
  {{ doc.content }}
{% endfor %}

Answer the given question: {{question}}
Answer:
"""
prompt_builder = PromptBuilder(template=prompt_template)

pipeline = Pipeline()
pipeline.add_component("fetcher", fetcher)
pipeline.add_component("converter", converter)
pipeline.add_component("splitter", document_splitter)
pipeline.add_component("ranker", similarity_ranker)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("gemini", gemini)

pipeline.connect("fetcher.streams", "converter.sources")
pipeline.connect("converter.documents", "splitter.documents")
pipeline.connect("splitter.documents", "ranker.documents")
pipeline.connect("ranker.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "gemini")
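
To get a feel for what the splitter step contributes, here is a rough plain-Python illustration of word-based chunking with a chunk length of 50. It is only a sketch of the idea and not Haystack’s DocumentSplitter implementation, which may treat whitespace, overlap, and metadata differently; the function name split_by_word is hypothetical.

```python
def split_by_word(text, split_length=50):
    """Split `text` into chunks of at most `split_length` whitespace-separated words."""
    if split_length < 1:
        raise ValueError("split_length must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + split_length])
        for start in range(0, len(words), split_length)
    ]


sample = " ".join(f"word{i}" for i in range(120))
for chunk in split_by_word(sample):
    print(len(chunk.split()), "words")
```

Short chunks like these give the ranker small, focused passages to score against the question, so only the top few end up in the prompt.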

Let’s try asking Gemini to tell us about Haystack 2.0 and how to use it.

question = "What do graphs have to do with Haystack?"
result = pipeline.run({"prompt_builder": {"question": question},
                   "ranker": {"query": question},
                   "fetcher": {"urls": ["https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent"]}})

for answer in result["gemini"]["replies"]:
  print(answer)

Now you’ve seen some of what Gemini can do, as well as how to integrate it with Haystack 2.0. If you want to learn more: