Streaming Model Explorer


notebook by Tilde Thurium: Mastodon || Twitter || LinkedIn

Problem: there are so many LLMs these days! Which model is the best for my use case?

This notebook uses Haystack 2.0 to compare the results of sending the same prompt to several different models.

This is a basic demo, limited to a handful of models that support streaming responses. I’d like to support more models in the future, so watch this space for updates.

Models

Haystack’s OpenAIGenerator and CohereGenerator support streaming out of the box.

The other models use the HuggingFaceAPIGenerator, pointed at the Hugging Face serverless Inference API.
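All of these generators accept a streaming_callback that is invoked once per StreamingChunk as tokens arrive. Here is a minimal sketch using Haystack's built-in print_streaming_chunk helper, separate from the widget-based callback this notebook builds below:

# Minimal streaming sketch: each chunk is passed to the callback as it arrives;
# print_streaming_chunk simply prints the chunk's text.
from haystack.components.generators import OpenAIGenerator
from haystack.components.generators.utils import print_streaming_chunk

generator = OpenAIGenerator(streaming_callback=print_streaming_chunk)
generator.run("Explain token streaming in one sentence.")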

Prerequisites

!pip install -U haystack-ai cohere-haystack "huggingface_hub>=0.22.0"

In order for userdata.get to work, these keys need to be saved as secrets in your Colab. Click on the key icon in the left menu or see detailed instructions here.
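If you're running this outside Colab, a rough equivalent (an assumption, not part of the original notebook) is to read the same keys from environment variables, prompting for them only if they're missing:

# Hypothetical fallback for non-Colab environments: pull the keys from
# environment variables, or prompt if they aren't set.
import os
from getpass import getpass

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or getpass("OpenAI API key: ")
COHERE_API_KEY = os.environ.get("COHERE_API_KEY") or getpass("Cohere API key: ")
HF_API_KEY = os.environ.get("HF_API_KEY") or getpass("Hugging Face API key: ")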

from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.components.generators.cohere import CohereGenerator
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret
from google.colab import userdata

open_ai_generator = OpenAIGenerator(api_key=Secret.from_token(userdata.get('OPENAI_API_KEY')))

cohere_generator = CohereGenerator(api_key=Secret.from_token(userdata.get('COHERE_API_KEY')))

hf_generator = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "mistralai/Mistral-7B-Instruct-v0.1"},
    token=Secret.from_token(userdata.get('HF_API_KEY')))


hf_generator_2 = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "tiiuae/falcon-7b-instruct"},
    token=Secret.from_token(userdata.get('HF_API_KEY')))


hf_generator_3 = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "bigscience/bloom"},
    token=Secret.from_token(userdata.get('HF_API_KEY')))
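Adding another hosted model follows the same pattern. A sketch (the model id below is only an illustration and assumes the model is available on the serverless Inference API; remember to append the new generator to MODELS below if you want it included):

# Hypothetical extra Hugging Face model, configured the same way as the others.
hf_generator_4 = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
    token=Secret.from_token(userdata.get('HF_API_KEY')))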
MODELS = [open_ai_generator, cohere_generator, hf_generator, hf_generator_2, hf_generator_3]

The AppendToken dataclass is a streaming callback: it buffers each incoming StreamingChunk and writes the text to an ipywidgets Output box in groups of 5 tokens (the model name itself is printed by multiprompt below).

from dataclasses import dataclass, field

import ipywidgets as widgets
from IPython.display import display

@dataclass
class AppendToken:
  output: widgets.Output
  chunks: list = field(default_factory=list)
  chunk_size: int = 5

  def __call__(self, chunk):
      # Called once per StreamingChunk: buffer the token text and display it
      # every `chunk_size` tokens.
      with self.output:
        text = getattr(chunk, 'content', '')
        self.chunks.append(text)
        if len(self.chunks) == self.chunk_size:
          self.flush()

  def flush(self):
      # Display whatever is buffered (also catches the final partial group).
      if self.chunks:
        self.output.append_display_data(' '.join(self.chunks))
        self.chunks.clear()

def multiprompt(prompt, models=MODELS):
  outputs = [widgets.Output(layout={'border': '1px solid black'}) for _ in models]
  display(widgets.HBox(children=outputs))

  for i, model in enumerate(models):
    model_name = getattr(model, 'model', '')
    outputs[i].append_display_data(f'Model name: {model_name}')
    callback = AppendToken(outputs[i])
    model.streaming_callback = callback
    model.run(prompt)
    callback.flush()  # show any leftover tokens that didn't fill a full group
multiprompt("Tell me a cyberpunk story about a black cat.")
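You can also compare just a subset of the generators by passing the models argument explicitly:

# Second run comparing only the OpenAI and Cohere generators.
multiprompt("Summarize the plot of Hamlet in one sentence.",
            models=[open_ai_generator, cohere_generator])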

This was a very silly example prompt. If you found this demo useful, let me know the kinds of prompts you tested it with!

Mastodon || Twitter || LinkedIn

Thanks for following along.