Haystack 2.5.0

⬆️ Upgrade Notes

  • Removed ChatMessage.to_openai_format method. Use haystack.components.generators.openai_utils._convert_message_to_openai_format instead.
  • Removed unused debug parameter from Pipeline.run method.
  • Removed deprecated SentenceWindowRetrieval. Use SentenceWindowRetriever instead.

🚀 New Features

  • Added the unsafe argument to enable behavior that could lead to remote code execution in ConditionalRouter and OutputAdapter. By default, unsafe behavior is disabled, and users must explicitly set unsafe=True to enable it. When unsafe is enabled, types such as ChatMessage, Document, and Answer can be used as output types. We recommend enabling unsafe behavior only when the Jinja template source is trusted. For more information, see the documentation for ConditionalRouter and OutputAdapter.

⚡️ Enhancement Notes

  • Adapts how ChatPromptBuilder creates ChatMessages. Messages are deep copied to ensure all meta fields are copied correctly.
  • The parameter, min_top_k, has been added to the TopPSampler. This parameter sets the minimum number of documents to be returned when the top-p sampling algorithm selects fewer documents than desired. Documents with the next highest scores are added to meet the minimum. This is useful when guaranteeing a set number of documents to pass through while still allowing the Top-P algorithm to determine if more documents should be sent based on scores.
  • Introduced a utility function to deserialize a generic Document Store from the init_parameters of a serialized component.
  • Refactor deserialize_document_store_in_init_parameters to clarify that the function operates in place and does not return a value.
  • The SentenceWindowRetriever now returns context_documents as well as the context_windows for each Document in retrieved_documents . This allows you to get a list of Documents from within the context window for each retrieved document.

⚠️ Deprecation Notes

  • The default model for OpenAIGenerator and OpenAIChatGenerator, previously ‘gpt-3.5-turbo’, will be replaced by ‘gpt-4o-mini’.

🐛 Bug Fixes

  • Fixed an issue where page breaks were not being extracted from DOCX files.
  • Used a forward reference for the Paragraph class in the DOCXToDocument converter to prevent import errors.
  • The metadata produced by DOCXToDocument component is now JSON serializable. Previously, it contained datetime objects automatically extracted from DOCX files, which are not JSON serializable. These datetime objects are now converted to strings.
  • Starting from haystack-ai==2.4.0, Haystack is compatible with sentence-transformers>=3.0.0; earlier versions of sentence-transformers are not supported. We have updated the test dependencies and LazyImport messages to reflect this change.
  • For components that support multiple Document Stores, prioritize using the specific from_dict class method for deserialization when available. Otherwise, fall back to the generic default_from_dict method. This impacts the following generic components: CacheChecker, DocumentWriter, FilterRetriever, and SentenceWindowRetriever.