Channel: Weaviate Community Forum - Latest posts

Is pre-filtering not supported for hybrid search?

Description

We’re currently using Pinecone in our company and would like to move to an engine that lets us perform hybrid search, since doing this in Pinecone is non-trivial. We’ve looked at OpenSearch’s Neural Search plugin, but that plugin doesn’t allow pre-filtering with Boolean queries, apparently because of an incompatibility with Lucene.

We keep a single index in our vector database that contains a lot of different vectors. A good example of why we need pre-filtering is that our clients use different languages. If a user’s query is in English, for example, we only want to search within the subset of English vectors (i.e., metadata.lang == "en").
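To make the requirement concrete, here is a minimal pure-Python sketch of what pre-filtering means in this context: the candidate set is restricted by metadata first, and only then ranked by vector similarity. The toy corpus, the `cosine` helper, and `prefiltered_search` are all hypothetical names used for illustration; this is not how Weaviate or any other engine implements it internally.

```python
import math

# Toy corpus (hypothetical data): each entry has an id, a language tag, and a 2-d vector.
DOCS = [
    {"id": "a", "lang": "en", "vec": [1.0, 0.0]},
    {"id": "b", "lang": "de", "vec": [0.9, 0.1]},
    {"id": "c", "lang": "en", "vec": [0.0, 1.0]},
    {"id": "d", "lang": "fr", "vec": [0.8, 0.2]},
]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def prefiltered_search(docs, query_vec, lang, limit):
    """Restrict to the requested language first, then rank by similarity."""
    candidates = [d for d in docs if d["lang"] == lang]
    ranked = sorted(candidates, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in ranked[:limit]]


result = prefiltered_search(DOCS, [1.0, 0.0], "en", 2)
print(result)
```

Because the language restriction is applied before ranking, documents "b" and "d" can never appear in the results, even though they are closer to the query than "c".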

I thought Weaviate supported this, but it seems like it doesn’t?

Here’s my setup:

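# Imports for the v4 Python client classes used below
from weaviate.classes.query import Filter, HybridFusion, MetadataQuery

# Both conditions must hold (logical AND)
filters = (
    Filter.by_property("type").equal("dummy-type") &
    Filter.by_property("lang").equal("en")
)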

dense_search_results = weaviate_index.query.near_vector(
    near_vector=query_embedding_vector,
    limit=20,
    return_metadata=MetadataQuery(distance=True),
    filters=filters,
)

hybrid_search_results = weaviate_index.query.hybrid(
    query=query_text,
    vector=query_embedding_vector,
    alpha=0.5,
    limit=20,
    return_metadata=MetadataQuery(score=True),
    filters=filters,
    fusion_type=HybridFusion.RELATIVE_SCORE,
)

As you can see, I’m using a type called "dummy-type" for testing purposes. dense_search_results is correctly [], but hybrid_search_results returns a bunch of different objects that seem to completely disregard the filtering logic.
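One way to confirm that the returned objects really violate the filter, rather than eyeballing them, is a small check like the sketch below. It assumes each result's properties have already been copied into a plain dict (with the v4 client that would presumably be `obj.properties` for each `obj` in `response.objects`, which is an assumption here). `filter_violations` and the sample data are hypothetical.

```python
def filter_violations(results, expected):
    """Return the results whose properties do not match every expected key/value pair."""
    return [
        r for r in results
        if any(r.get(key) != value for key, value in expected.items())
    ]


# Hypothetical property dicts standing in for the objects a query returned.
sample = [
    {"type": "dummy-type", "lang": "en"},  # satisfies both conditions
    {"type": "article", "lang": "en"},     # wrong type
    {"type": "dummy-type", "lang": "de"},  # wrong language
]

violations = filter_violations(sample, {"type": "dummy-type", "lang": "en"})
print(len(violations), "of", len(sample), "results violate the filter")
```

If the filter is being applied, the list of violations should be empty for both the dense and the hybrid results.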

Adding post-filtering logic isn’t really an option right now, since making it work reliably doesn’t seem easy either.
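For anyone wondering why post-filtering is hard to make reliable, here is a toy sketch of the usual failure mode: the top results are fetched first and filtered afterwards, so the final list can come back shorter than the requested limit even though enough matching items exist further down the ranking. `post_filter` and the ranked list are hypothetical.

```python
def post_filter(ranked, predicate, limit):
    """Take the top `limit` items from an already ranked list, then drop non-matching ones."""
    top = ranked[:limit]
    return [r for r in top if predicate(r)]


# Hypothetical ranking, best match first, ignoring language.
ranked = [
    {"id": 1, "lang": "de"},
    {"id": 2, "lang": "fr"},
    {"id": 3, "lang": "en"},
    {"id": 4, "lang": "en"},
    {"id": 5, "lang": "en"},
]

kept = post_filter(ranked, lambda r: r["lang"] == "en", limit=3)
print([r["id"] for r in kept])
```

Three English items exist, but only one survives, because the other two sit below the cut-off. Over-fetching reduces the problem but gives no guarantee, since the right over-fetch factor depends on how selective the filter is.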

Any opinions are appreciated. Thanks!

Server Setup Information

  • Weaviate Server Version: 1.30.0
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python 3.12
  • Multitenancy?: No.
