Channel: Weaviate Community Forum - Latest posts

Not Equal Filter with Word Tokenization with non-alphanumeric characters

Hi @dhanshew72!

Because you had tokenization set to word, the property value test.com/2 is split on its non-alphanumeric characters and indexed as three separate tokens: test, com and 2.
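To make the difference concrete, here is a rough pure-Python approximation of how word and field tokenization treat that value. The helper names `word_tokens` and `field_tokens` are made up for this illustration and are not part of the Weaviate client; the real tokenizer runs server-side and may differ in edge cases (for example with non-ASCII characters).

```python
import re


def word_tokens(value: str) -> list[str]:
    """Approximate `word` tokenization: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def field_tokens(value: str) -> list[str]:
    """Approximate `field` tokenization: the trimmed value is a single token."""
    return [value.strip()]


print(word_tokens("test.com/2"))   # ['test', 'com', '2']
print(field_tokens("test.com/2"))  # ['test.com/2']
```

With word tokenization the filter only ever sees the individual tokens, which is why the whole string cannot be matched (or excluded) as one unit.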

This proves the point:

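import weaviate
import weaviate.classes as wvc

# Assumption: a local Weaviate instance; adjust the connection to your setup.
client = weaviate.connect_to_local()

# Start from a clean collection; text properties default to word tokenization.
client.collections.delete("Test")
collection = client.collections.create(
    name="Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
)

collection.data.insert_many([
    {"text": "test.com/2"},
    {"text": "test.com/3"},
    {"text": "test.com/4"},
])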

Now we query for each token separately:

# Each Equal filter is checked against the individual tokens,
# so only an object containing test, com and 2 satisfies all three.
results = collection.query.fetch_objects(
    filters=(
        wvc.query.Filter.by_property("text").equal("test") &
        wvc.query.Filter.by_property("text").equal("com") &
        wvc.query.Filter.by_property("text").equal("2")
    )
)
for i in results.objects:
    print("###")
    print(i.properties)

Results:

{'text': 'test.com/2'}

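Since you want to exclude that object, a not equal filter on a word-tokenized property will not help you: the filter works on the individual tokens (test, com, 2), not on the full string test.com/2.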

Instead, you can try adding a new property with field tokenization, filling it with the same content, and then using that property to filter the object out.

Let me know if this helps :)

