Channel: Weaviate Community Forum - Latest posts

Update existing chunks in a document with more than QUERY_MAXIMUM_RESULTS entries


Description

We have a setup with multiple Documents, each of which is split into Chunks. For some of these Documents, an automated service updates the Document daily. To update a Document correctly, we:

  1. Get all UUIDs of Chunks belonging to that specific Document
  2. Generate deterministic uuid5 values for all new chunks
  3. Figure out which chunks to delete and which chunks to add
  4. Add only the new chunks
  5. Delete the chunks that are no longer relevant
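
Steps 2 and 3 can be sketched roughly as follows. This is only an illustration, not our production code: the namespace, the `chunk_uuid` and `plan_update` helpers, and the choice to hash the document id together with the chunk text are all assumptions made for the example.

```python
import uuid

# Hypothetical fixed namespace; any UUID works as long as it never changes.
CHUNK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "chunks.example.com")


def chunk_uuid(document_id: str, chunk_text: str) -> str:
    """Derive a stable UUID from the document id and the chunk content."""
    return str(uuid.uuid5(CHUNK_NAMESPACE, f"{document_id}:{chunk_text}"))


def plan_update(existing_uuids, document_id, new_chunks):
    """Return (to_add, to_delete) for a single document.

    to_add maps new UUIDs to chunk text; to_delete is a set of stale UUIDs.
    """
    new_by_uuid = {chunk_uuid(document_id, text): text for text in new_chunks}
    existing = set(existing_uuids)
    to_add = {u: t for u, t in new_by_uuid.items() if u not in existing}
    to_delete = existing - set(new_by_uuid)
    return to_add, to_delete
```

Because the UUIDs are deterministic, unchanged chunks map to the same UUID on every run and drop out of both sets, so only the difference is uploaded or deleted.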

This allows us to:

  • have a fallback if any of the steps fail
  • not reupload unnecessary Chunks
  • save some cost & bandwidth

However, step 1 is giving us trouble, because it requires querying all existing chunks of a Document. The ‘normal’ Get with offset doesn’t work above QUERY_MAXIMUM_RESULTS, so the only other option we’ve seen so far is the Cursor API, which requires us to dump our entire Weaviate instance. That can’t be the suggested way to achieve this.
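
To make the ceiling concrete, here is a small pure-Python sketch that does not talk to Weaviate at all. It assumes that offset plus limit may not exceed QUERY_MAXIMUM_RESULTS and that the value is 10,000 (which, as far as we understand, is the default); `page_requests` is a hypothetical helper written for this example.

```python
QUERY_MAXIMUM_RESULTS = 10_000  # assumed value; configurable on the server


def page_requests(total_objects, page_size, max_results=QUERY_MAXIMUM_RESULTS):
    """Yield the (offset, limit) pairs that stay within the cap.

    Assumes a request is rejected once offset + limit > max_results,
    so objects beyond max_results are never reachable this way.
    """
    reachable = min(total_objects, max_results)
    offset = 0
    while offset < reachable:
        limit = min(page_size, reachable - offset)
        yield offset, limit
        offset += limit


# A document with 25,000 chunks, fetched 1,000 at a time:
pages = list(page_requests(25_000, 1_000))
unreachable = 25_000 - sum(limit for _, limit in pages)
```

Under these assumptions, the last 15,000 chunks of such a Document are never returned, which is why offset pagination alone cannot cover step 1.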

So I’m wondering how we’re supposed to solve this problem. We can’t find anything in the documentation so far, and we’re slightly scared of the implications of increasing QUERY_MAXIMUM_RESULTS.

Server Setup Information

  • Weaviate Server Version: 1.24.6
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python v3
  • Multitenancy?: Nope

Any additional Information

Not really

