Channel: Weaviate Community Forum - Latest posts

WAL's folder grows without limit


Check. Yeah, what’s happening here is that your individual commit-log files are already larger than the threshold for further compaction. If you have PERSISTENCE_HNSW_MAX_LOG_SIZE: 4GiB, that means any file larger than 4GiB will not be considered for compaction.

However, the HNSW commit logs are delta logs. If log 2 deletes something that was created in log 1, but log 1+2 are too big to be compacted, then the information from log 1 cannot be removed effectively.

Limiting the max size is essentially a memory-vs-disk-space trade-off. It sounds like in your case you are suffering from a lot of disk growth, so it might be worth allowing some more memory so that files can still be compacted effectively. I can’t say what the ideal value is in this case, but if you still have memory available, I would try increasing it.
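For example, if you run Weaviate via Docker Compose, you could raise the limit like this (8GiB is just an illustrative value; pick one that fits your available memory):

```yaml
services:
  weaviate:
    image: semitechnologies/weaviate
    environment:
      # Files up to this size remain eligible for compaction.
      # Larger values allow more compaction but use more memory.
      PERSISTENCE_HNSW_MAX_LOG_SIZE: 8GiB
```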

Note on all of the above: This mainly describes the current implementation and doesn’t mean this can’t be improved. We’ve discussed two options internally:

  1. Some sort of in-place compaction (where you remove redundant info from log 1 based on the fact that you know it will be present in log 2). This isn’t trivial because it’s not always clear which information has to persist and which is fully overridden. There are some commit types where it’s pretty clear (e.g. replace_links_at_level means any link set at that level previously is no longer needed)
  2. A full graph-to-disk dump (possibly at shutdown). If the deltas on disk have grown considerably larger than the actual graph, it might make sense to discard all logs and flush a perfect representation of the graph to disk to replace all historic logs.
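To make option 1 concrete, here is a minimal sketch in Go of the “pretty clear” case: dropping link writes for a (node, level) pair that are fully superseded by a later replace-links-at-level entry. The entry types and field names here are illustrative, not Weaviate’s actual commit-log format.

```go
package main

import "fmt"

// Illustrative commit-log operation types (not Weaviate's real wire format).
type opType int

const (
	addLinkAtLevel opType = iota
	replaceLinksAtLevel
)

type entry struct {
	op    opType
	node  int
	level int
	links []int
}

// pruneSuperseded removes any op for a (node, level) pair that occurs before
// a later replaceLinksAtLevel for the same pair, since the replace fully
// overrides everything written earlier at that level.
func pruneSuperseded(log []entry) []entry {
	type key struct{ node, level int }
	lastReplace := map[key]int{}
	for i, e := range log {
		if e.op == replaceLinksAtLevel {
			lastReplace[key{e.node, e.level}] = i
		}
	}
	var out []entry
	for i, e := range log {
		if idx, ok := lastReplace[key{e.node, e.level}]; ok && i < idx {
			continue // superseded by a later full replace
		}
		out = append(out, e)
	}
	return out
}

func main() {
	log := []entry{
		{addLinkAtLevel, 7, 0, []int{1}},
		{addLinkAtLevel, 7, 0, []int{2}},
		{replaceLinksAtLevel, 7, 0, []int{3, 4}},
		{addLinkAtLevel, 7, 1, []int{5}}, // different level: must survive
	}
	pruned := pruneSuperseded(log)
	fmt.Println(len(pruned)) // the two addLinkAtLevel entries at level 0 are dropped
}
```

The hard part, as noted above, is that most other commit types only partially override earlier state, so this simple "last full replace wins" rule doesn't generalize to the whole log.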
