Description
I am trying to use the Weaviate v4 Python client to batch import data into my Weaviate instance. This is the code setup:
import weaviate
from tqdm import tqdm

client = weaviate.connect_to_local(WEAVIATE_HOST, WEAVIATE_PORT)
data_jsons = ...  # a list of dicts whose keys/values match the collection schema
collection = client.collections.get('my_collection')
try:
    with collection.batch.dynamic() as batch:
        for a_json in tqdm(data_jsons[:10000]):
            key = create_key(a_json)  # could be a hash of the data
            vector = a_json.pop('vector')  # bring-your-own-vector use case
            batch.add_object(properties=a_json,
                             uuid=key,
                             vector=vector)
    # check for failures once the batch context has exited
    failed_objects = collection.batch.failed_objects
    if len(failed_objects) > 0:
        raise Exception(f"Failed to insert {len(failed_objects)} objects")
except Exception as e:
    print(f"Error: {e}")
When there is an intermittent failure, the import completes and failed_objects is indeed non-empty, so I can raise the error to the caller.
However, if the Weaviate instance is permanently down (I simulate this by pausing it), the code above takes a long time to complete and slowly prints output like this:
UserWarning: Bat003: The dynamic batch-size could not be refreshed successfully: error WeaviateTimeoutError('The request to Weaviate timed out while awaiting a response. Try adjusting the timeout config for your client. Details: ')
warnings.warn(
{'message': 'Failed to send 260 objects in a batch of 260. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
{'message': 'Failed to send 260 objects in a batch of 260. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
{'message': 'Failed to send 260 objects in a batch of 260. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
{'message': 'Failed to send 260 objects in a batch of 260. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
{'message': 'Failed to send 260 objects in a batch of 260. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
{'message': 'Failed to send 260 objects in a batch of 260. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
{'message': 'Failed to send 260 objects in a batch of 260. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
{'message': 'Failed to send 110 objects in a batch of 110. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
Error: Failed to insert 1930 objects
I think these messages come from the logger in the Weaviate client, and it seems add_object never throws an exception at any point (so the try/except above is effectively useless). What I want is for the import to quit and throw an exception once 3 messages like this have been triggered. Right now it seems to wait for a timeout, do something, emit that message, and then time out again, which results in this code running for a very long time before it reaches my raise Exception.
Is there a proper way to handle connection errors (e.g. if the Weaviate instance just died)? My goal is to keep a very large batch import job from getting stuck forever.
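To make the "quit after a few failures" idea concrete, here is a minimal sketch of the kind of loop I have in mind. It assumes the batch object exposes a running `number_errors` counter that can be checked inside the `with` block; I believe the v4 client provides this, but that should be verified against the installed client version. The helper name `import_with_error_budget`, the `BatchAborted` exception, and the `max_errors` parameter are hypothetical names of my own, not part of the Weaviate API. Note that such a counter tracks failed objects rather than failed batches, so the threshold would need to be scaled accordingly.

```python
class BatchAborted(RuntimeError):
    """Raised when the import stops early because too many objects failed."""


def import_with_error_budget(open_batch, rows, max_errors=3):
    """Add rows to a batch, aborting once more than max_errors objects failed.

    open_batch: zero-argument callable returning a context manager; the value
        it yields must offer add_object(**kwargs) and a number_errors counter
        (assumption: this mirrors the batch object of the Weaviate v4 client).
    rows: iterable of dicts that are passed to add_object as keyword arguments.
    Returns the number of rows handed to the batch if the budget was not exceeded.
    """
    sent = 0
    errors_seen = 0
    aborted = False
    with open_batch() as batch:
        for row in rows:
            batch.add_object(**row)
            sent += 1
            errors_seen = batch.number_errors
            if errors_seen > max_errors:
                # Stop queuing more objects; the context manager still exits normally.
                aborted = True
                break
    if aborted:
        raise BatchAborted(
            f"Stopped after {sent} rows: {errors_seen} failed objects "
            f"exceeds the budget of {max_errors}"
        )
    return sent
```

With the setup above, this would be called roughly as `import_with_error_budget(collection.batch.dynamic, rows, max_errors=500)`, where each element of `rows` is a dict with `properties`, `uuid`, and `vector` keys. The error counter is only updated after a batch has actually been sent and failed, so the loop cannot abort sooner than one request timeout. Shortening the insert timeout through the client's `additional_config` option would probably be needed as well.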
Server Setup Information
- Weaviate Server Version: 1.27.0
- Deployment Method: docker on Mac OS
- Multi Node? Number of Running Nodes: 1 (no multi tenancy, no replication, no cluster)
- Client Language and Version: En
- Multitenancy?: No
Any additional Information
I didn't specify any timeout in the client. It's just a plain connect_to_local(WEAVIATE_HOST, WEAVIATE_PORT).