
Issue During Batch Insert


Description

I am running Weaviate locally in a Docker container and using the Weaviate Python client. When I try to insert a large batch of data, I get a “Deadline Exceeded” error.

Code:

import weaviate
import os

client = weaviate.Client(
    url="http://localhost:8080",
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]  # Replace with your inference API key
    },
)

client.schema.create_class({
    "class": "work_steps",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "generative-openai": {}
    }
})

# data_json is my dataset, loaded elsewhere; it contains 106,954 records
work_steps_data = [
    {"wtd_text": d["wtd_text"], "wta_text": d["wta_text"]}
    for d in data_json
]

try:
    client.batch.create_objects(work_steps_data)
except weaviate.exceptions.WeaviateBatchError as e:
    print(f"Error: {e}")

Error:

{
    "name": "WeaviateBatchError",
    "message": "Query call with protocol GRPC batch failed with message <AioRpcError of RPC that terminated with:
    status = StatusCode.DEADLINE_EXCEEDED
    details = \"Deadline Exceeded\"
    debug_error_string = \"UNKNOWN:Error received from peer  {grpc_message:\"Deadline Exceeded\", grpc_status:4, created_time:\"2024-08-01T18:14:40.555441469+04:00\"}\"
>.",
    ...
}

docker-compose.yaml

version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.25.6
    ports:
      - 8080:8080
      - 50051:50051
    volumes:
      - weaviate_data:/var/lib/weaviate
    environment:
      CLIP_INFERENCE_API: 'http://multi2vec-clip:8080'
      OPENAI_APIKEY: $OPENAI_APIKEY
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'multi2vec-clip'
      ENABLE_MODULES: 'multi2vec-clip,generative-openai,generative-cohere,text2vec-openai,text2vec-huggingface,text2vec-cohere,reranker-cohere'
      CLUSTER_HOSTNAME: 'node1'
    restart: on-failure:0
  multi2vec-clip:
    image: semitechnologies/multi2vec-clip:sentence-transformers-clip-ViT-B-32-multilingual-v1
    environment:
      ENABLE_CUDA: '0'
volumes:
  weaviate_data:

Additional Information

Docker Logs:

weaviate-1        | {"action":"startup","default_vectorizer_module":"multi2vec-clip","level":"info","msg":"the default vectorizer modules is set to \"multi2vec-clip\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-08-01T07:04:33Z"}
...
weaviate-1        | {"level":"warning","msg":"prop len tracker file /var/lib/weaviate/work_steps/iPkMMMILWoTR/proplengths does not exist, creating new tracker","time":"2024-08-01T08:54:43Z"}
...
multi2vec-clip-1  | INFO:     Model initialization complete
...
weaviate-1        | {"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-08-01T07:04:38Z"}

Problem

I am trying to insert a large dataset (106,954 records) into Weaviate, but I keep hitting a “Deadline Exceeded” error when using the batch insert functionality.

Questions

  1. How can I avoid the “Deadline Exceeded” error during batch insertion?
  2. Are there any recommended configurations or settings for handling large batch inserts?
  3. Is there a way to increase the timeout settings for gRPC batch operations in Weaviate? (My guess at what this would look like is sketched below.)
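
For question 3, my best guess from the v4 client docs is the AdditionalConfig/Timeout options below. The timeout values (in seconds) are placeholders I have not tuned, and I have not confirmed that this resolves the error:

import os
import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout

# Untested sketch: raise the client-side timeouts (the values are guesses).
client = weaviate.connect_to_local(
    port=8080,
    grpc_port=50051,
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=60, insert=300)  # "insert" covers batch writes
    ),
)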

Any assistance or recommendations would be greatly appreciated. Thank you!

P.S.:

I used the workaround below for the batch insert to avoid the error:

import time

work_step_col = client.collections.get("work_steps")
# work_step_col.data.insert_many(work_steps_data)  # inserting everything in one call hit the Deadline Exceeded error

batch_size = 1000  # adjust the batch size as needed
for i in range(0, len(work_steps_data), batch_size):
    batch = work_steps_data[i:i + batch_size]
    work_step_col.data.insert_many(batch)
    time.sleep(2)  # pause between batches
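
I also noticed that the v4 client has a built-in batching context manager that, as I understand it, manages batch sizing and retries on its own. This is an untested sketch; batch_size and concurrent_requests are guesses:

# Untested sketch using the v4 client's fixed-size batching helper.
with work_step_col.batch.fixed_size(batch_size=1000, concurrent_requests=2) as batch:
    for obj in work_steps_data:
        batch.add_object(properties=obj)

# Objects that still failed after the client's automatic retries, if any:
if work_step_col.batch.failed_objects:
    print(f"{len(work_step_col.batch.failed_objects)} objects failed")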

It has been 15 minutes and counting, so I am posting this anyway.

