OK, this turned out to be more awkward to narrow down than I expected. It seems to depend on some rather odd specifics, which may point to some other underlying issue.
Here is a reproducible example:
import weaviate
from weaviate.util import generate_uuid5
from weaviate.classes import config as wvc
client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": "<key>"}
)
# Create the collection and explicitly set a vectorizer
client.collections.delete("Test")
collection = client.collections.create(
    "Test",
    vectorizer_config=(
        wvc.Configure.Vectorizer.text2vec_openai(
            model="ada",
            model_version="002",
        )
    ),
    properties=[
        wvc.Property(name="non_vectorized", data_type=wvc.DataType.TEXT, skip_vectorization=True),
        wvc.Property(name="vectorized_text", data_type=wvc.DataType.TEXT),
        wvc.Property(name="vectorized_array", data_type=wvc.DataType.TEXT_ARRAY),
    ],
)
uuid = generate_uuid5("example1")
# Insert the new object
data = {"non_vectorized": "Original Text", "vectorized_text": "Original Text", "vectorized_array": []}
collection.data.insert(properties={**data}, uuid=uuid)
# Replacing a non-vectorized property on its own does not work
replace_data = {**data, "non_vectorized": "I Changed"}
collection.data.replace(properties=replace_data, uuid=uuid)
print(collection.query.fetch_objects().objects[0].properties)
# Replacing it together with either vectorized property works
replace_data = {**data, "vectorized_text": "I Changed", "non_vectorized": "I Changed"}
collection.data.replace(properties=replace_data, uuid=uuid)
print(collection.query.fetch_objects().objects[0].properties)
replace_data = {**data, "vectorized_array": ["I Changed"], "non_vectorized": "I Changed"}
collection.data.replace(properties=replace_data, uuid=uuid)
print(collection.query.fetch_objects().objects[0].properties)
# Close the connection once the repro is done
client.close()
Here is what makes this strange:
- If I don’t explicitly set a vectorizer_config, this issue does not occur.
- If I don’t have the text array in my schema, this issue does not occur.
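
To make the repro self-checking instead of relying on reading the printed output, something like the following could be used after each `replace` call. This is only a rough sketch: `diff_properties` and `check_replace` are hypothetical helper names of mine, not part of the Weaviate client, and treating an empty array and a missing value as equal is my own assumption about how an empty `vectorized_array` might come back.

```python
def _normalize(value):
    # Assumption: an empty array and a missing property count as the same thing.
    return None if value == [] else value


def diff_properties(expected: dict, actual: dict) -> dict:
    """Return {name: (expected, actual)} for every property that differs."""
    return {
        name: (value, actual.get(name))
        for name, value in expected.items()
        if _normalize(actual.get(name)) != _normalize(value)
    }


def check_replace(collection, uuid, expected: dict) -> dict:
    """Fetch the object by id and report the properties that were not replaced."""
    obj = collection.query.fetch_object_by_id(uuid)
    if obj is None:
        raise LookupError(f"no object found for uuid {uuid}")
    return diff_properties(expected, obj.properties)
```

For example, calling `print(check_replace(collection, uuid, replace_data))` right after each `collection.data.replace(...)` should print an empty dict when the replace took effect, and the stale properties otherwise.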