
How to load an existing db for similarity search?

weaviate-client==4.7.1
langchain-weaviate==0.0.2
langchain==0.2.11

I am able to create a simple example that builds a ‘db’ and uses it for inference in a single flow:

import weaviate
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_weaviate.vectorstores import WeaviateVectorStore

from bge import bge_m3_embedding

print('Read in text ...')
loader = TextLoader('state_of_the_union.txt')
documents = loader.load()

print('Split text ...')
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

print('Load embedding model ...')
embedding_model = bge_m3_embedding

print('Embed docs and build the collection ...')
weaviate_client = weaviate.connect_to_local()
db = WeaviateVectorStore.from_documents(docs, embedding_model, client=weaviate_client, index_name='test')

print('Perform search ...')
query = 'What did the president say about Ketanji Brown Jackson'
results = db.similarity_search_with_score(query, alpha=1)  # alpha=1 -> pure vector search
for i, (doc, score) in enumerate(results):
    print(f'{i}--->{score:.3f}')
print(results[0])

weaviate_client.close()

This all works fine: the db is created and similar docs are retrieved. However, if I now try to reuse this existing ‘db’ to run the same query, I get an IndexError:

import weaviate
from langchain_weaviate.vectorstores import WeaviateVectorStore

from bge import bge_m3_embedding

print('Load embedding model ...')
embedding_model = bge_m3_embedding

print('Load embedded docs ...')
weaviate_client = weaviate.connect_to_local()

# Attempt to attach to the existing 'test' collection by passing no documents
db = WeaviateVectorStore.from_documents([], embedding_model, client=weaviate_client, index_name='test')

print('Perform search ...')
query = 'What did the president say about Ketanji Brown Jackson'
results = db.similarity_search_with_score(query, alpha=1)
for i, (doc, score) in enumerate(results):
    print(f'{i}--->{score:.3f}')
print(results[0])

The full traceback is below:

Traceback (most recent call last):
  File "/Users/I747411/ai/lc_weaviate.py", line 22, in <module>
    db = WeaviateVectorStore.from_documents([], embedding_model, client=weaviate_client, index_name='test')
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/langchain_core/vectorstores/base.py", line 1058, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/langchain_weaviate/vectorstores.py", line 487, in from_texts
    weaviate_vector_store.add_texts(texts, metadatas, tenant=tenant, **kwargs)
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/langchain_weaviate/vectorstores.py", line 165, in add_texts
    embeddings = self._embedding.embed_documents(list(texts))
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 331, in embed_documents
    embeddings = self.client.encode(
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 565, in encode
    if all_embeddings[0].dtype == torch.bfloat16:
IndexError: list index out of range
/Users/I747411/ai/venv/lib/python3.10/site-packages/weaviate/warnings.py:303: ResourceWarning: Con004: The connection to Weaviate was not closed properly. This can lead to memory leaks.
            Please make sure to close the connection using `client.close()`.

Note the key line: “IndexError: list index out of range”. From the traceback, it looks like from_documents([]) still calls embed_documents on the empty list, and this sentence-transformers version then indexes all_embeddings[0] unconditionally, which fails on an empty result.

What’s the proper way to connect to an existing vector db and run inference against it? Please help!
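
For reference, this is the kind of thing I was expecting to work: a minimal sketch that attaches to the existing collection directly instead of calling from_documents([]). The direct WeaviateVectorStore(...) constructor and the default text_key of 'text' are my reading of langchain-weaviate 0.0.2, so treat those parameters as assumptions:

import weaviate
from langchain_weaviate.vectorstores import WeaviateVectorStore

from bge import bge_m3_embedding

weaviate_client = weaviate.connect_to_local()
try:
    # Assumption: the constructor can attach to an existing collection
    # without adding any documents.
    db = WeaviateVectorStore(
        client=weaviate_client,
        index_name='test',           # collection created earlier by from_documents
        text_key='text',             # property holding the chunk text (assumed default)
        embedding=bge_m3_embedding,  # still needed to embed the query
    )
    query = 'What did the president say about Ketanji Brown Jackson'
    results = db.similarity_search_with_score(query, alpha=1)
    for i, (doc, score) in enumerate(results):
        print(f'{i}--->{score:.3f}')
finally:
    weaviate_client.close()  # also avoids the Con004 ResourceWarning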

