Here is the same code after switching to a local connection; the object count is 25.
Container fastapi_onazure-t2v-transformers-1 Started 0.2s
Container fastapi_onazure-contextionary-1 Started 0.2s
Container fastapi_onazure-weaviate-1 Started 0.3s
(.venv) connie.wang@Connies-MacBook-Pro-M3 fastapi_onazure % python app/rag/with_weaviate/*create.py
2024-11-13 16:43:35,687 - INFO - === configs.py - blob_name for azure: rag/data/constitution.pdf
2024-11-13 16:43:35,688 - INFO - === configs.py - pdf_file_path : /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data
2024-11-13 16:43:36,087 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
== 0.1. embeddings initiated from embedding_openai.py: text-embedding-ada-002 and dimension: 1536
2024-11-13 16:43:36,125 - INFO - HTTP Request: GET http://localhost:8080/v1/.well-known/openid-configuration "HTTP/1.1 404 Not Found"
2024-11-13 16:43:36,148 - INFO - HTTP Request: GET http://localhost:8080/v1/meta "HTTP/1.1 200 OK"
2024-11-13 16:43:36,243 - INFO - HTTP Request: GET https://pypi.org/pypi/weaviate-client/json "HTTP/1.1 200 OK"
2024-11-13 16:43:36,273 - INFO - === vectore_stores.py - embeded client initated <weaviate.client.WeaviateClient object at 0x30057cb90>
2024-11-13 16:43:36,276 - INFO - HTTP Request: GET http://localhost:8080/v1/schema/PDF_COLLECTION "HTTP/1.1 404 Not Found"
2024-11-13 16:43:36,279 - INFO - HTTP Request: GET http://localhost:8080/v1/schema/PDF_COLLECTION "HTTP/1.1 404 Not Found"
2024-11-13 16:43:36,452 - INFO - HTTP Request: POST http://localhost:8080/v1/schema "HTTP/1.1 200 OK"
2024-11-13 16:43:36,456 - INFO -
=== file_path: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/.DS_Store
2024-11-13 16:43:36,456 - INFO -
Document /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/.DS_Store Processing Status:
{
"status": true,
"message": ,
"error":
}
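The run above also sends `.DS_Store` through document processing, since it sits in the same data folder as the PDFs. A minimal sketch, assuming the loader simply iterates over the data directory, of keeping only `.pdf` files. `list_pdf_files` is a hypothetical helper name, not something taken from the original code.

```python
from pathlib import Path


def list_pdf_files(data_dir: str) -> list[Path]:
    """Return regular *.pdf files in data_dir, skipping hidden files such as .DS_Store.

    Hypothetical helper: the original loader's function names are not shown in the log.
    """
    return sorted(
        p
        for p in Path(data_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"      # accept .pdf and .PDF
        and not p.name.startswith(".")      # skip macOS metadata and other dotfiles
    )
```

Filtering before processing keeps the per-file status output limited to files that can actually be chunked.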
2024-11-13 16:43:36,456 - INFO -
=== file_path: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf
chunking_recursiveCharacterTextSplitter.py: file is being chunked: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf
2024-11-13 16:43:37,267 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 0 - Chunk 0
2024-11-13 16:43:37,523 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 1 - Chunk 1
2024-11-13 16:43:37,909 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 2 - Chunk 2
2024-11-13 16:43:38,386 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 3 - Chunk 3
2024-11-13 16:43:38,785 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 4 - Chunk 4
2024-11-13 16:43:39,070 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 5 - Chunk 5
2024-11-13 16:43:39,445 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 6 - Chunk 6
2024-11-13 16:43:40,080 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 7 - Chunk 7
2024-11-13 16:43:40,446 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 8 - Chunk 8
2024-11-13 16:43:40,799 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 9 - Chunk 9
2024-11-13 16:43:41,220 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 10 - Chunk 10
2024-11-13 16:43:41,776 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 11 - Chunk 11
2024-11-13 16:43:42,107 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 12 - Chunk 12
2024-11-13 16:43:42,344 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 13 - Chunk 13
2024-11-13 16:43:42,607 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 14 - Chunk 14
2024-11-13 16:43:43,177 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 15 - Chunk 15
2024-11-13 16:43:43,409 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 16 - Chunk 16
2024-11-13 16:43:43,935 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 17 - Chunk 17
2024-11-13 16:43:44,466 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 18 - Chunk 18
2024-11-13 16:43:44 - All chunks inserted for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf
2024-11-13 16:43:44,468 - INFO -
Document /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf Processing Status:
{
"status": true,
"message": ,
"error":
}
2024-11-13 16:43:44,468 - INFO -
=== file_path: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf
chunking_recursiveCharacterTextSplitter.py: file is being chunked: /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf
2024-11-13 16:43:44,761 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 0 - Chunk 0
2024-11-13 16:43:45,217 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 1 - Chunk 1
2024-11-13 16:43:45,741 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 2 - Chunk 2
2024-11-13 16:43:46,267 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 3 - Chunk 3
2024-11-13 16:43:46,606 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 4 - Chunk 4
2024-11-13 16:43:47,222 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 5 - Chunk 5
2024-11-13 16:43:47,746 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 6 - Chunk 6
2024-11-13 16:43:48,216 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 7 - Chunk 7
2024-11-13 16:43:48,458 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 8 - Chunk 8
2024-11-13 16:43:48,934 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 9 - Chunk 9
2024-11-13 16:43:49,185 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 10 - Chunk 10
2024-11-13 16:43:49,495 - INFO - HTTP Request: POST http://localhost:8080/v1/objects "HTTP/1.1 200 OK"
Inserted: Page 11 - Chunk 11
2024-11-13 16:43:49 - All chunks inserted for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf
2024-11-13 16:43:49,497 - INFO -
Document /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf Processing Status:
{
"status": true,
"message": ,
"error":
}
2024-11-13 16:43:49,498 - INFO - === utils.py url: http://localhost:8080/v1/objects/
2024-11-13 16:43:49,509 - INFO - === utils.py
{'classes': [{'class': 'PDF_COLLECTION', 'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2}, 'cleanupIntervalSeconds': 60, 'indexNullState': True, 'indexPropertyLength': True, 'indexTimestamps': True, 'stopwords': {'additions': None, 'preset': 'en', 'removals': None}}, 'moduleConfig': {'generative-cohere': {}, 'text2vec-openai': {'baseURL': 'https://api.openai.com', 'model': 'ada', 'vectorizeClassName': True}}, 'multiTenancyConfig': {'autoTenantActivation': False, 'autoTenantCreation': False, 'enabled': False}, 'properties': [{'dataType': ['text'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': True, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'page_content', 'tokenization': 'word'}, {'dataType': ['int'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': False, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'page_number'}, {'dataType': ['text'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': True, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'source', 'tokenization': 'word'}], 'replicationConfig': {'asyncEnabled': False, 'deletionStrategy': 'DeleteOnConflict', 'factor': 1}, 'shardingConfig': {'actualCount': 1, 'actualVirtualCount': 128, 'desiredCount': 1, 'desiredVirtualCount': 128, 'function': 'murmur3', 'key': '_id', 'strategy': 'hash', 'virtualPerPhysical': 128}, 'vectorIndexConfig': {'bq': {'enabled': True}, 'cleanupIntervalSeconds': 300, 'distance': 'cosine', 'dynamicEfFactor': 8, 'dynamicEfMax': 500, 'dynamicEfMin': 100, 'ef': -1, 'efConstruction': 128, 'filterStrategy': 'sweeping', 'flatSearchCutoff': 40000, 'maxConnections': 32, 'pq': {'bitCompression': False, 'centroids': 256, 'enabled': False, 'encoder': {'distribution': 'log-normal', 'type': 'kmeans'}, 'segments': 0, 'trainingLimit': 100000}, 'skip': False, 'sq': {'enabled': False, 'rescoreLimit': 20, 'trainingLimit': 100000}, 'vectorCacheMaxObjects': 1000000000000}, 'vectorIndexType': 'hnsw', 'vectorizer': 'text2vec-openai'}]}
2024-11-13 16:43:49,520 - INFO -
=== utils.py total objects 25 in PDF_COLLECTION
2024-11-13 16:43:49,520 - INFO - === utils.py counts per file
{
"/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf": 14,
"/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf": 11
}
{'status': True, 'message': ['25 already in http://localhost:8080/v1/objects/'], 'error': }
2024-11-13 16:43:49,521 - INFO - === *created.py - url: http://localhost:8080/v1/objects/
2024-11-13 16:43:49,521 - INFO - === *created.py - object_count: 25
2024-11-13 16:43:49,521 - INFO -
Document Processing Status: for /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data
{
"status": true,
"message": [
"25 already in http://localhost:8080/v1/objects/"
],
"error":
}
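In the log as pasted, constitution.pdf shows 19 "Inserted" lines (Chunk 0 to 18) and what_is_a_constitution.pdf shows 12 (Chunk 0 to 11), while the per-file counts report 14 and 11. A minimal sketch for tallying the "Inserted" lines per file from a saved copy of this output, so the two sets of numbers can be compared side by side. The function name and the short sample text are illustrative assumptions; only the `=== file_path:` and `Inserted:` line formats come from the log above.

```python
import re
from collections import Counter


def count_inserts_per_file(log_text: str) -> Counter:
    """Count 'Inserted:' lines that follow each '=== file_path:' marker."""
    counts: Counter = Counter()
    current = None  # path from the most recent '=== file_path:' line
    for line in log_text.splitlines():
        marker = re.match(r"\s*=== file_path:\s*(.+)$", line)
        if marker:
            current = marker.group(1).strip()
        elif line.startswith("Inserted:") and current is not None:
            counts[current] += 1
    return counts


# Tiny illustrative sample, not the real log.
sample = """\
=== file_path: data/a.pdf
Inserted: Page 0 - Chunk 0
Inserted: Page 1 - Chunk 1
=== file_path: data/b.pdf
Inserted: Page 0 - Chunk 0
"""
print(dict(count_inserts_per_file(sample)))  # {'data/a.pdf': 2, 'data/b.pdf': 1}
```

Running this over the full output and comparing the result with the "counts per file" block would show, per file, whether the number of insert calls matches the number of stored objects.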
(.venv) connie.wang@Connies-MacBook-Pro-M3 fastapi_onazure % python app/rag/with_weaviate/utils/utils.py
2024-11-13 16:44:09,454 - INFO - === configs.py - blob_name for azure: rag/data/constitution.pdf
2024-11-13 16:44:09,454 - INFO - === configs.py - pdf_file_path : /Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data
2024-11-13 16:44:09,465 - INFO - HTTP Request: GET http://localhost:8080/v1/.well-known/openid-configuration "HTTP/1.1 404 Not Found"
2024-11-13 16:44:09,479 - INFO - HTTP Request: GET http://localhost:8080/v1/meta "HTTP/1.1 200 OK"
2024-11-13 16:44:09,571 - INFO - HTTP Request: GET https://pypi.org/pypi/weaviate-client/json "HTTP/1.1 200 OK"
2024-11-13 16:44:09,597 - INFO - === vectore_stores.py - embeded client initated <weaviate.client.WeaviateClient object at 0x107d5a0c0>
2024-11-13 16:44:09,597 - INFO - === utils.py url: http://localhost:8080/v1/objects/
2024-11-13 16:44:09,603 - INFO - === utils.py
{'classes': [{'class': 'PDF_COLLECTION', 'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2}, 'cleanupIntervalSeconds': 60, 'indexNullState': True, 'indexPropertyLength': True, 'indexTimestamps': True, 'stopwords': {'additions': None, 'preset': 'en', 'removals': None}}, 'moduleConfig': {'generative-cohere': {}, 'text2vec-openai': {'baseURL': 'https://api.openai.com', 'model': 'ada', 'vectorizeClassName': True}}, 'multiTenancyConfig': {'autoTenantActivation': False, 'autoTenantCreation': False, 'enabled': False}, 'properties': [{'dataType': ['text'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': True, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'page_content', 'tokenization': 'word'}, {'dataType': ['int'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': False, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'page_number'}, {'dataType': ['text'], 'indexFilterable': True, 'indexRangeFilters': False, 'indexSearchable': True, 'moduleConfig': {'text2vec-openai': {'skip': False, 'vectorizePropertyName': True}}, 'name': 'source', 'tokenization': 'word'}], 'replicationConfig': {'asyncEnabled': False, 'deletionStrategy': 'DeleteOnConflict', 'factor': 1}, 'shardingConfig': {'actualCount': 1, 'actualVirtualCount': 128, 'desiredCount': 1, 'desiredVirtualCount': 128, 'function': 'murmur3', 'key': '_id', 'strategy': 'hash', 'virtualPerPhysical': 128}, 'vectorIndexConfig': {'bq': {'enabled': True}, 'cleanupIntervalSeconds': 300, 'distance': 'cosine', 'dynamicEfFactor': 8, 'dynamicEfMax': 500, 'dynamicEfMin': 100, 'ef': -1, 'efConstruction': 128, 'filterStrategy': 'sweeping', 'flatSearchCutoff': 40000, 'maxConnections': 32, 'pq': {'bitCompression': False, 'centroids': 256, 'enabled': False, 'encoder': {'distribution': 'log-normal', 'type': 'kmeans'}, 'segments': 0, 'trainingLimit': 100000}, 'skip': False, 'sq': {'enabled': False, 'rescoreLimit': 20, 'trainingLimit': 100000}, 'vectorCacheMaxObjects': 1000000000000}, 'vectorIndexType': 'hnsw', 'vectorizer': 'text2vec-openai'}]}
2024-11-13 16:44:09,609 - INFO -
=== utils.py total objects 25 in PDF_COLLECTION
2024-11-13 16:44:09,609 - INFO - === utils.py counts per file
{
"/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/constitution.pdf": 14,
"/Users/connie.wang/Desktop/connie/inspiration_azure/fastapi_onazure/app/rag/with_weaviate/data/what_is_a_constitution.pdf": 11
}
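To cross-check the total that utils.py reports against the server itself, something like the sketch below might help. It assumes the v4 Python client (the log shows a `weaviate.client.WeaviateClient` object) and uses its aggregate call. `count_objects` is a hypothetical helper, not the code in utils.py, whose implementation is not shown here.

```python
def count_objects(client, collection_name: str = "PDF_COLLECTION") -> int:
    """Return the total object count of a collection via the v4 aggregate API.

    `client` is expected to be a connected weaviate.WeaviateClient.
    """
    collection = client.collections.get(collection_name)
    result = collection.aggregate.over_all(total_count=True)
    return result.total_count


# Possible usage against the local instance from the log (assumes the default
# local ports and that the weaviate-client package is installed):
#
#   import weaviate
#   client = weaviate.connect_to_local()
#   try:
#       print(count_objects(client))
#   finally:
#       client.close()
```

Because the count comes from an aggregate query rather than from paging through `/v1/objects`, it should not be affected by any default page limit on object listing.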