
Current state: Accepted

...

To reduce network transmission and bypass Pulsar, the new interface will let users pass the paths of data files (JSON, NumPy, etc.) stored on MinIO/S3. The data nodes then read these files directly and parse them into segments. The internal logic of the process becomes:

1. The client calls import() to pass some file paths to the Milvus proxy node.

2. The proxy node passes the file paths to the data coordinator node.

3. The data coordinator node picks one or more data nodes (according to the shard number) to parse the files. Each file can be parsed into one or more segments.
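Step 3 leaves the exact scheduling policy open. The following is a toy sketch of one possible policy, assuming the coordinator uses at most `shard_num` data nodes and spreads files across them round-robin. The function name `assign_files` and the round-robin strategy are hypothetical illustrations, not Milvus's actual data coordinator logic.

```python
from collections import defaultdict
from typing import Dict, List


def assign_files(files: List[str], data_nodes: List[str], shard_num: int) -> Dict[str, List[str]]:
    """Hypothetical scheduler: map each file path to the data node that parses it."""
    if not files:
        return {}
    if shard_num < 1 or not data_nodes:
        raise ValueError("need at least one shard and one data node")
    # Use at most shard_num nodes, and never more nodes than there are files.
    chosen = data_nodes[:min(shard_num, len(data_nodes), len(files))]
    plan: Dict[str, List[str]] = defaultdict(list)
    for i, path in enumerate(files):
        # Round-robin: file i goes to node i mod len(chosen).
        plan[chosen[i % len(chosen)]].append(path)
    return dict(plan)
```

For example, `assign_files(["a.json", "b.json", "c.json"], ["dn1", "dn2", "dn3"], 2)` returns `{"dn1": ["a.json", "c.json"], "dn2": ["b.json"]}`. Each data node would then parse its files into one or more segments.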

SDK Interfaces

The Python API declaration:

def import(collection_name, files, partition_name=None, options=None)

collection_name: the target collection name (required)
partition_name: the target partition name (optional)
files: a list of files in row-based format, or a dict of files in column-based format (required)
row-based files: ["file_1.json", "file_2.json"]
column-based files (field name mapped to file): {"id": "id.json", "vectors": "embeddings.npy"}
options: extra options in JSON format (optional). For example, the MinIO/S3 bucket that the files come from, which by default is the same bucket the Milvus server uses:
{"bucket": "mybucket"}
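The parameter rules above can be illustrated with a small client-side sketch that tells the two `files` formats apart and fills in the default bucket. It is named `build_import_request` because `import` is a reserved keyword in Python and cannot be used as an ordinary function name. The function, the returned dict layout, and `DEFAULT_BUCKET` are all hypothetical; the real default bucket comes from the Milvus server configuration.

```python
from typing import Dict, List, Optional, Union

# Hypothetical placeholder; the real value is the Milvus server's configured bucket.
DEFAULT_BUCKET = "default-bucket"


def build_import_request(collection_name: str,
                         files: Union[List[str], Dict[str, str]],
                         partition_name: Optional[str] = None,
                         options: Optional[Dict[str, str]] = None) -> dict:
    """Hypothetical helper: validate arguments and assemble an import request."""
    if not collection_name:
        raise ValueError("collection_name is required")
    if isinstance(files, list):
        # Row-based: each file holds complete rows, e.g. ["file_1.json", "file_2.json"].
        row_based = True
        payload: Union[List[str], Dict[str, str]] = list(files)
    elif isinstance(files, dict):
        # Column-based: each field name maps to the file holding that column.
        row_based = False
        payload = dict(files)
    else:
        raise TypeError("files must be a list (row-based) or a dict (column-based)")
    if not payload:
        raise ValueError("files must not be empty")
    # Caller-supplied options override the default bucket.
    merged = {"bucket": DEFAULT_BUCKET}
    merged.update(options or {})
    return {
        "collection_name": collection_name,
        "partition_name": partition_name,
        "row_based": row_based,
        "files": payload,
        "options": merged,
    }
```

Usage: `build_import_request("c1", {"id": "id.json", "vectors": "embeddings.npy"}, options={"bucket": "mybucket"})` marks the request as column-based and reads from `mybucket`. Omitting `options` falls back to the server bucket, matching the default described above.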

Pre-defined format for import files

...
