Current state: Accepted

...

To reduce network transmission and skip Pulsar management, the new interface will allow users to input the paths of data files (JSON, NumPy, etc.) on MinIO/S3 storage and let the data nodes read these files directly and parse them into segments. The internal logic of the process becomes:

1. client calls import() to pass some file paths to Milvus proxy node

2. proxy node passes the file paths to data coordinator node

3. data coordinator node picks one or more data nodes (according to the sharding number) to parse the files; each file can be parsed into one or more segments.
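
As a rough illustration of step 3, the sketch below spreads file paths over data nodes in round-robin order. The function name assign_files, the node names, and the round-robin policy are all hypothetical; the actual assignment policy of the data coordinator is not specified here.

```python
from typing import Dict, List


def assign_files(file_paths: List[str], data_nodes: List[str], shard_num: int) -> Dict[str, List[str]]:
    """Hypothetical coordinator step: spread files over at most shard_num data nodes."""
    if not data_nodes:
        raise ValueError("no data nodes available")
    if shard_num < 1:
        raise ValueError("shard_num must be >= 1")
    # Assumption: the coordinator uses at most shard_num of the available nodes.
    chosen = data_nodes[:shard_num]
    plan: Dict[str, List[str]] = {node: [] for node in chosen}
    # Assumption: files are handed out round-robin; a real scheduler may differ.
    for i, path in enumerate(file_paths):
        plan[chosen[i % len(chosen)]].append(path)
    return plan


plan = assign_files(["a.json", "b.json", "c.json"], ["datanode-1", "datanode-2"], shard_num=2)
```

Each data node would then read its assigned files from MinIO/S3 and parse them into one or more segments.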

SDK Interfaces

The Python API declaration:

...

Assume we have a collection with 2 fields (one primary key and one vector field) and 5 rows:

uid	vector
101	[1.1, 1.2, 1.3, 1.4]
102	[2.1, 2.2, 2.3, 2.4]
103	[3.1, 3.2, 3.3, 3.4]
104	[4.1, 4.2, 4.3, 4.4]
105	[5.1, 5.2, 5.3, 5.4]

There are two ways to represent the collection with data files:

(1) Row-based data file: a JSON file that contains multiple rows.

file_1.json:

[
  {"uid": 101, "vector": [1.1, 1.2, 1.3, 1.4]},
  {"uid": 102, "vector": [2.1, 2.2, 2.3, 2.4]},
  {"uid": 103, "vector": [3.1, 3.2, 3.3, 3.4]},
  {"uid": 104, "vector": [4.1, 4.2, 4.3, 4.4]},
  {"uid": 105, "vector": [5.1, 5.2, 5.3, 5.4]}
]
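
A row-based file for the sample collection could be produced with a short script. This is a minimal sketch that assumes the rows are serialized as a JSON list of objects; the helper name write_row_file and the temporary output directory are illustrative only, and uploading the file to MinIO/S3 is left out.

```python
import json
import os
import tempfile

# The five sample rows from the table above.
rows = [
    {"uid": 101, "vector": [1.1, 1.2, 1.3, 1.4]},
    {"uid": 102, "vector": [2.1, 2.2, 2.3, 2.4]},
    {"uid": 103, "vector": [3.1, 3.2, 3.3, 3.4]},
    {"uid": 104, "vector": [4.1, 4.2, 4.3, 4.4]},
    {"uid": 105, "vector": [5.1, 5.2, 5.3, 5.4]},
]


def write_row_file(rows, path):
    """Hypothetical helper: check that all rows share the same fields, then write them as JSON."""
    if not rows:
        raise ValueError("no rows to write")
    fields = set(rows[0])
    for row in rows:
        if set(row) != fields:
            raise ValueError(f"row {row!r} does not match fields {sorted(fields)}")
    with open(path, "w") as f:
        json.dump(rows, f)


path = os.path.join(tempfile.mkdtemp(), "file_1.json")
write_row_file(rows, path)

# Read the file back to confirm it parses as valid JSON.
with open(path) as f:
    loaded = json.load(f)
```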

...

import(collection_name="test", files={"uid": "file_1.json", "vector": "file_2.json"})

We also allow users to store vectors in a NumPy file. For example, if the "vector" field is stored in file_2.npy, we can call import():

...
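
For reference, a NumPy file such as file_2.npy could be created with np.save. The sketch below assumes the vectors are stored as a 2-D float32 array with one row per entity; the dtype and the temporary output directory are assumptions, not requirements stated in this document.

```python
import os
import tempfile

import numpy as np

# The "vector" column of the sample collection, one row per entity.
# Assumption: float32 is used as the element type.
vectors = np.array(
    [
        [1.1, 1.2, 1.3, 1.4],
        [2.1, 2.2, 2.3, 2.4],
        [3.1, 3.2, 3.3, 3.4],
        [4.1, 4.2, 4.3, 4.4],
        [5.1, 5.2, 5.3, 5.4],
    ],
    dtype=np.float32,
)

path = os.path.join(tempfile.mkdtemp(), "file_2.npy")
np.save(path, vectors)

# Load the file back to confirm the shape and dtype survived the round trip.
loaded = np.load(path)
```

The row order in the array must match the order of the primary keys in the file that holds the "uid" field, since the two files are joined by position.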
