Current state: Accepted
...
To reduce network transmission and skip Pulsar management, the new interface will allow users to input the paths of data files (JSON, NumPy, etc.) stored on MinIO/S3, and let the data nodes read these files directly and parse them into segments. The internal logic of the process becomes:
1. The client calls import() to pass the file paths to the Milvus proxy node.
2. The proxy node passes the file paths to the data coordinator.
3. The data coordinator picks one or more data nodes (according to the number of files) to parse the files; each file can be parsed into one or more segments.
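As a rough illustration of step 3, the sketch below shows one way a data coordinator could spread files over data nodes and estimate how many segments a file yields. It assumes a simple round-robin assignment and a size-based segment limit; `FileAssignment`, `assign`, `segmentCount`, and `maxSegmentBytes` are hypothetical names, not Milvus code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: how a data coordinator might spread import files
// across data nodes. Names are illustrative, not Milvus APIs.
public class FileAssignment {

    // Round-robin: file i goes to node (i % nodeCount), so the number of
    // nodes used grows with the file count, up to nodeIds.size().
    public static Map<Long, List<String>> assign(List<String> files, List<Long> nodeIds) {
        if (nodeIds.isEmpty()) {
            throw new IllegalArgumentException("no data node available");
        }
        Map<Long, List<String>> plan = new LinkedHashMap<>();
        for (int i = 0; i < files.size(); i++) {
            long node = nodeIds.get(i % nodeIds.size());
            plan.computeIfAbsent(node, k -> new ArrayList<>()).add(files.get(i));
        }
        return plan;
    }

    // Assumed size-based policy: a file is cut into as many segments as needed
    // so that no segment exceeds maxSegmentBytes (ceiling division, minimum 1).
    public static long segmentCount(long fileBytes, long maxSegmentBytes) {
        if (maxSegmentBytes <= 0) {
            throw new IllegalArgumentException("maxSegmentBytes must be positive");
        }
        return Math.max(1L, (fileBytes + maxSegmentBytes - 1) / maxSegmentBytes);
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.json", "b.json", "c.npy");
        System.out.println(assign(files, List.of(7L, 8L)));
        System.out.println(segmentCount(1_300L, 512L));
    }
}
```

Round-robin keeps the sketch short; a real scheduler would more likely weigh node load and file sizes when picking data nodes.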
Some points to consider:
- The JSON format is flexible. Ideally, the import API should parse users' JSON files without requiring them to be reformatted according to a strict rule.
- Users can store scalar fields and vector fields in a JSON file, in either a row-based or a column-based layout. The import() API can support both.
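To make the two layouts concrete, here is a hypothetical sketch that assumes the JSON has already been parsed into Java collections (the Java standard library has no JSON parser). The example layouts in the comments and the `JsonLayouts.columnsToRows` helper are illustrative assumptions, not the file format Milvus defines; the helper shows how a parser could normalize column-based input into rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical layouts, shown already parsed into Java collections:
//   row-based:    {"rows": [{"id": 1, "vec": [0.1, 0.2]}, {"id": 2, "vec": [0.3, 0.4]}]}
//   column-based: {"id": [1, 2], "vec": [[0.1, 0.2], [0.3, 0.4]]}
public class JsonLayouts {

    // Convert column-based data (field -> values) into row-based data
    // (one field -> value map per entity). All columns must have equal length.
    public static List<Map<String, Object>> columnsToRows(Map<String, List<?>> columns) {
        int rowCount = -1;
        for (Map.Entry<String, List<?>> col : columns.entrySet()) {
            int n = col.getValue().size();
            if (rowCount == -1) {
                rowCount = n;
            } else if (n != rowCount) {
                throw new IllegalArgumentException(
                        "column '" + col.getKey() + "' has " + n + " values, expected " + rowCount);
            }
        }
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < Math.max(rowCount, 0); i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (Map.Entry<String, List<?>> col : columns.entrySet()) {
                row.put(col.getKey(), col.getValue().get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<?>> columns = new LinkedHashMap<>();
        columns.put("id", List.of(1L, 2L));
        columns.put("vec", List.of(List.of(0.1f, 0.2f), List.of(0.3f, 0.4f)));
        System.out.println(columnsToRows(columns));
    }
}
```

Normalizing to one internal shape early lets the rest of the parser (type checks, segment writing) handle both layouts with a single code path.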
...
For other SDKs, "options" is not a JSON object. For the Java SDK, a declaration could be:
```java
public class ImportParam {
    ...
    private MinioDataSource data_source;
    ...
    private List<DataFile> external_files;
    ...
}
```
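One way a caller might populate a parameter object like the `ImportParam` declared above is sketched below. The nested `MinioDataSource` and `DataFile` classes, their fields (`endpoint`, `bucket`, `path`, `rowBased`), and the builder methods are hypothetical stand-ins, not the actual Java SDK; `dataSource` and `externalFiles` mirror `data_source` and `external_files` in camelCase.

```java
import java.util.ArrayList;
import java.util.List;

public class ImportParamSketch {

    // Hypothetical: where the files live. Field names are assumptions.
    static final class MinioDataSource {
        final String endpoint;
        final String bucket;

        MinioDataSource(String endpoint, String bucket) {
            this.endpoint = endpoint;
            this.bucket = bucket;
        }
    }

    // Hypothetical: one data file plus its layout (row-based or column-based).
    static final class DataFile {
        final String path;
        final boolean rowBased;

        DataFile(String path, boolean rowBased) {
            this.path = path;
            this.rowBased = rowBased;
        }
    }

    // Mirrors data_source / external_files; built through a small builder
    // that rejects a missing data source or an empty file list.
    static final class ImportParam {
        final MinioDataSource dataSource;
        final List<DataFile> externalFiles;

        private ImportParam(Builder b) {
            this.dataSource = b.dataSource;
            this.externalFiles = List.copyOf(b.externalFiles);
        }

        static Builder newBuilder() {
            return new Builder();
        }

        static final class Builder {
            private MinioDataSource dataSource;
            private final List<DataFile> externalFiles = new ArrayList<>();

            Builder withDataSource(MinioDataSource ds) {
                this.dataSource = ds;
                return this;
            }

            Builder addFile(DataFile f) {
                externalFiles.add(f);
                return this;
            }

            ImportParam build() {
                if (dataSource == null) throw new IllegalStateException("data source is required");
                if (externalFiles.isEmpty()) throw new IllegalStateException("at least one file is required");
                return new ImportParam(this);
            }
        }
    }

    public static void main(String[] args) {
        ImportParam param = ImportParam.newBuilder()
                .withDataSource(new MinioDataSource("localhost:9000", "a-bucket"))
                .addFile(new DataFile("data/rows.json", true))
                .addFile(new DataFile("data/vectors.npy", false))
                .build();
        System.out.println(param.externalFiles.size() + " files from bucket " + param.dataSource.bucket);
    }
}
```

A builder fits the usual Java SDK style and gives one place to validate the request before it is sent.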
RPC Interfaces
The declaration of the import API at the RPC level:
```protobuf
// Declared inside the proxy's gRPC service definition.
rpc Import(ImportRequest) returns (MutationResult) {}

message ImportRequest {
  common.MsgBase base = 1;
  string options = 2;  // import options, e.g. the data file paths
}

message MutationResult {
  common.Status status = 1;
  schema.IDs IDs = 2;              // required for insert, delete
  repeated uint32 succ_index = 3;  // indexes of the rows that succeeded
  repeated uint32 err_index = 4;   // indexes of the rows that failed
  bool acknowledged = 5;
  int64 insert_cnt = 6;
  int64 delete_cnt = 7;
  int64 upsert_cnt = 8;
  uint64 timestamp = 9;
}
```
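Because `options` is a plain `string` at the RPC level, an SDK has to serialize its parameters before sending the request. The sketch below assumes a JSON encoding with hypothetical keys `files` and `row_based`; this is not a documented schema, and a real SDK would more likely use a JSON library than the hand-rolled escaping shown here.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: packing import parameters into ImportRequest.options.
// The JSON keys ("files", "row_based") are assumptions for illustration.
public class ImportOptions {

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Build the options string: {"files":[...],"row_based":true|false}
    static String toOptionsJson(List<String> files, boolean rowBased) {
        StringJoiner arr = new StringJoiner(",", "[", "]");
        for (String f : files) {
            arr.add(quote(f));
        }
        return "{\"files\":" + arr + ",\"row_based\":" + rowBased + "}";
    }

    public static void main(String[] args) {
        System.out.println(toOptionsJson(List.of("data/a.json", "data/b.json"), true));
    }
}
```

Keeping `options` as an opaque string lets the option set evolve without changing the RPC message, at the cost of validating its contents on the server side.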