Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Current state: Accepted

...

To reduce network transmission and skip Plusar management, the new interface will allow users to input the path of some data files(json, numpy, etc.) on MinIO/S3 storage, and let the data nodes directly read these files and parse them into segments. The internal logic of the process becomes:

        1. client calls import() to pass some file paths to Milvus proxy node  

        2. proxy node passes the file paths to data coordinator

        3. data coordinator pick a data node or multiple data node (according to the files count) to parse files, each file can be parsed to a segment or multiple segments.


Some points to consider:

  • JSON format is flexible, ideally, the import  API ought to parse user's JSON files without asking user to reformat the files according to a strict rule.
  • Users can store scalar fields and vector fields in a JSON file, with row-based or column-based. The import() API can support both of them.

...


The "options" for other SDK is not JSON object. A declaration for JAVA SDK For Java SDk, a declaration could be:

Code Block
public class ImportParam {

...


  private MinioDataSource data_source;

...


  private List<DataFile> external_files;

...


}




RPC Interfaces

    The declaration of import API in RPC level:

Code Block
rpc Import(ImportRequest) returns (MutationResult) {}

message ImportRequest {
  common.MsgBase base = 1;
  string options = 2;
}


message MutationResult {
  common.Status status = 1;
  schema.IDs IDs = 2; // required for insert, delete
  repeated uint32 succ_index = 3; // error indexes indicate
  repeated uint32 err_index = 4; // error indexes indicate
  bool acknowledged = 5;
  int64 insert_cnt = 6;
  int64 delete_cnt = 7;
  int64 upsert_cnt = 8;
  uint64 timestamp = 9;
}




Internal machinery



Test Plan