Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Current state: Accepted

...

To reduce network transmission and skip Plusar management, the new interface will allow users to input the path of some data files(json, numpy, etc.) on MinIO/S3 storage, and let the data nodes directly read these files and parse them into segments. The internal logic of the process becomes:

        1. client calls import() to pass some file paths to Milvus proxy node  

        2. proxy node passes the file paths to data coordinator node

        3. data coordinator node picks a data node or multiple data nodes (according to the sharding number) to parse files, each file can be parsed into a segment or multiple segments.

1.  SDK Interfaces

The python API declaration:

...

  • Return error "File xxx doesn't exist" if could not open the file. 
  • The row count of each field must be equal, otherwise, return the error "Inconsistent row count between field xxx and xxx". (all segments generated by this file will be abandoned)
  • If a vector dimension doesn't equal to field schema, return the error "Incorrect vector dimension for field xxx". (all segments generated by this file will be abandoned)

2. Proxy RPC Interfaces

Code Block
service MilvusService {
  rpc Import(ImportRequest) returns (ImportResponse) {}
  rpc GetImportState(GetImportStateRequest) returns (GetImportStateResponse) {}
}

message ImportRequest {
  string collection_name = 1;  // target collection
  string partition_name = 2;  // target partition
  bool row_based = 3;  // the file is row-based or column-based
  repeated string files = 4;  // file paths to be imported
  repeated common.KeyValuePair options = 5;  // import options
}

message ImportResponse {
  common.Status status = 1;
  repeated int64 tasks = 2;  // id array of import tasks
}

message GetImportStateRequest {
  int64 task = 1;  // id of an import task
}

enum ImportState {
  Pending = 0;
  Executing = 1;
  Completed = 2;
  Failed = 3;
}

message GetImportStateResponse {
  common.Status status = 1;
  ImportState state = 2;  // is this import task finished or not
  int64 row_count = 3;  // if the task is finished, this value is how many rows are imported. if the task is not finished, this value is how many rows are parsed.
}

...