Current state: Accepted

...

Released: with Milvus 2.1 

Authors:    yhmo

Summary

Import data through a shortcut path that performs better than insert().

...

To reduce network transmission and skip Pulsar management, the new interface lets users pass the paths of data files (JSON, NumPy, etc.) on MinIO/S3 storage, so that the data nodes read these files directly and parse them into segments. The internal logic of the process becomes:

        1. client calls import() to pass some file paths to Milvus proxy node  

        2. proxy node passes the file paths to data coordinator node

        3. data coordinator node picks one or more data nodes (according to the sharding number) to parse the files; each file can be parsed into one or more segments.
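
The three steps above can be pictured with a small in-memory sketch. Every class and method name here (`Proxy`, `DataCoord`, `DataNode`, `import_files`, `parse`) is hypothetical and exists only for illustration, and the round-robin assignment of files to data nodes is an assumption, since the proposal only says nodes are picked according to the sharding number.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical stand-in for a data node that parses files into segments."""
    node_id: int
    parsed_files: list = field(default_factory=list)

    def parse(self, path: str) -> None:
        # A real data node would read the file from MinIO/S3 and build segments;
        # this sketch only records which file it was asked to parse.
        self.parsed_files.append(path)


class DataCoord:
    """Hypothetical stand-in for the data coordinator node."""

    def __init__(self, nodes):
        self.nodes = nodes

    def import_files(self, paths):
        # Step 3: pick data nodes to parse the files.
        # Round-robin is an assumption made for this sketch only.
        for i, path in enumerate(paths):
            self.nodes[i % len(self.nodes)].parse(path)


class Proxy:
    """Hypothetical stand-in for the Milvus proxy node."""

    def __init__(self, datacoord):
        self.datacoord = datacoord

    def import_files(self, paths):
        # Step 2: the proxy forwards the file paths to the data coordinator.
        self.datacoord.import_files(paths)


def run_demo():
    nodes = [DataNode(0), DataNode(1)]
    proxy = Proxy(DataCoord(nodes))
    # Step 1: the client passes file paths (not file contents) to the proxy.
    proxy.import_files(["file_1.json", "file_2.npy", "file_3.json"])
    return {n.node_id: n.parsed_files for n in nodes}
```

The point of the sketch is that only file paths travel through the proxy and coordinator; the file contents are read by the data nodes themselves.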

SDK Interfaces

The Python API declaration:

...

Code Block
import(collection_name="test", files={"uid": "file_1.json", "vector": "file_2.npy"})



Protobuf Interfaces

Code Block
service MilvusService {
  rpc Import(ImportRequest) returns (ImportResponse) {}
  rpc GetImportState(GetImportStateRequest) returns (GetImportStateResponse) {}
}

message ImportRequest {
  string collection_name = 1;
  string partition_name = 2;
  bool rowBased = 3;
  repeated string files = 4;
  repeated common.KeyValuePair options = 5;
}

message ImportResponse {
  common.Status status = 1;
  repeated int64 taskIDs = 2;
}

message GetImportStateRequest {
  int64 taskID = 1;
}

message GetImportStateResponse {
  common.Status status = 1;
  bool finished = 2;
  int64 rowCount = 3;
}
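
Because Import returns task IDs and GetImportState reports a `finished` flag and a row count, a client would presumably poll until each task completes. The following is a minimal sketch of such a loop, not the SDK's implementation: `FakeClient`, `get_import_state`, `wait_for_import`, and the `interval`/`max_polls` parameters are all hypothetical, and the response is modeled as a plain dataclass rather than a generated protobuf message.

```python
import time
from dataclasses import dataclass


@dataclass
class GetImportStateResponse:
    """Plain-Python mirror of the finished/rowCount fields (status omitted)."""
    finished: bool
    row_count: int


class FakeClient:
    """Hypothetical in-memory client: each task finishes after N polls."""

    def __init__(self, polls_until_done, row_count):
        self._polls_until_done = polls_until_done
        self._row_count = row_count
        self._polls = {}

    def get_import_state(self, task_id):
        polls = self._polls.get(task_id, 0) + 1
        self._polls[task_id] = polls
        done = polls >= self._polls_until_done
        return GetImportStateResponse(done, self._row_count if done else 0)


def wait_for_import(client, task_ids, interval=0.0, max_polls=100):
    """Poll every task until it reports finished; return {task_id: row_count}."""
    results = {}
    pending = list(task_ids)
    for _ in range(max_polls):
        still_pending = []
        for task_id in pending:
            state = client.get_import_state(task_id)
            if state.finished:
                results[task_id] = state.row_count
            else:
                still_pending.append(task_id)
        pending = still_pending
        if not pending:
            return results
        time.sleep(interval)
    # Give up rather than loop forever if a task never finishes.
    raise TimeoutError(f"import tasks still pending: {pending}")
```

A bounded number of polls is used so that a task that never finishes surfaces as an error instead of blocking the caller indefinitely.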




Proxy Interfaces

The declaration of the import API in the proxy RPC:

Code Block
service MilvusService {
  rpc Import(ImportRequest) returns (ImportResponse) {}
}

message ImportRequest {
  common.MsgBase base = 1;
  string options = 2;      // options in JSON format
}


message ImportResponse {
  common.Status status = 1;
  repeated schema.IDs IDs = 2;    // auto-generated ids for successfully imported chunks
  uint32 succ_index = 3;          // number of chunks that were successfully imported
}
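
Since this variant of ImportRequest carries its options as a single JSON string, the caller has to serialize them before sending. The sketch below shows one way that might look. The key names are assumptions borrowed from the fields of the public ImportRequest (`collection_name`, `partition_name`, `files`, plus `row_based` for `rowBased`); the proposal does not specify the JSON schema, and `build_import_options` is a hypothetical helper.

```python
import json


def build_import_options(collection_name, files, partition_name="", row_based=False):
    """Serialize import options into a JSON string (hypothetical schema)."""
    options = {
        "collection_name": collection_name,
        "partition_name": partition_name,
        "row_based": row_based,
        "files": files,
    }
    # sort_keys gives a stable string, which is convenient for logging and tests.
    return json.dumps(options, sort_keys=True)


def parse_import_options(options_json):
    """Decode the JSON string back into a dict, as a receiver might."""
    return json.loads(options_json)
```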


Datacoord

...

interfaces

The declaration of the import API in the datacoord RPC:

Code Block
service DataCoord {
  rpc Import(milvuspb.ImportRequest) returns (milvuspb.ImportResponse) {}
  rpc CompleteImport(ImportResult) returns (common.Status) {}
}

message ImportResult {
  common.Status status = 1;
  schema.IDs IDs = 2;             // auto-generated ids
  repeated int64 segments = 3;    // id array of new sealed segments
}


Datanode

...

interfaces

The declaration of the import API in the datanode RPC:

...