Current state: Accepted

...

To reduce network transmission and skip Pulsar management, the new interface will allow users to input the paths of data files (JSON, NumPy, etc.) on MinIO/S3 storage, and let the data nodes read these files directly and parse them into segments. The internal logic of the process becomes:

        1. The client calls import() to pass some file paths to the Milvus proxy node.

        2. The proxy node passes the file paths to the data coordinator node.

        3. The data coordinator node picks one or more data nodes (according to the sharding number) to parse the files. Each file can be parsed into one or more segments.
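
Step 3 can be pictured with a small sketch. It is illustrative only: the round-robin policy, the function name assign_files, and the node names are assumptions made for this example, not the data coordinator's actual scheduling logic. The only thing taken from the text above is that the number of data nodes picked depends on the sharding number.

```python
from typing import Dict, List


def assign_files(files: List[str], data_nodes: List[str], shard_num: int) -> Dict[str, List[str]]:
    """Distribute file paths over at most `shard_num` data nodes.

    Hypothetical policy: take the first `shard_num` available nodes and
    hand out files round-robin. The real coordinator may choose differently.
    """
    if not data_nodes:
        raise ValueError("no data nodes available")
    if shard_num < 1:
        raise ValueError("shard_num must be at least 1")

    # Never pick more nodes than exist, and never more than the shard number.
    picked = data_nodes[: min(shard_num, len(data_nodes))]
    plan: Dict[str, List[str]] = {node: [] for node in picked}

    # Round-robin: file i goes to node (i mod number of picked nodes).
    for i, path in enumerate(files):
        plan[picked[i % len(picked)]].append(path)
    return plan


if __name__ == "__main__":
    plan = assign_files(["a.json", "b.json", "c.json"], ["dn1", "dn2", "dn3"], shard_num=2)
    for node, paths in plan.items():
        print(node, paths)
```

Each data node would then read its assigned files from MinIO/S3 and parse them into one or more segments.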

SDK Interfaces

The Python API declaration:

def import(collection_name, files, partition_name=None, bucket=None, default_fields=None)

  • collection_name: the target collection name (required)
  • partition_name: the target partition name (optional)
  • files: a list of files in row-based format, or a dict of files in column-based format (required)
  • bucket: the MinIO/S3 bucket where the files come from; defaults to the Milvus server bucket (optional)
  • default_fields: a dict that sets default values for some fields (optional)
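
The two shapes of the files argument can be sketched as follows. This is a minimal client-side illustration, not the SDK implementation. Because import is a reserved keyword in Python, the sketch uses the hypothetical name build_import_request. The returned dict layout, the "row_based"/"column_based" labels, and the reading of the column-based dict as a mapping from field name to file path are all assumptions.

```python
from typing import Any, Dict, List, Optional, Union

# A list means row-based files; a dict means column-based files
# (assumed here to map field name -> file path).
Files = Union[List[str], Dict[str, str]]


def build_import_request(
    collection_name: str,
    files: Files,
    partition_name: Optional[str] = None,
    bucket: Optional[str] = None,
    default_fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Validate the arguments and infer the file format (hypothetical helper)."""
    if not collection_name:
        raise ValueError("collection_name is required")
    if isinstance(files, list):
        file_format = "row_based"
    elif isinstance(files, dict):
        file_format = "column_based"
    else:
        raise TypeError("files must be a list (row-based) or a dict (column-based)")
    if not files:
        raise ValueError("files must not be empty")

    return {
        "collection_name": collection_name,
        "partition_name": partition_name,  # None: no partition specified
        "bucket": bucket,                  # None: use the Milvus server bucket
        "format": file_format,
        "files": files,
        "default_fields": default_fields or {},
    }


if __name__ == "__main__":
    # Row-based: a list of file paths.
    print(build_import_request("demo", ["data/rows_1.json", "data/rows_2.json"]))
    # Column-based: one file per field (field names here are made up).
    print(build_import_request("demo", {"id": "data/id.npy", "vector": "data/vector.npy"},
                               default_fields={"tag": "imported"}))
```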


Pre-defined format for import files

...