Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Current state: Accepted

...

To reduce network transmission and skip Plusar management, the new interface will allow users to input the path of some data files(json, numpy, etc.) on MinIO/S3 storage, and let the data nodes directly read these files and parse them into segments. The internal logic of the process becomes:

        1. client calls import() to pass some file paths to Milvus proxy node  

        2. proxy node passes the file paths to root coordinator, then root coordinator passes to data coordinator

        3. data coordinator node picks a data node or multiple data nodes (according to the sharding number) to parse files, each file can be parsed into a segment or multiple segments.

        4. once a task is finished, data node report to data coordinator, and data coordinator report to root coordinator, the generated segments will be sent to index node to build index

        5. the root coordinator will record a task list in Etcd, after the generated segments successfully build index, root coordinator marks the task as "completed"

1.  SDK Interfaces

The python API declaration:

...

  • Return error "Collection doesn't exist" if the target collection doesn't exist
  • Return error "Partition doesn't exist" if the target partition doesn't exist
  • Return error "Bucket doesn't exist" if the target bucket doesn't exist
  • Return error "File list is empty" if the files list is empty

The get_import_state():

  • Return error "File xxx doesn't exist" if could not open the file. 
  • All fields must be presented, otherwise, return the error "The field xxx is not provided"
  • For row-based json files, return "not a valid row-based json format, the key rows not found" if could not find the "rows" node
  • For column-based files, if a vector field is duplicated in numpy file and json file, return the error "The field xxx is duplicated

The get_import_state():

  • Return error "File xxx doesn't exist" if could not open the file. json parse error: xxxxx" if encounter illegal json format
  • The row count of each field must be equal, otherwise, return the error "Inconsistent row count between field xxx and xxx". (all segments generated by this file will be abandoned)
  • If a vector dimension doesn't equal to field schema, return the error "Incorrect vector dimension for field xxx". (all segments generated by this file will be abandoned)
  • If a data file size exceed 1GB, return error "Data file size must be less than 1GB"
  • If an import task is no response for more than 6 hours, it will be marked as failed
  • If datanode is crashed or restarted, the import task on it will be marked as failed

...