Current state: Accepted
...
To reduce network transmission and skip Pulsar management, the new interface will let users pass the paths of data files (JSON, NumPy, etc.) on MinIO/S3 storage, and the data nodes will read these files directly and parse them into segments. The internal logic of the process becomes:
1. The client calls import() to pass file paths to the Milvus proxy node.
2. The proxy node passes the file paths to the data coordinator.
3. The data coordinator picks one or more data nodes (depending on the file count) to parse the files. Each file can be parsed into one or more segments.
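
The third step can be sketched as a small assignment function. This is only an illustration: the text does not say how the data coordinator distributes files, so the round-robin policy, the function name `assign_files`, and the node names are all assumptions.

```python
from collections import defaultdict


def assign_files(file_paths, data_nodes):
    """Spread file paths over data nodes in round-robin order.

    Round-robin is an assumed policy used for illustration only; the
    real data coordinator may use a different strategy.
    """
    if not data_nodes:
        raise ValueError("at least one data node is required")
    assignment = defaultdict(list)
    for index, path in enumerate(file_paths):
        node = data_nodes[index % len(data_nodes)]
        assignment[node].append(path)
    return dict(assignment)


# Hypothetical file paths and node names.
plan = assign_files(["a.json", "b.json", "c.npy"], ["node-1", "node-2"])
print(plan)
```

With a single data node, every file lands on that node, which corresponds to the "one data node" case in step 3.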
Some points to consider:
- The JSON format is flexible. Ideally, the import API should parse users' JSON files without requiring users to reformat them to a strict schema.
- Users can store scalar fields and vector fields in a JSON file, in either a row-based or a column-based layout. The import() API can support both.
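
To make the two layouts concrete, the sketch below normalizes a row-based and a column-based document into the same list of rows. The exact file layouts are assumptions made for illustration (a `rows` array of objects and a `columns` object of equal-length arrays), as are the function names.

```python
def rows_from_row_based(doc):
    """Return copies of the row dictionaries of a row-based document."""
    return [dict(row) for row in doc["rows"]]


def rows_from_column_based(doc):
    """Transpose a column-based document into row dictionaries."""
    columns = doc["columns"]
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# The same two entities, stored in both (assumed) layouts.
row_doc = {"rows": [{"uid": 1, "vector": [0.1, 0.2]},
                    {"uid": 2, "vector": [0.3, 0.4]}]}
col_doc = {"columns": {"uid": [1, 2],
                       "vector": [[0.1, 0.2], [0.3, 0.4]]}}

assert rows_from_row_based(row_doc) == rows_from_column_based(col_doc)
```

Normalizing both layouts to one in-memory shape keeps the rest of the import path independent of how the user arranged the file.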
...

    {
      "data_source": {                      // required
        "type": "minio",                    // required, currently only "minio"/"s3" are supported
        "address": "localhost:9000",        // optional, the Milvus server uses its own MinIO setting if omitted
        "accesskey_id": "minioadmin",       // optional, the Milvus server uses its own MinIO setting if omitted
        "accesskey_secret": "minioadmin",   // optional, the Milvus server uses its own MinIO setting if omitted
        "use_ssl": false,                   // optional, the Milvus server uses its own MinIO setting if omitted
        "bucket_name": "mybucket"           // optional, the Milvus server uses its own MinIO setting if omitted
      },
      "internal_data": {                    // optional, external_data or internal_data (external files include json, npy, etc.; internal files are exported by Milvus)
        "path": "xxx/xxx/xx",               // relative path in the source storage where the exported data is stored
        "collections_mapping": {            // optional, gives collections new names during import
          "aaa": "bbb",                     // collection name mapping: key is the source collection name, value is the target collection name
          "ccc": "ddd"
        }
      },
      "external_data": {                    // optional, external_data or internal_data (external files include json, npy, etc.; internal files are exported by Milvus)
        "target_collection": "xxx",         // target collection name
        "chunks": [                         // required, chunk list; each chunk can be imported as one segment or split into multiple segments
          {
            "files": [{                     // required, files that provide the data of a chunk
              "path": "xxxx/xx.json",       // required, relative path under the storage source defined by DataSource; json/npy are currently supported
              "type": "row_based",          // required, row_based or column_based; tells Milvus how to parse this JSON file
              "fields_mapping": {           // optional, the target fields to import; Milvus imports all fields if this is empty
                "table.rows.id": "uid",     // field name mapping: key is a JSON node path, value is a field name of the collection
                "table.rows.year": "year",
                "table.rows.vector": "vector"
              }
            }]
          },
          {
            "files": [{
              "path": "xxxx/xx.json",
              "type": "column_based",
              "fields_mapping": {
                "table.columns.id": "uid",
                "table.columns.year": "year",
                "table.columns.vector": "vector"
              }
            }]
          },
          {
            "files": [{
              "path": "xxxx/xx.npy",
              "type": "column_based",
              "fields_mapping": {
                "vector": "vector"
              }
            }]
          }
        ],
        "default_fields": {                 // optional, default values used to fill some fields
          "age": 0,
          "weight": 0.0
        }
      }
    }

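
As a rough illustration of how `fields_mapping` and `default_fields` could interact for a row-based file, the sketch below maps one parsed JSON document to entities. It assumes that each mapping key has the form `<path to the row array>.<key inside each row>` and that defaults are applied before mapped values; both are interpretations for illustration, and `resolve` and `map_row_based` are hypothetical helper names.

```python
def resolve(node, dotted_path):
    """Walk a parsed JSON document along a dot-separated node path."""
    for part in dotted_path.split("."):
        node = node[part]
    return node


def map_row_based(doc, fields_mapping, default_fields=None):
    """Build entities from a row-based document.

    Assumes every mapping key is "<row array path>.<row key>" and that
    all keys point into the same row array.
    """
    if any("." not in key for key in fields_mapping):
        raise ValueError("each mapping key must contain a row array path")
    array_paths = {key.rsplit(".", 1)[0] for key in fields_mapping}
    if len(array_paths) != 1:
        raise ValueError("all mapping keys must share one row array path")
    rows = resolve(doc, array_paths.pop())
    entities = []
    for row in rows:
        entity = dict(default_fields or {})  # defaults first, then mapped values
        for source, target in fields_mapping.items():
            entity[target] = row[source.rsplit(".", 1)[1]]
        entities.append(entity)
    return entities


doc = {"table": {"rows": [{"id": 1, "year": 2020, "vector": [0.1, 0.2]}]}}
mapping = {"table.rows.id": "uid",
           "table.rows.year": "year",
           "table.rows.vector": "vector"}
print(map_row_based(doc, mapping, {"age": 0, "weight": 0.0}))
```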
Key fields of the JSON object:
...

    rpc Import(ImportRequest) returns (MutationResult) {}

    message ImportRequest {
      common.MsgBase base = 1;
      string options = 2;
    }

    message MutationResult {
      common.Status status = 1;
      schema.IDs IDs = 2;               // returns auto-IDs for insert/import, deleted IDs for delete
      repeated uint32 succ_index = 3;   // succeeded indexes for insert
      repeated uint32 err_index = 4;    // error indexes for insert
      bool acknowledged = 5;
      int64 insert_cnt = 6;
      int64 delete_cnt = 7;
      int64 upsert_cnt = 8;
      uint64 timestamp = 9;
    }

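
`ImportRequest` carries its configuration in a single string field, `options`. Assuming that string is the serialized JSON object described in this document, a client could build it as sketched below. The helper name `build_import_options` is hypothetical, and the check that exactly one of `external_data` and `internal_data` is present is an interpretation of the "external_data or internal_data" comments, not a confirmed rule.

```python
import json


def build_import_options(data_source, external_data=None, internal_data=None):
    """Serialize import options into the string carried by ImportRequest.options.

    Assumes exactly one of external_data / internal_data must be given.
    """
    if (external_data is None) == (internal_data is None):
        raise ValueError("provide exactly one of external_data or internal_data")
    options = {"data_source": data_source}
    if external_data is not None:
        options["external_data"] = external_data
    else:
        options["internal_data"] = internal_data
    return json.dumps(options)


options = build_import_options(
    {"type": "minio", "bucket_name": "mybucket"},
    external_data={
        "target_collection": "xxx",
        "chunks": [{"files": [{"path": "xxxx/xx.json", "type": "row_based"}]}],
    },
)
print(options)
```

The `//` annotations in the JSON example are documentation only. `json.dumps` emits strict JSON without comments.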
...