Current state: Accepted
...
To reduce network transmission and skip Pulsar management, the new interface will allow users to input the paths of data files (JSON, NumPy, etc.) on MinIO/S3 storage and will let the data nodes read these files directly and parse them into segments. The internal logic of the process becomes:
1. The client calls import() to pass file paths to the Milvus proxy node.
2. The proxy node passes the file paths to the data coordinator.
3. The data coordinator picks one or more data nodes (according to the number of files) to parse the files. Each file can be parsed into one segment or multiple segments.
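The three steps above can be sketched as follows. This is a minimal illustration, not Milvus code: the names `assign_files` and `split_into_segments` and the `max_rows_per_segment` limit are hypothetical, and the round-robin policy is an assumption standing in for whatever rule the data coordinator actually uses to pick data nodes by file count.

```python
from typing import Dict, List


def assign_files(files: List[str], data_nodes: List[str]) -> Dict[str, List[str]]:
    """Hypothetical data-coordinator policy (assumption): use at most one
    data node per file and spread the files round-robin over those nodes."""
    if not data_nodes:
        raise ValueError("no data node available")
    used = data_nodes[: min(len(files), len(data_nodes))]
    plan: Dict[str, List[str]] = {node: [] for node in used}
    for i, path in enumerate(files):
        plan[used[i % len(used)]].append(path)
    return plan


def split_into_segments(num_rows: int, max_rows_per_segment: int) -> List[range]:
    """Hypothetical data-node step (assumption): cut the rows parsed from one
    file into one or more segments of at most max_rows_per_segment rows."""
    if max_rows_per_segment <= 0:
        raise ValueError("max_rows_per_segment must be positive")
    return [
        range(start, min(start + max_rows_per_segment, num_rows))
        for start in range(0, num_rows, max_rows_per_segment)
    ]


if __name__ == "__main__":
    print(assign_files(["a.json", "b.json", "c.npy"], ["node-1", "node-2"]))
    print(split_into_segments(2500, 1000))
```

With three files and two data nodes, this policy sends `a.json` and `c.npy` to `node-1` and `b.json` to `node-2`; a file of 2500 rows with a 1000-row limit becomes three segments. A real coordinator would likely also weigh node load and file size, which this sketch ignores.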
Some points to consider:
- The JSON format is flexible. Ideally, the import API should parse users' JSON files without requiring users to reformat them according to a strict rule.
- Users can store scalar fields and vector fields in a JSON file in either a row-based or a column-based layout. The import() API should support both.
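To make the row-based/column-based distinction concrete, the sketch below holds the same two entities in both layouts and normalizes either one into per-field columns. The layout shown (a top-level `rows` list of objects versus a top-level `columns` object of arrays) and the helper name `to_columns` are assumptions for illustration; the proposal does not fix a JSON schema.

```python
import json
from typing import Any, Dict, List

# Illustrative data only: the same two entities in each layout.
ROW_BASED = (
    '{"rows": [{"id": 1, "year": 2020, "vector": [0.1, 0.2]},'
    ' {"id": 2, "year": 2021, "vector": [0.3, 0.4]}]}'
)
COLUMN_BASED = (
    '{"columns": {"id": [1, 2], "year": [2020, 2021],'
    ' "vector": [[0.1, 0.2], [0.3, 0.4]]}}'
)


def to_columns(doc: Dict[str, Any], layout: str) -> Dict[str, List[Any]]:
    """Normalize a parsed JSON document into {field: [values...]}."""
    layout = layout.lower()  # treat the layout name case-insensitively
    if layout == "row_based":
        rows = doc["rows"]
        columns: Dict[str, List[Any]] = {field: [] for field in (rows[0] if rows else {})}
        for row in rows:
            # Every row must carry the same fields, or the columns would misalign.
            if set(row) != set(columns):
                raise ValueError("every row must contain the same fields")
            for field, value in row.items():
                columns[field].append(value)
        return columns
    if layout == "column_based":
        columns = {field: list(values) for field, values in doc["columns"].items()}
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same number of values")
        return columns
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    print(to_columns(json.loads(ROW_BASED), "row_based"))
    print(to_columns(json.loads(COLUMN_BASED), "column_based"))
```

Both calls yield the same columns, which is what lets the data node build segments from either layout with one code path.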
...
```
{
  "data_source": {                      // required
    "type": "minio",                    // required, "minio" or "s3", case insensitive
    "address": "localhost:9000",        // optional, if omitted, the Milvus server uses its own MinIO/S3 configuration
    "accesskey_id": "minioadmin",       // optional, if omitted, the Milvus server uses its own MinIO/S3 configuration
    "accesskey_secret": "minioadmin",   // optional, if omitted, the Milvus server uses its own MinIO/S3 configuration
    "use_ssl": false,                   // optional, if omitted, the Milvus server uses its own MinIO/S3 configuration
    "bucket_name": "aaa"                // optional, if omitted, the Milvus server uses its own MinIO/S3 configuration
  },

  "internal_data": {                    // optional, either external_data or internal_data (external files include JSON, NumPy, etc.; internal files are exported by Milvus)
    "path": "xxx/xxx/xx",               // required, relative path in the source storage where the exported data is stored
    "collections_mapping": {            // optional, gives a collection a new name during import
      "coll_a": "coll_b",               // collection name mapping: key is the source collection name, value is the new collection name
      "coll_c": "coll_d"
    }
  },

  "external_data": {                    // optional, either external_data or internal_data (external files include JSON, NumPy, etc.; internal files are exported by Milvus)
    "target_collection": "xxx",         // required, target collection name
    "chunks": [{                        // required, chunk list; each chunk can be imported as one segment or split into multiple segments
        "files": [{                     // required, files that provide the data of a chunk
          "path": "xxxx/xx.json",       // required, relative path under the storage source defined by data_source; currently supports json/npy
          "type": "row_based",          // required, row_based or column_based, case insensitive; tells Milvus how to parse this JSON file
          "fields_mapping": {           // optional, specifies the target fields to import; Milvus imports all fields if this list is empty
            "table.rows.id": "uid",     // field name mapping that tells Milvus which field receives the data: key is a JSON node path, value is a field name of the collection
            "table.rows.year": "year",
            "table.rows.vector": "vector"
          }
        }]
      },
      {
        "files": [{
          "path": "xxxx/xx.json",
          "type": "column_based",
          "fields_mapping": {
            "table.columns.id": "uid",
            "table.columns.year": "year",
            "table.columns.vector": "vector"
          }
        }]
      },
      {
        "files": [{
          "path": "xxxx/xx.npy",
          "type": "column_based",
          "from": 0,                    // optional, import part of the file, starting from this number
          "to": 1000,                   // optional, import part of the file, ending at this number
          "fields_mapping": {           // optional, specifies the target fields to import; Milvus imports all fields if this list is empty
            "table.rows.id": "uid",     // field name mapping: key is a JSON node path, value is a field name of the collection; for a NumPy file, the key is a field name of the collection, the same as the value
            "table.rows.year": "year",
            "table.rows.vector": "vector"
          }
        }]
      }
    ],
    "default_fields": {                 // optional, fills some fields with default values
      "age": 0,                         // key is a field name, value is the default value of this field (number or string)
      "weight": 0.0
    }
  }
}
```
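The sketch below shows one possible reading of `fields_mapping`, `from`/`to`, and `default_fields` for the JSON files in the request above. It is an assumption-laden illustration, not the Milvus parser: for `row_based` files the last segment of a key such as `table.rows.id` is taken as the per-row attribute and the prefix as the path to the row list; for `column_based` files the whole key path is taken to point at an array. Treating `from`/`to` as a half-open row range, and letting defaults fill only fields absent from the mapping, are also assumptions the proposal does not state. The name `extract_fields` is hypothetical, and the "empty mapping imports all fields" case is not covered.

```python
from typing import Any, Dict, List, Optional


def _walk(node: Any, path: List[str]) -> Any:
    """Follow a list of keys down a parsed JSON document."""
    for key in path:
        node = node[key]
    return node


def extract_fields(
    doc: Dict[str, Any],
    file_type: str,
    fields_mapping: Dict[str, str],
    default_fields: Optional[Dict[str, Any]] = None,
    start: Optional[int] = None,  # "from" in the request (assumed inclusive)
    end: Optional[int] = None,    # "to" in the request (assumed exclusive)
) -> Dict[str, List[Any]]:
    """Hypothetical: map JSON node paths to collection fields."""
    file_type = file_type.lower()  # "type" is case insensitive
    columns: Dict[str, List[Any]] = {}
    for node_path, field in fields_mapping.items():
        keys = node_path.split(".")
        if file_type == "row_based":
            rows = _walk(doc, keys[:-1])                # e.g. table.rows -> list of objects
            values = [row[keys[-1]] for row in rows]    # e.g. each row's "id"
        elif file_type == "column_based":
            values = list(_walk(doc, keys))             # e.g. table.columns.id -> array
        else:
            raise ValueError(f"unsupported type: {file_type}")
        columns[field] = values[start:end]
    # Fill fields that the file does not provide with their default value.
    num_rows = len(next(iter(columns.values()), []))
    for field, default in (default_fields or {}).items():
        columns.setdefault(field, [default] * num_rows)
    return columns


if __name__ == "__main__":
    doc = {"table": {"rows": [{"id": 1, "year": 2020, "vector": [0.1, 0.2]},
                              {"id": 2, "year": 2021, "vector": [0.3, 0.4]}]}}
    mapping = {"table.rows.id": "uid", "table.rows.year": "year", "table.rows.vector": "vector"}
    print(extract_fields(doc, "row_based", mapping, default_fields={"age": 0, "weight": 0.0}))
```

For the row-based example, `uid` becomes `[1, 2]` and the missing `age` and `weight` fields are filled with `[0, 0]` and `[0.0, 0.0]`.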
Key fields of the JSON object:
...