...
Code Block |
---|
{ "table": { "columns": { "id": [1, 2, 3], "year": [2021, 2022, 2023], "vector": [ [1.0, 1.1, 1.2], [2.0, 2.1, 2.2], [3.0, 3.1, 3.2] ] } } } |
- A NumPy file is a binary format, so we treat it only as vector data. Each NumPy file represents one vector field.
- Transferring a large file from the client to the server proxy and then to the datanode is time-consuming and uses too much network bandwidth. Instead, we ask users to upload data files to MinIO/S3, where the datanode can access them directly, and let the datanode read and parse the files from there.
- The parameters of the import API should be easy to extend in the future.
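To make the column-based layout concrete, the following is a minimal sketch of how a parser might turn it into per-row entities. The function name `columns_to_rows` is hypothetical and not part of Milvus, and the sketch assumes that `columns` is a JSON object keyed by field name.

```python
import json


def columns_to_rows(columns):
    """Turn {"field": [v1, v2, ...], ...} into [{"field": v1, ...}, ...]."""
    # Every column must describe the same number of entities.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    names = list(columns)
    # zip(*columns) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Sample column-based document (assumed shape, for illustration only).
doc = json.loads("""
{"table": {"columns": {
    "id": [1, 2, 3],
    "year": [2021, 2022, 2023],
    "vector": [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2], [3.0, 3.1, 3.2]]
}}}
""")
rows = columns_to_rows(doc["table"]["columns"])
```

Each element of `rows` then holds one value per field, which is the same information a row-based file would carry directly.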
SDK Interfaces
Based on these points, we choose a JSON object as the parameter of the Python import() API. The API declaration looks like this:
def import(options)
The "options" parameter is a JSON object with the following format:
Code Block |
---|
{
  "data_source": {  // required
    "type": "Minio",  // required
    "address": "localhost:9000",  // optional; the Milvus server uses its own MinIO setting if omitted
    "accesskey_id": "minioadmin",  // optional; the Milvus server uses its own MinIO setting if omitted
    "accesskey_secret": "minioadmin",  // optional; the Milvus server uses its own MinIO setting if omitted
    "use_ssl": false,  // optional; the Milvus server uses its own MinIO setting if omitted
    "bucket_name": "aaa"  // optional; the Milvus server uses its own MinIO setting if omitted
  },
  "internal_data": {  // optional; specify either external_data or internal_data (external files include JSON, npy, etc.; internal files are exported by Milvus)
    "path": "xxx/xxx/xx",  // relative path, under the source storage, where the exported data is stored
    "collections_mapping": {  // optional; gives collections new names during import
      "aaa": "bbb",
      "ccc": "ddd"
    }
  },
  "external_data": {  // optional; specify either external_data or internal_data (external files include JSON, npy, etc.; internal files are exported by Milvus)
    "target_collection": "xxx",  // target collection name
    "files": [  // required
      {
        "file": "xxxx/xx.json",  // required; relative path under the storage source defined by data_source; JSON and npy are currently supported
        "type": "row_based",  // required; row_based or column_based
        "fields_mapping": {  // optional; specifies the target fields to import. Milvus imports all fields if this mapping is empty.
          "table.rows.id": "uid",
          "table.rows.year": "year",
          "table.rows.vector": "vector"
        }
      },
      {
        "file": "xxxx/xx.json",  // required; relative path under the storage source defined by data_source; JSON and npy are currently supported
        "type": "column_based",  // required; row_based or column_based
        "fields_mapping": {  // optional; specifies the target fields to import. Milvus imports all fields if this mapping is empty.
          "table.columns.id": "uid",
          "table.columns.year": "year",
          "table.columns.vector": "vector"
        }
      },
      {
        "file": "xxxx/xx.npy",  // required; relative path under the storage source defined by data_source; JSON and npy are currently supported
        "type": "column_based",  // required; row_based or column_based
        "fields_mapping": {  // optional; specifies the target fields to import. Milvus imports all fields if this mapping is empty.
          "vector": "vector"
        }
      }
    ],
    "default_fields": {  // optional; default values used to fill the listed fields
      "age": 0,
      "weight": 0.0
    }
  }
} |
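As an illustration of how a client might assemble and check such an options object before sending it, here is a minimal sketch. The helper names `validate_import_options` and `resolve_path` are hypothetical and not part of any Milvus SDK. The sketch assumes that exactly one of `internal_data` and `external_data` must be present, and that each `fields_mapping` key is a dotted path into the file whose value is the target field name. Only the column-based case is handled. Because `import` is a reserved word in Python, the sketch does not define a function with that name.

```python
def validate_import_options(options):
    """Check the keys marked as required in the options format; raise ValueError on failure."""
    source = options.get("data_source")
    if not isinstance(source, dict) or "type" not in source:
        raise ValueError('"data_source" with a "type" is required')

    # Assumption: exactly one of the two data sections must be given.
    has_internal = "internal_data" in options
    has_external = "external_data" in options
    if has_internal == has_external:
        raise ValueError('specify exactly one of "internal_data" or "external_data"')

    if has_external:
        files = options["external_data"].get("files")
        if not files:
            raise ValueError('"external_data.files" is required')
        for entry in files:
            if "file" not in entry:
                raise ValueError('each file entry needs a "file" path')
            if entry.get("type") not in ("row_based", "column_based"):
                raise ValueError('"type" must be "row_based" or "column_based"')


def resolve_path(document, dotted_path):
    """Follow a dotted path such as "table.columns.id" through nested dicts."""
    node = document
    for key in dotted_path.split("."):
        node = node[key]
    return node


options = {
    "data_source": {"type": "Minio", "bucket_name": "aaa"},
    "external_data": {
        "target_collection": "xxx",
        "files": [{
            "file": "xxxx/xx.json",
            "type": "column_based",
            "fields_mapping": {"table.columns.id": "uid", "table.columns.year": "year"},
        }],
    },
}
validate_import_options(options)

# Apply fields_mapping to an in-memory column-based document (sample data for illustration only).
document = {"table": {"columns": {"id": [1, 2, 3], "year": [2021, 2022, 2023]}}}
mapping = options["external_data"]["files"][0]["fields_mapping"]
mapped = {target: resolve_path(document, source) for source, target in mapping.items()}
```

After this runs, `mapped` holds the source columns keyed by their target field names, which is the renaming that `fields_mapping` describes.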
RPC Interfaces
Internal machinery
...