...
Typically, it takes several hours to insert one billion entities with 128-dimensional vectors. We need a new bulk-load interface for the following purposes:
- import data from JSON format files (first stage)
- import data from NumPy format files (first stage)
- copy a collection within one Milvus 2.0 service (second stage)
- copy a collection from one Milvus 2.0 service to another (second stage)
- import data from Milvus 1.x into Milvus 2.0 (third stage)
- import data from Parquet/Faiss files (TBD)
...
- A NumPy file is a binary format, so we treat it only as vector data: each NumPy file represents one vector field.
- Transferring a large file from the client to the proxy and then to the datanode is time-consuming and occupies too much network bandwidth. Instead, we will ask users to upload data files to MinIO/S3, where the datanode can access them directly, and let the datanode read and parse the files itself.
- Users may store scalar fields and vector fields in files of different formats. For example, scalar fields can be stored in JSON files and vector fields in NumPy files.
- The parameters of the import API should be easy to extend in the future.
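Because each NumPy file is treated as exactly one vector field, the datanode can reject a malformed file by reading only its header, before touching the data. The Go sketch below parses a `.npy` header as described in the NumPy file-format specification (magic string `\x93NUMPY`, version bytes, a little-endian header length, then a Python-literal dict with `descr`, `fortran_order` and `shape`) and checks the array against a field of dimension `dim`. All names (`parseNpyHeader`, `checkFloatVectorField`, `makeNpyV1`) are hypothetical, and accepting only C-order little-endian float32 (`'<f4'`) is an assumption of this sketch, not a confirmed Milvus rule; binary vectors would need a separate check.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// npyHeader holds the header fields needed to treat a .npy file as one vector field.
type npyHeader struct {
	Descr        string // dtype, e.g. "<f4" = little-endian float32
	FortranOrder bool   // true if the array is stored column-major
	Shape        []int  // e.g. [rows, dim]
	DataOffset   int    // byte offset where the raw array data starts
}

// dictValue returns the text after 'key': up to the next top-level ',' or '}'.
func dictValue(dict, key string) string {
	i := strings.Index(dict, "'"+key+"':")
	if i < 0 {
		return ""
	}
	rest, depth := dict[i+len(key)+3:], 0
	for j, c := range rest {
		if c == '(' {
			depth++ // commas inside a "(3, 4)" tuple are not separators
		} else if c == ')' {
			depth--
		} else if (c == ',' || c == '}') && depth == 0 {
			return strings.TrimSpace(rest[:j])
		}
	}
	return ""
}

// parseNpyHeader parses the magic string, version and header dict of a .npy file.
func parseNpyHeader(buf []byte) (npyHeader, error) {
	var h npyHeader
	if len(buf) < 12 || !bytes.Equal(buf[:6], []byte("\x93NUMPY")) {
		return h, errors.New("not a .npy file")
	}
	start, n := 10, int(binary.LittleEndian.Uint16(buf[8:10])) // version 1.x: 2-byte length
	if buf[6] >= 2 {                                             // versions 2.x/3.x: 4-byte length
		start, n = 12, int(binary.LittleEndian.Uint32(buf[8:12]))
	}
	if len(buf) < start+n {
		return h, errors.New("truncated .npy header")
	}
	dict := string(buf[start : start+n])
	h.DataOffset = start + n
	h.Descr = strings.Trim(dictValue(dict, "descr"), "'")
	h.FortranOrder = dictValue(dict, "fortran_order") == "True"
	shape := dictValue(dict, "shape")
	if h.Descr == "" || shape == "" {
		return h, errors.New("malformed .npy header dict")
	}
	isSep := func(r rune) bool { return r == ',' || r == ' ' }
	for _, s := range strings.FieldsFunc(strings.Trim(shape, "()"), isSep) {
		d, err := strconv.Atoi(s)
		if err != nil {
			return h, fmt.Errorf("bad shape %q: %w", shape, err)
		}
		h.Shape = append(h.Shape, d)
	}
	return h, nil
}

// checkFloatVectorField accepts a C-order float32 (rows, dim) array that exactly fills the file.
func checkFloatVectorField(h npyHeader, fileSize, dim int) (int, error) {
	if h.Descr != "<f4" || h.FortranOrder {
		return 0, fmt.Errorf("want C-order '<f4', got %q (fortran_order=%v)", h.Descr, h.FortranOrder)
	}
	if len(h.Shape) != 2 || h.Shape[1] != dim {
		return 0, fmt.Errorf("want shape (rows, %d), got %v", dim, h.Shape)
	}
	if want := h.DataOffset + h.Shape[0]*dim*4; fileSize != want {
		return 0, fmt.Errorf("file size %d, want %d", fileSize, want)
	}
	return h.Shape[0], nil
}

// makeNpyV1 builds an in-memory version 1.0 .npy file for demonstration.
func makeNpyV1(dict string, data []byte) []byte {
	var b bytes.Buffer
	b.WriteString("\x93NUMPY\x01\x00")
	_ = binary.Write(&b, binary.LittleEndian, uint16(len(dict)+1))
	b.WriteString(dict + "\n")
	b.Write(data)
	return b.Bytes()
}

func main() {
	buf := makeNpyV1("{'descr': '<f4', 'fortran_order': False, 'shape': (3, 4), }", make([]byte, 3*4*4))
	if h, err := parseNpyHeader(buf); err != nil {
		fmt.Println("parse error:", err)
	} else {
		fmt.Println(checkFloatVectorField(h, len(buf), 4))
	}
}
```

The file-size check catches truncated uploads cheaply; in a real datanode the header bytes would come from a ranged read on MinIO/S3 rather than an in-memory buffer.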
...
```json
{
  "data_source": {                      // required
    "type": "minio",                    // required, currently only "minio"/"s3" are supported
    "address": "localhost:9000",        // optional, if omitted the Milvus server uses its own MinIO setting
    "accesskey_id": "minioadmin",       // optional, if omitted the Milvus server uses its own MinIO setting
    "accesskey_secret": "minioadmin",   // optional, if omitted the Milvus server uses its own MinIO setting
    "use_ssl": false,                   // optional, if omitted the Milvus server uses its own MinIO setting
    "bucket_name": "mybucket"           // optional, if omitted the Milvus server uses its own MinIO setting
  },
  "internal_data": {                    // optional, external_data or internal_data (external files include JSON, NPY, etc.; internal files are exported by Milvus)
    "path": "xxx/xxx/xx",               // path, relative to the source storage, where the exported data is stored
    "collections_mapping": {            // optional, gives collections new names during import
      "aaa": "bbb",
      "ccc": "ddd"
    }
  },
  "external_data": {                    // optional, external_data or internal_data (external files include JSON, NPY, etc.; internal files are exported by Milvus)
    "target_collection": "xxx",         // target collection name
    "files": [                          // required
      {
        "file": "xxxx/xx.json",         // required, path relative to the storage defined by data_source; currently JSON/NPY are supported
        "type": "row_based",            // required, "row_based" or "column_based"
        "fields_mapping": {             // optional, the target fields to import; Milvus imports all fields if this is empty
          "table.rows.id": "uid",
          "table.rows.year": "year",
          "table.rows.vector": "vector"
        }
      },
      {
        "file": "xxxx/xx.json",         // required
        "type": "column_based",         // required
        "fields_mapping": {             // optional
          "table.columns.id": "uid",
          "table.columns.year": "year",
          "table.columns.vector": "vector"
        }
      },
      {
        "file": "xxxx/xx.npy",          // required
        "type": "column_based",         // required
        "fields_mapping": {             // optional
          "vector": "vector"
        }
      }
    ],
    "default_fields": {                 // optional, default values used to fill some fields
      "age": 0,
      "weight": 0.0
    }
  }
}
```
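Since the request is plain JSON, the server side can decode it into typed structs. The Go sketch below uses only `encoding/json`; the type and function names (`ImportRequest`, `parseImportRequest`, `targetField`) are hypothetical, and it reads the `fields_mapping` comment as "an empty mapping imports every source field under its own name", which is an interpretation rather than confirmed behaviour. Note that `encoding/json` rejects `//` comments, so a real request must be sent without the annotations shown in the request example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types mirroring the import request; tags follow its JSON keys.
type DataSource struct {
	Type            string `json:"type"`
	Address         string `json:"address,omitempty"`
	AccessKeyID     string `json:"accesskey_id,omitempty"`
	AccessKeySecret string `json:"accesskey_secret,omitempty"`
	UseSSL          *bool  `json:"use_ssl,omitempty"` // nil means "use the server setting"
	BucketName      string `json:"bucket_name,omitempty"`
}

type InternalData struct {
	Path               string            `json:"path"`
	CollectionsMapping map[string]string `json:"collections_mapping,omitempty"`
}

type ImportFile struct {
	File          string            `json:"file"`
	Type          string            `json:"type"` // "row_based" or "column_based"
	FieldsMapping map[string]string `json:"fields_mapping,omitempty"`
}

type ExternalData struct {
	TargetCollection string                 `json:"target_collection"`
	Files            []ImportFile           `json:"files"`
	DefaultFields    map[string]interface{} `json:"default_fields,omitempty"`
}

type ImportRequest struct {
	DataSource   DataSource    `json:"data_source"`
	InternalData *InternalData `json:"internal_data,omitempty"` // nil if absent
	ExternalData *ExternalData `json:"external_data,omitempty"` // nil if absent
}

// targetField maps a source field key to its target field. An empty
// fields_mapping imports every field under its source name (assumption).
func (f ImportFile) targetField(src string) (string, bool) {
	if len(f.FieldsMapping) == 0 {
		return src, true
	}
	dst, ok := f.FieldsMapping[src]
	return dst, ok
}

// parseImportRequest decodes a comment-free request and checks each file entry.
func parseImportRequest(raw []byte) (*ImportRequest, error) {
	var req ImportRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return nil, err
	}
	if req.ExternalData != nil {
		for i, f := range req.ExternalData.Files {
			if f.File == "" {
				return nil, fmt.Errorf("files[%d]: \"file\" is required", i)
			}
			if f.Type != "row_based" && f.Type != "column_based" {
				return nil, fmt.Errorf("files[%d]: unknown type %q", i, f.Type)
			}
		}
	}
	return &req, nil
}

// sample is a trimmed, comment-free version of the request example.
const sample = `{
  "data_source": {"type": "minio", "bucket_name": "mybucket"},
  "external_data": {
    "target_collection": "xxx",
    "files": [
      {"file": "xxxx/xx.json", "type": "row_based",
       "fields_mapping": {"table.rows.id": "uid", "table.rows.vector": "vector"}},
      {"file": "xxxx/xx.npy", "type": "column_based"}
    ],
    "default_fields": {"age": 0, "weight": 0.0}
  }
}`

func main() {
	req, err := parseImportRequest([]byte(sample))
	if err != nil {
		panic(err)
	}
	uid, ok := req.ExternalData.Files[0].targetField("table.rows.id")
	fmt.Println(req.ExternalData.TargetCollection, len(req.ExternalData.Files), uid, ok)
}
```

Pointer fields keep "not provided" distinguishable from zero values, which matters for `use_ssl` (an explicit `false` is a valid choice) and for telling `internal_data` apart from `external_data`.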
Key fields of the JSON object:
- "data_source": contains the address and access credentials of MinIO/S3. If they are not provided, Milvus uses its own MinIO/S3 configuration.
- "internal_data": reserved for collection cloning and not available in the first stage. It requires a separate export() API.
- "external_data": for importing data from the user's files. It tells the datanode where to read the data files and how to parse them.
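The rules in the list of key fields suggest a small validation step before any file is read: fill every omitted `data_source` field from the server's own MinIO/S3 configuration, and, in the first stage, accept only `external_data`. The Go sketch below shows one way to do this. `MinioConfig`, `RequestDataSource`, `resolveDataSource`, `checkFirstStage` and the bucket name `milvus-bucket` are hypothetical, and reading "external_data or internal_data" as "exactly one of them" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// MinioConfig is a hypothetical stand-in for the server's own MinIO/S3 settings.
type MinioConfig struct {
	Address         string
	AccessKeyID     string
	AccessKeySecret string
	UseSSL          bool
	BucketName      string
}

// RequestDataSource mirrors "data_source"; nil pointers mean "not provided".
type RequestDataSource struct {
	Type            string
	Address         *string
	AccessKeyID     *string
	AccessKeySecret *string
	UseSSL          *bool
	BucketName      *string
}

// orDefault returns *v if it was provided and non-empty, otherwise def.
func orDefault(v *string, def string) string {
	if v != nil && *v != "" {
		return *v
	}
	return def
}

// resolveDataSource fills every field the request omits from the server configuration.
func resolveDataSource(req RequestDataSource, server MinioConfig) (MinioConfig, error) {
	if req.Type != "minio" && req.Type != "s3" {
		return MinioConfig{}, fmt.Errorf("unsupported data_source type %q", req.Type)
	}
	cfg := MinioConfig{
		Address:         orDefault(req.Address, server.Address),
		AccessKeyID:     orDefault(req.AccessKeyID, server.AccessKeyID),
		AccessKeySecret: orDefault(req.AccessKeySecret, server.AccessKeySecret),
		UseSSL:          server.UseSSL,
		BucketName:      orDefault(req.BucketName, server.BucketName),
	}
	if req.UseSSL != nil {
		cfg.UseSSL = *req.UseSSL // an explicit false overrides the server setting
	}
	return cfg, nil
}

// checkFirstStage applies the first-stage rules: internal_data (collection clone)
// is not available yet, so external_data must be the only data section.
func checkFirstStage(hasInternal, hasExternal bool) error {
	switch {
	case hasInternal && hasExternal:
		return errors.New("specify either internal_data or external_data, not both")
	case hasInternal:
		return errors.New("internal_data is not available in the first stage")
	case !hasExternal:
		return errors.New("external_data is required")
	}
	return nil
}

func main() {
	server := MinioConfig{Address: "localhost:9000", AccessKeyID: "minioadmin", AccessKeySecret: "minioadmin", BucketName: "milvus-bucket"}
	bucket := "mybucket"
	cfg, err := resolveDataSource(RequestDataSource{Type: "minio", BucketName: &bucket}, server)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Address, cfg.BucketName, cfg.UseSSL) // localhost:9000 mybucket false
	fmt.Println(checkFirstStage(true, false))
}
```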
RPC Interfaces
Internal machinery
...