...
Typically, it takes several hours to insert one billion entities with 128-dimensional vectors. We need a new interface to do bulk load for the following purposes:
- import data from json format files. (first stage)
- import data from numpy format files. (first stage)
- copy a collection from one Milvus 2.0 server to another. (second stage)
...
- import data from Milvus 1.x to Milvus 2.0 (third stage)
- import data from parquet/faiss files (TBD)
Some points to consider:
- JSON is a flexible format; ideally, the import API ought to parse users' JSON files without asking them to reformat the files to follow a strict rule.
- Users can store scalar fields and vector fields in a JSON file, in either a row-based or a column-based layout; the import API ought to support both (see the parser sketch after the two examples below).
A row-based example:
```json
{
  "table": {
    "rows": [
      {"id": 1, "year": 2021, "vector": [1.0, 1.1, 1.2]},
      {"id": 2, "year": 2022, "vector": [2.0, 2.1, 2.2]},
      {"id": 3, "year": 2023, "vector": [3.0, 3.1, 3.2]}
    ]
  }
}
```
A column-based example:
```json
{
  "table": {
    "columns": {
      "id": [1, 2, 3],
      "year": [2021, 2022, 2023],
      "vector": [
        [1.0, 1.1, 1.2],
        [2.0, 2.1, 2.2],
        [3.0, 3.1, 3.2]
      ]
    }
  }
}
```
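To show that one parser can accept both layouts without forcing users to reformat their files, here is a minimal Python sketch; the `load_entities` helper and its normalization to per-field lists are illustrative assumptions, not the actual implementation:

```python
import json

def load_entities(path: str) -> dict:
    # Hypothetical helper: normalize either layout into {field_name: value_list}.
    with open(path) as f:
        table = json.load(f)["table"]

    if "rows" in table:
        # Row-based: pivot the list of row objects into per-field value lists.
        rows = table["rows"]
        return {name: [row[name] for row in rows] for name in rows[0]}

    if "columns" in table:
        # Column-based: the file already stores per-field value lists.
        return table["columns"]

    raise ValueError("expected 'rows' or 'columns' under 'table'")
```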
- A NumPy file is in binary format, and we treat it only as vector data. Each NumPy file represents one vector field.
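For example, a user could dump each vector field to its own .npy file, and the reading side can recover it in one call (the file name and shape below are illustrative):

```python
import numpy as np

# One .npy file per vector field: 1000 rows of a 128-dimensional float32 field.
vectors = np.random.random((1000, 128)).astype(np.float32)
np.save("vector.npy", vectors)   # file name is illustrative

# Reading it back recovers shape and dtype with a single call.
loaded = np.load("vector.npy")
assert loaded.shape == (1000, 128) and loaded.dtype == np.float32
```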
- Transferring a large file from the client to the proxy and then to the datanode is time-consuming and occupies too much network bandwidth, so we will ask users to upload data files to MinIO/S3, which the datanode can access directly. The datanode then reads and parses the files from MinIO/S3.
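A minimal sketch of the upload step with the MinIO Python client, assuming a local MinIO endpoint and illustrative bucket/object names:

```python
from minio import Minio

# Endpoint, credentials, bucket, and object names are illustrative assumptions.
client = Minio("127.0.0.1:9000",
               access_key="minioadmin",
               secret_key="minioadmin",
               secure=False)

if not client.bucket_exists("milvus-bulkload"):
    client.make_bucket("milvus-bulkload")

# Upload once; the datanode later reads these objects directly from MinIO/S3.
client.fput_object("milvus-bulkload", "demo/rows.json", "rows.json")
client.fput_object("milvus-bulkload", "demo/vector.npy", "vector.npy")
```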
- The parameters of the import API should be easy to extend in the future.
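One common way to keep the parameters extensible is to carry per-call settings as free-form key-value options rather than fixed positional arguments. The shape below is a hypothetical illustration only; the actual definitions follow under SDK Interfaces and RPC Interfaces:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ImportRequest:
    # Hypothetical request shape; the real message is defined by the RPC interface.
    collection_name: str
    files: List[str]                                        # object paths in MinIO/S3
    options: Dict[str, str] = field(default_factory=dict)   # free-form key-value pairs

# New behaviors can be added later as options without changing the signature.
req = ImportRequest(
    collection_name="table",
    files=["demo/rows.json", "demo/vector.npy"],
    options={"row_based": "true", "bucket": "milvus-bulkload"},
)
```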
SDK Interfaces
RPC Interfaces
...