Keywords: Query / Search / Vector

Released: Milvus 2.0rc3


Summary

Allow `search` and `query` operations to return raw vector data in output fields while keeping memory consumption to a minimum.

Motivation

In Milvus 2.0rc1, `search` and `query` operations cannot return raw vector data in output fields. This is a deliberate memory-saving choice: a vector field with a large dimension can occupy hundreds of times as much memory as a scalar field. So, in general, `load_collection` and `load_partition` load only the scalar fields' raw data and the index files into memory. A vector field's raw data is loaded into memory in only three cases:

  1. the segment is a streaming segment;
  2. the vector field's index type is FLAT;
  3. no index has been created on the vector field yet.
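The three cases above amount to a single predicate. The Go sketch below is only an illustration of that rule: `SegmentType`, its constants and `rawVectorInMemory` are hypothetical names, not part of the Milvus query-node code, and an empty `indexType` stands for "no index created".

```go
package main

import "fmt"

// SegmentType distinguishes streaming segments from sealed ones.
// Hypothetical type for illustration only.
type SegmentType int

const (
	SegmentStreaming SegmentType = iota
	SegmentSealed
)

// rawVectorInMemory reports whether a vector field's raw data is already in
// memory after load_collection / load_partition, following the three cases.
// indexType is "" when no index has been created on the vector field.
func rawVectorInMemory(segType SegmentType, indexType string) bool {
	switch {
	case segType == SegmentStreaming: // case 1: streaming segment
		return true
	case indexType == "FLAT": // case 2: index type is FLAT
		return true
	case indexType == "": // case 3: no index created yet
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(rawVectorInMemory(SegmentStreaming, "HNSW")) // true
	fmt.Println(rawVectorInMemory(SegmentSealed, "FLAT"))    // true
	fmt.Println(rawVectorInMemory(SegmentSealed, ""))        // true
	fmt.Println(rawVectorInMemory(SegmentSealed, "IVF_SQ8")) // false
}
```

Only the last combination, a sealed segment with a non-FLAT index, leaves the raw vectors on disk; that is the case this proposal has to handle.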

Design Details

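After a search or query finishes, check whether `output_fields` contains a vector field. If it does, load that vector field for the segments that hold the result IDs, and use the offsets of the result IDs to fetch the corresponding vector data.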
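A minimal sketch of this post-processing step, assuming a simplified in-memory model: `Hit`, `segment`, `loadVectorField` and `fillVectors` are hypothetical names, and "loading" a segment's vector column is simulated with a map instead of reading binlogs.

```go
package main

import "fmt"

// Hit is one search/query result: the entity ID and the segment holding it.
type Hit struct {
	ID        int64
	SegmentID int64
}

// segment is a toy stand-in for a sealed segment. IDs map to row offsets, and a
// vector column becomes available in loaded only when it is first requested.
type segment struct {
	idToOffset map[int64]int
	onDisk     map[string][][]float32 // simulated vector binlogs, keyed by field name
	loaded     map[string][][]float32 // vector columns already loaded into memory
}

// loadVectorField loads one vector column on demand and caches it.
func (s *segment) loadVectorField(field string) [][]float32 {
	if col, ok := s.loaded[field]; ok {
		return col
	}
	if s.loaded == nil {
		s.loaded = make(map[string][][]float32)
	}
	col := s.onDisk[field]
	s.loaded[field] = col
	return col
}

// fillVectors runs after search/query. For each vector field in outputFields it
// loads that field only in segments that hold a result ID, then looks up each
// vector by the row offset of its ID. Scalar fields are skipped.
func fillVectors(hits []Hit, outputFields []string, isVector map[string]bool,
	segs map[int64]*segment) (map[string][][]float32, error) {
	out := make(map[string][][]float32)
	for _, f := range outputFields {
		if !isVector[f] {
			continue // scalar output fields are already in memory
		}
		vecs := make([][]float32, 0, len(hits))
		for _, h := range hits {
			seg, ok := segs[h.SegmentID]
			if !ok {
				return nil, fmt.Errorf("segment %d not found", h.SegmentID)
			}
			off, ok := seg.idToOffset[h.ID]
			if !ok {
				return nil, fmt.Errorf("id %d not in segment %d", h.ID, h.SegmentID)
			}
			col := seg.loadVectorField(f)
			if off >= len(col) {
				return nil, fmt.Errorf("offset %d out of range for field %s", off, f)
			}
			vecs = append(vecs, col[off])
		}
		out[f] = vecs
	}
	return out, nil
}

func main() {
	segs := map[int64]*segment{
		1: {idToOffset: map[int64]int{100: 0, 101: 1},
			onDisk: map[string][][]float32{"embedding": {{0.1, 0.2}, {0.3, 0.4}}}},
		2: {idToOffset: map[int64]int{200: 0},
			onDisk: map[string][][]float32{"embedding": {{0.5, 0.6}}}},
		3: {idToOffset: map[int64]int{300: 0},
			onDisk: map[string][][]float32{"embedding": {{0.7, 0.8}}}},
	}
	isVector := map[string]bool{"embedding": true, "age": false}
	hits := []Hit{{ID: 101, SegmentID: 1}, {ID: 200, SegmentID: 2}}
	res, err := fillVectors(hits, []string{"age", "embedding"}, isVector, segs)
	if err != nil {
		panic(err)
	}
	fmt.Println(res["embedding"]) // [[0.3 0.4] [0.5 0.6]]
	fmt.Println(len(segs[3].loaded)) // 0: segment 3 holds no hits, so nothing is loaded
}
```

The lazy, per-segment load is the point of the design: memory is spent only on the vector columns of segments that actually contribute results.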

1. Add a data structure, `VectorFieldInfo`, to record information about the vector data in a `segment`.

```go
type VectorFieldInfo struct {
	mu          sync.RWMutex
	fieldBinlog *datapb.FieldBinlog
	rowNum      map[string]int64  // map[binlogPath]int64
	rawDataMmap map[string][]byte // map[binlogPath][]byte
}
```

Add a new field, `vectorFieldInfos`, to the `Segment` struct to record vector-field information:

```go
type VectorFieldInfo struct {
	mu              sync.RWMutex
	fieldBinlog     *datapb.FieldBinlog
	rowDataInMemory bool
	rawData         map[string]storage.FieldData // map[binlogPath]FieldData
}

type Segment struct {
	// ... existing fields ...
	vectorFieldInfos map[UniqueID]*VectorFieldInfo
	// ...
}
```


2. Add a new method to `Segment`:

```go
// fillRetrieveResults fills raw vector data into RetrieveResults.
func (s *Segment) fillRetrieveResults(plan *RetrievePlan, result *segcorepb.RetrieveResults) error
```

Once the offset of a vector is known, the original vector data can be read from the local vector data file.
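The offset-to-vector lookup can be sketched as follows, in the spirit of the per-binlog `rowNum` and `rawDataMmap` maps of `VectorFieldInfo`. This is a hypothetical illustration, not Milvus code: it assumes the binlogs of one field cover consecutive row ranges in a known order (Go maps are unordered, so the sketch uses an ordered slice), and that each file's bytes are a bare payload of little-endian `float32` values with no header. `vectorFile`, `readFloatVector` and `encode` are made-up names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// vectorFile models one binlog of a float-vector field whose payload has been
// read (or memory-mapped) from a local file: rowNum rows of dim float32 values.
// Hypothetical layout, not the Milvus binlog format.
type vectorFile struct {
	path   string
	rowNum int64
	data   []byte
}

// readFloatVector maps a segment-level row offset to the binlog that holds it
// and decodes that row's vector from the raw bytes.
func readFloatVector(files []vectorFile, offset int64, dim int) ([]float32, error) {
	if offset < 0 {
		return nil, errors.New("negative offset")
	}
	rowBytes := int64(dim) * 4
	for _, f := range files {
		if offset >= f.rowNum {
			offset -= f.rowNum // the row lives in a later binlog
			continue
		}
		start := offset * rowBytes
		if start+rowBytes > int64(len(f.data)) {
			return nil, fmt.Errorf("%s: truncated data", f.path)
		}
		vec := make([]float32, dim)
		for i := range vec {
			bits := binary.LittleEndian.Uint32(f.data[start+int64(i)*4:])
			vec[i] = math.Float32frombits(bits)
		}
		return vec, nil
	}
	return nil, errors.New("offset out of range")
}

// encode packs float vectors into the little-endian layout assumed above.
func encode(vecs ...[]float32) []byte {
	var buf []byte
	tmp := make([]byte, 4)
	for _, v := range vecs {
		for _, x := range v {
			binary.LittleEndian.PutUint32(tmp, math.Float32bits(x))
			buf = append(buf, tmp...)
		}
	}
	return buf
}

func main() {
	files := []vectorFile{
		{path: "binlog/0", rowNum: 2, data: encode([]float32{1, 2}, []float32{3, 4})},
		{path: "binlog/1", rowNum: 1, data: encode([]float32{5, 6})},
	}
	v, err := readFloatVector(files, 2, 2)
	fmt.Println(v, err) // [5 6] <nil>
}
```

For a binary vector field the same walk applies, except that each row occupies dim/8 bytes and the slice can be returned without decoding.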



Test Plan

Check that `get_entity_by_id` returns the correct raw vector data in the following two scenarios:

  • Scenario 1
      1. create_collection
      2. insert
      3. get_entity_by_id
  • Scenario 2
      1. create_collection
      2. insert
      3. create_index
      4. get_entity_by_id

Run query and search (with a vector field in the output fields) under every combination of the following conditions, and check that the results are correct:

  1. float vector or binary vector;
  2. with or without an index;
  3. every supported index type.
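The combinations above can be enumerated mechanically. In this sketch `testCase`, `buildMatrix` and `sameVectors` are hypothetical helpers, and the index-type list is an illustrative subset; the real set depends on the Milvus release under test.

```go
package main

import "fmt"

// testCase is one combination from the test plan: vector type, whether an
// index is built, and which index type (empty when there is no index).
type testCase struct {
	VectorType string
	Indexed    bool
	IndexType  string
}

// buildMatrix enumerates every vector type, once without an index and once
// per index type supported for that vector type.
func buildMatrix(indexTypes map[string][]string) []testCase {
	var cases []testCase
	for _, vt := range []string{"FloatVector", "BinaryVector"} {
		cases = append(cases, testCase{VectorType: vt}) // without index
		for _, it := range indexTypes[vt] {
			cases = append(cases, testCase{VectorType: vt, Indexed: true, IndexType: it})
		}
	}
	return cases
}

// sameVectors reports whether returned vectors exactly match the inserted ones.
// T is float32 for float vectors or byte for binary vectors.
func sameVectors[T comparable](got, want [][]T) bool {
	if len(got) != len(want) {
		return false
	}
	for i := range got {
		if len(got[i]) != len(want[i]) {
			return false
		}
		for j := range got[i] {
			if got[i][j] != want[i][j] {
				return false
			}
		}
	}
	return true
}

func main() {
	// Illustrative subset of index types.
	indexTypes := map[string][]string{
		"FloatVector":  {"FLAT", "IVF_FLAT", "IVF_SQ8", "HNSW"},
		"BinaryVector": {"BIN_FLAT", "BIN_IVF_FLAT"},
	}
	for _, tc := range buildMatrix(indexTypes) {
		// Each case would create_collection, insert, optionally create_index,
		// load, run search and query with the vector field in output fields,
		// and compare the returned vectors with the inserted ones.
		fmt.Printf("%+v\n", tc)
	}
}
```

Exact comparison is intended here: because the design reads raw vectors rather than reconstructing them from an index, even lossy index types such as IVF_SQ8 should return bit-identical vectors.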