Current state: Under Discussion

ISSUE: #6299

PRs: #6570

Keywords: Query / Search / Vector

Released: Milvus 2.0rc3


Summary

Allow the `search` and `query` operations to return raw vector data in their output fields while keeping memory consumption minimal.

Motivation

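In Milvus 2.0rc1, the `search` and `query` operations do not yet support returning vector fields as part of the result. This was a memory-saving decision: vector fields are far larger than scalar fields and would consume too much memory, so `load_collection` and `load_partition` load only the scalar field data files and the index files into memory (the raw vector files are loaded only when no vector index exists).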


## Design Details(required)

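After search / query finishes, check whether output_fields contains a vector field. If it does, load the vector field of the segments that hold the result IDs, then use the offset corresponding to each result ID to fetch the matching vector data.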

1. Add a data structure `VectorFieldInfo` that records information about the vector data in a `segment`

```go
// VectorFieldInfo records information about the vector data in a segment.
type VectorFieldInfo struct {
	mu          sync.RWMutex
	fieldBinlog *datapb.FieldBinlog
	rowNum      map[string]int64  // map[binlogPath]int64
	rawDataMmap map[string][]byte // map[binlogPath][]byte
}

type Segment struct {
	// ... existing fields elided ...
	vectorFieldInfos map[UniqueID]*VectorFieldInfo
}
```
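As a rough illustration of how such a structure might be used, the sketch below guards the two maps with the `RWMutex`. It is not taken from the proposal: `fieldBinlog` is left out so the code compiles without Milvus packages, and the names `vectorFieldInfo`, `newVectorFieldInfo`, `setRawData`, and `getRawData` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// vectorFieldInfo mirrors the proposed VectorFieldInfo minus fieldBinlog,
// so the sketch compiles on its own. It is an assumption, not Milvus code.
type vectorFieldInfo struct {
	mu          sync.RWMutex
	rowNum      map[string]int64  // binlog path -> number of rows
	rawDataMmap map[string][]byte // binlog path -> raw vector bytes
}

func newVectorFieldInfo() *vectorFieldInfo {
	return &vectorFieldInfo{
		rowNum:      make(map[string]int64),
		rawDataMmap: make(map[string][]byte),
	}
}

// setRawData (hypothetical) records the bytes and row count of one binlog.
func (v *vectorFieldInfo) setRawData(path string, data []byte, rows int64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.rawDataMmap[path] = data
	v.rowNum[path] = rows
}

// getRawData (hypothetical) returns the bytes of one binlog if it is loaded.
func (v *vectorFieldInfo) getRawData(path string) ([]byte, bool) {
	v.mu.RLock()
	defer v.mu.RUnlock()
	data, ok := v.rawDataMmap[path]
	return data, ok
}

func main() {
	info := newVectorFieldInfo()
	info.setRawData("binlog/0", make([]byte, 32), 2)
	data, ok := info.getRawData("binlog/0")
	fmt.Println(len(data), ok)
}
```

Taking the read lock in `getRawData` lets concurrent retrieve calls read already-loaded binlogs without blocking each other, while loading still takes the exclusive lock.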

2. Add new interfaces to `segment`

```go
// fill vector raw data into RetrieveResults
func (s *Segment) fillRetrieveResults(plan *RetrievePlan, result *segcorepb.RetrieveResults) error

// 1. load vector field binlog file from minio
// 2. decode binlog file, get vector raw data
// 3. save raw data into local disk
// 4. do mmap
func (s *Segment) segmentVectorFieldDataMmap(fieldID int64, binlog string, rowCount int, data interface{}) ([]byte, error)
```
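Steps 3 and 4 in the comment above (save to local disk, then mmap) could look roughly like the sketch below. It assumes a Unix-like system where `syscall.Mmap` is available, and it skips the MinIO download and binlog decoding entirely. The names `mmapFile` and `demo` are hypothetical and not part of the proposal.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// mmapFile (hypothetical) writes data to path and maps the file read-only.
// The caller should release the mapping with syscall.Munmap.
func mmapFile(path string, data []byte) ([]byte, error) {
	if len(data) == 0 {
		return nil, fmt.Errorf("cannot mmap empty data")
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return nil, err
	}
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // the mapping remains valid after the descriptor is closed
	return syscall.Mmap(int(f.Fd()), 0, len(data), syscall.PROT_READ, syscall.MAP_SHARED)
}

// demo maps a small payload from a temporary directory and returns a copy
// of the mapped bytes as a string.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "vector-mmap")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	mapped, err := mmapFile(filepath.Join(dir, "vec.bin"), []byte("raw-vector-bytes"))
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(mapped)
	return string(mapped), nil // the copy is made before the deferred unmap runs
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Mapping the file instead of reading it into a heap buffer lets the operating system page vector data in on demand, which fits the goal of keeping memory consumption low.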

3. Add a new interface to `segmentLoader`

```go
func (loader *segmentLoader) loadSegmentVectorFieldsData(segment *Segment, binlogs []string) error
```

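4. Add the following logic to the retrieve function

* When the output fields include a vector field, that vector field has not been loaded, and the current segment returns a non-empty result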

```go
if err = q.historical.loader.loadSegmentVectorFieldsData(segment, binlogs); err != nil {
	return err
}
if err = segment.fillRetrieveResults(plan, result); err != nil {
	return err
}
```
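The guard described in the bullet above might be expressed as a small predicate. This is only a sketch: `segmentState`, its fields, and `needVectorLoad` are hypothetical scaffolding and not Milvus types or functions.

```go
package main

import "fmt"

// segmentState is a hypothetical stand-in for the per-segment facts the
// retrieve path needs. It is not a Milvus type.
type segmentState struct {
	loadedVectorFields map[int64]bool // field ID -> raw data already loaded
	resultCount        int            // number of rows this segment returned
}

// needVectorLoad returns the requested vector fields whose raw data still
// has to be loaded before the retrieve results can be filled.
func needVectorLoad(outputFields []int64, isVector map[int64]bool, seg segmentState) []int64 {
	if seg.resultCount == 0 {
		return nil // empty result: nothing to fill for this segment
	}
	var missing []int64
	for _, id := range outputFields {
		if isVector[id] && !seg.loadedVectorFields[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	seg := segmentState{
		loadedVectorFields: map[int64]bool{101: true},
		resultCount:        3,
	}
	isVector := map[int64]bool{101: true, 102: true}
	// Field 100 is scalar, 101 is already loaded, so only 102 is missing.
	fmt.Println(needVectorLoad([]int64{100, 101, 102}, isVector, seg))
}
```

Checking the result count first avoids fetching binlogs for segments that contribute nothing to the final answer.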

5. Add a parameter `include_vector_field` or `vector_fields[]` to the load_segment interface


**The `search` interface does not support returning raw vector data**
To get the raw vector data for the results returned by `search`, make a follow-up call to `get_entity_by_id`.


Public interface and struct for raw vector data storage

Public Interfaces

```go
type FileManager interface {
	GetFile(path string) (string, error)
	PutFile(path string, content []byte) error
	Exist(path string) bool
	ReadFile(path string) []byte
}
```

`VectorFileManager` implements the `FileManager` interface.

```go
type VectorFileManager struct {
	localFileManager  FileManager
	remoteFileManager FileManager
	insertCodec       *InsertCodec
}
```

`localFileManager` manages files on the local disk and can be implemented with Go's `os` package.
`remoteFileManager` handles cloud storage or remote server storage; for now it will be implemented with the MinIO client.
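One way the two managers could cooperate is sketched below. The proposal does not spell out the semantics of `GetFile`, so the sketch assumes it returns a local path and copies the file from remote storage on first access. `localFiles`, `memoryRemote` (an in-memory stand-in for MinIO), `vectorFileManager`, and `demo` are hypothetical names, and decoding with `InsertCodec` is left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

type FileManager interface {
	GetFile(path string) (string, error)
	PutFile(path string, content []byte) error
	Exist(path string) bool
	ReadFile(path string) []byte
}

// localFiles keeps files under root using the os package.
type localFiles struct{ root string }

func (l localFiles) abs(p string) string { return filepath.Join(l.root, p) }

func (l localFiles) GetFile(p string) (string, error) {
	if !l.Exist(p) {
		return "", fmt.Errorf("local file %q not found", p)
	}
	return l.abs(p), nil
}

func (l localFiles) PutFile(p string, content []byte) error {
	if err := os.MkdirAll(filepath.Dir(l.abs(p)), 0o755); err != nil {
		return err
	}
	return os.WriteFile(l.abs(p), content, 0o644)
}

func (l localFiles) Exist(p string) bool {
	_, err := os.Stat(l.abs(p))
	return err == nil
}

func (l localFiles) ReadFile(p string) []byte {
	data, err := os.ReadFile(l.abs(p))
	if err != nil {
		return nil
	}
	return data
}

// memoryRemote is an in-memory stand-in for object storage such as MinIO.
type memoryRemote map[string][]byte

func (m memoryRemote) GetFile(p string) (string, error) {
	if !m.Exist(p) {
		return "", fmt.Errorf("remote file %q not found", p)
	}
	return p, nil
}
func (m memoryRemote) PutFile(p string, c []byte) error { m[p] = c; return nil }
func (m memoryRemote) Exist(p string) bool              { _, ok := m[p]; return ok }
func (m memoryRemote) ReadFile(p string) []byte         { return m[p] }

// vectorFileManager serves files locally, copying from remote on a miss.
type vectorFileManager struct{ local, remote FileManager }

func (v vectorFileManager) GetFile(p string) (string, error) {
	if !v.local.Exist(p) {
		if !v.remote.Exist(p) {
			return "", fmt.Errorf("file %q not found in remote storage", p)
		}
		if err := v.local.PutFile(p, v.remote.ReadFile(p)); err != nil {
			return "", err
		}
	}
	return v.local.GetFile(p)
}

// demo fetches one remote file through the manager and returns its bytes.
func demo() ([]byte, error) {
	dir, err := os.MkdirTemp("", "vfm")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	m := vectorFileManager{localFiles{dir}, memoryRemote{"seg/1/vec": []byte("raw-vectors")}}
	path, err := m.GetFile("seg/1/vec")
	if err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

func main() {
	data, err := demo()
	fmt.Println(string(data), err)
}
```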

Once the offset of a vector is known, the original vector data can be read from the local vector data file.
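The offset lookup can be sketched as follows. The layout is an assumption made for illustration: float vectors stored back to back as little-endian `float32` values with a fixed dimension. The proposal does not specify the on-disk format, and `vectorAt` and `encodeVectors` are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeVectors packs float32 values as little-endian bytes (assumed layout).
func encodeVectors(vals []float32) []byte {
	buf := make([]byte, 4*len(vals))
	for i, v := range vals {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(v))
	}
	return buf
}

// vectorAt decodes the vector stored at row `offset` of a buffer holding
// rows of `dim` little-endian float32 values back to back.
func vectorAt(raw []byte, dim int, offset int64) ([]float32, error) {
	if dim <= 0 || offset < 0 {
		return nil, fmt.Errorf("invalid dim %d or offset %d", dim, offset)
	}
	rowBytes := int64(dim) * 4
	start := offset * rowBytes
	if start+rowBytes > int64(len(raw)) {
		return nil, fmt.Errorf("offset %d out of range", offset)
	}
	vec := make([]float32, dim)
	for i := range vec {
		bits := binary.LittleEndian.Uint32(raw[start+int64(i)*4:])
		vec[i] = math.Float32frombits(bits)
	}
	return vec, nil
}

func main() {
	// Two 2-dimensional vectors: row 0 is (1, 2), row 1 is (3, 4).
	raw := encodeVectors([]float32{1, 2, 3, 4})
	vec, err := vectorAt(raw, 2, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vec) // prints [3 4]
}
```

Because each row has a fixed size, locating a vector is a single multiplication, so only the requested rows of the file need to be touched.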



## Test Plan(required)

Check that `get_entity_by_id` returns the correct raw vector data in the following two scenarios:

* scenario (1)
  * create_collection
  * insert
  * get_entity_by_id

* scenario (2)
  * create_collection
  * insert
  * create_index
  * get_entity_by_id

