
Current state: Accepted

ISSUE: #6299

PRs: #6570 #6598 #6671

Keywords: Query / Search / Vector

Released: Milvus 2.0rc3


Summary

This project lets query return vector fields in its output while using as little memory as possible.

Motivation

In Milvus 2.0rc1, query does not support returning vector fields in its output. If a query request's output fields contain a float vector or binary vector field, the proxy returns an error.

This restriction exists to limit memory consumption: a vector field with a large dimension can occupy hundreds of times more memory than a scalar field. So load_collection or load_partition generally loads only the scalar fields' raw data into memory. A vector field's raw data is loaded into memory only in 3 cases:

  1. streaming segment
  2. vector field's index type is FLAT
  3. vector field's index has not been created

Query can return a vector field in its output only if that vector's raw data has been loaded into memory.
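The three cases above can be summarized as a single predicate. The following is only a minimal sketch: `segmentState` and `vectorRawDataInMemory` are hypothetical names introduced for illustration and are not part of the Milvus code base.

```go
package main

import "fmt"

// segmentState is a hypothetical, simplified view of one vector field in a
// loaded segment. It is not a Milvus type.
type segmentState struct {
	streaming bool   // true for a streaming segment
	hasIndex  bool   // true once an index has been created on the vector field
	indexType string // index type name, ignored when hasIndex is false
}

// vectorRawDataInMemory reports whether the vector field's raw data is
// expected to be in memory, following the three cases listed above.
func vectorRawDataInMemory(s segmentState) bool {
	return s.streaming || !s.hasIndex || s.indexType == "FLAT"
}

func main() {
	fmt.Println(vectorRawDataInMemory(segmentState{streaming: true}))                         // true
	fmt.Println(vectorRawDataInMemory(segmentState{hasIndex: true, indexType: "FLAT"}))       // true
	fmt.Println(vectorRawDataInMemory(segmentState{}))                                        // true
	fmt.Println(vectorRawDataInMemory(segmentState{hasIndex: true, indexType: "IVF_FLAT"}))   // false
}
```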

However, query needs to be able to return a vector's raw data. For example, testers can use this to check the correctness of inserted data.


Currently search also does not support returning vector fields in its output, but we do not plan to enhance search in this project. If users need the vector data after search returns IDs, they can call query to get it.

If users have a real requirement for search to return vectors in its output, we can achieve this at the SDK level.

Design Details

  • Add a new struct VectorFieldInfo, and a vectorFieldInfos field in the Segment struct, to provide vector field related information
type VectorFieldInfo struct {
    mu              sync.RWMutex
    fieldID         UniqueID
    fieldBinlog     *datapb.FieldBinlog
    rowDataInMemory bool
    rawData         map[string]storage.FieldData  // map[binlogPath]FieldData
}

type Segment struct {
    ... ...
    vectorFieldInfos map[UniqueID]*VectorFieldInfo
}


  • Add a new method in segment_loader
// load the vector field's data from info.fieldBinlog and save the raw data into info.rawData
func (loader *segmentLoader) loadSegmentVectorFieldData(info *VectorFieldInfo) error


  • Add a new method in query_collection
// For vector output fields, load raw data from fieldBinlog if needed,
// get vector raw data via result.Offset from *VectorFieldInfo, then
// fill vector raw data into result
func (q *queryCollection) fillVectorFieldsData(segment *Segment, result *segcorepb.RetrieveResults) error
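The load-then-fetch flow described in these comments can be sketched with simplified stand-in types. Everything below is an assumption made for illustration: `vectorFieldInfo` stores each binlog as a flat `[]float32` rather than `storage.FieldData`, `loadFunc`, `ensureLoaded`, and `vectorAt` are hypothetical names, and offsets are assumed to be segment-wide row indices that span the binlogs in order.

```go
package main

import (
	"fmt"
	"sync"
)

// vectorFieldInfo is a simplified stand-in for VectorFieldInfo.
type vectorFieldInfo struct {
	mu          sync.RWMutex
	dim         int                  // vector dimension, assumed > 0
	binlogPaths []string             // binlog paths in row order (hypothetical)
	rawData     map[string][]float32 // map[binlogPath] flattened vectors
}

// loadFunc is a hypothetical loader that reads one binlog's vectors.
type loadFunc func(path string) ([]float32, error)

// ensureLoaded loads every binlog that is not cached in rawData yet.
func (info *vectorFieldInfo) ensureLoaded(load loadFunc) error {
	info.mu.Lock()
	defer info.mu.Unlock()
	for _, p := range info.binlogPaths {
		if _, ok := info.rawData[p]; ok {
			continue
		}
		data, err := load(p)
		if err != nil {
			return fmt.Errorf("load %s: %w", p, err)
		}
		info.rawData[p] = data
	}
	return nil
}

// vectorAt returns a copy of the vector stored at the given row offset.
func (info *vectorFieldInfo) vectorAt(offset int64) ([]float32, error) {
	info.mu.RLock()
	defer info.mu.RUnlock()
	if offset < 0 || info.dim <= 0 {
		return nil, fmt.Errorf("invalid offset %d or dim %d", offset, info.dim)
	}
	remaining := int(offset)
	for _, p := range info.binlogPaths {
		data, ok := info.rawData[p]
		if !ok {
			return nil, fmt.Errorf("binlog %s not loaded", p)
		}
		rows := len(data) / info.dim
		if remaining < rows {
			start := remaining * info.dim
			out := make([]float32, info.dim)
			copy(out, data[start:start+info.dim])
			return out, nil
		}
		remaining -= rows
	}
	return nil, fmt.Errorf("offset %d out of range", offset)
}

func main() {
	// In-memory fake storage: two binlogs holding 2 rows and 1 row of dim 2.
	storage := map[string][]float32{"binlog/0": {1, 2, 3, 4}, "binlog/1": {5, 6}}
	info := &vectorFieldInfo{
		dim:         2,
		binlogPaths: []string{"binlog/0", "binlog/1"},
		rawData:     make(map[string][]float32),
	}
	load := func(p string) ([]float32, error) { return storage[p], nil }
	if err := info.ensureLoaded(load); err != nil {
		panic(err)
	}
	for _, off := range []int64{0, 2} {
		vec, err := info.vectorAt(off)
		if err != nil {
			panic(err)
		}
		fmt.Println(off, vec)
	}
}
```

Taking the write lock only while loading and the read lock while fetching keeps concurrent queries from blocking each other once the raw data is cached.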


Query is also enhanced to support wildcards in output fields.

  • "*" - means all scalar fields
  • "%" - means all vector fields

For example, suppose A and B are scalar fields and C and D are vector fields. Duplicated fields are removed automatically.

  • output_fields=["*"] ==> [A,B]
  • output_fields=["%"] ==> [C,D]
  • output_fields=["*","%"] ==> [A,B,C,D]
  • output_fields=["*",A] ==> [A,B]
  • output_fields=["*",C] ==> [A,B,C]
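The expansion rules above can be sketched as follows. This is an illustrative sketch, not the proxy's actual code: `expandOutputFields` is a hypothetical helper, and keeping fields in order of first appearance is an assumption that happens to match the examples.

```go
package main

import "fmt"

// expandOutputFields replaces "*" with all scalar fields and "%" with all
// vector fields, then drops duplicates while keeping first-appearance order.
func expandOutputFields(outputFields, scalarFields, vectorFields []string) []string {
	seen := make(map[string]bool)
	var result []string
	add := func(names ...string) {
		for _, n := range names {
			if !seen[n] {
				seen[n] = true
				result = append(result, n)
			}
		}
	}
	for _, f := range outputFields {
		switch f {
		case "*":
			add(scalarFields...)
		case "%":
			add(vectorFields...)
		default:
			add(f)
		}
	}
	return result
}

func main() {
	scalars := []string{"A", "B"}
	vectors := []string{"C", "D"}
	fmt.Println(expandOutputFields([]string{"*"}, scalars, vectors))      // [A B]
	fmt.Println(expandOutputFields([]string{"%"}, scalars, vectors))      // [C D]
	fmt.Println(expandOutputFields([]string{"*", "%"}, scalars, vectors)) // [A B C D]
	fmt.Println(expandOutputFields([]string{"*", "A"}, scalars, vectors)) // [A B]
	fmt.Println(expandOutputFields([]string{"*", "C"}, scalars, vectors)) // [A B C]
}
```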


Original vector data storage public interface and struct

Public Interfaces

```go
type FileManager interface {
	GetFile(path string) (string, error)
	PutFile(path string, content []byte) error
	Exist(path string) bool
	ReadFile(path string) []byte
}
```

VectorFileManager implements the FileManager interface.

```go
type VectorFileManager struct {
	localFileManager  FileManager
	remoteFileManager FileManager
	insertCodec       *InsertCodec
}
```

localFileManager is responsible for local file management and can be implemented with the Go os package.
remoteFileManager is responsible for cloud storage or remote server storage, and is implemented with the MinIO client for now.
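As a rough illustration of a local implementation built on the os package, the sketch below implements the FileManager interface against a root directory. `osFileManager` and `newTempManager` are hypothetical names. The method semantics are assumptions, since the interface does not spell them out: here GetFile returns the local path of an existing file, and ReadFile returns nil on failure because its signature has no error.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// FileManager mirrors the interface shown above.
type FileManager interface {
	GetFile(path string) (string, error)
	PutFile(path string, content []byte) error
	Exist(path string) bool
	ReadFile(path string) []byte
}

// osFileManager is a hypothetical local implementation rooted at a directory.
type osFileManager struct {
	root string
}

var _ FileManager = (*osFileManager)(nil)

func (m *osFileManager) abs(path string) string { return filepath.Join(m.root, path) }

// GetFile returns the local path of the file (assumed semantics).
func (m *osFileManager) GetFile(path string) (string, error) {
	p := m.abs(path)
	if _, err := os.Stat(p); err != nil {
		return "", err
	}
	return p, nil
}

// PutFile writes content, creating parent directories as needed.
func (m *osFileManager) PutFile(path string, content []byte) error {
	p := m.abs(path)
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return err
	}
	return os.WriteFile(p, content, 0o644)
}

// Exist reports whether the file can be stat'ed.
func (m *osFileManager) Exist(path string) bool {
	_, err := os.Stat(m.abs(path))
	return err == nil
}

// ReadFile returns the file content, or nil if it cannot be read.
func (m *osFileManager) ReadFile(path string) []byte {
	data, err := os.ReadFile(m.abs(path))
	if err != nil {
		return nil
	}
	return data
}

// newTempManager creates a manager rooted at a fresh temporary directory and
// returns a cleanup function that removes it.
func newTempManager() (*osFileManager, func(), error) {
	dir, err := os.MkdirTemp("", "vector-files")
	if err != nil {
		return nil, nil, err
	}
	return &osFileManager{root: dir}, func() { os.RemoveAll(dir) }, nil
}

func main() {
	fm, cleanup, err := newTempManager()
	if err != nil {
		panic(err)
	}
	defer cleanup()
	if err := fm.PutFile("seg1/vec.bin", []byte{1, 2, 3}); err != nil {
		panic(err)
	}
	fmt.Println(fm.Exist("seg1/vec.bin"), fm.ReadFile("seg1/vec.bin")) // true [1 2 3]
	fmt.Println(fm.Exist("seg1/missing.bin"))                          // false
}
```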

Once a vector's offset is obtained, the original vector data can be read from the local vector data file.
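A minimal sketch of an offset-based read follows. The file layout is an assumption made purely for illustration (contiguous little-endian float32 vectors of a fixed dimension, with the offset counted in rows); the actual on-disk format is determined by the insert codec and may differ. `readFloatVector` and `writeTempVectors` are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"os"
)

const float32Size = 4 // bytes per float32 element

// writeTempVectors writes the flattened vectors to a temporary file in the
// assumed layout and returns the file path.
func writeTempVectors(data []float32) (string, error) {
	f, err := os.CreateTemp("", "vectors-*.bin")
	if err != nil {
		return "", err
	}
	defer f.Close()
	buf := make([]byte, len(data)*float32Size)
	for i, v := range data {
		binary.LittleEndian.PutUint32(buf[i*float32Size:], math.Float32bits(v))
	}
	if _, err := f.Write(buf); err != nil {
		return "", err
	}
	return f.Name(), nil
}

// readFloatVector reads the vector at the given row offset without loading
// the rest of the file into memory.
func readFloatVector(path string, offset int64, dim int) ([]float32, error) {
	if offset < 0 || dim <= 0 {
		return nil, fmt.Errorf("invalid offset %d or dim %d", offset, dim)
	}
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	buf := make([]byte, dim*float32Size)
	// ReadAt returns an error if fewer than len(buf) bytes are available,
	// which covers offsets past the end of the file.
	if _, err := f.ReadAt(buf, offset*int64(dim)*float32Size); err != nil {
		return nil, err
	}
	vec := make([]float32, dim)
	for i := range vec {
		vec[i] = math.Float32frombits(binary.LittleEndian.Uint32(buf[i*float32Size:]))
	}
	return vec, nil
}

func main() {
	path, err := writeTempVectors([]float32{1, 2, 3, 4, 5, 6}) // 3 rows, dim 2
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)
	vec, err := readFloatVector(path, 1, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(vec) // [3 4]
}
```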



Test Plan

Run query / search (with a vector field in the output fields) across all combinations of the following scenarios, and check that the results are correct.

  1. float vector or binary vector
  2. with or without an index
  3. all index types


