...
ISSUE: #6299
Keywords: Query / Search / Vector
...
Original vector data storage public interface and struct
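Public Interfaces
The interface below may be discussed and changed in the future.
```go
// ChunkManager abstracts a key-addressed store for data chunks (local disk or remote storage).
type ChunkManager interface {
	// GetPath returns the storage path that corresponds to key.
	GetPath(key string) (string, error)
	// Write stores content under key.
	Write(key string, content []byte) error
	// Exist reports whether key is present.
	Exist(key string) bool
	// Read returns the entire content stored under key.
	Read(key string) ([]byte, error)
	// ReadAt reads len(p) bytes of the content under key, starting at byte offset off.
	ReadAt(key string, p []byte, off int64) (n int, err error)
}
```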
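VectorChunkManager implements the ChunkManager interface and adds a method that downloads a vector file from remote storage, deserializes its content, and saves the pure vector data to local storage.
```go
// VectorChunkManager serves vector data cached on local storage, backed by remote storage.
type VectorChunkManager struct {
	localChunkManager  ChunkManager
	remoteChunkManager ChunkManager
}

// NewVectorChunkManager creates a VectorChunkManager from a local and a remote ChunkManager.
func NewVectorChunkManager(localChunkManager ChunkManager, remoteChunkManager ChunkManager) *VectorChunkManager {
	return &VectorChunkManager{
		localChunkManager:  localChunkManager,
		remoteChunkManager: remoteChunkManager,
	}
}
```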
localChunkManager is responsible for local file storage and can be implemented with the Go os package.
The path used by the local chunk manager is configured in milvus.yaml under the key storage.path.
remoteChunkManager is responsible for cloud storage or remote server storage, and will be implemented with the MinIO client for now.
Once the offset of a vector is obtained, the original vector data can be read from the local vector data file.
A vector is retrieved by its ID through the following process:
1. When loading a segment (load_segment), get the number of IDs in each binlog and the vector file names. The binlog files are sorted by the last ID in their file names to guarantee that IDs are in increasing order. Suppose the sizes are 300, 300, 400 and 500.
2. Get the offset of the ID within the segment in the C layer. Suppose the offset is 700.
3. The target vector is in the 3rd vector file, because 300 + 300 < 700 < 300 + 300 + 400.
4. Load the 3rd file into memory, deserialize it into pure vector data, save that data to local storage, and release the memory.
5. Mmap the local file and read the data at offset 100 within it (700 - 600, since the first two files hold 600 rows). The length of each vector depends on the data type and dimension, e.g. 4 × dim bytes for a float vector and dim / 8 bytes for a binary vector.
Test Plan
Run query and search requests (with the vector field included in the output fields) under all combinations of the following scenarios, and check the correctness of the results.
...