...
Released:
Summary(required)
At present, the behavior of By now, doing "search" in Milvus is to will return TOPK most similar results for each vector be being searched.
This MEP is about how to realize a functionality named "range search" functionality. User specifies a range scope -- including radius low bound and radius high bound, Milvus will do does "range search", and return also returns TOPK sorted results with distance falling in this range scope.
- both radius low bound and radius high bound MUST be set
- range search will return TOPK results with
- for L2: low_bound <= distance < high_bound
- for IP: low_bound < distance <= high_bound
...
- get results with distance falling in a range scope
- get arbitrary TOPK search results more than 16384
...
We reuse the SDK interface Search() to do "range search". Add Only add 2 more parameters "radius_low_bound" and "radius_high_bound" into params.
...
Code Block | ||
---|---|---|
| ||
default_index = {"index_type": "HNSW", "params":{"M": 48, "efConstruction": 500}, "metric_type": "L2"} collection.create_index("float_vector", default_index) collection.load() search_params = {"metric_type": "L2", "params": {"ef": 32, "radius_low_bound": 1.0, "radius_high_bound": 2.0}} res = collection.search(vectors[:nq], "float_vector", search_params, limit, "int64 >= 0") |
- Add 3 new interfaces in knowhere
Code Block | ||
---|---|---|
| ||
// range search parameter legacy check virtual bool CheckRangeSearch(Config& cfg, const IndexType type, const IndexMode mode); // range search virtual DatasetPtr QueryByRange(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset); // brute force range search static DatasetPtr BruteForce::RangeSearch(const DatasetPtr base_dataset, const DatasetPtr query_dataset, const Config& config, const faiss::BitsetView bitset); |
Design Details(required)
In Knowhere
Range search related new APIs CheckRangeSearch(), QueryByRange() and BruteForce::RangeSearch() are available since knowhere-v1.3.0.
Range search completely reuses the call stack from SDK to segcore.
...
- seg_offsets_: with length "nq * topk", -1 is filled in when no enough result data
- distances_: with lengh "nq * topk", data is undefined when no enough result data
...
Compatibility, Deprecation, and Migration Plan(optional)
...
- Add new unittest
- Add benchmark using range search dataset
There is no public data set for range search. I have created range search data set based on sift1M and glove200.
You can find them in NAS:
test/milvus/ann_hdf5/sift-128-euclidean-range.hdf5
test/milvus/ann_hdf5/glove-200-angular-range.hdf5
test/milvus/ann_hdf5/binary/sift-4096-hamming-range.hdf5
For Milvus
- add unittest in segcore
- use sift1M/glove200 dataset to test range search, we expect to get identical result as search
...