Current state: ["Under Discussion"]
...
Released:
Summary(required)
By now, doing the behavior of "search" in Milvus will return is returning TOPK most similar sorted results for each queried vector being searched.
This MEP is about to realize a functionality named "range search". User specifies a range scope -- including radius low bound and radius high bound, Milvus does "range search", and also returns TOPK sorted results with distance falling in this range scope.
- both radius low bound and radius high bound MUST be setrange search will return TOPK results with
for IP: - falling in range scope means
Metric type | Behavior |
---|---|
IP | radius_low_bound < distance <= radius_high_bound |
...
L2 |
...
and other metric types |
...
radius_low_bound <= distance < radius_high_bound |
Motivation(required)
Many users request a the "range search" functionality. With this capability, user can
...
If call range search API with unsupported index types or unsupported metric types, an exception will be thrown out.
In Knowhere, only one parameter "radius" is two new parameters "radius_low_bound" and "radius_high_bound" are passed in via config, and range search will return all un-sorted results with distance "better" than this radius.
...
falling in this range scope.
Metric type | Behavior |
---|---|
IP | radius_low_bound < distance <= radius_high_bound |
L2 or other metric types |
...
radius_low_bound <= distance < radius_high_bound |
We add 3 new APIs CheckRangeSearch(), QueryByRange() and BruteForce::RangeSearch() to support range search, these APIs are already available since knowhere-v1.3.0.
- CheckRangeSearch()
This API is used to do range search parameter legacy check. It will throw exception when parameter legacy check fail.
...
2. QueryByRange()
This API does range search for index, it returns all unsorted results with distance "better than radius" (for IP: > radius; for others: < radius)falling in the specified range scope.
PROTO | virtual DatasetPtr |
INPUT | Dataset { Config { knowhere::meta::RADIUS_LOW_BOUND: - // radius for range search knowhere::meta::RADIUS_HIGH_BOUND: - } |
OUTPUT | Dataset { |
...
The range search result for each query vector is: IDS[lims[n], lims[n+1]) and DISTANCE[lims[n], lims[n+1])
The memory used for IDS, DISTANCE and LIMS are allocated in Knowhere, they will be auto-freed when Dataset deconstruction.
...
This API does range search for no-index dataset, it returns all unsorted results with distance "better than radius" (for IP: > radius; for others: < radius).
PROTO | static DatasetPtr | ||
INPUT | Dataset { Dataset { Config { knowhere::meta::RADIUS_LOW_BOUND: - // radius for range search } | OUTPUT | Dataset { knowhere::meta::IDS: RADIUS_HIGH_BOUND: - } |
OUTPUT | Dataset { |
...
Segcore search flow will be updated as this flow chart, range search related change is marked RED.
In Segcore , it uses radius parameter's existence to decide whether to run range search, or to run range search. When
- when both "radius_low_bound" and "radius_high_bound" are set, range search is called;
- when only one of "radius_low_bound" and "radius_high_bound" is set, exception is thrown out;
- otherwise, search is called.
For API query::SearchOnSealedIndex() and BruteForceSearch(), they do like following:
- pass in correct radius parameter to knowhere (For IP: parameters (radius_low_bound ; for others: / radius_high_bound) to knowhere
- get all unsorted range search result from knowhere
- for each NQ's results, do sort, then filter using another radius parameterre-generate heap-sort
- return result Dataset with TOPK results
...