Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Current state: ["Under Discussion"]

...

  • both radius low bound and radius high bound MUST be set
  • range search will return TOPK results with
    • for L2IP: low_bound < = distance <= high_bound
    • for IPL2 or other metric types: low_bound <= distance < = high_bound

Motivation(required)

Many users request Milvus to realize a "range search" functionality. With this capability, user can

...

If call range search API with unsupported index types or unsupported metric types, knowhere an exception will throw be thrown out exception.

In Knowhere, only one parameter "radius" is passed in via config, and range search will return all un-sorted results with distance "better" than this radius.

  • For IP: > radius
  • For L2 or other metric types: < radius

...

metric typerangesimilarnot similar
L2([0, inf)small0largeinf
IP[-1, 1]1-1
jaccard[0, 1]01
tanimoto[0, 0.5]inf)00.5inf
hamming[0, n]0n

2. QueryByRange()

This API does range search for index, it returns all unsorted results with distance "better than radius" (for IP: > radius; for others: < radius).

PROTO
virtual DatasetPtr
QueryByRange(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset)

INPUT

Dataset {
    knowhere::meta::TENSOR: -   // query data
    knowhere::meta::ROWS: -      // rows of queries
    knowhere::meta::DIM: -          // dimension
}

Config {

    knowhere::meta::RADIUS: -   // radius for range search

}

OUTPUT

Dataset {
    knowhere::meta::IDS: -                // result IDs with length LIMS[nq]
    knowhere::meta::DISTANCE: -  // result DISTANCES with length LIMS[nq]
    knowhere::meta::LIMS: -            // result offset prefix sum with length nq + 1
}

...

This API does range search for no-index dataset, it returns all unsorted results with distance "better than radius" (for IP: > radius; for others: < radius).

PROTO
static DatasetPtr
RangeSearch(const DatasetPtr base_dataset,
const DatasetPtr query_dataset,
const Config& config,
const faiss::BitsetView bitset);

INPUT

Dataset {
    knowhere::meta::TENSOR: -   // base data
    knowhere::meta::ROWS: -      // rows of base data
    knowhere::meta::DIM: -          // dimension
}

Dataset {
    knowhere::meta::TENSOR: -   // query data
    knowhere::meta::ROWS: -      // rows of queries
    knowhere::meta::DIM: -          // dimension
}

Config {

    knowhere::meta::RADIUS: -   // radius for range search

}

OUTPUT

Dataset {
    knowhere::meta::IDS: -                // result IDs with length LIMS[nq]
    knowhere::meta::DISTANCE: -  // result DISTANCES with length LIMS[nq]
    knowhere::meta::LIMS: -            // result offset prefix sum with length nq + 1
}

...

In Segcore, it uses radius parameters parameter's existence to decide to run range search or search. When "radius_low_bound" and "radius_high_bound" are set, range search is called; otherwise, search is called. 

...

How to get more than 16384 range search results ?

With above this solution, user can get maximum 16384 range search results in one call.

...

The result of each iteration will have some duplication with the result of previous iteration, use user need do duplication check and remove them.

...