Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Current state: ["Under Discussion"]

...

  • both radius low bound and radius high bound MUST be set
  • falling in range scope means
Metric typeBehavior
IPradius_low_bound < distance <= radius_high_bound
L2 and other metric typesradius_low_bound <= distance < radius_high_bound

Motivation(required)

Many users request the "range search" functionality. With this capability, user can

...

In Knowhere, two new parameters "radius_low_bound" and "radius_high_bound" are passed in via config, and range search will return all un-sorted results with distance falling in this range scope.

Metric typeBehavior
IPradius_low_bound < distance <= radius_high_bound
L2 or other metric typesradius_low_bound <= distance < radius_high_bound

We add 3 new APIs CheckRangeSearch(), QueryByRange() and BruteForce::RangeSearch() to support range search.

...

This API returns all unsorted results with distance falling in the specified range scope.

PROTO
virtual DatasetPtr
QueryByRange(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset)

INPUT

Dataset {
    knowhere::meta::TENSOR: -   // query data
    knowhere::meta::ROWS: -      // rows of queries
    knowhere::meta::DIM: -          // dimension
}

Config {

    knowhere::meta::RADIUS_LOW_BOUND: -   

    knowhere::meta::RADIUS_HIGH_BOUND: -   

}

OUTPUT

Dataset {
    knowhere::meta::IDS: -                // result IDs with length LIMS[nq]
    knowhere::meta::DISTANCE: -  // result DISTANCES with length LIMS[nq]
    knowhere::meta::LIMS: -            // result offset prefix sum with length nq + 1
}

...

This API does range search for no-index dataset, it returns all unsorted results with distance "better than radius" (for IP: > radius; for others: < radius).

PROTO
static DatasetPtr
RangeSearch(const DatasetPtr base_dataset,
const DatasetPtr query_dataset,
const Config& config,
const faiss::BitsetView bitset);

INPUT

Dataset {
    knowhere::meta::TENSOR: -   // base data
    knowhere::meta::ROWS: -      // rows of base data
    knowhere::meta::DIM: -          // dimension
}

Dataset {
    knowhere::meta::TENSOR: -   // query data
    knowhere::meta::ROWS: -      // rows of queries
    knowhere::meta::DIM: -          // dimension
}

Config {

    knowhere::meta::RADIUS_LOW_BOUND: -   

    knowhere::meta::RADIUS_HIGH_BOUND: -   

}

OUTPUT

Dataset {
    knowhere::meta::IDS: -                // result IDs with length LIMS[nq]
    knowhere::meta::DISTANCE: -  // result DISTANCES with length LIMS[nq]
    knowhere::meta::LIMS: -            // result offset prefix sum with length nq + 1
}

...

  1. test/milvus/ann_hdf5/binary/sift-4096-hamming-range.hdf5
    1. base dataset and query dataset are identical with sift1m
    2. unified radius radius_low_bound = 0.0, radius_high_bound = 291.0
    3. result length for each nq is different
    4. total result num 1,063,078
  2. test/milvus/ann_hdf5/sift-128-euclidean-range.hdf5
    1. base dataset and query dataset are identical with sift1m
    2. unified radius radius_low_bound = 0.0, radius_high_bound = 186.0
    3. result length for each nq is different
    4. total result num 1,054,377
  3. test/milvus/ann_hdf5/sift-128-euclidean-range-multi.hdf5
    1. base dataset and query dataset are identical with sift1m
    2. ground truth IDs and Distances are identical with sift1m
    3. each nq's radius_low_bound is 0.0, radius_high_bound is set to the last ground truth distance
    4. result length for each nq is 100
    5. total result num 1,000,000
  4. test/milvus/ann_hdf5/glove-200-angular-range.hdf5
    1. base dataset and query dataset are identical with glove200
    2. unified radius_low_bound = 0.52, radius_high_bound = 1.01
    3. result length for each nq is different
    4. total result num 1,060,888

...