Current state: Under Discussion

...

Keywords: datacoord, segment, compaction

Released:

Summary

Milvus needs a compaction mechanism to merge small segments and remove deleted rows to save disk space.

Motivation

There are many ways to generate small segments:

  1. DataCoord automatically flushes a segment once it has been open for a long time (e.g., 24 hours)
  2. Users may call flush manually

Deleted rows should also be removed once they are no longer needed.

So we have two goals:

  1. Merge small segments to improve query efficiency
  2. Remove deleted rows to save disk space

...

  1. The time travel window may be very long, such as dozens of days, so small segments still need to be merged within the time travel window.
  2. When to trigger a compaction:
    1. After a segment flush, if the number of segments smaller than 1/2 * max_segment_size at the channel and partition level exceeds compaction_segment_num_threshold.
    2. The time elapsed since the last compaction is greater than max_compaction_interval.
    3. A compaction is requested manually.
  3. How to choose segments: greedy algorithm
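
The trigger conditions and segment selection above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the `Segment` type, the function names, and the size and time units are hypothetical, and since the proposal only says "greedy algorithm", the packing strategy shown (sort by size, fill a group until max_segment_size would be exceeded) is one assumed variant.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: int
    size: int  # hypothetical: same unit as max_segment_size


def should_trigger(segments, max_segment_size, compaction_segment_num_threshold,
                   seconds_since_last_compaction, max_compaction_interval,
                   manual=False):
    """Return True if any of the three trigger conditions holds.

    `segments` is assumed to hold the flushed segments of one
    channel and partition.
    """
    # Condition 1: too many segments smaller than half the maximum size.
    small = [s for s in segments if s.size < max_segment_size / 2]
    return (manual
            or len(small) > compaction_segment_num_threshold
            or seconds_since_last_compaction > max_compaction_interval)


def greedy_groups(segments, max_segment_size):
    """Assumed greedy strategy: pack segments in ascending size order,
    closing a group when the next segment would push it over the limit."""
    groups, current, current_size = [], [], 0
    for seg in sorted(segments, key=lambda s: s.size):
        if current and current_size + seg.size > max_segment_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(seg)
        current_size += seg.size
    if current:
        groups.append(current)
    # A group with a single segment has nothing to merge.
    return [g for g in groups if len(g) > 1]
```

Each returned group would become one merged segment. Sorting in ascending order favors merging the smallest segments first, which matches the goal of reducing the small-segment count; other greedy orders are equally plausible.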


Some details:

  1. Only merge flushed segments
  2. We choose the max DML position among the merged segments as the DML position of the newly generated segment.
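
A small sketch of these two details, under stated assumptions: the `SegmentInfo` type and `merged_dml_position` function are hypothetical, and a DML position is modeled here as a single increasing integer so that positions can be compared with `max`. A real position may carry more structure.

```python
from dataclasses import dataclass


@dataclass
class SegmentInfo:
    segment_id: int
    flushed: bool
    dml_position: int  # hypothetical: modeled as an increasing integer


def merged_dml_position(segments):
    """Return (ids of segments to merge, DML position of the new segment)."""
    # Only flushed segments are eligible for merging.
    flushed = [s for s in segments if s.flushed]
    if len(flushed) < 2:
        raise ValueError("need at least two flushed segments to merge")
    # The new segment takes the largest DML position of its inputs.
    position = max(s.dml_position for s in flushed)
    return [s.segment_id for s in flushed], position
```

Taking the maximum means the new segment's position covers every DML message already contained in any of its inputs.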

Test Plan