Current state: Under Discussion
...
Keywords: datacoord, segment, compaction
Released:
Summary
Milvus needs a compaction mechanism to merge small segments and remove deleted rows to save disk space.
Motivation
Small segments can be generated in several ways:
- DataCoord automatically flushes a segment that has been open for a long time (e.g., 24 hours).
- Users may call flush manually.
In addition, deleted rows should be removed once they are no longer needed.
Compaction therefore has two goals:
- Merge small segments to improve query efficiency
- Remove deleted rows to save disk space
...
- The time-travel period may be very long (e.g., dozens of days), so small segments still need to be merged within the time-travel window.
- When to trigger a compaction:
  - After a segment is flushed, when the number of segments within the same channel and partition that are smaller than 1/2 * max_segment_size exceeds compaction_segment_num_threshold.
  - When the time since the last compaction exceeds max_compaction_interval.
  - When compaction is called manually.
- How to choose segments: a greedy algorithm.
Some details:
- Only flushed segments are merged.
- The maximum DML position of the merged segments is used as the DML position of the newly generated segment.
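The proposal names a greedy algorithm without specifying it. One possible reading, shown here as a hypothetical sketch, is next-fit packing over flushed segments sorted by size. Segments are added to the current output until the next one would push it past max_segment_size. The sketch also applies the two details above: only flushed segments are candidates, and each plan takes the maximum DML position of its inputs. The names `segment`, `plan` and `greedyPlans` are assumptions, as is the simplification of a DML position to a `uint64` timestamp.

```go
package main

import (
	"fmt"
	"sort"
)

type segmentState int

const (
	growing segmentState = iota
	flushed
)

// segment is hypothetical metadata; DMLPosition is simplified to a timestamp.
type segment struct {
	ID          int64
	State       segmentState
	Size        int64
	DMLPosition uint64
}

// plan describes one merge: its input segments and the output's DML position.
type plan struct {
	Inputs      []int64
	DMLPosition uint64
}

// greedyPlans packs flushed segments, smallest first, into outputs of at most maxSegmentSize.
func greedyPlans(segs []segment, maxSegmentSize int64) []plan {
	// Only flushed segments are compaction candidates.
	var cands []segment
	for _, s := range segs {
		if s.State == flushed {
			cands = append(cands, s)
		}
	}
	// Smallest first, so more segments can share one output segment.
	sort.Slice(cands, func(i, j int) bool { return cands[i].Size < cands[j].Size })

	var plans []plan
	var cur []segment
	var total int64
	closePlan := func() {
		if len(cur) >= 2 { // a lone segment has nothing to merge with
			p := plan{}
			for _, s := range cur {
				p.Inputs = append(p.Inputs, s.ID)
				// The new segment takes the max DML position of its inputs.
				if s.DMLPosition > p.DMLPosition {
					p.DMLPosition = s.DMLPosition
				}
			}
			plans = append(plans, p)
		}
		cur, total = nil, 0
	}
	for _, s := range cands {
		if total+s.Size > maxSegmentSize {
			closePlan()
		}
		cur = append(cur, s)
		total += s.Size
	}
	closePlan()
	return plans
}

func main() {
	segs := []segment{
		{1, flushed, 100, 10},
		{2, flushed, 200, 30},
		{3, growing, 50, 40}, // skipped: not flushed
		{4, flushed, 150, 20},
		{5, flushed, 300, 25},
	}
	fmt.Printf("%+v\n", greedyPlans(segs, 512)) // prints [{Inputs:[1 4 2] DMLPosition:30}]
}
```

Next-fit is the simplest greedy choice. A first-fit or best-fit variant would usually pack outputs more tightly, at the cost of scanning all open outputs for each segment.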