Current state:
ISSUE:
PRs:
Keywords: delete
Released:
Summary
This document describes how delete is implemented in Milvus. Milvus provides a new delete API that users can use to delete entities from a collection.
Motivation
In some scenarios, users want to delete entities from a collection so that they no longer appear in search results. Currently, users can only filter unwanted entities out of the search results manually. We propose a new function that allows users to delete entities from a collection.
Public Interfaces
def delete(self, condition=None) -> MutationResult:
    """
    Delete entities by primary keys.

    Example:
        client.delete("_id in [1,10,100]")

    :param condition: an expression indicating whether an entity should be deleted
    :type condition: str
    """
Design Details
Because Milvus storage is append-only, delete is implemented as a soft delete: a flag is set on the existing data to indicate that it has been deleted.
This approach requires the algorithm library to support searching with a bitset, and requires Milvus to record the offsets of deleted entities. The algorithm library Knowhere already supports searching with a bitset that indicates whether each entity is deleted, so this section discusses how to store the deleted primary keys.
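To make the soft-delete idea concrete, here is a minimal sketch of a brute-force search that skips rows whose bit is set. It is not Knowhere's interface: `search_with_bitset`, the list-of-booleans bitset, and the squared L2 distance are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_bitset(
    vectors: List[List[float]],
    deleted: List[bool],
    query: Sequence[float],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Return (offset, distance) for the top_k nearest rows whose bit is not set."""
    candidates = [
        (offset, l2_distance(vector, query))
        for offset, vector in enumerate(vectors)
        if not deleted[offset]  # a set bit means "soft deleted": skip the row
    ]
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:top_k]


vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
deleted = [False, False, False]  # hypothetical bitset, one flag per row offset

before = search_with_bitset(vectors, deleted, [0.0, 0.0], top_k=2)
deleted[0] = True  # soft delete: the row stays in storage, only the flag changes
after = search_with_bitset(vectors, deleted, [0.0, 0.0], top_k=2)
```

In this sketch the stored vectors are never modified; only the bitset changes, which is what lets the approach coexist with append-only storage.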
Proposal: persisting delete operations
- DataNode subscribes to the insert channel
- Proxy receives a delete request and splits it across the insert channels by primary key
- DataNode receives a delete request from the insert channel, saves it in a buffer, and writes it to the delta channel
- ...
- DataNode receives a flush request and writes out the deletions saved above
- DataNode notifies IndexNode to build indexes
- finish
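The persistence steps above can be sketched as follows. Everything here is hypothetical: the CRC32-based channel assignment, the `DeleteBuffer` class, and the in-memory lists standing in for the delta channel and the persisted delta logs are assumptions, not the Milvus implementation.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_channel(primary_keys: List[int], channels: List[str]) -> Dict[str, List[int]]:
    """Proxy side: group primary keys by an assumed hash-based channel assignment."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pk in primary_keys:
        index = zlib.crc32(str(pk).encode()) % len(channels)
        grouped[channels[index]].append(pk)
    return dict(grouped)


class DeleteBuffer:
    """Hypothetical DataNode-side buffer for the deletions of one insert channel."""

    def __init__(self) -> None:
        self.pending: List[Tuple[int, int]] = []        # (primary key, timestamp)
        self.flushed: List[List[Tuple[int, int]]] = []  # stands in for persisted delta logs

    def on_delete(self, pks: List[int], timestamp: int, delta_channel: List[tuple]) -> None:
        """Buffer the deletions and forward them to the delta channel."""
        self.pending.extend((pk, timestamp) for pk in pks)
        delta_channel.append((list(pks), timestamp))

    def flush(self) -> int:
        """Write out buffered deletions; return how many were written."""
        written = len(self.pending)
        if self.pending:
            self.flushed.append(list(self.pending))
            self.pending.clear()
        return written


channels = ["insert-ch-0", "insert-ch-1"]  # hypothetical channel names
routed = split_by_channel([1, 10, 100], channels)
buffers = {channel: DeleteBuffer() for channel in channels}
delta_channel: List[tuple] = []
for channel, pks in routed.items():
    buffers[channel].on_delete(pks, timestamp=42, delta_channel=delta_channel)
flushed_count = sum(buffer.flush() for buffer in buffers.values())
```

Hashing on the primary key keeps every deletion for a given key on the same channel, so the DataNode that owns the channel sees inserts and deletes for that key in one stream.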
Proposal: delete operations serving search (sealed + growing)
- QueryNode subscribes to the insert channel
- QueryNode loads the checkpoint and recovers from it
- Proxy receives a delete request and splits it across the insert channels by primary key
- QueryNode retrieves a delete request from the insert channel, determines the segment to which each deletion belongs, and updates the Inverted Delta Logs (IDL)
- ...
- QueryNode receives a search request and searches each segment
- finish
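As a rough illustration of the flow above, the sketch below models the Inverted Delta Logs as a mapping from segment ID to the set of deleted primary keys. This document leaves the actual IDL layout unspecified, so that structure, the `QueryNodeSketch` class, and the linear scan used to find the owning segment are all assumptions.

```python
from typing import Dict, List, Sequence, Set, Tuple


class QueryNodeSketch:
    """Hypothetical QueryNode holding segments and per-segment deletion sets."""

    def __init__(self) -> None:
        # segment id -> {primary key: vector}
        self.segments: Dict[int, Dict[int, List[float]]] = {}
        # assumed IDL layout: segment id -> deleted primary keys
        self.inverted_delta_logs: Dict[int, Set[int]] = {}

    def load_segment(self, segment_id: int, rows: Dict[int, List[float]]) -> None:
        self.segments[segment_id] = dict(rows)
        self.inverted_delta_logs.setdefault(segment_id, set())

    def apply_delete(self, primary_keys: List[int]) -> None:
        """Find the segment that owns each key and record the deletion there."""
        for pk in primary_keys:
            for segment_id, rows in self.segments.items():
                if pk in rows:
                    self.inverted_delta_logs[segment_id].add(pk)

    def search(self, query: Sequence[float], top_k: int) -> List[Tuple[int, float]]:
        """Search every segment, skipping keys recorded as deleted."""
        hits: List[Tuple[int, float]] = []
        for segment_id, rows in self.segments.items():
            deleted = self.inverted_delta_logs[segment_id]
            for pk, vector in rows.items():
                if pk in deleted:
                    continue
                distance = sum((a - b) ** 2 for a, b in zip(vector, query))
                hits.append((pk, distance))
        hits.sort(key=lambda hit: hit[1])
        return hits[:top_k]


node = QueryNodeSketch()
node.load_segment(100, {1: [0.0, 0.0], 2: [1.0, 0.0]})  # e.g. a sealed segment
node.load_segment(200, {3: [0.0, 1.0], 4: [3.0, 3.0]})  # e.g. a growing segment
node.apply_delete([1, 4])
results = node.search([0.0, 0.0], top_k=3)
```

A real node would presumably use an index from primary key to segment rather than scanning every segment; the scan is only there to keep the sketch short.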
Proposal: delete operations serving search (sealed only)
- QueryNode subscribes to the delta channel
- QueryNode loads the checkpoint and recovers from it
- Proxy receives a delete request and splits it across the insert channels
- DataNode filters out all delete requests and writes them to the delta channel
- QueryNode retrieves delete requests from the delta channel, determines the segment to which each deletion belongs, and updates the Inverted Delta Logs (IDL)
- ...
- QueryNode receives a search request and searches each segment
- finish
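The sealed-only variant differs mainly in where the QueryNode reads deletions from. The sketch below shows one way the forwarding and the checkpoint replay could look. The message dictionaries, the list-based channels, and a checkpoint expressed as "number of delta-channel messages already applied" are assumptions made for illustration.

```python
from typing import Dict, List


def forward_deletes(insert_channel: List[Dict], delta_channel: List[Dict]) -> int:
    """DataNode side: copy delete messages from the insert channel to the delta channel."""
    forwarded = 0
    for message in insert_channel:
        if message["type"] == "delete":
            delta_channel.append(message)
            forwarded += 1
    return forwarded


def replay_deletes(delta_channel: List[Dict], checkpoint: int) -> List[int]:
    """QueryNode side: collect primary keys from messages at or after the checkpoint."""
    primary_keys: List[int] = []
    for message in delta_channel[checkpoint:]:
        primary_keys.extend(message["primary_keys"])
    return primary_keys


insert_channel = [
    {"type": "insert", "primary_keys": [1, 2, 3]},
    {"type": "delete", "primary_keys": [1]},
    {"type": "insert", "primary_keys": [4]},
    {"type": "delete", "primary_keys": [3, 4]},
]
delta_channel: List[Dict] = []

forwarded = forward_deletes(insert_channel, delta_channel)
all_deletes = replay_deletes(delta_channel, checkpoint=0)  # fresh QueryNode
recovered = replay_deletes(delta_channel, checkpoint=1)    # first message already applied
```

Because the delta channel carries only deletions, a QueryNode serving sealed segments does not have to consume insert traffic it has no use for.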
Process delete operation in system recovery
Unaffected
SegmentFilter
...
DeltaLog
...
Inverted delta logs
...
Bitset
...
Test Plan
Testcase1
Search for a deleted entity; expect it not to be in the result set
client.insert()
client.search()
client.delete()
client.search()
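A self-contained version of this test case might look like the sketch below. `InMemoryClient` is a hypothetical stand-in, not the Milvus SDK: the `insert` and `search` signatures are invented for illustration, and `delete` handles only the `<field> in [...]` form shown in the Public Interfaces example.

```python
import ast
from typing import Dict, List, Sequence, Set


class InMemoryClient:
    """Hypothetical in-memory stand-in for a client; not the real SDK."""

    def __init__(self) -> None:
        self.rows: Dict[int, List[float]] = {}
        self.deleted: Set[int] = set()

    def insert(self, ids: List[int], vectors: List[List[float]]) -> None:
        for pk, vector in zip(ids, vectors):
            self.rows[pk] = vector

    def delete(self, condition: str) -> int:
        """Soft delete the keys listed in a '<field> in [...]' expression."""
        _field, _, literal = condition.partition(" in ")
        primary_keys = ast.literal_eval(literal)
        hit = [pk for pk in primary_keys if pk in self.rows and pk not in self.deleted]
        self.deleted.update(hit)
        return len(hit)

    def search(self, query: Sequence[float], top_k: int) -> List[int]:
        """Return the primary keys of the top_k nearest entities that are not deleted."""
        live = [
            (pk, sum((a - b) ** 2 for a, b in zip(vector, query)))
            for pk, vector in self.rows.items()
            if pk not in self.deleted
        ]
        live.sort(key=lambda pair: pair[1])
        return [pk for pk, _ in live[:top_k]]


client = InMemoryClient()
client.insert([1, 2, 3], [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
before = client.search([0.0, 0.0], top_k=2)
delete_count = client.delete("_id in [1]")
after = client.search([0.0, 0.0], top_k=2)

assert 1 in before
assert 1 not in after  # the deleted entity must not come back
```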