Current state:

ISSUE:

PRs:

Keywords: delete

Released:

Summary

This document describes how delete is implemented in Milvus. Milvus provides a new delete API that users can call to delete entities from a collection.

Motivation

In some scenarios, users want to delete entities from a collection so that they no longer appear in search results. Currently, users can only filter unwanted entities out of the search results manually. We want to add a function that lets users delete entities from a collection.

Public Interfaces

def delete(self, condition=None)->MutationResult:
    """
    Delete entities by primary keys.
    Example: client.delete("_id in [1,10,100]")
    
    :param condition: an expression that indicates which entities should be deleted
    :type  condition: str
    """
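
As an illustration of the `condition` argument, here is a minimal sketch of how an expression like the one in the docstring could be turned into a list of primary keys. The helper `parse_delete_condition` is hypothetical and is not part of the Milvus API; it assumes the only supported form is `<field> in [<integers>]`.

```python
import ast
import re
from typing import List, Tuple

# Hypothetical helper, not part of Milvus: accepts only expressions of the
# form "<field> in [<int>, <int>, ...]", as in the docstring example.
_CONDITION_RE = re.compile(r"^\s*(\w+)\s+in\s+(\[.*\])\s*$")


def parse_delete_condition(condition: str) -> Tuple[str, List[int]]:
    """Return (field_name, primary_keys) or raise ValueError."""
    match = _CONDITION_RE.match(condition)
    if match is None:
        raise ValueError(f"unsupported delete condition: {condition!r}")
    field, raw_list = match.groups()
    try:
        # literal_eval only evaluates Python literals, never arbitrary code.
        keys = ast.literal_eval(raw_list)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"malformed key list: {raw_list!r}") from exc
    if not isinstance(keys, list) or not all(type(k) is int for k in keys):
        raise ValueError("primary keys must be a list of integers")
    return field, keys
```

For example, `parse_delete_condition("_id in [1,10,100]")` yields the field name `_id` and the keys `1`, `10`, and `100`; any other expression shape is rejected with a `ValueError`.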

Design Details

Because Milvus storage is append-only, delete is implemented as a soft delete: a flag is set on the existing data to mark it as deleted.

This approach requires the algorithm library to support searching with a bitset, and requires Milvus to record the offsets of deleted entities. The algorithm library, Knowhere, already supports searching with a bitset that indicates whether each entity is deleted, so this document focuses on how to store the deleted primary keys.
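
To make the bitset idea concrete, the following is a minimal brute-force sketch of a search that skips soft-deleted offsets. It does not use Knowhere; the function `search_with_bitset`, the squared L2 distance, and the list-of-bools bitset are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def search_with_bitset(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    deleted: Sequence[bool],
    top_k: int,
) -> List[int]:
    """Return offsets of the top_k nearest vectors whose deleted bit is unset."""
    scored = []
    for offset, vec in enumerate(vectors):
        if deleted[offset]:
            # Soft delete: the row is still stored, it is only skipped here.
            continue
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, offset))
    scored.sort()
    return [offset for _, offset in scored[:top_k]]
```

The stored vectors are never modified; flipping one bit in `deleted` is enough to hide an entity from subsequent searches, which is what makes the approach compatible with append-only storage.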

Proposal: persisting delete operations

  1. DataNode subscribes to the insert channel
  2. Proxy receives a delete request and splits it across insert channels by primary key
  3. DataNode receives the delete request from the insert channel, saves it in a buffer, and writes it to the delta channel
  4. ...
  5. DataNode receives a flush request and writes out the buffered deletions
  6. DataNode notifies IndexNode to build indexes
  7. finish
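
The steps above can be sketched roughly as follows. This is an in-memory illustration only: `split_by_channel`, `DataNodeSketch`, and the modulo-based routing of integer primary keys are assumptions, and the elided step 4 is not modelled.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_channel(primary_keys: List[int], channels: List[str]) -> Dict[str, List[int]]:
    """Proxy side (step 2): route each key to one insert channel.

    Modulo routing is an assumption; the document only says "by primary keys".
    """
    routed: Dict[str, List[int]] = defaultdict(list)
    for pk in primary_keys:
        routed[channels[pk % len(channels)]].append(pk)
    return dict(routed)


class DataNodeSketch:
    """Hypothetical stand-in for a DataNode; lists replace real channels and storage."""

    def __init__(self) -> None:
        self.delete_buffer: List[int] = []
        self.delta_channel: List[int] = []
        self.flushed: List[List[int]] = []

    def on_delete(self, primary_keys: List[int]) -> None:
        # Step 3: buffer the deletions and forward them to the delta channel.
        self.delete_buffer.extend(primary_keys)
        self.delta_channel.extend(primary_keys)

    def on_flush(self) -> None:
        # Step 5: write out whatever has been buffered, then clear the buffer.
        if self.delete_buffer:
            self.flushed.append(list(self.delete_buffer))
            self.delete_buffer.clear()
```

Routing by primary key means every deletion for a given key lands on the same channel, so a single DataNode sees all deletions for the entities it may hold.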

Proposal: delete operations serving search (sealed + growing)

  1. QueryNode subscribes to the insert channel
  2. QueryNode loads the checkpoint and recovers from it
  3. Proxy receives a delete request and splits it across insert channels by primary key
  4. QueryNode receives the delete request from the insert channel, determines the segment to which each deletion belongs, and updates the Inverted Delta Logs (IDL)
  5. ...
  6. QueryNode receives a search request and searches each segment
  7. finish
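
Step 4 can be pictured with the sketch below. This document does not yet define the layout of the Inverted Delta Logs, so the structure used here (a map from segment ID to the set of deleted offsets) is purely an assumption, as are the class `QueryNodeSketch` and its in-memory primary-key index.

```python
from typing import Dict, List, Set, Tuple


class QueryNodeSketch:
    """Hypothetical QueryNode holding segments as lists of primary keys."""

    def __init__(self, segments: Dict[int, List[int]]) -> None:
        # segment ID -> primary keys stored in that segment, in offset order
        self.segments = segments
        # primary key -> (segment ID, offset), used to locate each deletion
        self.pk_index: Dict[int, Tuple[int, int]] = {
            pk: (seg_id, offset)
            for seg_id, pks in segments.items()
            for offset, pk in enumerate(pks)
        }
        # Assumed IDL shape: segment ID -> offsets marked as deleted
        self.idl: Dict[int, Set[int]] = {seg_id: set() for seg_id in segments}

    def on_delete(self, primary_keys: List[int]) -> None:
        for pk in primary_keys:
            location = self.pk_index.get(pk)
            if location is None:
                continue  # key is not held by this node
            seg_id, offset = location
            self.idl[seg_id].add(offset)

    def bitset(self, seg_id: int) -> List[bool]:
        """Per-segment deleted bits that a bitset-aware search could consume."""
        deleted = self.idl[seg_id]
        return [offset in deleted for offset in range(len(self.segments[seg_id]))]
```

Keeping the deleted offsets grouped by segment means a search over one segment only has to consult that segment's entry to build its bitset.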

Proposal: delete operations serving search (sealed only)

  1. QueryNode subscribes to the delta channel
  2. QueryNode loads the checkpoint and recovers from it
  3. Proxy receives a delete request and splits it across insert channels
  4. DataNode filters out all delete requests and writes them to the delta channel
  5. QueryNode receives delete requests from the delta channel, determines the segment to which each deletion belongs, and updates the Inverted Delta Logs (IDL)
  6. ...
  7. QueryNode receives a search request and searches each segment
  8. finish
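
The difference from the sealed + growing case is step 4, where DataNode extracts the delete requests from the mixed insert channel. A minimal sketch, assuming messages carry a `kind` tag of `"insert"` or `"delete"` (the `ChannelMessage` type and `forward_deletes` function are hypothetical):

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ChannelMessage:
    kind: str                       # "insert" or "delete" (assumed tags)
    primary_keys: Tuple[int, ...]


def forward_deletes(insert_channel: Iterable[ChannelMessage]) -> List[ChannelMessage]:
    """Keep only delete messages, in their original order, for the delta channel."""
    return [msg for msg in insert_channel if msg.kind == "delete"]
```

A QueryNode that serves only sealed segments can then subscribe to the delta channel alone and never has to read or discard insert traffic.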

Processing delete operations during system recovery

Unaffected

SegmentFilter

...

DeltaLog

...

Inverted delta logs

...

Bitset

...

Test Plan

Testcase1

Search for a deleted entity and expect it not to be in the result set

client.insert()
client.search()
client.delete()
client.search()
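
One way this test case might look once written out is sketched below. `FakeClient` is a hypothetical in-memory stand-in, not the real Milvus SDK, and its method signatures are assumptions made so the test can run on its own.

```python
from typing import Dict, List, Sequence, Set


class FakeClient:
    """Hypothetical in-memory client used only to illustrate the test flow."""

    def __init__(self) -> None:
        self._rows: Dict[int, Sequence[float]] = {}
        self._deleted: Set[int] = set()

    def insert(self, rows: Dict[int, Sequence[float]]) -> None:
        self._rows.update(rows)

    def delete(self, primary_keys: List[int]) -> None:
        # Soft delete: rows stay in storage and are only flagged.
        self._deleted.update(pk for pk in primary_keys if pk in self._rows)

    def search(self, query: Sequence[float], top_k: int) -> List[int]:
        live = [
            (sum((a - b) ** 2 for a, b in zip(vec, query)), pk)
            for pk, vec in self._rows.items()
            if pk not in self._deleted
        ]
        return [pk for _, pk in sorted(live)[:top_k]]


def test_deleted_entity_not_in_results() -> None:
    client = FakeClient()
    client.insert({1: [0.0, 0.0], 10: [1.0, 1.0], 100: [5.0, 5.0]})
    assert 1 in client.search([0.0, 0.0], top_k=2)      # found before delete
    client.delete([1])
    assert 1 not in client.search([0.0, 0.0], top_k=2)  # hidden after delete
```

The first search confirms the entity is actually reachable, so the second assertion cannot pass merely because the entity was never a candidate.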

Rejected Alternatives


References

