Current state: Under Discussion
...
Keywords: etcd, datacoord, datanode
Released:
Summary
DataCoord registers channels in etcd, and DataNode watches etcd to perform the watch/release operations.
Motivation
There are several problems when DataCoord sends the WatchDmChannel request to the DataNode through gRPC:
...
Etcd key: channel/[nodeID]/[channelName], value: ChannelInfo
ChannelInfo contains State, StartTime, and SeekPosition. State is an enum whose values are Unwatched and Watched; it indicates whether the DataNode has watched the channel successfully.
StartTime is the watch event start time.
When a new channel is registered, DataCoord updates channel/[nodeID]/[channelName].
...
- When DataCoord starts, the channels of offline DataNodes are assigned to the currently online nodes.
- When a DataNode comes online, DataCoord may move some channels to that node, changing the channels of the affected nodes in a single etcd transaction.
- When a DataNode goes offline, DataCoord reassigns its channels to other nodes, again through an etcd transaction.
- As a special case, if the last DataNode goes offline and no DataNode is alive, DataCoord records each channel under channel/remaining/[channelName].
- DataCoord starts a background goroutine to check the states of channels. If a channel's state has not changed to Watched for a long time, the channel may need to be reallocated to another node atomically.
DataNode:
- When a DataNode starts, there must be no channels registered under its nodeID in etcd, because the nodeID is incremented.
- When a DataNode receives an Add event, it executes WatchChannel and transactionally changes the state of the channel in etcd to Watched.
- When a DataNode receives a Delete event, it executes ReleaseChannel.
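The DataNode side can be sketched as an event handler. The Store interface, the memStore test double, and the rollback on a failed state update are assumptions made for this example; the proposal only states that the state change is transactional. MarkWatched stands in for an etcd transaction that sets the state to Watched only if the key still exists.

```go
package main

import "errors"

// EventType distinguishes the two etcd watch events the DataNode reacts to.
type EventType int

const (
	EventAdd EventType = iota
	EventDelete
)

// Event is a simplified watch event for a key under channel/[nodeID]/.
type Event struct {
	Type        EventType
	ChannelName string
}

// Store is a hypothetical abstraction over the etcd state of this node.
type Store interface {
	// MarkWatched flips the channel state to Watched only if the key
	// still exists; it stands in for the transactional update.
	MarkWatched(channelName string) error
}

var errKeyGone = errors.New("channel key no longer assigned to this node")

// memStore is an in-memory Store: channelName -> watched flag.
type memStore map[string]bool

func (m memStore) MarkWatched(channelName string) error {
	if _, ok := m[channelName]; !ok {
		return errKeyGone
	}
	m[channelName] = true
	return nil
}

// Node tracks which channels this DataNode is currently consuming.
type Node struct {
	store    Store
	watching map[string]bool
}

func NewNode(s Store) *Node {
	return &Node{store: s, watching: map[string]bool{}}
}

// Handle applies one event: Add -> WatchChannel plus state update,
// Delete -> ReleaseChannel.
func (n *Node) Handle(ev Event) error {
	switch ev.Type {
	case EventAdd:
		n.watching[ev.ChannelName] = true // WatchChannel
		if err := n.store.MarkWatched(ev.ChannelName); err != nil {
			// Assumed behavior: undo the local watch if the state
			// update fails, e.g. the channel was reassigned meanwhile.
			delete(n.watching, ev.ChannelName)
			return err
		}
	case EventDelete:
		delete(n.watching, ev.ChannelName) // ReleaseChannel
	default:
		return errors.New("unknown event type")
	}
	return nil
}
```

Making the state update conditional on the key still existing keeps a DataNode from marking a channel Watched after DataCoord has already moved it elsewhere.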
...