Current state: Under Discussion
ISSUE: https://github.com/milvus-io/milvus/issues/4812
PRs: TODO
Keywords: Collection Alias, Collection Hot Reload
Released: TODO
MEP: https://wiki.lfaidata.foundation/display/MIL/MEP+10+--+Support+Collection+Alias
Summary
- As the name indicates, `CollectionAlias` is an alias to an existing collection.
- The collection alias can be updated to point to a new collection.
- Within `RootCoordinator`, `Proxy`, and all the key components, `CollectionName` and `CollectionAlias` are equivalent. e.g. `MetaTable.GetCollectionByName(collectionName string, ts typeutil.Timestamp)` can receive a `CollectionAlias` and return the corresponding `CollectionInfo`.
- `CollectionAlias` ∩ `CollectionName` = ∅. A `CollectionAlias` cannot collide with existing `CollectionName`s.
Motivation
In recommendation systems, there is a need to update the whole collection (e.g. User embedding) periodically (hourly, daily). To update the whole collection, we can:
1. `upsert` all the items.
   - It will be problematic if the collection is huge.
2. Insert the new data into a new collection B & rename the new collection B as A.
   - In a distributed system, it is pretty costly (Performance, Availability, Complexity) to update the state globally.
As `CollectionAlias` works as an extra pointer to the existing collection in the `RootCoordinator`, we can implement hot reloading at a much lower cost compared to approaches 1 and 2.
Public Interfaces
New Public APIs
There will be 3 new `CollectionAlias`-related APIs: `CreateAlias`, `DropAlias`, and `AlterAlias`.
// milvus.proto
message CreateAliasRequest {
  common.MsgBase base = 1;
  string collection_name = 2;
  string alias = 3;
}
message DropAliasRequest {
  common.MsgBase base = 1;
  string collection_name = 2;
  string alias = 3;
}
message AlterAliasRequest {
  common.MsgBase base = 1;
  string collection_name = 2;
  string alias = 3;
}
service MilvusService {
  // NEW
  rpc CreateAlias(CreateAliasRequest) returns (common.Status) {}
  // NEW
  rpc DropAlias(DropAliasRequest) returns (common.Status) {}
  // NEW
  rpc AlterAlias(AlterAliasRequest) returns (common.Status) {}
}
Changes to Existing APIs
- Users can't drop a collection if the collection is referenced by an alias.
- `DescribeCollection` now returns `aliases`.
message DescribeCollectionResponse {
  common.Status status = 1;
  schema.CollectionSchema schema = 2;
  int64 collectionID = 3;
  repeated string virtual_channel_names = 4;
  repeated string physical_channel_names = 5;
  uint64 created_timestamp = 6;
  uint64 created_utc_timestamp = 7;
  // NEW
  repeated string aliases = 8;
}
service MilvusService {
  // Users are required to drop the aliases first before dropping the collection.
  rpc DropCollection(DropCollectionRequest) returns (common.Status) {}
  // DescribeCollectionResponse contains the `aliases` that refer to this collection.
  rpc DescribeCollection(DescribeCollectionRequest) returns (DescribeCollectionResponse) {}
}
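The two behavior changes above can be sketched against an in-memory stand-in for the metadata. This is only a minimal sketch, not the RootCoord implementation: `aliasIndex`, `aliasesOf`, and `dropCollection` are hypothetical names, and a plain `int64` stands in for `typeutil.UniqueID`.

```go
package main

import (
	"fmt"
	"sort"
)

// aliasIndex is a hypothetical, simplified stand-in for the metadata:
// collection name -> collection ID and alias -> collection ID.
type aliasIndex struct {
	collName2ID  map[string]int64
	collAlias2ID map[string]int64
}

// aliasesOf returns, in sorted order, every alias that refers to collID.
// This is the list DescribeCollection would expose as `aliases`.
func (ix *aliasIndex) aliasesOf(collID int64) []string {
	aliases := []string{}
	for alias, id := range ix.collAlias2ID {
		if id == collID {
			aliases = append(aliases, alias)
		}
	}
	sort.Strings(aliases)
	return aliases
}

// dropCollection refuses to drop a collection that is still referenced by
// at least one alias; the caller has to drop those aliases first.
func (ix *aliasIndex) dropCollection(name string) error {
	id, ok := ix.collName2ID[name]
	if !ok {
		return fmt.Errorf("can't find collection: %s", name)
	}
	if aliases := ix.aliasesOf(id); len(aliases) > 0 {
		return fmt.Errorf("collection %s is referenced by aliases %v", name, aliases)
	}
	delete(ix.collName2ID, name)
	return nil
}

func main() {
	ix := &aliasIndex{
		collName2ID:  map[string]int64{"user_embedding_v1": 1},
		collAlias2ID: map[string]int64{"user_embedding": 1},
	}
	fmt.Println("aliases:", ix.aliasesOf(1))
	fmt.Println("drop while aliased:", ix.dropCollection("user_embedding_v1"))

	delete(ix.collAlias2ID, "user_embedding") // roughly what DropAlias would do
	fmt.Println("drop after DropAlias:", ix.dropCollection("user_embedding_v1"))
}
```

Scanning the alias map on every call keeps the sketch short; a real implementation might prefer a reverse index if collections carry many aliases.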
Design Details
Changes to the MetaTable
type metaTable struct {
	client       kv.SnapShotKV                       // client of a reliable kv service, i.e. etcd client
	tenantID2Meta map[typeutil.UniqueID]pb.TenantMeta     // tenant id to tenant meta
	proxyID2Meta  map[typeutil.UniqueID]pb.ProxyMeta      // proxy id to proxy meta
	collID2Meta   map[typeutil.UniqueID]pb.CollectionInfo // collection_id -> meta
	collName2ID   map[string]typeutil.UniqueID            // collection name to collection id
	// NEW
	collAlias2ID  map[string]typeutil.UniqueID            // collection alias to collection id
	...
}
As `CollectionAlias` & `CollectionName` are equivalent, `GetCollectionByName` also checks `metaTable.collAlias2ID` when getting the collection by name.
func (mt *metaTable) GetCollectionByName(collectionName string, ts typeutil.Timestamp) (*pb.CollectionInfo, error) {
	mt.ddLock.RLock()
	defer mt.ddLock.RUnlock()
	if ts == 0 {
		vid, ok := mt.collName2ID[collectionName]
		if !ok {
			// NEW: fall back to the alias map when the name is not a collection name
			if vid, ok = mt.collAlias2ID[collectionName]; !ok {
				return nil, fmt.Errorf("can't find collection: %s", collectionName)
			}
		}
		...
}
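The lookup order for the latest timestamp (`ts == 0`) can be tried out in isolation with the sketch below: collection name first, alias second. `miniMetaTable` and `resolveID` are hypothetical names introduced for illustration; the real method returns `*pb.CollectionInfo` and also handles non-zero timestamps, both of which are left out here.

```go
package main

import (
	"fmt"
	"sync"
)

// miniMetaTable is a hypothetical, trimmed-down metaTable that keeps only
// the two maps involved in name resolution. int64 stands in for UniqueID.
type miniMetaTable struct {
	ddLock       sync.RWMutex
	collName2ID  map[string]int64
	collAlias2ID map[string]int64
}

// resolveID treats its argument as a collection name first and falls back
// to the alias map only when no collection has that name.
func (mt *miniMetaTable) resolveID(nameOrAlias string) (int64, error) {
	mt.ddLock.RLock()
	defer mt.ddLock.RUnlock()

	if id, ok := mt.collName2ID[nameOrAlias]; ok {
		return id, nil
	}
	if id, ok := mt.collAlias2ID[nameOrAlias]; ok {
		return id, nil
	}
	return 0, fmt.Errorf("can't find collection: %s", nameOrAlias)
}

func main() {
	mt := &miniMetaTable{
		collName2ID:  map[string]int64{"user_embedding_v1": 1},
		collAlias2ID: map[string]int64{"user_embedding": 1},
	}
	for _, key := range []string{"user_embedding_v1", "user_embedding", "missing"} {
		id, err := mt.resolveID(key)
		fmt.Printf("%q -> id=%d err=%v\n", key, id, err)
	}
}
```

Because aliases and collection names never overlap, the order of the two map lookups does not change the result; checking names first simply keeps the common path to a single lookup.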
`CollectionAlias` also has to be persisted in `etcd`.
const (
	ComponentPrefix        = "root-coord"
	TenantMetaPrefix       = ComponentPrefix + "/tenant"
	ProxyMetaPrefix        = ComponentPrefix + "/proxy"
	CollectionMetaPrefix   = ComponentPrefix + "/collection"
	SegmentIndexMetaPrefix = ComponentPrefix + "/segment-index"
	IndexMetaPrefix        = ComponentPrefix + "/index"
	// NEW Additions
	CollectionAliasMetaPrefix = ComponentPrefix + "/collection-alias"
)
For persistence in `etcd`, the key will be `fmt.Sprintf("%s/%s", CollectionAliasMetaPrefix, CollectionAlias)` and the value will be the `CollectionID`.
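As an illustration of the key layout described above, the following sketch builds the key/value pair for one alias. `aliasKey` and `aliasValue` are hypothetical helpers, and encoding the ID as a decimal string is an assumption: the proposal only states that the value is the `CollectionID`.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	ComponentPrefix           = "root-coord"
	CollectionAliasMetaPrefix = ComponentPrefix + "/collection-alias"
)

// aliasKey builds the etcd key for an alias using the format from the text.
func aliasKey(alias string) string {
	return fmt.Sprintf("%s/%s", CollectionAliasMetaPrefix, alias)
}

// aliasValue encodes the collection ID as a decimal string.
// Assumption: the on-disk encoding is not specified by the proposal.
func aliasValue(collID int64) string {
	return strconv.FormatInt(collID, 10)
}

func main() {
	// prints: root-coord/collection-alias/user_embedding => 42
	fmt.Println(aliasKey("user_embedding"), "=>", aliasValue(42))
}
```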
Changes to the RootCoordinator
// root_coord.proto
service RootCoord {
  // NEW
  // CreateAlias creates a 1 to 1 mapping between `alias` and `collection_name`.
  // 1. If there is no `alias` in the metaTable:
  //   1.1 The new `alias` will be added to the metaTable.
  //   1.2 The `alias` will be persisted in `etcd`.
  //   1.3 `dd_op` will be sent to the log broker.
  // 2. If there is an `alias/collection` in the metaTable:
  //   2.1 An `alias/collection already exists` error will be returned.
  rpc CreateAlias(milvus.CreateAliasRequest) returns (common.Status) {}
  // NEW
  // 1. DropAlias
  //   1.1 Removes the mapping from the metaTable.
  //   1.2 Removes the mapping from `etcd`.
  //   1.3 `dd_op` will be sent to the log broker.
  //   1.4 Invalidates proxy caches.
  rpc DropAlias(milvus.DropAliasRequest) returns (common.Status) {}
  // NEW
  // 1. AlterAlias
  //   1.1 The mapping will be updated in the metaTable.
  //   1.2 The mapping will be updated in `etcd`.
  //   1.3 `dd_op` will be sent to the log broker.
  //   1.4 Invalidates proxy caches.
  rpc AlterAlias(milvus.AlterAliasRequest) returns (common.Status) {}
  // UPDATE REQUIRED
  // A collection can't be dropped if it is referenced by an `alias`.
  rpc DropCollection(DropCollectionRequest) returns (common.Status) {}
  // UPDATE REQUIRED
  // DescribeCollection now returns `aliases`.
  rpc DescribeCollection(DescribeCollectionRequest) returns (DescribeCollectionResponse) {}
}
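The in-memory part of the three new RPCs (step 1.1 of each) might look like the sketch below. It is a simplified illustration under stated assumptions, not the RootCoord code: `aliasMeta` and its methods are hypothetical, the `etcd` write, the `dd_op` message, and the proxy cache invalidation are omitted, and the rule that `alterAlias` and `dropAlias` fail on an unknown alias is an assumption the proposal does not spell out.

```go
package main

import (
	"errors"
	"fmt"
)

// aliasMeta is a hypothetical in-memory slice of the metaTable.
type aliasMeta struct {
	collName2ID  map[string]int64
	collAlias2ID map[string]int64
}

var errAliasExists = errors.New("alias/collection already exists")

// lookupCollection resolves a collection name to its ID.
func (m *aliasMeta) lookupCollection(collection string) (int64, error) {
	id, ok := m.collName2ID[collection]
	if !ok {
		return 0, fmt.Errorf("can't find collection: %s", collection)
	}
	return id, nil
}

// createAlias adds a new mapping. It rejects an alias that is already in
// use as an alias or as a collection name, so nothing is overridden.
func (m *aliasMeta) createAlias(collection, alias string) error {
	if _, ok := m.collAlias2ID[alias]; ok {
		return errAliasExists
	}
	if _, ok := m.collName2ID[alias]; ok {
		return errAliasExists
	}
	id, err := m.lookupCollection(collection)
	if err != nil {
		return err
	}
	m.collAlias2ID[alias] = id
	return nil
}

// alterAlias repoints an existing alias to another collection.
// Assumption: altering an alias that does not exist is an error.
func (m *aliasMeta) alterAlias(collection, alias string) error {
	if _, ok := m.collAlias2ID[alias]; !ok {
		return fmt.Errorf("alias does not exist: %s", alias)
	}
	id, err := m.lookupCollection(collection)
	if err != nil {
		return err
	}
	m.collAlias2ID[alias] = id
	return nil
}

// dropAlias removes the mapping for an existing alias.
func (m *aliasMeta) dropAlias(alias string) error {
	if _, ok := m.collAlias2ID[alias]; !ok {
		return fmt.Errorf("alias does not exist: %s", alias)
	}
	delete(m.collAlias2ID, alias)
	return nil
}

func main() {
	m := &aliasMeta{
		collName2ID:  map[string]int64{"user_embedding_v1": 1, "user_embedding_v2": 2},
		collAlias2ID: map[string]int64{},
	}
	fmt.Println(m.createAlias("user_embedding_v1", "user_embedding"))
	fmt.Println(m.createAlias("user_embedding_v2", "user_embedding")) // rejected, no silent override
	fmt.Println(m.alterAlias("user_embedding_v2", "user_embedding"))  // explicit repoint
	fmt.Println(m.collAlias2ID["user_embedding"])
}
```

Keeping creation and alteration as separate operations means an existing alias can only be repointed by an explicit `alterAlias` call.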
Compatibility, Deprecation, and Migration Plan
The `CollectionAlias` design is not intrusive, so there won't be compatibility issues.
Test Plan
- Unit tests
Rejected Alternatives
- Rejected the Physical Collection Rename design due to:
  - high implementation complexity
  - potentially undesirable performance characteristics
- Rejected the `SetAlias` API
  - `SetAlias` combines the semantics of `CreateAlias` & `AlterAlias`, but it may behave unexpectedly for the user. e.g. If the alias was already in the `metaTable`, there is a risk that it may be overridden unexpectedly.
References
- Elasticsearch provides a similar abstraction: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html