Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Current state: "Under Discussion"

...

Keywords: string tries hybrid search

Released:

Summary

This article introduces the importance of supporting the String data type. Review the data flow and data model of the Milvus system. Design different StringField implementations for Segments according to different data sources (Streaming or Historical). 
For Streaming Segment, introduce the definition of StreamStringField.
For Histrical Segment, two implementations of HistoricalStringField are introduced.
HistoricalStringField1 -- base on std::vectors and unorder_map
HistoricalStringField2 -- base on marisa trie and std::vector

we also Introduce the processing logic of StringField on different working nodes.

Motivation

The data types currently supported by Milvus do not include the String type. According to the feedback in the previous issue list, the support of the String data type is expected by many users. One of the most urgent requirements of the String type is to support the primary key of the custom String type. Take the image search application as an example. In actual needs, users need to uniquely identify a picture with a string type value. The string type value can be the name of the picture, the md5 value of the picture, and so on. Since Milvus does not support the string data type, users need to make an additional int64 to string mapping externally, which reduces the efficiency and increases the maintenance cost of the entire application.

...