2020-12-02 CM - File Lineage

Date

Attendees

Goals

  • Share use cases and design of Egeria's support for lineage of file processing

Discussion items

Time | Item | Who | Notes

5 mins | Welcome | All
45 mins | File Lineage | Mandy

Attachment reviewed: lineage.drawio

  • A unique challenge with files is that the Asset you're interested in could be at various levels:
    • it could be just a file itself (without much concern for the folder it appears within)
    • or it could be the entire contents of a folder that is of interest, e.g. if the folder contains rolling logs, or a number of files that each contain a daily snapshot of information (DataFolder)
    • it is important to treat these distinctly, for several reasons:
      • they behave differently in lineage, e.g. a DataFolder should appear as one holistic dataset that happens to have daily snapshots within it, rather than fanning out into many daily snapshot files
      • they need different connector types to read their contents (one reads a file directly, the other needs to combine the contents of all the files in the folder)
    • but it is also important to ensure that DataFolder extends FileFolder, so that a given instance is both and can be consumed either way, depending on the user and their needs in accessing it
  • Discussed how to classify elements (such as files) that may be deleted but still need to be present in the lineage
    • for example, a landing file that is picked up by a process and moved elsewhere – we need to know about the landing file (even though it's been deleted) to show the lineage all the way back to its ultimate source
    • we believe there is a relatively common industry term to represent this concept: Tombstone
    • however, this could be considered a sensitive trigger term, so we favour a different term for classifying such elements in lineage: the suggestion is Memento
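
The two ideas discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Egeria's API: the names DataFolder, FileFolder and Memento come from the discussion, while the DataFile class and the functions lineage_nodes, retire and trace_to_source are hypothetical names invented for this sketch. It shows a DataFolder that is still a FileFolder but appears as a single lineage node, and a deleted landing file that is classified rather than removed, so lineage can still be traced through it.

```python
from dataclasses import dataclass, field

MEMENTO = "Memento"  # classification name suggested in the discussion


@dataclass
class DataFile:
    """Hypothetical stand-in for a single file asset."""
    path: str
    classifications: set = field(default_factory=set)


@dataclass
class FileFolder:
    """A folder whose files are each of interest individually."""
    path: str
    files: list = field(default_factory=list)


@dataclass
class DataFolder(FileFolder):
    """A folder whose files together form one logical dataset.

    Because it extends FileFolder, an instance can still be read
    file by file when a consumer needs that view.
    """


def lineage_nodes(asset):
    """Return the paths that should appear as nodes in a lineage view."""
    if isinstance(asset, DataFolder):
        # One node for the whole dataset: no fan-out of daily snapshots.
        return [asset.path]
    if isinstance(asset, FileFolder):
        # Plain folder: each file is its own node.
        return [f.path for f in asset.files]
    return [asset.path]


def retire(element):
    """Mark a deleted element as a Memento instead of removing it."""
    element.classifications.add(MEMENTO)


def trace_to_source(start, upstream):
    """Follow 'derived from' links (child path -> parent path) to the source.

    The second condition stops the walk if the links form a cycle.
    """
    chain = [start]
    while chain[-1] in upstream and upstream[chain[-1]] not in chain:
        chain.append(upstream[chain[-1]])
    return chain
```

For example, a folder of daily snapshots built as DataFolder("/data/logs", [...]) yields the single node "/data/logs" from lineage_nodes, yet isinstance(folder, FileFolder) is still true. A landing file passed to retire keeps its entry, so trace_to_source can walk from the final destination back through the deleted landing file to the original source.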
10 mins | AOB | All | --

Action items