
The OpenLineage Technical Steering Committee meets monthly, on the third Wednesday, from 9:30am to 10:30am US Pacific. Here's the meeting info.

All are welcome.

Table of Contents

...


Next meeting: February 19th, 2025 (9:30am PT)

January 15th, 2025 (9:30am PT)

Attendees:

TSC:
- Julien Le Dem, Datadog, OpenLineage Project Lead
- Michael Robinson, OpenLineage Community
- Maciej Obuchowski, Software Engineer, GetInData
- Sheeri Cabral, Product Manager, Capital One Software
  
And:
- Dan Rolles, Founder/CEO, BearingNode
- Leo Godin, Data Engineer, NewRelic
Notes:
  • Recent Releases 
  • Presentations
    • Data and Information Observability - Dan Rolles
      • BCBS239 - only 2 of 31 banks fully comply with BCBS239 even though it is 10 years old. It concerns risk management.
      • Dan presents a Data & Information Observability Framework (slide screenshot forthcoming)
        • Tried not to duplicate capabilities - e.g. Risk Management and Compliance are covered by Data Governance
      • Discussion points - for a working group
        • Standardizing Financial Data Lineage Events
        • Unstructured Data and LLM Pipeline Observability
        • Value-Aligned Dataset Consumption Patterns
    • OpenLineage in Airflow 3
      • Airflow 3 is rewriting its architecture, eliminating the direct connection between workers and the Airflow [?]; workers will use an API instead
      • In Airflow 2, users could manually mark tasks/DAG runs as success or failure, but this was not emitted with other OpenLineage information. This will be fixed in Airflow 3
      • Future features:
  • Open Discussion
    • Github releases are up-to-date but documentation release notes are not automatically updated.
    • Tagging - on a per-integration basis. Key/value pairs. Discussion of olin vs. ol. Leo will put a proposal in for dbt tags.
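For illustration, the per-integration key/value tagging discussed above could be modeled as a run facet. This is a sketch only: the facet name, schema URL, and shape below are hypothetical, since the actual proposal (e.g. for dbt tags) was still being drafted.

```python
import json

def tags_run_facet(tags, producer="https://example.com/my-producer"):
    """Build a hypothetical key/value tags facet.

    The facet name and schema here are illustrative only; the real
    tagging proposal was still under discussion at this meeting.
    """
    return {
        "tags": {
            "_producer": producer,
            "_schemaURL": "https://example.com/schemas/TagsRunFacet.json",
            "tags": [{"key": k, "value": v} for k, v in tags.items()],
        }
    }

facet = tags_run_facet({"team": "data-platform", "env": "prod"})
print(json.dumps(facet, indent=2))
```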
  
Meeting:
video links (forthcoming)

2024

December 18th, 2024 (9:30am PT)

November 20th, 2024 (9:30am PT)

...

August 14th, 2024 (9:30am PT)

Needs: Upload video and wiki notes

...

August 14, 2024

...

Attendees:

...

TSC:
- Michael Robinson, Astronomer

...

- Sheeri Cabral, Product Manager, Collibra
  
And:
- Dan Rolles, Founder/CEO, BearingNode

...

- Chris, Software Engineer, Matillion
Notes:
  • Announcements

      ...

        • Meetup - San Francisco, Sept 12th, during Airflow Summit (link to meetup)

      ...

        • New committers - Jens Pfau (Google), Sheeri Cabral (Collibra)

      ...

        • New integrations - Amazon DataZone, Trino

      ...

      • Recent Releases 

          ...

          ...

          ...

          ...

          • AWS DataZone Integration Update - Priya
          • OpenLineage consumer - specifically AWS Glue on Redshift
          • Implementation of compliance/acceptance tests - Tomasz
          • Framework for consumers and producers to make their OpenLineage compatibility public. LINK TO GITHUB
          • Discussion Items
          • Proposal: deprecate support for Spark 2.4 - Maciej
          • Does anyone have use cases? Let us know in Slack.
          • Open Discussion
            
          Meeting:
          Slides and video links (forthcoming)

          July 10th, 2024 (9:30am PT)

          Attendees:

          TSC:
          - Michael Robinson, Astronomer

          ...

          Integration matrix
              - Jens suggests expanding on the integration matrix and mentions issues with iceberg support in Spark.
              - Eric reflects on Jens' suggestion.
              - Michael Robinson thanks Jens for the input.

          2023

          December 14, 2023 (10am PT)

          ...

          • TSC:
            • Mike Collado, Staff Software Engineer, Astronomer
            • Julien Le Dem, OpenLineage Project lead
            • Willy Lulciuc, Co-creator of Marquez
            • Michael Robinson, Software Engineer, Dev. Rel., Astronomer
            • Maciej Obuchowski, Software Engineer, GetInData, OpenLineage contributor
            • Mandy Chessell, Egeria Project Lead
            • Daniel Henneberger, Database engineer
            • Will Johnson, Senior Cloud Solution Architect, Azure Cloud, Microsoft
            • Jakub "Kuba" Dardziński, Software Engineer, GetInData, OpenLineage contributor
          • And:
            • Petr Hajek, Information Management Professional, Profinit
            • Harel Shein, Director of Engineering, Astronomer
            • Minkyu Park, Senior Software Engineer, Astronomer
            • Sam Holmberg, Software Engineer, Astronomer
            • Ernie Ostic, SVP of Product, MANTA
            • Sheeri Cabral, Technical Product Manager, Lineage, Collibra
            • John Thomas, Software Engineer, Dev. Rel., Astronomer
            • Bramha Aelem, BigData/Cloud/ML and AI Architect, Tiger Analytics

          ...

          • Announcements
            • OpenLineage earned Incubation status with the LFAI & Data Foundation at their December TAC meeting!
              • Represents our maturation in terms of governance, code quality assurance practices, documentation, more
              • Required earning the OpenSSF Silver Badge, sponsorship, at least 300 GitHub stars
              • Next up: Graduation (expected in early summer)
          • Recent release 0.19.2 [Michael R.]
          • Column-level lineage update [Maciej]
            • What is the OpenLineage SQL parser?
              • At its core, it’s a Rust library that parses SQL statements and extracts lineage data from it 
              • 80/20 solution - we’ll not be able to parse all possible SQL statements - each database has custom extensions and different syntax, so we focus on standard SQL.
              • Good example of complicated extension: Snowflake COPY INTO https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
              • We primarily use the parser in Airflow integration and Great Expectations integration
              • Why? Airflow does not “understand” a lot of what some operators do, for example PostgreSqlOperator
              • We also have Java support package for parser   
            • What changed previously?
              • Parser in current release can emit column-level lineage!
              • At the last OL meeting, Piotr Wojtczak, the primary author of this change, presented the new parser core that enabled this functionality
                https://www.youtube.com/watch?v=Lv_bODeAVYQ
              • Still, the fact that Rust code can do that does not mean we have it for free everywhere
            • What has changed recently?
              • We wrote “glue code” that allows us to use new parser constructs in Airflow integration
              • Error handling just got way easier: SQL parser can “partially” parse SQL construct, and report errors it encountered, with particular statements that caused it.
            • Usage
              • Airflow integration extractors based on SqlExtractor (ex. PostgreSqlExtractor, SnowflakeExtractor, TrinoExtractor…) are now able to extract column-level lineage
              • Close future: Spark will be able to extract lineage from JDBCRelation.
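The "80/20" and "partial parsing" ideas above can be illustrated with a toy extractor. The real parser is a Rust library handling standard SQL; this sketch only recognizes one narrow INSERT INTO ... SELECT ... FROM pattern and gives up gracefully on anything else, which is the spirit of partial parsing rather than the actual implementation.

```python
import re

def toy_sql_lineage(sql):
    """Toy illustration of the 80/20 idea behind the OpenLineage SQL parser.

    Recognizes only a simple INSERT INTO ... FROM pattern; anything else
    (custom dialects, extensions like Snowflake COPY INTO) returns None,
    mirroring how the real parser reports what it could not handle
    instead of crashing.
    """
    m = re.match(
        r"\s*INSERT\s+INTO\s+(\S+).*?\bFROM\s+(\S+)",
        sql,
        re.IGNORECASE | re.DOTALL,
    )
    if not m:
        return None  # unsupported statement: report, don't fail
    return {"inputs": [m.group(2)], "outputs": [m.group(1)]}

print(toy_sql_lineage("INSERT INTO analytics.daily SELECT * FROM raw.events"))
```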
          • Recent improvements to the Airflow integration [Kuba]
            • OpenLineage facets
              • Facets are pieces of metadata that can be attached to the core entities: run, job or dataset
              • Facets provide context to OpenLineage events
              • They can be defined as either part of the OpenLineage spec or custom facets
            • Airflow generic facet
              • Previously multiple custom facets with no standard
                • AirflowVersionRunFacet as an example of rapidly growing facet with version unrelated information
              • Introduced AirflowRunFacet with Task, DAG, TaskInstance and DagRun properties
              • Old facets are going to be deprecated soon. Currently both old and new facets are emitted
                • AirflowRunArgsRunFacet, AirflowVersionRunFacet, AirflowMappedTaskRunFacet will be removed
                • All information from above is moved to AirflowRunFacet
            • Other improvements (added in 0.19.2)
              • SQL extractors now send column-level lineage metadata
              • Further facets standardization

                • Introduced ProcessingEngineRunFacet
                  • provides processing engine information, e.g. Airflow or Spark version
                • Improved support for nominal start & end times
                  • makes use of data interval (introduced in Airflow 2.x)
                  • nominal end time now matches next schedule time
                • DAG owner added to OwnershipJobFacet
                • Added support for S3FileTransformOperator and TrinoOperator (@sekikn’s great contribution)
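Putting the facet points above together, a minimal RunEvent carrying a ProcessingEngineRunFacet might look like the sketch below. Field names follow the OpenLineage spec as discussed here, but treat the exact schema URL and values as illustrative.

```python
import json
from datetime import datetime, timezone
from uuid import uuid4

# Sketch of a RunEvent carrying a run facet, per the discussion above.
# Exact facet field names follow the spec at the time and may have
# evolved since; values are placeholders.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "producer": "https://example.com/my-producer",
    "job": {"namespace": "example", "name": "daily_sales"},
    "run": {
        "runId": str(uuid4()),
        "facets": {
            "processing_engine": {
                "_producer": "https://example.com/my-producer",
                "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/ProcessingEngineRunFacet.json",
                "version": "2.5.0",  # e.g. the Airflow version
                "name": "Airflow",
            }
        },
    },
    "inputs": [],
    "outputs": [],
}
print(json.dumps(event)[:60], "...")
```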
          • Discussion: what does it mean to implement the spec? [Sheeri]
            • What does it mean to meet the spec?
              • 100% compliance is not required
              • OL ecosystem page
                • doesn't say what exactly it does
                • operational lineage not well defined
                • what does a payload look like? hard to find this info
              • Compatibility between producers/consumers is unclear
            • Important if standard is to be adopted widely [Mandy]
              • Egeria: uses compliance test with reports and badging; clarifies compatibility
              • test and test cases available in the Egeria repo, including profiles and clear rules about compliant ways to support Egeria
              • a badly behaving producer or consumer will create problems
              • have to be able to trust what you get
            • What about consumers? [Mike C.]
              • can we determine if they have done the correct thing with facets? [John]
              • what do we call "compliant"?
              • custom facets shouldn't be subject to this – they are by definition custom (and private) [Maciej]
              • only complete events (not start events) should be required – start events not desired outside of operational use cases [Maciej]
            • There's a simple baseline on the one hand and facets on the other [Julien]
            • Note: perfection isn't the goal
              • instead: shared test cases, data such as sample schema that can be tested against
            • Marquez doesn't explain which facets it's using or how [Willy]
              • communication by consumers could be better
            • Effort at documenting this: matrix [Julien]
            • How would we define failing tests? [Maciej]
              • at a minimum we could have a validation mode [Julien]
              • challenge: the spec is always moving, growing [Maciej]
              • ex: in the case of JSON schema validation, facets are versioned individually but there's a reference schema that is versioned that might not be the current schema. Facets can be dereferenced, but the right way to do this is not clear [Danny]
              • one solution could be to split out base types, or we could add a tool that would force us to clean this up
              • client-side proxy presents same problem; tried different validators in Go; a workaround is to validate against the main doc first; by continually validating against the client proxy we can make sure it stays compliant with the spec [Minkyu]
              • Mandy: if Marquez says it's "OK," it's OK; we've been doing it manually [Mandy]
              • Marquez doesn't do any validation for consumers [Mike C.]
              • manual validation is not good enough [Mandy]
              • I like the idea of compliance badges – it would be cool if we had a way to validate consumers and there were a way to prove this and if we could extend validation to integrations like the Airflow integration [Mike C.]
            • Let's follow up on Slack and use the notes from this discussion to collaborate on a proposal [Julien]
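The "validation mode" floated above could start as simple as a baseline field check. The sketch below is only that baseline: required top-level fields plus run/job shape. Facet-level schema validation (the hard, always-moving part per the discussion) is deliberately out of scope, and the required-field list is an assumption drawn from the core spec.

```python
REQUIRED_TOP_LEVEL = ("eventType", "eventTime", "producer", "job", "run")

def validate_run_event(event):
    """Minimal 'validation mode' sketch from the discussion above.

    Checks only a baseline derived from the core spec; it does not
    attempt versioned facet validation.
    """
    errors = [f"missing field: {f}" for f in REQUIRED_TOP_LEVEL if f not in event]
    if "run" in event and "runId" not in event.get("run", {}):
        errors.append("run.runId is required")
    if "job" in event:
        for f in ("namespace", "name"):
            if f not in event["job"]:
                errors.append(f"job.{f} is required")
    return errors

print(validate_run_event({"eventType": "START", "run": {}, "job": {}}))
```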

          2022

          December 8, 2022 (10am PT)

          ...

          • Release 0.9.0 [Michael R.]
            • We added:
            • For the bug fixes and more information, see the Github repo.
            • Shout out to new contributor Jakub Dardziński, who contributed a bug fix to this release!
          • Snowflake Blog Post [Ross]
            • topic: a new integration between OL and Snowflake
            • integration is the first OL extractor to process query logs
            • design:
              • an Airflow pipeline processes queries against Snowflake
              • separate job: pulls access history and assembles lineage metadata
              • two angles: Airflow sees it, Snowflake records it
            • the meat of the integration: a view that does untold SQL madness to emit JSON to send to OL
            • result: you can study the transformation by asking Snowflake AND Airflow about it
            • required: having access history enabled in your Snowflake account (which requires special access level)
            • Q & A
              • Howard: is the access history task part of the DAG?
              • Ross: yes, there's a separate DAG that pulls the view and emits the events
              • Howard: what's the scope of the metadata?
              • Ross: the account level
              • Michael C: in Airflow integration, there's a parent/child relationship; is this captured?
              • Ross: there are 2 jobs/runs, and there's work ongoing to emit metadata from Airflow (task name)
          • Great Expectations integration [Michael C.]
            • validation actions in GE execute after validation code does
            • metadata extracted from these and transformed into facets
            • recent update: the integration now supports version 3 of the GE API
            • some configuration ongoing: currently you need to set up validation actions in GE
            • Q & A
              • Willy: is the metadata emitted as facets?
              • Michael C.: yes, two
          • dbt integration [Willy]
            • a demo on getting started with the OL-dbt library
              • pip install the integration library and dbt
              • configure the dbt profile
              • run seed command and run command in dbt
              • the integration extracts metadata from the different views
              • in Marquez, the UI displays the input/output datasets, job history, and the SQL
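The demo flow above can be sketched as shell commands. Package names follow the openlineage-dbt integration; the OPENLINEAGE_URL value is a placeholder for your Marquez (or other consumer) endpoint.

```shell
# Sketch of the dbt demo flow above (endpoint value is a placeholder).
pip install openlineage-dbt dbt-core

# Point the integration at an OpenLineage consumer such as Marquez
export OPENLINEAGE_URL=http://localhost:5000

# Run dbt through the OpenLineage wrapper instead of plain `dbt`
dbt-ol seed
dbt-ol run
```

After the run, the input/output datasets, job history, and SQL show up in the Marquez UI as described above.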
          • Open discussion
            • Howard: what is the process for becoming a committer?
              • Maciej: nomination by a committer then a vote
              • Sheeri: is coding beforehand recommended?
              • Maciej: contribution to the project is expected
              • Willy: no timeline on the process, but we are going to try to hold a regular vote
              • Ross: project documentation covers this but is incomplete
              • Michael C.: is this process defined by the LFAI?
            • Ross: contributions to the website, workshops are welcome!
            • Michael R.: we're in the process of moving the meeting recordings to our YouTube channel

          May 19th, 2022 (10am PT)

          Agenda:

          ...

          • TSC:
            • Mike Collado: Staff Software Engineer, Datakin
            • Maciej Obuchowski: Software Engineer, GetInData, OpenLineage contributor
            • Julien Le Dem: OpenLineage Project lead
            • Willy Lulciuc: Co-creator of Marquez
          • And:
            • Ernie Ostic: SVP of Product, Manta 
            • Sandeep Adwankar: Senior Technical Product Manager, AWS
            • Paweł Leszczyński, Software Engineer, GetinData
            • Howard Yoo: Staff Product Manager, Astronomer
            • Michael Robinson: Developer Relations Engineer, Astronomer
            • Ross Turk: Senior Director of Community, Astronomer
            • Minkyu Park: Senior Software Engineer, Astronomer
            • Will Johnson: Senior Cloud Solution Architect, Azure Cloud, Microsoft

          Meeting:

          Video: http://youtube.com/watch?v=X0ZwMotUARA

          Notes:

          • Releases
          • Communication reminders [Julien]
          • Agenda [Julien]
          • Column-level lineage [Paweł]
            • Linked to 4 PRs, the first being a proposal
            • The second has been merged, but the core mechanism is turned off
            • 3 requirements:
              • Outputs labeled with expression IDs
              • Inputs with expression IDs
              • Dependencies
            • Once it is turned on, each OL event will receive a new JSON field
            • It would be great to be able to extend this API (currently on the roadmap)
            • Q & A
              • Will: handling user-defined functions: is the solution already generic enough?
                • The answer will depend on testing, but I suspect that the answer is yes
                • The team at Microsoft would be excited to learn that the solution will handle UDFs
              • Julien: the next challenge will be to ensure that all the integrations support column-level lineage
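Once the mechanism is turned on, the new JSON field added to each OL event would look roughly like the columnLineage dataset facet below, mapping each output column to the input columns it derives from. The shape follows the column-lineage facet; the schema URL and values are illustrative.

```python
import json

# Sketch of the new JSON field discussed above: a columnLineage dataset
# facet. Shape follows the OpenLineage column-lineage facet; the schema
# URL and example values are illustrative.
column_lineage = {
    "columnLineage": {
        "_producer": "https://example.com/my-producer",
        "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/ColumnLineageDatasetFacet.json",
        "fields": {
            "total_price": {
                "inputFields": [
                    {"namespace": "warehouse", "name": "orders", "field": "price"},
                    {"namespace": "warehouse", "name": "orders", "field": "quantity"},
                ]
            }
        },
    }
}
print(json.dumps(column_lineage, indent=2))
```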
          • Open discussion
            • Willy: in Mqz we need to start handling col-level lineage, and has anyone thought about how this might work?
              • Julien: lineage endpoint for col-level lineage to layer on top of what already exists
              • Willy: this makes sense – we could use the method for input and output datasets as a model
              • Michael C.: I don't know that we need to add an endpoint – we could augment the existing one to do something with the data
              • Willy: how do we expect this to be visualized?
                • Julien: not quite sure
                • Michael C.: there are a number of different ways we could do this, including isolating relevant dataset fields 

          ...

          • 0.6.2 release overview [Michael R.]
          • Transports in OpenLineage clients [Maciej]
          • Airflow integration update [Maciej]
          • Dagster integration retrospective [Dalin]
          • Open discussion

          Meeting info:

          Video: http://youtube.com/watch?v=MciFCgrQaxk

          Notes:

          • Introductions
          • Communication channels overview [Julien]
          • Agenda overview [Julien]
          • 0.6.2 release overview [Michael R.]

          ...

          • New committers [Julien]
            • 4 new committers were voted in last week
            • We had fallen behind
            • Congratulations to all
          • Release overview (0.6.0-0.6.1) [Michael R.]
            • Added
              • Extract source code of PythonOperator code similar to SQL facet @mobuchowski (0.6.0)
              • Airflow: extract source code from BashOperator @mobuchowski (0.6.0)
                • These first two additions are similar to SQL facet
                • Offer the ability to see top-level code
              • Add DatasetLifecycleStateDatasetFacet to spec @pawel-big-lebowski (0.6.0)
                • Captures when someone is conducting dataset operations (overwrite, create, etc.)
              • Add generic facet to collect environmental properties (EnvironmentFacet) @harishsune (0.6.0)
                • Collects environment variables
                • Depends on Databricks runtime but can be reused in other environments
              • OpenLineage sensor for OpenLineage-Dagster integration @dalinkim (0.6.0)
                • The first iteration of the Dagster integration to get lineage from Dagster
              • Java-client: make generator generate enums as well @pawel-big-lebowski (0.6.0)
                • Small addition to Java client feat. better types; was string
            • Fixed
              • Airflow: increase import timeout in tests, fix exit from integration @mobuchowski (0.6.0)
                • The former was a particular issue with the Great Expectations integration
              • Reduce logging level for import errors to info @rossturk (0.6.0)
                • Airflow users were seeing warnings about missing packages if they weren't using a part of an integration
                • This fix reduced the level to Info
              • Remove AWS secret keys and extraneous Snowflake parameters from connection URI @collado-mike (0.6.0)
                • Parses Snowflake connection URIs to exclude some parameters that broke lineage or posed security concerns (e.g., login data)
                • Some keys are Snowflake-specific, but more can be added from other data sources
              • Convert to LifecycleStateChangeDatasetFacet @pawel-big-lebowski (0.6.0)
                • Mandates the LifecycleStateChange facet from the global spec rather than the custom tableStateChange facet used in the past
              • Catch possible failures when emitting events and log them @mobuchowski (0.6.1)
                • Previously when an OL event failed to emit, this could break an integration
                • This fix catches possible failures and logs them
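The connection-URI scrubbing described above (removing AWS secret keys and problematic Snowflake parameters) can be sketched with the standard library. This is an illustration, not the integration's actual code; the parameter names in SENSITIVE are examples of what might be dropped, not the exact list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative sketch of the connection-URI scrubbing described above.
# The real fix lives in the integration code; the parameter names below
# are examples, not the exact list that was removed.
SENSITIVE = {"aws_access_key_id", "aws_secret_access_key", "password"}

def scrub_uri(uri):
    parts = urlsplit(uri)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in SENSITIVE]
    netloc = parts.hostname or ""
    if parts.port:
        netloc += f":{parts.port}"  # rebuilding netloc drops any user:pass@ prefix
    return urlunsplit((parts.scheme, netloc, parts.path, urlencode(kept), ""))

print(scrub_uri("snowflake://user:pw@acct/db?warehouse=wh&password=secret"))
```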
          • Process for blog posts [Ross]
            • Moving the process to Github Issues
            • Follow release tracker there

            • Go to https://github.com/OpenLineage/website/tree/main/contents/blog to create posts

            • No one will have a monopoly

            • Proposals for blog posts also welcome and we can support your efforts with outlines, feedback

            • Throw your ideas on the issue tracker on Github

          • Retrospective: Spark integration [Willy et al.]
            • Willy: originally this part of Marquez – the inspiration behind OL

              • OL was prototyped in Marquez with a few integrations, one of which was Spark (other: Airflow)

              • Donated the integration to OL

            • Srikanth: #559 very helpful to Azure

            • Pawel: is anything missing from the Spark integration? E.g., column-level lineage?

            • Will: yes to column-level; also, delta tables are an issue due to complexity; Spark 3.2 support also welcome

            • Maciej: should be more active about tracking projects we have integrations with; add to test matrix 

            • Julien: let’s open some issues to address these

          • Open Discussion
            • Flink updates? [Julien]
              • Maciej: initial exploration is done

                • challenge: Flink has 4 APIs

                • prioritizing Kafka lineage currently because most jobs are writing to/from Kafka

                • track this on Github milestones, contribute, ask questions there

              • Will: can you share thoughts on the data model? How would this show up in MZ? How often are you emitting lineage? 

              • Maciej: trying to model entire Flink run as one event

              • Srikanth: proposed two separate streams, one for data updates and one for metadata

              • Julien: do we have an issue on this topic in the repo?

              • Michael C.: only a general proposal doc, not one on the overall strategy; this worth a proposal doc

              • Julien: see notes for ticket number; MC will create the ticket

              • Srikanth: we can collaborate offline

          ...

          • OpenLineage recent release overview (0.5.1) [Julien]
          • TaskInstanceListener now official way to integrate with Airflow [Julien]
          • Apache Flink integration [Julien]
          • Dagster integration demo [Dalin]
          • Open Discussion

          Meeting:

          Slides

          Video: http://youtube.com/watch?v=cIrXmC0zHLg

          Notes:

          • OpenLineage recent release overview (0.5.1) [Julien]
            • No 0.5.0 due to bug
            • Support for dbt-spark adapter
            • New backend to proxy OL events
            • Support for custom facets
          • TaskInstanceListener now official way to integrate with Airflow [Julien]
            • Integration runs on worker side
            • Will be in the next Airflow release (2.3)
            • Thanks to Maciej for his work on this
          • Apache Flink integration [Julien]
            • Ticket for discussion available
            • Integration test setup
            • Early stages
          • Dagster integration demo [Dalin]
            • Initiated by Dalin Kim
            • OL used with Dagster on orchestration layer
            • Utilizes Dagster sensor
            • Introduces OL sensor that can be added to Dagster repo definition
            • Uses cursor to keep track of ID
            • Looking for feedback after review complete
            • Discussion:
              • Dalin: needed: way to interpret Dagster asset for OL
              • Julien: common code from Great Expectations/Dagster integrations
              • Michael C: do you pass parent run ID in child job when sending the job to MZ?
              • Hierarchy can be extended indefinitely – parent/child relationship can be modeled
              • Maciej: the sensor kept failing – does this mean the events persisted despite being down?
              • Dalin: yes - the sensor’s cursor is tracked, so even if repo goes down it should be able to pick up from last cursor
              • Dalin: hoping for more feedback
              • Julien: slides will be posted on slack channel, also tickets
          • Open discussion
            • Will: how is OL ensuring consistency of datasets across integrations? 
            • Julien: (jokingly) Read the docs! Naming conventions for datasets can be found there
            • Julien: need for tutorial on creating integrations
            • Srikanth: have done some of this work in Atlas
            • Kevin: are there libraries on the horizon to play this role? (Julien: yes)
            • Srikanth: it would be good to have model spec to provide enforceable standard
            • Julien: agreed; currently models are based on the JSON schema spec
            • Julien: contributions welcome; opening a ticket about this makes sense
            • Will: Flink integration: MZ focused on batch jobs
            • Julien: we want to figure out whether we need to add checkpointing
            • Julien: there will be discussion in OLMZ communities about this
              • In MZ, there are questions about what counts as a version or not
            • Julien: a consistent model is needed
            • Julien: one solution being looked into is Arrow
            • Julien: everyone should feel welcome to propose agenda items (even old projects)
            • Srikanth: who are you working with on the Flink comms side? Will get back to you.

          ...

          ...

          Proposal to convert licenses to SPDX [Michael]: no objections

          2021

          Dec 8th 2021 (9am PT)

          Attendees:

          ...

          • Attendees: 
            • TSC:
              • Mandy Chessell: Egeria Lead. Integrating OpenLineage in Egeria

              • Michael Collado: Datakin, OpenLineage

              • Maciej Obuchowski: GetInData. OpenLineage integrations
              • Willy Lulciuc: Marquez co-creator.
              • Ryan Blue: Tabular, Iceberg. Interested in collecting lineage across Iceberg users with OpenLineage
            • And:
              • Venkatesh Tadinada: BMC workflow automation looking to integrate with Marquez
              • Minkyu Park: Datakin. learning about OpenLineage
              • Arthur Wiedmer: Apple, lineage for Siri and AI ML. Interested in implementing Marquez and OpenLineage
          • Meeting recording:

          Video: http://youtube.com/watch?v=Gk0CwFYm9i4

          • Meeting notes:
            • agenda: 
              • Update on OpenLineage latest release (0.2.1)

                • dbt integration demo

              • OpenLineage 0.3 scope discussion

                • Facet versioning mechanism (Issue #153)

                • OpenLineage Proxy Backend (Issue #152)

                • OpenLineage implementer test data and validation

                • Kafka client

              • Roadmap

                • Iceberg integration
              • Open discussion

            • Slides 

            • Discussions:
              • Added a discussion of Iceberg requirements for OpenLineage to the agenda.

            • Demo of dbt:

              • really easy to try

              • when running from airflow, we can use the wrapper 'dbt-ol run' instead of 'dbt run'

            • Presentation of Proxy Backend design:

              • summary of discussions in Egeria
                • Egeria is less interested in instances (runs) and will keep track of OpenLineage events separately as Operational lineage

                • Two ways to use Egeria with OpenLineage

                  • receives HTTP events and forwards to Kafka

                  • A consumer receives the Kafka events in Egeria

              • Proxy Backend in OpenLineage:

                • direct HTTP endpoint implementation in Egeria

              • Depending on the user they might pick one or the other and we'll document

            • Use a direct OpenLineage endpoint (like Marquez)

              • Deploy the Proxy Backend to write to a queue (ex: Kafka)
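The Proxy Backend idea above, accepting an OpenLineage event over HTTP and forwarding it to a queue, can be sketched minimally. In this sketch an in-memory Queue stands in for Kafka and the HTTP layer is reduced to a single handler function; it is an illustration of the design, not the actual proxy.

```python
import json
from queue import Queue

# Sketch of the Proxy Backend design above: receive an OpenLineage event
# over HTTP, forward it to a queue. An in-memory Queue stands in for
# Kafka here, and the HTTP layer is reduced to one handler function.
events = Queue()

def handle_post(body: bytes) -> int:
    """Validate minimally, enqueue, and return an HTTP status code."""
    try:
        event = json.loads(body)
    except json.JSONDecodeError:
        return 400  # reject malformed payloads
    events.put(event)  # in production: produce to a Kafka topic
    return 200

status = handle_post(b'{"eventType": "START", "job": {"name": "demo"}}')
print(status, events.qsize())
```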

              • Follow up items:

          ...

          Aug 11th 2021

          • Attendees: 
            • TSC:
              • Ryan Blue

              • Maciej Obuchowski

              • Michael Collado

              • Daniel Henneberger

              • Willy Lulciuc

              • Mandy Chessell

              • Julien Le Dem

            • And:
              • Peter Hicks

              • Minkyu Park

              • Daniel Avancini

          • Meeting recording:

          Video: http://youtube.com/watch?v=bbAwz-rzo3I

          ...

          • Attendees: 
            • TSC:
              • Julien Le Dem
              • Mandy Chessell
              • Michael Collado
              • Willy Lulciuc
          • Meeting recording:

          Video: http://youtube.com/watch?v=kYzFYrzSpzg

          • Meeting notes
            • Agenda:
            • Notes: 

              Mission statement:

              Spec versioning mechanism:

              • The goal is to commit to compatible changes once 0.1 is published

              • We need a follow-up to separate core facet versioning
                • => TODO: create a separate GitHub ticket.
              • The lineage event should have a field that identifies what version of the spec it was produced with

                • => TODO: create a github issue for this

              • TODO: Add issue to document version number semantics (SCHEMAVER)
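The versioning idea above, an event field identifying the spec version it was produced with, could look like the sketch below. The schemaURL field and URL format shown here are illustrative of how the idea might materialize, not a decision from this meeting.

```python
# Sketch of the versioning idea above: each event carries a field
# (here called schemaURL) identifying the spec version it was produced
# with. The URL format and values are illustrative placeholders.
event = {
    "eventType": "COMPLETE",
    "eventTime": "2021-07-01T00:00:00Z",
    "producer": "https://example.com/my-producer",
    "schemaURL": "https://openlineage.io/spec/1-0-0/OpenLineage.json",
    "job": {"namespace": "example", "name": "demo"},
    "run": {"runId": "00000000-0000-0000-0000-000000000000"},
}

# A consumer can recover the spec version from the URL path
spec_version = event["schemaURL"].rsplit("/", 2)[1]
print(spec_version)
```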

              Extend Event State notion:

              OpenLineage 0.1:

              • finalize a few spec details for 0.1 : a few items left to discuss.

                • In particular job naming

                • parent job model

              • Importing Marquez integrations in OpenLineage

              Open Discussion:

              • connecting the consumer and producer

                • TODO: ticket to track distribution mechanism

                • options:

                  • Would we need a consumption client to make it easy for consumers to get events from Kafka for example?

                  • OpenLineage provides client libraries to serialize/deserialize events as well as sending them.

                • We can have documentation on how to send to backends that are not Marquez using HTTP and existing gateway mechanism to queues.

                • Do we have a mutual third party or the client know where to send?

              • Source code location finalization

              • job naming convention

                • you don't always have a nested execution

                  • can call a parent

                • parent job

                • You can have a job calling another one.

                • always distinguish a job and its run

              • need a separate notion for job dependencies

              • need to capture event driven: TODO: create ticket.


              TODO(Julien): update job naming ticket to have the discussion.

          ...

          • Attendees: 
            • TSC:
              • Julien Le Dem: Marquez, Datakin
              • Drew Banin: dbt, CPO at Fishtown Analytics
              • Maciej Obuchowski: Marquez, GetInData consulting company
              • Zhamak Dehghani: Data Mesh; an open protocol of observability for the data ecosystem is a big piece of Data Mesh
              • Daniel Henneberger: building a database, interested in lineage
              • Mandy Chessell: Lead of Egeria, metadata exchange; lineage is a natural extension
              • Willy Lulciuc: co-creator of Marquez
              • Michael Collado: Datakin, OpenLineage end-to-end holistic approach
            • And:
              • Kedar Rajwade: consulting on distributed systems
              • Barr Yaron: dbt, PM at Fishtown Analytics on metadata
              • Victor Shafran: co-founder at databand.ai, a pipeline monitoring company; lineage is a common issue
            • Excused: Ryan Blue, James Campbell
          • Meeting recording:

          Video: http://youtube.com/watch?v=er2GDyQtm5M

          ...