Monthly TSC meeting

The OpenLineage Technical Steering Committee meetings are held monthly on the third Wednesday from 9:30am to 10:30am US Pacific. Here's the meeting info.

All are welcome.

 

Next meeting: January 28th, 9:30 am PT

December 17, 2025 (9:30 am PT)

November 19, 2025 (9:30 am PT)

TSC Members:

  • Tomasz Nazarewicz, Software Engineer, GetInData

  • Maciej Obuchowski, Software Engineer, GetInData, OpenLineage committer

  • Sheeri Cabral, Engineering Director, myKaarma

  • Harel Shein, Engineering Manager, Datadog

And:

  • Jakub Moravec, Product Manager, IBM MANTA

  • Daniel Rolles, CEO/Founder, BearingNode

  • Adam, Software Engineer, Moody’s

Notes:

  • Announcements

    •  

  • Recent Releases

    • OpenLineage 1.40.0 and 1.40.1

      • 25 commits by 12 contributors in this release, including:

      • Many new spec changes

        • Batch endpoint for OpenLineage events

        • New JobDependenciesRunFacet

        • Support for temporary datasets

        • Improvements to Spark and Hive integrations

  • Presentations

    • OpenLineage at IBM Manta - Jakub Moravec, IBM/Manta

    • dbt producer compatibility tests - Daniel Rolles, BearingNode

  • Open Discussion

    •  

 

Meeting:

Slides

October 15, 2025 (9:30 am PT)

  •  

And:

  •  

Notes:

  • Announcements

    •  

  • Recent Releases

    •  

  • Presentations

    •  

      •  

  • Open Discussion

    •  

 

Meeting:

Slides

Sep 17, 2025 (9:30 am PT)

TSC Members:

  • Michael Robinson, OpenLineage Community

  • Minkyu Park, Senior Engineer, Oleander

  • Willy Lulciuc, co-creator of OpenLineage, project lead, Marquez, CEO Oleander

  • Sheeri Cabral, Engineering Director, myKaarma

  • Harel Shein, Engineering Manager, Datadog

And:

  • Jakub Moravec, Product Manager, IBM MANTA

  • Daniel Rolles, Founder/CEO, BearingNode

  • Luke Hoffman, Implementation Engineer, Atlan

Notes:

  • Announcements

  • Recent Releases

    • OpenLineage 1.36.0

      • 73 commits by 8 contributors in this release, including:

      • Spark: more support for Spark 4.0, including Delta 4.0

      • dbt: add query ID tracking

      • Python client: Formalize dataset naming conventions

      • 2 first time contributors: Jake Roach (jroachgolf84), Shadi Abdelfatah (Shadi)

    • OpenLineage 1.37.0

      • 46 commits by 10 contributors in this release, including:

      • Spark: add support for WriteDelta and WriteIcebergDelta

      • dbt: option to override dbt job name

      • Python: optimize gzip compression params, add Datadog transport

      • Java: optimize JSON serialization performance

      • 3 first time contributors: orthoxerox, kyungryun choi, SalvadorRomo

  • Presentations

    • ML Support on OpenLineage - Willy Lulciuc

      •  

  • Open Discussion

    •  

 

Meeting:

Slides

July 16, 2025 (9:30 am PT)

TSC Members:

  • Maciej Obuchowski, Software Engineer, GetInData

  • Paweł Leszczyński, Software Engineer, GetInData

  • Tomasz Nazarewicz, Lead Data Engineer, Xebia

  • Sheeri Cabral, Engineering, myKaarma

  • Kacper Muda, Astronomer

  • Julien Le Dem, Project Lead, Datadog

And:

  • Mario Fiore Vitale, IBM Debezium

  • Daniel Rolles, Founder/CEO, BearingNode

  • Luke Hoffman, Customer Success, Atlan

Notes:

  • Announcements

  • Recent Releases

  • Presentations

    • Native Data Lineage in Debezium with OpenLineage - Mario Fiore Vitale, IBM

    • Farewell to Spark 2 - PR #3904 - Please give feedback soon, before this PR is merged!

  • Open Discussion

    • Post about vendor compliance with OpenLineage at BearingNode - Daniel Rolles

    • Different dimensions of compliance and governance - Daniel Rolles

      • Working with Jakub at IBM Manta

      • Added Structured and Unstructured as dimensions, another “side of the cube” of the Connected Operating Model, in addition to the other 3 sides: People/Process/Technology/Data, BCBS 239 compliance categories, and Value/Discover/Track/Comply/Govern

Meeting:

Slides

June 18, 2025 (9:30 am PT)

TSC:

  • Maciej Obuchowski, Software Engineer, GetInData

  • Paweł Leszczyński, Software Engineer, GetInData

  • Jakub Moravec, Product Manager, IBM MANTA

  • Tomasz Nazarewicz, Lead Data Engineer, Xebia

  • Harel Shein, Engineering Manager, Datadog

Notes:

Meeting:

Slides

May 28, 2025 (9:30 am PT)

TSC:

  • Maciej Obuchowski, Software Engineer, GetInData

  • Sheeri Cabral, Product Leader, Capital One Software

  • Michael Collado, Software Engineer, Snowflake

  • Julien Le Dem, Project Lead, Datadog

  • Harel Shein, Engineering Manager, Datadog

Notes:

Open Discussion

  • Polaris integration (alternative Iceberg Catalog), Snowflake, and OpenLineage - Michael Collado

Meeting:

Slides

April 16th, 2025 (9:30am PT)

Attendees:

TSC:

  • Maciej Obuchowski, Software Engineer, GetInData

  • Tomasz Nazarewicz, Software Engineer, GetInData

  • Sheeri Cabral, Product, Capital One Software

  • Michael Robinson, OpenLineage Community

  

And:

  • Luke Hoffman, Implementation Engineer, Atlan

  • Chandru Sugunan, Atlan

 

Notes:

Open Discussion

  • None

Meeting:

Slides

 

March 19th, 2025 (9:30am PT)

Attendees:

TSC:

- Harel Shein, Engineering Manager, Datadog

- Tomasz Nazarewicz, Software Engineer, GetInData

- Maciej Obuchowski, Software Engineer, GetInData

- Michael Robinson, OpenLineage Community

- Sheeri Cabral, Product, Capital One Software

- Julien Le Dem, Datadog, OpenLineage Project Lead

  

And:

- Luke Hoffman, Software Engineer, Atlan

- Massy Bourennani, Software Engineer, Datadog

- Dominik Dębowczyk, Software Engineer, GetInData

- Dan Rolles, Founder/CEO, BearingNode

- Saurabh Vashist, Implementation Specialist, Atlan

Notes:

  • Announcements

  • Recent Releases 

  • Presentations

    • dbt structured logs, Massy Bourennani, Software Engineer, Datadog

      • manifest.json has dbt nodes and relationships between nodes

      • run_results.json has status (success/fail), timing, nodes, compiled SQL, table name

      • Previously in Datadog, once a pipeline had completed, run_results.json and manifest.json were used to generate lineage

      • Problems

        • This is not event-driven - you have to wait until the pipeline is completed

        • Only dbt model SQL queries are forwarded in OL events. dbt does much more than that!

      • Solution: use structured logs - structured dbt events. 

        • Written in real-time to stdout and log

        • Sent during events - when a command or model/node starts and completes, and when a SQL query is executed.

        • Datadog continuously consumes these logs 

    • Apache Hive Integration - Tomasz Nazarewicz, Software Engineer, GetInData

      • OpenLineage Hive Integration is in progress and coming to the OpenLineage Hive Repository - PR #3555

      • What's included: ProcessingEngineRunFacet, HivePropertiesFacet, SchemaDatasetFacet, SymlinksDatasetFacet, ColumnLineageDatasetFacet with Transformation Types

      • Limits

        • Only QUERY and CREATETABLE_AS_SELECT are handled, simple inserts and table creations are not handled yet

        • Only POST_EXEC_HOOK and ON_FAILURE_HOOK - no START or RUNNING events, only COMPLETE and FAIL

        • Limits are not due to technical limitations, just time spent coding.

      • How to use it - add the Hive library jar and set a property on Hive cluster start

      • Demo

Open Discussion

 

Meeting:

Slides

February 19th, 2025 (9:30am PT)

Attendees:

TSC:

- Harel Shein, Engineering Manager, Datadog

- Tomasz Nazarewicz, Software Engineer, GetInData

- Paweł Leszczyński, Software Engineer, GetInData

- Maciej Obuchowski, Software Engineer, Datadog

- Michael Robinson, OpenLineage Community

- Julien Le Dem, Datadog, OpenLineage Project Lead

  

And:

- Daniel Rolles, Founder/CEO, BearingNode

- Leo Godin, Data Engineer, NewRelic

- Luke Hoffman, Solutions Architect, Atlan

Notes:

  • Announcements

  • Recent Releases 

  • Presentations

    • Dataset partitioning and Subset proposal - Paweł Leszczyński, GetInData

      • How to know what datasets are partitioned, what partitions are used/changed by a job 

      • Solution: new dataset facet ("partitions")

      • A partition is a subset of data - how to describe in OpenLineage?

      • Solution - subsets can be defined manually, by a query, a list of physical files/directories, or a string description

    • Tags in integrations + Java - Maciej Obuchowski, Datadog

      • Need to handle key + value, and also the source of the tag.

      • Support in Java client added

      • Spark integration - no native tagging mechanism, can configure via spark.conf

      • Airflow - we capture dag labeling as AirflowRunFacet, but should (future) include it in core facets too

      • Future: dbt integration

    • Open discussion - scaling considerations for very large source files

  

Meeting:

Slides

January 15th, 2025 (9:30am PT)

Attendees:

TSC:

- Julien Le Dem, Datadog, OpenLineage Project Lead

- Michael Robinson, OpenLineage Community

- Maciej Obuchowski, Software Engineer, GetInData

- Sheeri Cabral, Product Manager, Capital One Software

  

And:

- Dan Rolles, Founder/CEO, BearingNode

- Leo Godin, Data Engineer, NewRelic

Notes:

  • Recent Releases 

  • Presentations

    • Data and Information Observability - Dan Rolles

      • BCBS239 - Only 2 out of 31 banks fully comply with BCBS239 even though it's 10 years old. It's about Risk management.

      • Dan presents a Data & Information Observability Framework (slide screenshot forthcoming)

        • Tried not to duplicate capabilities - e.g. Risk Management and Compliance are covered by Data Governance

      • Discussion points - for a working group

        • Standardizing Financial Data Lineage Events

        • Unstructured Data and LLM Pipeline Observability

        • Value-Aligned Dataset Consumption Patterns

    • OpenLineage in Airflow 3

      • Airflow 3 is rewriting its architecture and eliminating the direct connection between workers and the Airflow ?; workers will use an API instead

      • In Airflow 2, users could manually mark tasks/DAG runs as successful or failed, but this state change was not emitted with other OpenLineage information. This will be fixed in Airflow 3

      • Future features:

  • Open Discussion

    • GitHub releases are up to date, but documentation release notes are not automatically updated.

    • Tagging - on a per-integration basis. Key/value pairs. Discussion of olin vs. ol. Leo will put a proposal in for dbt tags.

  

Meeting:

Slides

2024

December 18th, 2024 (9:30am PT)

Attendees:

TSC:

  • Harel Shein, Engineering Manager, Datadog

  • Michael Robinson, OpenLineage Community

  • Paweł Leszczyński, GetInData, Astronomer

  • Kacper Muda, Data Engineer, GetInData

  • Willy Lulciuc, co-creator and project lead, Marquez

  • Maciej Obuchowski, Software Engineer, GetInData

  • Jens Pfau, Engineering Manager, Google

And: