Monthly TSC meeting
The OpenLineage Technical Steering Committee meetings are held monthly on the third Wednesday from 9:30am to 10:30am US Pacific. Here's the meeting info.
All are welcome.
- 1.1 Next meeting: January 28th, 9:30 am PT
- 1.2 December 17, 2025 (9:30 am PT)
- 1.3 November 19, 2025 (9:30 am PT)
- 1.4 October 15, 2025 (9:30 am PT)
- 1.5 Sep 17, 2025 (9:30 am PT)
- 1.6 July 16, 2025 (9:30 am PT)
- 1.7 June 18, 2025 (9:30 am PT)
- 1.8 May 28, 2025 (9:30 am PT)
- 1.9 April 16th, 2025 (9:30am PT)
- 1.10 March 19th, 2025 (9:30am PT)
- 1.11 February 19th, 2025 (9:30am PT)
- 1.12 January 15th, 2025 (9:30am PT)
- 2 2024
- 2.1 December 18th, 2024 (9:30am PT)
- 2.2 November 20th, 2024 (9:30am PT)
- 2.3 October 16th, 2024 (9:30am PT)
- 2.4 September 18th, 2024 (9:30am PT)
- 2.5 August 14th, 2024 (9:30am PT)
- 2.6 July 10th, 2024 (9:30am PT)
- 2.7 June 12th, 2024 (9:30am PT)
- 2.8 May 8, 2024 (9:30am PT)
- 2.9 April 10, 2024 (9:30am PT)
- 2.10 March 13, 2024 (9:30am PT)
- 2.11 February 8, 2024 (10am PT)
- 2.12 January 11, 2024 (10am PT)
- 3 2023
- 3.1 December 14, 2023 (10am PT)
- 3.2 November 9, 2023 (10am PT)
- 3.3 October 12, 2023 (10am PT)
- 3.4 September 14, 2023 (10am PT)
- 3.5 August 10, 2023 (10am PT)
- 3.6 July 13, 2023 (8am PT)
- 3.7 June 8, 2023 (10am PT)
- 3.8 May 11, 2023 (10am PT)
- 3.9 April 20, 2023 (10am PT)
- 3.10 March 9, 2023 (10am PT)
- 3.11 February 9, 2023 (10am PT)
- 3.12 January 12, 2023 (10am PT)
- 4 2022
- 4.1 December 8, 2022 (10am PT)
- 4.2 November 10, 2022 (10am PT)
- 4.3 October 13, 2022 (10am PT)
- 4.4 September 8, 2022 (10am PT)
- 4.5 August 11, 2022 (10am PT)
- 4.6 July 14, 2022 (10am PT)
- 4.7 June 9th, 2022 (10am PT)
- 4.8 May 19th, 2022 (10am PT)
- 4.9 Apr 13th, 2022 (9am PT)
- 4.10 Mar 9th, 2022 (9am PT)
- 4.11 Feb 9th 2022 (9am PT)
- 4.12 Jan 12th 2022 (9am PT)
- 5 2021
- 5.1 Dec 8th 2021 (9am PT)
- 5.2 Nov 10th 2021 (9am PT)
- 5.3 Oct 13th 2021
- 5.4 Sept 8th 2021
- 5.5 Aug 11th 2021
- 5.6 July 14th 2021
- 5.7 June 9th 2021
Next meeting: January 28th, 9:30 am PT
December 17, 2025 (9:30 am PT)
November 19, 2025 (9:30 am PT)
TSC Members:
Tomasz Nazarewicz, Software Engineer, GetInData
Maciej Obuchowski, Software Engineer, GetInData, OpenLineage committer
Sheeri Cabral, Engineering Director, myKaarma
Harel Shein, Engineering Manager, Datadog
And:
Jakub Moravec, Product Manager, IBM MANTA
Daniel Rolles, CEO/Founder, BearingNode
Adam, Software Engineer, Moody’s
Notes:
Announcements
Recent Releases
25 commits by 12 contributors in this release, including:
Many new spec changes
Batch endpoint for OpenLineage events
New JobDependenciesRunFacet
Support for temporary datasets
Improvements to Spark and Hive integrations
Presentations
OpenLineage at IBM Manta - Jakub Moravec, IBM/Manta
dbt producer compatibility tests - Daniel Rolles, BearingNode
Open Discussion
Meeting:
October 15, 2025 (9:30 am PT)
And:
Notes:
Announcements
Recent Releases
Presentations
Open Discussion
Meeting:
Sep 17, 2025 (9:30 am PT)
TSC Members
Michael Robinson, OpenLineage Community
Minkyu Park, Senior Engineer, Oleander
Willy Lulciuc, co-creator of OpenLineage, project lead, Marquez, CEO Oleander
Sheeri Cabral, Engineering Director, myKaarma
Harel Shein, Engineering Manager, Datadog
And:
Jakub Moravec, Product Manager, IBM MANTA
Daniel Rolles, Founder/CEO, BearingNode
Luke Hoffman, Implementation Engineer, Atlan
Notes:
Announcements
OpenLineage @ Airflow Summit
October 7, 2025 - October 9, 2025 | Seattle, WA, USA - gauging interest in a meetup
Simplifying Data Lineage: How OpenLineage Empowers Airflow and Beyond - presented by Julien and Harel, Wed 8 Oct 15:00-15:30
OpenLineage @ SRECon Dublin
Simplifying Data Lineage: How OpenLineage Empowers Airflow and Beyond - presented by Maciej Obuchowski, Thu 9 Oct 13:50-14:25
Recent Releases
73 commits by 8 contributors in this release, including:
Spark: more support for Spark 4.0, including delta 4.0
dbt: add query ID tracking
Python client: Formalize dataset naming conventions
2 first-time contributors: Jake Roach (jroachgolf84), Shadi Abdelfatah (Shadi)
46 commits by 10 contributors in this release, including:
Spark: add support for WriteDelta and WriteIcebergDelta
dbt: option to override dot job name
Python: optimize gzip compression params, add Datadog transport
Java: optimize JSON serialization performance
3 first-time contributors: orthoxerox, kyungryun choi, SalvadorRomo
Presentations
ML Support on OpenLineage - Willy Lulciuc
Open Discussion
Meeting:
July 16, 2025 (9:30 am PT)
TSC Members:
Maciej Obuchowski, Software Engineer, GetInData
Paweł Leszczyński, Software Engineer, GetInData
Tomasz Nazarewicz, Lead Data Engineer, Xebia
Sheeri Cabral, Engineering, myKaarma
Kacper Muda, Astronomer
Julien LeDem, Project Lead, Datadog
And:
Mario Fiore Vitale, IBM Debezium
Daniel Rolles, Founder/CEO, BearingNode
Luke Hoffman, Customer Success, Atlan
Notes:
Announcements
Data Observability and OpenLineage - Recording of Maciej and Harel’s Data Observability and OpenLineage talk at DASH June 10, 2025
Recent Releases
Presentations
Native Data Lineage in Debezium with OpenLineage - Mario Fiore Vitale, IBM
Farewell to Spark 2 - PR #3904 - Please give feedback soon, before this PR is merged!
Open Discussion
Post about vendor compliance with OpenLineage at BearingNode - Daniel Rolles
Please contact Daniel on Slack if there are changes/additions
Maciej linked to compatibility tests on GitHub
Tomasz linked to Compatibility Tests | OpenLineage, which approaches the issue from another side (not so verbose, and automated)
Different dimensions of compliance and governance - Daniel Rolles
Working with Jakub at IBM Manta
Added Structured and Unstructured as dimensions, another “side of the cube” of the Connected Operating Model, in addition to the other 3 sides: People/Process/Technology/Data, BCBS 239 compliance categories, and Value/Discover/Track/Comply/Govern
Meeting:
June 18, 2025 (9:30 am PT)
TSC:
Maciej Obuchowski, Software Engineer, GetInData
Paweł Leszczyński, Software Engineer, GetInData
Jakub Moravec, Product Manager, IBM MANTA
Tomasz Nazarewicz, Lead Data Engineer, Xebia
Harel Shein, Engineering Manager, Datadog
Notes:
Announcements
Simplifying Data Lineage: How OpenLineage Empowers Airflow and Beyond
at Airflow Summit October 7-9 2025, Seattle
will be given by Maciej Obuchowski, Datadog and Harel Shein, Datadog
Blog post:
Native data lineage in Debezium with OpenLineage
Debezium is a distributed platform for change data capture
Thanks to Mario Fiore Vitale
Available from 3.2.0.Beta2
Recent Releases
Presentations
Quality of Spark connector lineage - Pawel Leszczynski, GetInData
Open Discussion
Lineage for unstructured data - Daniel Rolles
Ingesting unstructured data and vectorizing it, or ingesting it into GenAI, and capturing the lineage of that
Meeting:
May 28, 2025 (9:30 am PT)
TSC:
Maciej Obuchowski, Software Engineer, GetInData
Sheeri Cabral, Product Leader, Capital One Software
Michael Collado, Software Engineer, Snowflake
Julien LeDem, Project Lead, Datadog
Harel Shein, Engineering Manager, Datadog
Notes:
Announcements
Last week - Native Data Lineage Support in Apache Flink with OpenLineage at Confluent’s conference, Current, in London. By Pawel Leszczynski, GetInData
Coming up on June 10th in New York City - DASH, Datadog’s conference - Data Observability and OpenLineage - by Maciej Obuchowski and Harel Shein
Recent Releases
Special thanks to new contributor @shinabel
Presentations
Async Python HTTP Client, Maciej Obuchowski
Open Discussion
Polaris integration (alternative Iceberg Catalog), Snowflake, and OpenLineage - Michael Collado
Meeting:
April 16th, 2025 (9:30am PT)
Attendees:
TSC:
Maciej Obuchowski, Software Engineer, GetInData
Tomasz Nazarewicz, Software Engineer, GetInData
Sheeri Cabral, Product, Capital One Software
Michael Robinson, OpenLineage Community
And:
Luke Hoffman, Implementation Engineer, Atlan
Chandru Sugunan, Atlan
Notes:
Recent Releases
Special thanks to new contributor @luke-hoffman1
Presentations
OpenLineage Context, Maciej Obuchowski
Open Discussion
None
Meeting:
March 19th, 2025 (9:30am PT)
Attendees:
TSC:
- Harel Shein, Engineering Manager, Datadog
- Tomasz Nazarewicz, Software Engineer, GetInData
- Maciej Obuchowski, Software Engineer, GetInData
- Michael Robinson, OpenLineage Community
- Sheeri Cabral, Product, Capital One Software
- Julien LeDem, Datadog, OpenLineage Project Lead
And:
- Luke Hoffman, Software Engineer, Atlan
- Massy Bourennani, Software Engineer, Datadog
- Dominik Dębowczyk, Software Engineer, GetInData
- Dan Rolles, Founder/CEO, BearingNode
- Saurabh Vashist, Implementation Specialist, Atlan
Notes:
Announcements
OpenLineage talk March 20, 2025 by Rahul of Atlan at the move(data) conference
Recent Releases
Presentations
dbt structured logs, Massy Bourennani, Software Engineer, Datadog
manifest.json has dbt nodes and relationships between nodes
run_results.json has status (success/fail), timing, nodes, compiled SQL, table name
Previously at Datadog, once the pipeline completed, run_results.json and manifest.json were used to generate lineage
Problems
This is not event-driven - you have to wait until the pipeline is completed
Only dbt model SQL queries are forwarded by OL event. dbt does much more than that!
Solution: use structured logs - structured dbt events.
Written in real-time to stdout and log
Sent as events occur: when a command or a model/node starts and completes, and when a SQL query is executed.
Datadog continuously consumes these logs
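The event-driven approach described above can be sketched as a small consumer that reads JSON log lines one at a time and reacts to them as they arrive, instead of waiting for run_results.json. This is a minimal sketch, not Datadog's implementation: the sample lines are hand-written and simplified, and the `info`/`data` layout and the event names (`NodeStart`, `NodeFinished`, `SQLQuery`, `CommandCompleted`) are assumptions modeled on dbt's JSON log format. The helper name `iter_events` is hypothetical.

```python
import json
from typing import Iterable, Iterator

# Event names to react to. Assumed to mirror dbt's structured event names.
INTERESTING = {"NodeStart", "NodeFinished", "SQLQuery", "CommandCompleted"}


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield interesting events from a stream of log lines, skipping noise."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text output mixed into the stream
        if not isinstance(record, dict):
            continue
        info = record.get("info", {})
        name = info.get("name")
        if name in INTERESTING:
            yield {"name": name, "ts": info.get("ts"), "data": record.get("data", {})}


# Hand-written, simplified sample lines (hypothetical content).
SAMPLE_LOG = [
    '{"info": {"name": "MainReportVersion", "ts": "2025-03-19T16:30:00Z"}, "data": {}}',
    '{"info": {"name": "NodeStart", "ts": "2025-03-19T16:30:01Z"}, '
    '"data": {"node_info": {"unique_id": "model.demo.orders"}}}',
    "plain text line that is not JSON",
    '{"info": {"name": "SQLQuery", "ts": "2025-03-19T16:30:02Z"}, "data": {"sql": "select 1"}}',
    '{"info": {"name": "NodeFinished", "ts": "2025-03-19T16:30:03Z"}, '
    '"data": {"node_info": {"unique_id": "model.demo.orders"}}}',
]

events = list(iter_events(SAMPLE_LOG))
for event in events:
    print(event["ts"], event["name"])
```

In a real pipeline the `lines` argument would be a live stream (for example a process's stdout), so each node start, query, and completion could be forwarded as soon as it is logged.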
Apache Hive Integration - Tomasz Nazarewicz, Software Engineer, GetInData
OpenLineage Hive Integration is in progress and coming to the OpenLineage Hive Repository - PR #3555
What's included: ProcessingEngineRunFacet, HivePropertiesFacet, SchemaDatasetFacet, SymlinksDatasetFacet, ColumnLineageDatasetFacet with Transformation Types
Limits
Only QUERY and CREATETABLE_AS_SELECT are handled; simple inserts and table creations are not handled yet
Only POST_EXEC_HOOK and ON_FAILURE_HOOK are used, so there are no START or RUNNING events, only COMPLETE and FAIL
Limits are not due to technical limitations, just time spent coding.
How to use it - add the Hive library jar and set a property on Hive cluster start
Demo
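The limits listed for the Hive integration can be summarized as a small decision table. The sketch below is illustrative Python, not the integration's actual (Java) code: it assumes the post-execution hook corresponds to COMPLETE and the failure hook to FAIL, as the notes imply, and the function name `event_type_for` is hypothetical.

```python
from typing import Optional

# Hive operations the notes list as currently handled.
HANDLED_OPERATIONS = {"QUERY", "CREATETABLE_AS_SELECT"}

# Assumed mapping from Hive hook type to OpenLineage event type.
HOOK_TO_EVENT_TYPE = {
    "POST_EXEC_HOOK": "COMPLETE",
    "ON_FAILURE_HOOK": "FAIL",
}


def event_type_for(hook_type: str, operation: str) -> Optional[str]:
    """Return the event type to emit, or None when nothing is emitted."""
    if operation not in HANDLED_OPERATIONS:
        return None  # e.g. simple inserts and table creations
    return HOOK_TO_EVENT_TYPE.get(hook_type)  # no START or RUNNING mapping


print(event_type_for("POST_EXEC_HOOK", "QUERY"))
print(event_type_for("ON_FAILURE_HOOK", "CREATETABLE_AS_SELECT"))
print(event_type_for("PRE_EXEC_HOOK", "QUERY"))
```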
Open Discussion
BearingNode has a post about the OpenLineage compliance status of third-party vendors. Feedback welcome!
Meeting:
February 19th, 2025 (9:30am PT)
Attendees:
TSC:
- Harel Shein, Engineering Manager, Datadog
- Tomasz Nazarewicz, Software Engineer, GetInData
- Paweł Leszczyński, Software Engineer, GetInData
- Maciej Obuchowski, Software Engineer, Datadog
- Michael Robinson, OpenLineage Community
- Julien LeDem, Datadog, OpenLineage Project Lead
And:
- Daniel Rolles, Founder/CEO, BearingNode
- Leo Godin, Data Engineer, NewRelic
- Luke Hoffman, Solutions Architect, Atlan
Notes:
Announcements
Recent Releases
Special thanks to new contributors @aritrabandyo and @whitleykeith
Presentations
Dataset partitioning and Subset proposal - Paweł Leszczyński, GetInData
How to know what datasets are partitioned, what partitions are used/changed by a job
Solution: new dataset facet ("partitions")
A partition is a subset of data - how to describe in OpenLineage?
Solution - subsets can be defined manually, by a query, a list of physical files/directories, or a string description
Tags in integrations + Java - Maciej Obuchowski, Datadog
Need to handle key + value, and also the source of the tag.
Support in Java client added
Spark integration - no native tagging mechanism, can configure via spark.conf
Airflow - we capture dag labeling as AirflowRunFacet, but should (future) include it in core facets too
Future: dbt integration
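The requirement above, that a tag carries a key, a value, and its source, can be sketched with a small data structure. This is a minimal sketch in plain Python rather than the OpenLineage client API: the `Tag` class, the `tags_facet` helper, and the sample source labels are hypothetical, and de-duplicating on (key, source) with the last value winning is an illustrative choice, not behavior confirmed by the notes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class Tag:
    key: str
    value: str
    source: str  # where the tag came from, e.g. an integration or a user


def tags_facet(tags: Iterable[Tag]) -> dict:
    """Build a facet-like dict; a later tag replaces an earlier one
    that has the same (key, source)."""
    unique = {}
    for tag in tags:
        unique[(tag.key, tag.source)] = tag
    return {"tags": [asdict(tag) for tag in unique.values()]}


facet = tags_facet(
    [
        Tag("team", "data-platform", "SPARK_CONF"),
        Tag("env", "staging", "USER"),
        Tag("team", "analytics", "SPARK_CONF"),  # overrides the first tag
    ]
)
print(json.dumps(facet, indent=2))
```

Keeping the source next to each key and value lets a consumer tell apart, say, a tag set in configuration from one assigned by a user, which is the distinction the notes call out.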
Open discussion - scaling considerations for very large source files
Meeting:
January 15th, 2025 (9:30am PT)
Attendees:
TSC:
- Julien LeDem, Datadog, OpenLineage Project Lead
- Michael Robinson, OpenLineage Community
- Maciej Obuchowski, Software Engineer, GetInData
- Sheeri Cabral, Product Manager, Capital One Software
And:
- Dan Rolles, Founder/CEO, BearingNode
- Leo Godin, Data Engineer, NewRelic
Notes:
Recent Releases
Presentations
Data and Information Observability - Dan Rolles
BCBS239 - Only 2 out of 31 banks fully comply with BCBS239 even though it's 10 years old. It's about Risk management.
Dan presents a Data & Information Observability Framework (slide screenshot forthcoming)
Tried not to duplicate capabilities - e.g. Risk Management and Compliance are covered by Data Governance
Discussion points - for a working group
Standardizing Financial Data Lineage Events
Unstructured Data and LLM Pipeline Observability
Value-Aligned Dataset Consumption Patterns
OpenLineage in Airflow 3
Airflow 3 is rewriting its architecture and eliminating the direct connection between workers and the Airflow ?; an API will be used instead
In Airflow 2, users could manually mark tasks/DAG runs as successful or failed, but this was not emitted with other OpenLineage information. This will be fixed in Airflow 3
Future features:
Using the new Task SDK, a future version of Airflow can have an asynchronous, serialized version of the OpenLineage listener.
Native support for partitioning https://github.com/OpenLineage/OpenLineage/pull/3392
Event-driven Airflow (AIP-82)
Open Discussion
GitHub releases are up to date, but documentation release notes are not automatically updated.
Tagging - on a per-integration basis. Key/value pairs. Discussion of olin vs. ol. Leo will put a proposal in for dbt tags.
Meeting:
2024
December 18th, 2024 (9:30am PT)
Attendees:
TSC:
Harel Shein, Engineering Manager, Datadog
Michael Robinson, OpenLineage Community
Paweł Leszczyński, GetInData, Astronomer
Kacper Muda, Data Engineer, GetInData
Willy Lulciuc, co-creator and project lead, Marquez
Maciej Obuchowski, Software Engineer, GetInData
Jens Pfau, Engineering Manager, Google
And: