Deep Dive: Data Governance Conference

October 1 – 3, 2025

A 3-day Virtual Event on Data Governance and Open Source AI.

Speakers

As part of our original Deep Dive: AI, we gathered a diverse group of leaders to collaborate on drafting a definition of “Open Source AI”.

Speakers from law, academia, NGOs, enterprise, and the Open Source community shared their views on pressing issues and proposed solutions for how we develop and use AI systems.

Schedule

Times in the schedule below are shown in EDT (UTC-4).

Time (EDT, UTC -4)	Session	Speaker
October 1st	Stewards of the data commons
12:00 PM	Opening Keynote: Data is the key to Open Source AI	Stefano Maffulli
12:15 PM	A data pathway to building public AI	Alek Tarkowski
1:00 PM	Governments as data providers for AI	Neil Majithia
1:45 PM	Copycats and the Commons: Governing Open Data for Trustworthy AI	Natalia-Rozalia, Veronika Cheplygina, Amelia Jiménez Sánchez
2:30 PM	Sovereign by Design: A Blueprint for Federated, Consent-Based AI Systems	Sal Kimmich
3:15 PM	Wrap-Up + Live Q&A	Nick Vidal
October 2nd	Frameworks for data governance
12:00 PM	Keynote: Trends and Insights of China Open Source Ecosystem in AI Era	Nadia Jiang, Emily Chen
12:15 PM	New licensing initiatives for AI training data	Ramya Chandrasekhar
1:00 PM	How Data Provenance Powers Trustworthy AI	Lisa Bobbitt
1:45 PM	The CLeAR Documentation Framework for AI Transparency	Kasia Chmielinski, S. Newman, Chris N. Kranzinger
2:30 PM	Anticipatory Bias Governance in AIED: From Reactive Detection to Proactive Design	Chaeyeon Lim
3:15 PM	Wrap-Up + Live Q&A	Nick Vidal
October 3rd	Building and preserving public datasets
12:00 PM	Keynote: What should open source AI aspire to be?	Stefan Baack, Kasia Odrozek
12:15 PM	Building Public Data for LLMs	Stella Biderman
1:00 PM	A new paradigm for publishing library collections: Institutional Books 1.0, a 242B token dataset	Greg Leppert, Matteo Cargnelutti, Catherine Brobston
1:45 PM	Beyond Extraction: Building Community-Centered Speech Data	Jessica Rose
2:30 PM	Saving What’s Ours: The Data Rescue Project and the Fight for Public Data	Lynda Kellam, Mikala Narlock
3:15 PM	Live Q&A + Closing Remarks	Stefano Maffulli

Program Committee

Alek Tarkowski (Open Future), Anna Tumadóttir (Creative Commons), Carlo Piana (Open Source Initiative), Julie Hunter (Linagora), Masayuki Hatta (Surugadai University), Maximilian Gahntz (Mozilla Foundation), Nick Vidal (Open Source Initiative), Ramya Chandrasekhar (CNRS – Centre national de la recherche scientifique), Stefano Maffulli (Open Source Initiative), Shane Coughlan (OpenChain), and Malcolm Bain (Across Legal).

Read the 2025 white paper

Artificial intelligence (AI) is changing the world at a remarkable pace, with Open Source AI playing a pivotal role in shaping its trajectory. Yet, as AI advances, a fundamental challenge emerges: How do we create a data ecosystem that is not only robust but also equitable and sustainable?

