October 1 – 3, 2025

A 3-day virtual event on data governance and Open Source AI.


Speakers

As part of our original Deep Dive:AI, we gathered a diverse collection of leaders to collaborate in drafting a definition for “Open Source AI”.

Speakers from law, academia, NGOs, enterprise, and the Open Source community shared their thoughts on pressing issues in the development and use of AI systems and offered potential solutions.


Schedule

All times on the schedule below are displayed in EDT (UTC-4).


Time (EDT, UTC-4) | Session | Speaker

October 1st: Stewards of the data commons
12:00 PM | Opening Keynote: Data is the key to Open Source AI | Stefano Maffulli
12:15 PM | A data pathway to building public AI | Alek Tarkowski
1:00 PM | Governments as data providers for AI | Neil Majithia
1:45 PM | Copycats and the Commons: Governing Open Data for Trustworthy AI | Natalia-Rozalia, Veronika Cheplygina, Amelia Jiménez Sánchez
2:30 PM | Sovereign by Design: A Blueprint for Federated, Consent-Based AI Systems | Sal Kimmich
3:15 PM | Wrap-Up + Live Q&A | Nick Vidal

October 2nd: Frameworks for data governance
12:00 PM | Keynote: Trends and Insights of China Open Source Ecosystem in AI Era | Nadia Jiang, Emily Chen
12:15 PM | New licensing initiatives for AI training data | Ramya Chandrasekhar
1:00 PM | How Data Provenance Powers Trustworthy AI | Lisa Bobbitt
1:45 PM | The CLeAR Documentation Framework for AI Transparency | Kasia Chmielinski, S. Newman, Chris N. Kranzinger
2:30 PM | Anticipatory Bias Governance in AIED: From Reactive Detection to Proactive Design | Chaeyeon Lim
3:15 PM | Wrap-Up + Live Q&A | Nick Vidal

October 3rd: Building and preserving public datasets
12:00 PM | Keynote: What should open source AI aspire to be? | Stefan Baack, Kasia Odrozek
12:15 PM | Building Public Data for LLMs | Stella Biderman
1:00 PM | A new paradigm for publishing library collections: Institutional Books 1.0, a 242B token dataset | Greg Leppert, Matteo Cargnelutti, Catherine Brobston
1:45 PM | Beyond Extraction: Building Community-Centered Speech Data | Jessica Rose
2:30 PM | Saving What’s Ours: The Data Rescue Project and the Fight for Public Data | Lynda Kellam, Mikala Narlock
3:15 PM | Live Q&A + Closing Remarks | Stefano Maffulli

Program Committee

Alek Tarkowski (Open Future), Anna Tumadóttir (Creative Commons), Carlo Piana (Open Source Initiative), Julie Hunter (Linagora), Masayuki Hatta (Surugadai University), Maximilian Gahntz (Mozilla Foundation), Nick Vidal (Open Source Initiative), Ramya Chandrasekhar (CNRS – Centre national de la recherche scientifique), Stefano Maffulli (Open Source Initiative), Shane Coughlan (OpenChain), and Malcolm Bain (Across Legal).


Sponsors

Alfred P. Sloan Foundation
Automattic

Read the 2025 white paper

Artificial intelligence (AI) is changing the world at a remarkable pace, with Open Source AI playing a pivotal role in shaping its trajectory. Yet, as AI advances, a fundamental challenge emerges: How do we create a data ecosystem that is not only robust but also equitable and sustainable?