About

SynthData @ ICLR 2025

Welcome to the Synthetic Data × Data Access Problem workshop co-located with ICLR 2025!

Access to large-scale, high-quality data has been shown to be one of the most important factors in the performance of machine learning models. Recent work shows that large (language) models can benefit greatly from training on massive data from diverse (domain-specific) sources and from alignment with user intention. However, the use of certain data sources can trigger privacy, fairness, copyright, and safety concerns. The impressive performance of generative artificial intelligence has popularized the use of synthetic data, and many recent works suggest that (guided) synthesis can be useful for both general-purpose and domain-specific applications.

Will synthetic data ultimately solve the data access problem for machine learning? This workshop seeks to address this question by highlighting the limitations and opportunities of synthetic data. It aims to bring together researchers working on algorithms and applications of synthetic data, general data access for machine learning, privacy-preserving methods such as federated learning and differential privacy, and large model training experts to discuss lessons learned and chart important future directions.


Topics of interest include, but are not limited to, the following:

  • Risks and limitations of synthetic data.
  • New algorithms for synthetic data generation.
  • New applications of synthetic data (e.g., in healthcare, finance, gaming and simulation, education, scientific research, or autonomous systems).
  • Synthetic data for model training and evaluation.
  • Synthetic data for improving specific model capabilities (e.g., reasoning, math, coding).
  • Synthetic data to address privacy, fairness, safety and other data concerns.
  • Evaluation of synthetic data quality and models trained on synthetic data.
  • Conditional and unconditional synthetic data generation.
  • Fine-grained control of synthetic data generation.
  • Data access with federated learning and privacy-preserving methods.
  • New paradigms for accessing data for machine learning.
  • Mixing synthetic and natural data.

Calls

Call for Papers

Important Dates
  • Submission Due Date: February 5th, 2025 AoE
    • (Extended by 12 hours to February 6th, 4 pm Pacific Time, due to several requests. Please submit as soon as possible, as we do not plan to extend it again.)
  • Notification of Acceptance: March 5th, 2025, AoE
  • Free Registration Application Due: March 12th, 2025 AoE
  • Camera-ready papers due: April 12th, 2025
  • Workshop Dates: April 27th, 2025, Singapore

Submission Instructions

Submissions should be double-blind, no more than 6 pages long (excluding references), and follow the ICLR'25 template. An optional appendix of any length can be placed at the end of the draft (after the references).

Submissions are processed in OpenReview.

Our workshop does not have formal proceedings, i.e., it is non-archival. Accepted papers and their review comments will be posted publicly on OpenReview (after the end of the review process), while rejected and withdrawn papers and their reviews will remain private.

We welcome submissions of novel research, ongoing (incomplete) projects, drafts currently under review at other venues, as well as recently published results. In addition, we have the following policies.

  • [Submission of previous conference and workshop papers] We request significant updates if the work has previously been presented at a major machine learning conference or workshop, or at any conference or workshop before February 1st, 2025.
  • [Submission of previous journal papers] For work published in journals that has not been presented at conferences or workshops, we will let the authors decide how novel it is for the community. Though the machine learning community moves fast, the workshop is inclusive of subareas that may have moved at a slower pace, and values submissions that represent fundamental, long-lasting research.
  • [Dual submission to other workshops at the same time, e.g., another ICLR workshop] We generally discourage dual submission to other workshops at the same time, as it would waste our program committee's effort, and we request an in-person presentation by either talk or poster upon acceptance at our workshop. That being said, as our workshop is non-archival, we leave the final decision on dual submission to the authors.

Tiny Papers Submissions

[Remark] This year, ICLR is discontinuing the separate “Tiny Papers” track, and is instead requiring each workshop to accept short (3–5 pages in ICLR format, exact page length to be determined by each workshop) paper submissions, with an eye towards inclusion; see https://iclr.cc/Conferences/2025/CallForTinyPapers for more details. Authors of these papers will be earmarked for potential funding from ICLR, but need to submit a separate application for Financial Assistance that evaluates their eligibility. This application for Financial Assistance to attend ICLR 2025 will become available on https://iclr.cc/Conferences/2025/ at the beginning of February and close on March 2nd.

We encourage submission of short papers relevant to the workshop topics. Following the Tiny Papers Track at previous years' ICLR main conferences, we encourage submissions from historically underrepresented groups. Example topics include:

  • An implementation and experimentation of a novel (not published elsewhere) yet simple idea, or a modest and self-contained theoretical result
  • A follow-up experiment to or re-analysis of a previously published paper
  • A new perspective on a previously published paper

The tiny papers will be peer reviewed. Submissions should be double-blind, no more than 3 pages long (excluding references), and follow the ICLR'25 template. Use the same submission portal on OpenReview. In addition,

  • Please clearly add a tag [Tiny] at the beginning of the submission title.

Presentation Instructions

All accepted papers are expected to be presented in person. While we aim to provide accessibility to virtual attendees of the workshop, we are not planning to provide support for virtual talks or posters.

Awards

Best Paper Awards

The organizing committee will select best paper award(s) supported by our sponsors.

Early Career Free Registration

The workshop can provide a limited number of free (full ICLR'25 conference) registrations to our attendees, prioritizing early-career students and promoting diversity, equity and inclusion (DEI). If you are interested, please email us at synth-workshop-iclr25@googlegroups.com following these instructions:

  • Email has to be sent before March 12th to be considered.
  • Email title starts with [Synth-ICLR25 free registration].
  • Includes link(s) to your accepted, or submitted paper(s) to our workshop.
  • Includes a short paragraph describing why it is important for your research and career.
  • (Optional) includes link(s) to your webpage and resume.
  • The awardees will be announced on March 22nd.

Best Reviewers Free Registration

The workshop encourages high-quality reviews. We provide a limited number of free (full ICLR'25 conference) registrations for self-nominated reviewers who have written high-quality reviews. If you are interested, please email us at synth-workshop-iclr25@googlegroups.com following these instructions:

  • Email has to be sent before March 12th to be considered.
  • Email title starts with [Synth-ICLR25 free registration: reviewer].
  • Includes link(s) to, or screenshots of, your reviews.
  • The awardees will be announced on March 22nd.

Program

Workshop Program


Local Time (UTC+8)    Activity
08:55 - 09:00         Opening Remarks
09:00 - 09:30         Invited Talk #1
09:30 - 10:00         Spotlight Talks
10:00 - 10:30         Break
10:30 - 11:00         Invited Talk #2
11:00 - 11:30         Invited Talk #3
11:30 - 12:30         Poster
12:30 - 13:30         Lunch break
13:30 - 14:30         Panel discussion
14:30 - 15:00         Invited Talk #4
15:00 - 15:30         Spotlight Talks
15:30 - 16:00         Break
16:00 - 16:30         Invited Talk #5
16:30 - 17:00         Invited Talk #6
17:00 - 17:05         Concluding Remarks

Talks

Invited Speakers

Mary-Anne Hartley

EPFL & Harvard-Chan & CMU-Africa

Sanmi Koyejo

Stanford

Mihaela van der Schaar

University of Cambridge

Eric Xing

Mohamed bin Zayed University & Carnegie Mellon University

Panel Discussion

Panelists

TBD

Organization

Workshop Organizers

Herbie Bradley

UK AI Safety Institute

Rachel Cummings

Columbia University

Giulia Fanti

Carnegie Mellon University

Peter Kairouz

Google

Chulin Xie

University of Illinois Urbana-Champaign

Zheng Xu

Google

Review

Review Guide

Please take a look at the ICLR'25 reviewer guide. This workshop accepts regular submissions of up to 6 pages and tiny papers of up to 3 pages, both excluding appendixes. See the Call for Papers section for submission formatting.

  • Review period: February 7th, 2025 to February 26th, 2025 AoE

Program Committee
  • TBD (TBD)

Sponsors

Google

Gretel