BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
We present BabyBabelLM, a multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language. We curate developmentally …
EACL 2026
Jaap Jumelet
Abdellah Fourtassi
Akari Haga
Bastian Bunzeck
Bhargav Shandilya
Diana Galvan-Sosa
Faiz Ghifari Haznitrama
Francesca Padovani
Francois Meyer
Hai Hu
Julen Etxaniz
others
