Reproducibility

Lessons from the Trenches on Reproducible Evaluation of Language Models

Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation …

Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, Xiangru Tang, Kevin A. Wang, Genta Indra Winata, François Yvon, Andy Zou