Basque

BERnaT: Basque Encoders for Representing Natural Textual Diversity

Language models depend on massive text corpora that are often filtered for quality, a process that can unintentionally exclude non-standard linguistic varieties, reduce model …

Ekhi Azurmendi, Joseba Fernandez de Landa, Jaione Bengoetxea, Maite Heredia, Julen Etxaniz, Mikel Zubillaga, Ander Soraluze, Aitor Soroa

Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

Current Multimodal Large Language Models exhibit very strong performance for several demanding tasks. While commercial MLLMs deliver acceptable performance in low-resource …

Lukas Arana, Julen Etxaniz, Ander Salaberria, Gorka Azkune

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque

Instructing language models with user intent requires large instruction datasets, which are only available for a limited set of languages. In this paper, we explore alternatives to …

Oscar Sainz, Naiara Perez, Julen Etxaniz, Joseba Fernandez de Landa, Itziar Aldabe, Iker García-Ferrero, Aimar Zabala, Ekhi Azurmendi, German Rigau, Eneko Agirre, Mikel Artetxe, Aitor Soroa

Latxa Euskarazko Hizkuntza-Eredua

In this article we present the Latxa language models (LMs), the largest LMs developed for Basque to date. The Latxa LMs have between 7 billion and 70 billion parameters, …

Naiara Perez, Julen Etxaniz, Oscar Sainz, Itziar Aldabe, German Rigau, Eneko Agirre, Ahmed Salem, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

BertaQA: How Much Do Language Models Know About Local Culture?

Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how …

Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe

IKER-GAITU: research on language technology for Basque and other low-resource languages

The general objective of the IKER-GAITU project is to conduct research on language technology to increase the presence of Basque in the digital environment. It will be carried out between …

Eneko Agirre, Itziar Aldabe, Xabier Arregi, Mikel Artetxe, Unai Atutxa, Ekhi Azurmendi, Iker De la Iglesia, Julen Etxaniz, Victor García-Romillo, Inma Hernaez-Rioja, and others

XNLIeu: a dataset for cross-lingual NLI in Basque

XNLI is a popular Natural Language Inference (NLI) benchmark widely used to evaluate cross-lingual Natural Language Understanding (NLU) capabilities across languages. In this …

Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa

Latxa: An Open Language Model and Evaluation Suite for Basque

We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new Basque …

Julen Etxaniz, Oscar Sainz, Naiara Perez, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa