MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

This repository contains the checkpoints and vocabularies from MADLAD-400: A Multilingual And Document-Level Large Audited Dataset.

Checkpoints

Model                                                          Checkpoint
8B-parameter LM                                                link
3B-parameter MT model                                          link
7.2B-parameter MT model                                        link
7.2B-parameter MT model (finetuned on backtranslated data)     link
10.7B-parameter MT model                                       link

Vocabulary

The vocabularies used to train the models listed above are here.

Example usage

We provide a simple Colab example showing how to use the released checkpoints for translation.
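The MT checkpoints expect a target-language token of the form <2xx> prepended to the source text, where xx is the target language code. This convention comes from the MADLAD-400 paper rather than from this page, so check it against the Colab before relying on it. The sketch below only builds such input strings. The helper name make_translation_input is hypothetical, and the sketch does not load or run any checkpoint.

```python
def make_translation_input(text: str, target_lang: str) -> str:
    """Prefix `text` with a <2xx> target-language tag.

    Assumption: MADLAD-400 MT models select the output language from a
    leading <2xx> token (e.g. <2pt> for Portuguese). This helper is a
    hypothetical illustration, not part of the released code.
    """
    code = target_lang.strip()
    if not code:
        raise ValueError("target_lang must be a non-empty language code")
    return f"<2{code}> {text}"


if __name__ == "__main__":
    # Prepare a Portuguese-target input for an MT checkpoint.
    print(make_translation_input("I love pizza!", "pt"))
```

The tokenized string would then be passed to the model in whatever way the Colab demonstrates.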

Contact

Please reach out to {snehakudugunta, icaswell}@google.com with any questions or observed issues. Issues will be listed on this page to aid future users. For questions about the canaries, reach out to cchoquette@google.com.