In this article, I review and summarise several papers on the development of foundation models for natural language.
The papers fall into three threads:

ELMo —> BERT —> ELECTRA/XLNet/RoBERTa/ALBERT

T5 and the GPT series.

XLM, T5 and MASS.
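Before going through the individual papers, here is a minimal sketch of what "contextualized representations" (the common thread from ELMo through BERT and its successors) look like in practice. It assumes the Hugging Face `transformers` and `torch` packages and the `bert-base-uncased` checkpoint, none of which is prescribed by the papers above; it only illustrates that a pretrained encoder assigns the same word different vectors in different contexts.

```python
# Minimal sketch (assumes the Hugging Face `transformers` and `torch`
# packages are installed). Not tied to any single paper above; it shows
# the shared idea: a pretrained encoder produces contextualized token
# representations that downstream tasks can reuse.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = [
    "The bank raised interest rates.",     # "bank" as a financial institution
    "They sat on the bank of the river.",  # "bank" as a riverbank
]

with torch.no_grad():
    inputs = tokenizer(sentences, padding=True, return_tensors="pt")
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size).
# Unlike static word vectors, the two "bank" tokens receive different
# vectors because their surrounding contexts differ.
print(outputs.last_hidden_state.shape)
```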
[ELMo] Deep Contextualized Word Representations, NAACL 2018.
[BERT] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019; first posted on arXiv on October 11, 2018.
[MASS] MASS: Masked Sequence to Sequence Pre-training for Language Generation, ICML 2019.
[XLNet] XLNet: Generalized Autoregressive Pretraining for Language Understanding, NeurIPS 2019.
[XLM] Cross-lingual Language Model Pretraining, NeurIPS 2019.
[ALBERT] ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, ICLR 2020.
[RoBERTa] RoBERTa: A Robustly Optimized BERT Pretraining Approach, arXiv 2019 (rejected from ICLR 2020).
[ELECTRA] ELECTRA: Pre-training Text Encoders as Discriminators Rather than Generators, ICLR 2020.