Summary of Common AI Models

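This article collects common AI models released across the industry from 2018 to 2023, listing each model's release date and a short description. It also gives the paper associated with each model, so you can read the relevant papers as needed to understand how each model works.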

| Model | Release Date | Description |
| --- | --- | --- |
| BERT | 2018 | Bidirectional Encoder Representations from Transformers |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Training Compute-Optimal Large Language Models. Shows that, for a given compute budget, the best performance comes not from the largest models but from smaller models trained on more data. |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| PaLM 2 | 2023 | PaLM 2 Technical Report. A language model with better multilingual and reasoning capabilities and greater compute efficiency than its predecessor, PaLM. |
