Llama 3.1 405B, 70B, and 8B are officially out. Llama 3.1 405B is the first openly available model that matches or beats the best closed models across many benchmarks.

Model evaluations#

The 405B model's performance is very similar to Claude 3.5 Sonnet's. It beats GPT-4 on every benchmark but one.

The 70B model's performance is even more impressive for its size: it is significantly better than GPT-3.5 Turbo and beats Nemotron 4 340B on many tests.

Figure 1: Flagship models benchmark
Figure 2: Smaller models benchmark

Try 405B at meta.ai, on WhatsApp or on HuggingChat.

Notable improvements:

  • 128k context length.
  • Multilingual abilities.
  • Function calling and tool use.
  • Open/free weights and code, with a license that enables fine-tuning, distillation into other models, and deployment anywhere πŸ”₯
  • 8B and 70B code generation performance improved by up to 12%.
  • FP8 quantized version available for efficient inference. (Hugging Face provides GPTQ and AWQ quants.)
  • Llama Stack API for easy integration.
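To see why the FP8 release matters, here is a back-of-the-envelope weight-memory estimate (a pure-Python sketch; the parameter counts are simply the advertised model sizes, and KV cache, activations, and framework overhead are ignored):

```python
# Rough weight-only memory estimate per precision.
PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

def weight_gb(model: str, dtype: str) -> float:
    """Gigabytes needed just to hold the weights."""
    return PARAMS[model] * BYTES_PER_PARAM[dtype] / 1e9

for model in PARAMS:
    print(f"{model}: bf16 ~{weight_gb(model, 'bf16'):.0f} GB, "
          f"fp8 ~{weight_gb(model, 'fp8'):.0f} GB")
```

At FP8 the 405B weights alone drop from roughly 810 GB to roughly 405 GB, which is what makes single-node serving of the flagship model plausible at all.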

Important facts:

  • Pre-training cut-off date of Dec 2023.
  • 405B trained on 15.6T tokens and fine-tuned on 25M human and synthetic examples.
  • Leveraged the 405B model to improve the post-training quality of 70B and 8B models.
  • tiktoken-based tokenizer.
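Those token counts give a feel for the training compute via the standard C ≈ 6·N·D approximation, where N is the parameter count and D the number of training tokens (a rough community heuristic, not an official Meta figure):

```python
# Standard transformer training-compute heuristic: C ~= 6 * N * D FLOPs,
# where N = parameter count and D = training tokens.
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

c = train_flops(405e9, 15.6e12)  # 405B params, 15.6T tokens
print(f"~{c:.1e} FLOPs")         # on the order of 3.8e+25
```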

The Llama 3.1 collection of large language models (LLMs) makes history with the largest and most capable open model ever released. Thank you, Meta, for making AI and LLMs more accessible.

Blog post | Llama website

Download weights on llama.meta.com and Hugging Face

Cloud provider playgrounds:

Paper: The Llama 3 Herd of Models (It’s so cool to see an exhaustive and extensive technical report.)

Model card

GitHub repo

Development: All details about Llama 3.1, such as VRAM requirements, are on the Hugging Face blog. Learn how to quantize, fine-tune, distil, run inference and more in this blog post. (Overwhelmed? If you can only read one thing, let it be this.)
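As a taste of the VRAM arithmetic those posts walk through, here is a minimal KV-cache estimate for one sequence on the 8B model. The architecture numbers (32 layers, 8 KV heads via grouped-query attention, head dimension 128) are the commonly reported Llama 3 8B config, treated here as assumptions:

```python
# Rough KV-cache size for a single sequence: K and V tensors are stored
# per layer, per KV head, per head dimension, per token.
def kv_cache_bytes(n_tokens: int,
                   n_layers: int = 32,     # assumed Llama 3.1 8B config
                   n_kv_heads: int = 8,    # GQA: 8 KV heads, not 32
                   head_dim: int = 128,
                   dtype_bytes: int = 2):  # fp16/bf16 cache
    """Bytes for the K and V caches across all layers for one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * n_tokens

full_ctx = kv_cache_bytes(128 * 1024)  # the new 128k context, fully used
print(f"KV cache at 128k tokens: {full_ctx / 2**30:.0f} GiB")  # 16 GiB
```

Even for the smallest model, a fully used 128k context adds on the order of 16 GiB of cache on top of the weights, which is why the linked posts spend so much time on quantized caches and memory planning.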


Original text (draft): GitHub Gist