I’ve been hacking on Meta’s LLaMA (paper) models in my GitHub repository since before the leak.

I got access to the LLaMA model weights on Mar 2, 2023. I’m participating in the challenge to make LLaMA models run locally on your own GPU, and I help answer questions about LLaMA.

I think I’m currently witnessing the first LLaMA community coming alive.

Coincidentally, the Stable Diffusion moment is happening again right now, this time for large language models (LLMs), the tech behind ChatGPT itself. I’ve always been curious what life would be like if you could run a GPT-3 class model on your own laptop. Now you can, for the first time!

The race is on to release the first fully open language model that gives people ChatGPT-like capabilities on their own devices.

I’m joining the fun. I’ve figured out the hardware requirements for running inference with LLaMA models and documented them:

I will show how you can get them running on a single A100 40GB GPU. I created this notebook, and you can follow along as I walk through the code.
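As a rough sanity check on hardware requirements like these, here is a back-of-the-envelope sketch in Python. It is not taken from my notebook. It only estimates the memory needed to hold the weights (parameter count times bytes per parameter) and ignores activations and the attention cache. The parameter counts are the nominal model sizes rounded to whole billions, and the function names and the 1.2x overhead factor are assumptions I made up for illustration, so treat the results as ballpark figures.

```python
# Nominal LLaMA sizes in billions of parameters (rounded, for illustration only).
LLAMA_PARAMS_B = {"7B": 7, "13B": 13, "33B": 33, "65B": 65}

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion, precision):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * (bytes per param) / (1e9 bytes per GB)
    return params_billion * BYTES_PER_PARAM[precision]


def fits_on_gpu(params_billion, precision, gpu_gb=40.0, overhead=1.2):
    """Rough check: do the weights, plus an assumed overhead factor, fit in gpu_gb?

    The 1.2 overhead is a guess standing in for activations and other
    runtime buffers; it is not a measured value.
    """
    return weight_memory_gb(params_billion, precision) * overhead <= gpu_gb


if __name__ == "__main__":
    for name, size in LLAMA_PARAMS_B.items():
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(size, precision)
            verdict = "fits" if fits_on_gpu(size, precision) else "too big"
            print(f"LLaMA-{name} {precision}: ~{gb:.1f} GB of weights, {verdict} on 40 GB")
```

Under these assumptions, the estimate suggests why lower precision matters so much: halving the bytes per parameter halves the weight footprint, which is what makes the larger models plausible on a single 40GB card.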

I’m also building ChattyLLaMA, a LLaMA-based ChatGPT-like application.

I will continue porting LLaMA to support more things, not just hardware.

Not long after the leak, Shawn Presser, who is part of the EleutherAI gang (aka “the people trying to replicate GPT-3 via a public Discord server”), cooked up this project and showed us how he ran LLaMA-65B using his fork of Meta’s LLaMA repository. This is getting wild.

The LLaMA community exploded at this point. We now have insanely good educational content, such as guides and documentation, for LLM hackers and engineers. One of them is the unofficial Meta LLaMA 4-bit chatbot guide. I have also written a guide for installing 8-bit LLaMA with text-generation-webui.

There is so much to cover here but I will stop for now.

This is so exciting! I guess we’re living in interesting times. I will write more soon.

Thank you for reading.