HuggingFace Transformers inference for Alpaca Guide
Stanford Alpaca is an instruction-following language model, fine-tuned from the LLaMA-7B model.
The inference code below uses the Alpaca Native model, which was fine-tuned with the original tatsu-lab/stanford_alpaca repository. Unlike tloen/alpaca-lora, the fine-tuning process does not use LoRA.
Hardware and software requirements
For the Alpaca-7B:
Linux or macOS
1x GPU with 24 GB of VRAM (fp16) or 1x GPU with 12 GB of VRAM (int8)
PyTorch with CUDA (not the CPU version)
HuggingFace Transformers library
pip install git+https://github.com/huggingface/transformers.git
At the time of writing, the Transformers library only supports LLaMA through the latest GitHub repository, not through the released PyPI package.
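To confirm that the development install actually picked up LLaMA support, a quick import check like the minimal sketch below can be run; it only verifies that the LLaMA classes are importable and prints the installed version.

# Sanity check: these imports raise ImportError on Transformers releases without LLaMA support.
import transformers
from transformers import LlamaForCausalLM, LlamaTokenizer

print(transformers.__version__)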
To run the quantized model in 8-bit, install bitsandbytes and accelerate, and set load_in_8bit=True when loading the model.
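For the 12 GB int8 path, the model can be loaded quantized instead of in fp16. A minimal sketch, assuming bitsandbytes and accelerate are installed and the weights live in ./checkpoint-1200/:

# Load the Alpaca weights quantized to int8 (requires bitsandbytes and accelerate).
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "./checkpoint-1200/",   # path to your HuggingFace-format Alpaca weights
    load_in_8bit=True,      # quantize the weights to int8 at load time
    device_map="auto",      # let accelerate place layers on the available GPU(s)
)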
Inference Code
# Based on: Original Alpaca Model/Dataset/Inference Code by Tatsu-lab
import time

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the tokenizer from the local HuggingFace-format checkpoint directory.
tokenizer = LlamaTokenizer.from_pretrained("./checkpoint-1200/")


def generate_prompt(instruction, input=None):
    """Wrap the user instruction (and optional input) in the Alpaca prompt template."""
    if input:
        return f"""The following is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:"""
    else:
        return f"""The following is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""


# Load the model in fp16 on the GPU; set load_in_8bit=True to run it quantized instead.
model = LlamaForCausalLM.from_pretrained(
    "./checkpoint-1200/",
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Simple interactive loop: read an instruction, build the prompt, and generate a response.
while True:
    text = generate_prompt(input("User: "))
    time.sleep(1)
    input_ids = tokenizer(text, return_tensors="pt").input_ids.to("cuda")
    generated_ids = model.generate(
        input_ids,
        max_new_tokens=250,
        do_sample=True,
        repetition_penalty=1.0,
        temperature=0.8,
        top_p=0.75,
        top_k=40,
    )
    # Prints the prompt followed by the model's response.
    print(tokenizer.decode(generated_ids[0]))
How to use
Download the model weights from https://huggingface.co/chavinlo/alpaca-native, then change ./checkpoint-1200/ in the code above to the directory that contains your HuggingFace-format model files.
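If you prefer to fetch the weights programmatically, the sketch below uses huggingface_hub (assuming pip install huggingface_hub); alternatively, from_pretrained also accepts the repo id "chavinlo/alpaca-native" directly and downloads the files on first use.

# Download the alpaca-native repository into the local HuggingFace cache and print its local path.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="chavinlo/alpaca-native")
print(local_dir)  # pass this path to LlamaTokenizer/LlamaForCausalLM.from_pretrained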
FAQ
What if I want to fine-tune Stanford Alpaca myself?
The Replicate team has repeated the training process and published a tutorial on how they did it; it cost them less than $100.
Update: I’ve written a simpler tutorial, “Creating a chatbot using Alpaca native and LangChain”.
Original text: GitHub Gist