How to do inference on the trained weights of the GPT-2 model after finishing training on CPU using train_gpt2.py and train_gpt2?
asifshaikat opened this issue
Hi, thank you very much for making everything so understandable, even for a noob like me. Sorry for such a silly question, though.
I followed the instructions in the repository's README to train a language model on a Bengali text dataset of around 35,169 tokens, using my laptop's CPU (no GPU). I modified the train_gpt2.py script to set my own starting words in Bengali instead of the default "<|endoftext|>".
Now, I want to know how to check whether the trained model weights (the result of the training process) have improved the model's capabilities compared to before training. I would like to compare the model's predictions with the actual text in the test dataset.
Thank you for your time.
There is a file in the folder /llm.c/dev/eval/export_hf.py which converts the checkpoint to the Hugging Face safetensors format. For example:
python /home/myles/llm.c/dev/eval/export_hf.py --input /home/myles/llm.c/log124M/model_00019560.bin --output converted_model
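To sanity-check that the export worked, you can try loading the resulting directory (the --output path from the command above) before wiring up generation. A minimal sketch:

# Quick check that the exported directory loads as a Hugging Face model.
# "converted_model" is the --output directory from the export_hf.py call above.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("converted_model")
print(model.config)                                # GPT-2 architecture settings
print(sum(p.numel() for p in model.parameters()))  # ~124M parameters for GPT-2 small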
And then you can do something like this in Python:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def generate_text(prompt, max_length=1000):
    # Load the converted checkpoint; fall back to CPU/float32 when no GPU is available.
    tokenizer = AutoTokenizer.from_pretrained("/home/myles/llm.c/converted_model")
    model = AutoModelForCausalLM.from_pretrained(
        "/home/myles/llm.c/converted_model",
        torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        do_sample=True,   # sample instead of greedy decoding
        top_p=0.95,       # nucleus sampling
        top_k=50,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print("Chatbot: Hi! How can I help you today?")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit", "stop"]:
            print("Chatbot: Goodbye!")
            break
        prompt = f"User: {user_input}\nChatbot:"
        response = generate_text(prompt)
        print(f"Chatbot: {response}")