karpathy / llama2.c

Inference Llama 2 in one file of pure C

Tokenizer errors out when running inference on llama2

navidsam opened this issue · comments

I was getting a "failed read" error at this line in run.c when I ran ./run llama2_7b.bin (code snippet below if you don't wanna click the link):

    // in build_tokenizer
    int len;
    for (int i = 0; i < vocab_size; i++) {
        // fread returns 0 here once i runs past the number of tokens the
        // file actually contains, which triggers the "failed read" exit
        if (fread(t->vocab_scores + i, sizeof(float), 1, file) != 1) { fprintf(stderr, "failed read\n"); exit(EXIT_FAILURE); }
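To sanity-check that number, here's a quick standalone diagnostic I hacked together. The layout it assumes (an int max_token_length up front, then a float score, an int len, and len string bytes per token) is inferred from build_tokenizer, and count_tokens.c is just my name for it:

    /* count_tokens.c -- how many tokens does a tokenizer.bin actually hold?
       Layout assumed from build_tokenizer: one int max_token_length, then
       per token: float score, int len, len bytes of string data. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char** argv) {
        if (argc < 2) { fprintf(stderr, "usage: %s tokenizer.bin\n", argv[0]); exit(EXIT_FAILURE); }
        FILE* file = fopen(argv[1], "rb");
        if (!file) { fprintf(stderr, "couldn't open %s\n", argv[1]); exit(EXIT_FAILURE); }
        int max_token_length;
        if (fread(&max_token_length, sizeof(int), 1, file) != 1) { fprintf(stderr, "failed read\n"); exit(EXIT_FAILURE); }
        float score;
        int len;
        int count = 0;
        // walk (score, len, string) records until EOF
        while (fread(&score, sizeof(float), 1, file) == 1) {
            if (fread(&len, sizeof(int), 1, file) != 1) break;
            if (fseek(file, len, SEEK_CUR) != 0) break; // skip the string bytes
            count++;
        }
        printf("tokens in file: %d\n", count);
        fclose(file);
        return 0;
    }

Compile with gcc count_tokens.c -o count_tokens and point it at tokenizer.bin; if the 32000 figure above is right, that's what it should print.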

Note that I followed the README instructions line by line to get to that stage. The fix I found isn't ideal but seems to work: simply override the read vocab_size (32016, which seems to come from the llama2 configuration) with 32000, which appears to be the number of tokens the current tokenizer.bin in the repo actually contains.

    void build_tokenizer(Tokenizer* t, char* tokenizer_path, int vocab_size) {
        // i should have written the vocab_size into the tokenizer file... sigh
        vocab_size = 32000;  // my hacky fix: match what tokenizer.bin actually holds
        t->vocab_size = vocab_size;
        // ... rest of build_tokenizer unchanged
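A slightly less hacky variant I've been considering (just my sketch of a drop-in replacement for run.c, not what's in the repo, and I'm omitting the rest of the setup the real function does): keep the model's vocab_size so the logits still line up with token ids, but pad the entries the file doesn't have instead of exiting:

    void build_tokenizer(Tokenizer* t, char* tokenizer_path, int vocab_size) {
        t->vocab_size = vocab_size; // keep the model's size so token ids line up
        t->vocab = (char**)malloc(vocab_size * sizeof(char*));
        t->vocab_scores = (float*)malloc(vocab_size * sizeof(float));
        FILE* file = fopen(tokenizer_path, "rb");
        if (!file) { fprintf(stderr, "couldn't load %s\n", tokenizer_path); exit(EXIT_FAILURE); }
        if (fread(&t->max_token_length, sizeof(int), 1, file) != 1) { fprintf(stderr, "failed read\n"); exit(EXIT_FAILURE); }
        int len;
        for (int i = 0; i < vocab_size; i++) {
            if (fread(t->vocab_scores + i, sizeof(float), 1, file) != 1) {
                // ran off the end of the file: the tokenizer has fewer tokens
                // than the model's vocab_size, so pad the tail with "" entries
                t->vocab_scores[i] = 0.0f;
                t->vocab[i] = (char*)calloc(1, 1);
                continue;
            }
            if (fread(&len, sizeof(int), 1, file) != 1) { fprintf(stderr, "failed read\n"); exit(EXIT_FAILURE); }
            t->vocab[i] = (char*)malloc(len + 1);
            if (fread(t->vocab[i], len, 1, file) != 1) { fprintf(stderr, "failed read\n"); exit(EXIT_FAILURE); }
            t->vocab[i][len] = '\0'; // string terminator
        }
        fclose(file);
    }

That way the load doesn't crash, and if the model ever samples one of the 16 padded ids it just decodes to nothing. The real fix is probably exporting the checkpoint and tokenizer with matching vocab sizes.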

Would love to hear what others think about this, or whether anyone else has run into this issue. I'd be surprised if no one has hit it before 😅, or else I'm doing something wrong.

For me the tokenizer.bin worked fine with the llama2 7B base after exporting it into the legacy format.
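In case it helps: the README's export command for the Meta weights is python export.py llama2_7b.bin --meta-llama path/to/llama/model/7B, and if I remember right export.py also takes a --version flag where 0 selects the legacy format, but double-check your copy of the script since the flags have changed over time.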