kyegomez / Andromeda

An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast

Home Page: https://discord.gg/qUtxnK2NMf

Build Dataset Script Fails

evannorstrand-mp opened this issue

python3 Andromeda/build_dataset.py --seed 42 --seq_len 8192 --hf_account "" --tokenizer "EleutherAI/gpt-neox-20b" --dataset_name "EleutherAI/the_pile_deduplicated"

Traceback (most recent call last):
  File "/home/ubuntu/Andromeda/Andromeda/build_dataset.py", line 70, in <module>
    built_dataset(args)
  File "/home/ubuntu/Andromeda/Andromeda/build_dataset.py", line 17, in built_dataset
    tokenizer = AutoTokenizer.from_pretrained(CFG.Tokenizer)
AttributeError: type object 'CFG' has no attribute 'Tokenizer'

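The attribute name is misspelled: CFG.Tokenizer on line 17 of build_dataset.py should be CFG.TOKENIZER. Python attribute names are case-sensitive, so the lookup fails.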
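
A minimal sketch of the failure and the fix, for anyone who wants to reproduce it without running the full script. The CFG class here is a hypothetical stand-in, not the one in build_dataset.py; only the attribute name TOKENIZER comes from this issue, and the tokenizer string is copied from the command above. The helper tokenizer_name is also made up for illustration.

```python
class CFG:
    # Hypothetical stand-in for the config class in build_dataset.py.
    # Only the TOKENIZER attribute name is taken from the issue.
    TOKENIZER = "EleutherAI/gpt-neox-20b"


def tokenizer_name(cfg):
    """Return the tokenizer name using the upper-case attribute."""
    return cfg.TOKENIZER


# Reproduce the failure: attribute lookup is case-sensitive,
# so CFG.Tokenizer does not resolve to CFG.TOKENIZER.
try:
    CFG.Tokenizer
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The corrected lookup succeeds.
fixed_name = tokenizer_name(CFG)

print("Misspelled lookup failed with:", error_message)
print("Corrected lookup returned:", fixed_name)
```

In the real script the corresponding change would presumably be passing CFG.TOKENIZER to AutoTokenizer.from_pretrained on line 17.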