ValueError: BuilderConfig 'pile_freelaw' not found when running the PILE eval
Harryalways317 opened this issue
Harish Vadaparty commented
Execution Command
!lm_eval --model hf \
--model_args pretrained=hvadaparty/Featherlite-2.5-Mistral-7B \
--tasks self_consistency,realtoxicityprompts,toxigen,pile \
--device cuda:0 \
--batch_size auto --device cuda
Error Message
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2024-04-16:19:37:29,582 INFO [__main__.py:251] Verbosity set to INFO
2024-04-16:19:37:33,734 INFO [__main__.py:335] Selected Tasks: ['pile', 'realtoxicityprompts', 'self_consistency', 'toxigen']
2024-04-16:19:37:33,734 INFO [__main__.py:336] Loading selected tasks...
2024-04-16:19:37:33,734 INFO [evaluator.py:131] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-04-16:19:37:33,736 INFO [huggingface.py:162] Using device 'cuda:0'
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:01<00:00, 1.57it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1429: FutureWarning: The repository for EleutherAI/pile contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/EleutherAI/pile
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
Downloading builder script: 100%|██████████| 9.53k/9.53k [00:00<00:00, 55.6MB/s]
Downloading readme: 100%|██████████████████| 14.2k/14.2k [00:00<00:00, 47.1MB/s]
Traceback (most recent call last):
File "/usr/local/bin/lm_eval", line 8, in <module>
sys.exit(cli_evaluate())
File "/usr/local/lib/python3.10/dist-packages/lm_eval/__main__.py", line 342, in cli_evaluate
results = evaluator.simple_evaluate(
File "/usr/local/lib/python3.10/dist-packages/lm_eval/utils.py", line 288, in _wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lm_eval/evaluator.py", line 192, in simple_evaluate
task_dict = get_task_dict(tasks, task_manager)
File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 420, in get_task_dict
task_name_from_string_dict = task_manager.load_task_or_group(
File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 270, in load_task_or_group
collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 253, in _load_individual_task_or_group
**dict(collections.ChainMap(*map(fn, subtask_list))),
File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 161, in _load_individual_task_or_group
return load_task(task_config, task=name_or_config, group=parent_name)
File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 150, in load_task
task_object = ConfigurableTask(config=config)
File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/task.py", line 782, in __init__
self.download(self.config.dataset_kwargs)
File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/task.py", line 871, in download
self.dataset = datasets.load_dataset(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2523, in load_dataset
builder_instance = load_dataset_builder(
File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2232, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 371, in __init__
self.config, self.config_id = self._create_builder_config(
File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 592, in _create_builder_config
raise ValueError(
ValueError: BuilderConfig 'pile_freelaw' not found. Available: ['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', 'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']
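
The final error says the task asked for a config named `pile_freelaw`, while the dataset script only exposes the names in the `Available` list. As a rough diagnostic sketch (not a fix, and not part of lm_eval or `datasets`), the snippet below parses that error text and looks for the closest available name. The helper names `parse_builder_config_error` and `suggest_config` are hypothetical and exist only for this illustration; the error string is copied from the traceback above.

```python
import ast
import difflib
import re

# Error text copied from the last line of the traceback above.
ERROR_MESSAGE = (
    "BuilderConfig 'pile_freelaw' not found. Available: "
    "['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', "
    "'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']"
)


def parse_builder_config_error(message):
    """Hypothetical helper: split the error into (requested, available)."""
    match = re.search(
        r"BuilderConfig '([^']+)' not found\. Available: (\[.*\])", message
    )
    if match is None:
        raise ValueError("message does not look like a BuilderConfig error")
    requested = match.group(1)
    # The available configs are printed as a Python list literal.
    available = ast.literal_eval(match.group(2))
    return requested, available


def suggest_config(requested, available):
    """Hypothetical helper: return the most similar available name, or None."""
    matches = difflib.get_close_matches(requested, available, n=1, cutoff=0.6)
    return matches[0] if matches else None


requested, available = parse_builder_config_error(ERROR_MESSAGE)
suggestion = suggest_config(requested, available)
print(f"requested={requested!r}, closest available={suggestion!r}")
```

For the message above, this should point at `free_law` as the nearest match, which suggests a naming mismatch between the task config and the dataset script rather than a missing dataset. With network access, `datasets.get_dataset_config_names` can presumably be used to list the same names directly from the Hub.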