mymusise / ChatGLM-Tuning

A fine-tuning recipe based on ChatGLM-6B + LoRA

When running finetune.ipynb on Colab, I get a Hugging Face login error. Has anyone run into the same error?

lee376 opened this issue · comments

Downloading and preparing dataset generator/default to /root/.cache/huggingface/datasets/generator/default-2eec05f7b1485a75/0.0.0...
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/model_path/chatglm/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 430, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1368, in hf_hub_download
    raise head_call_error
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 409, in _request_wrapper
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 323, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-65ca176f-5253ecd507530db441e7bd66;124fc4dc-0217-4158-9d74-b4e7e5f1053e)

Repository Not Found for url: https://huggingface.co/model_path/chatglm/resolve/main/tokenizer_config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1608, in _prepare_split_single
    for key, record in generator:
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/generator/generator.py", line 30, in _generate_examples
    for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
  File "/content/ChatGLM-Tuning/tokenize_dataset_rows.py", line 43, in read_jsonl
    tokenizer = transformers.AutoTokenizer.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 718, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 550, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 451, in cached_file
    raise EnvironmentError(
OSError: model_path/chatglm is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/ChatGLM-Tuning/tokenize_dataset_rows.py", line 75, in <module>
    main()
  File "/content/ChatGLM-Tuning/tokenize_dataset_rows.py", line 68, in main
    dataset = datasets.Dataset.from_generator(
  File "/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py", line 1020, in from_generator
    ).read()
  File "/usr/local/lib/python3.10/dist-packages/datasets/io/generator.py", line 47, in read
    self.builder.download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 872, in download_and_prepare
    self._download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1649, in _download_and_prepare
    super()._download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 967, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1488, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1644, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

The command that triggers the error is:

!python tokenize_dataset_rows.py \
    --jsonl_path data/alpaca_data.jsonl \
    --save_path data/alpaca \
    --max_seq_length 128
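For what it's worth, the OSError near the bottom of the traceback hints at the actual cause: `model_path/chatglm` looks like an unreplaced placeholder, not a real Hub repo id or a local checkpoint folder, so the login prompt may be a red herring. Roughly, transformers resolves the tokenizer name like this (a simplified sketch of the resolution logic, not the library's actual code):

```python
import os

def resolve_source(name_or_path: str) -> str:
    """Loose sketch of how transformers picks a tokenizer source:
    an existing local directory is used directly; any other string is
    sent to the Hugging Face Hub as a repo id, and an id the Hub does
    not know (or a private one without a token) comes back as a
    401 / RepositoryNotFoundError."""
    if os.path.isdir(name_or_path):
        return f"local folder: {name_or_path}"
    return f"hub repo id: https://huggingface.co/{name_or_path}"

# The placeholder from the traceback is not a local folder on Colab,
# so it gets treated as a (nonexistent) Hub repo id, hence the 401:
print(resolve_source("model_path/chatglm"))
```

If that reading is right, replacing the placeholder with the real model id (e.g. `THUDM/chatglm-6b`, assuming that is the checkpoint intended here) or with a local path to a downloaded checkpoint should make the 401 go away without any Hugging Face login.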