moyix / fauxpilot

FauxPilot - an open-source GitHub Copilot server


Build fails: is private

MarcSeebold opened this issue · comments

It seems that the repository requires authentication. This causes the build to fail for me.

$ ~/fauxpilot (main) $ ./ 
Models available:
[1] codegen-350M-mono (2GB total VRAM required; Python-only)
[2] codegen-350M-multi (2GB total VRAM required; multi-language)
[3] codegen-2B-mono (7GB total VRAM required; Python-only)
[4] codegen-2B-multi (7GB total VRAM required; multi-language)
[5] codegen-6B-mono (13GB total VRAM required; Python-only)
[6] codegen-6B-multi (13GB total VRAM required; multi-language)
[7] codegen-16B-mono (32GB total VRAM required; Python-only)
[8] codegen-16B-multi (32GB total VRAM required; multi-language)
Enter your choice [6]: 7
Enter number of GPUs [1]: 1
Where do you want to save the model [/home/user/fauxpilot/models]? 
Downloading and converting the model, this will take a while...
Unable to find image 'moyix/model_conveter:latest' locally
latest: Pulling from moyix/model_conveter
[many "Pull complete"s]
Digest: sha256:744858f56b5eef785fde79b0f3bc76887fe34f14d0f8c01b06bf92ccd551b3ac
Status: Downloaded newer image for moyix/model_conveter:latest
Converting model codegen-16B-mono with 1 GPUs
Downloading config.json:   0%|          | 0.00/994 [00:00<?, ?B/s]Loading CodeGen model
Downloading config.json: 100%|██████████| 994/994 [00:00<00:00, 1.59MB/s]
Downloading pytorch_model.bin: 100%|██████████| 30.0G/30.0G [06:19<00:00, 84.9MB/s]
line 9:     8 Killed                  python3 --code_model Salesforce/${MODEL} ${MODEL}-hf

=============== Argument ===============
saved_dir: /models/codegen-16B-mono-1gpu/fastertransformer/1
in_file: codegen-16B-mono-hf
trained_gpu_num: 1
infer_gpu_num: 1
processes: 4
weight_data_type: fp32
Traceback (most recent call last):
  File "/transformers/src/transformers/", line 619, in _get_config_dict
    resolved_config_file = cached_path(
  File "/transformers/src/transformers/utils/", line 285, in cached_path
    output_path = get_from_cache(
  File "/transformers/src/transformers/utils/", line 503, in get_from_cache
  File "/transformers/src/transformers/utils/", line 418, in _raise_for_status
    raise RepositoryNotFoundError(
transformers.utils.hub.RepositoryNotFoundError: 401 Client Error: Repository not found for url: If the repo is private, make sure you are authenticated.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 188, in <module>
  File "", line 86, in split_and_convert
    model = GPTJForCausalLM.from_pretrained(args.in_file)
  File "/transformers/src/transformers/", line 1844, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
  File "/transformers/src/transformers/", line 530, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/transformers/src/transformers/", line 557, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/transformers/src/transformers/", line 631, in _get_config_dict
    raise EnvironmentError(
OSError: codegen-16B-mono-hf is not a local folder and is not a valid model identifier listed on ''
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
Done! Now run ./ to start the FauxPilot server.

Update: It works with "codegen-350M-mono"

Yep, this line is the root cause: line 9:     8 Killed                  python3 --code_model Salesforce/${MODEL} ${MODEL}-hf

While converting the model, it loads both the original model and the new model into RAM at once, so for the 16B versions that means... a lot of RAM 😬
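As a rough sanity check (my arithmetic, not anything the script reports): a 16B-parameter model in fp32 is about 64 GB of weights, and holding two full copies during conversion roughly doubles that:

```python
# Back-of-the-envelope RAM estimate for the conversion step.
# Assumption: weights are held in fp32 (4 bytes per parameter).
def conversion_ram_gb(n_params_billion: float,
                      bytes_per_param: int = 4,
                      copies: int = 2) -> float:
    """RAM (in GB) needed to hold `copies` full copies of the weights."""
    return n_params_billion * 1e9 * bytes_per_param * copies / 1e9

# codegen-16B: ~16 billion parameters, two copies resident at once
print(conversion_ram_gb(16))  # 128.0
```

Which would explain the OOM kill on most workstations.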

Doing the conversion piecemeal (e.g. layer by layer) is possible, but I'd need to look a bit more deeply into the exact layout of the pytorch_model.bin format.
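The piecemeal idea could look roughly like this. This is a toy sketch with plain lists standing in for tensors and a no-op `convert_tensor`; the real checkpoint layout would need checking first, as noted above:

```python
import json
import os
import tempfile

def convert_tensor(values):
    # Stand-in for the real per-tensor conversion (e.g. dtype/layout change).
    return [float(v) for v in values]

def convert_piecemeal(state_dict, out_dir):
    """Convert one entry at a time so only a single tensor is resident,
    instead of materializing the whole converted model in RAM."""
    os.makedirs(out_dir, exist_ok=True)
    for name, tensor in state_dict.items():
        converted = convert_tensor(tensor)
        with open(os.path.join(out_dir, name + ".json"), "w") as f:
            json.dump(converted, f)
        del converted  # free this tensor before touching the next one

# Toy "checkpoint": two small layers
ckpt = {"layer0.weight": [1, 2, 3], "layer1.weight": [4, 5]}
out = tempfile.mkdtemp()
convert_piecemeal(ckpt, out)
print(sorted(os.listdir(out)))  # ['layer0.weight.json', 'layer1.weight.json']
```

Peak memory then scales with the largest single tensor rather than the whole model.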

Another option is that I could try to host preconverted versions somewhere; they're pretty big though:

1.4G    codegen-350M-mono-1gpu
1.4G    codegen-350M-mono-2gpu
1.4G    codegen-350M-multi-1gpu
1.4G    codegen-350M-multi-2gpu
11G     codegen-2B-mono-1gpu
11G     codegen-2B-mono-2gpu
11G     codegen-2B-multi-1gpu
11G     codegen-2B-multi-2gpu
27G     codegen-6B-mono-1gpu
27G     codegen-6B-mono-2gpu
27G     codegen-6B-multi-1gpu
27G     codegen-6B-multi-2gpu
60G     codegen-16B-mono-1gpu
60G     codegen-16B-mono-2gpu
60G     codegen-16B-multi-1gpu
60G     codegen-16B-multi-2gpu
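For reference, hosting every converted variant in that list would come to roughly 400 GB total (my arithmetic from the sizes above):

```python
# Total disk needed to host all preconverted variants listed above.
sizes_gb = {"350M": 1.4, "2B": 11, "6B": 27, "16B": 60}
variants_per_size = 4  # {mono, multi} x {1gpu, 2gpu}
total = sum(size * variants_per_size for size in sizes_gb.values())
print(round(total, 1))  # 397.6
```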

I'll look into what the options are for doing that.