philschmid / huggingface-sagemaker-workshop-series

Enterprise Scale NLP with Hugging Face & SageMaker Workshop series

Error: We couldn't connect to 'https://huggingface.co/' to load this model and it looks like None is not the path to a directory conaining a config.json file. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'."

manas86 opened this issue

Hi,

In my training workflow I'm following the exact same process, with small modifications such as adding the network configs. If I run the same training code in a SageMaker default notebook, the pipeline runs fine without any issues.

However, when I include the same code base in a proper CI/CD setup, the training workflow fails with:

```
OSError: We couldn't connect to 'https://huggingface.co/' to load this model and it looks like None is not the path to a directory conaining a config.json file. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
```

The error is raised in train.py at:
```python
model = AutoModelForSequenceClassification.from_pretrained(args.model_name)
tokenizer = AutoTokenizer.from_pretrained(args.model_name)
```

So my question is: how can I download the model offline, or otherwise work around this, in this situation?
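One option I'm looking at is to pre-download the model on a machine with internet access and ship the files with the training code. A minimal sketch, assuming the files are saved next to train.py (the directory name `distilbert-base-uncased-model` is just an example, not part of the workshop code):

```python
# Sketch: download the model and tokenizer once, on a machine with internet access,
# into a local directory that gets packaged with the training code.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "distilbert-base-uncased"
save_dir = "./distilbert-base-uncased-model"  # example directory name

model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# save_pretrained writes config.json, the model weights and the tokenizer files
# into save_dir, so from_pretrained(save_dir) later works without network access.
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```

For reference, here is the pipeline code I'm using: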

```python
container = sagemaker.image_uris.retrieve(
    framework="huggingface",
    region=boto3.Session().region_name,
    version=transformers_version,
    py_version=py_version,
    base_framework_version=f"pytorch{pytorch_version}",
    instance_type="ml.p3.2xlarge",
    image_scope="training",
    container_version="cu110-ubuntu18.04",
)

print(f"Image container {container}")

huggingface_estimator = HuggingFace(
    image_uri=container,
    entry_point="train.py",
    source_dir=BASE_DIR,
    base_job_name=base_job_prefix + "/training",
    instance_type=training_instance_type,
    instance_count=training_instance_count,
    role=role,
    transformers_version=transformers_version,
    pytorch_version=pytorch_version,
    py_version=py_version,
    hyperparameters={
        'epochs': epochs,
        'eval_batch_size': eval_batch_size,
        'train_batch_size': train_batch_size,
        'learning_rate': learning_rate,
        'model_id': model_id,
        'fp16': fp16
    },
    sagemaker_session=sagemaker_session,
    subnets=network_config.subnets,
    security_group_ids=network_config.security_group_ids,
    encrypt_inter_container_traffic=True,
    enable_network_isolation=False,
)

step_train = TrainingStep(
    name="TrainHuggingFaceModel",
    estimator=huggingface_estimator,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri
        ),
        "test": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri
        ),
    },
    cache_config=cache_config,
)
```

These are my versions:

```
transformers_version = "4.17.0"
pytorch_version = "1.10.2"
py_version = "py38"
model_id_ = "distilbert-base-uncased"
dataset_name_ = "imdb"
datasets[s3] = 1.18.4
```

Following the error message, I downloaded the model locally beforehand and referenced it in train.py as below:

```python
model = AutoModelForSequenceClassification.from_pretrained(os.path.join(BASE_DIR, "distilbert-base-uncased-model"))
tokenizer = AutoTokenizer.from_pretrained(os.path.join(BASE_DIR, "distilbert-base-uncased-model"))
```

But I'm still getting the error: "OSError: We couldn't connect to 'https://huggingface.co/' to load this model and it looks like /opt/ml/code/distilbert-base-uncased-model is not the path to a directory conaining a config.json file."
Is there something I'm missing? How can I copy the model offline to this path?
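In case it helps with debugging, this is roughly what I would check inside train.py before calling `from_pretrained`, to confirm the bundled directory actually reached /opt/ml/code with a config.json in it (the `BASE_DIR` definition and directory name here are assumptions based on the snippets above):

```python
import os

# Assumption: train.py sits at the root of source_dir, so its own directory
# corresponds to /opt/ml/code inside the training container.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
model_dir = os.path.join(BASE_DIR, "distilbert-base-uncased-model")

# Print what was actually packaged, and fail early if config.json is missing.
print("Contents of", model_dir, ":", os.listdir(model_dir) if os.path.isdir(model_dir) else "directory missing")
assert os.path.isfile(os.path.join(model_dir, "config.json")), "config.json not found - model files were not bundled"
```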

It's fixed ... I was passing the wrong model in my hyperparameters for train.py.
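For anyone hitting the same thing: the `None` in the original error suggests the hyperparameter key didn't match the argument that train.py parses, so `from_pretrained` received `None`. A rough sketch of the wiring that has to line up (argument names here are illustrative, not the exact workshop code):

```python
# Sketch: the key used in the estimator's `hyperparameters` dict must match the
# argparse argument that train.py reads; otherwise the argument defaults to None
# and from_pretrained(None) raises the "couldn't connect ... None is not the path
# to a directory" error.
import argparse

from transformers import AutoModelForSequenceClassification, AutoTokenizer

parser = argparse.ArgumentParser()
# The estimator passes {'model_id': ...}, so train.py must parse --model_id,
# not --model_name, for the value to arrive here.
parser.add_argument("--model_id", type=str, default="distilbert-base-uncased")
args, _ = parser.parse_known_args()

model = AutoModelForSequenceClassification.from_pretrained(args.model_id)
tokenizer = AutoTokenizer.from_pretrained(args.model_id)
```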