tensorflow / cloud

The TensorFlow Cloud repository provides APIs that make it easy to go from debugging and training your Keras and TensorFlow code in a local environment to distributed training in the cloud.

Home Page: https://github.com/tensorflow/cloud

Running into "Internal error occurred for the current attempt" problem

deep-diver opened this issue

I am using CloudTuner in a TFX project, but I keep getting an "Internal error occurred for the current attempt" error, and it doesn't show what the actual underlying problem is.

Below is the JSON passed to the CloudTuner, and this is my repository.

For imageUri, I passed the TFX Docker image.

{
  "scaleTier": "CUSTOM",
  "masterType": "standard",
  "workerType": "standard",
  "workerCount": "2",
  "region": "us-central1",
  "masterConfig": {
    "imageUri": "gcr.io/gcp-ml-172005/img-classification",
    "containerCommand": [
      "python",
      "-m",
      "tfx.scripts.run_executor",
      "--executor_class_path",
      "tfx.extensions.google_cloud_ai_platform.tuner.executor._WorkerExecutor",
      "--inputs",
      "{\"examples\": [{\"artifact\": {\"id\": \"302652664909979029\", \"uri\": \"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/874401645461/img-classification-20220725145617/Transform_-7372794461505454080/transformed_examples\", \"properties\": {\"split_names\": {\"string_value\": \"[\\\"train\\\", \\\"eval\\\"]\"}}, \"custom_properties\": {\"tfx_version\": {\"struct_value\": {\"__value__\": \"1.9.0\"}}}}, \"artifact_type\": {\"name\": \"Examples\", \"properties\": {\"span\": \"INT\", \"version\": \"INT\", \"split_names\": \"STRING\"}, \"base_type\": \"DATASET\"}, \"__artifact_class_module__\": \"tfx.types.standard_artifacts\", \"__artifact_class_name__\": \"Examples\"}], \"transform_graph\": [{\"artifact\": {\"id\": \"7122557137885461129\", \"uri\": \"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/874401645461/img-classification-20220725145617/Transform_-7372794461505454080/transform_graph\", \"custom_properties\": {\"tfx_version\": {\"struct_value\": {\"__value__\": \"1.9.0\"}}}}, \"artifact_type\": {\"name\": \"TransformGraph\"}, \"__artifact_class_module__\": \"tfx.types.standard_artifacts\", \"__artifact_class_name__\": \"TransformGraph\"}]}",
      "--outputs",
      "{\"best_hyperparameters\": [{\"artifact\": {\"id\": \"6837211415839241726\", \"uri\": \"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/874401645461/img-classification-20220725145617/Tuner_6462263593776709632/best_hyperparameters\"}, \"artifact_type\": {\"name\": \"HyperParameters\"}, \"__artifact_class_module__\": \"tfx.types.standard_artifacts\", \"__artifact_class_name__\": \"HyperParameters\"}]}",
      "--exec-properties",
      "{\"custom_config\": \"{\\\"ai_platform_tuning_args\\\": {\\\"masterConfig\\\": {\\\"imageUri\\\": \\\"gcr.io/gcp-ml-172005/img-classification\\\"}, \\\"project\\\": \\\"gcp-ml-172005\\\", \\\"region\\\": \\\"us-central1\\\", \\\"scaleTier\\\": \\\"STANDARD_1\\\"}, \\\"masterConfig\\\": {\\\"imageUri\\\": \\\"gcr.io/gcp-ml-172005/img-classification\\\"}, \\\"project\\\": \\\"gcp-ml-172005\\\", \\\"region\\\": \\\"us-central1\\\", \\\"remote_trials_working_dir\\\": \\\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/trials\\\", \\\"scaleTier\\\": \\\"STANDARD_1\\\"}\", \"eval_args\": \"{\\n  \\\"num_steps\\\": 4\\n}\", \"train_args\": \"{\\n  \\\"num_steps\\\": 160\\n}\", \"tune_args\": \"{\\n  \\\"num_parallel_trials\\\": 3\\n}\", \"tuner_fn\": \"models.model.cloud_tuner_fn\"}"
    ]
  }
}
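The `--exec-properties` argument above is JSON inside a JSON string, and its `custom_config`, `train_args`, `eval_args`, and `tune_args` values are JSON-encoded yet again, which makes the submitted configuration hard to read. As a debugging aid, here is a minimal stdlib-only Python sketch that unpacks every nested layer so the values can be checked by eye. The `decode_nested` helper is hypothetical (it is not part of TFX or tensorflow/cloud), and the sketch assumes the string below is the `--exec-properties` value copied from the job spec with the outer JSON layer already removed.

```python
import json

# --exec-properties value from the job spec, with the outer JSON layer removed
# (the remaining \" and \n are escapes belonging to the inner JSON layers).
EXEC_PROPERTIES = (
    r'{"custom_config": "{\"ai_platform_tuning_args\": {\"masterConfig\": '
    r'{\"imageUri\": \"gcr.io/gcp-ml-172005/img-classification\"}, '
    r'\"project\": \"gcp-ml-172005\", \"region\": \"us-central1\", '
    r'\"scaleTier\": \"STANDARD_1\"}, \"masterConfig\": '
    r'{\"imageUri\": \"gcr.io/gcp-ml-172005/img-classification\"}, '
    r'\"project\": \"gcp-ml-172005\", \"region\": \"us-central1\", '
    r'\"remote_trials_working_dir\": '
    r'\"gs://gcp-ml-172005-complete-mlops/tfx_pipeline_output/img-classification/trials\", '
    r'\"scaleTier\": \"STANDARD_1\"}", '
    r'"eval_args": "{\n  \"num_steps\": 4\n}", '
    r'"train_args": "{\n  \"num_steps\": 160\n}", '
    r'"tune_args": "{\n  \"num_parallel_trials\": 3\n}", '
    r'"tuner_fn": "models.model.cloud_tuner_fn"}'
)


def decode_nested(value):
    """Recursively json.loads any string that itself holds a JSON object or array."""
    if isinstance(value, str):
        stripped = value.strip()
        if stripped.startswith(("{", "[")):
            try:
                return decode_nested(json.loads(stripped))
            except json.JSONDecodeError:
                return value  # looks like JSON but is not; keep the raw string
        return value  # plain strings such as "models.model.cloud_tuner_fn"
    if isinstance(value, dict):
        return {key: decode_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_nested(item) for item in value]
    return value  # numbers, booleans, None


if __name__ == "__main__":
    props = decode_nested(EXEC_PROPERTIES)
    print(json.dumps(props, indent=2, sort_keys=True))
```

Running it prints the fully decoded properties, which makes it easy to compare, for example, the `scaleTier` requested in `custom_config` (`STANDARD_1`) with the job-level settings above (`CUSTOM` with two `standard` workers). The same helper can be applied to the `--inputs` and `--outputs` arguments.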

Hello,
I am Ravi. As part of a college assignment, I am interested in solving this issue, for which I need your approval and guidance. Can you accept me as a contributor?