common-voice / commonvoice-fr

Tooling for producing French dataset for Common Voice

Can't build Dockerfile.train

wasertech opened this issue

So basically I tried building DeepSpeech/Dockerfile.train and here is what came up.

Step 72/88 : RUN python util/taskcluster.py     --target="$(pwd)"       --artifact="native_client.tar.xz" && ls -hal generate_scorer_package
 ---> Running in 9cd3a8426198
Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.deepspeech.native_client.v0.9.1.cpu/artifacts/public/native_client.tar.xz ...
Traceback (most recent call last):
  File "util/taskcluster.py", line 12, in <module>
    dsu_taskcluster.main()
  File "/home/trainer/ds/training/deepspeech_training/util/taskcluster.py", line 128, in main
    maybe_download_tc(target_dir=args.target, tc_url=get_tc_url(args.arch, args.artifact, args.branch))
  File "/home/trainer/ds/training/deepspeech_training/util/taskcluster.py", line 58, in maybe_download_tc
    _, headers = urllib.request.urlretrieve(tc_url, target_file, reporthook=(report_progress if progress else None))
  File "/usr/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
The command '/bin/sh -c python util/taskcluster.py      --target="$(pwd)"      --artifact="native_client.tar.xz" && ls -hal generate_scorer_package' returned a non-zero code: 1

A GET request to https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.deepspeech.native_client/artifacts/public/native_client.tar.xz also returns a 404 (method: findArtifactFromTask, errorCode: ResourceNotFound):

{
  "code": "ResourceNotFound",
  "message": "Indexed task not found\n\n---\n\n* method:     findArtifactFromTask\n* errorCode:  ResourceNotFound\n* statusCode: 404\n* time:       2021-09-24T17:57:53.960Z",
  "requestInfo": {
    "method": "findArtifactFromTask",
    "params": {
      "0": "public/native_client.tar.xz",
      "indexPath": "project.deepspeech.deepspeech.native_client",
      "name": "public/native_client.tar.xz"
    },
    "payload": {},
    "time": "2021-09-24T17:57:53.960Z"
  }
}

This API has a really bad record when it comes to keeping links alive, and it is annoying that we need it twice to build the Docker image.
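For anyone wanting to check whether a given artifact link is still alive before starting the long Docker build, here is a minimal sketch. It only rebuilds the URL pattern visible in the log above and reports the HTTP status instead of raising a traceback. The names `taskcluster_artifact_url` and `probe_status` are hypothetical helpers made up for this illustration; they are not part of DeepSpeech's `util/taskcluster.py`.

```python
import urllib.error
import urllib.request

# Index root copied from the download URL printed in the build log.
TC_INDEX_ROOT = "https://community-tc.services.mozilla.com/api/index/v1/task"


def taskcluster_artifact_url(index_path, artifact):
    """Rebuild the artifact URL pattern seen in the log (hypothetical helper)."""
    return f"{TC_INDEX_ROOT}/{index_path}/artifacts/public/{artifact}"


def probe_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url without raising on 4xx/5xx.

    The opener parameter exists only so the function can be exercised
    without network access; by default it uses urllib.request.urlopen.
    """
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 404 and friends; the code is on the exception.
        return err.code
```

Calling `probe_status(taskcluster_artifact_url("project.deepspeech.deepspeech.native_client.v0.9.1.cpu", "native_client.tar.xz"))` should report the same 404 as the build step, as long as the index entry is still missing.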

Unfortunately, Mozilla has stopped work on DeepSpeech and we moved its CI from TaskCluster to GitHub Actions, so the artifacts referenced in this Dockerfile are dead by now.

Also, I have not been able to find the time to work on the French model again, so the current Dockerfile is indeed broken because of that.

You're welcome to send a PR to fix it. I guess we should be able to just switch to the GitHub-hosted binaries that we uploaded to the latest release, 0.9.3: https://github.com/mozilla/DeepSpeech/releases/tag/v0.9.3
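As a rough idea of what switching to the GitHub-hosted binaries could look like, here is a minimal sketch that builds a release download URL and fetches the archive if it is not already present. It relies on GitHub's standard `releases/download/<tag>/<asset>` URL layout. The asset name `native_client.amd64.cpu.linux.tar.xz` is an assumption about what is attached to the v0.9.3 release, so check the release page before relying on it. `release_asset_url` and `download_release_asset` are hypothetical helpers, not existing DeepSpeech functions.

```python
import urllib.request
from pathlib import Path

# Standard GitHub layout: <repo>/releases/download/<tag>/<asset>
RELEASE_ROOT = "https://github.com/mozilla/DeepSpeech/releases/download"

# Assumption: asset name as listed on the v0.9.3 release page; verify before use.
ASSUMED_ASSET = "native_client.amd64.cpu.linux.tar.xz"


def release_asset_url(tag, asset):
    """Build the download URL for an asset attached to a GitHub release."""
    return f"{RELEASE_ROOT}/{tag}/{asset}"


def download_release_asset(tag, asset, target_dir="."):
    """Download the asset into target_dir unless the file already exists.

    Returns the local path of the archive.
    """
    target = Path(target_dir) / asset
    if not target.exists():
        urllib.request.urlretrieve(release_asset_url(tag, asset), str(target))
    return target
```

In the Dockerfile, the failing `util/taskcluster.py` step could then be replaced by something like `download_release_asset("v0.9.3", ASSUMED_ASSET)` followed by unpacking the archive. Whether the archive contains `generate_scorer_package` should be checked, since the original step expects that file to exist afterwards.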

You might also want to start migrating this to Coqui's codebase. Since I'm not working on speech anymore, I barely have time to hack on that, but I'd welcome and review a PR.

I see. Coqui it is then.
Thank you for your time.