Can't build Dockerfile.train
wasertech opened this issue · comments
So basically I tried building DeepSpeech/Dockerfile.train, and here is what came up:
Step 72/88 : RUN python util/taskcluster.py --target="$(pwd)" --artifact="native_client.tar.xz" && ls -hal generate_scorer_package
---> Running in 9cd3a8426198
Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.deepspeech.native_client.v0.9.1.cpu/artifacts/public/native_client.tar.xz ...
Traceback (most recent call last):
  File "util/taskcluster.py", line 12, in <module>
    dsu_taskcluster.main()
  File "/home/trainer/ds/training/deepspeech_training/util/taskcluster.py", line 128, in main
    maybe_download_tc(target_dir=args.target, tc_url=get_tc_url(args.arch, args.artifact, args.branch))
  File "/home/trainer/ds/training/deepspeech_training/util/taskcluster.py", line 58, in maybe_download_tc
    _, headers = urllib.request.urlretrieve(tc_url, target_file, reporthook=(report_progress if progress else None))
  File "/usr/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
The command '/bin/sh -c python util/taskcluster.py --target="$(pwd)" --artifact="native_client.tar.xz" && ls -hal generate_scorer_package' returned a non-zero code: 1
After sending a GET request to https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.deepspeech.native_client/artifacts/public/native_client.tar.xz, the server replies with 404 (method: findArtifactFromTask, errorCode: ResourceNotFound):
{
  "code": "ResourceNotFound",
  "message": "Indexed task not found\n\n---\n\n* method: findArtifactFromTask\n* errorCode: ResourceNotFound\n* statusCode: 404\n* time: 2021-09-24T17:57:53.960Z",
  "requestInfo": {
    "method": "findArtifactFromTask",
    "params": {
      "0": "public/native_client.tar.xz",
      "indexPath": "project.deepspeech.deepspeech.native_client",
      "name": "public/native_client.tar.xz"
    },
    "payload": {},
    "time": "2021-09-24T17:57:53.960Z"
  }
}
This API has a really bad record when it comes to keeping links alive. It is annoying that we need it twice to build the Docker image.
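For anyone who wants to check whether an artifact link is still alive before kicking off a long build, here is a minimal sketch. It assumes the error body has the same JSON shape as the response pasted above (`code`, `requestInfo.params.indexPath`); the helper names `summarize_tc_error` and `probe` are made up for illustration and are not part of `util/taskcluster.py`.

```python
import json
import urllib.error
import urllib.request


def summarize_tc_error(body: bytes) -> str:
    """Condense a JSON error body (shape assumed from the 404 above) into one line."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return "unparseable error body"
    if not isinstance(payload, dict):
        return "unparseable error body"
    code = payload.get("code", "unknown")
    request_info = payload.get("requestInfo") or {}
    params = request_info.get("params") or {}
    index_path = params.get("indexPath", "unknown")
    return f"{code}: indexPath={index_path}"


def probe(url: str) -> str:
    """Return 'ok' if the URL answers, otherwise a short description of the failure."""
    try:
        with urllib.request.urlopen(url, timeout=30):
            return "ok"
    except urllib.error.HTTPError as err:
        # HTTPError carries the response body, so the server's explanation can be read.
        return f"HTTP {err.code} ({summarize_tc_error(err.read())})"
    except urllib.error.URLError as err:
        return f"unreachable ({err.reason})"
```

Calling `probe(...)` with the URL from the `Downloading` line of the build log would show the failure in a second or two, rather than at step 72 of 88.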
Unfortunately, Mozilla has stopped work on DeepSpeech, and we moved its CI from TaskCluster to GitHub Actions, so the artifacts referenced in this Dockerfile are dead by now.
Also, I have not been able to find the time to work on the French model again, so the current Dockerfile is indeed broken because of that.
You're welcome to send a PR to fix it. I guess we should be able to just switch to the GitHub-hosted binaries that we uploaded for the latest release, 0.9.3: https://github.com/mozilla/DeepSpeech/releases/tag/v0.9.3
You might also want to start migrating this to Coqui's codebase. Since I'm not working on speech anymore, I barely have time to hack on that, but I'd welcome and review a PR.
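A rough sketch of what switching to the GitHub-hosted binaries could look like, in case it helps whoever picks up the PR. It relies on GitHub's usual `releases/download/<tag>/<asset>` URL layout. The asset name in the usage note is an assumption and should be checked against the v0.9.3 release page. The helpers `release_asset_url` and `download_release_asset` are hypothetical, not existing DeepSpeech utilities.

```python
import urllib.error
import urllib.request
from pathlib import Path

# Base of GitHub's release-download URLs for the repository linked above.
RELEASE_BASE = "https://github.com/mozilla/DeepSpeech/releases/download"


def release_asset_url(tag: str, asset: str) -> str:
    """Build the download URL for one asset of a tagged release."""
    return f"{RELEASE_BASE}/{tag}/{asset}"


def download_release_asset(tag: str, asset: str, target_dir: str = ".") -> Path:
    """Download a release asset into target_dir, failing with a readable message."""
    url = release_asset_url(tag, asset)
    target = Path(target_dir) / asset
    try:
        urllib.request.urlretrieve(url, str(target))
    except urllib.error.HTTPError as err:
        # A 404 here most likely means the asset name is wrong for this tag.
        raise SystemExit(
            f"{url} returned HTTP {err.code}; check the asset name on the release page"
        )
    return target
```

Usage would be something like `download_release_asset("v0.9.3", "native_client.amd64.cpu.linux.tar.xz")`, where the asset name is my guess at the CPU Linux build and needs verifying. The Dockerfile step would then unpack that archive instead of calling `util/taskcluster.py`.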
I see. Coqui it is then.
Thank you for your time.