microsoft / nlp-recipes

Natural Language Processing Best Practices & Examples

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[BUG] make sure that transformers lib is downloading the pretrained model on a temporary file

miguelgfierro opened this issue · comments

Description

There are some models being downloaded to the root directory. This can be seen in the runs of the the test machine:

389M    9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
4.0K    9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba.json
386M    a803ce83ca27fecf74c355673c434e51c265fb8a3e0e57ac62a80e38ba98d384.681017f415dfb33ec8d0e04fe51a619f3f01532ecea04edbfd48c5d160550d9c
4.0K    a803ce83ca27fecf74c355673c434e51c265fb8a3e0e57ac62a80e38ba98d384.681017f415dfb33ec8d0e04fe51a619f3f01532ecea04edbfd48c5d160550d9c.json
$ head 26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.json
{"url": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt", "etag": "\"64800d5d8528ce344256daf115d4965e\""}(base)

How do we replicate the bug?

In the test machine:
ls -lha /home/nlpadmin/myagent/_work/11/s

Expected behavior (i.e. solution)

Make sure that all the models are downloaded to TemporaryDirectory

Other Comments

related to #480

@miguelgfierro Can you verify this is fixed and close the issue if it is?

I've been looking at the logs and the file system, it not easy to find out because there are many files. I think the problem is solved, at least I haven't seen any new large file from the last days. I'll close this, in case we detect this problem again, we can review.