shoegazerstella / hf-experiments

Experiments with Hugging Face 🤗


hf-experiments

Machine Learning Experiments with Hugging Face 🤗

What's inside

.
├── Dockerfile
├── README.md
├── build.sh
├── install.log
├── requirements.txt
├── run.sh
├── src
│   ├── emotions
│   ├── sentiment
│   └── summarization
├── wheels
└── models

How to build

To build the experiments, run:

./build.sh

How to run

To run an experiment, use:

./run.sh [experiment_name] [cache_dir_folder]

where experiment_name is one of the supported experiment names listed below, and cache_dir_folder is the directory in which to cache model files (see the section on model files below).

Experiments

The following experiments are supported:

  • emotions - emotion detection
  • sentiment - sentiment analysis
  • summarization - text summarization
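
The argument contract of run.sh can be sketched in Python as follows. This is a minimal sketch, not the repository's actual logic: the function name parse_run_args and the default folder "models" are assumptions made for illustration, and only the three experiment names come from the list above.

```python
# Experiment names taken from the list of supported experiments.
SUPPORTED_EXPERIMENTS = ("emotions", "sentiment", "summarization")


def parse_run_args(argv, default_cache_dir="models"):
    """Validate arguments shaped like: [experiment_name] [cache_dir_folder].

    parse_run_args and default_cache_dir are hypothetical names,
    not part of the repository.
    """
    if not argv:
        raise ValueError("usage: run.sh [experiment_name] [cache_dir_folder]")

    experiment = argv[0]
    if experiment not in SUPPORTED_EXPERIMENTS:
        raise ValueError(
            f"unknown experiment {experiment!r}; expected one of: "
            + ", ".join(SUPPORTED_EXPERIMENTS)
        )

    # The cache folder is optional in this sketch; fall back to the default.
    cache_dir = argv[1] if len(argv) > 1 else default_cache_dir
    return experiment, cache_dir
```

Rejecting unknown names up front avoids starting a container only to fail later on a missing experiment folder.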

Dependencies

Dependencies are defined in the requirements.txt file and are currently:

tensorflow==2.2.0
torch==1.5.0
transformers==3.0.2

These will install a number of dependent libraries, which are listed in install.log.

Wheels? What's that?

I install from local wheels if available. This speeds up builds and tests by avoiding downloading the same data over the internet several times:

Collecting torch==1.5.0
  Downloading https://files.pythonhosted.org/packages/76/58/668ffb25215b3f8231a550a227be7f905f514859c70a65ca59d28f9b7f60/torch-1.5.0-cp37-cp37m-manylinux1_x86_64.whl (752.0MB)

I download the big wheels for PyTorch (752 MB) and TensorFlow (516.2 MB) once into the wheels folder and check for them before building:

└── wheels
    ├── tensorflow-2.2.0-cp37-cp37m-manylinux2010_x86_64.whl
    └── torch-1.5.0-cp37-cp37m-manylinux1_x86_64.whl

The downloadable wheels can be found on PyPI.
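
The "check for local wheels before building" step could look roughly like the sketch below. It is an assumption about how such a check might be written, not the content of build.sh: the function name pip_install_command is hypothetical, while --find-links is a standard pip option that lets pip also consider archives in a local directory.

```python
from pathlib import Path


def pip_install_command(requirements="requirements.txt", wheels_dir="wheels"):
    """Build a pip command that can reuse wheels already downloaded locally.

    pip_install_command is a hypothetical helper, not part of the repository.
    """
    cmd = ["pip", "install", "-r", requirements]

    wheels = Path(wheels_dir)
    # Only point pip at the folder if it exists and holds at least one wheel.
    if wheels.is_dir() and any(wheels.glob("*.whl")):
        cmd += ["--find-links", str(wheels)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True), or joined into a single RUN line for a Dockerfile.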

Models files

Where are model files saved? Model files are typically big, so it's preferable to save them to a custom folder, such as an external HDD or a shared disk. For this reason, a Docker environment variable cache_dir can be specified at run time:

./run.sh emotions models/

The models folder will be assigned to the cache_dir variable, which is used as the default alternative location for downloading pretrained models. In the code, os.getenv("cache_dir") retrieves the environment variable.
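
A minimal sketch of how an experiment script might consume this variable is shown below. The helper names resolve_cache_dir and load_tokenizer, the fallback folder "models", and the model_name parameter are assumptions for illustration. The cache_dir keyword of from_pretrained is part of the transformers API.

```python
import os
from pathlib import Path


def resolve_cache_dir(default="models"):
    """Return the folder named by the cache_dir environment variable.

    Falls back to `default` (an assumed value) when the variable is
    unset or empty, and creates the folder if it does not exist yet.
    """
    cache_dir = Path(os.getenv("cache_dir") or default)
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


def load_tokenizer(model_name):
    """Download (or reuse) a pretrained tokenizer inside the cache folder.

    model_name is a placeholder for whichever checkpoint an experiment uses.
    """
    # Imported lazily so resolve_cache_dir works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(
        model_name, cache_dir=str(resolve_cache_dir())
    )
```

Pointing cache_dir at a folder that outlives the container means the large model files are downloaded once and reused across runs.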

License: MIT License


Languages

Python 68.7%, Shell 21.8%, Dockerfile 9.5%