galsuchetzky / MLFramework

Final project for the Tel Aviv University NLP course, 2020

ML project template

Authors:

This project is a template for a general deep learning project.
It provides a base architecture for the project as well as some useful functions and code,
such as:

  • Logging
  • Base environment
  • Base training loop structure
  • etc...

In each file, you will find documentation for the code and templates already existing in the file,
as well as comments, TODOs and recommendations for what logic might be implemented in that part.

Note that the provided code is PyTorch based, but it can be translated to TensorFlow if necessary.

Files Description

environment.yml

This is a YAML file for setting up an environment with the conda package manager (installed with Anaconda or Miniconda).
It already contains a template for the file's structure and some dependencies.
Update this file with your dependencies as the project progresses, so that setting up a new environment stays simple.

Creating the environment:

  1. Open the Anaconda command line.
  2. Run conda env create -f environment.yml

You now have a new conda environment with the name specified in the yml file.
In the case of the template environment, the name of the new environment is MLFramework.

Activating the environment: Each time you want to work in the environment (from a terminal or an editor that does not remember the environment), run conda activate MLFramework. On older conda versions, use activate MLFramework on Windows or source activate MLFramework on Linux and macOS.

todo: add anaconda installation details, basic commands, package installation details.

Defaults.py

This file contains the list of default values, as well as all the project constants.

Usage:

  • Add to this file the default values for all command-line arguments (see Args.py below), along with any project constants.

Args.py

This file defines the command-line arguments for all the scripts in the project, for example the arguments that specify the dataset sources, the learning rate and the data paths.
The file contains basic arguments, plus many examples of other possible arguments, which you may change as needed. Argument parsing is based on the argparse library.

Usage:
In the file itself you will find examples for arguments as well as documentation and explanation on adding new arguments.
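
As a rough illustration of how Defaults.py and Args.py can fit together, the sketch below feeds module-level default constants into an argparse parser. The constant names, argument names and values are hypothetical and are not taken from the template; in the project the constants would live in Defaults.py.

```python
import argparse

# Hypothetical defaults; in the template these would be defined in Defaults.py.
DEFAULT_LEARNING_RATE = 1e-3
DEFAULT_BATCH_SIZE = 32
DEFAULT_DATA_DIR = "./data"


def build_parser():
    """Build a parser whose default values come from the constants above."""
    parser = argparse.ArgumentParser(description="Example project arguments")
    parser.add_argument("--learning_rate", type=float,
                        default=DEFAULT_LEARNING_RATE,
                        help="Optimizer learning rate.")
    parser.add_argument("--batch_size", type=int,
                        default=DEFAULT_BATCH_SIZE,
                        help="Number of examples per batch.")
    parser.add_argument("--data_dir", type=str,
                        default=DEFAULT_DATA_DIR,
                        help="Directory that holds the dataset.")
    return parser


# Parse an explicit list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--learning_rate", "0.01"])
print(args.learning_rate, args.batch_size, args.data_dir)
```

Keeping the defaults in one place means a value only has to be changed once, and arguments that are not passed on the command line fall back to it.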

Config.py

This file contains the configuration classes for the project.
Each config class takes care of parsing the command-line arguments that are relevant to it. Currently, there are 3 config classes in this file: SetupConfig, TrainConfig, TestConfig.

Usage:

  • If necessary, add more config classes.
  • Edit the SetupConfig, TrainConfig and TestConfig classes so that each one parses all the arguments relevant to it.
  • Make sure to include processing for any additional arguments that you have added to the Args.py file.
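
The idea of one config class per script could look roughly like the sketch below. It is only an assumption about the shape of such classes: the attribute names, the validation check and the use of argparse.Namespace as input are illustrative and may differ from the actual Config.py.

```python
import argparse


class SetupConfig:
    """Keeps only the arguments the setup script needs (illustrative)."""

    def __init__(self, args):
        self.data_dir = args.data_dir
        self.dataset_url = args.dataset_url


class TrainConfig:
    """Keeps only the arguments training needs (illustrative)."""

    def __init__(self, args):
        self.learning_rate = args.learning_rate
        self.num_epochs = args.num_epochs
        # Validate early so a bad value fails before training starts.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


# Stand-in for the parsed command line; all names here are hypothetical.
args = argparse.Namespace(
    data_dir="./data",
    dataset_url="https://example.com/dataset.txt",
    learning_rate=0.001,
    num_epochs=5,
)
setup_config = SetupConfig(args)
train_config = TrainConfig(args)
print(train_config.learning_rate, setup_config.data_dir)
```

Each class copies only the fields it cares about, so a script receives a small object with exactly the settings it uses.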

Setup.py

This file is the setup script for the project.
This script downloads all the resources the project requires, including the dataset, and also preprocesses the dataset.
Note that this script is meant to be run only once, when the project is first set up.

Usage:

  • Edit Args.py to include the arguments needed to specify the required resources.
  • Edit the download function to correctly download the resources (more details in the function).
  • Edit the pre_process function to preprocess your downloaded dataset.
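
A minimal sketch of what download and pre_process might do is shown below. The function signatures, the skip-if-present check and the toy preprocessing (lowercasing and dropping empty lines) are assumptions made for illustration, not the template's actual code. The demo writes the raw file locally first, so no network access is needed.

```python
import os
import tempfile
import urllib.request


def download(url, target_path):
    """Fetch url to target_path unless the file is already present."""
    if os.path.exists(target_path):
        return False  # already downloaded, nothing to do
    os.makedirs(os.path.dirname(target_path) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, target_path)
    return True


def pre_process(raw_path, processed_path):
    """Toy preprocessing: strip whitespace, lowercase, drop empty lines."""
    with open(raw_path, encoding="utf-8") as raw:
        lines = [line.strip().lower() for line in raw if line.strip()]
    with open(processed_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines))
    return len(lines)


# Demo: the raw file already exists, so download() skips the fetch.
work_dir = tempfile.mkdtemp()
raw_path = os.path.join(work_dir, "raw.txt")
processed_path = os.path.join(work_dir, "processed.txt")
with open(raw_path, "w", encoding="utf-8") as f:
    f.write("Hello World\n\n  Second LINE  \n")

# The URL is a placeholder and is never contacted in this demo.
downloaded = download("https://example.com/dataset.txt", raw_path)
num_lines = pre_process(raw_path, processed_path)
print(downloaded, num_lines)
```

Skipping files that already exist is one simple way to make a setup script safe to rerun, even though it is intended to run once.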

Models.py

This file will contain the architecture of the models used in the project.
The template is for PyTorch-based models. For additional information about PyTorch, see https://pytorch.org/docs/stable/

Usage:

  • Implement your models in this file using the supplied PyTorch template.

Layers.py

This file will contain all the custom layers of your models.
Use this file to maintain readability in the Models.py file.

Usage:

  • Implement your custom layers in this file using the supplied PyTorch template.

Trainers.py

todo: add explanation and usage details

Train.py

todo: add explanation and usage details

Test.py

todo: add explanation and usage details

Utils.py

todo: add explanation and usage details

Framework usage

todo: Add explanation and recommendations for how one might go about implementing a project based on this template - from which files to start, how to run training, how to test, how to set up the environment...

Setup

  1. Make sure you have Miniconda installed

    1. Conda is a package manager that sandboxes your project’s dependencies in a virtual environment
    2. Miniconda contains Conda and its dependencies with no extra packages by default (as opposed to Anaconda, which installs some extra packages)
  2. cd into src and run conda env create -f environment.yml

    1. This creates a Conda environment called MLFramework
  3. Run conda activate MLFramework (on older conda versions, source activate MLFramework)

    1. This activates the MLFramework environment
    2. Do this each time you want to write/test your code

License: MIT License


Languages

Language: Python 100.0%