Rosetta Stone

🏄 Introduction

make your deep learning life easier

Rosetta Stone is a lightweight framework that aims to make your deep learning life easier. It lets you run end-to-end experiments quickly and efficiently. Compared with other open-source libraries, Rosetta is a low-code alternative: you can perform deep learning tasks with only a few lines of code. It is easy to use and lets you focus on designing your models!

🦆 Version 1.1.15 out now!

Note: the `master` branch is the development branch.

Features

- Elegant YAML-style configuration for complex applications
- Best practices
- Unified design for various applications
- Pre-trained models
- State-of-the-art performance

🚀 Installation

Requirements

- Python >= 3.6
- PyTorch >= 1.4.0

Setup

Install the latest version from source:

```bash
# clone the project repository, and install via pip
$ git clone https://git.huya.com/wangfeng2/rosetta_stone.git \
    && cd rosetta_stone \
    && pip install -e .
```

or install the latest stable release via pip:

```bash
$ pip install --upgrade rosetta-stone
```

For ease of use, you can also run Rosetta with Docker:

```bash
# build the Docker image
$ docker build --tag huya_ai:rosetta .

# run the Docker container, mounting the current directory at /rosetta
$ docker run --rm -it -v "$(pwd)":/rosetta --name rosetta huya_ai:rosetta bash
```

📖 Usage

In Rosetta you don't need to write a training loop; you only define the data loaders and the models. Taking ResNet as an example:

Step 1: Create YAML Configuration

Create a YAML file (usually named `app.yaml`) in your repository, as in the example below:

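```yaml
# `resnet56` is the experiment name passed to `rosetta train`;
# `&resnet56` defines a YAML anchor so other entries can reuse these settings
resnet56: &resnet56
  # model and data classes, given as `module:ClassName` import paths
  model_module: examples.vision.resnet_model:ResNet
  dataio_module: examples.vision.cifar10:CIFAR10

  batch_size: 256
  num_classes: 10  # CIFAR-10 has 10 classes

  n_size: 9
```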
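The `model_module` and `dataio_module` values appear to follow the `module:attribute` notation that Python entry points also use. The sketch below shows one way such a string can be resolved with `importlib`. `resolve_import_path` is a hypothetical helper for illustration, not Rosetta's actual loader, and standard-library targets stand in for the example classes so the snippet runs on its own.

```python
import importlib


def resolve_import_path(path: str):
    """Resolve a "package.module:attribute" string to the object it names.

    Hypothetical helper: it only illustrates the `model_module` /
    `dataio_module` format and is not Rosetta's own loader.
    """
    module_name, sep, attr_path = path.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {path!r}")
    obj = importlib.import_module(module_name)
    # Allow dotted attributes after the colon, e.g. "pkg.mod:Outer.Inner".
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


if __name__ == "__main__":
    # Standard-library targets stand in for examples.vision.resnet_model:ResNet.
    print(resolve_import_path("collections:OrderedDict"))
    print(resolve_import_path("os.path:join"))
```

Splitting on the colon keeps the module path (imported) separate from the attribute path (looked up with `getattr`), so a class nested inside another class can still be referenced.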

Step 2: Define Dataloader

Step 3: Define Model

Step 4: Start training

Train from scratch:

```bash
$ rosetta train resnet56 --yaml-path app.yaml
```

Override parameters defined in the YAML file:

```bash
# the CLI parameter `--yaml-path` defaults to `app.yaml`
$ rosetta train resnet56 --batch_size=125
```
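The override rule (values given on the command line take precedence over the values in the YAML file) can be illustrated with a small sketch. `apply_overrides` and `_coerce` are hypothetical names, not Rosetta's CLI code; the sketch assumes overrides arrive as `--key=value` tokens and that each value should keep the type of the matching YAML entry.

```python
from typing import Any, Dict, List


def _coerce(raw: str, current: Any) -> Any:
    """Convert a CLI string to the type of the existing config value."""
    if isinstance(current, bool):  # check bool before int: bool subclasses int
        lowered = raw.lower()
        if lowered in ("1", "true", "yes"):
            return True
        if lowered in ("0", "false", "no"):
            return False
        raise ValueError(f"cannot parse {raw!r} as bool")
    if isinstance(current, (int, float)):
        return type(current)(raw)
    return raw  # new or string-typed keys stay as strings


def apply_overrides(config: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with `--key=value` arguments applied on top."""
    merged = dict(config)  # leave the YAML-derived dict untouched
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # bare flags such as --use-amp are ignored by this sketch
        key, raw = arg[2:].split("=", 1)
        merged[key] = _coerce(raw, config.get(key))
    return merged


if __name__ == "__main__":
    yaml_values = {"batch_size": 256, "num_classes": 10, "n_size": 9}
    print(apply_overrides(yaml_values, ["--batch_size=125"]))
    # {'batch_size': 125, 'num_classes': 10, 'n_size': 9}
```

Coercing to the type already present in the YAML means `--batch_size=125` becomes the integer `125` rather than the string `"125"`.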

Train with automatic mixed precision (AMP):

```bash
$ rosetta train resnet56 --yaml-path app.yaml --use-amp
```

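Distributed training with `torch.distributed.launch` (recommended), where `{GPU_NUM}` is the number of GPUs on the machine (one process is launched per GPU):

```bash
$ python -m torch.distributed.launch --module --nproc_per_node={GPU_NUM} rosetta.main train resnet56
```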
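Each process started by `torch.distributed.launch` has to know which GPU on the node it owns. Depending on the PyTorch version, the launcher passes this as a `--local_rank` (or `--local-rank`) argument or sets the `LOCAL_RANK` environment variable. The following minimal sketch accepts either form; `get_local_rank` is a hypothetical helper, not Rosetta's own entry point.

```python
import argparse
import os
from typing import List, Optional


def get_local_rank(argv: Optional[List[str]] = None) -> int:
    """Return this process's GPU index on the node (0 when not launched).

    Hypothetical helper: depending on the PyTorch version, the launcher
    appends --local_rank=N / --local-rank=N or sets LOCAL_RANK instead.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "--local_rank",
        "--local-rank",
        type=int,
        # Fall back to the environment variable, then to 0.
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    # parse_known_args leaves the rest of the command line
    # (e.g. `train resnet56`) for the real CLI to handle.
    args, _ = parser.parse_known_args(argv)
    return args.local_rank


if __name__ == "__main__":
    # Simulates the arguments an older launcher appends for the fourth process.
    print(get_local_rank(["train", "resnet56", "--local_rank=3"]))  # 3
```

The returned index would typically be passed to `torch.cuda.set_device` before the model is built, so that each process uses a different GPU.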

Distributed training with Horovod (not recommended):

```bash
$ rosetta train resnet56 --use-horovod
```

👋 Contribution Guide

You can contribute to this project by opening a merge request. Once it is approved, a reviewer will merge it.

Before making a contribution, please confirm that:

- Code quality stays consistent across the script, module, or package.
- Code is covered by unit tests.
- The API is maintainable.

👍 References

- flambe: An ML framework to accelerate research and its path to production.
- Jacinle: A range of utility functions for Python development, including project configuration, file IO, image processing, inter-process communication, etc.
- homura: PyTorch utilities including trainer, reporter, etc.
- FARM: Fast & easy transfer learning for NLP. Harvesting language models for the industry.
- kotonoha: NLP utilities for research
- padertorch: A collection of common functionality to simplify the design, training and evaluation of machine learning models based on PyTorch, with an emphasis on speech processing.
- Tips, tricks and gotchas in PyTorch
- PyTorch Parallel Training: a guide to single-machine multi-GPU parallelism, mixed precision, and synchronized BN training
- Step on the gas for training: speeding up data loading in PyTorch
- How is high-performance PyTorch built?
- service-streamer: Boosting your Web Services of Deep Learning Applications.
- Masked batchnorm in PyTorch

numb3r3 / rosetta_stone