
Easy and Efficient Transformer

EET

EET (Easy and Efficient Transformer) is an efficient PyTorch inference plugin focused on Transformer-based models with large model sizes and long sequences.

Features

1、Joint-Decoding. Proposes and implements a novel decoding mechanism: an optimization of incremental decoding that increases token parallelism and improves GPU occupancy.
2、High performance. Provides highly optimized CUDA kernels, referencing NVIDIA FasterTransformer, supporting long sequences as well as large model sizes with advanced optimizations.
3、Flexible. Provides op-level and model-level APIs.
4、Easy to use. EET can be integrated into Fairseq and Transformers directly.
5、Smart deployment. Supports dynamic batching and variable input lengths. Combined with a Python web framework, EET can be deployed smoothly.

EET has been applied to a variety of NetEase online services. In the future, EET will work on ultra-large-scale model inference with trillions of parameters.


A novel Joint-Decoding mechanism


Three-level decoding:

  • Level-1. Joins prompt information inside a sequence, handled in parallel.

  • Level-2. Joins prompt information across the whole batch, with left-padding to ensure the correctness of inference results.

  • Level-3. Joins full decoding with incremental decoding to reach the best performance.
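
The left-padding used in Level-2 can be illustrated with a minimal sketch (plain Python, no EET or PyTorch required; names are hypothetical). Padding on the left keeps every sequence's last real token in the final column, so one incremental-decoding step can read the same position for the whole batch:

```python
PAD = 0

def left_pad(batch, pad=PAD):
    """Pad every sequence on the left so the batch is rectangular.

    Left padding keeps the most recent real token of each sequence in
    the last column, which is what an incremental decoding step reads.
    """
    width = max(len(seq) for seq in batch)
    return [[pad] * (width - len(seq)) + seq for seq in batch]

batch = [[11, 12, 13], [21], [31, 32]]
padded = left_pad(batch)          # [[11, 12, 13], [0, 0, 21], [0, 31, 32]]
last_column = [row[-1] for row in padded]  # every row ends with a real token
```

With right-padding, the last column would mix real tokens and padding, so the batch could not share a single decoding position.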

Quick Start

Environment

  • cuda:>=10.1
  • python:>=3.7
  • gcc:>= 7.4.0
  • torch:>=1.5.0
  • numpy:>=1.19.1

Installation

From Source

If you are installing from source, you will need to install the necessary environment first. Then, proceed as follows:

$ git clone git@github.com:NetEase-FuXi/EET.git
$ pip install transformers==3.0.2
$ pip install fairseq==0.10.0
$ pip install .

Because a large number of CUDA kernels are compiled, installation takes a relatively long time; please be patient.

From Docker

$ git clone git@github.com:NetEase-FuXi/EET.git
$ cd EET/docker
$ docker build -t your_docker_name:your_docker_version .
$ nvidia-docker run -it --net=host -v /your/project/directory/:/root/workspace your_docker_name:your_docker_version bash

EET comes pre-installed in the Docker image.

Run

run BERT in Transformers

$ cd EET/example  
$ python bert_transformers_example.py

run GPT2 in Transformers

$ cd EET/example    
$ python gpt2_transformers_example.py

run GPT2 in Fairseq

$ cd EET/example    
$ python gpt2_fairseq_example.py

Supported Models

We currently support GPT-2 and BERT.

GPT2

BERT

Usage

EET provides user-friendly Python APIs (python/eet) that can be integrated into Fairseq and Transformers with just a few lines of code.

1、How to run inference


2、How to customize a model
You can refer to the operator APIs listed below to build your own model structure, just by modifying the files under python/eet.

3、How to integrate EET into Fairseq
Replace the original transformer.py in Fairseq with our transformer.py and reinstall Fairseq; that is all! transformer.py in EET corresponds to the fusion of transformer.py and transformer_layer.py in Fairseq.

4、How to integrate EET into Transformers
Replace the original modeling_bert.py and modeling_gpt2.py in Transformers with our modeling_bert.py and modeling_gpt2.py and reinstall Transformers; that is all! modeling_bert.py in EET corresponds to modeling_bert.py in Transformers; modeling_gpt2.py in EET corresponds to modeling_gpt2.py in Transformers.

5、How to make a server
We choose service-streamer to build the model server, creating the service directly from your Python project. Please make sure dynamic batching is enabled if you want higher throughput.
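
In a real deployment, service-streamer handles the batching; the toy sketch below (standard library only, all names hypothetical) only illustrates the idea behind dynamic batching: requests that are waiting at the same moment are drained into one batch and share a single model call.

```python
import queue

def drain_batch(pending, max_batch=8):
    """Collect up to max_batch pending requests into one batch.

    Dynamic batching groups requests that arrive close together so they
    share a single forward pass, raising GPU throughput.
    """
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch

def serve_once(pending, predict, max_batch=8):
    """Run the model once on whatever has accumulated in the queue."""
    batch = drain_batch(pending, max_batch)
    return predict(batch) if batch else []

# Toy "model": inference is just doubling each request id.
pending = queue.Queue()
for request_id in range(5):
    pending.put(request_id)

results = serve_once(pending, predict=lambda batch: [x * 2 for x in batch],
                     max_batch=3)
# results == [0, 2, 4]; two requests remain queued for the next batch.
```

A real server would run this loop in a worker thread and also flush a partial batch after a short timeout, which is the trade-off (latency vs. throughput) that service-streamer's batch size and max-latency settings control.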

APIs

  1. Model APIs: We provide ready-made APIs for GPT-2 and BERT models.

    EET and fairseq class comparison table

    | EET                        | fairseq                         | Remarks                           |
    |----------------------------|---------------------------------|-----------------------------------|
    | EETTransformerDecoder      | TransformerDecoder              |                                   |
    | EETTransformerDecoderLayer | TransformerDecoderLayer         |                                   |
    | EETTransformerAttention    | MultiheadAttention              |                                   |
    | EETTransformerFeedforward  | TransformerDecoderLayer         | fusion of multiple small operators |
    | EETTransformerEmbedding    | Embedding + PositionalEmbedding |                                   |
    | EETTransformerLayerNorm    | nn.LayerNorm                    |                                   |

    EET and transformers class comparison table

    | EET                 | transformers                 | Remarks                        |
    |---------------------|------------------------------|--------------------------------|
    | EETBertModel        | BertModel                    |                                |
    | EETBertEncoder      | BertEncoder                  |                                |
    | EETBertEncoderLayer | BertLayer                    |                                |
    | EETBertAttention    | BertAttention                |                                |
    | EETBertFeedforward  | BertIntermediate + BertOutput |                               |
    | EETBertEmbedding    | BertEmbeddings               |                                |
    | EETGPT2Model        | GPT2Model                    |                                |
    | EETGPT2Decoder      | GPT2Model                    | transformers has no GPT2Decoder |
    | EETGPT2DecoderLayer | Block                        |                                |
    | EETGPT2Attention    | Attention                    |                                |
    | EETGPT2Feedforward  | MLP                          |                                |
    | EETGPT2Embedding    | nn.Embedding                 |                                |
    | EETLayerNorm        | nn.LayerNorm                 |                                |
  2. Operator APIs: We provide all the operators required for Transformer models. You can combine different kernels to build different model structures.

    | Operator APIs              | Remarks              |
    |----------------------------|----------------------|
    | masked_multi_head_attention | GPT-2 self-attention |
    | cross_multi_head_attention | cross-attention      |
    | multi_head_attention       | BERT self-attention  |
    | ffn                        | FeedForwardNetwork   |
    | embedding                  | transformers & fairseq |
    | layernorm                  | nn.LayerNorm         |
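
As a reference for what the fused attention kernels compute, the scaled dot-product attention math can be written out in NumPy (a sketch of the math only, not the CUDA implementation; function and variable names are hypothetical):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V — the math a fused attention kernel implements."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (batch, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)    # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ v                              # (batch, seq, d)

rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((2, 4, 8))          # batch=2, seq=4, head_dim=8
out = scaled_dot_product_attention(q, k, v)         # shape (2, 4, 8)
```

A fused CUDA kernel performs the same computation in one pass over shared memory instead of materializing the intermediate score matrix in several separate ops, which is where the speedup over stock PyTorch comes from.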

Performance

We tested the performance of EET on two GPU hardware platforms, choosing the PyTorch, NVIDIA FasterTransformer, and LightSeq implementations for comparison. For the experimental results, please click the link below.

benchmark

TODO

  1. int8 quantization
  2. sparsity

Contact us

You can report problems via GitHub issues.

About

License: Apache License 2.0

