
LiSHT

This repository contains a Keras implementation of the paper "LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks" (https://arxiv.org/abs/1901.05894).

Roy, S. K., Manna, S., Dubey, S. R., and Chaudhuri, B. B., 2019. LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks. arXiv preprint arXiv:1901.05894.

Description

The activation function is one of the key components of a neural network: it introduces the non-linearity that makes deep training feasible. However, because of zero-hard rectification, some of the existing activation functions such as ReLU and Swish fail to utilize negative input values and may suffer from the dying gradient problem. It is therefore important to look for a better activation function that is free from such problems. As a remedy, the paper proposes a new non-parametric function, the Linearly Scaled Hyperbolic Tangent (LiSHT), for Neural Networks (NNs). The proposed LiSHT activation function scales the non-linear Hyperbolic Tangent (Tanh) function by a linear function and tackles the dying gradient problem.

What is a LiSHT?

Most neural networks work by interleaving linear projections with simple (fixed) activation functions, such as the ReLU function ReLU(x) = max(0, x).

LiSHT is instead a non-parametric activation function that scales the hyperbolic tangent by its linear input: LiSHT(x) = x · tanh(x).
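
As a minimal sketch (the repository ships its own implementation, which may differ in details), the definition above translates into a one-line custom Keras activation; the function name `lisht` and the string registration are choices made here for illustration:

```python
import tensorflow as tf
from tensorflow import keras

def lisht(x):
    # LiSHT: scale the hyperbolic tangent by its linear input, x * tanh(x)
    return x * tf.math.tanh(x)

# Optional: register under a string name so it can be used like a built-in activation.
keras.utils.get_custom_objects()["lisht"] = lisht

# Example usage in a layer.
layer = keras.layers.Dense(128, activation=lisht)
```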

Pre-Activation ResNet
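
The paper evaluates LiSHT inside a pre-activation ResNet, where batch normalization and the activation are applied before each convolution. The block below is only an illustrative sketch that drops the `lisht` function from the previous snippet into the standard pre-activation pattern; the filter counts are placeholders and the exact depth-20/164 configurations used in the results tables are not reproduced here:

```python
from tensorflow.keras import layers

def preact_block(x, filters):
    # Pre-activation residual block: (BN -> LiSHT -> 3x3 conv) twice, plus identity shortcut.
    # Assumes `filters` matches the channel count of `x`; projection shortcuts are omitted.
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.Activation(lisht)(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation(lisht)(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Add()([shortcut, y])
```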

Experimental Data

The effectiveness of the proposed LiSHT activation function is evaluated on six benchmark datasets: Car Evaluation, Iris, MNIST, CIFAR-10, CIFAR-100, and Twitter140.

Results

The classification performance of an MLP with different activation functions on the Car Evaluation, Iris, and MNIST datasets (training and validation loss and accuracy, %).

| Dataset | Activation | Training Loss | Training Accuracy | Validation Loss | Validation Accuracy |
|---|---|---|---|---|---|
| Car Evaluation | Tanh | 0.0341 | 98.84 | 0.0989 | 96.40 |
| Car Evaluation | Sigmoid | 0.0253 | 98.77 | 0.1110 | 96.24 |
| Car Evaluation | ReLU | 0.0285 | 99.10 | 0.0769 | 97.40 |
| Car Evaluation | Swish | 0.0270 | 99.13 | 0.0790 | 97.11 |
| Car Evaluation | LiSHT | 0.0250 | 99.28 | 0.0663 | 97.98 |
| Iris | Tanh | 0.0937 | 97.46 | 0.0898 | 96.26 |
| Iris | Sigmoid | 0.0951 | 97.83 | 0.0913 | 96.23 |
| Iris | ReLU | 0.0983 | 98.33 | 0.0886 | 96.41 |
| Iris | Swish | 0.0953 | 98.50 | 0.0994 | 96.34 |
| Iris | LiSHT | 0.0926 | 98.67 | 0.0862 | 97.33 |
| | Tanh | 1.1534 | 58.86 | 1.3759 | 51.74 |
| | Sigmoid | 1.1319 | 59.51 | 1.3693 | 52.12 |
| | ReLU | 1.1776 | 57.49 | 1.3731 | 51.85 |
| | Swish | 1.1468 | 58.65 | 1.3705 | 51.83 |
| | LiSHT | 1.1216 | 59.13 | 1.3661 | 52.16 |
| MNIST | Tanh | 0.0138 | 99.56 | 0.0987 | 98.26 |
| MNIST | Sigmoid | 0.0064 | 99.60 | 0.0928 | 98.43 |
| MNIST | ReLU | 0.0192 | 99.51 | 0.1040 | 98.48 |
| MNIST | Swish | 0.0159 | 99.58 | 0.1048 | 98.45 |
| MNIST | LiSHT | 0.0127 | 99.68 | 0.0915 | 98.60 |
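
For reference, a hedged training sketch that plugs LiSHT into a small MLP on MNIST follows. The layer widths, optimizer, batch size, and epoch count are assumptions of this sketch (the README does not state them), so it will not necessarily reproduce the MNIST numbers above exactly:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def lisht(x):
    # LiSHT activation: x * tanh(x)
    return x * tf.math.tanh(x)

# Load MNIST and flatten the images for an MLP.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Illustrative architecture and hyperparameters (not taken from the paper or this README).
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(512, activation=lisht),
    layers.Dense(256, activation=lisht),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_test, y_test),
          epochs=10, batch_size=128)
```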

The classification performance (accuracy, %) of the pre-activation ResNet with different activation functions on the MNIST and CIFAR-10/100 datasets.

| Dataset | ResNet Depth | Tanh | ReLU | Swish | LiSHT |
|---|---|---|---|---|---|
| MNIST | 20 | 99.48 | 99.56 | 99.53 | 99.59 |
| CIFAR-10 | 164 | 89.74 | 91.15 | 91.60 | 92.92 |
| CIFAR-100 | 164 | 68.80 | 72.84 | 74.45 | 75.32 |

The classification performance (accuracy, %) of an LSTM with different activation functions on the Twitter140 dataset.

| Dataset | Tanh | ReLU | Swish | LiSHT |
|---|---|---|---|---|
| Twitter140 | 82.27 | 82.47 | 82.22 | 82.47 |
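
As with the MLP sketch, the following is only an illustrative way to swap LiSHT into a Keras LSTM sentiment model; the vocabulary size, sequence length, embedding dimension, and layer width are assumptions, not the configuration behind the Twitter140 numbers above:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def lisht(x):
    # LiSHT activation: x * tanh(x)
    return x * tf.math.tanh(x)

# Illustrative binary sentiment classifier (all sizes are assumptions).
model = keras.Sequential([
    keras.Input(shape=(100,), dtype="int32"),        # padded token-id sequences of length 100
    layers.Embedding(input_dim=20000, output_dim=128),
    layers.LSTM(128, activation=lisht),              # LiSHT replaces the default tanh output activation
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```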

Citation

If you use this code or a derivative thereof in your research, we would appreciate a citation to the original paper:

@article{roy2019lisht,
  title={LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks},
  author={Roy, Swalpa Kumar and Manna, Suvojit and Dubey, Shiv Ram and Chaudhuri, Bidyut B},
  journal={arXiv preprint arXiv:1901.05894},
  year={2019}
}

License

The code is released under the MIT License. See the attached LICENSE file.
