Abstract
The choice of activation function is essential for building state-of-the-art neural networks. At present, the most widely used and effective activation function is ReLU. However, ReLU suffers from several weaknesses, including a non-zero mean, the loss of negative information, and an unbounded output, which can hinder the optimization process. In this paper, we propose a novel activation function, the “Shape Autotuning Activation Function” (SAAF), to overcome these three challenges simultaneously. SAAF inherits the merits of smooth activation functions (such as Sigmoid and Tanh) and piecewise activation functions (such as ReLU and its variants) while avoiding their deficiencies. Specifically, SAAF adaptively adjusts a pair of independent trainable parameters to capture negative information and provide a near-zero mean output, resulting in better generalization performance and faster learning. At the same time, it produces bounded outputs, ensuring a more stable output distribution during network training. We evaluated SAAF on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modeling. A comprehensive comparative study shows that the proposed SAAF is superior to state-of-the-art activation functions.
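The abstract does not give SAAF's closed form, so the snippet below is only a minimal, hypothetical sketch of the general pattern it describes: an activation module with a pair of independent trainable shape parameters (here named alpha and beta, both assumptions), a bounded output, and a response to negative inputs. The tanh-like form is a stand-in, not the authors' definition.

```python
# Hypothetical illustration only -- not the SAAF defined in the paper.
import torch
import torch.nn as nn

class ShapeTunedActivation(nn.Module):
    """Generic trainable activation with two shape parameters (illustrative stand-in)."""
    def __init__(self, alpha_init: float = 1.0, beta_init: float = 1.0):
        super().__init__()
        # Two independent, learnable shape parameters updated by backpropagation.
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        self.beta = nn.Parameter(torch.tensor(beta_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # alpha scales the saturation level (bounded output);
        # beta controls the slope around zero. Negative inputs are not clipped,
        # so negative information is retained and the output mean stays near zero.
        return self.alpha * torch.tanh(self.beta * x)

# Usage: drop-in replacement for ReLU inside a small network.
layer = nn.Sequential(nn.Linear(16, 32), ShapeTunedActivation(), nn.Linear(32, 10))
out = layer(torch.randn(4, 16))
```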
Original language | English (US) |
---|---|
Article number | 114534 |
Journal | Expert Systems with Applications |
Volume | 171 |
DOIs | |
State | Published - Jun 1 2021 |
All Science Journal Classification (ASJC) codes
- General Engineering
- Computer Science Applications
- Artificial Intelligence
Keywords
- Activation functions
- Deep learning