The choice of activation function is essential for building state-of-the-art neural networks. At present, the most widely used activation function is ReLU. However, ReLU suffers from three weaknesses: a non-zero mean output, the loss of negative information (“negative missing”), and unbounded output, all of which can hinder optimization. In this paper, we propose a novel activation function, the “Shape Autotuning Activation Function” (SAAF), to overcome these three weaknesses simultaneously. SAAF inherits the merits of smooth activation functions (such as Sigmoid and Tanh) and piecewise activation functions (such as ReLU and its variants) while avoiding their deficiencies. Specifically, SAAF adaptively adjusts a pair of independent trainable parameters to capture negative information and provide a near-zero-mean output, resulting in better generalization and faster learning. At the same time, it provides bounded outputs, which keeps the output distribution more stable during network training. We evaluated SAAF on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modeling. A comprehensive comparison shows that the proposed SAAF is superior to state-of-the-art activation functions.
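The three ReLU weaknesses named in the abstract can be observed numerically. The following is a minimal sketch, not the paper's experimental setup: it feeds zero-mean Gaussian inputs (plus one deliberately large value) through ReLU and, for contrast, through Tanh as a representative smooth, bounded function. The helper names `relu` and `summarize` and the sample size are assumptions made for illustration. SAAF itself is not implemented here, because the abstract does not state its formula.

```python
import math
import random


def relu(x):
    """Standard ReLU: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def summarize(fn, xs):
    """Collect simple output statistics for an activation function (hypothetical helper)."""
    ys = [fn(x) for x in xs]
    return {
        # Mean of the outputs; far from zero means a shifted output distribution.
        "mean": sum(ys) / len(ys),
        # Largest output; grows with the input if the function is unbounded.
        "max": max(ys),
        # Number of distinct outputs produced by negative inputs;
        # a value of 1 means all negative information is collapsed.
        "distinct_negative_outputs": len({y for x, y in zip(xs, ys) if x < 0}),
    }


random.seed(0)
inputs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
inputs.append(50.0)  # one large input to probe boundedness

relu_stats = summarize(relu, inputs)
tanh_stats = summarize(math.tanh, inputs)

print("ReLU:", relu_stats)
print("Tanh:", tanh_stats)
```

Under these assumptions, ReLU should report a clearly positive mean, a maximum equal to the largest input, and a single distinct output for all negative inputs, while Tanh stays near zero mean and within its bounded range. An activation with the properties the abstract claims for SAAF would be expected to resemble Tanh on these three measures while retaining ReLU-like behavior elsewhere.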
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Artificial Intelligence
Keywords
- Activation functions
- Deep learning