Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel
2016

Abstract

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gates inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
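For reference, here is a minimal NumPy sketch of the exact GELU, $x\Phi(x)$, alongside the tanh-based approximation given in the paper. The function names are illustrative and not taken from any particular library:

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard Gaussian CDF,
    # written via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh_approx(x):
    # Tanh approximation from the paper:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```

The approximation avoids evaluating the Gaussian CDF directly and is the variant most deep learning frameworks expose behind an "approximate" flag.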

Code References (2 repositories, 6 references)

pytorch/pytorch (2 files)
  torch/nn/functional.py (2 references)
    L2019: See `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`_.
    L2382: See `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`_
  torch/nn/modules/activation.py (1 reference)
    L442: See `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`_

tensorflow/tensorflow (3 files)
  tensorflow/python/keras/activations.py (1 reference)
    L342: - [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415)
  tensorflow/python/ops/nn_impl.py (1 reference)
    L440: (GELUs)" [Hendrycks et al. 2016](https://arxiv.org/abs/1606.08415) and
  tensorflow/python/ops/nn_ops.py (1 reference)
    L3738: [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415).
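Both frameworks referenced above expose GELU directly. A minimal usage sketch, assuming versions in which `torch.nn.functional.gelu` and `tf.nn.gelu` accept an `approximate` argument (PyTorch >= 1.12, TensorFlow >= 2.4):

```python
import torch
import torch.nn.functional as F
import tensorflow as tf

# PyTorch
x = torch.linspace(-3.0, 3.0, steps=7)
y_exact = F.gelu(x)                          # exact x * Phi(x)
y_tanh = F.gelu(x, approximate="tanh")       # tanh approximation

# TensorFlow
x_tf = tf.linspace(-3.0, 3.0, 7)
z_exact = tf.nn.gelu(x_tf)                   # exact by default
z_tanh = tf.nn.gelu(x_tf, approximate=True)  # tanh approximation
```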