Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel
2016

Abstract

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
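
The exact form $x\Phi(x)$ and the tanh approximation $0.5x\left(1 + \tanh\!\left[\sqrt{2/\pi}\,(x + 0.044715x^3)\right]\right)$ given in the paper can be written in a few lines. The sketch below is illustrative only (it is not taken from the paper or from any of the repositories listed under Code References) and uses nothing beyond Python's standard library; the function names are ours.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard Gaussian CDF,
    computed via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation from the paper:
    0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def relu(x: float) -> float:
    """ReLU gates the input by its sign: x * 1_{x > 0}."""
    return x if x > 0.0 else 0.0


if __name__ == "__main__":
    # The two GELU forms agree to a few decimal places; unlike ReLU,
    # GELU is nonzero (and non-monotonic) for small negative inputs.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  "
              f"gelu_tanh={gelu_tanh(x):+.4f}  relu={relu(x):+.4f}")
```

The paper also notes a sigmoid-based approximation, $x\sigma(1.702x)$, which is the connection to the SiLU alluded to in the huggingface/transformers comment quoted below.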

Code References

5 repositories, 10 references

huggingface/transformers (1 file)
  src/transformers/activations.py (1 reference)
    See Gaussian Error Linear Units (Hendrycks et al., https://arxiv.org/abs/1606.08415) where the SiLU (Sigmoid Linear
openxla/xla (2 files)
  xla/service/cpu/onednn_contraction_rewriter.cc (1 reference)
    // (https://arxiv.org/abs/1606.08415), where:
  xla/service/gpu/transforms/gemm_rewriter.cc (1 reference)
    // (https://arxiv.org/abs/1606.08415), where:
pytorch/pytorch (2 files)
  torch/nn/functional.py (2 references)
    See `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`_.
    See `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`_
  torch/nn/modules/activation.py (1 reference)
    See `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`_
ray-project/ray (1 file)
  rllib/models/torch/mingpt.py (1 reference)
    https://arxiv.org/abs/1606.08415
tensorflow/tensorflow (3 files)
  tensorflow/python/keras/activations.py (1 reference)
    - [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415)
  tensorflow/python/ops/nn_impl.py (1 reference)
    (GELUs)" [Hendrycks et al. 2016](https://arxiv.org/abs/1606.08415) and
  tensorflow/python/ops/nn_ops.py (1 reference)
    [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415).