Machine Learning

8-bit Numerical Formats for Deep Neural Networks

Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi

2022

16 references

Given the current trend of increasing size and complexity of machine learning architectures, it has become of critical importance to identify new approaches to improve the computational efficiency of model training. In this context, we address the ad...


A Block Orthogonalization Procedure with Constant Synchronization Requirements

Andreas Stathopoulos, Kesheng Wu

2002

5 references

We propose an alternative orthonormalization method that computes the orthonormal basis from the right singular vectors of a matrix. Its advantages are: a) all operations are matrix-matrix multiplications and thus cache-efficient, b) only one synchron...


A Robust and Efficient Implementation of LOBPCG

Jed A. Duersch, Meiyue Shao, Chao Yang, Ming Gu

2018

5 references

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is widely used to compute eigenvalues of large sparse symmetric matrices. The algorithm can suffer from numerical instability if it is not implemented with care. This is especially p...


A simple method for generating gamma variables

George Marsaglia, Wai Wan Tsang

2000

4 references

We offer a procedure for generating a gamma variate as the cube of a suitably scaled normal variate. It is fast and simple, assuming one has a fast way to generate normal variables. In brief: generate a normal variate x and a uniform variate U until ...


Categorical Reparameterization with Gumbel-Softmax

Eric Jang, Shixiang Gu, Ben Poole

2016

5 references

Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we present a...


Efficient Memory Management for Deep Neural Net Inference

Yury Pisarchyk, Juhyun Lee

2020

2 references

While deep neural net inference was considered a task for servers only, latest advances in technology allow the task of inference to be moved to mobile and embedded devices, desired for various reasons ranging from latency to privacy. These devices a...


Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp

2009

13 references

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrate...


FP8 Formats for Deep Learning

Paulius Micikevicius, Dušan Stošić, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwait...

2022

24 references

FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings - E4M3 ...


Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel

2016

13 references

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights ...


Language Modeling with Gated Convolutional Networks

Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

2016

2 references

The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked...


Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations

Greg Henry, Ping Tang, Alexander Heinecke

2019

4 references

In recent years fused-multiply-add (FMA) units with lower-precision multiplications and higher-precision accumulation have proven useful in machine learning/artificial intelligence applications, most notably in training deep neural networks due to th...


On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming

Christoph Boeddeker, Patrick Hanebrink, Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach

2017

9 references

This report describes the computation of gradients by algorithmic differentiation for statistically optimum beamforming operations. Especially the derivation of complex-valued functions is a key component of this approach. Therefore the real-valued a...


Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, D...

2017

6 references

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried ou...


Reliable and fast DWARF-based stack unwinding

Théophile Bastian, Stephen Kell, Francesco Zappa Nardelli

2019

2 references

Debug information, usually encoded in the DWARF format, is a hidden and obscure component of our computing infrastructure. Debug information is obviously used by debuggers, but it also plays a key role in program analysis tools, and, most surprisingl...


Searching for Activation Functions

Prajit Ramachandran, Barret Zoph, Quoc V. Le

2017

7 references

The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance. Currently, the most successful and widely-used activation function is the Rectified Linear Unit (ReLU). Although various hand-...


Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

Stefan Elfwing, Eiji Uchibe, Kenji Doya

2017

6 references

In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm ...


Sparse matrix multiplication package (SMMP)

R. Bank, C. Douglas

1993

2 references


The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

Chris J. Maddison, Andriy Mnih, Yee Whye Teh

2016

9 references

The reparameterization trick enables optimizing large scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with f...


The differentiation of pseudo-inverses and non-linear least squares problems whose variables separate

Gene H. Golub, Víctor Pereyra

1972

2 references

For given data $(t_i ,y_i ),i = 1, \cdots ,m$, we consider the least squares fit of nonlinear models of the form \[ \eta ({\bf a},{\boldsymbol \alpha} ;t) = \sum _{j = 1}^n {a_j \varphi _j ({\boldsymbol \alpha} ;t),\qquad {\bf a} \in \mathcal{R}^n ,\...


Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method

Andrew Knyazev

2001

3 references

We describe new algorithms of the locally optimal block preconditioned conjugate gradient (LOBPCG) method for symmetric eigenvalue problems, based on a local optimization of a three-term recurrence, and suggest several other new methods. To be able t...


Repositories

huggingface/transformers

microsoft/onnxruntime

mlflow/mlflow

pytorch/pytorch

ray-project/ray

scikit-learn/scikit-learn

tensorflow/tensorflow
