Machine Learning

Accuracy and Stability of Numerical Algorithms, Second Edition

Nicholas J. Higham

2002

1 reference

From the Publisher: What is the most accurate way to sum floating point numbers? What are the advantages of IEEE arithmetic? How accurate is Gaussian elimination and what were the key breakthroughs in the development of error analysis for the method?...
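One classical answer to the blurb's first question is compensated (Kahan) summation, which the book analyzes alongside plain recursive summation. Below is a minimal sketch. The names `naive_sum` and `kahan_sum` are hypothetical. The naive loop is written out explicitly because recent Python versions' built-in `sum` already compensates for floats. `math.fsum` serves only as a high-accuracy reference.

```python
import math

def naive_sum(values):
    """Recursive summation: add each term to a running total."""
    total = 0.0
    for x in values:
        total += x
    return total

def kahan_sum(values):
    """Compensated (Kahan) summation."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation             # re-inject the previously lost bits
        t = total + y                    # low-order bits of y may be rounded away here
        compensation = (t - total) - y   # zero in exact arithmetic; here, the rounding error
        total = t
    return total

values = [1.0] + [1e-16] * 10
print(naive_sum(values))   # 1.0: each tiny term is rounded away on its own
print(kahan_sum(values))
print(math.fsum(values))   # high-accuracy reference from the standard library
```

The compensation term costs three extra floating-point operations per addition. In exchange, the error bound no longer grows with the number of terms to first order.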

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

2011

4 references

We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning....
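Here is a sketch of the diagonal variant of AdaGrad, which illustrates the per-coordinate adaptivity described above. The paper also analyzes a full-matrix version. Each coordinate's step is divided by the square root of that coordinate's accumulated squared gradients. The function name, the learning rate, and the small `eps` guard are our assumptions, not values from the paper.

```python
import math

def adagrad_step(params, grads, accum, lr=0.5, eps=1e-8):
    """Apply one diagonal AdaGrad update in place and return params."""
    for i, g in enumerate(grads):
        accum[i] += g * g                                   # sum of squared gradients so far
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)   # per-coordinate step size
    return params

# Badly scaled quadratic f(x, y) = x**2 + 10 * y**2 with gradient (2x, 20y).
params, accum = [1.0, 1.0], [0.0, 0.0]
for _ in range(200):
    x, y = params
    adagrad_step(params, [2 * x, 20 * y], accum)
print(params)  # both coordinates end up near 0
```

On the first step, both coordinates move by the same amount (about 0.5) despite their tenfold difference in curvature. This is the scale adaptivity the accumulated denominator provides.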

Ad click prediction: a view from the trenches

H. Brendan McMahan, Gary D. Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, T...

2013

4 references

Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a d...

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Yarin Gal, Zoubin Ghahramani

2015

1 reference

Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results a...
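This paper is known for what is often called variational dropout for RNNs. It samples one dropout mask per sequence and reuses it at every time step for the inputs, outputs, and recurrent connections, instead of drawing a fresh mask at each step. The sketch below only contrasts the two mask-sampling schemes. The function names and the inverted-dropout scaling by `1 / keep` are our assumptions.

```python
import random

def sample_mask(size, p, rng):
    """Bernoulli keep-mask with inverted-dropout scaling (kept units scaled by 1/keep)."""
    keep = 1.0 - p
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(size)]

def per_step_masks(num_steps, size, p, rng):
    """Naive dropout: an independent mask at every time step."""
    return [sample_mask(size, p, rng) for _ in range(num_steps)]

def variational_masks(num_steps, size, p, rng):
    """Variational dropout: one mask per sequence, repeated at every time step."""
    mask = sample_mask(size, p, rng)
    return [mask for _ in range(num_steps)]

rng = random.Random(0)
masks = variational_masks(num_steps=5, size=8, p=0.25, rng=rng)
# The same hidden units are dropped at all 5 steps, e.g. for the recurrent state:
h = [1.0] * 8
masked_states = [[v * m for v, m in zip(h, mask)] for mask in masks]
```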

Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond

Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf

2019

1 reference

We propose a static loop vectorization optimization on top of high level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow, and used for applications ranging from aut...

Differentiation of the Cholesky decomposition

Iain Murray

2016

1 reference

We review strategies for differentiating matrix-based computations, and derive symbolic and algorithmic update rules for differentiating expressions containing the Cholesky decomposition. We recommend new `blocked' algorithms, based on differentiatin...
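For orientation, here is the standard forward-mode rule for the Cholesky factor, which is the kind of symbolic update rule the abstract refers to. Starting from A = LL^T, differentiate both sides, then multiply by L^{-1} on the left and L^{-T} on the right. Because L^{-1} dL is lower triangular, it can be read off the symmetric result with an operator Φ that keeps the strictly lower triangle and halves the diagonal. The symbol Φ is our notation here.

```latex
\begin{aligned}
A = LL^\top \;&\Rightarrow\; \mathrm{d}A = \mathrm{d}L\,L^\top + L\,\mathrm{d}L^\top \\
&\Rightarrow\; L^{-1}\,\mathrm{d}A\,L^{-\top} = L^{-1}\mathrm{d}L + \bigl(L^{-1}\mathrm{d}L\bigr)^{\top} \\
&\Rightarrow\; \mathrm{d}L = L\,\Phi\bigl(L^{-1}\,\mathrm{d}A\,L^{-\top}\bigr),
\qquad
\Phi(X)_{ij} = \begin{cases} X_{ij} & i > j \\ \tfrac{1}{2} X_{ii} & i = j \\ 0 & i < j \end{cases}
\end{aligned}
```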

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

2015

1 reference

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a ...

Incorporating Nesterov Momentum into Adam

Timothy Dozat

2015

1 reference

This work aims to improve upon the recently proposed and rapidly popularized optimization algorithm Adam (Kingma & Ba, 2014). Adam has two main components: a momentum component and an adaptive learning rate component. However, regular momentum can be shown conceptually and empirically to be inferior to a similar algorithm known as Nesterov's accelerated gradient (NAG)...
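Dozat's optimizer, usually called Nadam, replaces Adam's classical momentum with a Nesterov-style look-ahead. The scalar sketch below assumes a constant momentum coefficient `beta1`, although the paper also considers a momentum schedule. The hyperparameter defaults and the function name are illustrative assumptions, not values taken from the paper.

```python
import math

def nadam_step(theta, g, m, n, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Nadam-style update for a scalar parameter; t counts steps from 1."""
    m = beta1 * m + (1 - beta1) * g          # first-moment (momentum) estimate
    n = beta2 * n + (1 - beta2) * g * g      # second-moment estimate
    # Nesterov look-ahead: bias-corrected *next* momentum plus the
    # bias-corrected contribution of the current gradient.
    m_hat = beta1 * m / (1 - beta1 ** (t + 1)) + (1 - beta1) * g / (1 - beta1 ** t)
    n_hat = n / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(n_hat) + eps)
    return theta, m, n

# Minimize f(theta) = (theta - 3)**2 starting from theta = 0.
theta, m, n = 0.0, 0.0, 0.0
for t in range(1, 3001):
    grad = 2 * (theta - 3)
    theta, m, n = nadam_step(theta, grad, m, n, t)
```

The only difference from Adam is the `m_hat` line. Adam would use `m / (1 - beta1 ** t)` alone.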

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, Bart van Merriënboer, Çaǧlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwe...

2014

1 reference

Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.

Learning Precise Timing with LSTM Recurrent Networks

Felix A. Gers, Nicol N. Schraudolph, Jürgen Schmidhuber

2002

1 reference

The temporal distance between events conveys information essential for numerous sequential tasks such as motor control and rhythm detection. While Hidden Markov Models tend to ignore this information, recurrent neural networks (RNNs) can in principle learn to make use of it...

Learning to forget: continual prediction with LSTM

Felix A. Gers, J. Schmidhuber, Fred Cummins

1999

1 reference

Long short-term memory (LSTM) can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams without explicitly marked sequence ends. ...
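The remedy this paper is known for is the forget gate, which lets an LSTM cell learn to reset its own state instead of letting it grow on continual input. Below is a minimal single-unit sketch in the now-common formulation: sigmoid gates, tanh squashing, and no peephole connections. The original paper's squashing functions may differ. The weight layout in `W` is a hypothetical placeholder.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a single-unit LSTM with a forget gate.

    W maps each gate name to a hypothetical (input weight, recurrent weight, bias).
    """
    def pre(gate):
        w_x, w_h, b = W[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))    # how much new content to write
    f = sigmoid(pre("forget"))   # f near 0 resets the cell, f near 1 keeps it
    o = sigmoid(pre("output"))   # how much of the cell to expose
    g = math.tanh(pre("cell"))   # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

With a strongly negative forget bias, the new cell state no longer depends on `c_prev`. This is the reset behavior that the original LSTM lacked.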

Long Short-Term Memory

Sepp Hochreiter, J. Schmidhuber

1997

1 reference

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it b...

Neural Optimizer Search with Reinforcement Learning

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

2017

4 references

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathem...

On the Convergence of Adam and Beyond

Sashank J. Reddi, Satyen Kale, Sanjiv Kumar

2018

1 reference

On the importance of initialization and momentum in deep learning

Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton

2013

2 references

Optimization of Collective Communication Operations in MPICH

Rajeev Thakur, Rolf Rabenseifner, William Gropp

2005

1 reference

We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of m...
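One of the algorithms the paper uses for short-message allreduce is recursive doubling. In round k, each process exchanges its partial result with the process whose rank differs in bit k, so all p processes hold the full result after log2(p) rounds. The sketch below simulates this for a sum inside a single Python process. It assumes p is a power of two, since other process counts need extra steps that this sketch omits. The function name is hypothetical.

```python
def recursive_doubling_allreduce(values):
    """Simulate a recursive-doubling sum-allreduce; values[r] is rank r's input.

    Returns (per-rank results, number of communication rounds).
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    partial = list(values)
    distance, rounds = 1, 0
    while distance < p:
        # Exchanges within a round are simultaneous, so read from a snapshot.
        snapshot = list(partial)
        for rank in range(p):
            partner = rank ^ distance   # rank differing in the current bit
            partial[rank] = snapshot[rank] + snapshot[partner]
        distance *= 2
        rounds += 1
    return partial, rounds

print(recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8]))
```

Each rank sends the full message in every round. The scheme therefore suits short messages, where latency dominates. For long messages, bandwidth-oriented algorithms are preferred.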

Recurrent Neural Network Regularization

Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals

2014

2 references

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper...

SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov, Frank Hutter

2017

10 references

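Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions. In this paper, we propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks...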
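The schedule named in the SGDR title, cosine annealing with warm restarts, can be written as a function of the (possibly fractional) epoch. Within a cycle of length T_i, the learning rate decays from eta_max to eta_min along half a cosine and then restarts, and each new cycle is T_mult times longer than the last. Parameter names follow that notation. The default values are illustrative assumptions.

```python
import math

def sgdr_learning_rate(epoch, eta_min=0.0, eta_max=0.1, t_0=10, t_mult=2):
    """Learning rate at `epoch` under cosine annealing with warm restarts."""
    if t_0 <= 0 or t_mult < 1:
        raise ValueError("need t_0 > 0 and t_mult >= 1")
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:      # skip past completed cycles
        t_cur -= t_i
        t_i *= t_mult        # each restart lengthens the next cycle
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

for epoch in [0, 5, 9.9, 10, 20, 30]:
    print(epoch, round(sgdr_learning_rate(epoch), 4))
```

Because `epoch` may be fractional, the rate can be updated after every mini-batch rather than only at epoch boundaries.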

SSD: Single Shot MultiBox Detector

W. Liu, Dragomir Anguelov, D. Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, A. Berg

2015

2 references

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map locati...
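The "default boxes over different aspect ratios and scales per feature map location" can be generated as sketched below, following the paper's formulas. Layer k of m gets the scale s_k = s_min + (s_max - s_min)(k - 1)/(m - 1). A box of aspect ratio a has width s_k·sqrt(a) and height s_k/sqrt(a). Aspect ratio 1 also gets an extra box of scale sqrt(s_k·s_{k+1}). The value s_{m+1} = 1.0 used for the last layer is our assumption, and so is the function name.

```python
import math

def ssd_default_boxes(feature_map_sizes, s_min=0.2, s_max=0.9,
                      aspect_ratios=(1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)):
    """Return default boxes as (cx, cy, w, h) in [0, 1] image coordinates."""
    m = len(feature_map_sizes)
    if m == 1:
        scales = [s_min]
    else:
        scales = [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]
    scales.append(1.0)  # ASSUMPTION: s_{m+1} = 1.0 for the last layer's extra box
    boxes = []
    for k, f in enumerate(feature_map_sizes):
        s_k = scales[k]
        s_extra = math.sqrt(s_k * scales[k + 1])
        for i in range(f):          # feature-map row
            for j in range(f):      # feature-map column
                cx, cy = (j + 0.5) / f, (i + 0.5) / f   # cell center
                for a in aspect_ratios:
                    boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
                boxes.append((cx, cy, s_extra, s_extra))  # extra box for a = 1
    return boxes

boxes = ssd_default_boxes([4, 2, 1])
print(len(boxes))  # (16 + 4 + 1) locations * 6 boxes = 126
```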

Toeplitz And Circulant Matrices: A Review (Foundations and Trends(R) in Communications and Information Theory)

Robert M. Gray

2006

1 reference

Repositories

huggingface/transformers

microsoft/onnxruntime

mlflow/mlflow

pytorch/pytorch

ray-project/ray

scikit-learn/scikit-learn

tensorflow/tensorflow
