Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization

Shai Shalev-Shwartz, Tong Zhang
2013
1 reference

We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. We analyze the runtime of the framework and obtain rates that improve state-of-the-art results for various key machine learning optimization...
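
The objective class this framework targets can be written down concisely; the form below is the standard regularized-loss setup from this literature (notation mine, not copied from the paper):

```latex
\min_{w \in \mathbb{R}^d} \; P(w) = \frac{1}{n} \sum_{i=1}^{n} \phi_i(x_i^\top w) + \lambda\, g(w)
```

Here each \phi_i is a per-example loss and g is a possibly non-smooth regularizer handled through its proximal operator; SDCA ascends the corresponding dual objective one randomly chosen coordinate at a time.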

A guide to convolution arithmetic for deep learning

Vincent Dumoulin, Francesco Visin
2016
5 references

We introduce a guide to help deep learning practitioners understand and manipulate convolutional neural network architectures. The guide clarifies the relationship between various properties (input shape, kernel shape, zero padding, strides and output shape) of convolutional, pooling and transposed ...
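
The arithmetic the guide formalizes reduces to a pair of bookkeeping formulas; the sketch below restates the standard ones for input size i, kernel size k, zero padding p, and stride s (a simplification: the guide also covers pooling and output padding):

```python
def conv_output_size(i, k, p, s):
    """Output spatial size of a convolution: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

def transposed_conv_output_size(i, k, p, s):
    """Output size of the corresponding transposed convolution (no output padding)."""
    return s * (i - 1) + k - 2 * p

assert conv_output_size(5, 3, 1, 1) == 5             # 'same' padding at stride 1
assert transposed_conv_output_size(5, 3, 1, 1) == 5  # the transpose inverts the shape map
```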

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Shai Shalev-Shwartz, Tong Zhang
2012
1 reference

Stochastic Gradient Descent (SGD) has become popular for solving large-scale supervised machine learning optimization problems such as the SVM, due to its strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has been implemented in various software packages, it ...
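
As a concrete instance, here is a minimal SDCA sketch for ridge regression (squared loss, L2 regularizer), where the coordinate update has a closed form; the variable names and the toy convergence check are mine, not the paper's:

```python
import numpy as np

def sdca_ridge(X, y, lam, epochs=50, seed=0):
    """SDCA for min_w (1/n) * sum 0.5*(x_i.w - y_i)^2 + (lam/2)*||w||^2.
    The primal iterate w is kept equal to X^T alpha / (lam * n) throughout."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Closed-form maximizer of the dual along coordinate i (squared loss).
            delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + X[i] @ X[i] / (lam * n))
            alpha[i] += delta
            w += delta * X[i] / (lam * n)
    return w

X = np.random.default_rng(1).normal(size=(200, 5))
w_true = np.arange(5.0)
y = X @ w_true
print(sdca_ridge(X, y, lam=1e-3).round(2))  # close to w_true for small lam
```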

Neural Optimizer Search with Reinforcement Learning

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le
2017
2 references

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitiv...
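
One of the rules this search reportedly discovered is AddSign, which scales the gradient by the agreement between its sign and the sign of a running momentum; the sketch below is a simplified reading of that update (the paper's internal decay schedule is omitted):

```python
import numpy as np

def addsign_step(w, g, m, lr=0.01, beta=0.9, alpha=1.0):
    """One AddSign-style update: amplify the gradient when its sign agrees
    with the momentum's, dampen it when it disagrees (simplified sketch)."""
    m = beta * m + (1 - beta) * g                          # moving average of gradients
    w = w - lr * (alpha + np.sign(g) * np.sign(m)) * g     # sign-agreement scaling
    return w, m
```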

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
2017
15 references

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer...
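
The Transformer's core primitive is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; a minimal NumPy rendering (single head, no masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n_q, n_k) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V

Q = np.random.randn(4, 8); K = np.random.randn(6, 8); V = np.random.randn(6, 8)
print(scaled_dot_product_attention(Q, K, V).shape)        # (4, 8)
```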

Random Walk Initialization for Training Very Deep Feedforward Networks

David Sussillo, L. F. Abbott
2014
2 references

Training very deep networks is an important open problem in machine learning. One of many difficulties is that the norm of the back-propagated error gradient can grow or decay exponentially. Here we show that training very deep feed-forward networks (FFNs) is not as difficult as previously thought. ...
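
The exponential growth or decay the abstract mentions is easy to reproduce; the toy below pushes an error vector backward through a deep stack of random linear layers and reports the log gradient norm for a few init scales (an illustration of the phenomenon, not the paper's exact initialization recipe):

```python
import numpy as np

def backprop_log_norm(depth, width, scale, seed=0):
    """Log norm of an error vector after `depth` backward passes through
    random linear layers; `scale` multiplies the 1/sqrt(width) init."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(size=width)
    delta /= np.linalg.norm(delta)
    for _ in range(depth):
        W = rng.normal(scale=scale / np.sqrt(width), size=(width, width))
        delta = W.T @ delta                  # backward pass of a linear layer
    return np.log(np.linalg.norm(delta))

for s in (0.8, 1.0, 1.2):
    print(s, backprop_log_norm(depth=100, width=128, scale=s))
    # scale < 1 decays exponentially, scale > 1 grows exponentially
```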

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, et al.
2021
2 references

Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models on even a multi-GPU server, and b) the number of compute opera...
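
One ingredient of such training is intra-layer (tensor) model parallelism; the toy below splits a linear layer's weight matrix column-wise across two simulated "devices" so that neither holds the full layer (NumPy stands in for GPUs and communication collectives):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))             # batch of activations, replicated on both devices
W = rng.normal(size=(16, 32))            # full weight, too "big" for one device
W0, W1 = np.split(W, 2, axis=1)          # each device keeps half the columns

y0 = x @ W0                              # computed on device 0
y1 = x @ W1                              # computed on device 1
y = np.concatenate([y0, y1], axis=1)     # all-gather of the partial outputs

assert np.allclose(y, x @ W)             # matches the unsharded layer
```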

Recurrent Neural Network Regularization

Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
2014
1 reference

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs,...
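
The paper's prescription is to apply dropout only to the non-recurrent connections; sketched below with tanh cells standing in for LSTMs (layer shapes and names are mine):

```python
import numpy as np

def stacked_rnn_step(x, hs, Ws_in, Ws_rec, p_drop, rng):
    """One time step of a stacked vanilla RNN: dropout on the activations
    flowing *up* between layers, never on the recurrent h -> h path."""
    inp = x
    new_hs = []
    for h, W_in, W_rec in zip(hs, Ws_in, Ws_rec):
        inp = inp * (rng.random(inp.shape) > p_drop) / (1 - p_drop)  # non-recurrent dropout
        h = np.tanh(inp @ W_in + h @ W_rec)                          # recurrent path untouched
        new_hs.append(h)
        inp = h
    return new_hs
```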

QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep -- Real or Complex -- Matrices and Their Software Implementation

Denisa A. O. Roberts, Lucas R. Roberts
2020
1 reference

This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m,n}$ that are either square ($m = n$), wide ($m < n$), or deep ($m > n$), with rank $k = \min(m, n)$. Furthermore, we derive novel matrix backpropagation results for the pivoted (full-rank) QR decomposition ...
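
Not the article's algorithm, but a quick way to validate any QR backward pass: compare a framework's analytic gradient against finite differences on a deep (m > n) matrix, here using PyTorch's built-in QR:

```python
import torch

def qr_scalar(a):
    # Reduced QR of a deep (m > n) matrix; sum entries so gradcheck gets a scalar.
    q, r = torch.linalg.qr(a, mode="reduced")
    return q.sum() + r.sum()

a = torch.randn(5, 3, dtype=torch.float64, requires_grad=True)  # m > n, full rank w.h.p.
print(torch.autograd.gradcheck(qr_scalar, (a,)))  # True: backward matches finite differences
```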

A contextual-bandit approach to personalized news article recommendation

Lihong Li, Wei Chu, J. Langford, R. Schapire
2010
1 reference

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services are characterized by dynamic...
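
The algorithm this paper proposes, LinUCB, adds an upper-confidence bonus to a per-arm ridge-regression estimate; a compact sketch of the disjoint variant (the simulation interface and hyperparameters are mine):

```python
import numpy as np

def linucb(contexts, rewards, alpha=1.0):
    """Disjoint LinUCB: per-arm ridge regression plus an exploration bonus
    alpha * sqrt(x^T A^{-1} x). `rewards` acts as a simulation oracle; only
    the chosen arm's reward is ever observed."""
    n_rounds, n_arms, d = contexts.shape
    A = np.stack([np.eye(d) for _ in range(n_arms)])    # per-arm design matrices
    b = np.zeros((n_arms, d))
    total = 0.0
    for t in range(n_rounds):
        p = np.empty(n_arms)
        for a in range(n_arms):
            theta = np.linalg.solve(A[a], b[a])         # ridge estimate for arm a
            x = contexts[t, a]
            p[a] = theta @ x + alpha * np.sqrt(x @ np.linalg.solve(A[a], x))
        a = int(np.argmax(p))                           # act greedily on the UCB
        x, r = contexts[t, a], rewards[t, a]
        A[a] += np.outer(x, x)                          # update chosen arm's statistics
        b[a] += r * x
        total += r
    return total
```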

TensorFlow

TensorFlow Developers
2021
1 reference

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
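
For flavor, a minimal end-to-end example of the high-level Keras API (random data, arbitrary shapes):

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(128, 8).astype("float32")
y = (x.sum(axis=1) > 4).astype("int32")          # synthetic binary labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=3, verbose=0)             # define, compile, fit
```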

Implementing Neural Turing Machines

Mark Collier, Joeran Beel
2018
1 reference

Neural Turing Machines (NTMs) are an instance of Memory Augmented Neural Networks, a new class of recurrent neural networks which decouple computation from memory by introducing an external memory unit. NTMs have demonstrated superior performance over Long Short-Term Memory Cells in several sequence...
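
The external memory is read through content-based addressing: a softmax over key-strength-scaled cosine similarities between a key and each memory row. A minimal sketch:

```python
import numpy as np

def content_addressing(M, key, beta):
    """NTM content-based read weights: softmax over beta-scaled cosine
    similarity between the key and each memory row."""
    sims = M @ key / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    e = np.exp(beta * sims - np.max(beta * sims))
    return e / e.sum()

M = np.random.randn(128, 20)                # external memory: 128 slots of width 20
w = content_addressing(M, key=np.random.randn(20), beta=5.0)
print(w.shape, w.sum())                     # (128,) 1.0
```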

A Tutorial on Thompson Sampling

Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband
2017
1 reference

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresse...
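
For the Bernoulli-bandit case the algorithm is a few lines: sample each arm's success rate from its Beta posterior, play the argmax, update the chosen arm's counts. A self-contained sketch:

```python
import numpy as np

def thompson_bernoulli(true_probs, n_rounds=10_000, seed=0):
    """Beta-Bernoulli Thompson sampling against a simulated environment."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    wins, losses = np.ones(k), np.ones(k)    # Beta(1, 1) priors per arm
    for _ in range(n_rounds):
        theta = rng.beta(wins, losses)       # one posterior sample per arm
        a = int(np.argmax(theta))            # exploit the sampled belief
        if rng.random() < true_probs[a]:
            wins[a] += 1
        else:
            losses[a] += 1
    return wins + losses - 2                 # pull counts per arm

print(thompson_bernoulli([0.3, 0.5, 0.7]))   # most pulls go to the 0.7 arm
```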

Applications of finite automata representing large vocabularies

Cláudio L. Lucchesi, Tomasz Kowaltowski
1993
1 reference

The construction of minimal acyclic deterministic partial finite automata to represent large natural language vocabularies is described. Applications of such automata include spelling checkers and advisers, multilanguage dictionaries, thesauri, minimal perfec...
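
A trie is the unminimized special case of such an acyclic automaton; the sketch below shows the vocabulary lookup these structures support (a minimal automaton would additionally merge shared suffixes, which this sketch does not do):

```python
def build_trie(words):
    """Build a character trie: nested dicts, '$' marks an accepting state."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root

def accepts(trie, word):
    """Follow transitions character by character; accept only at a '$' state."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node

vocab = build_trie(["car", "card", "care", "cars"])
print(accepts(vocab, "care"), accepts(vocab, "ca"))   # True False
```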

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Yarin Gal, Zoubin Ghahramani
2015
1 reference

Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results at the intersection of Bayesian modelling and deep ...
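
The paper's fix, as I read it, is to sample the dropout masks once per sequence and reuse them at every time step, on both the input and the recurrent connections; sketched with a tanh cell standing in for an LSTM:

```python
import numpy as np

def variational_rnn(xs, W_in, W_rec, p_drop, rng):
    """Variational RNN dropout sketch: the same masks are applied at every
    time step, including on the recurrent h -> h connection."""
    h = np.zeros(W_rec.shape[0])
    keep = 1 - p_drop
    m_x = (rng.random(xs.shape[1]) < keep) / keep   # one input mask for all steps
    m_h = (rng.random(h.shape) < keep) / keep       # one recurrent mask for all steps
    for x in xs:
        h = np.tanh((x * m_x) @ W_in + (h * m_h) @ W_rec)
    return h
```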

Dynamic Control Flow in Large-Scale Machine Learning

Yuan Yu, Martín Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy Davis, Jeff Dean, Sanjay Ghemawat, et al.
2018
1 reference

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dyna...
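
In TensorFlow these features surface as in-graph control-flow primitives such as tf.cond and tf.while_loop; a minimal data-dependent loop:

```python
import tensorflow as tf

# tf.while_loop iterates a body a data-dependent number of times; cond and
# body receive and return the loop variables.
i = tf.constant(0)
x = tf.constant(1.0)
cond = lambda i, x: i < 10
body = lambda i, x: (i + 1, x * 1.5)
i_final, x_final = tf.while_loop(cond, body, (i, x))
print(float(x_final))   # 1.5 ** 10
```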

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices

Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
2020
1 reference

Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we...
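
Assuming the Hugging Face transformers package (with PyTorch) and the public google/mobilebert-uncased checkpoint, both external to the paper, loading the compact model takes a few lines:

```python
# Assumption: `transformers` and `torch` are installed and the checkpoint
# "google/mobilebert-uncased" is available on the Hugging Face Hub.
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
model = AutoModel.from_pretrained("google/mobilebert-uncased")
out = model(**tok("MobileBERT runs on resource-limited devices.", return_tensors="pt"))
print(out.last_hidden_state.shape)   # (1, seq_len, 512): compact hidden size
```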

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
2018
3 references

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both l...
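
BERT's masked-LM pre-training corrupts inputs with an 80/10/10 rule over roughly 15% of positions; the sketch below reproduces that corruption (the -100 ignore-label convention is borrowed from common implementations, not from the paper):

```python
import numpy as np

def mlm_mask(token_ids, mask_id, vocab_size, rng, p=0.15):
    """Pick ~15% of positions; of those, 80% become [MASK], 10% a random
    token, 10% stay unchanged. Labels hold the original ids at picked
    positions and -100 (ignored by the loss) elsewhere."""
    ids = token_ids.copy()
    labels = np.full_like(ids, -100)
    picked = rng.random(ids.shape) < p
    labels[picked] = ids[picked]
    r = rng.random(ids.shape)
    ids[picked & (r < 0.8)] = mask_id                     # 80%: [MASK]
    rand = picked & (r >= 0.8) & (r < 0.9)                # 10%: random token
    ids[rand] = rng.integers(0, vocab_size, rand.sum())
    return ids, labels                                    # remaining 10%: unchanged
```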

On Using Very Large Target Vocabulary for Neural Machine Translation

Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio
2014
2 references

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limita...
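
The workaround the paper builds on is to normalize each target word's score against a small sampled subset of the vocabulary rather than all of it; the sketch below uses a uniform proposal for simplicity, whereas the paper partitions the training corpus:

```python
import numpy as np

def sampled_softmax_nll(logits_fn, target, vocab_size, n_samples, rng):
    """Approximate negative log-likelihood of `target`: log-sum-exp over a
    sampled subset of the vocabulary instead of all vocab_size words."""
    negatives = rng.choice(vocab_size, size=n_samples, replace=False)
    subset = np.unique(np.append(negatives, target))
    scores = logits_fn(subset)                     # output-layer scores, subset only
    logZ = np.log(np.exp(scores - scores.max()).sum()) + scores.max()
    return -(scores[subset == target][0] - logZ)

rng = np.random.default_rng(0)
full_logits = rng.normal(size=50_000)              # pretend output-layer scores
print(sampled_softmax_nll(lambda idx: full_logits[idx], target=42,
                          vocab_size=50_000, n_samples=512, rng=rng))
```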