95 papers
86 files
239 references

Papers Referenced in This Repository

Implicit Reparameterization Gradients

Mikhail Figurnov, Shakir Mohamed, Andriy Mnih
2018
9 references

By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not applicable to a number of important continuous distribution...

Show 9 references in code

Deconvolutional networks

Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, Rob Fergus
2010
7 references

Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information which destroys cues such as edge intersections, parallelism and symmetry. We present a learning framework where features...

Show 7 references in code

Self-Normalizing Neural Networks

Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter
2017
10 references

Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow a...

Show 6 references in code

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Andrew M. Saxe, James L. McClelland, Surya Ganguli
2014
6 references

Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for ...

Show 6 references in code

Understanding the difficulty of training deep feedforward neural networks

Xavier Glorot, Yoshua Bengio
2010
6 references

Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were ob...
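For reference, a minimal NumPy sketch of the normalized ("Xavier") initialization this paper proposes, assuming a dense layer with fan_in inputs and fan_out outputs (the function name is ours, not from the paper):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    # Sample weights uniformly in [-limit, limit] with
    # limit = sqrt(6 / (fan_in + fan_out)), which keeps activation and
    # gradient variances roughly constant from layer to layer.
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```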

Show 6 references in code

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, X. Zhang, Shaoqing Ren, Jian Sun
2015
9 references

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU ...
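A sketch of the PReLU activation the abstract describes, where the negative-side slope a is learned rather than fixed (names are ours):

```python
import numpy as np

def prelu(x, a):
    # f(x) = x for x > 0, a * x otherwise. ReLU is the special case
    # a = 0; leaky ReLU fixes a to a small constant instead of learning it.
    return np.where(x > 0, x, a * x)
```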

Show 6 references in code

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy
2015
30 references

Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriousl...
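As a rough illustration of the normalization step described above, a minimal training-time forward pass over a mini-batch (inference-time running statistics omitted; names are ours):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature over the mini-batch,
    # then restore representational power with a learned scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```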

Show 6 references in code

A guide to convolution arithmetic for deep learning

Vincent Dumoulin, Francesco Visin
2016
146 citations
5 references

We introduce a guide to help deep learning practitioners understand and manipulate convolutional neural network architectures. The guide clarifies the relationship between various properties (input shape, kernel shape, zero padding, strides and output shape) of convolutional, pooling and transposed ...

Show 5 references in code

Soft-NMS -- Improving Object Detection With One Line of Code

Navaneeth Bodla, Bharat Singh, R. Chellappa, L. Davis
2017
5 references

Non-maximum suppression is an integral part of the object detection pipeline. First, it sorts all detection boxes on the basis of their scores. The detection box M with the maximum score is selected and all other detection boxes with a significant overlap (using a pre-defined threshold) with M are s...
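A compact NumPy sketch of the Gaussian variant of the rescoring loop the abstract describes, decaying rather than discarding overlapping boxes (helper and parameter names are ours):

```python
import numpy as np

def iou(box, boxes):
    # Intersection-over-union of one box against many, (x1, y1, x2, y2).
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    # Instead of deleting boxes that overlap the current best detection,
    # decay their scores with a Gaussian penalty on the overlap.
    scores = scores.copy()
    idx = np.arange(len(scores))
    keep = []
    while idx.size > 0:
        best = idx[np.argmax(scores[idx])]
        keep.append(int(best))
        idx = idx[idx != best]
        scores[idx] *= np.exp(-iou(boxes[best], boxes[idx]) ** 2 / sigma)
        idx = idx[scores[idx] > score_thresh]
    return keep, scores
```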

Show 5 references in code

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

Alex Graves, Santiago Fernández, Faustino Gomez, Jürgen Schmidhuber
2006
5 references

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would see...

Show 5 references in code

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, Jeffrey Pennington
2018
5 references

In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make...

Show 5 references in code

An Empirical Exploration of Recurrent Network Architectures

Rafal Józefowicz, Wojciech Zaremba, Ilya Sutskever
2015
4 references

The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful, the LSTM's architecture appears to be ad-hoc so it is no...

Show 4 references in code

ADADELTA: An Adaptive Learning Rate Method

Matthew D. Zeiler
2012
6 references

We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning...
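A minimal sketch of the per-dimension update rule from the paper, maintaining the two running averages the method relies on (the state dict is our bookkeeping, not the paper's notation):

```python
import numpy as np

def adadelta_step(param, grad, state, rho=0.95, eps=1e-6):
    # Running average of squared gradients.
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad ** 2
    # Step size is the RMS of past updates over the RMS of gradients,
    # so no global learning rate needs to be tuned.
    dx = -np.sqrt(state["Edx2"] + eps) / np.sqrt(state["Eg2"] + eps) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * dx ** 2
    return param + dx
```

Here state would be initialized as {"Eg2": np.zeros_like(param), "Edx2": np.zeros_like(param)}.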

Show 4 references in code

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John Duchi, Elad Hazan, Yoram Singer
2011
4 references

We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning....
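The commonly used diagonal form of this family reduces to a per-coordinate step size; a minimal sketch under that assumption (names are ours):

```python
import numpy as np

def adagrad_step(param, grad, g2_sum, lr=0.1, eps=1e-8):
    # Accumulate squared gradients per coordinate; frequently updated
    # coordinates get smaller steps, rarely updated ones larger steps.
    g2_sum = g2_sum + grad ** 2
    return param - lr * grad / (np.sqrt(g2_sum) + eps), g2_sum
```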

Show 4 references in code

Ad click prediction: a view from the trenches

H. Brendan McMahan, Gary D. Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, T...
2013
4 references

Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include impro...

Show 4 references in code

Neural Optimizer Search with Reinforcement Learning

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le
2017
4 references

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitiv...

Show 4 references in code

Training Deep Networks with Structured Layers by Matrix Backpropagation

Catalin Ionescu, O. Vantzos, C. Sminchisescu
2015
4 references

Deep neural network architectures have recently produced excellent results in a variety of areas in artificial intelligence and visual recognition, well surpassing traditional shallow architectures trained using hand-designed features. The power of deep networks stems both from their ability to perf...

Show 4 references in code

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Shai Shalev-Shwartz, Tong Zhang
2012
9 citations
3 references

Stochastic Gradient Descent (SGD) has become popular for solving large scale supervised machine learning optimization problems such as SVM, due to their strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has been implemented in various software packages, it ...

Show 3 references in code

Adding vs. Averaging in Distributed Primal-Dual Optimization

Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč
2015
16 citations
3 references

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent ...

Show 3 references in code

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar
2022
5 references

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with similar level of recall. The design of the proposed algorithm is motivated by an accurate accelerator performance model tha...

Show 3 references in code

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
2018
3 references

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both l...

Show 3 references in code

A Short Note about Kinetics-600

Joao Carreira, Eric Noland, Andras Banki-Horvath, Chloe Hillier, Andrew Zisserman
2018
3 references

We describe an extension of the DeepMind Kinetics human action dataset from 400 classes, each with at least 400 video clips, to 600 classes, each with at least 600 video clips. In order to scale up the dataset we changed the data collection process so it uses multiple queries per class, with some of...

Show 3 references in code

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
2017
3 references

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that effici...

Show 3 references in code

Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel
2016
13 references

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gates inputs by...
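The formula in the abstract translates directly to code; a sketch of the exact form via the error function (the paper's tanh approximation is omitted, and the function name is ours):

```python
import math

def gelu(x):
    # x * Phi(x), with Phi the standard normal CDF written via erf:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```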

Show 3 references in code

Efficient Object Localization Using Convolutional Networks

Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, Christoph Bregler
2014
11 references

Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include pooling and sub-sampling layers which reduce computational requirements, introduce invariance and prevent over-training. These be...

Show 3 references in code

On Using Very Large Target Vocabulary for Neural Machine Translation

Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio
2014
3 references

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limita...

Show 3 references in code

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter
2015
4 references

We introduce the "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs (LReLUs) and parametrized ReLUs (PReLUs), ELUs alleviate the vanishing gradient problem via the identity f...

Show 2 references in code

Stochastic Dual Coordinate Ascent with Adaptive Probabilities

Dominik Csiba, Zheng Qu, Peter Richtárik
2015
55 citations
2 references

This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iter...

Show 2 references in code

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna
2015
6 references

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend...

Show 2 references in code

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
2016
8 references

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of train...
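As a contrast with the batch-normalization sketch above, a minimal version that normalizes over each example's features instead of over the mini-batch (names are ours):

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    # Statistics are computed per example over the feature axis, so the
    # computation is identical at training and test time and does not
    # depend on the batch size.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gain * (x - mu) / np.sqrt(var + eps) + bias
```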

Show 2 references in code

SSD: Single Shot MultiBox Detector

W. Liu, Dragomir Anguelov, D. Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, A. Berg
2015
2 references

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scor...

Show 2 references in code

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, D...
2017
6 references

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be impl...

Show 2 references in code

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, Geo...
2017
2 references

The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is ...

Show 2 references in code

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
2015
9 references

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of l...

Show 2 references in code

Searching for Activation Functions

Prajit Ramachandran, Barret Zoph, Quoc V. Le
2017
7 references

The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance. Currently, the most successful and widely-used activation function is the Rectified Linear Unit (ReLU). Although various hand-designed alternatives to ReLU have been proposed, ...

Show 2 references in code

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

Haşim Sak, Andrew Senior, Françoise Beaufays
2014
4 references

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture that has been designed to address the vanishing and exploding gradient problems of conventional RNNs. Unlike feedforward neural networks, RNNs have cyclic connections making them powerful for modeling sequences. They have...

Show 2 references in code

Random Walk Initialization for Training Very Deep Feedforward Networks

David Sussillo, L. F. Abbott
2014
2 references

Training very deep networks is an important open problem in machine learning. One of many difficulties is that the norm of the back-propagated error gradient can grow or decay exponentially. Here we show that training very deep feed-forward networks (FFNs) is not as difficult as previously thought. ...

Show 2 references in code

Multi-Scale Context Aggregation by Dilated Convolutions

Fisher Yu, Vladlen Koltun
2015
2 references

State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction and image classification are structurally different. In this work, we develop a new convolutional network module ...

Show 2 references in code

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
2014
2 references

Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classifica...

Show 2 references in code

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun
2013
2 references

We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to pr...

Show 2 references in code

Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks

A. Giusti, D. Ciresan, Jonathan Masci, L. Gambardella, J. Schmidhuber
2013
359 citations
2 references

Deep Neural Networks now excel at image classification, detection and segmentation. When used to scan images by means of a sliding window, however, their high computational complexity can bring even the most powerful hardware to its knees. We show how dynamic programming can speedup the process by o...

Show 2 references in code

Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units

Wenling Shang, Kihyuk Sohn, Diogo Almeida, Honglak Lee
2016
4 references

Recently, convolutional neural networks (CNNs) have been used as a powerful tool to solve many problems of machine learning and computer vision. In this paper, we aim to provide insight on the property of convolutional neural networks, as well as a generic method to improve the performance of many C...

Show 2 references in code

Improving neural networks by preventing co-adaptation of feature detectors

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov
2012
4 references

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature dete...
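A sketch of the training-time trick the abstract describes, in the common "inverted" formulation that rescales the survivors so no change is needed at test time (names are ours):

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    # Zero each unit independently with probability p and scale the
    # rest by 1 / (1 - p) to preserve the expected activation.
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```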

Show 2 references in code

Efficient Learning using Forward-Backward Splitting

John C. Duchi, Yoram Singer
2009
1 citation
2 references

We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem...

Show 2 references in code

TensorFlow

TensorFlow Developers
2021
1 reference

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

Show 1 reference in code

IndyLSTMs: Independently Recurrent LSTMs

Pedro Gonnet, Thomas Deselaers
2019
1 reference

We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs. These differ from regular LSTM cells in that the recurrent weights are not modeled as a full matrix, but as a diagonal matrix, i.e. the output and state of each LSTM cell depends on the inputs and its own output/state, as...

Show 1 reference in code

Revisiting ResNets: Improved Training and Scaling Strategies

Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shle...
2021
1 reference

Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to...

Show 1 reference in code

Implementing Neural Turing Machines

Mark Collier, Jöran Beel
2018
1 reference

Neural Turing Machines (NTMs) are an instance of Memory Augmented Neural Networks, a new class of recurrent neural networks which decouple computation from memory by introducing an external memory unit. NTMs have demonstrated superior performance over Long Short-Term Memory Cells in several sequence...

Show 1 reference in code

Dynamic Control Flow in Large-Scale Machine Learning

Yuan Yu, Martín Abadi, P. Barham, E. Brevdo, M. Burrows, Andy Davis, J. Dean, S. Ghemawat, Tim Harle...
2018
1 reference

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dyna...

Show 1 reference in code

Reconstruction filters in computer-graphics

Don P. Mitchell, Arun N. Netravali
1988
1 reference

Problems of signal processing arise in image synthesis because of transformations between continuous and discrete representations of 2D images. Aliasing introduced by sampling has received much attention in graphics, but reconstruction of samples into a continuous representation can also cause alias...

Show 1 reference in code

Nonmetric Multidimensional Scaling: A Numerical Method

Joseph B. Kruskal
1964
1 reference

We describe the numerical methods required in our approach to multi-dimensional scaling. The rationale of this approach has appeared previously.

Show 1 reference in code

Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization

Shai Shalev-Shwartz, Tong Zhang
2013
466 citations
1 reference

We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. We analyze the runtime of the framework and obtain rates that improve state-of-the-art results for various key machine learning optimization...

Show 1 reference in code

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
2013
1 reference

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the ve...

Show 1 reference in code

Scalable Object Detection using Deep Neural Networks

Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov
2013
1 reference

Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bound...

Show 1 reference in code

Differentiation of the Cholesky decomposition

Iain Murray
2016
1 reference

We review strategies for differentiating matrix-based computations, and derive symbolic and algorithmic update rules for differentiating expressions containing the Cholesky decomposition. We recommend new 'blocked' algorithms, based on differentiating the Cholesky algorithm DPOTRF in the LAPACK libr...

Show 1 reference in code

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
2015
1 reference

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Pr...

Show 1 reference in code

Fast Algorithms for Convolutional Neural Networks

Andrew Lavin, Scott Gray
2015
932 citations
2 references

Deep convolutional neural networks take GPU days of compute time to train on large data sets. Pedestrian detection for self driving cars requires very low latency. Image recognition for mobile phones is constrained by limited processing resources. The success of convolutional neural networks in thes...

Show 1 reference in code

On-Device Neural Net Inference with Mobile GPUs

Andrey Ignatov, Radu Timofte, Andrei Kulik, Seungsoo Yang, Ke Wang, Felix Baum, Max Wu, Lirong Xu, L...
2019
1 reference

On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App dev...

Show 1 reference in code

Fast Sparse ConvNets

Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan
2019
1 reference

Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in ...

Show 1 reference in code

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices

Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
2020
1 reference

Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we...

Show 1 reference in code

Accelerating Large-Scale Inference with Anisotropic Vector Quantization

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, Sanjiv Kumar
2019
1 reference

Quantization based techniques are the current state-of-the-art for scaling maximum inner product search to massive databases. Traditional approaches to quantization aim to minimize the reconstruction error of the database points. Based on the observation that for a given query, the database points t...

Show 1 reference in code

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

G. Papandreou, Tyler Lixuan Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, K. Murphy
2018
1 reference

We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model. The proposed PersonLab model tackles both semantic-level reasoning and object-part associations using part-based modeling. Our mo...

Show 1 reference in code

ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections

Sujith Ravi
2017
1 reference

Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited mem...

Show 1 reference in code

MoViNets: Mobile Video Networks for Efficient Video Recognition

D. Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew A. Brown, Boqing Gong
2021
274 citations
1 reference

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference. 3D convolutional neural networks (CNNs) are accurate at video recognition but require large computation and memory budgets and do not sup...

Show 1 reference in code

NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications

Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam
2018
1 reference

This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily r...

Show 1 reference in code

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse
2018
1 reference

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight pertur...

Show 1 reference in code

Optimization of Collective Communication Operations in MPICH

Rajeev Thakur, Rolf Rabenseifner, William Gropp
2005
1 reference

We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of minimizing latency for short messages and minimizin...

Show 1 reference in code

WaveNet: A Generative Model for Raw Audio

Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalc...
2016
1 reference

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with t...

Show 1 reference in code

Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo
2015
1 reference

The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from the machine learning perspective. In this paper, we formula...

Show 1 reference in code

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kais...
2017
25 references

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer...

Show 1 reference in code

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, Bart van Merriënboer, Çağlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwe...
2014
1 reference

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The enc...

Show 1 reference in code

Learning to forget: continual prediction with LSTM

Felix A. Gers, J. Schmidhuber, Fred Cummins
1999
1 reference

Long short-term memory (LSTM) can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams without explicitly marked sequence ends. Without resets, the internal state values may grow...

Show 1 reference in code

Recurrent Neural Network Regularization

Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
2014
2 references

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs,...

Show 1 reference in code

Long Short-Term Memory

Sepp Hochreiter, J. Schmidhuber
1997
1 reference

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient based m...

Show 1 reference in code

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Yarin Gal, Zoubin Ghahramani
2015
1 reference

Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results at the intersection of Bayesian modelling and deep ...

Show 1 reference in code

Learning Precise Timing with LSTM Recurrent Networks

Jürgen Schmidhuber, Felix A. Gers, Douglas Eck
2002
1 reference

In response to Rodriguez's recent article (2001), we compare the performance of simple recurrent nets and long short-term memory recurrent nets on context-free and context-sensitive languages.

Show 1 reference in code

On the Convergence of Adam and Beyond

Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
2018
1 reference

Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g...

Show 1 reference in code

Incorporating Nesterov Momentum into Adam

Timothy Dozat
2015
1 reference

This work aims to improve upon the recently proposed and rapidly popularized optimization algorithm Adam (Kingma & Ba, 2014). Adam has two main components: a momentum component and an adaptive learning rate component. However, regular momentum can be shown conceptually and empirically to be inferi...

Show 1 reference in code

On the difficulty of training Recurrent Neural Networks

Razvan Pascanu, Tomáš Mikolov, Yoshua Bengio
2012
1 reference

There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geo...

Show 1 reference in code

Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow

K. Sim, A. Narayanan, Tom Bagby, Tara N. Sainath, M. Bacchiani
2017
1 reference

Sequence-level losses are commonly used to train deep neural network acoustic models for automatic speech recognition. The forward-backward algorithm is used to efficiently compute the gradients of the sequence loss with respect to the model parameters. Gradient-based optimization is used to minimiz...

Show 1 reference in code

QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep -- Real or Complex -- Matrices and Their Software Implementation

Denisa A. O. Roberts, Lucas R. Roberts
2020
1 reference

This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m, n}$, that are either square (m = n), wide (m < n), or deep (m > n), with rank $k = min(m, n)$. Furthermore, we derive novel matrix backpropagation results for the pivoted (full-rank) QR decomposition ...

Show 1 reference in code

On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming

Christoph Boeddeker, Patrick Hanebrink, Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach
2017
9 references

This report describes the computation of gradients by algorithmic differentiation for statistically optimum beamforming operations. Especially the derivation of complex-valued functions is a key component of this approach. Therefore the real-valued algorithmic differentiation is extended via the com...

Show 1 reference in code

Accuracy and stability of numerical algorithms, Second Edition

Nicholas J. Higham
2002
1 reference

From the Publisher: What is the most accurate way to sum floating point numbers? What are the advantages of IEEE arithmetic? How accurate is Gaussian elimination and what were the key breakthroughs in the development of error analysis for the method? The answers to these and many related questions a...

Show 1 reference in code

The relationship between Precision-Recall and ROC curves

Jesse Davis, Mark Goadrich
2006
3 references

Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. We show that a deep conn...

Show 1 reference in code

Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

Stefan Elfwing, Eiji Uchibe, Kenji Doya
2017
6 references

In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari...

Show 1 reference in code

Conditional Noise-Contrastive Estimation of Unnormalised Models

Ciwan Ceylan, Michael U. Gutmann
2018
1 reference

Many parametric statistical models are not properly normalised and only specified up to an intractable partition function, which renders parameter estimation difficult. Examples of unnormalised models are Gibbs distributions, Markov random fields, and neural network models in unsupervised deep learn...

Show 1 reference in code

Noise-contrastive estimation: A new estimation principle for unnormalized statistical models

Michael Gutmann, Aapo Hyvärinen
2010
1 reference

We present a new estimation principle for parameterized statistical models. The idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise, using the model log-density function in the regression nonlinearity. We show that this l...

Show 1 reference in code

Rectifier Nonlinearities Improve Neural Network Acoustic Models

Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng
2013
1 reference

Deep neural network acoustic models produce substantial gains in large vocabulary continuous speech recognition systems. Emerging work with rectified linear (ReL) hidden units demonstrates additional gains in final system performance relative to more commonly used sigmoidal nonlinearities. In this...

Show 1 reference in code

Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond

Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf
2019
1 reference

We propose a static loop vectorization optimization on top of high level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow, and used for applications ranging from auto-batching and per-example gradients, to jacobian ...

Show 1 reference in code