Papers
Browse academic papers referenced in production code
Some windows with very good sidelobe behavior
Some of the windows presented by Harris [1] are not correct in terms of their reported peak sidelobes and optimal behavior. We present corrected plots of Harris' windows and also derive additional windows with very good sidelobes and optimal behavior...
Fast Algorithms for Convolutional Neural Networks
Deep convolutional neural networks take GPU days of compute time to train on large data sets. Pedestrian detection for self driving cars requires very low latency. Image recognition for mobile phones is constrained by limited processing resources. Th...
Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization
We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. We analyze the runtime of the framework and obtain rates that improve state-of-the-art res...
Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks
Deep Neural Networks now excel at image classification, detection and segmentation. When used to scan images by means of a sliding window, however, their high computational complexity can bring even the most powerful hardware to its knees. We show ho...
Up or Down? Adaptive Rounding for Post-Training Quantization
When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weigh...
MoViNets: Mobile Video Networks for Efficient Video Recognition
We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference. 3D convolutional neural networks (CNNs) are accurate at video recognition but require l...
A guide to convolution arithmetic for deep learning
We introduce a guide to help deep learning practitioners understand and manipulate convolutional neural network architectures. The guide clarifies the relationship between various properties (input shape, kernel shape, zero padding, strides and outpu...
Stochastic Dual Coordinate Ascent with Adaptive Probabilities
This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems. Our modification consists in allowing the method adaptively change the probability distri...
Empirical Evaluation of Rectified Activations in Convolutional Network
In this paper we investigate the performance of different types of rectified activation functions in convolutional neural network: standard rectified linear unit (ReLU), leaky rectified linear unit (Leaky ReLU), parametric rectified linear unit (PReL...
An Experimental Study of Dynamic Dominators
Motivated by recent applications of dominator computations, we consider the problem of dynamically maintaining the dominators of flow graphs through a sequence of insertions and deletions of edges. Our main theoretical contribution is a simple increm...
Adding vs. Averaging in Distributed Primal-Dual Optimization
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper...
New cardinality estimation algorithms for HyperLogLog sketches
This paper presents new methods to estimate the cardinalities of data sets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large cardinalities. Based ...
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
Stochastic Gradient Descent (SGD) has become popular for solving large scale supervised machine learning optimization problems such as SVM, due to their strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has ...
Optimizing Function Layout for Mobile Applications
Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance of large-sc...
Newton’s Method Without Division
Abstract Newton’s Method for root-finding is modified to avoid the division step while maintaining quadratic convergence.
Efficient Learning using Forward-Backward Splitting.
In the wake of the sacramental crisis Asbury established a pattern of relentless travel by horseback across the continent that defined the church for decades to come. He visited New York City, which had been cut off by the war, in August 1783 and als...
Fast Transformer Decoding: One Write-Head is All You Need
Multi-head attention layers, as used in the Transformer neural sequence model, are a powerful alternative to RNNs for moving information across and between sequences. While training these layers is generally fast and simple, due to parallelizability ...
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with similar level of recall. The design of the proposed algorithm is motivate...
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to ...