Some windows with very good sidelobe behavior

Albert H. Nuttall
1981
1,062 citations
2 references

Some of the windows presented by Harris [1] are not correct in terms of their reported peak sidelobes and optimal behavior. We present corrected plots of Harris' windows and also derive additional windows with very good sidelobes and optimal behavior...
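
For orientation (not from the paper): Nuttall's windows are sums of a few cosine terms, and the sketch below builds one from the widely cited "minimum 4-term" coefficient set attributed to Nuttall (the same set scipy.signal.windows.nuttall uses).

```python
import numpy as np

def cosine_series_window(N, coeffs):
    """K-term cosine-series window: w[n] = sum_k (-1)^k a_k cos(2*pi*k*n/(N-1))."""
    n = np.arange(N)
    return sum(((-1) ** k) * a * np.cos(2 * np.pi * k * n / (N - 1))
               for k, a in enumerate(coeffs))

# Coefficients of Nuttall's "minimum 4-term" window (~ -98 dB peak sidelobe).
nuttall4 = [0.3635819, 0.4891775, 0.1365995, 0.0106411]
w = cosine_series_window(64, nuttall4)
```

The sidelobe level can be inspected by taking a heavily zero-padded FFT of w and comparing the mainlobe peak to the largest peak outside it.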

Fast Algorithms for Convolutional Neural Networks

Andrew Lavin, Scott Gray
2015
920 citations
2 references

Deep convolutional neural networks take GPU days of compute time to train on large data sets. Pedestrian detection for self-driving cars requires very low latency. Image recognition for mobile phones is constrained by limited processing resources. Th...
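
The paper's speedups come from Winograd minimal filtering. As a hedged illustration, the numpy sketch below checks the smallest 1-D case F(2,3), computed as Y = Aᵀ[(Gg) ⊙ (Bᵀd)] with the standard F(2,3) transform matrices, against direct (unflipped, CNN-style) convolution.

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap filter from a 4-sample tile,
# using 4 multiplies instead of 6. Y = A^T [(G g) * (B^T d)].
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([0.5, -1.0, 2.0])       # filter

y = AT @ ((G @ g) * (BT @ d))
# Direct (CNN-style, unflipped) convolution for comparison:
ref = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(y, ref)
```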

Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization

Shai Shalev-Shwartz, Tong Zhang
2013
467 citations
1 reference

We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. We analyze the runtime of the framework and obtain rates that improve state-of-the-art res...

Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks

A. Giusti, D. Ciresan, Jonathan Masci, L. Gambardella, J. Schmidhuber
2013
358 citations
2 references

Deep Neural Networks now excel at image classification, detection and segmentation. When used to scan images by means of a sliding window, however, their high computational complexity can bring even the most powerful hardware to its knees. We show ho...
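
The enabling observation: a convolutional layer applied once to the full image reproduces its output on every sliding window, so per-window recomputation is redundant (the paper's fragment trick extends this through strided max-pooling). A minimal numpy check of the stride-1 case:

```python
import numpy as np

def conv2d_valid(img, k):
    """Naive 'valid' 2-D cross-correlation."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
k = rng.standard_normal((3, 3))

full = conv2d_valid(img, k)                 # one pass over the whole image
# Sliding-window evaluation of an 8x8 patch at offset (2, 3):
patch = conv2d_valid(img[2:10, 3:11], k)
assert np.allclose(patch, full[2:8, 3:9])   # same values, computed once
```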

Up or Down? Adaptive Rounding for Post-Training Quantization

Markus Nagel, Rana Ali Amjad, Mart van Baalen, Christos Louizos, Tijmen Blankevoort
2020
274 citations
6 references

When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weigh...
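
A hedged sketch of the underlying problem, not AdaRound itself (the paper relaxes the binary choice with a rectified sigmoid and optimizes a task-aware objective): choose up vs. down per weight to minimize layer output error on calibration data; round-to-nearest is the baseline it improves on.

```python
import numpy as np

def greedy_round(W, X, scale):
    """Choose floor vs. ceil per weight to minimize ||W X - s*Wq X||^2 on
    calibration activations X. A greedy stand-in for AdaRound's learned
    rounding; round-to-nearest is the starting point and baseline."""
    Wf = np.floor(W / scale)
    Wq = np.round(W / scale)
    target = W @ X
    for idx in np.ndindex(W.shape):
        best = None
        for choice in (Wf[idx], Wf[idx] + 1):   # round down vs. round up
            old, Wq[idx] = Wq[idx], choice
            err = np.sum((target - scale * (Wq @ X)) ** 2)
            if best is None or err < best[0]:
                best = (err, choice)
            Wq[idx] = old
        Wq[idx] = best[1]
    return scale * Wq

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8))
X = rng.standard_normal((8, 32))   # calibration activations
Wq = greedy_round(W, X, scale=0.1)
```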

MoViNets: Mobile Video Networks for Efficient Video Recognition

D. Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew A. Brown, Boqing Gong
2021
269 citations
1 reference

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference. 3D convolutional neural networks (CNNs) are accurate at video recognition but require l...
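
One ingredient, sketched under simplifying assumptions: stream buffers, i.e. caching the last few frames of activations so that clip-by-clip causal temporal convolution matches full-sequence processing. A 1-D toy version:

```python
import numpy as np

def causal_conv_stream(chunks, kernel):
    """Causal temporal conv over a stream of chunks, carrying the last
    k-1 frames in a buffer so chunked output equals full-sequence output."""
    k = len(kernel)
    buf = np.zeros(k - 1)                # stream buffer of boundary frames
    outs = []
    for chunk in chunks:
        x = np.concatenate([buf, chunk])
        outs.append(np.correlate(x, kernel, mode="valid"))
        buf = x[-(k - 1):]               # carry state to the next chunk
    return np.concatenate(outs)

rng = np.random.default_rng(2)
seq = rng.standard_normal(32)
kernel = np.array([0.25, 0.5, 1.0])
chunked = causal_conv_stream(np.split(seq, 4), kernel)
full = np.correlate(np.concatenate([np.zeros(2), seq]), kernel, mode="valid")
assert np.allclose(chunked, full)        # streaming matches one-shot
```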

A guide to convolution arithmetic for deep learning

Vincent Dumoulin, Francesco Visin
2016
144 citations
5 references

We introduce a guide to help deep learning practitioners understand and manipulate convolutional neural network architectures. The guide clarifies the relationship between various properties (input shape, kernel shape, zero padding, strides and outpu...
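
The guide's central relationship is the output-size formula o = ⌊(i + 2p − k)/s⌋ + 1 for input size i, kernel size k, padding p, and stride s. In code:

```python
def conv_output_size(i, k, p=0, s=1):
    """Output length of a convolution: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

assert conv_output_size(i=224, k=7, p=3, s=2) == 112   # e.g. a ResNet stem
assert conv_output_size(i=5, k=3, p=1, s=1) == 5       # 'same' padding
```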

Stochastic Dual Coordinate Ascent with Adaptive Probabilities

Dominik Csiba, Zheng Qu, Peter Richtárik
2015
55 citations
2 references

This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems. Our modification consists in allowing the method to adaptively change the probability distri...
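
Schematically (the residual quantity below is a stand-in, not the paper's exact adaptive rule): coordinates are sampled with probability proportional to how much dual progress they still offer, rather than uniformly.

```python
import numpy as np

def adaptive_coordinate(residuals, rng):
    """Sample a dual coordinate with probability proportional to its
    current dual residual magnitude instead of uniformly; coordinates
    that are already near-optimal (residual ~ 0) are rarely drawn."""
    p = np.abs(residuals)
    p = p / p.sum() if p.sum() > 0 else np.full(len(p), 1 / len(p))
    return rng.choice(len(p), p=p)

rng = np.random.default_rng(3)
residuals = np.array([0.0, 0.1, 2.0, 0.05])
i = adaptive_coordinate(residuals, rng)   # coordinate 2 is most likely
```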

Empirical Evaluation of Rectified Activations in Convolutional Network

Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li
2015
35 citations
2 references

In this paper we investigate the performance of different types of rectified activation functions in convolutional neural network: standard rectified linear unit (ReLU), leaky rectified linear unit (Leaky ReLU), parametric rectified linear unit (PReL...
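
For reference, the four activations under comparison in minimal numpy form (the RReLU here draws the negative-side slope uniformly; the original formulation draws its reciprocal):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    return np.where(np.asarray(x) >= 0, x, a * np.asarray(x))

def prelu(x, a):
    # Same form as Leaky ReLU, but 'a' is a learned parameter.
    return np.where(np.asarray(x) >= 0, x, a * np.asarray(x))

def rrelu(x, lo=1/8, hi=1/3, rng=None):
    """Randomized leaky ReLU: slope drawn randomly at train time
    (pass an rng), fixed to the mean (lo+hi)/2 at test time."""
    x = np.asarray(x)
    a = rng.uniform(lo, hi, size=x.shape) if rng else (lo + hi) / 2
    return np.where(x >= 0, x, a * x)
```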

An Experimental Study of Dynamic Dominators

Loukas Georgiadis, Giuseppe F. Italiano, Luigi Laura, Federico Santaroni
2016
20 citations
4 references

Motivated by recent applications of dominator computations, we consider the problem of dynamically maintaining the dominators of flow graphs through a sequence of insertions and deletions of edges. Our main theoretical contribution is a simple increm...
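
Background rather than the paper's contribution: the classic iterative data-flow computation of static dominators, the quantity that the incremental algorithms maintain under edge insertions and deletions.

```python
def dominators(graph, root):
    """Iterative fixpoint: dom(v) = {v} | intersection of dom(p) over preds.
    'graph' maps node -> list of successors. A simple baseline, not the
    near-linear-time algorithms the paper benchmarks."""
    nodes = set(graph) | {s for vs in graph.values() for s in vs}
    preds = {v: set() for v in nodes}
    for u, vs in graph.items():
        for v in vs:
            preds[v].add(u)
    dom = {v: set(nodes) for v in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in nodes - {root}:
            new = {v} | (set.intersection(*(dom[p] for p in preds[v]))
                         if preds[v] else set())
            if new != dom[v]:
                dom[v], changed = new, True
    return dom

g = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
assert dominators(g, "r")["c"] == {"r", "c"}   # r dominates c; a, b do not
```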

Adding vs. Averaging in Distributed Primal-Dual Optimization

Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč
2015
16 citations
3 references

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper...
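
The title's dichotomy in one line, as a hedged sketch: with K machines producing local updates Δw_k, averaging (γ = 1/K) is safe but conservative, while adding (γ = 1) is aggressive and can diverge; the paper's framework characterizes when the more aggressive combination is sound.

```python
import numpy as np

def aggregate(w, local_updates, gamma):
    """Combine per-machine updates: gamma = 1/K averages (conservative),
    gamma = 1 adds (aggressive). The paper analyzes when adding is safe."""
    return w + gamma * np.sum(local_updates, axis=0)

K = 4
w = np.zeros(3)
deltas = [np.ones(3)] * K
w_avg = aggregate(w, deltas, gamma=1 / K)   # -> [1, 1, 1]
w_add = aggregate(w, deltas, gamma=1.0)     # -> [4, 4, 4]
```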

New cardinality estimation algorithms for HyperLogLog sketches

Otmar Ertl
2017
10 citations
5 references

This paper presents new methods to estimate the cardinalities of data sets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large cardinalities. Based ...
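
Ertl's improved estimators are not reproduced here; for orientation, a minimal sketch of the standard HyperLogLog sketch and the raw estimate that the paper refines:

```python
import hashlib

class HyperLogLog:
    """Minimal standard HLL (raw estimator only; the paper replaces the
    bias-prone small/large-range corrections with better estimators)."""
    def __init__(self, p=12):
        self.p, self.m = p, 1 << p
        self.reg = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # top p bits -> register
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.reg[idx] = max(self.reg[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias constant, m >= 128
        z = 1.0 / sum(2.0 ** -r for r in self.reg)
        return alpha * self.m * self.m * z

hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
print(round(hll.estimate()))   # ~100000, typically within a few percent
```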

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Shai Shalev-Shwartz, Tong Zhang
2012
9 citations
3 references

Stochastic Gradient Descent (SGD) has become popular for solving large-scale supervised machine learning optimization problems such as SVMs, due to its strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has ...
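
For the squared loss, the per-coordinate dual maximization is closed-form, so a complete SDCA for ridge regression fits in a few lines (a sketch; the update follows from the conjugate of the squared loss):

```python
import numpy as np

def sdca_ridge(X, y, lam, epochs=50, seed=0):
    """SDCA for min_w (1/n) sum_i 0.5*(x_i^T w - y_i)^2 + (lam/2)||w||^2.
    Maintains w = (1/(lam*n)) * sum_i alpha_i * x_i; each step maximizes
    the dual in one coordinate, closed-form for the squared loss."""
    n, d = X.shape
    alpha, w = np.zeros(n), np.zeros(d)
    rng = np.random.default_rng(seed)
    for _ in range(epochs * n):
        i = rng.integers(n)
        delta = (y[i] - X[i] @ w - alpha[i]) / (1 + X[i] @ X[i] / (lam * n))
        alpha[i] += delta
        w += delta * X[i] / (lam * n)
    return w

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 5))
y = X @ np.array([1., -2., 0., 3., 0.5]) + 0.1 * rng.standard_normal(200)
w = sdca_ridge(X, y, lam=0.1)
# Matches the closed-form ridge solution (X^T X / n + lam*I)^{-1} X^T y / n:
w_exact = np.linalg.solve(X.T @ X / 200 + 0.1 * np.eye(5), X.T @ y / 200)
assert np.allclose(w, w_exact, atol=1e-2)
```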

Optimizing Function Layout for Mobile Applications

Ellis Hoag, Kyungwoo Lee, Julián Mestre, Sergey Pupyrev
2022
9 citations
2 references

Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance of large-sc...
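
Not the paper's algorithm, but the classic baseline it is measured against: a simplified Pettis-Hansen-style greedy merge that places the hottest caller-callee pairs adjacent in the final order.

```python
def greedy_layout(call_edges):
    """Pettis-Hansen-style sketch: process call-graph edges (u, v, weight)
    by descending weight, merging the chains containing caller and callee
    so hot pairs end up adjacent in the binary."""
    chain_of, chains = {}, {}
    def chain(f):
        if f not in chain_of:
            chain_of[f] = f
            chains[f] = [f]
        return chain_of[f]
    for u, v, _w in sorted(call_edges, key=lambda e: -e[2]):
        cu, cv = chain(u), chain(v)
        if cu == cv:
            continue
        chains[cu].extend(chains[cv])      # concatenate the two chains
        for f in chains[cv]:
            chain_of[f] = cu
        del chains[cv]
    return [f for c in chains.values() for f in c]

edges = [("main", "parse", 90), ("main", "render", 40), ("parse", "lex", 80)]
print(greedy_layout(edges))   # ['main', 'parse', 'lex', 'render']
```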

Newton’s Method Without Division

Jeffrey D. Blanchard, Marc Chamberland
2023
4 citations
1 reference

Newton’s Method for root-finding is modified to avoid the division step while maintaining quadratic convergence.
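
The paper's general construction is not reproduced here, but its flavor is captured by the classic division-free Newton iteration for a reciprocal: for f(x) = 1/x − a the Newton step simplifies to x ← x(2 − ax), which uses only multiplication and subtraction yet converges quadratically.

```python
def reciprocal(a, x0, steps=6):
    """Division-free Newton for 1/a: the step for f(x) = 1/x - a reduces
    to x <- x*(2 - a*x). The error 1 - a*x squares each iteration, so
    convergence is quadratic whenever 0 < x0 < 2/a."""
    x = x0
    for _ in range(steps):
        x = x * (2 - a * x)
    return x

print(reciprocal(7.0, x0=0.1))   # -> 0.14285714285714285 ~ 1/7
```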

Efficient Learning using Forward-Backward Splitting.

John C. Duchi, Yoram Singer
2009
1 citation
2 references

We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases: on each iteration we first perform an unconstrained gradient descent step, and then cast and solve an instance of a proximal problem that incorporates the regularization term...
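
With an ℓ1 regularizer the backward (proximal) step is coordinate-wise soft-thresholding, so the whole method fits in a few lines; a minimal sketch for lasso-style least squares (step size and data are illustrative):

```python
import numpy as np

def fobos_l1(X, y, lam, eta=0.01, steps=2000):
    """Forward-backward splitting for (1/2n)||Xw - y||^2 + lam*||w||_1.
    Forward: a plain gradient step on the loss. Backward: the l1 proximal
    step, which is coordinate-wise soft-thresholding."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n                   # forward step
        z = w - eta * grad
        w = np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)  # backward
    return w

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.01 * rng.standard_normal(100)
w = fobos_l1(X, y, lam=0.1)    # recovers a sparse w close to w_true
```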

Fast Transformer Decoding: One Write-Head is All You Need

Noam Shazeer
2019
7 references

Multi-head attention layers, as used in the Transformer neural sequence model, are a powerful alternative to RNNs for moving information across and between sequences. While training these layers is generally fast and simple, due to parallelizability ...
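
The proposal, multi-query attention, keeps h query heads but shares a single key/value head across all of them, shrinking the memory-bound K/V cache of incremental decoding by a factor of h. A numpy sketch of the core computation:

```python
import numpy as np

def multi_query_attention(Q, K, V):
    """Q: (h, n, d) per-head queries; K, V: (m, d) one shared head.
    Standard multi-head attention would carry (h, m, d) keys and values;
    sharing one K/V head cuts the decoder cache by a factor of h."""
    scores = np.einsum("hnd,md->hnm", Q, K) / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return np.einsum("hnm,md->hnd", weights, V)

h, n, m, d = 8, 4, 16, 32
rng = np.random.default_rng(6)
out = multi_query_attention(rng.standard_normal((h, n, d)),
                            rng.standard_normal((m, d)),
                            rng.standard_normal((m, d)))
assert out.shape == (h, n, d)
```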

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar
2022
5 references

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with a similar level of recall. The design of the proposed algorithm is motivate...
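
The paper's roofline analysis and approximate scoring are not reproduced here; the compute-bound kernel it builds on is scoring the whole database against the queries as one dense matmul and extracting the top-k, sketched below.

```python
import numpy as np

def knn_matmul(queries, database, k):
    """Brute-force MIPS-style k-NN: one dense matmul scores every
    query/database pair (the FLOP-dominated kernel an accelerator can
    run near peak), then a partial sort extracts the top-k per query."""
    scores = queries @ database.T                     # (q, n) similarities
    idx = np.argpartition(-scores, k, axis=1)[:, :k]  # unordered top-k
    order = np.argsort(np.take_along_axis(-scores, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)     # sorted top-k ids

rng = np.random.default_rng(7)
Q = rng.standard_normal((4, 64))
DB = rng.standard_normal((10_000, 64))
neighbors = knn_matmul(Q, DB, k=10)
```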

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices

Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
2020
1 reference

Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to ...