Papers
Browse academic papers referenced in production code
On-Device Neural Net Inference with Mobile GPUs
On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, th...
Oxide: The Essence of Rust
Rust claims to advance industrial programming by bridging the gap between low-level systems programming and high-level application programming. At the heart of the argument that this enables programmers to build more reliable and efficient software i...
ScissorGC: scalable and efficient compaction for Java full garbage collection
Java runtime frees applications from manual memory management through automatic garbage collection (GC). This, however, is usually at the cost of stop-the-world pauses. State-of-the-art collectors leverage multiple generations, which will inevitably ...
SLEEF: A Portable Vectorized Library of C Standard Mathematical Functions
In this paper, we present techniques used to implement our portable vectorized library of C standard mathematical functions written entirely in C language. In order to make the library portable while maintaining good performance, intrinsic functio...
Soft Actor-Critic for Discrete Action Settings
Soft Actor-Critic is a state-of-the-art reinforcement learning algorithm for continuous action settings that is not applicable to discrete action settings. Many important settings involve discrete actions, however, and so here we derive an alternativ...
A Newton-CG Algorithm with Complexity Guarantees for Smooth Unconstrained Optimization
We consider minimization of a smooth nonconvex objective function using an iterative algorithm based on Newton's method and the linear conjugate gradient algorithm, with explicit detection and use of negative curvature directions for the Hessian of t...
BOHB: Robust and Efficient Hyperparameter Optimization at Scale
Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other hand, bandit-b...
Conditional Noise-Contrastive Estimation of Unnormalised Models
Many parametric statistical models are not properly normalised and only specified up to an intractable partition function, which renders parameter estimation difficult. Examples of unnormalised models are Gibbs distributions, Markov random fields, an...
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely u...
Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator
Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural network by backp...
Diesel: DSL for linear algebra and neural net computations on GPUs.
We present a domain specific language compiler, Diesel, for basic linear algebra and neural network computations, that accepts input expressions in an intuitive form and generates high performing code for GPUs. The current trend is to represent a neu...
Dynamic Control Flow in Large-Scale Machine Learning
Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditiona...
Error Analysis and Improving the Accuracy of Winograd Convolution for Deep Neural Networks
Popular deep neural networks (DNNs) spend the majority of their execution time computing convolutions. The Winograd family of algorithms can greatly reduce the number of arithmetic operations required and is present in many DNN software frameworks. H...
Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a...
Implementing Neural Turing Machines
Neural Turing Machines (NTMs) are an instance of Memory Augmented Neural Networks, a new class of recurrent neural networks which decouple computation from memory by introducing an external memory unit. NTMs have demonstrated superior performance ove...
Learning to Optimize Tensor Programs
We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective deep learnin...
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optim...
NetSpectre: Read Arbitrary Memory over Network
In this paper, we present NetSpectre, a generic remote Spectre variant 1 attack. For this purpose, we demonstrate the first access-driven remote Evict+Reload cache attack over network, leaking 15 bits per hour. Beyond retrofitting existing attacks to...
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model. The proposed PersonLab model tackles both semantic-level reasoning and object-p...