Papers
Browse academic papers referenced in production code
Bring Your Own Codegen to Deep Learning Compiler
Deep neural networks (DNNs) have been ubiquitously applied in many applications, and accelerators are emerged as an enabler to support the fast and efficient inference tasks of these applications. However, to achieve high model coverage with high per...
Design and Analysis of a Logless Dynamic Reconfiguration Protocol
Distributed replication systems based on the replicated state machine model have become ubiquitous as the foundation of modern database systems. To ensure availability in the presence of faults, these systems must be able to dynamically replace faile...
Fast and Robust Vectorized In-Place Sorting of Primitive Types.
Modern CPUs provide single instruction-multiple data (SIMD) instructions. SIMD instructions process several elements of a primitive data type simultaneously in fixed-size vectors. Classical sorting algorithms are not directly expressible in SIMD inst...
GoBench: A Benchmark Suite of Real-World Go Concurrency Bugs
Go, a fast growing programming language, is often considered as “the programming language of the cloud”. The language provides a rich set of synchronization primitives, making it easy to write concurrent programs with great parallelism. However. the ...
Linear-time Temporal Logic guided Greybox Fuzzing
Software model checking as well as runtime verification are verification techniques which are widely used for checking temporal properties of software systems. Even though they are property verification techniques, their common usage in practice is i...
LXM: better splittable pseudorandom number generators (and almost as fast)
In 2014, Steele, Lea, and Flood presented SplitMix, an object-oriented pseudorandom number generator (prng) that is quite fast (9 64-bit arithmetic/logical operations per 64 bits generated) and also splittable . A conventional prng object provides a ...
Offline Reinforcement Learning with Implicit Q-Learning
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid err...
One WITH RECURSIVE is Worth Many GOTOs
PL/SQL integrates an imperative statement-by-statement style of programming with the plan-based evaluation of SQL queries. The disparity of both leads to friction at runtime, slowing PL/SQL execution down significantly. This work describes a compiler...
Optimizing Winograd-Based Convolution with Tensor Cores.
Convolution computing is one of the primary time consuming part of convolutional neural networks (CNNs). State of the art convolutional neural networks use samll, 3 × 3 filters. Recent work on Winograd convolution can reduce the computational complex...
Revisiting ResNets: Improved Training and Scaling Strategies
Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 201...
State Entropy Maximization with Random Encoders for Efficient Exploration
Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL). However, efficient exploration in high-dimensional observation spaces still remains a challenge. This paper presents Random Enc...
Statistical Foundations of Actuarial Learning and its Applications
The aim of this manuscript is to provide the mathematical and statistical foundations of actuarial learning. This is key to most actuarial tasks like insurance pricing, product development, claims reserving and risk management. The basic approach to ...
SyRust: automatic testing of Rust libraries with semantic-aware program synthesis
Rust’s type system ensures the safety of Rust programs; however, programmers can side-step some of the strict typing rules by using the unsafe keyword. A common use of unsafe Rust is by libraries. Bugs in these libraries undermine the safety of the e...
TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deplo...
The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
There is often variation in the shape and size of input data used for deep learning. In many cases, such data can be represented using tensors with non-uniform shapes, or ragged tensors. Due to limited and non-portable support for efficient execution...
UNIT: Unifying Tensorized Instruction Compilation
Because of the increasing demand for computation in DNN, researchers develope both hardware and software mechanisms to reduce the compute and memory burden. A widely adopted approach is to use mixed precision data types. However, it is hard to levera...
Using Selective Memoization to Defeat Regular Expression Denial of Service (ReDoS)
Regular expressions (regexes) are a denial of service vector in most mainstream programming languages. Recent empirical work has demonstrated that up to 10% of regexes have super-linear worst-case behavior in typical regex engines. It is therefore no...
YJIT: a basic block versioning JIT compiler for CRuby
Ruby is a dynamically typed programming language with a large breadth of features which has grown in popularity with the rise of the modern web, and remains at the core of the implementation of many widely-used websites. CRuby, the default implementa...
ZeRO-Offload: Democratizing Billion-Scale Model Training
Large-scale model training has been a playing ground for a limited few requiring complex model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload changes the large model training landscape by making large model training acce...
Ansor: Generating High-Performance Tensor Programs for Deep Learning
High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging. Currently, deep lea...