Papers
Browse academic papers referenced in production code
Computationally easy, spectrally good multipliers for congruential pseudorandom number generators
Congruential pseudorandom number generators rely on good multipliers, that is, integers that have good performance with respect to the spectral test. We provide lists of multipliers with a good lattice structure up to dimension eight and up to lag ei...
Cortex: A Compiler for Recursive Deep Learning Models
Optimizing deep learning models is generally performed in two steps: (i) high-level graph optimizations such as kernel fusion and (ii) low level kernel optimizations such as those found in vendor libraries. This approach often leaves significant perf...
Elle: Inferring Isolation Anomalies from Experimental Observations
Users who care about their data store it in databases, which (at least in principle) guarantee some form of transactional isolation. However, experience shows [Kleppmann 2019, Kingsbury and Patella 2019a] that many databases do not provide the isolat...
NeuroCard: One Cardinality Estimator for All Tables
Query optimizers rely on accurate cardinality estimates to produce good execution plans. Despite decades of research, existing cardinality estimators are inaccurate for complex queries, due to making lossy modeling assumptions and not capturing inter...
PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension
Similarity search has been widely used in various fields, particularly in the Alibaba ecosystem. The open-source solutions to a similarity search of vectors can only support a query with a single vector, whereas real-life scenarios generally require ...
QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep -- Real or Complex -- Matrices and Their Software Implementation
This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m, n}$, that are either square (m = n), wide (m < n), or deep (m > n), with rank $k = min(m, n)$. Furthermore, we derive novel matrix backpropagation res...
RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem
Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years. In this paper, we re-examine the challenges posed by ...
The Emergence of Adversarial Communication in Multi-Agent Reinforcement Learning
Many real-world problems require the coordination of multiple autonomous agents. Recent work has shown the promise of Graph Neural Networks (GNNs) to learn explicit communication strategies that enable complex multi-agent coordination. These works us...
The LDBC Social Network Benchmark
The Linked Data Benchmark Council's Social Network Benchmark (LDBC SNB) is an effort intended to test various functionalities of systems used for graph-like data management. For this, LDBC SNB uses the recognizable scenario of operating a social netw...
User-level Threading
An important class of computer software, such as network servers, exhibits concurrency through many loosely coupled and potentially long-running communication sessions. For these applications, a long-standing open question is whether thread-per-sessi...
Accelerating Large-Scale Inference with Anisotropic Vector Quantization
Quantization based techniques are the current state-of-the-art for scaling maximum inner product search to massive databases. Traditional approaches to quantization aim to minimize the reconstruction error of the database points. Based on the observa...
A Deep Probabilistic Model for Customer Lifetime Value Prediction
Accurate predictions of customers' future lifetime value (LTV) given their attributes and past purchase behavior enables a more customer-centric marketing strategy. Marketers can segment customers into various buckets based on the predicted LTV and, ...
A Time and Space-Efficient Compositional Method for Prime and Test Paths Generation.
This paper investigates the problem of prime and test paths generation, which is an important problem in ensuring path coverage in software testing. Most existing methods for prime/test paths generation have little success in generating the set of al...
Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets
Accurate and comprehensive extraction of information from high-dimensional single cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires mu...
Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond
We propose a static loop vectorization optimization on top of high level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow, and used for applications ranging from aut...
Computing Extremely Accurate Quantiles Using t-Digests
We present on-line algorithms for computing approximations of rank-based statistics that give high accuracy, particularly near the tails of a distribution, with very small sketches. Notably, the method allows a quantile $q$ to be computed with an acc...
Fast Sparse ConvNets
Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolu...
IndyLSTMs: Independently Recurrent LSTMs
We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs. These differ from regular LSTM cells in that the recurrent weights are not modeled as a full matrix, but as a diagonal matrix, i.e.\ the output and state of each LSTM cell ...
Machine Learning Systems are Stuck in a Rut
In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it hard...