Papers
Browse academic papers referenced in production code
Automatic model construction with Gaussian processes
This thesis develops a method for automatically constructing, visualizing and describing a large class of models, useful for forecasting and finding structure in domains such as time series, geological formations, and physical dynamics. These models,...
Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture that has been designed to address the vanishing and exploding gradient problems of conventional RNNs. Unlike feedforward neural networks, RNNs have cyclic connections maki...
Ad click prediction: a view from the trenches.
Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a d...
Improving neural networks by preventing co-adaptation of feature detectors
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This preve...
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning....
Making and Evaluating Point Forecasts
Typically, point forecasting methods are compared and assessed by means of an error measure or scoring function, with the absolute error and the squared error being key examples. The individual scores are averaged over forecast cases, to result in a ...
LIBLINEAR: A Library for Large Linear Classification
AbstractWe present a generalization of the classical mathematical homogenization theory aimed at accounting for finite unit cell distortions, which gives rise to a nonperiodic asymptotic expansion. We introduce an auxiliary macro‐deformed configurati...
A simple method for generating gamma variables
We offer a procedure for generating a gamma variate as the cube of a suitably scaled normal variate. It is fast and simple, assuming one has a fast way to generate normal variables. In brief: generate a normal variate x and a uniform variate U until ...
Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods
BBMRI-NL, X-omics, VOILA, Medical Delta and the Dutch Research Council (NWO-VENI).
Commonly used software tools produce conflicting and overly-optimistic AUPRC values
The precision-recall curve (PRC) and the area under the precision-recall curve (AUPRC) are useful for quantifying classification performance. They are commonly used in situations with imbalanced classes, such as cancer diagnosis and cell type annotat...
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Scaling Transformers to longer sequence lengths has been a major problem in the last several years, promising to improve performance in language modeling and high-resolution image understanding, as well as to unlock new applications in code, audio, a...
Language models enable zero-shot prediction of the effects of mutations on protein function
Modeling the effect of sequence variation on function is a fundamental problem for understanding and designing proteins. Since evolution encodes information about function into patterns in protein sequences, unsupervised models of variant effects can...
"We do not appreciate being experimented on": Developer and Researcher Views on the Ethics of Experiments on Open-Source Projects
A tenet of open source software development is to accept contributions from users-developers (typically after appropriate vetting). But should this also include interventions done as part of research on open source development? Following an incident ...
Conservative Q-Learning for Offline Reinforcement Learning
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected, static dataset...
Neural Networks Fail to Learn Periodic Functions and How to Fix It
Previous literature offers limited clues on how to learn a periodic function using modern neural networks. We start with a study of the extrapolation properties of neural networks; we prove and demonstrate experimentally that the standard activations...
A Short Note about Kinetics-600
We describe an extension of the DeepMind Kinetics human action dataset from 400 classes, each with at least 400 video clips, to 600 classes, each with at least 600 video clips. In order to scale up the dataset we changed the data collection process s...
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations fro...
Glow: Graph Lowering Compiler Techniques for Neural Networks
This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural ne...