Showing 20 of 613 papers

Automatic model construction with Gaussian processes

David Duvenaud
2014
4 references

This thesis develops a method for automatically constructing, visualizing and describing a large class of models, useful for forecasting and finding structure in domains such as time series, geological formations, and physical dynamics. These models,...

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

Haşim Sak, Andrew Senior, Françoise Beaufays
2014
4 references

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture that has been designed to address the vanishing and exploding gradient problems of conventional RNNs. Unlike feedforward neural networks, RNNs have cyclic connections maki...
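The memory-cell mechanism the abstract refers to can be sketched as one step of a hypothetical single-unit LSTM cell in plain Python (scalar input for clarity; this is an illustration of the cell equations, not the paper's large-vocabulary architecture):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a single-unit LSTM cell.

    W maps gate name -> (w_x, w_h, b). The cell state c is updated
    additively (f * c_prev + i * g), which is what lets gradients
    flow across many time steps without vanishing or exploding.
    """
    gate = lambda name: W[name][0] * x + W[name][1] * h_prev + W[name][2]
    i = sigmoid(gate("input"))    # how much new information to write
    f = sigmoid(gate("forget"))   # how much old cell state to keep
    o = sigmoid(gate("output"))   # how much of the cell to expose
    g = math.tanh(gate("cell"))   # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

With a large forget-gate bias the cell state is carried through a step almost unchanged, which is the behavior conventional RNNs lack.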

Ad click prediction: a view from the trenches.

H. Brendan McMahan, Gary D. Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, T...
2013
4 references

Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a d...

Improving neural networks by preventing co-adaptation of feature detectors

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov
2012
4 references

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This preve...
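The random omission the abstract describes (later known as dropout) fits in a few lines of plain Python; the 1/(1-p) rescaling here is the now-standard "inverted" variant, an assumption for this sketch rather than the paper's exact formulation:

```python
import random

def dropout(activations, p=0.5, rng=None):
    """Zero each activation with probability p at training time.

    Survivors are scaled by 1/(1-p) so the expected activation matches
    test time, when all units are kept.
    """
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(dropout([0.5, -1.2, 0.8, 2.0], p=0.5, rng=random.Random(0)))
# → [0.0, 0.0, 1.6, 4.0]
```

Because each training case sees a different random subnetwork, a feature detector cannot rely on specific other detectors being present, which is the co-adaptation the paper aims to prevent.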

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John Duchi, Elad Hazan, Yoram Singer
2011
4 references

We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning....
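The per-coordinate adaptation the abstract describes can be sketched as the diagonal AdaGrad update (the learning rate and epsilon here are illustrative defaults, not values from the paper):

```python
import math

def adagrad_step(w, g, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update. Each coordinate's effective step size
    shrinks with the squared gradients accumulated for that
    coordinate, so frequently-updated features take smaller steps."""
    accum = [a + gi * gi for a, gi in zip(accum, g)]
    w = [wi - lr * gi / (math.sqrt(ai) + eps)
         for wi, gi, ai in zip(w, g, accum)]
    return w, accum
```

Applying the same gradient twice shows the adaptation: the first step moves each coordinate by roughly lr, the second by lr/√2.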

Making and Evaluating Point Forecasts

T. Gneiting
2009
4 references

Typically, point forecasting methods are compared and assessed by means of an error measure or scoring function, with the absolute error and the squared error being key examples. The individual scores are averaged over forecast cases, to result in a ...
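The evaluation scheme the abstract describes (average a scoring function over forecast cases) is a one-liner; the toy data below illustrates the paper's central concern, that the ranking of forecasts depends on the scoring function chosen:

```python
def mean_score(forecasts, observations, score):
    """Average a scoring function over forecast cases."""
    pairs = list(zip(forecasts, observations))
    return sum(score(f, y) for f, y in pairs) / len(pairs)

def squared_error(f, y):
    return (f - y) ** 2

def absolute_error(f, y):
    return abs(f - y)

# On skewed data, the mean is the best constant forecast under squared
# error, while the median wins under absolute error.
obs = [0.0, 0.0, 10.0]          # mean = 10/3, median = 0
```

A forecaster issuing the mean and one issuing the median are each "optimal" for a different scoring function, so the scoring function must match the functional being forecast.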

LIBLINEAR: A Library for Large Linear Classification

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin
2008
4 references

LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library calls for users and developers. Comprehensive documents are availa...

A cool and practical alternative to traditional hash tables

Ú. Erlingsson, M. Manasse, Frank McSherry
2006
4 references

A simple method for generating gamma variables

George Marsaglia, Wai Wan Tsang
2000
4 references

We offer a procedure for generating a gamma variate as the cube of a suitably scaled normal variate. It is fast and simple, assuming one has a fast way to generate normal variables. In brief: generate a normal variate x and a uniform variate U until ...
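The abstract's procedure is the Marsaglia–Tsang method; a sketch for shape >= 1, with the constants d, c and the squeeze/acceptance tests following the published algorithm:

```python
import math
import random

def gamma_variate(shape, rng=random):
    """Marsaglia-Tsang generator for Gamma(shape, 1), shape >= 1.

    A gamma variate is produced as d * v, where v is the cube of a
    suitably scaled normal variate, accepted via a fast squeeze test
    or a slower logarithmic test.
    """
    d = shape - 1.0 / 3.0
    c = 1.0 / math.sqrt(9.0 * d)
    while True:
        x = rng.gauss(0.0, 1.0)
        v = (1.0 + c * x) ** 3
        if v <= 0.0:
            continue
        u = rng.random()
        if u < 1.0 - 0.0331 * x ** 4:          # fast squeeze: almost always hit
            return d * v
        if math.log(u) < 0.5 * x * x + d * (1.0 - v + math.log(v)):
            return d * v
```

For shape < 1, the usual trick (not shown) is to draw from Gamma(shape + 1) and multiply by U^(1/shape).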

Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods

J. Platt
1999
4 references


Commonly used software tools produce conflicting and overly-optimistic AUPRC values

Wenyu Chen, Chen Miao, Zhenghao Zhang, Cathy Sin-Hang Fung, Ran Wang, Yizhen Chen, Yan Qian, Lixin C...
2024
3 references

The precision-recall curve (PRC) and the area under the precision-recall curve (AUPRC) are useful for quantifying classification performance. They are commonly used in situations with imbalanced classes, such as cancer diagnosis and cell type annotat...
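The disagreement the abstract reports comes from estimator choice; the sketch below contrasts two common AUPRC estimators (an average-precision-style step sum versus linear interpolation with an assumed starting point of precision 1 at recall 0) and shows they give different values on the same scores. The example data and starting-point convention are illustrative assumptions, not the paper's benchmark:

```python
def pr_points(scores, labels):
    """(recall, precision) at each threshold, scores sorted descending."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos_total = sum(labels)
    tp = fp = 0
    pts = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        pts.append((tp / pos_total, tp / (tp + fp)))
    return pts

def auprc_step(scores, labels):
    """Average-precision estimator: precision times recall increments."""
    area, prev_r = 0.0, 0.0
    for r, p in pr_points(scores, labels):
        area += p * (r - prev_r)
        prev_r = r
    return area

def auprc_trapezoid(scores, labels):
    """Linear (trapezoidal) interpolation between PR points."""
    area, prev_r, prev_p = 0.0, 0.0, 1.0
    for r, p in pr_points(scores, labels):
        area += (r - prev_r) * (p + prev_p) / 2.0
        prev_r, prev_p = r, p
    return area

scores, labels = [0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]
print(auprc_step(scores, labels), auprc_trapezoid(scores, labels))
```

Even on four points the two numbers differ (5/6 versus 19/24 here), so reported AUPRC values are only comparable when the estimator is the same.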

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Tri Dao
2023
3 references

Scaling Transformers to longer sequence lengths has been a major problem in the last several years, promising to improve performance in language modeling and high-resolution image understanding, as well as to unlock new applications in code, audio, a...

Language models enable zero-shot prediction of the effects of mutations on protein function

Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, Alexander Rives
2021
3 references

Modeling the effect of sequence variation on function is a fundamental problem for understanding and designing proteins. Since evolution encodes information about function into patterns in protein sequences, unsupervised models of variant effects can...

"We do not appreciate being experimented on": Developer and Researcher Views on the Ethics of Experiments on Open-Source Projects

D. Feitelson
2021
3 references

A tenet of open source software development is to accept contributions from users-developers (typically after appropriate vetting). But should this also include interventions done as part of research on open source development? Following an incident ...

Conservative Q-Learning for Offline Reinforcement Learning

Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
2020
3 references

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected, static dataset...

Neural Networks Fail to Learn Periodic Functions and How to Fix It

Liu Ziyin, T. Hartwig, Masahito Ueda
2020
3 references

Previous literature offers limited clues on how to learn a periodic function using modern neural networks. We start with a study of the extrapolation properties of neural networks; we prove and demonstrate experimentally that the standard activations...
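The fix the paper proposes is a periodicity-friendly activation of the form x + (1/a)·sin²(ax), sometimes called "snake"; a one-line sketch:

```python
import math

def snake(x, a=1.0):
    """x + (1/a) * sin^2(a*x): monotone-trending like a linear unit,
    but with a built-in periodic component, so networks using it can
    extrapolate periodic structure beyond the training range."""
    return x + (math.sin(a * x) ** 2) / a
```

The residual snake(x) - x is periodic with period pi/a, unlike ReLU or tanh, whose extrapolations are eventually linear or constant.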

A Short Note about Kinetics-600

Joao Carreira, Eric Noland, Andras Banki-Horvath, Chloe Hillier, Andrew Zisserman
2018
3 references

We describe an extension of the DeepMind Kinetics human action dataset from 400 classes, each with at least 400 video clips, to 600 classes, each with at least 600 video clips. In order to scale up the dataset we changed the data collection process s...

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
2018
3 references

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations fro...

Glow: Graph Lowering Compiler Techniques for Neural Networks

Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Garret Catron, Summer Deng, Roman Dzhabarov, Nick Gibso...
2018
3 references

This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural ne...