Machine Learning

Machine learning frameworks, algorithms, and training systems

Repositories (7)

huggingface/transformers

19 papers

microsoft/onnxruntime

18 papers

mlflow/mlflow

0 papers

pytorch/pytorch

104 papers

ray-project/ray

52 papers

scikit-learn/scikit-learn

122 papers

tensorflow/tensorflow

95 papers

Papers (373)

Showing 20 of 373 papers

Accuracy and Stability of Numerical Algorithms, Second Edition.

Nicholas J. Higham
2002
1 reference

From the Publisher: What is the most accurate way to sum floating point numbers? What are the advantages of IEEE arithmetic? How accurate is Gaussian elimination and what were the key breakthroughs in the development of error analysis for the method?...
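
One classic answer to the summation question posed above is compensated (Kahan) summation. A minimal sketch, not code from the book; the example values are illustrative:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: carry a correction term for the
    low-order bits lost in each floating-point addition."""
    total = 0.0
    c = 0.0                    # running compensation
    for x in values:
        y = x - c
        t = total + y
        c = (t - total) - y    # what was lost when y was added to total
        total = t
    return total

vals = [0.1] * 1_000_000
print(sum(vals))               # naive summation accumulates rounding error
print(kahan_sum(vals))         # much closer to the exact 100000.0
```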

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John Duchi, Elad Hazan, Yoram Singer
2011
4 references

We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning....
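
A minimal NumPy sketch of the per-coordinate update this abstract describes: squared gradients are accumulated, and coordinates with a larger gradient history take smaller steps. Function name and hyperparameters are illustrative, not from the paper:

```python
import numpy as np

def adagrad_step(w, grad, g2_sum, lr=0.1, eps=1e-8):
    g2_sum = g2_sum + grad ** 2                       # accumulate squared gradients
    w = w - lr * grad / (np.sqrt(g2_sum) + eps)       # per-coordinate step sizes
    return w, g2_sum

# toy run: minimize f(w) = ||w||^2, whose gradient is 2w
w, g2 = np.ones(3), np.zeros(3)
for _ in range(200):
    w, g2 = adagrad_step(w, 2 * w, g2)
print(w)   # coordinates shrink toward 0
```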

Ad click prediction: a view from the trenches.

H. Brendan McMahan, Gary D. Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, T...
2013
4 references

Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a d...

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

Yarin Gal, Zoubin Ghahramani
2015
1 reference

Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results a...
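
The paper's key prescription is to reuse one dropout mask across all time steps of a sequence rather than resampling per step. A small PyTorch sketch of that idea (often called "locked" or variational dropout); the function name is ours, not the authors':

```python
import torch

def locked_dropout(x, p=0.3, training=True):
    """Dropout over (batch, time, features) with a single mask shared
    across the time dimension."""
    if not training or p == 0.0:
        return x
    # one mask per sequence, broadcast over every time step
    mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - p) / (1 - p)
    return x * mask

x = torch.randn(4, 10, 8)          # (batch, time, features)
y = locked_dropout(x, p=0.3)
```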

Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond

Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf
2019
1 reference

We propose a static loop vectorization optimization on top of high level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow, and used for applications ranging from aut...
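
TensorFlow exposes this parallel-for style vectorization through tf.vectorized_map. A short sketch (assuming a recent TF 2.x; the loss and shapes are illustrative) that batches per-example gradient computations without a Python loop:

```python
import tensorflow as tf

w = tf.constant([0.5, -1.0, 2.0])
x = tf.random.normal([8, 3])                      # a batch of 8 examples

def per_example_grad(xi):
    # gradient of a scalar loss for one example, w.r.t. that example's input
    with tf.GradientTape() as tape:
        tape.watch(xi)
        loss = tf.reduce_sum(tf.nn.sigmoid(xi * w))
    return tape.gradient(loss, xi)

# statically vectorized "parallel-for": the 8 computations are fused into one graph
grads = tf.vectorized_map(per_example_grad, x)    # shape [8, 3]
print(grads.shape)
```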

Differentiation of the Cholesky decomposition

Iain Murray
2016
1 reference

We review strategies for differentiating matrix-based computations, and derive symbolic and algorithmic update rules for differentiating expressions containing the Cholesky decomposition. We recommend new `blocked' algorithms, based on differentiatin...
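
A quick sanity check of differentiating through a Cholesky factorization, using PyTorch's autodiff rather than the paper's blocked algorithms: for symmetric positive definite A, the derivative of log det A should be A^{-1} (under the symmetric-gradient convention the paper discusses):

```python
import torch

A = torch.randn(4, 4, dtype=torch.double)
A = A @ A.T + 4 * torch.eye(4, dtype=torch.double)   # well-conditioned SPD matrix
A.requires_grad_(True)

L = torch.linalg.cholesky(A)
logdet = 2 * torch.log(torch.diagonal(L)).sum()      # log det A from the factor
logdet.backward()                                    # reverse-mode through cholesky

print(torch.allclose(A.grad, torch.linalg.inv(A), atol=1e-6))   # expect True
```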

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
2015
1 reference

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a ...
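
For reference, torchvision ships an implementation of this detector. A minimal usage sketch (assuming torchvision >= 0.13; untrained weights here, so outputs are arbitrary):

```python
import torch
import torchvision

# torchvision's Faster R-CNN with a ResNet-50 FPN backbone (not the paper's code)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
model.eval()

images = [torch.rand(3, 480, 640)]           # list of CHW tensors scaled to [0, 1]
with torch.no_grad():
    outputs = model(images)                  # per-image dicts: boxes, labels, scores
print(outputs[0]["boxes"].shape)
```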

Incorporating Nesterov Momentum into Adam

Timothy Dozat
2015
1 reference

Cognition and behavior exhibit biases consistent with future expectations, and some of these biases result in momentum-like effects and have been linked with the idea of momentum. These momentum-like effects include representational momentum, operati...
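
A simplified NumPy sketch of the Nesterov-flavoured Adam update the title refers to: the bias-corrected momentum term "looks ahead" by mixing the current gradient back in. The report's decaying momentum schedule is omitted and the hyperparameters are illustrative:

```python
import numpy as np

def nadam_step(w, g, m, v, t, lr=2e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    # look-ahead momentum: combine the momentum estimate with the raw gradient
    m_hat = b1 * m / (1 - b1 ** (t + 1)) + (1 - b1) * g / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 101):            # t counts from 1 for the bias correction
    w, m, v = nadam_step(w, 2 * w, m, v, t)
```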

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.

Kyunghyun Cho, Bart van Merriënboer, Çağlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwe...
2014
1 reference

Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.

Learning Precise Timing with LSTM Recurrent Networks.

Jürgen Schmidhuber, Felix A. Gers, Douglas Eck
2002
1 reference

In response to Rodriguez's recent article (2001), we compare the performance of simple recurrent nets and long short-term memory recurrent nets on context-free and context-sensitive languages.

Learning to forget: continual prediction with LSTM

Felix A. Gers, Jürgen Schmidhuber, Fred Cummins
1999
1 reference

Long short-term memory (LSTM) can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams without explicitly marked sequence ends. ...
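
The paper's remedy is the forget gate, which lets the cell state reset itself on continual streams. A compact NumPy sketch of an LSTM cell with that gate (our own formulation and variable names, not the original code):

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """One step of an LSTM cell with input (i), forget (f), output (o) gates
    and candidate (g). W: (4h, d), U: (4h, h), b: (4h,)."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)        # forget gate f decides what to discard
    h = o * np.tanh(c)
    return h, c

d, hdim = 3, 5
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hdim, d))
U = rng.normal(size=(4 * hdim, hdim))
b = np.zeros(4 * hdim)
h, c = np.zeros(hdim), np.zeros(hdim)
for x in rng.normal(size=(10, d)):    # run over a short input stream
    h, c = lstm_cell(x, h, c, W, U, b)
```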

Long Short-Term Memory.

Sepp Hochreiter, Jürgen Schmidhuber
1997
1 reference

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it b...

Neural Optimizer Search with Reinforcement Learning.

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le
2017
4 references

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathem...
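
A toy illustration of the search space idea only: an update rule written as a tiny expression over primitive operands (gradient g, running average m) that a controller could emit and an evaluator could apply. The mini-DSL below is hypothetical and much smaller than the paper's grammar:

```python
import numpy as np

PRIMITIVES = {
    "g": lambda g, m: g,                                   # plain gradient step
    "m": lambda g, m: m,                                   # momentum step
    "sign_g*sign_m*g": lambda g, m: np.sign(g) * np.sign(m) * g,  # agreement-scaled
}

def apply_update(w, g, m, expr, lr=0.01, beta=0.9):
    m = beta * m + (1 - beta) * g            # maintain a momentum operand
    w = w - lr * PRIMITIVES[expr](g, m)      # evaluate the emitted rule
    return w, m

w, m = np.ones(3), np.zeros(3)
w, m = apply_update(w, 2 * w, m, "sign_g*sign_m*g")
```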

On the Convergence of Adam and Beyond.

Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
2018
1 reference

On the importance of initialization and momentum in deep learning.

Ilya Sutskever, James Martens, George E. Dahl, Geoffrey E. Hinton
2013
2 references

Optimization of Collective Communication Operations in MPICH.

Rajeev Thakur, Rolf Rabenseifner, William Gropp
2005
1 reference

We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of m...
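
One of the latency-oriented algorithms discussed for small messages is recursive doubling. A toy single-process simulation of that pattern for an allreduce sum (illustrative only, not MPICH code; assumes a power-of-two number of ranks):

```python
import numpy as np

def recursive_doubling_allreduce(rank_data):
    """Simulated recursive-doubling allreduce (sum): in round k, rank r
    exchanges with rank r XOR 2^k, so all ranks hold the total after log2(p) rounds."""
    data = [np.array(d, dtype=float) for d in rank_data]
    p, step = len(data), 1
    while step < p:
        new = [data[r] + data[r ^ step] for r in range(p)]   # pairwise exchange
        data, step = new, step * 2
    return data

print(recursive_doubling_allreduce([[1, 2], [3, 4], [5, 6], [7, 8]])[0])  # [16. 20.]
```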

Recurrent Neural Network Regularization

Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
2014
2 references

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper...
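
The scheme described here, dropout on the non-recurrent (between-layer) connections only, is what PyTorch's stacked LSTM applies when a dropout probability is set. A minimal usage sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

# dropout is applied to the outputs of each LSTM layer except the last,
# i.e. between layers; hidden-to-hidden transitions are left undropped
lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=2,
               dropout=0.5, batch_first=True)

x = torch.randn(8, 20, 32)                 # (batch, time, features)
out, (h, c) = lstm(x)
```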

SGDR: Stochastic Gradient Descent with Warm Restarts.

Ilya Loshchilov, Frank Hutter
2017
10 references

In this paper, we describe a phenomenon, which we named "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. The existence of super-convergence is relevant to understanding why...
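
A sketch of the cosine-annealed, warm-restarted learning-rate schedule named in the title. Hyperparameters are illustrative, and the paper's option of lengthening the restart period over time is omitted:

```python
import math

def sgdr_lr(t, lr_min=1e-5, lr_max=0.1, period=10):
    """Cosine annealing from lr_max down to lr_min, restarting every `period` epochs."""
    t_cur = t % period                     # progress within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / period))

# the rate decays within each cycle and jumps back to lr_max at every restart
print([round(sgdr_lr(t), 4) for t in range(0, 21, 5)])
```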

SSD: Single Shot MultiBox Detector

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg
2015
2 references

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map locati...
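
A small sketch of the tiling idea described above: default boxes at several aspect ratios centered on every cell of a feature map. Simplified from the paper (one scale, no extra square box, relative coordinates):

```python
import itertools
import numpy as np

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """SSD-style default boxes as (cx, cy, w, h) in [0, 1] for one feature map."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size   # cell center
        for ar in aspect_ratios:
            boxes.append([cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)])
    return np.array(boxes)

print(default_boxes(fmap_size=4, scale=0.2).shape)   # (48, 4): 16 cells x 3 ratios
```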