Machine Learning
Machine learning frameworks, algorithms, and training systems
Repositories (7)
huggingface/transformers
microsoft/onnxruntime
mlflow/mlflow
pytorch/pytorch
ray-project/ray
scikit-learn/scikit-learn
tensorflow/tensorflow
Papers (373)
Accuracy and stability of numerical algorithms, Second Edition.
From the Publisher: What is the most accurate way to sum floating point numbers? What are the advantages of IEEE arithmetic? How accurate is Gaussian elimination and what were the key breakthroughs in the development of error analysis for the method?...
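The blurb opens by asking how best to sum floating-point numbers. One technique the book analyses is compensated (Kahan) summation. Below is a minimal sketch of it. The helpers `naive_sum` and `kahan_sum` are hypothetical names written for illustration, and Python's `math.fsum` serves as a correctly rounded reference.

```python
import math

def naive_sum(values):
    """Left-to-right recursive summation."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Compensated (Kahan) summation: feed each addition's rounding error into the next term."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # correct the incoming term by the previous error
        t = total + y                   # low-order bits of y can be lost in this addition
        compensation = (t - total) - y  # zero in exact arithmetic; recovers the lost bits
        total = t
    return total

# One large term followed by many terms smaller than half an ulp of 1.0.
values = [1.0] + [1e-16] * 10
print(naive_sum(values))   # 1.0: every small term is rounded away
print(kahan_sum(values))   # close to the correctly rounded math.fsum(values)
print(math.fsum(values))
```

The compensation variable costs three extra floating-point operations per term. In exchange, the error bound no longer grows with the number of terms.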
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning....
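The abstract is cut off before the update rule. The diagonal form of AdaGrad is the variant most often implemented. It gives each coordinate its own step size, shrinking as the squared gradients on that coordinate accumulate. This is a minimal sketch assuming that diagonal form. The function name `adagrad_step`, the small `eps` guard, and the toy objective are illustrative assumptions.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad step; updates params and accum in place."""
    for i, g in enumerate(grads):
        accum[i] += g * g                                   # running sum of squared gradients
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)   # per-coordinate, shrinking step size
    return params

# Minimise f(x, y) = x**2 + 10 * y**2, whose coordinates have very different curvature.
params = [3.0, 2.0]
accum = [0.0, 0.0]
for _ in range(500):
    grads = [2 * params[0], 20 * params[1]]
    adagrad_step(params, grads, accum, lr=0.5)
print(params)
```

The first step moves every coordinate by exactly `lr`, whatever the gradient's scale. That scale-invariance is what makes the method attractive for features with very different frequencies or magnitudes.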
Ad click prediction: a view from the trenches.
Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a d...
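The abstract is truncated before the method. The paper's central online learner is FTRL-Proximal logistic regression with per-coordinate learning rates and L1/L2 regularization. Below is a minimal sketch of that per-coordinate update, assuming sparse binary features given as index lists. The class name `FTRLProximal`, the defaults, the toy data, and the clamp on the logit are assumptions. The parameters `alpha`, `beta`, `l1`, and `l2` are meant to mirror the paper's α, β, λ1, and λ2.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal logistic regression over sparse binary features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # feature index -> accumulated (adjusted) gradient
        self.n = {}  # feature index -> sum of squared gradients

    def weight(self, i):
        """Lazily compute w_i from (z_i, n_i); the L1 term keeps small weights exactly zero."""
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        n = self.n.get(i, 0.0)
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, features):
        """Click probability given the active feature indices."""
        wx = sum(self.weight(i) for i in features)
        wx = max(min(wx, 35.0), -35.0)  # clamp to avoid overflow in exp (an assumption)
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, features, label):
        """One online step on an example with label 0 or 1; returns the pre-update prediction."""
        p = self.predict(features)
        g = p - label  # log-loss gradient for each active feature (value 1)
        for i in set(features):
            w = self.weight(i)
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha  # per-coordinate rate change
            self.z[i] = self.z.get(i, 0.0) + g - sigma * w
            self.n[i] = n + g * g
        return p

# Toy stream: feature 0 always clicks, feature 1 never does, feature 2 is shared by both.
model = FTRLProximal()
for _ in range(200):
    model.update([0, 2], 1)
    model.update([1, 2], 0)
print(model.predict([0, 2]), model.predict([1, 2]))
```

Weights are never stored, only `z` and `n`. That keeps per-feature state to two numbers and lets the L1 threshold produce exact zeros, which matters for sparse models at this scale.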
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results a...
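The paper's practical recipe is to sample one dropout mask per sequence and reuse it at every time step, including on the recurrent connections, instead of resampling masks at each step. The sketch below shows only that mask-sharing idea on a tiny tanh RNN in plain Python. It omits the paper's variational-inference interpretation and output dropout. The function names, weights, and dropout rate are hypothetical.

```python
import math
import random

def sample_mask(size, p, rng):
    """Inverted-dropout mask: keep each unit with probability 1 - p and rescale it by 1 / (1 - p)."""
    keep = 1.0 - p
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(size)]

def rnn_forward_tied_dropout(inputs, W_x, W_h, p=0.25, seed=0):
    """Tanh RNN in which one input mask and one hidden-state mask are reused at every step."""
    rng = random.Random(seed)
    n_hid, n_in = len(W_x), len(W_x[0])
    mask_x = sample_mask(n_in, p, rng)   # sampled once per sequence,
    mask_h = sample_mask(n_hid, p, rng)  # not once per time step
    h = [0.0] * n_hid
    outputs = []
    for x in inputs:
        xd = [xi * m for xi, m in zip(x, mask_x)]
        hd = [hi * m for hi, m in zip(h, mask_h)]  # same units dropped on the recurrent path
        h = [math.tanh(sum(W_x[j][k] * xd[k] for k in range(n_in))
                       + sum(W_h[j][k] * hd[k] for k in range(n_hid)))
             for j in range(n_hid)]
        outputs.append(h)
    return outputs, mask_x, mask_h

# Hypothetical 2-input, 3-unit network run over a 3-step sequence.
W_x = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]
W_h = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, -0.2, 0.1]]
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, mask_x, mask_h = rnn_forward_tied_dropout(seq, W_x, W_h)
print(mask_x, mask_h)
```

Resampling `mask_h` inside the loop would give the per-step recurrent dropout that, according to the abstract, had been shown to fail.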
Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond
We propose a static loop vectorization optimization on top of high level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow, and used for applications ranging from aut...
Differentiation of the Cholesky decomposition
We review strategies for differentiating matrix-based computations, and derive symbolic and algorithmic update rules for differentiating expressions containing the Cholesky decomposition. We recommend new 'blocked' algorithms, based on differentiatin...
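The abstract is cut off before the rules themselves. One standard result of the kind it reviews is the forward-mode (tangent) rule for L = chol(A), where A is symmetric positive definite and L is lower triangular. The worked derivation below is a sketch of that rule. It is not the paper's recommended blocked algorithm, and the operator Φ is notation introduced here.

```latex
% Differentiate A = L L^T, then multiply on the left by L^{-1} and on the right by L^{-T}.
\dot{A} = \dot{L} L^{\top} + L \dot{L}^{\top}
\quad\Longrightarrow\quad
L^{-1} \dot{A} L^{-\top} = L^{-1}\dot{L} + \left(L^{-1}\dot{L}\right)^{\top}
% L^{-1}\dot{L} is lower triangular, so it equals the lower triangle of the
% left-hand side with the diagonal halved. Writing \Phi for that operation:
\dot{L} = L \, \Phi\!\left(L^{-1} \dot{A} L^{-\top}\right),
\qquad
\Phi(X)_{ij} =
\begin{cases}
X_{ij} & i > j \\
\tfrac{1}{2} X_{ii} & i = j \\
0 & i < j
\end{cases}
```

Evaluating this rule needs only triangular solves and matrix products, with no explicit inverse. The reverse-mode rules the paper goes on to derive are built from the same Φ structure.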
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a ...
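The abstract is truncated before describing the Region Proposal Network. That network scores k reference boxes ("anchors") at every feature-map position. The paper's default uses 3 scales (box areas of 128², 256², and 512² pixels) and 3 aspect ratios (1:1, 1:2, 2:1), giving k = 9. Below is a minimal sketch of anchor generation. The function names, the (x1, y1, x2, y2) format, the stride of 16, and the cell-centre convention in `tile_anchors` are assumptions.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centred at (cx, cy), one per (scale, ratio) pair.

    `scale` is the square root of the box area in pixels; `ratio` is height / width.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)  # chosen so that w * h == s * s and h / w == r
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def tile_anchors(feat_h, feat_w, stride=16):
    """Repeat the anchor set at every feature-map cell (cell-centre convention assumed)."""
    return [a
            for i in range(feat_h)
            for j in range(feat_w)
            for a in make_anchors((j + 0.5) * stride, (i + 0.5) * stride)]

anchors = make_anchors(300.0, 200.0)
print(len(anchors))             # 9
print(len(tile_anchors(2, 3)))  # 54
```

The network itself then predicts an objectness score and four box offsets per anchor, so proposals are expressed relative to these fixed shapes.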
Incorporating Nesterov Momentum into
Cognition and behavior exhibit biases consistent with future expectations, and some of these biases result in momentum-like effects and have been linked with the idea of momentum. These momentum-like effects include representational momentum, operati...
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.
Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.
Learning Precise Timing with LSTM Recurrent Networks.
In response to Rodriguez's recent article (2001), we compare the performance of simple recurrent nets and long short-term memory recurrent nets on context-free and context-sensitive languages.
Learning to forget: continual prediction with LSTM
Long short-term memory (LSTM) can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams without explicitly marked sequence ends. ...
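The weakness the abstract describes appears when a cell state is never reset on a continual input stream. The paper's remedy is a forget gate that learns to scale down the old state. The sketch below uses a common modern single-unit formulation, which may differ in details such as squashing functions from the paper's. It compares a forget gate held near 1 with one held at 0.5. The names `lstm_step` and `final_cell_state` and all weights are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM with a forget gate; p holds scalar weights and biases."""
    i = sigmoid(p["W_i"] * x + p["U_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["W_f"] * x + p["U_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["W_o"] * x + p["U_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["W_g"] * x + p["U_g"] * h_prev + p["b_g"])  # candidate cell input
    c = f * c_prev + i * g  # the forget gate scales the old state before new input is added
    h = o * math.tanh(c)
    return h, c

def final_cell_state(b_f, steps=1000):
    """Feed a constant, never-reset input stream and return the final cell state."""
    p = dict(W_i=0.0, U_i=0.0, b_i=20.0,   # input gate held almost fully open
             W_f=0.0, U_f=0.0, b_f=b_f,    # forget-gate bias under test
             W_o=0.0, U_o=0.0, b_o=0.0,
             W_g=1.0, U_g=0.0, b_g=0.0)
    h, c = 0.0, 0.0
    for _ in range(steps):
        h, c = lstm_step(1.0, h, c, p)
    return c

print(final_cell_state(b_f=20.0))  # forget gate ~1: state keeps growing (roughly 760)
print(final_cell_state(b_f=0.0))   # forget gate 0.5: state settles near 1.52
```

With the gate near 1 the state grows linearly with stream length and `tanh(c)` saturates. A learned gate below 1 keeps the state bounded without explicit sequence boundaries.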
Long Short-Term Memory.
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it b...
Neural Optimizer Search with Reinforcement Learning.
We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathem...
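The abstract is truncated before naming any discovered rules. Two rules the paper reports are PowerSign and AddSign, and they can be sketched as below. This follows their commonly cited forms, with the gradient g scaled by a factor that depends on whether g agrees in sign with a moving average m. It also assumes α = e for PowerSign, α = 1 for AddSign, and a constant f(t) = 1. Treat those details, the function names, and the toy objective as assumptions to check against the paper.

```python
import math

def sign(x):
    return (x > 0) - (x < 0)

def powersign(g, m, alpha=math.e, f_t=1.0):
    """Grow the step when g agrees in sign with the moving average m, shrink it otherwise."""
    return alpha ** (f_t * sign(g) * sign(m)) * g

def addsign(g, m, alpha=1.0, f_t=1.0):
    """Additive variant: the factor is alpha + 1 on agreement and alpha - 1 on disagreement."""
    return (alpha + f_t * sign(g) * sign(m)) * g

def minimise(grad_fn, x, rule, lr=0.01, beta=0.9, steps=300):
    """Scalar gradient descent driven by one of the update rules above."""
    m = 0.0
    for _ in range(steps):
        g = grad_fn(x)
        m = beta * m + (1 - beta) * g  # exponential moving average of gradients
        x -= lr * rule(g, m)
    return x

def grad(x):
    return 2.0 * (x - 3.0)  # gradient of (x - 3)**2

print(minimise(grad, 0.0, powersign), minimise(grad, 0.0, addsign))
```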
Optimization of Collective Communication Operations in MPICH.
We describe our work on improving the performance of collective communication operations in MPICH for clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of m...
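For short messages, one of the algorithms the paper uses for allreduce is recursive doubling, which finishes in log2(p) communication steps. The sketch below only simulates the data movement in a single Python process. It does not call MPI. The function name is hypothetical, and it assumes a power-of-two rank count and an associative, commutative operation.

```python
import operator

def recursive_doubling_allreduce(values, op=operator.add):
    """Single-process simulation of a recursive-doubling allreduce over p = 2**k ranks.

    values[r] is rank r's local contribution. In step s every rank r exchanges its partial
    result with partner r XOR 2**s and combines the two, so after log2(p) steps every rank
    holds the full reduction.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a positive power-of-two number of ranks")
    partial = list(values)
    steps = 0
    distance = 1
    while distance < p:
        # On a real cluster all pairwise exchanges within one step happen concurrently.
        partial = [op(partial[r], partial[r ^ distance]) for r in range(p)]
        distance <<= 1
        steps += 1
    return partial, steps

result, steps = recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
print(result, steps)  # [36, 36, 36, 36, 36, 36, 36, 36] 3
```

Each step sends the whole vector, so the cost is latency-bound for short messages but bandwidth-heavy for long ones. That trade-off is why the paper switches algorithms by message size.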
Recurrent Neural Network Regularization
We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper...
SGDR: Stochastic Gradient Descent with Warm Restarts.
In this paper, we describe a phenomenon, which we named "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. The existence of super-convergence is relevant to understanding why...
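SGDR, named in the title, anneals the learning rate along a cosine within each cycle and then restarts it at the maximum. Each new cycle can be `t_mult` times longer than the previous one. Within a cycle of length T_i, the rate is lr_min + ½(lr_max - lr_min)(1 + cos(π·T_cur/T_i)). The function below is a minimal sketch of that schedule. The name `sgdr_lr` and its default values are illustrative assumptions, not settings from the paper.

```python
import math

def sgdr_lr(epoch, lr_max=0.1, lr_min=0.0, t0=10, t_mult=2):
    """Learning rate at a (possibly fractional) epoch under cosine annealing with warm restarts."""
    t_i, t_cur = t0, epoch
    while t_cur >= t_i:   # skip over completed restart cycles
        t_cur -= t_i
        t_i *= t_mult     # each cycle is t_mult times longer than the previous one
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))

print([round(sgdr_lr(e), 4) for e in (0, 5, 9.99, 10, 20, 30)])
# [0.1, 0.05, 0.0, 0.1, 0.05, 0.1]
```

With `t0=10` and `t_mult=2`, restarts fall at epochs 10, 30, 70, and so on. Evaluating at fractional epochs lets the rate be updated per batch rather than per epoch.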
SSD: Single Shot MultiBox Detector
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map locati...
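The abstract mentions default boxes over different aspect ratios and scales per feature-map location. The paper spaces box scales linearly across the m feature maps, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9. It uses aspect ratios {1, 2, 3, 1/2, 1/3} and adds one extra square box per location at scale sqrt(s_k·s_{k+1}). The sketch below computes those shapes and the cell centres. The function names and the value 1.0 used for the scale beyond the last map are assumptions.

```python
import math

def ssd_default_boxes(m=6, s_min=0.2, s_max=0.9,
                      ratios=(1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)):
    """Default-box (width, height) pairs per feature map, as fractions of the input size."""
    # Scales are spaced linearly from s_min (finest map) to s_max (coarsest map).
    scales = [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
    shapes_per_map = []
    for k, s in enumerate(scales):
        shapes = [(s * math.sqrt(a), s / math.sqrt(a)) for a in ratios]
        # Extra square box at scale sqrt(s_k * s_{k+1}); 1.0 past the last map is an assumption.
        s_next = scales[k + 1] if k + 1 < m else 1.0
        extra = math.sqrt(s * s_next)
        shapes.append((extra, extra))
        shapes_per_map.append(shapes)
    return scales, shapes_per_map

def box_centres(feature_size):
    """Centres ((j + 0.5) / f, (i + 0.5) / f) for a square f x f feature map."""
    return [((j + 0.5) / feature_size, (i + 0.5) / feature_size)
            for i in range(feature_size) for j in range(feature_size)]

scales, shapes = ssd_default_boxes()
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(len(shapes[0]))                 # 6 boxes per location
```

Fine feature maps get small boxes and coarse maps get large ones, so each map specializes in a range of object sizes without an image pyramid.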