🤖

Machine Learning

Machine learning frameworks, algorithms, and training systems

Repositories (7)

huggingface/transformers

19 papers

microsoft/onnxruntime

18 papers

mlflow/mlflow

0 papers

pytorch/pytorch

104 papers

ray-project/ray

52 papers

scikit-learn/scikit-learn

122 papers

tensorflow/tensorflow

95 papers

Papers (373)

Showing 20 of 373 papers

Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization

Shai Shalev-Shwartz, Tong Zhang
2013
466 citations
1 reference

We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. We analyze the runtime of the framework and obtain rates that improve state-of-the-art res...

A guide to convolution arithmetic for deep learning

Vincent Dumoulin, Francesco Visin
2016
144 citations
5 references

We introduce a guide to help deep learning practitioners understand and manipulate convolutional neural network architectures. The guide clarifies the relationship between various properties (input shape, kernel shape, zero padding, strides and outpu...
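
The relationship the guide derives is easy to state in code. A minimal sketch (function and variable names are ours, not the guide's): for input size i, kernel size k, zero padding p and stride s, the output size of a convolution along one axis is floor((i + 2p - k) / s) + 1.

    # Output size of a convolution along one spatial axis,
    # per the relationship derived in the guide.
    def conv_output_size(i, k, p, s):
        """i: input size, k: kernel size, p: zero padding, s: stride."""
        return (i + 2 * p - k) // s + 1

    assert conv_output_size(i=5, k=3, p=1, s=1) == 5  # "same" padding
    assert conv_output_size(i=7, k=3, p=0, s=2) == 3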

Deconvolutional networks

Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, Rob Fergus
2010
7 references

Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information which destroys cues such as edge intersections, parallelism and symmet...

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
2013
1 reference

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several ...
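
Among the paper's refinements is negative sampling, which replaces the full softmax with a few binary discriminations per training pair. A minimal numpy sketch of that loss for one (center, context) pair; the vectors and names here are illustrative.

    import numpy as np

    def sgns_loss(v_center, v_context, v_negatives):
        """Skip-gram negative-sampling loss for one training pair:
        pull the true context toward the center word, push sampled
        negatives away."""
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        pos = np.log(sigmoid(v_context @ v_center))
        neg = np.sum(np.log(sigmoid(-(v_negatives @ v_center))))
        return -(pos + neg)  # negative log-likelihood to minimize

    rng = np.random.default_rng(0)
    loss = sgns_loss(rng.normal(size=50), rng.normal(size=50),
                     rng.normal(size=(5, 50)))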

Dynamic Control Flow in Large-Scale Machine Learning

Yuan Yu, Martín Abadi, P. Barham, E. Brevdo, M. Burrows, Andy Davis, J. Dean, S. Ghemawat, Tim Harle...
2018
1 reference

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditiona...
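
TensorFlow exposes the in-graph control-flow constructs the paper analyzes as tf.while_loop and tf.cond. A minimal sketch of a data-dependent loop containing a data-dependent branch (the computation itself is a toy):

    import tensorflow as tf

    i0 = tf.constant(0)
    x0 = tf.constant(1.0)

    def cond(i, x):
        return i < 10  # loop until the counter hits 10

    def body(i, x):
        # Branch on a runtime tensor value, not a Python value.
        x = tf.cond(x < 100.0, lambda: x * 2.0, lambda: x)
        return [i + 1, x]

    _, result = tf.while_loop(cond, body, [i0, x0])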

Efficient Learning using Forward-Backward Splitting.

John C. Duchi, Yoram Singer
2009
1 citation
2 references

We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast...
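
For l1 regularization the backward (proximal) phase has a closed form, soft-thresholding, which makes the whole step two lines of numpy. A minimal sketch; the function name and signature are ours.

    import numpy as np

    def fobos_step(w, grad, eta, lam):
        """One forward-backward splitting step with l1 regularization:
        a gradient step followed by the soft-threshold proximal step."""
        z = w - eta * grad  # forward (gradient) phase
        return np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)  # backward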

Enabling Fast Differentially Private SGD via Just-in-Time Compilation and Vectorization

Pranav Subramani, Nicholas Vadivelu, Gautam Kamath
2020
2 references

A common pain point in differentially private machine learning is the significant runtime overhead incurred when executing Differentially Private Stochastic Gradient Descent (DPSGD), which may be as large as two orders of magnitude. We thoroughly dem...
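
The paper's levers, vectorized per-example gradients plus JIT compilation, map directly onto JAX primitives. A minimal sketch with an illustrative linear model; DP-SGD would then clip each row of the result and add noise.

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        return (jnp.dot(x, w) - y) ** 2

    # Per-example gradients in one vectorized, JIT-compiled call.
    per_example_grads = jax.jit(jax.vmap(jax.grad(loss), in_axes=(None, 0, 0)))

    w = jnp.zeros(3)
    xs = jnp.ones((8, 3))
    ys = jnp.ones(8)
    grads = per_example_grads(w, xs, ys)  # shape (8, 3): one gradient per example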

Fast Algorithms for Convolutional Neural Networks

Andrew Lavin, Scott Gray
2015
926 citations
2 references

Deep convolutional neural networks take GPU days of compute time to train on large data sets. Pedestrian detection for self driving cars requires very low latency. Image recognition for mobile phones is constrained by limited processing resources. Th...
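
The paper's Winograd building block F(2,3) computes two outputs of a 3-tap filter with four multiplications instead of six, using small fixed transforms. A minimal 1-D numpy sketch, checked against direct correlation:

    import numpy as np

    # F(2,3) transform matrices (as given in the paper).
    Bt = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
                   [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
    G = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]])
    At = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

    d = np.array([1.0, 2.0, 3.0, 4.0])  # input tile
    g = np.array([1.0, 2.0, 3.0])       # filter
    y = At @ ((G @ g) * (Bt @ d))       # elementwise product: the 4 multiplies
    assert np.allclose(y, np.correlate(d, g, mode="valid"))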

Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks

A. Giusti, D. Ciresan, Jonathan Masci, L. Gambardella, J. Schmidhuber
2013
357 citations
2 references

Deep Neural Networks now excel at image classification, detection and segmentation. When used to scan images by means of a sliding window, however, their high computational complexity can bring even the most powerful hardware to its knees. We show ho...

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
2019
2 references

Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In t...
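
The key mechanism is intra-layer (tensor) model parallelism: each weight matrix is split column-wise across devices, which compute their slices independently. A miniature numpy sketch of why the split is exact; Megatron-LM applies it inside transformer layers with communication between devices.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))      # activations
    A = rng.normal(size=(8, 6))      # a layer's full weight matrix
    A1, A2 = np.split(A, 2, axis=1)  # each "device" holds half the columns

    Y_parallel = np.concatenate([X @ A1, X @ A2], axis=1)
    assert np.allclose(Y_parallel, X @ A)  # same result, half the weights each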

Multi-Scale Context Aggregation by Dilated Convolutions

Fisher Yu, Vladlen Koltun
2015
2 references

State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction and image classification are structurally different. In this wo...
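
Dilation grows the receptive field exponentially with depth while adding no parameters and losing no resolution. A minimal PyTorch sketch (layer sizes are illustrative): a 3x3 kernel with dilation 2 covers a 5x5 window.

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3,
                     dilation=2, padding=2)  # padding keeps the spatial size
    x = torch.randn(1, 1, 32, 32)
    assert conv(x).shape == x.shape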

Nonmetric Multidimensional Scaling: A Numerical Method

Joseph B. Kruskal
1964
1 reference

We describe the numerical methods required in our approach to multi-dimensional scaling. The rationale of this approach has appeared previously.
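
Kruskal's method fits only the rank order of the dissimilarities, minimizing a "stress" criterion over monotone transformations. scikit-learn exposes this as MDS with metric=False; a minimal sketch on synthetic distances:

    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(0)
    points = rng.normal(size=(20, 5))
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
              random_state=0)
    embedding = mds.fit_transform(D)  # 2-D configuration minimizing stress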

On the difficulty of training Recurrent Neural Networks

Razvan Pascanu, Tomáš Mikolov, Yoshua Bengio
2012
1 reference

There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by ...
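
The remedy the paper proposes for exploding gradients, clipping the gradient norm, is a one-liner in PyTorch. A minimal sketch; the toy RNN and loss are illustrative.

    import torch
    import torch.nn as nn

    model = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    out, _ = model(torch.randn(2, 16, 4))
    loss = out.pow(2).mean()
    loss.backward()

    # Rescale the global gradient norm if it exceeds the threshold, then step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()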

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun
2013
2 references

We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep l...
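
The efficient sliding-window evaluation comes from viewing fully connected layers as convolutions, so the whole network slides over a larger image in one pass. A minimal PyTorch sketch of the idea; the sizes are illustrative, not OverFeat's.

    import torch
    import torch.nn as nn

    # A classifier head trained on 5x5 feature maps, written as convolutions:
    head = nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=5),  # stands in for Linear(64*5*5, 128)
        nn.ReLU(),
        nn.Conv2d(128, 10, kernel_size=1),  # stands in for Linear(128, 10)
    )
    features = torch.randn(1, 64, 13, 13)   # a larger test-time feature map
    logits_map = head(features)             # (1, 10, 9, 9): a grid of predictions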

Reconstruction filters in computer-graphics

Don P. Mitchell, Arun N. Netravali
1988
1 reference

Problems of signal processing arise in image synthesis because of transformations between continuous and discrete representations of 2D images. Aliasing introduced by sampling has received much attention in graphics, but reconstruction of samples int...
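
The paper organizes piecewise-cubic reconstruction filters into a two-parameter (B, C) family and recommends B = C = 1/3. A numpy sketch of that family, written from the standard form of the filter (worth checking against the paper before reuse):

    import numpy as np

    def mitchell_netravali(x, B=1/3, C=1/3):
        """Two-parameter family of piecewise-cubic reconstruction filters;
        B = C = 1/3 is the paper's recommended setting."""
        x = np.abs(x)
        near = ((12 - 9*B - 6*C) * x**3
                + (-18 + 12*B + 6*C) * x**2 + (6 - 2*B)) / 6
        far = ((-B - 6*C) * x**3 + (6*B + 30*C) * x**2
               + (-12*B - 48*C) * x + (8*B + 24*C)) / 6
        return np.where(x < 1, near, np.where(x < 2, far, 0.0))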

Rectifier Nonlinearities Improve Neural Network Acoustic Models

Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng
2013
1 reference

Deep neural network acoustic models produce substantial gains in large vocabulary continuous speech recognition systems. Emerging work with rectified linear (ReL) hidden units demonstrates additional gains in final system performance relative to more...
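
The leaky rectifier the paper studies keeps a small slope on the negative side so a unit never stops passing gradient. A minimal numpy sketch (the paper's experiments used a slope of 0.01):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        """max(x, 0) plus a small linear leak alpha * min(x, 0)."""
        return np.where(x > 0, x, alpha * x)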

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
2014
2 references

Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models...

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Shai Shalev-Shwartz, Tong Zhang
2012
9 citations
3 references

Stochastic Gradient Descent (SGD) has become popular for solving large scale supervised machine learning optimization problems such as SVM, due to their strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has ...
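
For smooth losses each dual coordinate has a closed-form maximizer. A minimal numpy sketch of one SDCA epoch for ridge regression, i.e. loss 0.5*(x.w - y)^2 with penalty (lam/2)*||w||^2; the update formula follows the paper's setup, while the code and names are ours.

    import numpy as np

    def sdca_epoch(X, y, alpha, lam):
        n = len(y)
        w = X.T @ alpha / (lam * n)  # primal iterate induced by the duals
        for i in np.random.permutation(n):
            delta = (y[i] - X[i] @ w - alpha[i]) / (1 + X[i] @ X[i] / (lam * n))
            alpha[i] += delta
            w += delta * X[i] / (lam * n)  # keep w consistent with alpha
        return alpha, w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X @ rng.normal(size=5)
    alpha, w = sdca_epoch(X, y, np.zeros(100), lam=0.1)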

The Complex Gradient Operator and the CR-Calculus

Ken Kreutz-Delgado
2009
2 references

A thorough discussion and development of the calculus of real-valued functions of complex-valued vectors is given using the framework of the Wirtinger Calculus. The presented material is suitable for exposition in an introductory Electrical Engineeri...
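
The Wirtinger derivatives treat z and its conjugate as independent variables: d/dz = 0.5*(d/dx - i d/dy) and d/dzbar = 0.5*(d/dx + i d/dy), and for a real-valued f the direction of steepest ascent is given by the conjugate cogradient df/dzbar. A minimal numerical check for f(z) = |z|^2, where df/dz = conj(z) and df/dzbar = z:

    import numpy as np

    f = lambda z: np.abs(z) ** 2
    z, h = 1.0 + 2.0j, 1e-6
    df_dx = (f(z + h) - f(z - h)) / (2 * h)            # real-direction derivative
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # imaginary-direction
    assert np.isclose(0.5 * (df_dx - 1j * df_dy), np.conj(z))  # df/dz
    assert np.isclose(0.5 * (df_dx + 1j * df_dy), z)           # df/dzbar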

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar
2022
5 references

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with similar level of recall. The design of the proposed algorithm is motivate...
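
The design insight is that brute-force search is one large matrix multiplication followed by a top-k, a compute pattern that can saturate a matrix unit. A minimal numpy sketch of that pattern; the sizes are illustrative, and the paper's contribution is making the top-k stage as hardware-friendly as the matmul.

    import numpy as np

    rng = np.random.default_rng(0)
    database = rng.normal(size=(10_000, 64))
    queries = rng.normal(size=(32, 64))

    scores = queries @ database.T                       # all dot products at once
    k = 10
    top_k = np.argpartition(-scores, k, axis=1)[:, :k]  # unordered top-k indices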