🤖

Machine Learning

Machine learning frameworks, algorithms, and training systems

Repositories

(7)

huggingface/transformers

microsoft/onnxruntime

mlflow/mlflow

pytorch/pytorch

ray-project/ray

scikit-learn/scikit-learn

tensorflow/tensorflow

Papers

(373)

Showing 20 of 373 papers

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew T...

2017

3 references

Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution t...

View Paper PDF DOI

Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks

Dongdi Zhao, Li Fan, Kashif Sharif, Guangmin Xia, Yu Wang

2019

10 references

We propose Additive Powers-of-Two~(APoT) quantization, an efficient non-uniform quantization scheme for the bell-shaped and long-tailed distribution of weights and activations in neural networks. By constraining all quantization levels as the sum of ...

View Paper PDF DOI

Data-Free Quantization Through Weight Equalization and Bias Correction

Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling

2019

4 references

We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quan...

View Paper PDF DOI

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

2015

9 references

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual function...

View Paper PDF DOI

Efficient ConvBN Blocks for Transfer Learning and Beyond

Kaichao You, Guo Qin, Anchang Bao, Meng Cao, Ping Huang, Jiulong Shan, Mingsheng Long

2023

4 references

Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scrat...

View Paper PDF DOI

Fast Transformer Decoding: One Write-Head is All You Need

Noam Shazeer

2019

7 references

Multi-head attention layers, as used in the Transformer neural sequence model, are a powerful alternative to RNNs for moving information across and between sequences. While training these layers is generally fast and simple, due to parallelizability ...

View Paper PDF DOI

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, Yi Yang

2018

2 references

Previous works utilized ''smaller-norm-less-important'' criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two req...

View Paper PDF DOI

Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Shuli Jiang, Swanand Kadhe, Yi Zhou, Ling Cai, Nathalie Baracaldo

2023

2 references

Growing applications of large language models (LLMs) trained by a third party raise serious concerns on the security vulnerability of LLMs.It has been demonstrated that malicious actors can covertly exploit these vulnerabilities in LLMs through poiso...

View Paper PDF DOI

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai

2023

13 references

Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to train a separate model just for faster inference. We ...

View Paper PDF DOI

GSPMD: General and Scalable Parallelization for ML Computation Graphs

Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, ...

2021

6 references

We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations. It allows users to write programs in the same way as for a single device, then give hints through a few annotations on how to distribute t...

View Paper PDF DOI

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Chelsea Finn, Pieter Abbeel, Sergey Levine

2017

8 references

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and re...

View Paper PDF DOI

On Multiplicative Integration with Recurrent Neural Networks

Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov

2016

3 references

We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs). MI changes the way in which information from difference sources flows and is integrated in the computational build...

View Paper PDF DOI

PipeDream: Fast and Efficient Pipeline Parallel DNN Training

Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil ...

2018

2 references

PipeDream is a Deep Neural Network(DNN) training system for GPUs that parallelizes computation by pipelining execution across multiple machines. Its pipeline parallel computing model avoids the slowdowns faced by data-parallel training when large mod...

View Paper PDF DOI

Quantizing deep convolutional networks for efficient inference: A whitepaper

Raghuraman Krishnamoorthi

2018

4 references

We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post-training p...

View Paper PDF DOI

RoFormer: Enhanced Transformer with Rotary Position Embedding

Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu

2021

5 references

Position encoding recently has shown effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to in...

View Paper PDF DOI

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Da...

2024

2 references

Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy,...

View Paper PDF DOI

To prune, or not to prune: exploring the efficacy of pruning for model compression

Michael Zhu, Suyog Gupta

2017

2 references

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the co...

View Paper PDF DOI

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks

Sambhav R. Jain, Albert Gural, Michael Wu, Chris H. Dick

2019

4 references

We propose a method of training quantization thresholds (TQT) for uniform symmetric quantizers using standard backpropagation and gradient descent. Contrary to prior work, we show that a careful analysis of the straight-through estimator for threshol...

View Paper PDF DOI

Up or Down? Adaptive Rounding for Post-Training Quantization

Markus Nagel, Rana Ali Amjad, Mart van Baalen, Christos Louizos, Tijmen Blankevoort

2020

286 citations

6 references

When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weigh...

View Paper PDF DOI

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve

2016

2 references

This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding. It is trained to output letters, with transcribed speech, without the need for force alignment of phone...

View Paper PDF DOI

Previous Page 5 of 19 Next