Machine Learning

Conformer: Convolution-augmented Transformer for Speech Recognition

Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zh...

2020

2 references

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global i...

View Paper PDF

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Lingchen Meng, Jianwei Yang, Rui Tian, Xiyang Dai, Zuxuan Wu, Jianfeng Gao, Yu-Gang Jiang

2024

5 references

Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). The resulting architecture is simple but significantly increases computation and memory costs, as it has ...

View Paper PDF

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, Stefano Ermon

2020

4 references

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising dif...

View Paper PDF

Deriving Activation Functions Using Integration

Allen Hao Huang, Imanol Schlag

2024

1 reference

Our work proposes a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding activation functions using integration. We introduce the Expanded Integral of the Exponential Linear Unit (xIELU), a tr...

View Paper PDF

Elucidating the Design Space of Diffusion-Based Generative Models

Tero Karras, M. Aittala, Timo Aila, S. Laine

2022

1 reference

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify ...

View Paper PDF DOI

Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition.

Mark D. Skowronski, John G. Harris

2004

1 reference

Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noi...

View Paper PDF DOI

Generating Long Sequences with Sparse Transformers

Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever

2019

2 references

Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We also introduce...

View Paper PDF

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi...

2021

2 references

Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training th...

View Paper PDF

Going deeper with Image Transformers.

Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jeǵou

2021

2 references

Transformers have been recently adapted for large scale image classification, achieving high scores shaking up the long supremacy of convolutional neural networks. However the optimization of image transformers has been little studied so far. In this...

View Paper PDF DOI

Language models enable zero-shot prediction of the effects of mutations on protein function

Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, Alexander Rives

2021

3 references

Modeling the effect of sequence variation on function is a fundamental problem for understanding and designing proteins. Since evolution encodes information about function into patterns in protein sequences, unsupervised models of variant effects can...

View Paper PDF DOI

Neural Networks Fail to Learn Periodic Functions and How to Fix It

Liu Ziyin, T. Hartwig, Masahito Ueda

2020

3 references

Previous literature offers limited clues on how to learn a periodic function using modern neural networks. We start with a study of the extrapolation properties of neural networks; we prove and demonstrate experimentally that the standard activations...

View Paper

On Clustering Validation Techniques.

M. Halkidi, Yannis Batistakis, M. Vazirgiannis

2001

1 reference

View Paper PDF DOI

On the Reliability of Watermarks for Large Language Models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando...

2023

4 references

As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and do...

View Paper PDF DOI

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree...

2024

2 references

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi...

View Paper PDF

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer

2023

2 references

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bi...

View Paper PDF

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee

2023

1 reference

Large Language Models (LLMs) have recently demonstrated remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to the large memory bottleneck, specifically in small batch inference settings (e.g. mobile de...

View Paper PDF

Self-Attention with Relative Position Representations

Peter Shaw, Jakob Uszkoreit, Ashish Vaswani

2018

1 reference

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relati...

View Paper PDF

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han

2022

1 reference

Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same time. We pr...

View Paper PDF

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

William Fedus, Barret Zoph, Noam Shazeer

2021

2 references

In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model -- with outrageous numbers ...

View Paper PDF

Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram

2025

1 reference

Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. ...

View Paper PDF

Repositories

huggingface/transformers

microsoft/onnxruntime

mlflow/mlflow

pytorch/pytorch

ray-project/ray

scikit-learn/scikit-learn

tensorflow/tensorflow

Papers

Conformer: Convolution-augmented Transformer for Speech Recognition

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Denoising Diffusion Implicit Models

Deriving Activation Functions Using Integration

Elucidating the Design Space of Diffusion-Based Generative Models

Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition.

Generating Long Sequences with Sparse Transformers

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

Going deeper with Image Transformers.

Language models enable zero-shot prediction of the effects of mutations on protein function

Neural Networks Fail to Learn Periodic Functions and How to Fix It

On Clustering Validation Techniques.

On the Reliability of Watermarks for Large Language Models

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

QLoRA: Efficient Finetuning of Quantized LLMs

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

Self-Attention with Relative Position Representations

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation