iree-org/iree - PaperGrep

7 papers

7 files

11 references

Paper References by File

▶ compiler/src/iree/compiler/Dialect/LinalgExt/Utils/IndexingUtils.h

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Tri Dao

2023

3 references


▶ compiler/src/iree/compiler/Dialect/LinalgExt/Utils/WinogradConstants.h

Optimizing Winograd-Based Convolution with Tensor Cores

Junhong Liu, Dongxu Yang, Junjie Lai

2021

1 reference


▶ compiler/src/iree/compiler/GlobalOptimization/QuantizedConvToConv.cpp

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, D...

2017

6 references


Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, D...

2017

6 references


▶ compiler/src/iree/compiler/GlobalOptimization/QuantizedMatmulToMatmul.cpp

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, D...

2017

6 references


▶ runtime/src/iree/base/internal/math.h

FP8 Formats for Deep Learning

Paulius Micikevicius, Dušan Stošić, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwait...

2022

24 references


FP8 Formats for Deep Learning

Paulius Micikevicius, Dušan Stošić, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwait...

2022

24 references


8-bit Numerical Formats for Deep Neural Networks

Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi

2022

16 references


8-bit Numerical Formats for Deep Neural Networks

Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi

2022

16 references


▶ runtime/src/iree/base/internal/prng.h

Fast splittable pseudorandom number generators

G. Steele, D. Lea, Christine H. Flood

2014

2 references


▶ samples/dynamic_shapes/README.md

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wa...

2020

2 references


Papers Referenced in This Repository

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, D...

2017

6 references

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be impl...


3 references in code:

compiler/src/iree/compiler/GlobalOptimization/QuantizedConvToConv.cpp:125

compiler/src/iree/compiler/GlobalOptimization/QuantizedConvToConv.cpp:244

compiler/src/iree/compiler/GlobalOptimization/QuantizedMatmulToMatmul.cpp:36
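
The integer-only scheme summarized in the abstract above is usually described as an affine mapping between real values and integers, r = S * (q - Z). The following is a minimal sketch of that mapping, assuming unsigned 8-bit storage; the function names (`choose_qparams`, `quantize`, `dequantize`) are hypothetical and are not taken from the IREE sources listed here.

```python
def choose_qparams(rmin, rmax, qmin=0, qmax=255):
    """Pick a scale S and zero point Z for the real range [rmin, rmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range; any positive scale works
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to the nearest representable integer, with clamping."""
    q = round(r / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the real value represented by q: r = S * (q - Z)."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    s, z = choose_qparams(-1.0, 3.0)
    for r in (0.0, 1.234, 100.0):
        q = quantize(r, s, z)
        print(r, "->", q, "->", dequantize(q, s, z))
```

Keeping the zero point an integer means real zero round-trips without error, which matters for zero padding in convolutions.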

FP8 Formats for Deep Learning

Paulius Micikevicius, Dušan Stošić, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwait...

2022

24 references

FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings - E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bi...


2 references in code:

runtime/src/iree/base/internal/math.h:516

runtime/src/iree/base/internal/math.h:521
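
To make the two encodings named in the abstract above concrete, here is a small decoding sketch. The bit widths come from the abstract; the exponent biases (7 for E4M3, 15 for E5M2) and the special-value rules are assumptions based on the paper's format description, since the abstract shown here is truncated. This is an illustration only, not the code in `math.h`.

```python
import math


def decode_fp8(byte, exp_bits, man_bits, bias, ieee_specials):
    """Decode one FP8 byte (sign | exponent | mantissa) to a Python float."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    max_exp = (1 << exp_bits) - 1
    if ieee_specials and exp == max_exp:
        # Assumed IEEE-style specials: all-ones exponent is inf or NaN.
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == max_exp and man == (1 << man_bits) - 1:
        # Assumed E4M3 rule: only the all-ones pattern is NaN, no infinities.
        return math.nan
    frac = man / (1 << man_bits)
    if exp == 0:
        return sign * frac * 2.0 ** (1 - bias)   # subnormal
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)  # normal


def decode_e4m3(byte):
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)


def decode_e5m2(byte):
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)


if __name__ == "__main__":
    print(decode_e4m3(0x7E))  # largest finite E4M3 value under these rules
    print(decode_e5m2(0x7B))  # largest finite E5M2 value under these rules
```

Under these assumptions E4M3 trades dynamic range for an extra mantissa bit, while E5M2 keeps a wider exponent range and the usual infinity and NaN encodings.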

8-bit Numerical Formats for Deep Neural Networks

Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi

2022

16 references

Given the current trend of increasing size and complexity of machine learning architectures, it has become of critical importance to identify new approaches to improve the computational efficiency of model training. In this context, we address the advantages of floating-point over fixed-point repres...


2 references in code:

runtime/src/iree/base/internal/math.h:532

runtime/src/iree/base/internal/math.h:545

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Tri Dao

2023

3 references

Scaling Transformers to longer sequence lengths has been a major problem in the last several years, promising to improve performance in language modeling and high-resolution image understanding, as well as to unlock new applications in code, audio, and video generation. The attention layer is the ma...


1 reference in code:

compiler/src/iree/compiler/Dialect/LinalgExt/Utils/IndexingUtils.h:38

Optimizing Winograd-Based Convolution with Tensor Cores

Junhong Liu, Dongxu Yang, Junjie Lai

2021

1 reference

Convolution computing is one of the primary time-consuming parts of convolutional neural networks (CNNs). State-of-the-art convolutional neural networks use small, 3 × 3 filters. Recent work on Winograd convolution can reduce the computational complexity a lot, making the convolution computing fast. ...


1 reference in code:

compiler/src/iree/compiler/Dialect/LinalgExt/Utils/WinogradConstants.h:20

Fast splittable pseudorandom number generators

G. Steele, D. Lea, Christine H. Flood

2014

2 references

We describe a new algorithm SplitMix for an object-oriented and splittable pseudorandom number generator (PRNG) that is quite fast: 9 64-bit arithmetic/logical operations per 64 bits generated. A conventional linear PRNG object provides a generate method that returns one pseudorandom value and updat...


1 reference in code:

runtime/src/iree/base/internal/prng.h:35
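
The "9 64-bit arithmetic/logical operations" figure in the abstract above can be seen in a short sketch of the generate step. The constants below are those of the widely circulated SplitMix64 variant and are an assumption here; they are not copied from `prng.h`. The class name is hypothetical, and the splitting operation from the paper is omitted.

```python
MASK64 = (1 << 64) - 1              # emulate 64-bit wraparound in Python
GOLDEN_GAMMA = 0x9E3779B97F4A7C15   # odd increment added to the state each step


class SplitMix64:
    """Sketch of a SplitMix64-style generator (generate step only, no split)."""

    def __init__(self, seed):
        self.state = seed & MASK64

    def next(self):
        # 1 add, then three rounds of shift / xor, with two multiplies between:
        # nine 64-bit operations in total per output.
        self.state = (self.state + GOLDEN_GAMMA) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)


if __name__ == "__main__":
    rng = SplitMix64(seed=42)
    print([hex(rng.next()) for _ in range(3)])
```

Because the state advances by a fixed odd constant, the state sequence has period 2^64; the mixing function only scrambles each state into an output.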

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wa...

2020

2 references

Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks which assume a pre-determined model architecture and input data shapes--assum...


1 reference in code:

samples/dynamic_shapes/README.md:112
