ML Compilers
Deep learning compilation frameworks and optimization techniques
Repositories (6)
apache/tvm
iree-org/iree
onnx/onnx
openxla/xla
pytorch/glow
triton-lang/triton
Papers (74)
Automatic loop interchange.
Parallel and vector machines are becoming increasingly important to many computation intensive applications. Effectively utilizing such architectures, particularly from sequential languages such as Fortran, has demanded increasingly sophisticated com...
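The abstract above does not include code; as a minimal C sketch (not from the paper), loop interchange reorders a nest so the innermost loop matches the memory layout. Here, swapping the `i`/`j` loops turns a strided column walk over a row-major array into a contiguous row walk, which is legal because the body carries no cross-iteration dependence:

```c
#include <assert.h>

#define N 4

/* Before interchange: the inner loop walks down a column, striding
 * across rows of the row-major array `a` (poor spatial locality). */
void scale_column_major(double a[N][N], double s) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] *= s;
}

/* After interchange: the inner loop walks along a row, touching
 * contiguous memory. The result is identical because no iteration
 * reads a value written by another iteration. */
void scale_row_major(double a[N][N], double s) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] *= s;
}
```

An automatic interchanger must first prove this independence via dependence analysis; when a dependence direction forbids the swap, the original order must be kept.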
Diesel: DSL for linear algebra and neural net computations on GPUs.
We present a domain specific language compiler, Diesel, for basic linear algebra and neural network computations, that accepts input expressions in an intuitive form and generates high performing code for GPUs. The current trend is to represent a neu...
Glow: Graph Lowering Compiler Techniques for Neural Networks
This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural ne...
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Bec...
On the Complexity of Loop Fusion.
Loop fusion is a program transformation that combines several loops into one. It is used in parallelizing compilers mainly for increasing the granularity of loops and for improving data reuse. The goal of this paper is to study, from a theoretical po...
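As an illustrative C sketch (not taken from the paper), fusing two loops with matching iteration counts merges their bodies so each intermediate value is consumed right after it is produced:

```c
#include <assert.h>

#define N 8

/* Unfused: two passes; every b[i] is written to memory in the first
 * loop and read back in the second. */
void unfused(const double a[N], double b[N], double c[N]) {
    for (int i = 0; i < N; i++)
        b[i] = a[i] * 2.0;
    for (int i = 0; i < N; i++)
        c[i] = b[i] + 1.0;
}

/* Fused: one pass; b[i] is reused while still hot in cache (or a
 * register). Legal here because iteration i of the second loop
 * depends only on iteration i of the first. */
void fused(const double a[N], double b[N], double c[N]) {
    for (int i = 0; i < N; i++) {
        b[i] = a[i] * 2.0;
        c[i] = b[i] + 1.0;
    }
}
```

The complexity results in the paper concern choosing which loops to fuse: fusion is only legal when it does not reverse a dependence, and picking the best legal grouping is where the hardness lies.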
Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation.
The polyhedral model for loop parallelization has proved to be an effective tool for advanced optimization and automatic parallelization of programs in higher-level languages. Yet, to integrate such optimizations seamlessly into production compilers,...
Scanning Polyhedra with DO Loops.
Corinne Ancourt and François Irigoin. PPOPP '91: Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
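The paper's central idea is to derive DO-loop bounds whose iterations visit exactly the integer points of a polyhedron. A hand-worked C sketch (not from the paper) for the triangle { (i, j) : 0 ≤ i < n, 0 ≤ j ≤ i }: eliminating `j` gives the outer bound on `i`, and the remaining constraints give the inner bounds on `j` for each fixed `i`:

```c
#include <assert.h>

/* Scan the polyhedron { (i, j) : 0 <= i < n, 0 <= j <= i } with a
 * loop nest whose bounds come from constraint elimination:
 *   outer: 0 <= i < n   (projection of the polyhedron onto i)
 *   inner: 0 <= j <= i  (constraints on j once i is fixed)     */
int scan_triangle(int n) {
    int points = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j <= i; j++)
            points++;
    return points; /* n*(n+1)/2 lattice points */
}
```

Generalizing this bound derivation to arbitrary polyhedra (via Fourier-Motzkin-style elimination) is what the paper automates.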
The Cache Performance and Optimizations of Blocked Algorithms.
Monica D. Lam, Edward E. Rothberg, and Michael E. Wolf. ACM SIGOPS Operating Systems Review.
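Blocking (tiling) is easiest to see on matrix multiply. A minimal C sketch (not the paper's code; the tile size `B` here is arbitrary, whereas the paper studies how to choose it against cache size and conflict misses):

```c
#include <assert.h>

#define N 8
#define B 4  /* tile size; a real compiler derives it from cache capacity */

/* Blocked matrix multiply: the loops are split into B x B tiles so
 * that each tile of A, Bm, and C stays cache-resident while it is
 * reused, instead of streaming whole N x N operands per pass. */
void matmul_blocked(const double A[N][N], const double Bm[N][N],
                    double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0.0;
    for (int ii = 0; ii < N; ii += B)
        for (int kk = 0; kk < N; kk += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int k = kk; k < kk + B; k++)
                        for (int j = jj; j < jj + B; j++)
                            C[i][j] += A[i][k] * Bm[k][j];
}
```

The transformation only reorders the accumulation, so the result matches the untiled triple loop; the paper's contribution is quantifying when and why a given `B` wins.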
Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code.
This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel commands to explicitly...