Dependence graphs and compiler optimizations

D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, M. Wolfe
1981
708 citations
1 reference

Dependence graphs can be used as a vehicle for formulating and implementing compiler optimizations. This paper defines such graphs and discusses two kinds of transformations. The first are simple rewriting transformations that remove dependence arcs....

Efficient implementation of lattice operations

H. Aït-Kaci, R. Boyer, P. Lincoln, R. Nasr
1989
261 citations
1 reference

Lattice operations such as greatest lower bound (GLB), least upper bound (LUB), and relative complementation (BUTNOT) are becoming more and more important in programming languages supporting object inheritance. We present a general technique for the ...

Combining analyses, combining optimizations

C. Click, K. Cooper
1995
201 citations
1 reference

Modern optimizing compilers use several passes over a program's intermediate representation to generate good code. Many of these optimizations exhibit a phase-ordering problem. Getting the best code may require iterating optimizations until a fixed p...

FP8 Formats for Deep Learning

Paulius Micikevicius, Dusan Stosic, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwait...
2022
169 citations
10 references

FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings - E4M3 ...

Machine Learning Systems are Stuck in a Rut

P. Barham, M. Isard
2019
76 citations
1 reference

In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it hard...

IR2VEC

S. VenkataKeerthy, Rohit Aggarwal, Shalini Jain, M. Desarkar, Ramakrishna Upadrasta, Y. Srikant
2020
65 citations
3 references

We propose IR2VEC, a Concise and Scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow information to captu...

Accurate garbage collection in an uncooperative environment

Fergus Henderson
2002
65 citations
1 reference

Previous attempts at garbage collection in uncooperative environments have generally used conservative or mostly-conservative approaches. We describe a technique for doing fully type-accurate garbage collection in an uncooperative environment, using ...

New cardinality estimation algorithms for HyperLogLog sketches

Otmar Ertl
2017
34 citations
3 references

This paper presents new methods to estimate the cardinalities of data sets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large cardinalities. Based ...

Concurrent Hash Tables

Tobias Maier, P. Sanders, Roman Dementiev
2019
17 citations
1 reference

Concurrent hash tables are one of the most important concurrent data structures, which are used in numerous applications. For some applications, it is common that hash table accesses dominate the execution time. To efficiently solve these problems in...

How to read floating point numbers accurately

William D. Clinger
1990
15 citations
1 reference

Consider the problem of converting decimal scientific notation for a number into the best binary floating point approximation to that number, for some fixed precision. This problem cannot be solved using arithmetic of any fixed p...

RL4ReAl: Reinforcement Learning for Register Allocation

S. VenkataKeerthy, Siddhartha Jain, Rohit Aggarwal, Albert Cohen, Ramakrishna Upadrasta
2022
8 citations
3 references

We aim to automate decades of research and experience in register allocation, leveraging machine learning. We tackle this problem by embedding a multi-agent reinforcement learning algorithm within LLVM, training it with the state of the art technique...

RL4ReAl: Reinforcement Learning for Register Allocation

S. VenkataKeerthy, Siddharth Jain, Anilava Kundu, Rohit Aggarwal, Albert Cohen, Ramakrishna Upadrast...
2022
8 citations
1 reference

We aim to automate decades of research and experience in register allocation, leveraging machine learning. We tackle this problem by embedding a multi-agent reinforcement learning algorithm within LLVM, training it with the state of the art technique...

An abstract interpretation for SPMD divergence on reducible control flow graphs

Julian Rosemann, Simon Moll, Sebastian Hack
2021
8 citations
1 reference

Vectorizing compilers employ divergence analysis to detect at which program point a specific variable is uniform, i.e. has the same value on all SPMD threads that execute this program point. They exploit uniformity to retain branching to counter bran...

Number Parsing at a Gigabyte per Second

Daniel Lemire
2021
7 citations
1 reference

With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires variable-pr...

GPU Accelerated Automatic Differentiation With Clad

Ioana Ifrim, Vassil Vassilev, David J Lange
2022
5 citations
1 reference

Automatic Differentiation (AD) is instrumental for science and industry. It is a tool to evaluate the derivative of a function specified through a computer program. The range of AD application domain spans from Machine Learning to Robotics to High En...

Ryū revisited: printf floating point conversion

Ulf Adams
2019
4 citations
2 references

Ryū Printf is a new algorithm to convert floating-point numbers to decimal strings according to the printf %f, %e, and %g formats: %f generates ‘full’ output (integer part of the input, dot, configurable number of digits), %e generates scientific out...

The Swift Language from a Reverse Engineering Perspective

Malte Kraus, Vincent Haupert
2018
3 citations
1 reference

Over the last decade, mobile devices have taken over the consumer market for computer hardware. Almost all these mobile devices run either Android or iOS as their operating systems. In 2014, Apple introduced the Swift programming language as an alter...

OMPTBench – OpenMP Tool Interface Conformance Testing

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberm...
2024
1 citation
1 reference

OpenMP® is a highly relevant parallelization standard in high-performance computing and all major compiler vendors support it. The standard defines the OpenMP Tool Interface (OMPT) as a mechanism for third-party tools to obtain information on dedicat...

ompTest – Unit Testing with OMPT

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberm...
2024
1 reference

OpenMP® is a widely used API in high-performance computing that enables parallelization on the host as well as offload work to an accelerator, such as a GPU. The OpenMP specification defines an OpenMP Tool Interface (OMPT), which allows a third-party...