llvm/llvm-project - PaperGrep

We propose IR2VEC, a Concise and Scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow information to capture the syntax as well as the semantics of the inpu...

4 references in code:

llvm/docs/CommandGuide/llvm-ir2vec.rst:346

llvm/docs/MLGO.rst:552

llvm/include/llvm/Analysis/IR2Vec.h:24

llvm/include/llvm/Analysis/IR2Vec.h:25

RL4ReAl: Reinforcement Learning for Register Allocation

Shalini Jain, Yashas Andaluri, S. VenkataKeerthy, Ramakrishna Upadrasta

2022

4 references

We aim to automate decades of research and experience in register allocation, leveraging machine learning. We tackle this problem by embedding a multi-agent reinforcement learning algorithm within LLVM, training it with the state of the art techniques. We formalize the constraints that precisely def...

4 references in code:

llvm/docs/CommandGuide/llvm-ir2vec.rst:349

llvm/docs/MLGO.rst:668

llvm/include/llvm/CodeGen/MIR2Vec.h:35

llvm/include/llvm/CodeGen/MIR2Vec.h:36

Efficient chaotic iteration strategies with widenings.

François Bourdoncle

1993

6 references

3 references in code:

clang/include/clang/Analysis/Analyses/IntervalPartition.h:15

clang/include/clang/Analysis/Analyses/IntervalPartition.h:40

clang/include/clang/Analysis/Analyses/IntervalPartition.h:48

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S....

2018

5 references

Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc. Competing framework...

3 references in code:

mlir/docs/Dialects/Linalg/OpDSL.md:40

mlir/docs/Rationale/RationaleLinalgDialect.md:127

mlir/docs/Rationale/RationaleLinalgDialect.md:273

Memory Tagging and how it improves C/C++ memory safety

Youren Shen, Yu Chen, Kang Chen, Hongliang Tian, Shoumeng Yan

2018

2 references

Memory safety in C and C++ remains largely unresolved. A technique usually called "memory tagging" may dramatically improve the situation if implemented in hardware with reasonable overhead. This paper describes two existing implementations of memory tagging: one is the full hardware implementation ...

2 references in code:

clang/docs/HardwareAssistedAddressSanitizerDesign.rst:43

llvm/docs/MemTagSanitizer.rst:99

RLIBM-PROG: Progressive Polynomial Approximations for Fast Correctly Rounded Math Libraries

Jay P. Lim, Mridul Aanjaneya, John L. Gustafson, Santosh Nagarakatte

2021

2 references

This paper presents a novel method for generating a single polynomial approximation that produces correctly rounded results for all inputs of an elementary function for multiple representations. The generated polynomial approximation has the nice property that the first few lower degree terms produc...

2 references in code:

libc/src/math/generic/log10f.cpp:56

libc/src/math/generic/log2f.cpp:53

Ryū revisited: printf floating point conversion

Ulf Adams

2019

2 references

Ryū Printf is a new algorithm to convert floating-point numbers to decimal strings according to the printf %f, %e, and %g formats: %f generates ‘full’ output (integer part of the input, dot, configurable number of digits), %e generates scientific output (one leading digit, dot, configurable number o...

2 references in code:

libc/src/stdio/printf_core/float_dec_converter.h:485

libc/src/__support/float_to_string.h:85

FP8 Formats for Deep Learning

Paulius Micikevicius, Dušan Stošić, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwait...

2022

24 references

FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings - E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bi...

2 references in code:

llvm/include/llvm/ADT/APFloat.h:201

llvm/include/llvm/ADT/APFloat.h:214
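The two encodings named in the abstract above can be illustrated with a small decoder. This is a minimal sketch, not code from the paper or from LLVM. The truncated abstract gives only the field widths; the exponent biases (7 for E4M3, 15 for E5M2) and the special-value rules (E5M2 follows IEEE conventions, E4M3 has no infinities and reserves only the all-ones pattern for NaN) are assumptions taken from the commonly published FP8 definitions. The function names are hypothetical.

```python
import math


def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int,
               ieee_specials: bool) -> float:
    """Decode one FP8 byte laid out as sign | exponent | mantissa."""
    assert 0 <= byte <= 0xFF and 1 + exp_bits + man_bits == 8
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if ieee_specials and exp == exp_max:
        # Assumed IEEE-style rule: all-ones exponent encodes inf or NaN.
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_max and man == man_max:
        # Assumed E4M3 rule: only the all-ones bit pattern is NaN.
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    # Normal: implicit leading one.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)


def decode_e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)
```

Under these assumptions E4M3 trades the infinity encodings for extra finite range, while E5M2 keeps a wider exponent at the cost of one mantissa bit.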

8-bit Numerical Formats for Deep Neural Networks

Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi

2022

16 references

Given the current trend of increasing size and complexity of machine learning architectures, it has become of critical importance to identify new approaches to improve the computational efficiency of model training. In this context, we address the advantages of floating-point over fixed-point repres...

2 references in code:

llvm/include/llvm/ADT/APFloat.h:204

llvm/include/llvm/ADT/APFloat.h:219

An Experimental Study of Dynamic Dominators

Loukas Georgiadis, Giuseppe F. Italiano, Luigi Laura, Federico Santaroni

2016

26 citations

4 references

Motivated by recent applications of dominator computations, we consider the problem of dynamically maintaining the dominators of flow graphs through a sequence of insertions and deletions of edges. Our main theoretical contribution is a simple incremental algorithm that maintains the dominator tree ...

2 references in code:

llvm/include/llvm/Support/GenericDomTreeConstruction.h:33

llvm/include/llvm/Support/GenericDomTreeConstruction.h:1426

GPU Accelerated Automatic Differentiation With Clad

Ioana Ifrim, Vassil M. Vassilev, D. J. Lange

2022

1 reference

Automatic Differentiation (AD) is instrumental for science and industry. It is a tool to evaluate the derivative of a function specified through a computer program. The range of AD application domain spans from Machine Learning to Robotics to High Energy Physics. Computing gradients with the help of...

1 reference in code:

clang/docs/ClangRepl.rst:658

Efficient Implementation of Lattice Operations.

Hassan Aït-Kaci, Robert S. Boyer, Patrick Lincoln, Roger Nasr

1989

1 reference

Lattice operations such as greatest lower bound (GLB), least upper bound (LUB), and relative complementation (BUTNOT) are becoming more and more important in programming languages supporting object inheritance. We present a general technique for the efficient implementation of such operations based ...

1 reference in code:

clang/docs/DataFlowAnalysisIntro.md:90

Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.6

Dan Bonachea, Katherine Rasmussen

2025

1 reference

This document specifies an interface to support the multi-image parallelism features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a solution in which a runtime library is primarily responsible for implementing coarray allocation, deallocation and accesses, image synch...

1 reference in code:

flang/docs/ParallelMultiImageFortranRuntime.md:17

Newton’s Method Without Division

Jeffrey D. Blanchard, Marc Chamberland

2023

5 citations

1 reference

Newton’s Method for root-finding is modified to avoid the division step while maintaining quadratic convergence.

1 reference in code:

libc/src/__support/fixed_point/sqrt.h:174
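As an illustration of the idea in the abstract above, the classic division-free Newton iteration for a reciprocal can be sketched as follows. It is a textbook special case, not the scheme from the paper and not the code at the referenced path. Applying Newton's method to f(x) = 1/x - a gives the update x ← x(2 - ax), which uses only multiplication and subtraction. The residual 1 - ax is squared at every step, which is quadratic convergence. The function names and the starting guess are hypothetical.

```python
def reciprocal(a: float, x0: float, steps: int = 6) -> float:
    """Approximate 1/a without dividing, starting from a guess x0.

    Assumes 0 < a * x0 < 2, so that the initial residual |1 - a*x0| < 1.
    """
    x = x0
    for _ in range(steps):
        x = x * (2.0 - a * x)  # Newton step for f(x) = 1/x - a
    return x


def residuals(a: float, x0: float, steps: int) -> list[float]:
    """Return the residual 1 - a*x before each step and after the last."""
    out = []
    x = x0
    for _ in range(steps):
        out.append(1.0 - a * x)
        x = x * (2.0 - a * x)
    out.append(1.0 - a * x)
    return out
```

Starting from a residual of about 0.1, for example with `a = 3.0` and `x0 = 0.3`, successive residuals are roughly 0.1, 0.01, 0.0001, so the number of correct digits doubles with each step.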

Number Parsing at a Gigabyte per Second

Daniel Lemire

2021

10 references

With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires variable-precision arithmetic. However, we need at most 17 di...

1 reference in code:

libc/src/__support/str_to_float.h:82

How to read floating point numbers accurately

William Clinger

1990

2 references

Consider the problem of converting decimal scientific notation for a number into the best binary floating point approximation to that number, for some fixed precision. This problem cannot be solved using arithmetic of any fixed precision. Hence the IEEE Standard for Binary Floating-Point Arithmetic ...

1 reference in code:

libc/src/__support/str_to_float.h:460

Concurrent Hash Tables

Tobias Maier, Peter Sanders, Roman Dementiev

2019

2 references

Concurrent hash tables are one of the most important concurrent data structures, which are used in numerous applications. For some applications, it is common that hash table accesses dominate the execution time. To efficiently solve these problems in parallel, we need implementations that achieve sp...

1 reference in code:

lld/COFF/DebugTypes.cpp:917

An abstract interpretation for SPMD divergence on reducible control flow graphs

Julian Rosemann, Simon Moll, Sebastian Hack

2021

1 reference

Vectorizing compilers employ divergence analysis to detect at which program point a specific variable is uniform, i.e. has the same value on all SPMD threads that execute this program point. They exploit uniformity to retain branching to counter branch divergence and defer computations to scalar pro...

1 reference in code:

llvm/docs/ConvergenceAndUniformity.rst:54

Dependence Graphs and Compiler Optimizations.

David J. Kuck, Robert H. Kuhn, David Padua, Bruce Leasure, Michael Wolfe

1981

1 reference

Dependence graphs can be used as a vehicle for formulating and implementing compiler optimizations. This paper defines such graphs and discusses two kinds of transformations. The first are simple rewriting transformations that remove dependence arcs. The second are abstraction transformations that d...

1 reference in code:

llvm/docs/DependenceGraphs/index.rst:139

The program dependence graph and its use in optimization

Jeanne Ferrante, Karl J. Ottenstein, Joe Warren

1987

1 reference

In this paper we present an intermediate program representation, called the program dependence graph (PDG), that makes explicit both the data and control dependences for each operation in a program. Data dependences have been used to represent only the relevant data flow relationships of a program...

1 reference in code:

llvm/docs/DependenceGraphs/index.rst:140

Accurate garbage collection in an uncooperative environment.

Fergus Henderson

2002

1 reference

Previous attempts at garbage collection in uncooperative environments have generally used conservative or mostly-conservative approaches. We describe a technique for doing fully type-accurate garbage collection in an uncooperative environment, using a "shadow stack" to link structs of pointer-contai...

1 reference in code:

llvm/docs/GarbageCollection.rst:1027

Optimistic and Scalable Global Function Merging

Kyungwoo Lee, Manman Ren, Ellis Hoag

2024

1 reference

Function merging is a pivotal technique for reducing code size by combining identical or similar functions into a single function. While prior research has extensively explored this technique, it has not been assessed in conjunction with function outlining and linker’s identical code folding, despit...

1 reference in code:

llvm/include/llvm/CodeGen/GlobalMergeFunctions.h:21

Optimizing Function Layout for Mobile Applications

Ellis Hoag, Kyungwoo Lee, Julián Mestre, Sergey Pupyrev

2022

9 citations

2 references

Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance of large-scale applications or reduce the compressed size of ...

1 reference in code:

llvm/include/llvm/Support/BalancedPartitioning.h:35

On-line construction of suffix trees

Esko Ukkonen

1995

2 references

1 reference in code:

llvm/include/llvm/Support/SuffixTree.h:26

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, D...

2017

6 references

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be impl...

1 reference in code:

mlir/docs/Quantization.md:11

Machine Learning Systems are Stuck in a Rut

Paul Barham, Michael Isard

2019

1 reference

In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it harder to explore innovative machine learning research...

1 reference in code:

mlir/docs/Rationale/RationaleLinalgDialect.md:317

Combining analyses, combining optimizations

Cliff Click, Keith D. Cooper

1995

1 reference

Modern optimizing compilers use several passes over a program's intermediate representation to generate good code. Many of these optimizations exhibit a phase-ordering problem. Getting the best code may require iterating optimizations until a fixed point is reached. Combining these phases can lead t...

1 reference in code:

mlir/docs/Rationale/RationaleLinalgDialect.md:588

ompTest – Unit Testing with OMPT

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva R. Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieb...

2024

1 reference

OpenMP® is a widely used API in high-performance computing that enables parallelization on the host as well as offload work to an accelerator, such as a GPU. The OpenMP specification defines an OpenMP Tool Interface (OMPT), which allows a third-party tool to be notified about OpenMP runtime events. Ens...

1 reference in code:

openmp/tools/omptest/README.md:276

OMPTBench – OpenMP Tool Interface Conformance Testing

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva R. Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieb...

2024

1 reference

OpenMP® is a highly relevant parallelization standard in high-performance computing and all major compiler vendors support it. The standard defines the OpenMP Tool Interface (OMPT) as a mechanism for third-party tools to obtain information on dedicated runtime events. However, the implementation sta...

1 reference in code:

openmp/tools/omptest/README.md:279
