⚙️

Classical Compilers

Production implementations of compiler algorithms and techniques

Repositories

(4)

gcc-mirror/gcc

14 papers

llvm/llvm-project

29 papers

rust-lang/rust

20 papers

swiftlang/swift

11 papers

Papers

(70)

32-bit cyclic redundancy codes for Internet applications

P. Koopman
2002
1 reference

Standardized 32-bit Cyclic Redundancy Codes provide fewer bits of guaranteed error detection than they could, achieving a Hamming Distance (HD) of only 4 for maximum-length Ethernet messages, whereas HD=6 is possible. Although research has revealed i...

8-bit Numerical Formats for Deep Neural Networks

Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi
2022
12 references

Given the current trend of increasing size and complexity of machine learning architectures, it has become of critical importance to identify new approaches to improve the computational efficiency of model training. In this context, we address the ad...

Accurate garbage collection in an uncooperative environment.

Fergus Henderson
2002
1 reference

Previous attempts at garbage collection in uncooperative environments have generally used conservative or mostly-conservative approaches. We describe a technique for doing fully type-accurate garbage collection in an uncooperative environment, using ...

Alias burying: Unique variables without destructive reads

J. Boyland
2001
1 reference

Abstract An unshared object can be accessed without regard to possible conflicts with other parts of a system, whether concurrent or single‐threaded. A unique variable (sometimes known as a ‘free’ or ‘linear’ variable) is one that either is null or e...

An abstract interpretation for SPMD divergence on reducible control flow graphs

Julian Rosemann, Simon Moll, Sebastian Hack
2021
1 reference

Vectorizing compilers employ divergence analysis to detect at which program point a specific variable is uniform, i.e. has the same value on all SPMD threads that execute this program point. They exploit uniformity to retain branching to counter bran...

AN ANALYSIS OF THE LANCZOS GAMMA APPROXIMATION

Glendon Ralph Pugh
2004
1 reference

This thesis is an analysis of C . Lanczos' approximation of the classical gamma function Γ(z + 1) as given in his 1964 paper "A Precision Approximation of the Gamma Function". The purposes of this study are: (i) to explain the details of Lanczos' pap...

An Experimental Study of Dynamic Dominators

Loukas Georgiadis, Giuseppe F. Italiano, Luigi Laura, Federico Santaroni
2016
20 citations
2 references

Motivated by recent applications of dominator computations, we consider the problem of dynamically maintaining the dominators of flow graphs through a sequence of insertions and deletions of edges. Our main theoretical contribution is a simple increm...

A Time and Space-Efficient Compositional Method for Prime and Test Paths Generation.

Ebrahim Fazli, Mohsen Afsharchi
2019
1 reference

This paper investigates the problem of prime and test paths generation, which is an important problem in ensuring path coverage in software testing. Most existing methods for prime/test paths generation have little success in generating the set of al...

BlockQuicksort: Avoiding Branch Mispredictions in Quicksort

2016
1 reference

Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort - which is commonly considered as one of the fastest in-place sorting algorithms - suffers in an essential way from branch mispredictions. We present a novel approach to a...

Combining analyses, combining optimizations

C. Click, K. Cooper
1995
1 reference

Modern optimizing compilers use several passes over a program's intermediate representation to generate good code. Many of these optimizations exhibit a phase-ordering problem. Getting the best code may require iterating optimizations until a fixed p...

Concurrent Hash Tables

Tobias Maier, P. Sanders, Roman Dementiev
2019
1 reference

Concurrent hash tables are one of the most important concurrent data structures, which are used in numerous applications. For some applications, it is common that hash table accesses dominate the execution time. To efficiently solve these problems in...

Consistently faster and smaller compressed bitmaps with Roaring

Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser
2016
1 reference

Compressed bitmap indexes are used in databases and search engines. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). However, on unsorted data, we can get superior performance with a hy...

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
2015
9 references

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual function...

Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator

Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf
2018
1 reference

Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural network by backp...

Dependence Graphs and Compiler Optimizations.

David J. Kuck, Robert H. Kuhn, David A. Padua, Bruce Leasure, Michael Wolfe
1981
1 reference

Dependence graphs can be used as a vehicle for formulating and implementing compiler optimizations. This paper defines such graphs and discusses two kinds of transformations. The first are simple rewriting transformations that remove dependence arcs....

Dynamic storage allocation: A survey and critical review

Paul R. Wilson, Mark S. Johnstone, Michael J. Neely, David B. Boles
1995
1 reference

Efficient Implementation of Lattice Operations.

Hassan Aït-Kaci, Robert S. Boyer, Patrick Lincoln, Roger Nasr
1989
1 reference

Lattice operations such as greatest lower bound (GLB), least upper bound (LUB), and relative complementation (BUTNOT) are becoming more and more important in programming languages supporting object inheritance. We present a general technique for the ...

Emulation of a FMA and Correctly Rounded Sums: Proved Algorithms Using Rounding to Odd

Sylvie Boldo, Guillaume Melquiond
2008
1 reference

International audience

Experience Report: Developing the Servo Web Browser Engine using Rust

2015
1 reference

All modern web browsers - Internet Explorer, Firefox, Chrome, Opera, and Safari - have a core rendering engine written in C++. This language choice was made because it affords the systems programmer complete control of the underlying hardware feature...

Fast and scalable minimal perfect hashing for massive key sets

Antoine Limasset, Guillaume Rizk, Rayan Chikhi, Pierre Peterlongo
2017
1 reference

Minimal perfect hash functions provide space-efficient and collision-free hashing on static sets. Existing algorithms and implementations that build such functions have practical limitations on the number of input elements they can process, due to hi...

Fast Deterministic Selection.

Andrei Kulikovsky
2017
1 reference

Faster Remainder by Direct Computation: Applications to Compilers and Software Libraries

Daniel Lemire, Owen Kaser, Nathan Kurz
2019
1 reference

On common processors, integer multiplication is many times faster than integer division. Dividing a numerator n by a divisor d is mathematically equivalent to multiplication by the inverse of the divisor (n / d = n x 1/d). If the divisor is known in ...

Fast Random Integer Generation in an Interval

Daniel Lemire
2018
2 references

In simulations, probabilistic algorithms and statistical tests, we often generate random integers in an interval (e.g., [0,s)). For example, random integers in an interval are essential to the Fisher-Yates random shuffle. Consequently, popular langua...

Fast splittable pseudorandom number generators

G. Steele, D. Lea, Christine H. Flood
2014
2 references

We describe a new algorithm SplitMix for an object-oriented and splittable pseudorandom number generator (PRNG) that is quite fast: 9 64-bit arithmetic/logical operations per 64 bits generated. A conventional linear PRNG object provides a generate me...

FP8 Formats for Deep Learning

Paulius Micikevicius, Dusan Stosic, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwait...
2022
14 references

FP8 is a natural progression for accelerating deep learning training inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings - E4M3 ...

GPU Accelerated Automatic Differentiation With Clad

Ioana Ifrim, Vassil Vassilev, David J Lange
2022
1 reference

Automatic Differentiation (AD) is instrumental for science and industry. It is a tool to evaluate the derivative of a function specified through a computer program. The range of AD application domain spans from Machine Learning to Robotics to High En...

Graph-Based Algorithms for Boolean Function Manipulation

R. Bryant
1986
1 reference

In this paper we present a new data structure for representing Boolean functions and an associated set of manipulation algorithms. Functions are represented by directed, acyclic graphs in a manner similar to the representations introduced by Lee [1] ...

How to make ad-hoc polymorphism less ad hoc

P. Wadler, Stephen Blott
1989
1 reference

This paper presents type classes, a new approach to ad-hoc polymorphism. Type classes permit overloading of arithmetic operators such as multiplication, and generalise the "eqtype variables" of Standard ML. Type classes extend the Hindley/Milner poly...

How to read floating point numbers accurately

William D. Clinger
1990
1 reference

Consider the problem of converting decimal scientific notation for a number into the best binary floating point approximation to that number, for some fixed precision. This problem cannot be solved using arithmetic of any fixed precision. Hence the I...

Implementing a Generic Radix Trie in Rust

Michael Cochez, Ferrante Neri
2015
1 reference

IR2VEC

S. VenkataKeerthy, Rohit Aggarwal, Shalini Jain, M. Desarkar, Ramakrishna Upadrasta, Y. Srikant
2020
4 references

We propose IR2VEC, a Concise and Scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow information to captu...

Machine Learning Systems are Stuck in a Rut

P. Barham, M. Isard
2019
1 reference

In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it hard...

Macros that Work Together

2012
3 references

Abstract Racket is a large language that is built mostly within itself. Unlike the usual approach taken by non-Lisp languages, the self-hosting of Racket is not a matter of bootstrapping one implementation through a previous implementation, but inste...

Memory Tagging and how it improves C/C++ memory safety

Kostya Serebryany, Evgenii Stepanov, Aleksey Shlyapnikov, Vlad Tsyrklevich, Dmitry Vyukov
2018
2 references

Memory safety in C and C++ remains largely unresolved. A technique usually called "memory tagging" may dramatically improve the situation if implemented in hardware with reasonable overhead. This paper describes two existing implementations of memory...

MOLPIPx: an end-to-end differentiable package for permutationally invariant polynomials in Python and Rust

Manuel S. Drehwald, Asma Jamali, Rodrigo A. Vargas-Hernández
2024
1 reference

In this work, we present MOLPIPx, a versatile library designed to seamlessly integrate Permutationally Invariant Polynomials (PIPs) with modern machine learning frameworks, enabling the efficient development of linear models, neural networks, and Gau...

Newton’s Method Without Division

Jeffrey D. Blanchard, M. Chamberland
2023
4 citations
1 reference

Abstract Newton’s Method for root-finding is modified to avoid the division step while maintaining quadratic convergence.

Number Parsing at a Gigabyte per Second

Daniel Lemire
2021
7 references

With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires variable-pr...

OMPTBench – OpenMP Tool Interface Conformance Testing

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberm...
2024
1 reference

OpenMP® is a highly relevant parallelization standard in high-performance computing and all major compiler vendors support it. The standard defines the OpenMP Tool Interface (OMPT) as a mechanism for third-party tools to obtain information on dedicat...

ompTest – Unit Testing with OMPT

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberm...
2024
1 reference

OpenMP® is a widely used API in high-performance computing that enables parallelization on the host as well as offload work to an accelerator, such as a GPU. The OpenMP specification defines an OpenMP Tool Interface (OMPT), which allows a third-party...

Optimistic and Scalable Global Function Merging

Kyungwoo Lee, Manman Ren, Ellis Hoag
2024
1 reference

Function merging is a pivotal technique for reducing code size by combining identical or similar functions into a single function. While prior research has extensively explored this technique, it has not been assessed in conjunction with function out...

Optimization of cyclic redundancy-check codes with 24 and 32 parity bits

Guy Castagnoli, Stefan Brauer, M. Herrmann
1993
1 reference

The method developed by T. Fujiwara et al. (1985) for efficiently computing the minimum distance of shortened Hamming codes using the weight distribution of their dual codes is extended to treat arbitrary shortened cyclic codes. Using this method imp...

Optimizing Function Layout for Mobile Applications

Ellis Hoag, Kyungwoo Lee, Julián Mestre, Sergey Pupyrev
2022
9 citations
1 reference

Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance of large-sc...

Oxide: The Essence of Rust

Aaron Weiss, Olek Gierczak, Daniel Patterson, Amal Ahmed
2019
1 reference

Rust claims to advance industrial programming by bridging the gap between low-level systems programming and high-level application programming. At the heart of the argument that this enables programmers to build more reliable and efficient software i...

Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.6

Dan Bonachea, Katherine Rasmussen
2025
1 reference

This document specifies an interface to support the multi-image parallelism features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a solution in which a runtime library is primarily responsible for implementing coarray ...

Polymorphism, subtyping, and type inference in MLsub

Stephen Dolan, Alan Mycroft
2017
1 reference

We present a type system combining subtyping and ML-style parametric polymorphism. Unlike previous work, our system supports type inference and has compact principal types. We demonstrate this system in the minimal language MLsub, which types a stric...

Practical type inference for arbitrary-rank types

Simon L. Peyton Jones, Dimitrios Vytiniotis, Stephanie Weirich, Mark Shields
2007
2 references

Abstract Haskell's popularity has driven the need for ever more expressive type system features, most of which threaten the decidability and practicality of Damas-Milner type inference. One such feature is the ability to write functions with higher-r...

Printing floating-point numbers: a faster, always correct method

Marc Andrysco, Ranjit Jhala, Sorin Lerner
2016
1 reference

Floating-point numbers are an essential part of modern software, recently gaining particular prominence on the web as the exclusive numeric format of Javascript. To use floating-point numbers, we require a way to convert binary machine representation...

Printing floating-point numbers quickly and accurately with integers

Florian Loitsch
2010
1 reference

We present algorithms for accurately converting floating-point numbers to decimal representation. They are fast (up to 4 times faster than commonly used algorithms that use high-precision integers) and correct: any printed number will evaluate to the...

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, D...
2017
6 references

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried ou...

Reenix: Implementing a Unix-Like Operating System in Rust

2015
1 reference

This paper describes the experience, problems and successes found in implementing a unix-like operating system kernel in rust. Using the basic design and much of the lowest-level support code from the Weenix operating system written for CS167/9 I was...

RL4ReAl: Reinforcement Learning for Register Allocation

S. VenkataKeerthy, Siddhartha Jain, Rohit Aggarwal, Albert Cohen, Ramakrishna Upadrasta
2022
4 references

We aim to automate decades of research and experience in register allocation, leveraging machine learning. We tackle this problem by embedding a multi-agent reinforcement learning algorithm within LLVM, training it with the state of the art technique...

RLIBM-PROG: Progressive Polynomial Approximations for Fast Correctly Rounded Math Libraries

Mridul Aanjaneya, Jay P. Lim, Santosh Nagarakatte
2021
2 references

This paper presents a novel method for generating a single polynomial approximation that produces correctly rounded results for all inputs of an elementary function for multiple representations. The generated polynomial approximation has the nice pro...

Rust as a language for high performance GC implementation

Yi Lin, Stephen M. Blackburn, Antony L. Hosking, Michael Norrish
2018
1 reference

High performance garbage collectors build upon performance-critical low-level code, typically exhibit multiple levels of concurrency, and are prone to subtle bugs. Implementing, debugging and maintaining such collectors can therefore be extremely cha...

Ryū: fast float-to-string conversion

Ulf Adams
2018
2 references

We present Ryū, a new routine to convert binary floating point numbers to their decimal representations using only fixed-size integer operations, and prove its correctness. Ryū is simpler and approximately three times faster than the previously faste...

Ryū revisited: printf floating point conversion

Ulf Adams
2019
2 references

Ryū Printf is a new algorithm to convert floating-point numbers to decimal strings according to the printf %f, %e, and %g formats: %f generates ‘full’ output (integer part of the input, dot, configurable number of digits), %e generates scientific out...

Scrambled Linear Pseudorandom Number Generators

David Blackman, Sebastiano Vigna
2018
1 reference

$\mathbf F_2$-linear pseudorandom number generators are very popular due to their high speed, to the ease with which generators with a sizable state space can be created, and to their provable theoretical properties. However, they suffer from linear ...

String-rewriting systems

1993
1 reference

In this chapter we introduce the string-rewriting systems and study their basic properties. Such systems are the primary subject of this work. We provide formal definitions of string-rewriting systems and their induced reduction relations and Thue co...

SyRust: automatic testing of Rust libraries with semantic-aware program synthesis

Yoshiki Takashima, Ruben Martins, Limin Jia, C. Păsăreanu
2021
1 reference

Rust’s type system ensures the safety of Rust programs; however, programmers can side-step some of the strict typing rules by using the unsafe keyword. A common use of unsafe Rust is by libraries. Bugs in these libraries undermine the safety of the e...

Taming the wildcards: combining definition- and use-site variance

John Altidor, S. Huang, Y. Smaragdakis
2011
1 reference

Variance allows the safe integration of parametric and subtype polymorphism. Two flavors of variance, definition-site versus use-site variance, have been studied and have had their merits hotly debated. Definition-site variance (as in Scala and C#) o...

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S....
2018
3 references

Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user p...

The program dependence graph and its use in optimization

1987
1 reference

In this paper we present an intermediate program representation, called the program dependence graph ( PDG ), that makes explicit both the data and control dependences for each operation in a program. Data dependences have been used to represent only...

The simple essence of automatic differentiation

Conal Elliott
2018
1 reference

Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep understan...

Three layer cake for shared-memory programming

A. Robison, Ralph E. Johnson
2010
1 reference

There are many different styles of parallel programming for shared-memory hardware. Each style has strengths, but can conflict with other styles. How can we use a variety of these styles in one program and minimize their conflict and maximize perform...

Work-first and help-first scheduling policies for async-finish task parallelism

Yi Guo, R. Barik, Raghavan Raman, Vivek Sarkar
2009
1 reference

Multiple programming models are emerging to address an increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel computing. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsof...

Xorshift RNGs

2003
1 reference

Description of a class of simple, extremely fast random number generators (RNGs) with periods 2k - 1 for k = 32, 64, 96, 128, 160, 192. These RNGs seem to pass tests of randomness very well.