Showing 10 of 10 papers

ompTest – Unit Testing with OMPT

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberm...
2024
1 reference

OpenMP® is a widely used API in high-performance computing that enables parallelization on the host as well as offload work to an accelerator, such as a GPU. The OpenMP specification defines an OpenMP Tool Interface (OMPT), which allows a third-party tool be notified about OpenMP runtime events. Ens...

OMPTBench – OpenMP Tool Interface Conformance Testing

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberm...
2024
1 citation
1 reference

OpenMP® is a highly relevant parallelization standard in high-performance computing and all major compiler vendors support it. The standard defines the OpenMP Tool Interface (OMPT) as a mechanism for third-party tools to obtain information on dedicated runtime events. However, the implementation sta...

An abstract interpretation for SPMD divergence on reducible control flow graphs

Julian Rosemann, Simon Moll, Sebastian Hack
2021
8 citations
1 reference

Vectorizing compilers employ divergence analysis to detect at which program point a specific variable is uniform, i.e. has the same value on all SPMD threads that execute this program point. They exploit uniformity to retain branching to counter branch divergence and defer computations to scalar pro...

The program dependence graph and its use in optimization

Jeanne Ferrante, Karl J. Ottenstein, Joe D. Warren
1987
1 reference

<jats:p>In this paper we present an intermediate program representation, called the<jats:italic>program dependence graph</jats:italic>(<jats:italic>PDG</jats:italic>), that makes explicit both the data and control dependences for each operation in a program. Data dependences have been used to repres...

Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.6

Dan Bonachea, Katherine Rasmussen
2025
1 reference

This document specifies an interface to support the multi-image parallelism features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a solution in which a runtime library is primarily responsible for implementing coarray allocation, deallocation and accesses, image synch...

Memory Tagging and how it improves C/C++ memory safety

Kostya Serebryany, Evgenii Stepanov, Aleksey Shlyapnikov, Vlad Tsyrklevich, Dmitry Vyukov
2018
2 references

Memory safety in C and C++ remains largely unresolved. A technique usually called "memory tagging" may dramatically improve the situation if implemented in hardware with reasonable overhead. This paper describes two existing implementations of memory tagging: one is the full hardware implementation ...

Optimizing Function Layout for Mobile Applications

Ellis Hoag, Kyungwoo Lee, Julián Mestre, Sergey Pupyrev
2022
1 reference

Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance of large-scale applications or reduce the compressed size of ...

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Srinivas Sridharan, Taekyung Heo, Louis Feng, Zhaodong Wang, Matt Bergeron, Wenyin Fu, Shengbao Zhen...
2023
4 references

Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g. MLPerf, play an essential role in enabling fair comparison across different software and hardware stacks especially once syste...

Reliable and fast DWARF-based stack unwinding

T. Bastian, Stephen Kell, Francesco Zappa Nardelli
2019
2 references

Debug information, usually encoded in the DWARF format, is a hidden and obscure component of our computing infrastructure. Debug information is obviously used by debuggers, but it also plays a key role in program analysis tools, and, most surprisingly, it can be relied upon by the runtime of high-le...

Machine Learning Systems are Stuck in a Rut

P. Barham, M. Isard
2019
76 citations
1 reference

In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it harder to explore innovative machine learning research...