Papers - PaperGrep

Efficient Simulation of the von Mises Distribution

D. J. Best, N. I. Fisher

1979

2 references


An Efficient Method for Generating Discrete Random Variables with General Distributions

A.J. Walker

1977

2 references

Author: Alastair J. Walker, Department of Electrical Engineering, University of Witwatersrand, 1 Jan Smuts Ave., Johannesburg 2001, Sou...


The differentiation of pseudo-inverses and non-linear least squares problems whose variables separate

Gene H. Golub, Víctor Pereyra

1972

2 references

For given data $(t_i ,y_i ),i = 1, \cdots ,m$, we consider the least squares fit of nonlinear models of the form \[ \eta ({\bf a},{\boldsymbol \alpha} ;t) = \sum _{j = 1}^n {a_j \varphi _j ({\boldsymbol \alpha} ;t),\qquad {\bf a} \in \mathcal{R}^n ,\...


On Grouping for Maximum Homogeneity

Walter D. Fisher

1958

2 references

Abstract Given a set of arbitrary numbers, what is a practical procedure for grouping them so that the variance within groups is minimized? An answer to this question, including a description of an automatic computer program, is given for problems up...


VERIFICATION OF FORECASTS EXPRESSED IN TERMS OF PROBABILITY

Glenn W. Brier

1950

2 references


Evolutionary-scale prediction of atomic level protein structure with a language model

Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Ver...

2 references

Abstract Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Character...


A Stop-the-World Debugger for Erlang (and the BEAM)

Daniel Gorin, Björn Gustavsson, Roberto Aloi

2025

1 reference

Erlang and the BEAM are remarkable for their tracing capabilities and the type of troubleshooting this enables on live production systems. At other stages of the development cycle, though, a traditional debugger is arguably more natural and convenien...


Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization

Vage Egiazarian, Roberto L. Castro, Denis Kuznedelev, Andrei Panferov, Eldar Kurtic, Shubhra Pandit,...

2025

1 reference

The recent hardware-accelerated microscaling 4-bit floating-point formats such as MXFP4 and NVFP4, supported on NVIDIA and AMD GPUs, promise to revolutionize large language model (LLM) inference. Yet, their practical benefits remain unproven. We pres...


Decoding the Molecular Language of Proteins with Evolla

Xibin Zhou, Chenchen Han, Yingqi Zhang, Jin Su, Kai Zhuang, Shiyu Jiang, Zichen Yuan, Wei Zheng, Fen...

2025

1 reference

Abstract Proteins, nature’s intricate molecular machines, are the products of billions of years of evolution and play fundamental roles in sustaining life. Yet, deciphering their molecular language - that is, understanding how protein sequences and s...


Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.6

Dan Bonachea, Katherine Rasmussen

2025

1 reference

This document specifies an interface to support the multi-image parallelism features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a solution in which a runtime library is primarily responsible for implementing coarray ...


Recipes for Pre-training LLMs with MXFP8

Asit Mishra, Dusan Stosic, Simon Layton, Paulius Micikevicius

2025

1 reference

Using fewer bits to represent model parameters and related tensors during pre-training has become a required technique for improving GPU efficiency without sacrificing accuracy. Microscaling (MX) formats introduced in NVIDIA Blackwell generation of G...


Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram

2025

1 reference

Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. ...


Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler

Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Tianqi Li, Haojie Duanmu, Ren...

2025

1 reference

In this report, we propose Triton-distributed, an extension of the existing Triton compiler, to overcome the programming challenges in distributed AI systems. Triton-distributed is the first compiler that supports native overlapping optimizations for dis...


Compact Language Models via Pruning and Knowledge Distillation

Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary...

2024

1 reference

Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-train...


Deriving Activation Functions Using Integration

Allen Hao Huang, Imanol Schlag

2024

1 reference

Our work proposes a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding activation functions using integration. We introduce the Expanded Integral of the Exponential Linear Unit (xIELU), a tr...


Lightweight, Modular Verification for WebAssembly-to-Native Instruction Selection

Alexa VanHattum, Monica Pardeshi, Chris Fallin, Adrian Sampson, Fraser Brown

2024

1 reference

Language-level guarantees, like module runtime isolation for WebAssembly (Wasm), are only as strong as the compiler that produces a final, native-machine-specific executable. The process of lowering language-level constructions to ISA-specific inst...


MOLPIPx: an end-to-end differentiable package for permutationally invariant polynomials in Python and Rust

Manuel S. Drehwald, Asma Jamali, Rodrigo A. Vargas–Hernández

2024

1 reference

In this work, we present MOLPIPx, a versatile library designed to seamlessly integrate Permutationally Invariant Polynomials (PIPs) with modern machine learning frameworks, enabling the efficient development of linear models, neural networks, and Gau...


OMPTBench – OpenMP Tool Interface Conformance Testing

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva R. Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieb...

2024

1 reference

OpenMP® is a highly relevant parallelization standard in high-performance computing and all major compiler vendors support it. The standard defines the OpenMP Tool Interface (OMPT) as a mechanism for third-party tools to obtain information on dedicat...


ompTest – Unit Testing with OMPT

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva R. Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieb...

2024

1 reference

OpenMP® is a widely used API in high-performance computing that enables parallelization on the host as well as offload work to an accelerator, such as a GPU. The OpenMP specification defines an OpenMP Tool Interface (OMPT), which allows a third-party...


Optimistic and Scalable Global Function Merging

Kyungwoo Lee, Manman Ren, Ellis Hoag

2024

1 reference

Function merging is a pivotal technique for reducing code size by combining identical or similar functions into a single function. While prior research has extensively explored this technique, it has not been assessed in conjunction with function out...
