Showing 20 of 613 papers

An Efficient Method for Generating Discrete Random Variables with General Distributions

A.J. Walker
1977
2 references

Author: Alastair J. Walker, Department of Electrical Engineering, University of Witwatersrand, 1 Jan Smuts Ave., Johannesburg 2001, Sou...
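The technique this entry introduced is now widely known as the alias method: after an O(n) table-building pass, any discrete distribution over n outcomes can be sampled in O(1) time. A minimal sketch in Python (using Vose's later, numerically friendlier table construction rather than Walker's original presentation; `probs` is assumed to be a normalized probability vector):

```python
import random

def build_alias(probs):
    """Build alias tables for O(1) sampling (Vose's variant of Walker's method)."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l      # cell s is split between s and l
        scaled[l] -= 1.0 - scaled[s]          # l donates the remainder of cell s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                   # numerical leftovers fill whole cells
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias, rng=random):
    """Draw one outcome: pick a cell uniformly, then a biased coin flip."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Each draw costs one uniform integer, one uniform float, and one comparison, independent of the number of outcomes.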

The differentiation of pseudo-inverses and non-linear least squares problems whose variables separate

Gene H. Golub, Víctor Pereyra
1972
2 references

For given data $(t_i, y_i), i = 1, \cdots, m$, we consider the least squares fit of nonlinear models of the form \[ \eta({\bf a}, {\boldsymbol \alpha}; t) = \sum_{j = 1}^n a_j \varphi_j({\boldsymbol \alpha}; t), \qquad {\bf a} \in \mathcal{R}^n, \...
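The observation behind the title: for fixed nonlinear parameters α, the linear coefficients a solve an ordinary linear least-squares problem, so the outer minimization only has to search over α. A toy sketch of that separation with NumPy (the two-exponential model and the data here are illustrative, not from the paper):

```python
import numpy as np

# Separable model: eta(a, alpha; t) = a1*exp(-alpha1*t) + a2*exp(-alpha2*t).
# For fixed alpha, the optimal a is a linear least-squares solve, so the outer
# search only explores alpha -- the variable projection idea.

def basis(alpha, t):
    return np.column_stack([np.exp(-al * t) for al in alpha])

def projected_residual(alpha, t, y):
    Phi = basis(alpha, t)
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # optimal linear coefficients
    r = y - Phi @ a
    return r @ r, a

t = np.linspace(0, 4, 50)
y = 2.0 * np.exp(-1.0 * t) + 0.5 * np.exp(-3.0 * t)  # noiseless synthetic data
loss, a = projected_residual([1.0, 3.0], t, y)       # at the true alpha
```

At the true α, the projected residual is essentially zero and the linear solve recovers a = (2.0, 0.5); an outer optimizer would minimize `loss` over α alone.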

On Grouping for Maximum Homogeneity

Walter D. Fisher
1958
2 references

Abstract Given a set of arbitrary numbers, what is a practical procedure for grouping them so that the variance within groups is minimized? An answer to this question, including a description of an automatic computer program, is given for problems up...
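Fisher's key structural result is that an optimal partition uses groups that are contiguous in sorted order, which makes an exact dynamic program possible. A small sketch of such a DP (the O(k·n²) formulation and names here are ours, not the paper's computer program):

```python
def group_cost(prefix, prefix_sq, i, j):
    """Within-group sum of squared deviations for sorted items i..j-1."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n

def best_grouping(values, k):
    """Split values into k groups, contiguous in sorted order, minimizing
    the total within-group sum of squares."""
    xs = sorted(values)
    n = len(xs)
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)
    INF = float("inf")
    # dp[g][j] = min cost of splitting the first j items into g groups
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = dp[g - 1][i] + group_cost(prefix, prefix_sq, i, j)
                if c < dp[g][j]:
                    dp[g][j], cut[g][j] = c, i
    groups, j = [], n            # walk the cut table backwards
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(xs[i:j])
        j = i
    return dp[k][n], groups[::-1]
```

For example, `best_grouping([1, 2, 10, 11, 20], 3)` yields the groups [1, 2], [10, 11], [20] with total within-group sum of squares 1.0.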

Evolutionary-scale prediction of atomic level protein structure with a language model

Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Ver...
2 references

Abstract Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Character...

A Stop-the-World Debugger for Erlang (and the BEAM)

Daniel Gorin, Björn Gustavsson, Roberto Aloi
2025
1 reference

Erlang and the BEAM are remarkable for their tracing capabilities and the type of troubleshooting this enables on live production systems. At other stages of the development cycle, though, a traditional debugger is arguably more natural and convenien...

Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization

Vage Egiazarian, Roberto L. Castro, Denis Kuznedelev, Andrei Panferov, Eldar Kurtic, Shubhra Pandit,...
2025
1 reference

The recent hardware-accelerated microscaling 4-bit floating-point formats such as MXFP4 and NVFP4, supported on NVIDIA and AMD GPUs, promise to revolutionize large language model (LLM) inference. Yet, their practical benefits remain unproven. We pres...
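For readers unfamiliar with microscaling formats: each small block of values (32 elements in MXFP4) shares a single power-of-two scale, and each element is stored as a 4-bit FP4 (E2M1) code. A toy round-trip sketch of that scheme (the scale-selection rule is a plausible simplification for illustration, not the paper's kernels):

```python
import math

# Representable E2M1 (FP4) magnitudes; sign is a separate bit.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_dequantize(values, block=32):
    """Quantize each block to FP4 codes sharing one power-of-two scale,
    then dequantize, returning the reconstructed values."""
    out = []
    for start in range(0, len(values), block):
        blk = values[start:start + block]
        amax = max(abs(v) for v in blk)
        if amax == 0:
            out.extend(blk)
            continue
        # Smallest power-of-two scale mapping the block max into FP4 range.
        scale = 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1]))
        for v in blk:
            mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
            out.append(math.copysign(mag * scale, v))
    return out
```

Values that land exactly on a scaled grid point round-trip losslessly (e.g. 6.0, 3.0, -1.5 with scale 1), while in-between values snap to the nearest representable magnitude, which is the source of the quantization error the paper studies.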

Decoding the Molecular Language of Proteins with Evolla

Xibin Zhou, Chenchen Han, Yingqi Zhang, Jin Su, Kai Zhuang, Shiyu Jiang, Zichen Yuan, Wei Zheng, Fen...
2025
1 reference

Abstract Proteins, nature’s intricate molecular machines, are the products of billions of years of evolution and play fundamental roles in sustaining life. Yet, deciphering their molecular language - that is, understanding how protein sequences and s...

Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.6

Dan Bonachea, Katherine Rasmussen
2025
1 reference

This document specifies an interface to support the multi-image parallelism features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a solution in which a runtime library is primarily responsible for implementing coarray ...

Recipes for Pre-training LLMs with MXFP8

Asit Mishra, Dusan Stosic, Simon Layton, Paulius Micikevicius
2025
1 reference

Using fewer bits to represent model parameters and related tensors during pre-training has become a required technique for improving GPU efficiency without sacrificing accuracy. Microscaling (MX) formats introduced in NVIDIA Blackwell generation of G...

Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram
2025
1 reference

Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. ...
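The "bounded entropy" idea in the abstract can be illustrated with a simple truncation rule: keep the largest probability-sorted prefix of the vocabulary whose renormalized distribution stays under an entropy cap. This is only an illustration of entropy-bounded truncation, not necessarily the paper's exact Top-H algorithm:

```python
import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

def entropy_capped_truncation(probs, h_max):
    """Illustrative rule: grow the kept set in probability order, stopping
    before the renormalized entropy would exceed h_max."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept = [order[0]]                       # always keep the argmax token
    for i in order[1:]:
        cand = kept + [i]
        z = sum(probs[j] for j in cand)
        if entropy([probs[j] / z for j in cand]) > h_max:
            break
        kept.append(i)
    z = sum(probs[j] for j in kept)
    return {j: probs[j] / z for j in kept}  # renormalized sampling distribution
```

With `h_max = 0` this collapses to greedy decoding (only the argmax survives), while a loose cap leaves the distribution untouched, so the cap interpolates between coherence and diversity.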

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler

Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Tianqi Li, Haojie Duanmu, Ren...
2025
1 reference

In this report, we propose Triton-distributed, an extension of the existing Triton compiler, to overcome the programming challenges in distributed AI systems. Triton-distributed is the first compiler that supports native overlapping optimizations for dis...

Compact Language Models via Pruning and Knowledge Distillation

Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary...
2024
1 reference

Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-train...

Deriving Activation Functions Using Integration

Allen Hao Huang, Imanol Schlag
2024
1 reference

Our work proposes a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding activation functions using integration. We introduce the Expanded Integral of the Exponential Linear Unit (xIELU), a tr...
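The recipe named in the title (choose a desired gradient, then integrate it to obtain the activation) can be illustrated with a classic pair: picking the logistic sigmoid as the gradient and integrating gives softplus, log(1 + eˣ). This shows only the general recipe, checked by finite differences; xIELU itself is derived from an ELU-based gradient:

```python
import math

def sigmoid(x):
    """The gradient we choose as the design target."""
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    """Its antiderivative: integrating sigmoid(x) yields log(1 + e^x)."""
    return math.log1p(math.exp(x))

# Central-difference check that softplus' == sigmoid at a few points.
h = 1e-6
for x in (-2.0, 0.0, 1.5):
    fd = (softplus(x + h) - softplus(x - h)) / (2 * h)
    assert abs(fd - sigmoid(x)) < 1e-6
```

Designing in gradient space like this lets properties such as boundedness or monotonicity of the derivative be imposed directly, with the activation following mechanically from integration.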

Lightweight, Modular Verification for WebAssembly-to-Native Instruction Selection

Alexa VanHattum, Monica Pardeshi, Chris Fallin, Adrian Sampson, Fraser Brown
2024
1 reference

Language-level guarantees---like module runtime isolation for WebAssembly (Wasm)---are only as strong as the compiler that produces a final, native-machine-specific executable. The process of lowering language-level constructions to ISA-specific inst...

MOLPIPx: an end-to-end differentiable package for permutationally invariant polynomials in Python and Rust

Manuel S. Drehwald, Asma Jamali, Rodrigo A. Vargas–Hernández
2024
1 reference

In this work, we present MOLPIPx, a versatile library designed to seamlessly integrate Permutationally Invariant Polynomials (PIPs) with modern machine learning frameworks, enabling the efficient development of linear models, neural networks, and Gau...

OMPTBench – OpenMP Tool Interface Conformance Testing

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva R. Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieb...
2024
1 reference

OpenMP® is a highly relevant parallelization standard in high-performance computing and all major compiler vendors support it. The standard defines the OpenMP Tool Interface (OMPT) as a mechanism for third-party tools to obtain information on dedicat...

ompTest – Unit Testing with OMPT

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva R. Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieb...
2024
1 reference

OpenMP® is a widely used API in high-performance computing that enables parallelization on the host as well as offload work to an accelerator, such as a GPU. The OpenMP specification defines an OpenMP Tool Interface (OMPT), which allows a third-party...

Optimistic and Scalable Global Function Merging

Kyungwoo Lee, Manman Ren, Ellis Hoag
2024
1 reference

Function merging is a pivotal technique for reducing code size by combining identical or similar functions into a single function. While prior research has extensively explored this technique, it has not been assessed in conjunction with function out...