Showing 20 of 232 papers

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun
2013
2 references

We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep l...

Proximal Stochastic Dual Coordinate Ascent

Shai Shalev-Shwartz, Tong Zhang
2012
2 references

We introduce a proximal version of dual coordinate ascent method. We demonstrate how the derived algorithmic framework can be used for numerous regularized loss minimization problems, including $\ell_1$ regularization and structured output SVM. The c...

The Complex Gradient Operator and the CR-Calculus

Ken Kreutz-Delgado
2009
2 references

A thorough discussion and development of the calculus of real-valued functions of complex-valued vectors is given using the framework of the Wirtinger Calculus. The presented material is suitable for exposition in an introductory Electrical Engineeri...

Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method.

Andrew V. Knyazev
2001
2 references

We describe new algorithms of the locally optimal block preconditioned conjugate gradient (LOBPCG) method for symmetric eigenvalue problems, based on a local optimization of a three-term recurrence, and suggest several other new methods. To be able t...

Compression and Structure of Monolayers of Charged Latex Particles at Air/Water and Octane/Water Interfaces

{"Robert Aveyard","John H. Clint","Dieter Nees","Vesselin N. Paunov"}
1999
2 references

We have studied the compression and structure of compressed monolayers of sulfate polystyrene latex particles on air/water and octane/water interfaces. If compressed sufficiently (on a Langmuir trough) the monolayers at air/water surfaces give rafts ...

Estimating the mean and variance of the target probability distribution

D. Nix, A. Weigend
1994
2 references

Introduces a method that estimates the mean and the variance of the probability distribution of the target as a function of the input, given an assumed target error-distribution model. Through the activation of an auxiliary output unit, this method p...

The differentiation of pseudo-inverses and non-linear least squares problems whose variables separate

G. Golub, V. Pereyra
1972
2 references

For given data $(t_i ,y_i ),i = 1, \cdots ,m$, we consider the least squares fit of nonlinear models of the form \[ \eta ({\bf a},{\boldsymbol \alpha} ;t) = \sum _{j = 1}^n {a_j \varphi _j ({\boldsymbol \alpha} ;t),\qquad {\bf a} \in \mathcal{R}^n ,\...

Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.6

Dan Bonachea, Katherine Rasmussen
2025
1 reference

This document specifies an interface to support the multi-image parallelism features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a solution in which a runtime library is primarily responsible for implementing coarray ...

OMPTBench – OpenMP Tool Interface Conformance Testing

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberm...
2024
1 reference

OpenMP® is a highly relevant parallelization standard in high-performance computing and all major compiler vendors support it. The standard defines the OpenMP Tool Interface (OMPT) as a mechanism for third-party tools to obtain information on dedicat...

ompTest – Unit Testing with OMPT

Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberm...
2024
1 reference

OpenMP® is a widely used API in high-performance computing that enables parallelization on the host as well as offload work to an accelerator, such as a GPU. The OpenMP specification defines an OpenMP Tool Interface (OMPT), which allows a third-party...

Optimistic and Scalable Global Function Merging

Kyungwoo Lee, Manman Ren, Ellis Hoag
2024
1 reference

Function merging is a pivotal technique for reducing code size by combining identical or similar functions into a single function. While prior research has extensively explored this technique, it has not been assessed in conjunction with function out...

Newton’s Method Without Division

Jeffrey D. Blanchard, M. Chamberland
2023
1 reference

Abstract Newton’s Method for root-finding is modified to avoid the division step while maintaining quadratic convergence.

GPU Accelerated Automatic Differentiation With Clad

Ioana Ifrim, Vassil Vassilev, David J Lange
2022
1 reference

Automatic Differentiation (AD) is instrumental for science and industry. It is a tool to evaluate the derivative of a function specified through a computer program. The range of AD application domain spans from Machine Learning to Robotics to High En...

Optimizing Function Layout for Mobile Applications

Ellis Hoag, Kyungwoo Lee, Julián Mestre, Sergey Pupyrev
2022
1 reference

Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance of large-sc...

RL4ReAl: Reinforcement Learning for Register Allocation

S. VenkataKeerthy, Siddharth Jain, Anilava Kundu, Rohit Aggarwal, Albert Cohen, Ramakrishna Upadrast...
2022
1 reference

We aim to automate decades of research and experience in register allocation, leveraging machine learning. We tackle this problem by embedding a multi-agent reinforcement learning algorithm within LLVM, training it with the state of the art technique...

An abstract interpretation for SPMD divergence on reducible control flow graphs

Julian Rosemann, Simon Moll, Sebastian Hack
2021
1 reference

Vectorizing compilers employ divergence analysis to detect at which program point a specific variable is uniform, i.e. has the same value on all SPMD threads that execute this program point. They exploit uniformity to retain branching to counter bran...

Number Parsing at a Gigabyte per Second

Daniel Lemire
2021
1 reference

With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires variable-pr...

Revisiting ResNets: Improved Training and Scaling Strategies

Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shle...
2021
1 reference

Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 201...

TensorFlow

TensorFlow Developers
2021
1 reference

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deplo...