Papers
Browse academic papers referenced in production code
Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method
We describe new algorithms of the locally optimal block preconditioned conjugate gradient (LOBPCG) method for symmetric eigenvalue problems, based on a local optimization of a three-term recurrence, and suggest several other new methods. To be able t...
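A minimal usage sketch of the LOBPCG method as shipped in SciPy (scipy.sparse.linalg.lobpcg): the test matrix, block size, preconditioner, and tolerances below are illustrative choices, not taken from the paper.

```python
# Hedged usage sketch: find the smallest eigenpairs of a sparse symmetric
# matrix with LOBPCG and a simple Jacobi (diagonal) preconditioner.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n = 1000
# Symmetric positive-definite test matrix: 1-D Laplacian.
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Diagonal (Jacobi) preconditioner M ~ inv(diag(A)).
M = sp.diags(1.0 / A.diagonal())

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 4))          # block of 4 initial guesses

eigvals, eigvecs = lobpcg(A, X, M=M, largest=False, tol=1e-8, maxiter=200)
print(eigvals)                            # 4 smallest eigenvalues of A
```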
Asymptotics for the minimum covariance determinant estimator
Consistency is shown for the minimum covariance determinant (MCD) estimators of multivariate location and scale and asymptotic normality is shown for the former. The proofs are made possible by showing a separating ellipsoid property for the MCD subs...
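The paper is about the estimator's asymptotic theory rather than computation, but for context, a hedged usage sketch of an off-the-shelf MCD implementation (sklearn.covariance.MinCovDet) on synthetic, made-up data:

```python
# Hedged usage sketch: MCD gives robust location/scatter estimates despite
# a planted block of outliers. Data and contamination level are illustrative.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=500)
X[:25] += 8.0                        # contaminate 5% of the sample

mcd = MinCovDet(random_state=0).fit(X)
print(mcd.location_)                 # robust location estimate (near the origin)
print(mcd.covariance_)               # robust scatter estimate
print(mcd.support_.sum())            # size of the subset the estimate is based on
```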
Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using $\mathbb{F}_2$
Efficient tensor computation is a cornerstone of modern deep learning (DL) workloads, yet existing approaches struggle to achieve flexible and performant design and implementation of tensor layouts -- mappings between logical tensors and hardware res...
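A hedged illustration of the core idea (not the paper's compiler implementation): treat the bits of a logical index as a vector over F_2 and map them to hardware-coordinate bits with a 0/1 matrix, so layouts compose by matrix multiplication mod 2. The function and layout below are toy names of my own.

```python
import numpy as np

def apply_layout(L: np.ndarray, index: int, nbits: int) -> int:
    """Map an index through the F_2-linear layout L (nbits x nbits, entries 0/1)."""
    bits = np.array([(index >> i) & 1 for i in range(nbits)], dtype=np.uint8)
    out = (L @ bits) % 2
    return int(sum(int(b) << i for i, b in enumerate(out)))

nbits = 4
identity = np.eye(nbits, dtype=np.uint8)
swizzle = identity.copy()
swizzle[0, 2] = 1                      # toy XOR-swizzle: out bit 0 = bit 0 XOR bit 2

for i in range(8):
    print(i, "->", apply_layout(swizzle, i, nbits))

# Composing two layouts is just matrix multiplication over F_2:
composed = (swizzle @ swizzle) % 2
```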
Muon is Scalable for LLM Training
Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but the scalability to larger models has not been proven. We identify two crucial techniques for scaling up Muon: ...
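A hedged sketch of the orthogonalization step at the heart of Muon: each matrix-shaped momentum update is replaced by an approximately orthogonal matrix. This uses a plain cubic Newton-Schulz iteration toward the orthogonal polar factor; the actual optimizer uses a tuned polynomial iteration plus the scaling techniques discussed in the paper, and the function names and hyperparameters here are illustrative.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate U V^T (where G = U S V^T) via Newton-Schulz iterations."""
    X = G / (G.norm() + 1e-7)          # scale so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X.T if transposed else X

def muon_step(W, grad, M, lr=0.02, beta=0.95):
    """One Muon-style update on a weight matrix W with momentum buffer M."""
    M.mul_(beta).add_(grad)
    W.add_(newton_schulz_orthogonalize(M), alpha=-lr)
```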
DeepSeek-V3 Technical Report
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) an...
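For orientation only, a hedged sketch of generic top-k expert routing as used in MoE layers; it omits DeepSeek-V3's shared experts, auxiliary-loss-free load balancing, and MLA entirely, and all sizes and class names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2, d_ff=128):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)      # per-token expert choice
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

y = TopKMoE()(torch.randn(10, 64))
```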
DoRA: Weight-Decomposed Low-Rank Adaptation
Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because of avoiding additional inference costs. However, there still often exists an accuracy gap between these methods an...
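A hedged sketch of the weight decomposition behind DoRA: split a pretrained weight into a magnitude vector and a direction matrix, update the direction with a LoRA-style low-rank delta, and learn the magnitude separately. The class name is my own, and the normalization axis (column-wise norm) follows my reading of the paper; consult the reference implementation for exact details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        W0 = base.weight.detach()                  # (out_features, in_features), frozen
        self.register_buffer("W0", W0)
        self.bias = base.bias
        # Magnitude: one scalar per column of the weight (i.e. per input feature).
        self.m = nn.Parameter(W0.norm(dim=0, keepdim=True))           # (1, in_features)
        # LoRA factors for the directional update.
        self.A = nn.Parameter(torch.randn(rank, W0.shape[1]) * 0.01)
        self.B = nn.Parameter(torch.zeros(W0.shape[0], rank))

    def forward(self, x):
        V = self.W0 + self.B @ self.A                       # updated direction (unnormalized)
        W = self.m * V / V.norm(dim=0, keepdim=True)        # renormalize columns, rescale
        return F.linear(x, W, self.bias)

layer = DoRALinear(nn.Linear(32, 16), rank=4)
y = layer(torch.randn(5, 32))
```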
Module-Lattice-Based Key-Encapsulation Mechanism Standard
A key-encapsulation mechanism (KEM) is a set of algorithms that, under certain conditions, can be used by two parties to establish a shared secret key over a public channel. A shared secret key that is securely established using a KEM can then be use...
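A hedged sketch of the three-algorithm KEM interface the standard defines (KeyGen, Encaps, Decaps) and how two parties use it. For illustration only it is instantiated as a Diffie-Hellman-style KEM over X25519 from the `cryptography` package; none of FIPS 203's module-lattice math appears here, and the function names are mine.

```python
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey, X25519PublicKey,
)
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def kdf(secret: bytes) -> bytes:
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"toy-kem").derive(secret)

def keygen():
    sk = X25519PrivateKey.generate()
    return sk.public_key(), sk                     # (encapsulation key, decapsulation key)

def encaps(pk: X25519PublicKey):
    eph = X25519PrivateKey.generate()
    shared_key = kdf(eph.exchange(pk))             # sender's copy of the shared secret
    ciphertext = eph.public_key()                  # "ciphertext" = ephemeral public key
    return ciphertext, shared_key

def decaps(sk: X25519PrivateKey, ciphertext: X25519PublicKey) -> bytes:
    return kdf(sk.exchange(ciphertext))            # receiver recovers the same secret

# Two parties establish a shared key over a public channel:
pk, sk = keygen()              # receiver publishes pk
ct, k_sender = encaps(pk)      # sender transmits ct
k_receiver = decaps(sk, ct)
assert k_sender == k_receiver
```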
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi...
Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses
Cloud data warehouses are today's standard for analytical query processing. Multiple cloud vendors offer state-of-the-art systems, such as Amazon Redshift. We have observed that customer workloads experience highly repetitive query patterns, i.e., us...
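A hedged, toy illustration of the idea (not Amazon Redshift's implementation): on the first execution of a scan predicate, remember which blocks contained qualifying rows; on repeated queries with the same predicate, scan only those blocks, and invalidate the cache when data changes. All class and method names are my own.

```python
from typing import Callable, Dict, List

class PredicateCache:
    def __init__(self, blocks: List[List[dict]]):
        self.blocks = blocks                       # table stored as a list of row blocks
        self.cache: Dict[str, List[int]] = {}      # predicate key -> qualifying block ids

    def scan(self, key: str, predicate: Callable[[dict], bool]) -> List[dict]:
        block_ids = self.cache.get(key, range(len(self.blocks)))   # fall back to full scan
        hits, qualifying = [], []
        for b in block_ids:
            rows = [r for r in self.blocks[b] if predicate(r)]
            if rows:
                qualifying.append(b)
            hits.extend(rows)
        self.cache[key] = qualifying               # remember where matches live
        return hits

    def append_block(self, rows: List[dict]) -> None:
        self.blocks.append(rows)
        self.cache.clear()                         # conservative invalidation on ingest

table = PredicateCache(
    [[{"id": i, "country": "DE" if i % 3 else "US"} for i in range(b * 4, b * 4 + 4)]
     for b in range(3)]
)
us = table.scan("country = 'US'", lambda r: r["country"] == "US")
us_again = table.scan("country = 'US'", lambda r: r["country"] == "US")   # scans fewer blocks
```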
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy,...
Evaluating YJIT’s Performance in a Production Context: A Pragmatic Approach
Ruby is a dynamically typed programming language with a large breadth of features; it has grown in popularity with the rise of the modern web and remains at the core of the implementation of widely-used online platforms such as Shopify, GitHub, Di...
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks
Growing applications of large language models (LLMs) trained by third parties raise serious concerns about the security vulnerabilities of LLMs. It has been demonstrated that malicious actors can covertly exploit these vulnerabilities in LLMs through poiso...
QLoRA: Efficient Finetuning of Quantized LLMs
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bi...
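A hedged sketch of the mechanism (not the bitsandbytes kernels): the base weight is stored quantized and frozen, dequantized on the fly in the forward pass, and gradients flow only into the small trainable LoRA matrices. Plain symmetric 4-bit quantization stands in for the paper's NF4 with double quantization, and the class and function names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_4bit(w: torch.Tensor):
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0        # per-row absmax scaling
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale                                         # 4-bit values held in int8 for simplicity

class QLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16):
        super().__init__()
        q, scale = quantize_4bit(base.weight.detach())
        self.register_buffer("q_weight", q)                 # frozen quantized base weight
        self.register_buffer("scale", scale)
        if base.bias is not None:
            self.register_buffer("bias", base.bias.detach())  # bias stays frozen too
        else:
            self.bias = None
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)   # trainable
        self.B = nn.Parameter(torch.zeros(out_f, rank))          # trainable, starts at zero

    def forward(self, x):
        W = self.q_weight.float() * self.scale               # dequantize on the fly
        return F.linear(x, W, self.bias) + x @ self.A.T @ self.B.T

layer = QLoRALinear(nn.Linear(64, 64))
loss = layer(torch.randn(8, 64)).sum()
loss.backward()                                              # only A and B receive gradients
```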
Breadth-First Pipeline Parallelism
We introduce Breadth-First Pipeline Parallelism, a novel training schedule which optimizes the combination of pipeline and data parallelism. Breadth-First Pipeline Parallelism lowers training time, cost and memory usage by combining a high GPU utiliz...
Reducing Activation Recomputation in Large Transformer Models
Training large transformer models is one of the most important computational challenges of modern AI. In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation. Activation recompu...
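A hedged sketch of the general activation-recomputation (checkpointing) mechanism using torch.utils.checkpoint; the paper's actual contribution, selective recomputation combined with sequence parallelism inside Megatron-LM, is not reproduced here, this only shows the memory-for-compute trade-off being tuned.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
    for _ in range(8)
)

def forward(x: torch.Tensor, recompute: bool) -> torch.Tensor:
    for block in blocks:
        if recompute:
            # Activations inside `block` are discarded after the forward pass
            # and recomputed during backward, trading extra FLOPs for memory.
            x = checkpoint(block, x, use_reentrant=False)
        else:
            x = block(x)
    return x

x = torch.randn(4, 1024, requires_grad=True)
forward(x, recompute=True).sum().backward()
```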
Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode
Fast compilation is important when compilation occurs at runtime, such as query compilers in modern database systems and WebAssembly virtual machines in modern browsers. We present copy-and-patch, an extremely fast compilation technique that also pro...
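A loose, hedged Python analogy of the structure only (the real technique stitches together pre-compiled machine-code stencils and patches binary holes): each bytecode op has a pre-built stencil function with holes for its immediate operands, and "compilation" merely copies stencil references and patches the holes, producing a straight list of thunks to run. All names here are mine.

```python
from functools import partial

# Pre-built stencils, one per opcode; holes are the keyword arguments.
def push_const(stack, value): stack.append(value)
def add(stack):               stack.append(stack.pop() + stack.pop())
def mul_const(stack, value):  stack.append(stack.pop() * value)

STENCILS = {"PUSH": push_const, "ADD": add, "MUL": mul_const}

def compile_program(bytecode):
    """Copy the stencil for each op and patch its operand hole, if any."""
    thunks = []
    for op, *operand in bytecode:
        stencil = STENCILS[op]
        thunks.append(partial(stencil, value=operand[0]) if operand else stencil)
    return thunks

def run(thunks):
    stack = []
    for thunk in thunks:
        thunk(stack)
    return stack.pop()

program = compile_program([("PUSH", 2), ("PUSH", 3), ("ADD",), ("MUL", 10)])
print(run(program))    # (2 + 3) * 10 = 50
```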