Papers
Browse academic papers referenced in production code
Properties of the Hubert-Arabie adjusted Rand index.
This article provides an investigation of cluster validation indices that relates 4 of the indices to the L. Hubert and P. Arabie (1985) adjusted Rand index--the cluster validation measure of choice (G. W. Milligan & M. C. Cooper, 1986). It is shown ...
A simple algorithm for finding frequent elements in streams and bags
We present a simple, exact algorithm for identifying in a multiset the items with frequency more than a threshold θ. The algorithm requires two passes, linear time, and space 1/θ. The first pass is an on-line algorithm, generalizing a well-known algo...
Accuracy and stability of numerical algorithms, Second Edition.
From the Publisher: What is the most accurate way to sum floating point numbers? What are the advantages of IEEE arithmetic? How accurate is Gaussian elimination and what were the key breakthroughs in the development of error analysis for the method?...
Accurate garbage collection in an uncooperative environment.
Previous attempts at garbage collection in uncooperative environments have generally used conservative or mostly-conservative approaches. We describe a technique for doing fully type-accurate garbage collection in an uncooperative environment, using ...
Learning Precise Timing with LSTM Recurrent Networks.
In response to Rodriguez's recent article (2001), we compare the performance of simple recurrent nets and long short-term memory recurrent nets on context-free and context-sensitive languages.
Transforming classifier scores into accurate multiclass probability estimates
Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outpu...
Alias burying: Unique variables without destructive reads
An unshared object can be accessed without regard to possible conflicts with other parts of a system, whether concurrent or single‐threaded. A unique variable (sometimes known as a ‘free’ or ‘linear’ variable) is one that either is null or e...
LOF: Identifying Density-Based Local Outliers.
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a bin...
Sequential Karhunen-Loeve basis extraction and its application to images
The Karhunen-Loeve (KL) transform is an optimal method for approximating a set of vectors or images, which was used in image processing and computer vision for several tasks such as face and object recognition. Its computational demands and its batch...
Goodness of Fit and Related Inference Processes for Quantile Regression
We introduce a goodness-of-fit process for quantile regression analogous to the conventional $R^2$ statistic of least squares regression. Several related inference processes designed to test composite hypotheses about the combined effect of sev...
Learning to forget: continual prediction with LSTM
Long short-term memory (LSTM) can solve many tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams without explicitly marked sequence ends. ...
On the Complexity of Loop Fusion.
Loop fusion is a program transformation that combines several loops into one. It is used in parallelizing compilers mainly for increasing the granularity of loops and for improving data reuse. The goal of this paper is to study, from a theoretical po...
Fast recursive division
We present a new recursive method for division with remainder of integers. Its running time is $2K(n)+O(n \log n)$ for division of a $2n$-digit number by an $n$-digit number where $K(n)$ is the Karatsuba multiplication time. It pays in practice for...
Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator
A new algorithm called Mersenne Twister (MT) is proposed for generating uniform pseudorandom numbers. For a particular choice of parameters, the algorithm provides a super astronomical period of $2^{19937}-1$ and 623-dimensional equidistribution up to 3...