Papers
Browse academic papers referenced in production code
A Fast Algorithm for the Minimum Covariance Determinant Estimator
The minimum covariance determinant (MCD) method of Rousseeuw is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now, applicat...
Embedded post-processing for enhancement of compressed images
This paper presents a simple and effective post-processing method for compressed images. This work focuses on the cyclic time-variance introduced by block-based and subband transform coders. We propose an algorithm to (almost) restore stationarity to...
A comparison of event models for naive Bayes text classification
Distributional clustering of words for text classification. Authors: L. Douglas Baker, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA and Just Research, 4616 Henry Street, Pittsburgh, PA; School of Com...
Nonlinear Component Analysis as a Kernel Eigenvalue Problem
A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by...
The DET curve in assessment of detection task performance
We introduce the DET Curve as a means of representing performance on detection tasks that involve a tradeoff of error types. We discuss why we prefer it to the traditional ROC Curve and offer several examples of its use in speaker recognit...
Estimating the mean and variance of the target probability distribution
Introduces a method that estimates the mean and the variance of the probability distribution of the target as a function of the input, given an assumed target error-distribution model. Through the activation of an auxiliary output unit, this method p...
Optimization of cyclic redundancy-check codes with 24 and 32 parity bits
The method developed by T. Fujiwara et al. (1985) for efficiently computing the minimum distance of shortened Hamming codes using the weight distribution of their dual codes is extended to treat arbitrary shortened cyclic codes. Using this method imp...
How to read floating point numbers accurately
Consider the problem of converting decimal scientific notation for a number into the best binary floating point approximation to that number, for some fixed precision. This problem cannot be solved using arithmetic of any fixed precision. Hence the I...
Random sampling from hash files
In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files ...
The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles
The R-tree, one of the most popular access methods for rectangles, is based on the heuristic optimization of the area of the enclosing rectangle in each inner node. By running numerous experiments in a standardized testbed under highly varying data, ...
Least Median of Squares Regression
Classical least squares regression consists of minimizing the sum of the squared residuals. Many authors have produced more robust versions of this estimator by replacing the square by something else, such as the absolute value. In this arti...
R-trees: a dynamic index structure for spatial searching
In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations. However, tradit...