Papers
Browse academic papers referenced in production code
The Complex Gradient Operator and the CR-Calculus
A thorough discussion and development of the calculus of real-valued functions of complex-valued vectors is given using the framework of the Wirtinger Calculus. The presented material is suitable for exposition in an introductory Electrical Engineeri...
Novel Table Lookup-Based Algorithms for High-Performance CRC Generation
A framework for designing a family of novel fast cyclic redundancy code (CRC) generation algorithms is presented. Our algorithms can ideally read arbitrarily large amounts of data at a time, while optimizing their memory requirement to meet the const...
On Linear DETs.
This paper investigates the properties of a popular ROC variant - the detection error trade-off plot (DET). In particular, we derive a set of conditions on the underlying probability distributions to produce linear DET plots in a generalized setting....
Practical type inference for arbitrary-rank types
Haskell's popularity has driven the need for ever more expressive type system features, most of which threaten the decidability and practicality of Damas-Milner type inference. One such feature is the ability to write functions with higher-r...
Strictly Proper Scoring Rules, Prediction, and Estimation
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for ...
Label Propagation and Quadratic Criterion.
This chapter shows how the different graph-based algorithms for semi-supervised learning can be cast into a common framework where one minimizes a quadratic cost criterion whose closed-form solution is found by solving a linear system of siz...
Less hashing, same performance: Building a better Bloom filter
A standard technique from the hashing literature is to use two hash functions h_1(x) and h_2(x) to simulate additional hash functions of the form g_i(x) = h_1(x) + i h_2(x). We demonstrate that this technique can be usefully a...
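The two-hash construction named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `BloomFilter`, the use of SHA-256 to derive the two base hashes, and the reduction modulo the table size are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter that derives k indexes from two base hashes."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        # One byte per bit keeps the sketch simple (not memory-efficient).
        self.bits = bytearray(num_bits)

    def _base_hashes(self, item):
        # Assumption: split one SHA-256 digest into two 64-bit values
        # that play the roles of h_1(x) and h_2(x).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return h1, h2

    def _indexes(self, item):
        # g_i(x) = h_1(x) + i * h_2(x), reduced modulo the table size.
        h1, h2 = self._base_hashes(item)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def __contains__(self, item):
        return all(self.bits[idx] for idx in self._indexes(item))


bf = BloomFilter(num_bits=1024, num_hashes=4)
bf.add(b"alpha")
print(b"alpha" in bf)
```

Only two hash values are computed per item regardless of `num_hashes`; the remaining indexes come from integer arithmetic.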
Training linear SVMs in linear time.
Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These ...
Least angle regression
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collect...
Learning to Find Pre-Images.
Travel arrangements and flight ticket booking via internet is widely used nowadays and follow already certain standards. Although increasing activity for multimedia/web education components can be observed, we are far away from standards in this impo...
Xorshift RNGs
Description of a class of simple, extremely fast random number generators (RNGs) with periods 2^k - 1 for k = 32, 64, 96, 128, 160, 192. These RNGs seem to pass tests of randomness very well.
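As a rough illustration of the k = 32 case, the sketch below implements one 32-bit xorshift step. The abstract above does not list shift constants; the triple (13, 17, 5) used here is an assumption taken from the commonly cited 32-bit variant, and the names `xorshift32` and `MASK32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep intermediate values within 32 bits


def xorshift32(state):
    """Advance a 32-bit xorshift state by one step.

    Assumes the shift triple (13, 17, 5); the state must be nonzero,
    because zero maps to itself.
    """
    x = state & MASK32
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x


# Generate a few successive values from a nonzero seed.
state = 1
for _ in range(3):
    state = xorshift32(state)
    print(state)
```

The all-zero state is the single excluded value, which is why the period is 2^32 - 1 rather than 2^32.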
32-bit cyclic redundancy codes for Internet applications
Standardized 32-bit Cyclic Redundancy Codes provide fewer bits of guaranteed error detection than they could, achieving a Hamming Distance (HD) of only 4 for maximum-length Ethernet messages, whereas HD=6 is possible. Although research has revealed i...
Small sample corrections for LTS and MCD
The least trimmed squares estimator and the minimum covariance determinant estimator (Rousseeuw, 1984) are frequently used robust estimators of regression and of location and scatter. Consistency factors can be computed for both methods to make the es...
Missing value estimation methods for DNA microarrays.
Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For examp...
Special Invited Paper-Additive logistic regression: A statistical view of boosting
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the...