122 papers
72 files
244 references

Papers Referenced in This Repository

Introduction to information retrieval.

Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
2008
7 references

Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gatheri...

Show 7 references in code

Greedy function approximation: A gradient boosting machine.

Jerome H. Friedman
2001
6 references

Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for...

Show 6 references in code
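The boosting paradigm in the entry above is available in common libraries; as a minimal sketch (assuming scikit-learn for illustration, which may differ from how this repository uses the paper):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression task; each boosting stage fits a tree to the
# negative gradient of the squared-error loss, as in Friedman (2001).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
model.fit(X, y)
r2 = model.score(X, y)  # training R^2
```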

Automatic model construction with Gaussian processes

David Duvenaud
2014
4 references

This thesis develops a method for automatically constructing, visualizing and describing a large class of models, useful for forecasting and finding structure in domains such as time series, geological formations, and physical dynamics. These models, based on Gaussian processes, can capture many typ...

Show 4 references in code

Making and Evaluating Point Forecasts

T. Gneiting
2009
4 references

Typically, point forecasting methods are compared and assessed by means of an error measure or scoring function, with the absolute error and the squared error being key examples. The individual scores are averaged over forecast cases, to result in a summary measure of the predictive performance, suc...

Show 4 references in code

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, Jimmy Ba
2014
22 references

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescal...

Show 4 references in code
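The adaptive moment estimates described above can be sketched directly in NumPy (a didactic implementation, not the repository's optimizer):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moment estimates plus bias correction."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2 (gradient 2*theta) from a distant start.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```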

LIBLINEAR: A Library for Large Linear Classification

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin
2008
4 references

Show 4 references in code

A Comparative Analysis of Community Detection Algorithms on Artificial Networks

Zhao Yang, René Algesheimer, Claudio J. Tessone
2016
3 references

Many community detection algorithms have been developed to uncover the mesoscopic properties of complex networks. However how good an algorithm is, in terms of accuracy and computing time, remains still open. Testing algorithms on real-world network has certain restrictions which made their insights...

Show 3 references in code

Random Features for Large-Scale Kernel Machines

A. Rahimi, B. Recht
2007
3 references

Show 3 references in code
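Random Fourier features let a linear model approximate an RBF-kernel machine; a hedged sketch using scikit-learn's `RBFSampler` (assumed here for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression

# Random Fourier features approximate the RBF kernel, letting a linear
# classifier handle a nonlinearly separable problem.
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
rbf = RBFSampler(gamma=2.0, n_components=500, random_state=0)
X_feat = rbf.fit_transform(X)
clf = LogisticRegression(max_iter=1000).fit(X_feat, y)
train_acc = clf.score(X_feat, y)
```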

Commonly used software tools produce conflicting and overly-optimistic AUPRC values

Wenyu Chen, Chen Miao, Zhenghao Zhang, Cathy Sin-Hang Fung, Ran Wang, Yizhen Chen, Yan Qian, Lixin C...
2024
3 references

The precision-recall curve (PRC) and the area under the precision-recall curve (AUPRC) are useful for quantifying classification performance. They are commonly used in situations with imbalanced classes, such as cancer diagnosis and cell type annotation. We evaluate 10 popular tools for plotting PRC...

Show 3 references in code

Calibration of Machine Learning Models

Antonio Bella, Cèsar Ferri, José Hernández‐Orallo, María José Ramírez-Quintana
2012
3 references

The evaluation of machine learning models is a crucial step before their application because it is essential to assess how well a model will behave for every single case. In many real applications, not only is it important to know the “total” or the “average” error of the model, it is also important...

Show 3 references in code

On classification, ranking, and probability estimation.

Peter A. Flach, Edson T. Matsubara
2007
3 references

Show 3 references in code

On the “degrees of freedom” of the lasso

Hui Zou, Trevor Hastie, Robert Tibshirani
2007
3 references

We study the effective degrees of freedom of the lasso in the framework of Stein’s unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso—a conclusion that requires no special assumption on the predictors. In a...

Show 3 references in code

Asymptotics for the minimum covariance determinant estimator

Ronald W. Butler, P. L. Davies, Myoungshic Jhun
1993
3 references

Consistency is shown for the minimum covariance determinant (MCD) estimators of multivariate location and scale and asymptotic normality is shown for the former. The proofs are made possible by showing a separating ellipsoid property for the MCD subset of observations. An analogous property is shown...

Show 3 references in code

Training linear SVMs in linear time.

Thorsten Joachims
2006
2 references

Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n ...

Show 2 references in code

Self-Organizing Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down Syndrome

C. Higuera, K. Gardiner, K. Cios
2015
2 references

Down syndrome (DS) is a chromosomal abnormality (trisomy of human chromosome 21) associated with intellectual disability and affecting approximately one in 1000 live births worldwide. The overexpression of genes encoded by the extra copy of a normal chromosome in DS is believed to be sufficient to p...

Show 2 references in code

Accelerated Hierarchical Density Based Clustering

Leland McInnes, John Healy
2017
2 references

We present an accelerated algorithm for hierarchical density based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while ...

Show 2 references in code

Density-Based Clustering Based on Hierarchical Density Estimates

Ricardo J. G. B. Campello, Davoud Moulavi, Jörg Sander
2013
2 references
Show 2 references in code

Information theoretic measures for clusterings comparison: is a correction for chance necessary?

X. Nguyen, J. Epps, J. Bailey
2009
2 references

Information theoretic based measures form a fundamental class of similarity measures for comparing clusterings, beside the class of pair-counting based and set-matching based measures. In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clust...

Show 2 references in code
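The chance correction discussed above is what distinguishes adjusted mutual information from raw MI; a small illustration (using scikit-learn, an assumption for this sketch):

```python
from sklearn.metrics import adjusted_mutual_info_score

# AMI is invariant to label permutation and corrected for chance:
# a perfect clustering scores 1 even with swapped label names, while
# an uninformative clustering scores near 0 (or below).
ami_perm = adjusted_mutual_info_score([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
ami_uninformative = adjusted_mutual_info_score([0, 0, 1, 1], [0, 1, 0, 1])
```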

Kernel Principal Component Analysis.

Bernhard Schölkopf, Alexander J. Smola, Klaus‐Robert Müller
1997
2 references
Show 2 references in code

Learning to Find Pre-Images.

Gökhan H. Bakır, Jason Weston, Bernhard Schölkopf
2003
2 references

Show 2 references in code

An implementation of a randomized algorithm for principal component analysis.

Arthur Szlam, Yuval Kluger, Mark Tygert
2014
2 references

Show 2 references in code

On Grouping for Maximum Homogeneity

Walter D. Fisher
1958
2 references

Abstract Given a set of arbitrary numbers, what is a practical procedure for grouping them so that the variance within groups is minimized? An answer to this question, including a description of an automatic computer program, is given for problems up to the size where 200 numbers are to be placed in...

Show 2 references in code

MICE: Multivariate Imputation by Chained Equations in R

S. van Buuren, K. Groothuis-Oudshoorn
2011
2 references

The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which exten...

Show 2 references in code
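MICE itself is an R package, but the chained-equations idea it describes has a scikit-learn counterpart, `IterativeImputer` (used here as an assumed illustration, still flagged experimental):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Column 1 is twice column 0 in the observed rows, so regressing one
# column on the other (the chained-equations idea) recovers the gap.
X = np.array([[1.0, 2.0], [3.0, 6.0], [4.0, 8.0], [np.nan, 10.0]])
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)
```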

Missing value estimation methods for DNA microarrays.

Olga G. Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, Da...
2001
2 references

Abstract Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-...

Show 2 references in code
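The k-nearest-neighbor imputation evaluated in the paper above looks like this with scikit-learn's `KNNImputer` (an assumed dependency for the sketch):

```python
import numpy as np
from sklearn.impute import KNNImputer

# The missing value is filled with the mean of its 2 nearest rows;
# distances are computed on the coordinates observed in both rows.
X = np.array([[1.0, 2.0],
              [1.1, 2.1],
              [5.0, np.nan],
              [5.1, 6.2],
              [4.9, 6.0]])
X_filled = KNNImputer(n_neighbors=2).fit_transform(X)
# Row 2's neighbors are rows 3 and 4, so the fill is mean(6.2, 6.0) = 6.1.
```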

Efficient additive kernels via explicit feature maps

Andrea Vedaldi, Andrew Zisserman
2010
2 references

Maji and Berg have recently introduced an explicit feature map approximating the intersection kernel. This enables efficient learning methods for linear kernels to be applied to the non-linear intersection kernel, expanding the applicability of this model to much larger problems. In this paper we ge...

Show 2 references in code

Fast and scalable polynomial kernels via explicit feature maps

Ninh D. Pham, R. Pagh
2013
2 references

Approximation of non-linear kernels using random feature mapping has been successfully employed in large-scale data analysis applications, accelerating the training of kernel machines. While previous random feature mappings run in O(ndD) time for $n$ training samples in d-dimensional space and D ran...

Show 2 references in code

Least angle regression

Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani
2004
2 references

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to s...

Show 2 references in code
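The full LARS/lasso path described above can be traced with `lars_path` (a scikit-learn sketch, assumed for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=100, n_features=8, n_informative=3,
                       noise=1.0, random_state=0)
# method='lasso' gives the lasso regularization path via the LARS modification.
alphas, active, coefs = lars_path(X, y, method="lasso")
# coefs[:, 0] is the fully penalized start (all zeros); later columns trace
# coefficients entering the model as alpha decreases.
```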

Strictly Proper Scoring Rules, Prediction, and Estimation

Tilmann Gneiting, Adrian E. Raftery
2007
2 references

Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distributionF if he ...

Show 2 references in code

Design of the 2015 ChaLearn AutoML challenge.

Isabelle Guyon, Kristin P. Bennett, Gavin C. Cawley, Hugo Jair Escalante, Sérgio Escalera, Tin Kam H...
2015
2 references

ChaLearn is organizing the Automatic Machine Learning (AutoML) contest for IJCNN 2015, which challenges participants to solve classification and regression problems without any human intervention. Participants' code is automatically run on the contest servers to train and test learning machines. How...

Show 2 references in code

The relationship between Precision-Recall and ROC curves.

Jesse Davis, Mark Goadrich
2006
3 references

Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. We show that a deep conn...

Show 2 references in code

Macro F1 and Macro F1

Juri Opitz, Sebastian Burst
2019
2 references

The 'macro F1' metric is frequently used to evaluate binary, multi-class and multi-label classification problems. Yet, we find that there exist two different formulas to calculate this quantity. In this note, we show that only under rare circumstances the two computations can be considered equivalen...

Show 2 references in code
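The note's point, that two common "macro F1" formulas disagree, is easy to reproduce (scikit-learn assumed for the sketch):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]

# Formula 1: unweighted mean of per-class F1 (what average='macro' computes).
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
per_class = f1_score(y_true, y_pred, average=None, zero_division=0)

# Formula 2: harmonic mean of macro-averaged precision and recall.
p = precision_score(y_true, y_pred, average="macro", zero_division=0)
r = recall_score(y_true, y_pred, average="macro", zero_division=0)
alt_f1 = 2 * p * r / (p + r)
```

On this example the two values differ, which is exactly the equivalence failure the note describes.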

The DET curve in assessment of detection task performance.

Alvin F. Martin, G. Doddington, T. Kamm, Mark Ordowski, Mark A. Przybocki
1997
2 references

Abstract : We introduce the DET Curve as a means of representing performance on detection tasks that involve a tradeoff of error types. We discuss why we prefer it to the traditional ROC Curve and offer several examples of its use in speaker recognition and language recognition. We explain why it is...

Show 2 references in code

On Linear DETs.

Jiri Navratil, D. Klusacek
2007
2 references

This paper investigates the properties of a popular ROC variant - the detection error trade-off plot (DET). In particular, we derive a set of conditions on the underlying probability distributions to produce linear DET plots in a generalized setting. We show that the linear DETs on a normal deviate ...

Show 2 references in code

A comparison of event models for naive bayes text classification

A. McCallum, K. Nigam
1998
2 references

Show 2 references in code

Spam Filtering with Naive Bayes - Which Naive Bayes?

V. Metsis, Ion Androutsopoulos, G. Paliouras
2006
2 references
Show 2 references in code

Nonlinear Component Analysis as a Kernel Eigenvalue Problem.

Bernhard Schölkopf, Alexander J. Smola, Klaus‐Robert Müller
1998
2 references

A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all...

Show 2 references in code
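The kernel-eigenvalue formulation above underlies kernel PCA; a minimal sketch with scikit-learn's `KernelPCA` (an assumption for illustration):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles are not linearly separable in input space;
# kernel PCA with an RBF kernel extracts principal components in the
# induced feature space instead.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)
```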

Label Propagation and Quadratic Criterion.

Yoshua Bengio, Delalleau Olivier, Roux Nicolas Le
2006
2 references

Abstract This chapter shows how the different graph-based algorithms for semi-supervised learning can be cast into a common framework where one minimizes a quadratic cost criterion whose closed-form solution is found by solving a linear system of size n (total number of data points). The cost criter...

Show 2 references in code

Least Median of Squares Regression

Peter J. Rousseeuw
1984
2 references

Abstract Classical least squares regression consists of minimizing the sum of the squared residuals. Many authors have produced more robust versions of this estimator by replacing the square by something else, such as the absolute value. In this article a different approach is introduced in which th...

Show 2 references in code

A Fast Algorithm for the Minimum Covariance Determinant Estimator

Peter J. Rousseeuw, Katrien Van Driessen
1999
2 references

The minimum covariance determinant (MCD) method of Rousseeuw is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now, applications of the MCD were hampered by the computation t...

Show 2 references in code
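The FAST-MCD algorithm above is what robust covariance estimators implement; a sketch contrasting it with the empirical estimate (scikit-learn assumed):

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.RandomState(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=200)
X[:10] = rng.multivariate_normal([8.0, 8.0], np.eye(2), size=10)  # 5% outliers

emp = EmpiricalCovariance().fit(X)      # mean/covariance pulled by outliers
mcd = MinCovDet(random_state=0).fit(X)  # FAST-MCD; robust to the contamination
```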

Small sample corrections for LTS and MCD

Greet Pison, Stefan Van Aelst, Gert Willems
2002
2 references

The least trimmed squares estimator and the minimum covariance determinant estimator Rousseeuw (1984) are frequently used robust estimators of regression and of location and scatter. Consistency factors can be computed for both methods to make the estimators consistent at the normal model. However, ...

Show 2 references in code

Algorithms for Nonnegative Matrix Factorization with the β-Divergence.

Cédric Févotte, Jérôme Idier
2011
2 references

This letter describes algorithms for nonnegative matrix factorization (NMF) with the β-divergence (β-NMF). The β-divergence is a family of cost functions parameterized by a single shape parameter β that takes the Euclidean distance, the Kullback-Leibler divergence, and the Itakura-Saito divergence a...

Show 2 references in code

Modern Information Retrieval - the concepts and technology behind search, Second edition

R. Baeza-Yates, B. Ribeiro-Neto
2011
2 references

Show 2 references in code

Special Invited Paper-Additive logistic regression: A statistical view of boosting

Jerome H. Friedman, Trevor Hastie, Robert Tibshirani
2000
2 references

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many ...

Show 2 references in code

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Kaiming He, X. Zhang, Shaoqing Ren, Jian Sun
2015
9 references

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU ...

Show 2 references in code
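The PReLU paper above also introduced what is now called He (Kaiming) initialization; a NumPy sketch of that scheme (illustrative, not the repository's code):

```python
import numpy as np

# He initialization for a ReLU layer: zero-mean Gaussian with
# std = sqrt(2 / fan_in), chosen to keep activation variance stable
# through deep rectifier networks.
rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128
W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```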

A randomized algorithm for the decomposition of matrices

Per‐Gunnar Martinsson, Vladimir Rokhlin, Mark Tygert
2011
2 references
Show 2 references in code

A New Vector Partition of the Probability Score

A. H. Murphy
1973
1 reference

A new vector partition of the probability, or Brier, score (PS) is formulated and the nature and properties of this partition are described. The relationships between the terms in this partition and the terms in the original vector partition of the PS are indicated. The new partition consists of thr...

Show 1 reference in code

Statistical Foundations of Actuarial Learning and its Applications

Mario V. Wuthrich, M. Merz
2021
1 reference

The aim of this manuscript is to provide the mathematical and statistical foundations of actuarial learning. This is key to most actuarial tasks like insurance pricing, product development, claims reserving and risk management. The basic approach to these tasks is regression modeling. This manuscrip...

Show 1 reference in code

k-means++: the advantages of careful seeding

David Arthur, Sergei Vassilvitskii
2007
1 reference
Show 1 reference in code
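The careful-seeding idea above is the default `init` in scikit-learn's `KMeans` (assumed here as an illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
# init='k-means++' spreads the initial centers out proportionally to
# squared distance, so few restarts (n_init) are needed.
km = KMeans(n_clusters=4, init="k-means++", n_init=5, random_state=0).fit(X)
ari = adjusted_rand_score(y, km.labels_)
```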

V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure

Andrew Rosenberg, Julia Hirschberg
2007
1 reference

Show 1 reference in code

On Clustering Validation Techniques.

M. Halkidi, Yannis Batistakis, M. Vazirgiannis
2001
1 reference
Show 1 reference in code

Sparse inverse covariance estimation with the graphical lasso.

Jerome H. Friedman, Trevor Hastie, Robert Tibshirani
2008
1 reference

Abstract We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm—the graphical lasso—that is remarkably fast: It solves a 1000-node problem (∼500000 parameters) ...

Show 1 reference in code
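The graphical lasso recovers sparsity in the inverse covariance; a small sketch with scikit-learn's `GraphicalLasso` (an assumed dependency):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.RandomState(0)
# True precision matrix: variables 0 and 1 are partially correlated,
# variable 2 is independent of both (zero off-diagonal entries).
precision = np.array([[2.0, 0.9, 0.0],
                      [0.9, 2.0, 0.0],
                      [0.0, 0.0, 2.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(precision), size=500)
gl = GraphicalLasso(alpha=0.05).fit(X)
# The l1 penalty should shrink the truly-zero (0, 2) entry toward zero
# while keeping the genuine (0, 1) dependency.
```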

Stochastic variational inference.

Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley
2013
1 reference

Show 1 reference in code

Machine Learning Applications to Land and Structure Valuation

Michael Mayer, Steven C. Bourassa, Martin Hoesli, D. Scognamiglio
2022
1 reference

In some applications of supervised machine learning, it is desirable to trade model complexity with greater interpretability for some covariates while letting other covariates remain a “black box”. An important example is hedonic property valuation modeling, where machine learning techniques typical...

Show 1 reference in code

Generalized Boosted Models: A guide to the gbm package

G. Ridgeway
2006
1 reference

This article provides an introduction to ensemble statistical procedures as a special case of algorithmic methods. The discussion begins with classification and regression trees (CART) as a didactic device to introduce many of the key issues. Following the material on CART is a consideration of cros...

Show 1 reference in code

Feature hashing for large scale multitask learning.

Kilian Q. Weinberger, Anirban Dasgupta, John Langford, Alex Smola, Josh Attenberg
2009
1 reference

Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We d...

Show 1 reference in code

Stop Word Lists in Free Open-source Software Packages

J. Nothman, Hanmin Qin, R. Yurchak
2018
1 reference

Open-source software packages for language processing often include stop word lists. Users may apply them without awareness of their surprising omissions (e.g. “hasn’t” but not “hadn’t”) and inclusions (“computer”), or their incompatibility with a particular tokenizer. Motivated by issues raised abo...

Show 1 reference in code

Random Search for Hyper-Parameter Optimization

J. Bergstra, Yoshua Bengio
2012
1 reference
Show 1 reference in code
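Random search samples hyper-parameters from distributions instead of enumerating a grid; a sketch with `RandomizedSearchCV` (scikit-learn and SciPy assumed):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)
# Sample the regularization strength C log-uniformly over six decades.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e3)},
    n_iter=10, cv=3, random_state=0,
).fit(X, y)
```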

Statistical Analysis with Missing Data.

R. Little, D. Rubin
1989
1 reference
Show 1 reference in code

Generalized RBF feature maps for Efficient Detection

Sreekanth Vempati, Andrea Vedaldi, Andrew Zisserman, C. V. Jawahar
2010
1 reference

Kernel methods yield state-of-the-art performance in certain applications such as image classification and object detection. However, large scale problems require machine learning techniques of at most linear complexity and these are usually limited to linear kernels. This unfortunately rules out go...

Show 1 reference in code

Notes on Regularized Least Squares

Ryan M. Rifkin, Ross A. Lippert
2007
1 reference

This is a collection of information about regularized least squares (RLS). The facts here are not “new results”, but we have not seen them usefully collected together before. A key goal of this work is to demonstrate that with RLS, we get certain things “for free”: if we can solve a single supervise...

Show 1 reference in code

Generalized Linear Models

Peter McCullagh, J. A. Nelder
1989
1 reference
Show 1 reference in code

Performance Evaluation of RANSAC Family.

Sunglok Choi, Taemin Kim, Wonpil Yu
2009
1 reference

RANSAC (Random Sample Consensus) has been popular in regression problem with samples contaminated with outliers. It has been a milestone of many researches on robust estimators, but there are a few survey and performance analysis on them. This paper categorizes them on their objectives: being accura...

Show 1 reference in code

More on Multidimensional Scaling and Unfolding in R: smacof Version 2.

Patrick Mair, Patrick J. F. Groenen, Jan de Leeuw
2022
1 reference

The smacof package offers a comprehensive implementation of multidimensional scaling (MDS) techniques in R. Since its first publication (De Leeuw and Mair 2009b) the functionality of the package has been enhanced, and several additional methods, features and utilities were added. Major updates inclu...

Show 1 reference in code

Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets

Anna C. Belkina, Chris Ciccolella, R. Anno, Richard L. Halpert, Josef Spidlen, J. Snyder-Cappione
2019
1 reference

Accurate and comprehensive extraction of information from high-dimensional single cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires multiple heuristics and fails to produce clear repre...

Show 1 reference in code

Visualizing Data using t-SNE

Laurens van der Maaten, Geoffrey E. Hinton
2008
1 reference

Show 1 reference in code
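A minimal t-SNE embedding with scikit-learn's `TSNE` (an assumed illustration; the companion entry above discusses tuning its parameters):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_small = X[:200]
# perplexity balances local vs. global structure and is the main knob
# that automated-parameter work (e.g. opt-SNE) tries to tune.
emb = TSNE(n_components=2, perplexity=30,
           init="pca", random_state=0).fit_transform(X_small)
```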

Probabilistic Forecasting

Tilmann Gneiting, Matthias Katzfuss
2014
1 reference

A probabilistic forecast takes the form of a predictive probability distribution over future quantities or events of interest. Probabilistic forecasting aims to maximize the sharpness of the predictive distributions, subject to calibration, on the basis of the available information set. We formalize...

Show 1 reference in code

Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies

John D. Kelleher, Brian Mac Namee, Aoife D'Arcy
2015
1 reference

Show 1 reference in code

Precision-Recall-Gain Curves: PR Analysis Done Right.

Peter A. Flach, Meelis Kull
2015
1 reference

Show 1 reference in code

An experimental comparison of performance measures for classification

Cèsar Ferri, José Hernández‐Orallo, R. Modroiu
2009
1 reference
Show 1 reference in code

The Optimality of Naive Bayes.

Harry Zhang
2004
1 reference
Show 1 reference in code

Multidimensional Binary Search Trees Used for Associative Searching.

Jon Bentley
1975
1 reference

This paper develops the multidimensional binary search tree (or k -d tree, where k is the dimensionality of the search space) as a data structure for storage of information to be retrieved by associative searches. The k -d tree is defined and examples are given. It is shown to be quite efficient in ...

Show 1 reference in code

LOF: Identifying Density-Based Local Outliers.

Markus Breunig, Hans‐Peter Kriegel, Raymond T. Ng, Jörg Sander
2000
1 reference

For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for m...

Show 1 reference in code
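LOF's density-ratio idea is exposed in scikit-learn as `LocalOutlierFactor` (assumed here for illustration):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
X_inliers = rng.normal(0.0, 0.5, size=(100, 2))
X_outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])  # far from the cluster
X = np.vstack([X_inliers, X_outliers])

# LOF compares each point's local density with that of its neighbors,
# giving a degree of outlierness rather than a global threshold;
# fit_predict returns -1 for outliers and 1 for inliers.
labels = LocalOutlierFactor(n_neighbors=20).fit_predict(X)
```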

Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent

W. Xu
2011
1 reference

For large scale learning problems, it is desirable if we can obtain the optimal model parameters by going through the data in only one pass. Polyak and Juditsky (1992) showed that asymptotically the test performance of the simple average of the parameters obtained by stochastic gradient descent (SGD...

Show 1 reference in code

In Defense of One-Vs-All Classification

Ryan Rifkin, Aldebaro Klautau
2004
1 reference

Show 1 reference in code

Unsupervised Word Sense Disambiguation Rivaling Supervised Methods

David Yarowsky
1995
1 reference

This paper presents an unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations. The algorithm is based on two powerful constraints---that words tend to have o...

Show 1 reference in code

Predicting good probabilities with supervised learning.

Alexandru Niculescu-Mizil, Rich Caruana
2005
1 reference

We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. We show that maximum margin methods such as boosted trees and boosted stumps push probability mass away from 0 and 1 yielding a characteristic sigmoid shaped distortion in the ...

Show 1 reference in code

Shrinkage Algorithms for MMSE Covariance Estimation

Yilun Chen, Ami Wiesel, Yonina C. Eldar, Alfred O. Hero
2009
1 reference

We address covariance estimation in the sense of minimum mean-squared error (MMSE) for Gaussian samples. Specifically, we consider shrinkage methods which are suitable for high dimensional problems with a small number of samples (large p small n). First, we improve on the Ledoit-Wolf (LW) method by ...

Show 1 reference in code

Cascading classifiers.

Ethem Alpaydın, Fikret Gürgen
1998
1 reference
Show 1 reference in code

Incremental Learning for Robust Visual Tracking

David A. Ross, Jongwoo Lim, Ruei-Sung Lin, Ming-Hsuan Yang
2008
1 reference
Show 1 reference in code

Sequential Karhunen-Loeve basis extraction and its application to images

Avraham Levy, Michael Lindenbaum
2000
1 reference

The Karhunen-Loeve (KL) transform is an optimal method for approximating a set of vectors or images, which was used in image processing and computer vision for several tasks such as face and object recognition. Its computational demands and its batch calculation nature have limited its application. ...

Show 1 reference in code

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp
2009
13 references

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for pe...

Show 1 reference in code

Numerical Optimization

Jorge Nocedal, Stephen J. Wright
2006
1 reference
Show 1 reference in code

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

Noah Simon, Jerome H. Friedman, Trevor Hastie
2013
1 reference

In this paper we propose a blockwise descent algorithm for group-penalized multiresponse regression. Using a quasi-Newton framework we extend this to group-penalized multinomial regression. We give a publicly available implementation for these in R, and compare the speed of this algorithm to a compe...

Show 1 reference in code
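
The core primitive in blockwise descent for a group penalty is the group soft-thresholding (proximal) operator, which either zeroes a whole coefficient block or shrinks its norm:

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Prox of lam*||.||_2: zero the block, or shrink its norm by lam."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

small = group_soft_threshold(np.array([0.3, 0.4]), lam=1.0)  # norm 0.5 -> 0
big = group_soft_threshold(np.array([3.0, 4.0]), lam=1.0)    # norm 5 -> 4
```

The all-or-nothing behavior on whole blocks is what gives the group penalty its groupwise sparsity.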

Transforming classifier scores into accurate multiclass probability estimates

Bianca Zadrozny, Charles Elkan
2002
1 reference

Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs of other classifiers, or domain knowledge. Prev...

Show 1 reference in code
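
One coupling step studied in this line of work is combining calibrated one-vs-rest outputs into a single distribution by normalization; the calibration step itself (binning, isotonic regression) is omitted in this sketch:

```python
import numpy as np

def normalize_ovr(binary_probs):
    """Couple one-vs-rest probabilities into a multiclass distribution."""
    binary_probs = np.asarray(binary_probs, dtype=float)
    return binary_probs / binary_probs.sum(axis=1, keepdims=True)

P = normalize_ovr([[0.9, 0.3, 0.3],           # rows: examples, cols: classes
                   [0.1, 0.8, 0.1]])
```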

Convergence Theory for Preconditioned Eigenvalue Solvers in a Nutshell

M. Argentati, A. Knyazev, K. Neymeyr, E. Ovtchinnikov, M. Zhou
2014
1 reference

Preconditioned iterative methods for numerical solution of large matrix eigenvalue problems are increasingly gaining importance in various application areas, ranging from material sciences to data mining. Some of them, e.g., those using multilevel preconditioning for elliptic differential operators ...

Show 1 reference in code
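
The LOBPCG method associated with this line of work is available as `scipy.sparse.linalg.lobpcg`. A small sanity check on a dense SPD matrix with a known spectrum (matrix size, block size, and tolerances are illustrative, and no preconditioner is supplied):

```python
import numpy as np
from scipy.sparse.linalg import lobpcg

# SPD matrix with eigenvalues 1, 2, ..., 50 in a random orthonormal basis.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
A = Q @ np.diag(np.arange(1.0, 51.0)) @ Q.T

X = rng.normal(size=(50, 3))                   # random initial block
vals, vecs = lobpcg(A, X, largest=False, maxiter=200, tol=1e-8)
```

For genuinely large problems one would pass `A` as a sparse matrix or `LinearOperator` and supply a preconditioner `M`, which is where the convergence theory above applies.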

Inter-Coder Agreement for Computational Linguistics

Ron Artstein, Massimo Poesio
2008
1 reference

This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; ...

Show 1 reference in code
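
Cohen's kappa, one of the coefficients surveyed above, is short enough to state directly (pure standard library):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / n**2          # expected by chance
    return (p_o - p_e) / (1 - p_e)

kappa = cohens_kappa(list("AABBA"), list("AABBB"))       # 4/5 agreement
```

Here p_o = 0.8 and p_e = 0.48, so kappa = 0.32/0.52 ≈ 0.615, noticeably lower than raw agreement, which is the point of chance correction.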

Pattern Recognition and Machine Learning

Christopher Bishop
2006
1 reference
Show 1 reference in code

Comparing partitions

Lawrence J. Hubert, Phipps Arabie
1985
1 reference
Show 1 reference in code

Properties of the Hubert-Arabie adjusted Rand index.

D. Steinley
2004
1 reference

This article provides an investigation of cluster validation indices that relates 4 of the indices to the L. Hubert and P. Arabie (1985) adjusted Rand index--the cluster validation measure of choice (G. W. Milligan & M. C. Cooper, 1986). It is shown how these other indices can be "roughly" transform...

Show 1 reference in code
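
Both entries above concern the Hubert-Arabie adjusted Rand index, which can be computed from the contingency table of the two partitions; a plain NumPy sketch:

```python
import numpy as np

def adjusted_rand_index(a, b):
    """Hubert-Arabie chance-corrected Rand index from the contingency table."""
    a, b = np.asarray(a), np.asarray(b)
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    C = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(C, (ai, bi), 1)                  # contingency table
    comb2 = lambda x: x * (x - 1) // 2         # "n choose 2", elementwise
    sum_ij = comb2(C).sum()
    sum_a, sum_b = comb2(C.sum(1)).sum(), comb2(C.sum(0)).sum()
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # label-invariant
```

Identical partitions score 1.0 regardless of how the cluster labels are named, and the index is 0 in expectation for random labelings.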

Goodness of Fit and Related Inference Processes for Quantile Regression

R. Koenker, J. Machado
1999
1 reference

We introduce a goodness-of-fit process for quantile regression analogous to the conventional R2 statistic of least squares regression. Several related inference processes designed to test composite hypotheses about the combined effect of several covariates over an entire range of conditiona...

Show 1 reference in code
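
The pinball (quantile) loss and the paper's R1 goodness-of-fit ratio can be sketched directly; R1 compares the fitted loss against that of the best constant (intercept-only) quantile, just as R2 compares against the mean:

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Asymmetric quantile loss: underestimates weighted tau, over by 1-tau."""
    r = y - q
    return np.mean(np.maximum(tau * r, (tau - 1) * r))

def r1_score(y, q_fit, tau):
    """Koenker-Machado R1: 1 minus fitted loss over intercept-only loss."""
    q0 = np.quantile(y, tau)                   # best constant quantile fit
    return 1.0 - pinball_loss(y, q_fit, tau) / pinball_loss(y, q0, tau)

y = np.arange(10.0)
score = r1_score(y, q_fit=y, tau=0.5)          # perfect fit -> 1.0
```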

Permutation Tests for Studying Classifier Performance

Markus Ojala, Gemma C. Garriga
2010
1 reference

We explore the framework of permutation-based p-values for assessing the performance of classifiers. In this paper we study two simple permutation tests. The first test assesses whether the classifier has found a real class structure in the data; the corresponding null distribution is estimated by per...

Show 1 reference in code
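
The first test described above can be sketched with a toy score function; the nearest-centroid scorer and the two-blob data are illustrative, not from the paper:

```python
import numpy as np

def permutation_p_value(score_fn, X, y, n_perm=500, seed=0):
    """p-value for a classifier score under label permutation."""
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    perm_scores = [score_fn(X, rng.permutation(y)) for _ in range(n_perm)]
    hits = sum(s >= observed for s in perm_scores)
    return (1 + hits) / (1 + n_perm)           # add-one to avoid p = 0

def centroid_accuracy(X, y):
    """Toy score: nearest-class-centroid training accuracy."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X[:, None, :] - centroids[None], axis=2)
    return np.mean(classes[d.argmin(axis=1)] == y)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.repeat([0, 1], 30)
p = permutation_p_value(centroid_accuracy, X, y)
```

With genuinely separated classes the permuted scores hover near chance, so the observed score beats essentially all of them and p is near 1/(1 + n_perm).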

Flexible smoothing with B-splines and penalties

Paul H.C. Eilers, Brian D. Marx
1996
1 reference

B-splines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots...

Show 1 reference in code
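
The difference-penalty machinery can be illustrated in its identity-basis special case, the Whittaker smoother; with a B-spline basis B the same solve becomes (BᵀB + λDᵀD)a = Bᵀy. The signal and λ below are illustrative:

```python
import numpy as np

def whittaker_smooth(y, lam, order=2):
    """Solve (I + lam * D'D) z = y, with D the order-th difference matrix."""
    n = len(y)
    D = np.diff(np.eye(n), n=order, axis=0)    # (n-order, n) difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 100)
z = whittaker_smooth(y, lam=50.0)
```

Because z minimizes ||z − y||² + λ||Dz||², its roughness ||Dz||² can never exceed that of the raw data, and λ trades fidelity against smoothness exactly as in the P-spline setting.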

Probability Estimates for Multi-Class Classification by Pairwise Coupling.

Ting-Fan Wu, Chih-Jen Lin, Ruby C. Weng
2003
1 reference
Show 1 reference in code
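
Of the coupling methods compared in this literature, the simple voting baseline is the shortest: average each class's pairwise win probabilities and normalize (the paper's own coupling methods solve small linear systems instead):

```python
import numpy as np

def couple_by_voting(R):
    """Turn pairwise probabilities r_ij = P(class i | i or j) into a
    multiclass distribution by summing each class's pairwise wins."""
    K = R.shape[0]
    p = np.array([sum(R[i, j] for j in range(K) if j != i) for i in range(K)])
    return p / p.sum()

# Off-diagonal pairs satisfy r_ij + r_ji = 1; class 0 wins both its pairings.
R = np.array([[0.0, 0.8, 0.7],
              [0.2, 0.0, 0.6],
              [0.3, 0.4, 0.0]])
p = couple_by_voting(R)
```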

A Newton-CG Algorithm with Complexity Guarantees for Smooth Unconstrained Optimization

Clément W. Royer, Michael O'Neill, Stephen J. Wright
2018
1 reference

We consider minimization of a smooth nonconvex objective function using an iterative algorithm based on Newton's method and the linear conjugate gradient algorithm, with explicit detection and use of negative curvature directions for the Hessian of the objective function. The algorithm tracks Newton...

Show 1 reference in code
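
SciPy's classical Newton-CG (without this paper's negative-curvature safeguards and complexity guarantees) shows the setting the algorithm operates in: only gradient and Hessian-vector products are needed, never an explicit Hessian:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess_prod

# Minimize the Rosenbrock function from the standard starting point.
# hessp supplies Hessian-vector products, as in the matrix-free setting.
x0 = np.array([-1.2, 1.0])
res = minimize(rosen, x0, method="Newton-CG",
               jac=rosen_der, hessp=rosen_hess_prod)
```

The inner conjugate-gradient loop approximately solves each Newton system; the paper's contribution is detecting and exploiting negative curvature inside that loop with worst-case complexity bounds.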