Machine Learning
Machine learning frameworks, algorithms, and training systems
Repositories (7)
huggingface/transformers
microsoft/onnxruntime
mlflow/mlflow
pytorch/pytorch
ray-project/ray
scikit-learn/scikit-learn
tensorflow/tensorflow
Papers (373)
Commonly used software tools produce conflicting and overly-optimistic AUPRC values
The precision-recall curve (PRC) and the area under the precision-recall curve (AUPRC) are useful for quantifying classification performance. They are commonly used in situations with imbalanced classes, such as cancer diagnosis and cell type annotat...
Design of the 2015 ChaLearn AutoML challenge.
ChaLearn is organizing the Automatic Machine Learning (AutoML) contest for IJCNN 2015, which challenges participants to solve classification and regression problems without any human intervention. Participants' code is automatically run on the contes...
Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies
Humanities majors often find themselves in jobs where they either manage programmers or work with them in close collaboration. These interactions often pose difficulties because specialists in literature, history, philosophy, and so on are not usuall...
Introduction to information retrieval.
Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of th...
LOF: Identifying Density-Based Local Outliers.
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a bin...
Macro F1 and Macro F1
The 'macro F1' metric is frequently used to evaluate binary, multi-class and multi-label classification problems. Yet, we find that there exist two different formulas to calculate this quantity. In this note, we show that only under rare circumstance...
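As an illustration of the two formulas this note refers to, here is a minimal Python sketch. It assumes the two variants are (a) the arithmetic mean of per-class F1 scores and (b) the harmonic mean of macro-averaged precision and macro-averaged recall. The function names and the toy labels are hypothetical and chosen only for this example.

```python
def per_class_precision_recall(y_true, y_pred, labels):
    """Return a list of (precision, recall) pairs, one per class label."""
    stats = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        stats.append((precision, recall))
    return stats


def harmonic_mean(p, r):
    """Harmonic mean of two non-negative numbers (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_f1_averaged(y_true, y_pred, labels):
    """Variant (a): compute F1 per class, then take the arithmetic mean."""
    stats = per_class_precision_recall(y_true, y_pred, labels)
    return sum(harmonic_mean(p, r) for p, r in stats) / len(stats)


def macro_f1_of_averages(y_true, y_pred, labels):
    """Variant (b): average precision and recall first, then combine."""
    stats = per_class_precision_recall(y_true, y_pred, labels)
    mean_p = sum(p for p, _ in stats) / len(stats)
    mean_r = sum(r for _, r in stats) / len(stats)
    return harmonic_mean(mean_p, mean_r)


# Toy data (hypothetical): class 0 has precision 1.0 and recall 0.5,
# class 1 has precision 0.5 and recall 1.0.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
labels = [0, 1]

averaged = macro_f1_averaged(y_true, y_pred, labels)        # 2/3
of_averages = macro_f1_of_averages(y_true, y_pred, labels)  # 3/4
print(averaged, of_averages)
```

On this toy input the two variants give different values (2/3 versus 3/4), so a reported macro F1 is only comparable across papers or tools when the formula used is stated.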
Making and Evaluating Point Forecasts
Typically, point forecasting methods are compared and assessed by means of an error measure or scoring function, with the absolute error and the squared error being key examples. The individual scores are averaged over forecast cases, to result in a ...
Nonlinear Component Analysis as a Kernel Eigenvalue Problem.
A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by...
Notes on Regularized Least Squares
This is a collection of information about regularized least squares (RLS). The facts here are not "new results", but we have not seen them usefully collected together before. A key goal of this work is to demonstrate that with RLS, we get certain thi...
Performance Evaluation of RANSAC Family.
RANSAC (Random Sample Consensus) has been popular in regression problem with samples contaminated with outliers. It has been a milestone of many researches on robust estimators, but there are a few survey and performance analysis on them. This paper ...
Precision-Recall-Gain Curves: PR Analysis Done Right.
Woodworth's Two-Component model (1899) partitioned speeded limb movements into two distinct phases: (1) a central ballistic open-loop mechanism and (2) a closed-loop feedback component. The present study investigated the implementation of multi-gain ...
Probabilistic Forecasting
A probabilistic forecast takes the form of a predictive probability distribution over future quantities or events of interest. Probabilistic forecasting aims to maximize the sharpness of the predictive distributions, subject to calibration, on the ba...
Strictly Proper Scoring Rules, Prediction, and Estimation
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for ...
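To make the idea of propriety concrete, the following Python sketch compares two scoring rules for a binary event under the usual definition: a rule is proper when reporting the true probability maximizes the expected score. The quadratic (Brier-type) score and the linear score used here are standard textbook examples and are not taken from the truncated abstract above. The true probability `q = 0.3` and the grid of candidate reports are assumptions made for illustration.

```python
def quadratic_score(p, y):
    """Positively oriented Brier-type score: higher is better."""
    return -(p - y) ** 2


def linear_score(p, y):
    """Probability assigned to the outcome that occurred."""
    return p if y == 1 else 1 - p


def expected_score(score, p, q):
    """Expected score of reporting p when the event truly occurs with probability q."""
    return q * score(p, 1) + (1 - q) * score(p, 0)


def best_report(score, q, steps=100):
    """Grid-search the report p in [0, 1] that maximizes the expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: expected_score(score, p, q))


q = 0.3  # assumed true event probability
best_quadratic = best_report(quadratic_score, q)
best_linear = best_report(linear_score, q)
print(best_quadratic, best_linear)
```

With the quadratic score the best report coincides with the true probability, while the linear score rewards pushing the report to an extreme (here 0.0), which is why the linear score is not proper.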
Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent
For large scale learning problems, it is desirable if we can obtain the optimal model parameters by going through the data in only one pass. Polyak and Juditsky (1992) showed that asymptotically the test performance of the simple average of the param...