Machine Learning
Machine learning frameworks, algorithms, and training systems
Repositories (7)
huggingface/transformers
microsoft/onnxruntime
mlflow/mlflow
pytorch/pytorch
ray-project/ray
scikit-learn/scikit-learn
tensorflow/tensorflow
Papers (373)
Algorithms for Nonnegative Matrix Factorization with the β-Divergence.
This letter describes algorithms for nonnegative matrix factorization (NMF) with the β-divergence (β-NMF). The β-divergence is a family of cost functions parameterized by a single shape parameter β that takes the Euclidean distance, the Kullback-Leib...
Convergence Theory for Preconditioned Eigenvalue Solvers in a Nutshell
Preconditioned iterative methods for numerical solution of large matrix eigenvalue problems are increasingly gaining importance in various application areas, ranging from material sciences to data mining. Some of them, e.g., those using multilevel pr...
Flexible smoothing with B-splines and penalties
B-splines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We...
Label Propagation and Quadratic Criterion.
This chapter shows how the different graph-based algorithms for semi-supervised learning can be cast into a common framework where one minimizes a quadratic cost criterion whose closed-form solution is found by solving a linear system of siz...
Neural Machine Translation by Jointly Learning to Align and Translate
Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize t...
Permutation Tests for Studying Classifier Performance
We explore the framework of permutation-based p-values for assessing the performance of classifiers. In this paper we study two simple permutation tests. The first test assesses whether the classifier has found a real class structure in the data; the c...
Predicting good probabilities with supervised learning.
We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. We show that maximum margin methods such as boosted trees and boosted stumps push probability mass away from 0 and 1 yielding ...
Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
Self-attention Does Not Need $O(n^2)$ Memory
We present a very simple algorithm for attention that requires $O(1)$ memory with respect to sequence length and an extension to self-attention that requires $O(\log n)$ memory. This is in contrast with the frequently stated belief that self-attentio...
Sequential Karhunen-Loeve basis extraction and its application to images
The Karhunen-Loeve (KL) transform is an optimal method for approximating a set of vectors or images, which was used in image processing and computer vision for several tasks such as face and object recognition. Its computational demands and its batch...
Special Invited Paper-Additive logistic regression: A statistical view of boosting
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the...
Training linear SVMs in linear time.
Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These ...
Transforming classifier scores into accurate multiclass probability estimates
Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outpu...