A Bayesian approach for comparing cross-validated algorithms on multiple data sets.

The package compares two algorithms whose performance has been assessed via cross-validation on multiple data sets. It performs a Bayesian correlated t-test on each data set and then merges the per-data-set results via a Poisson-binomial inference.
It returns the posterior probability that one algorithm has a higher mean score than the other on the provided collection of data sets. It accounts for the uncertainty and the correlation that characterize the cross-validation samples generated on each data set.
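
The two steps can be illustrated with a minimal Python sketch. This is not the package's Matlab or R code, and every function name in it is hypothetical. It rests on three assumptions that should be checked against the paper: the posterior of the mean difference on a data set is a Student t with n-1 degrees of freedom and scale sqrt((1/n + rho/(1-rho)) * s^2); rho is set by the common heuristic n_test/n_total (1/k for k-fold cross-validation); and the merged result is summarised as the probability that the first algorithm wins on more than half of the data sets. The Student t CDF is approximated numerically so that the sketch needs only the standard library.

```python
import math
from statistics import mean, variance


def student_t_cdf(t, df, steps=2000):
    """Approximate the standard Student t CDF with Simpson's rule (steps must be even)."""
    if t == 0:
        return 0.5
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    upper = abs(t)
    h = upper / steps
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area = min(area * h / 3, 0.5)  # clamp numerical overshoot
    return 0.5 + math.copysign(area, t)


def prob_positive_mean(diffs, rho):
    """Posterior P(mean difference > 0) on one data set (assumed correlated t-test form)."""
    n = len(diffs)
    m = mean(diffs)
    s2 = variance(diffs)  # unbiased sample variance of the score differences
    scale = math.sqrt((1.0 / n + rho / (1.0 - rho)) * s2)
    if scale == 0.0:  # degenerate case: all differences identical
        return 0.5 if m == 0 else float(m > 0)
    return student_t_cdf(m / scale, n - 1)


def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent, non-identical Bernoulli trials."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)      # trial fails: count stays at k
            nxt[k + 1] += mass * p        # trial succeeds: count moves to k + 1
        pmf = nxt
    return pmf


def prob_wins_majority(probs):
    """P(first algorithm is better on more than half of the data sets)."""
    pmf = poisson_binomial_pmf(probs)
    q = len(probs)
    return sum(mass for k, mass in enumerate(pmf) if k > q / 2)


if __name__ == "__main__":
    # Made-up score differences (algorithm A minus algorithm B), one list per data set.
    data_sets = [
        [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.04, 0.02, 0.01, 0.03],
        [-0.01, 0.02, 0.00, 0.01, -0.02, 0.01, 0.00, 0.02, 0.01, -0.01],
        [0.05, 0.03, 0.04, 0.06, 0.02, 0.05, 0.04, 0.03, 0.05, 0.04],
    ]
    rho = 1.0 / 10  # assumed: one run of 10-fold cross-validation
    per_data_set = [prob_positive_mean(d, rho) for d in data_sets]
    print(per_data_set, prob_wins_majority(per_data_set))
```

The dynamic-programming recursion in `poisson_binomial_pmf` is quadratic in the number of data sets, which is adequate for the few dozen data sets typical of such comparisons.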

The package contains both a Matlab and an R implementation.

The paper is published in Machine Learning, 2015 (doi: 10.1007/s10994-015-5486-z).