### On the Stability of Feature Selection Algorithms

Authors: Sarah Nogueira, Konstantinos Sechidis, Gavin Brown

Journal: Journal of Machine Learning Research

Publication Date: 10 April, 2018

Department of: Computer Science

### In Abstract

**Quantifying the reproducibility of feature selection research**

When solving a problem in data science, some pieces of the data are important, some irrelevant, and some only show their worth in the context of others. The task of automatically distinguishing the important features from a large body of data is called ”feature selection”, and there exist many data science algorithms to address this fundamental challenge. However, the results of these algorithms are often not reproducible with respect to tiny variations in the original data, that really should be insignificant in practice. The recent work of Manchester computer scientists has therefore focused on the challenge of ”reproducibility” in feature selection research — more technically known as the ”stability” of the algorithms.

To understand the challenge faced by feature selection algorithms, imagine you’re trying to guess the price of a car. You are provided with various features of the cars, like the make and model, year of manufacture, etc. In solving any given problem like this, some features are important, some are irrelevant, and some are redundant in the context of others. For the car example, the number of miles on the clock clearly matters, while the colour of the wheel trim probably does not, and the age of the car is probably redundant if you know the mileage. You know this because you (probably) know something about cars. But, what if we are predicting whether someone will have a relapse of lung cancer – what things matter? Genetic factors? Lifestyle? Metabolic? This requires years of medical training to recognise. In data science research, we use statistical methods to automatically find the important features. However the results of such ”feature selection” algorithms are often not reproducible across tiny variations in the data. So how much can we really trust them?

If a feature selection algorithm is very ”stable”, then we can be more confident the algorithm has discovered some inherent underlying structure in the data. As such, there have been many attempts to quantify stability, with various pros and cons to each methodology. Now computer scientists at the University of Manchester have consolidated 20 years of work in the area, taking a fresh approach to the problem, from a first-principles mathematical framework. This serves to highlight several subtle flaws in previous measures, but the main result is a new measure of stability. The new measure is a proper generalisation of several existing measures and provides new capabilities, which no other measure has demonstrated – such as confidence intervals and hypothesis testing machinery, which enable use to ask the question ”is algorithm A more reliable than algorithm B, with statistical significance?” The results of the work have aready been applied to real biomedical problems, and extended in a number of ways since the initial publication of this work.

- Feature selection algorithms are fundamental to data science, but results are often not reproducible across tiny variations in the data.
- This paper provides a new framework to quantify the degree of reproducibility, i.e. the stability of the algorithm.