Continuation Methods for Approximate Large Scale Object Sequencing
Authors: Xenophon Evangelopoulos, Austin J Brockmeier, Tingting Mu, John Y Goulermas
Journal: Machine Learning
Publication Date: 23 October, 2018
Department of: Computer Science
Making Data Pattern Ordering Practical
Humans tend to explore data patterns by comparing them with each other using prescribed notions of data similarity or distance. Seriation is a generic exploratory combinatorial data analysis technique applied in a wide range of fields, including bioinformatics, archaeology, medicine, forensics, psychology and gene sequencing. It orders patterns visually along an intuitive linear arrangement in which more similar patterns are positioned close together and dissimilar ones further apart, so that patterns and trends of gradually varying data characteristics can be captured and identified. However, this can be very computationally demanding, and even for a handful of measurements an exact solution to the problem is impractical. Recently, computer scientists from the Universities of Liverpool and Manchester have created new methods for large-scale seriation that can approximate the optimal ordering for thousands of patterns. These are based on mathematical tools that optimise relaxed versions of the original problem for efficiency, while providing mechanisms to recover a near-optimal solution. The work proposes different methods to suit the problem and data characteristics at hand, which offer excellent scalability with minimal sacrifice in accuracy.
- The original motivation for seriation arose in the field of archaeology in 1968.
- The problem of seriation was mathematically formalised by Kendall in 1971.
- Seriation has become popular in many areas in the 21st century, but scalability has always been a serious challenge that prevents its practical use.
- This work specifically tackles the scalability issue, contributing fast seriation algorithms with minimal sacrifice in accuracy.