Discords
A primer on finding discords in time series.
Introduction
In 2005, Keogh, Lin and Fu published a paper applying SAX (Symbolic Aggregate approXimation) to finding unusual subsequences within time series: E. Keogh, J. Lin and A. Fu (2005). "HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence". In The Fifth IEEE International Conference on Data Mining. The authors define discords as the "...subsequences of a longer time series that are maximally different to all the rest of the time series subsequences..."

Details
JMotif provides a convenient method for finding discords within a time series. Following the article, I downloaded the TEK16 dataset (Space Shuttle Marotta Valve Series) from the UCR Time Series Classification/Clustering Page. The discord location was identified by calling the SAXFactory.instances2Discords method:
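import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
// JMotif's SAXFactory and DiscordRecords must also be imported;
// their package depends on the JMotif version in use.

public class DiscordsPrimer {

  // sliding window of size 40 and alphabet of 3
  private static final int WINDOW_SIZE = 40;
  private static final int ALPHABET_SIZE = 3;

  public static void main(String[] args) throws Exception {
    // get the data first
    Instances tsData = readTSData();
    // assumption: the series values are in the first attribute, passed by name
    String attribute = tsData.attribute(0).name();
    // now build the SAX data structure over the sliding windows and search for discords
    DiscordRecords dr = SAXFactory.instances2Discords(tsData, attribute, WINDOW_SIZE, ALPHABET_SIZE);
    // print out the discord occurrences
    System.out.println(dr.toString());
  }

  /**
   * Reads the time-series data into WEKA format.
   *
   * @return the time series.
   * @throws Exception if an error occurs.
   */
  private static Instances readTSData() throws Exception {
    return DataSource.read("data/ts_data/TEK16.arff");
  }
}

Figure: The TEK16 dataset analysis. Panels: raw time series with the discord found by JMotif highlighted; zoom into the discord, with similar fragments from the time series and their clustering.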
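To make the discord definition concrete, here is a minimal brute-force sketch in Java. It is neither HOT SAX nor JMotif's implementation: it assumes Euclidean distance, treats windows that overlap the candidate (|p - q| < n) as trivial "self" matches, and reports the window whose nearest non-overlapping neighbour is farthest away. The class and method names are hypothetical, and the toy series is synthetic, not TEK16. HOT SAX, as described in the paper, targets the same answer but uses SAX words to order the two loops so that most inner loops can be abandoned early.

```java
/** Brute-force discord search (not HOT SAX): compares every window with every other window. */
public class BruteForceDiscord {

  /** Euclidean distance between series[p..p+n) and series[q..q+n). */
  public static double distance(double[] series, int p, int q, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
      double d = series[p + i] - series[q + i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Returns {position, nearestNeighbourDistance} of the top discord of length n. */
  public static double[] findDiscord(double[] series, int n) {
    int count = series.length - n + 1;
    int bestPos = -1;
    double bestDist = -1.0;
    for (int p = 0; p < count; p++) {
      double nearest = Double.POSITIVE_INFINITY;
      for (int q = 0; q < count; q++) {
        if (Math.abs(p - q) < n) {
          continue; // skip overlapping (trivial) matches
        }
        nearest = Math.min(nearest, distance(series, p, q, n));
      }
      // the discord is the window whose closest non-trivial match is the farthest
      if (nearest != Double.POSITIVE_INFINITY && nearest > bestDist) {
        bestDist = nearest;
        bestPos = p;
      }
    }
    return new double[] {bestPos, bestDist};
  }

  public static void main(String[] args) {
    // Synthetic periodic series (period 10) with a +3 bump injected at indices 40..44.
    double[] series = new double[100];
    for (int i = 0; i < series.length; i++) {
      series[i] = Math.sin(2 * Math.PI * i / 10.0);
    }
    for (int i = 40; i < 45; i++) {
      series[i] += 3.0;
    }
    double[] result = findDiscord(series, 10);
    System.out.println("discord at " + (int) result[0] + ", distance " + result[1]);
  }
}
```

On the toy series the reported window covers the whole bump (it starts somewhere between 35 and 40), and its nearest-neighbour distance is about sqrt(45). The quadratic number of distance computations is what makes this approach impractical on long series.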
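The window size (40) and alphabet size (3) passed to instances2Discords are SAX parameters. The sketch below shows how textbook SAX turns one window into a word. It z-normalizes the window, averages it into equal segments (Piecewise Aggregate Approximation, PAA), and maps each average to a letter using the standard Gaussian breakpoints for an alphabet of 3 (about -0.43 and 0.43). The number of PAA segments is not given in the text, so the segments argument here is a hypothetical parameter. JMotif's internal implementation may differ in details such as how it handles flat windows.

```java
/** Minimal SAX sketch: z-normalization, PAA, and alphabet-3 symbol mapping. */
public class SaxSketch {

  // Breakpoints splitting N(0,1) into three equiprobable regions.
  private static final double[] CUTS_ALPHABET_3 = {-0.4307, 0.4307};

  /** Returns a z-normalized copy of the input (zero mean, unit population std). */
  public static double[] zNormalize(double[] x) {
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= x.length;
    double var = 0.0;
    for (double v : x) var += (v - mean) * (v - mean);
    double std = Math.sqrt(var / x.length);
    double[] out = new double[x.length];
    for (int i = 0; i < x.length; i++) {
      // A flat window has no shape; map it to zeros instead of dividing by ~0.
      out[i] = std < 1e-8 ? 0.0 : (x[i] - mean) / std;
    }
    return out;
  }

  /** Piecewise Aggregate Approximation; assumes x.length is divisible by segments. */
  public static double[] paa(double[] x, int segments) {
    int len = x.length / segments;
    double[] out = new double[segments];
    for (int s = 0; s < segments; s++) {
      double sum = 0.0;
      for (int i = s * len; i < (s + 1) * len; i++) sum += x[i];
      out[s] = sum / len;
    }
    return out;
  }

  /** Converts one window into a SAX word over the alphabet {a, b, c}. */
  public static String toSaxWord(double[] window, int segments) {
    double[] reduced = paa(zNormalize(window), segments);
    StringBuilder word = new StringBuilder();
    for (double v : reduced) {
      int symbol = 0;
      while (symbol < CUTS_ALPHABET_3.length && v > CUTS_ALPHABET_3[symbol]) symbol++;
      word.append((char) ('a' + symbol));
    }
    return word.toString();
  }

  public static void main(String[] args) {
    // A linear ramp of length 40 (the window size above) reduced to 5 segments.
    double[] window = new double[40];
    for (int i = 0; i < window.length; i++) window[i] = i;
    System.out.println(toSaxWord(window, 5)); // prints aabcc
  }
}
```

Windows that share a word have a similar coarse shape. This is why rare words are natural discord candidates in SAX-based search.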