Applying nature-inspired optimization algorithms for selecting important timestamps to reduce time series dimensionality

Research output: Contribution to journal › Article

Abstract

Time series data account for a major part of the data available today. Time series mining covers tasks such as classification, clustering, query-by-content, and prediction. Performing data mining tasks on raw time series is inefficient because these data are high-dimensional by nature. Instead, time series are first pre-processed before data mining tasks are performed on them. In general, there are two main approaches to reducing time series dimensionality. The first, which we call landmark methods, is based on finding characteristic features in the target time series. The second is based on data transformations: these methods transform the time series from the original space into a reduced space, where they can be managed more efficiently. The method we present in this paper applies a third approach: it projects a time series onto a lower-dimensional space by selecting important points in the time series. The novelty of our method is that these points are chosen not according to a geometric criterion, which is subjective in most cases, but through an optimization process. The other important characteristic of our method is that the important points are selected at the dataset level rather than at the level of a single time series. The direct advantage of this strategy is that the distance defined on the low-dimensional space lower-bounds the original distance applied to the raw data, which enables us to apply the popular GEMINI algorithm. The promising results of our experiments on a wide variety of time series datasets, using different optimizers and applied to the two major data mining tasks, validate our new method.
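
The lower-bounding property and its use in a GEMINI-style query can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the Euclidean distance (the abstract does not name the distance), it uses a fixed, hypothetical set of timestamps in place of one found by an optimizer, and the names `project`, `euclidean`, and `range_query` are made up for illustration. Because every series in the dataset keeps the same timestamps, the reduced distance sums a subset of the non-negative squared differences of the full distance, so it can never exceed it.

```python
import math
import random


def project(series, timestamps):
    """Keep only the values at the selected timestamps (shared by the whole dataset)."""
    return [series[t] for t in timestamps]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (assumed distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(dataset, query, timestamps, radius):
    """Filter-and-refine range query in the spirit of GEMINI.

    Returns the indices of the true matches and the number of candidates
    that survived the filtering step.
    """
    reduced_query = project(query, timestamps)
    # Filter: the reduced distance never exceeds the full distance, so a series
    # whose reduced distance is already above the radius cannot be a match.
    candidates = [
        i for i, series in enumerate(dataset)
        if euclidean(project(series, timestamps), reduced_query) <= radius
    ]
    # Refine: verify the surviving candidates against the raw series.
    matches = [i for i in candidates if euclidean(dataset[i], query) <= radius]
    return matches, len(candidates)


# Synthetic data, for illustration only.
random.seed(0)
dataset = [[random.gauss(0, 1) for _ in range(64)] for _ in range(50)]
query = [random.gauss(0, 1) for _ in range(64)]

# Hypothetical selection; in the paper these would come from the optimizer.
timestamps = [3, 10, 17, 29, 41, 58]

matches, n_candidates = range_query(dataset, query, timestamps, radius=11.0)
print(f"{len(matches)} matches out of {n_candidates} candidates checked")
```

Since the filter only discards series that cannot match, the query returns the same answer as a full scan while computing the expensive full-length distance for the candidates only. How tight the lower bound is depends on which timestamps are kept, which is presumably what the optimization step tunes; the objective function itself is not given in the abstract.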

Original language: English
Pages (from-to): 13-28
Number of pages: 16
Journal: Evolving Systems
Volume: 10
Issue number: 1
Early online date: 2 Nov 2017
DOI: 10.1007/s12530-017-9207-7
Publication status: Published - 1 Mar 2019

Keywords

  • Classification
  • Clustering
  • Differential evolution
  • Genetic algorithm
  • Particle swarm optimization
  • Time series mining

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modelling and Simulation
  • Computer Science Applications
  • Control and Optimization

Cite this

@article{3897e5ac06ed498aa710217a07e57e66,
title = "Applying nature-inspired optimization algorithms for selecting important timestamps to reduce time series dimensionality",
abstract = "Time series data account for a major part of data supply available today. Time series mining handles several tasks such as classification, clustering, query-by-content, prediction, and others. Performing data mining tasks on raw time series is inefficient as these data are high-dimensional by nature. Instead, time series are first pre-processed using several techniques before different data mining tasks can be performed on them. In general, there are two main approaches to reduce time series dimensionality; the first is what we call landmark methods. These methods are based on finding characteristic features in the target time series. The second is based on data transformations. These methods transform the time series from the original space into a reduced space, where they can be managed more efficiently. The method we present in this paper applies a third approach, as it projects a time series onto a lower-dimensional space by selecting important points in the time series. The novelty of our method is that these points are not chosen according to a geometric criterion, which is subjective in most cases, but through an optimization process. The other important characteristic of our method is that these important points are selected on a dataset-level and not on a single time series-level. The direct advantage of this strategy is that the distance defined on the low-dimensional space lower bounds the original distance applied to raw data. This enables us to apply the popular GEMINI algorithm. The promising results of our experiments on a wide variety of time series datasets, using different optimizers, and applied to the two major data mining tasks, validate our new method.",
keywords = "Classification, Clustering, Differential evolution, Genetic algorithm, Particle swarm optimization, Time series mining",
author = "{Muhammad Fuad}, {Muhammad Marwan}",
year = "2019",
month = "3",
day = "1",
doi = "10.1007/s12530-017-9207-7",
language = "English",
volume = "10",
pages = "13--28",
journal = "Evolving Systems",
issn = "1868-6478",
number = "1"
}

TY - JOUR

T1 - Applying nature-inspired optimization algorithms for selecting important timestamps to reduce time series dimensionality

AU - Muhammad Fuad, Muhammad Marwan

PY - 2019/3/1

Y1 - 2019/3/1

AB - Time series data account for a major part of data supply available today. Time series mining handles several tasks such as classification, clustering, query-by-content, prediction, and others. Performing data mining tasks on raw time series is inefficient as these data are high-dimensional by nature. Instead, time series are first pre-processed using several techniques before different data mining tasks can be performed on them. In general, there are two main approaches to reduce time series dimensionality; the first is what we call landmark methods. These methods are based on finding characteristic features in the target time series. The second is based on data transformations. These methods transform the time series from the original space into a reduced space, where they can be managed more efficiently. The method we present in this paper applies a third approach, as it projects a time series onto a lower-dimensional space by selecting important points in the time series. The novelty of our method is that these points are not chosen according to a geometric criterion, which is subjective in most cases, but through an optimization process. The other important characteristic of our method is that these important points are selected on a dataset-level and not on a single time series-level. The direct advantage of this strategy is that the distance defined on the low-dimensional space lower bounds the original distance applied to raw data. This enables us to apply the popular GEMINI algorithm. The promising results of our experiments on a wide variety of time series datasets, using different optimizers, and applied to the two major data mining tasks, validate our new method.

KW - Classification

KW - Clustering

KW - Differential evolution

KW - Genetic algorithm

KW - Particle swarm optimization

KW - Time series mining

UR - http://www.scopus.com/inward/record.url?scp=85065208012&partnerID=8YFLogxK

U2 - 10.1007/s12530-017-9207-7

DO - 10.1007/s12530-017-9207-7

M3 - Article

VL - 10

SP - 13

EP - 28

JO - Evolving Systems

JF - Evolving Systems

SN - 1868-6478

IS - 1

ER -