Variable-chromosome-length genetic algorithm for time series discretization

Research output: Chapter in Book/Report/Conference proceedingConference proceeding

1 Citation (Scopus)

Abstract

The symbolic aggregate approximation method (SAX) of time series is a widely-known dimensionality reduction technique of time series data. SAX assumes that normalized time series have a high-Gaussian distribution. Based on this assumption SAX uses statistical lookup tables to determine the locations of the breakpoints on which SAX is based. In a previous work, we showed how this assumption oversimplifies the problem, which may result in high classification errors. We proposed an alternative approach, based on the genetic algorithms, to determine the locations of the breakpoints. We also showed how this alternative approach boosts the performance of the original SAX. However, the method we presented has the same drawback that existed in the original SAX; it was only able to determine the locations of the breakpoints but not the corresponding alphabet size, which had to be input by the user in the original SAX. In the method we previously presented we had to run the optimization process as many times as the range of the alphabet size. Besides, performing the optimization process in two steps can cause overfitting. The novelty of the present work is twofold; first, we extend a version of the genetic algorithms that uses chromosomes of different lengths. Second, we apply this new version of variable-chromosome-length genetic algorithm to the problem at hand to simultaneously determine the number of the breakpoints, together with their locations, so that the optimization process is run only once. This speeds up the training stage and also avoids overfitting. The experiments we conducted on a variety of datasets give promising results.

Original languageEnglish
Title of host publicationDatabase and Expert Systems Applications - 27th International Conference, DEXA 2016, Proceedings
EditorsSven Hartmann, Hui Ma
PublisherSpringer-Verlag Italia
Pages418-425
Number of pages8
ISBN (Electronic)978-3-319-44406-2
ISBN (Print)978-3-319-44405-5
DOIs
Publication statusE-pub ahead of print - 6 Aug 2016
Externally publishedYes
Event27th International Conference on Database and Expert Systems Applications, DEXA 2016 - Porto, Portugal
Duration: 5 Sep 20168 Sep 2016

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume9828 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference27th International Conference on Database and Expert Systems Applications, DEXA 2016
CountryPortugal
CityPorto
Period5/09/168/09/16

Fingerprint

Chromosomes
Chromosome
Time series
Process Optimization
Genetic algorithms
Discretization
Genetic Algorithm
Overfitting
Table lookup
Alternatives
Look-up Table
Gaussian distribution
Dimensionality Reduction
Time Series Data
Approximation Methods
Speedup
Range of data
Experiment
Experiments

Keywords

  • Discretization
  • Time series
  • Variable-chromosome-length genetic algorithm

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Muhammad Fuad, M. M. (2016). Variable-chromosome-length genetic algorithm for time series discretization. In S. Hartmann, & H. Ma (Eds.), Database and Expert Systems Applications - 27th International Conference, DEXA 2016, Proceedings (pp. 418-425). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9828 LNCS). Springer-Verlag Italia. https://doi.org/10.1007/978-3-319-44406-2_35

Variable-chromosome-length genetic algorithm for time series discretization. / Muhammad Fuad, Muhammad Marwan.

Database and Expert Systems Applications - 27th International Conference, DEXA 2016, Proceedings. ed. / Sven Hartmann; Hui Ma. Springer-Verlag Italia, 2016. p. 418-425 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9828 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference proceeding

Muhammad Fuad, MM 2016, Variable-chromosome-length genetic algorithm for time series discretization. in S Hartmann & H Ma (eds), Database and Expert Systems Applications - 27th International Conference, DEXA 2016, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9828 LNCS, Springer-Verlag Italia, pp. 418-425, 27th International Conference on Database and Expert Systems Applications, DEXA 2016, Porto, Portugal, 5/09/16. https://doi.org/10.1007/978-3-319-44406-2_35
Muhammad Fuad MM. Variable-chromosome-length genetic algorithm for time series discretization. In Hartmann S, Ma H, editors, Database and Expert Systems Applications - 27th International Conference, DEXA 2016, Proceedings. Springer-Verlag Italia. 2016. p. 418-425. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-44406-2_35
Muhammad Fuad, Muhammad Marwan. / Variable-chromosome-length genetic algorithm for time series discretization. Database and Expert Systems Applications - 27th International Conference, DEXA 2016, Proceedings. editor / Sven Hartmann ; Hui Ma. Springer-Verlag Italia, 2016. pp. 418-425 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{b49c2ae766144779a25cb610ed57f3b6,
title = "Variable-chromosome-length genetic algorithm for time series discretization",
abstract = "The symbolic aggregate approximation method (SAX) of time series is a widely-known dimensionality reduction technique of time series data. SAX assumes that normalized time series have a high-Gaussian distribution. Based on this assumption SAX uses statistical lookup tables to determine the locations of the breakpoints on which SAX is based. In a previous work, we showed how this assumption oversimplifies the problem, which may result in high classification errors. We proposed an alternative approach, based on the genetic algorithms, to determine the locations of the breakpoints. We also showed how this alternative approach boosts the performance of the original SAX. However, the method we presented has the same drawback that existed in the original SAX; it was only able to determine the locations of the breakpoints but not the corresponding alphabet size, which had to be input by the user in the original SAX. In the method we previously presented we had to run the optimization process as many times as the range of the alphabet size. Besides, performing the optimization process in two steps can cause overfitting. The novelty of the present work is twofold; first, we extend a version of the genetic algorithms that uses chromosomes of different lengths. Second, we apply this new version of variable-chromosome-length genetic algorithm to the problem at hand to simultaneously determine the number of the breakpoints, together with their locations, so that the optimization process is run only once. This speeds up the training stage and also avoids overfitting. The experiments we conducted on a variety of datasets give promising results.",
keywords = "Discretization, Time series, Variable-chromosome-length genetic algorithm",
author = "{Muhammad Fuad}, {Muhammad Marwan}",
year = "2016",
month = "8",
day = "6",
doi = "10.1007/978-3-319-44406-2_35",
language = "English",
isbn = "978-3-319-44405-5",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag Italia",
pages = "418--425",
editor = "Sven Hartmann and Hui Ma",
booktitle = "Database and Expert Systems Applications - 27th International Conference, DEXA 2016, Proceedings",
address = "Italy",

}

TY - GEN

T1 - Variable-chromosome-length genetic algorithm for time series discretization

AU - Muhammad Fuad, Muhammad Marwan

PY - 2016/8/6

Y1 - 2016/8/6

N2 - The symbolic aggregate approximation method (SAX) of time series is a widely-known dimensionality reduction technique of time series data. SAX assumes that normalized time series have a high-Gaussian distribution. Based on this assumption SAX uses statistical lookup tables to determine the locations of the breakpoints on which SAX is based. In a previous work, we showed how this assumption oversimplifies the problem, which may result in high classification errors. We proposed an alternative approach, based on the genetic algorithms, to determine the locations of the breakpoints. We also showed how this alternative approach boosts the performance of the original SAX. However, the method we presented has the same drawback that existed in the original SAX; it was only able to determine the locations of the breakpoints but not the corresponding alphabet size, which had to be input by the user in the original SAX. In the method we previously presented we had to run the optimization process as many times as the range of the alphabet size. Besides, performing the optimization process in two steps can cause overfitting. The novelty of the present work is twofold; first, we extend a version of the genetic algorithms that uses chromosomes of different lengths. Second, we apply this new version of variable-chromosome-length genetic algorithm to the problem at hand to simultaneously determine the number of the breakpoints, together with their locations, so that the optimization process is run only once. This speeds up the training stage and also avoids overfitting. The experiments we conducted on a variety of datasets give promising results.

AB - The symbolic aggregate approximation method (SAX) of time series is a widely-known dimensionality reduction technique of time series data. SAX assumes that normalized time series have a high-Gaussian distribution. Based on this assumption SAX uses statistical lookup tables to determine the locations of the breakpoints on which SAX is based. In a previous work, we showed how this assumption oversimplifies the problem, which may result in high classification errors. We proposed an alternative approach, based on the genetic algorithms, to determine the locations of the breakpoints. We also showed how this alternative approach boosts the performance of the original SAX. However, the method we presented has the same drawback that existed in the original SAX; it was only able to determine the locations of the breakpoints but not the corresponding alphabet size, which had to be input by the user in the original SAX. In the method we previously presented we had to run the optimization process as many times as the range of the alphabet size. Besides, performing the optimization process in two steps can cause overfitting. The novelty of the present work is twofold; first, we extend a version of the genetic algorithms that uses chromosomes of different lengths. Second, we apply this new version of variable-chromosome-length genetic algorithm to the problem at hand to simultaneously determine the number of the breakpoints, together with their locations, so that the optimization process is run only once. This speeds up the training stage and also avoids overfitting. The experiments we conducted on a variety of datasets give promising results.

KW - Discretization

KW - Time series

KW - Variable-chromosome-length genetic algorithm

UR - http://www.scopus.com/inward/record.url?scp=84981237573&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-44406-2_35

DO - 10.1007/978-3-319-44406-2_35

M3 - Conference proceeding

SN - 978-3-319-44405-5

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 418

EP - 425

BT - Database and Expert Systems Applications - 27th International Conference, DEXA 2016, Proceedings

A2 - Hartmann, Sven

A2 - Ma, Hui

PB - Springer-Verlag Italia

ER -