An improved algorithm for the longest common subsequence problem

Seyed Rasoul Mousavi, Farzaneh Tabataba

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

The Longest Common Subsequence problem seeks a longest string that is a subsequence of every member of a given set of strings. It has applications in data compression, FPGA circuit minimization, and bioinformatics, among others. The problem is NP-hard for more than two input strings, and the existing exact solutions are impractical for large input sizes. Therefore, several approximation and (meta)heuristic algorithms have been proposed that aim at finding good, but not necessarily optimal, solutions to the problem. In this paper, we propose a new algorithm based on the constructive beam search method. We have devised a novel heuristic, inspired by probability theory, intended for domains where the input strings are assumed to be independent. Special data structures and dynamic programming methods are developed to reduce the time complexity of the algorithm. The proposed algorithm is compared with the state of the art on several standard benchmarks, including random and real biological sequences. Extensive experimental results show that the proposed algorithm outperforms the state of the art, giving higher-quality solutions in less computation time for most of the experimental cases.
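
To make the idea of constructive beam search for this problem concrete, the following is a minimal Python sketch under stated assumptions. It is not the algorithm from the paper: the scoring function here is a simple placeholder (the length of the shortest unread suffix), not the probability-inspired heuristic the abstract mentions, and the names `beam_search_lcs`, `is_subsequence`, and `beam_width` are hypothetical. The result is always a common subsequence, but it is not guaranteed to be a longest one.

```python
from typing import Dict, List, Tuple


def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if candidate can be obtained from text by deleting characters."""
    it = iter(text)
    # `ch in it` consumes the iterator up to the match, so order is respected.
    return all(ch in it for ch in candidate)


def beam_search_lcs(strings: List[str], beam_width: int = 10) -> str:
    """Build a common subsequence one character at a time, keeping only the
    `beam_width` most promising partial solutions at each level."""
    if not strings:
        return ""
    # Only characters present in every string can appear in a common subsequence.
    alphabet = sorted(set.intersection(*(set(s) for s in strings)))
    # A node is (partial solution, index of the next unread character per string).
    beam: List[Tuple[str, Tuple[int, ...]]] = [("", tuple(0 for _ in strings))]
    best = ""
    while beam:
        children: Dict[Tuple[int, ...], str] = {}
        for prefix, positions in beam:
            for ch in alphabet:
                nxt = []
                for s, p in zip(strings, positions):
                    idx = s.find(ch, p)  # next occurrence at or after the pointer
                    if idx == -1:
                        break  # ch cannot extend this node
                    nxt.append(idx + 1)
                else:
                    # All prefixes at one level have equal length, so one
                    # representative per pointer tuple is enough.
                    children.setdefault(tuple(nxt), prefix + ch)
        if not children:
            break

        # Placeholder heuristic (an assumption): an upper bound on how many more
        # characters could still be appended from this node.
        def score(item: Tuple[Tuple[int, ...], str]) -> int:
            positions, _ = item
            return min(len(s) - p for s, p in zip(strings, positions))

        ranked = sorted(children.items(), key=score, reverse=True)[:beam_width]
        beam = [(prefix, positions) for positions, prefix in ranked]
        best = beam[0][0]
    return best
```

A larger `beam_width` explores more partial solutions per level at a higher cost; with `beam_width=1` the search degenerates into a greedy construction.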
Original language: English
Pages (from-to): 512-520
Number of pages: 9
Journal: Computers & Operations Research
Volume: 39
Issue number: 3
DOI: 10.1016/j.cor.2011.02.026
Publication status: Published - Mar 2012

Fingerprint

Longest Common Subsequence
Strings
Beam Search
Data Compression
Probability Theory
Bioinformatics
Heuristic Algorithms
Subsequence
Dynamic Programming
Search Methods
Metaheuristics
Field Programmable Gate Arrays (FPGA)
Time Complexity
Data Structures
Computational Complexity

Keywords

  • Longest common subsequence
  • LCS
  • Beam search
  • Heuristic function
  • Algorithms
  • Bioinformatics

Cite this

An improved algorithm for the longest common subsequence problem. / Mousavi, Seyed Rasoul; Tabataba, Farzaneh.

In: Computers & Operations Research, Vol. 39, No. 3, 03.2012, p. 512-520.

Mousavi, Seyed Rasoul ; Tabataba, Farzaneh. / An improved algorithm for the longest common subsequence problem. In: Computers & Operations Research. 2012 ; Vol. 39, No. 3. pp. 512-520.
@article{38f6dbac459e4137b0df5d8713885c14,
title = "An improved algorithm for the longest common subsequence problem",
abstract = "The Longest Common Subsequence problem seeks a longest string that is a subsequence of every member of a given set of strings. It has applications in data compression, FPGA circuit minimization, and bioinformatics, among others. The problem is NP-hard for more than two input strings, and the existing exact solutions are impractical for large input sizes. Therefore, several approximation and (meta)heuristic algorithms have been proposed that aim at finding good, but not necessarily optimal, solutions to the problem. In this paper, we propose a new algorithm based on the constructive beam search method. We have devised a novel heuristic, inspired by probability theory, intended for domains where the input strings are assumed to be independent. Special data structures and dynamic programming methods are developed to reduce the time complexity of the algorithm. The proposed algorithm is compared with the state of the art on several standard benchmarks, including random and real biological sequences. Extensive experimental results show that the proposed algorithm outperforms the state of the art, giving higher-quality solutions in less computation time for most of the experimental cases.",
keywords = "Longest common subsequence, LCS, Beam search, Heuristic function, Algorithms, Bioinformatics",
author = "Mousavi, {Seyed Rasoul} and Tabataba, Farzaneh",
year = "2012",
month = "3",
doi = "10.1016/j.cor.2011.02.026",
language = "English",
volume = "39",
pages = "512--520",
journal = "Computers \& Operations Research",
issn = "0305-0548",
publisher = "Elsevier",
number = "3",
}

TY - JOUR

T1 - An improved algorithm for the longest common subsequence problem

AU - Mousavi, Seyed Rasoul

AU - Tabataba, Farzaneh

PY - 2012/3

Y1 - 2012/3

AB - The Longest Common Subsequence problem seeks a longest string that is a subsequence of every member of a given set of strings. It has applications in data compression, FPGA circuit minimization, and bioinformatics, among others. The problem is NP-hard for more than two input strings, and the existing exact solutions are impractical for large input sizes. Therefore, several approximation and (meta)heuristic algorithms have been proposed that aim at finding good, but not necessarily optimal, solutions to the problem. In this paper, we propose a new algorithm based on the constructive beam search method. We have devised a novel heuristic, inspired by probability theory, intended for domains where the input strings are assumed to be independent. Special data structures and dynamic programming methods are developed to reduce the time complexity of the algorithm. The proposed algorithm is compared with the state of the art on several standard benchmarks, including random and real biological sequences. Extensive experimental results show that the proposed algorithm outperforms the state of the art, giving higher-quality solutions in less computation time for most of the experimental cases.

KW - Longest common subsequence

KW - LCS

KW - Beam search

KW - Heuristic function

KW - Algorithms

KW - Bioinformatics

U2 - 10.1016/j.cor.2011.02.026

DO - 10.1016/j.cor.2011.02.026

M3 - Article

VL - 39

SP - 512

EP - 520

JO - Computers & Operations Research

JF - Computers & Operations Research

SN - 0305-0548

IS - 3

ER -