A Real Linear and Parallel Multiple Longest Common Subsequences (MLCS) Algorithm

Y. Li, H. Li, T. Duan, Sheng Wang, Z. Wang, Y. Cheng

Research output: Contribution to conference (Paper)

10 Citations (Scopus)

Abstract

Information in various applications is often expressed as character sequences over a finite alphabet (e.g., DNA or protein sequences). In the Big Data era, the lengths and sizes of these sequences are growing explosively, posing grand challenges for the classical NP-hard problem of finding the Multiple Longest Common Subsequences (MLCS) of multiple sequences. In this paper, we first show that state-of-the-art MLCS algorithms cannot be applied to long, large-scale sequence alignments. To overcome these limitations and handle longer, large-scale, or even big sequence alignments, we present a novel parallel MLCS algorithm based on a new problem-solving model and several strategies, including parallel topological sorting, optimal calculating, reuse of intermediate results, and subsection calculation and serialization. Exhaustive experiments on datasets of both synthetic and real-world biological sequences demonstrate that both the time and space costs of the proposed algorithm are linear in the number of dominants of the aligned sequences, and that it significantly outperforms state-of-the-art MLCS algorithms, making it applicable to longer and larger-scale sequence alignments.
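
To make the terms "dominants" and "topological sorting" concrete, the following is a minimal Python sketch of a generic match-point formulation of MLCS. It is not the authors' algorithm: it has no parallelism, no pruning of redundant points, and no subsection calculation or serialization, and the function names `successor_tables` and `mlcs` are hypothetical. It treats every tuple of positions at which all sequences share the same character as a node, links each node to its nearest following match for each character, and finds a longest path with a Kahn-style topological sort.

```python
from collections import deque


def successor_tables(seqs, alphabet):
    """tables[k][c][i] is the smallest 1-based position j > i such that
    seqs[k][j - 1] == c, or None if c does not occur after position i."""
    tables = []
    for s in seqs:
        table = {c: [None] * (len(s) + 1) for c in alphabet}
        for c in alphabet:
            nxt = None
            for i in range(len(s), 0, -1):
                if s[i - 1] == c:
                    nxt = i
                table[c][i - 1] = nxt
        tables.append(table)
    return tables


def mlcs(seqs):
    """Return one longest common subsequence of all strings in seqs.
    Illustrative only: assumes at least one sequence and does no pruning."""
    alphabet = sorted(set.intersection(*(set(s) for s in seqs)))
    tables = successor_tables(seqs, alphabet)
    source = (0,) * len(seqs)

    # Forward exploration: build the match-point DAG and count in-degrees.
    children = {}
    indegree = {source: 0}
    todo = [source]
    while todo:
        p = todo.pop()
        succs = []
        for c in alphabet:
            q = tuple(tables[k][c][p[k]] for k in range(len(seqs)))
            if None not in q:
                succs.append((c, q))
                if q not in indegree:
                    indegree[q] = 0
                    todo.append(q)
                indegree[q] += 1
        children[p] = succs

    # Kahn-style topological sort; level[p] is the longest path length to p.
    level = {source: 0}
    parent = {source: None}
    best = source
    queue = deque([source])
    while queue:
        p = queue.popleft()
        if level[p] > level[best]:
            best = p
        for c, q in children[p]:
            if level[p] + 1 > level.get(q, 0):
                level[q] = level[p] + 1
                parent[q] = (p, c)
            indegree[q] -= 1
            if indegree[q] == 0:
                queue.append(q)

    # Walk back from the deepest node to recover one MLCS.
    chars = []
    node = best
    while parent[node] is not None:
        node, c = parent[node]
        chars.append(c)
    return "".join(reversed(chars))
```

For example, `mlcs(["ABCD", "ACBD", "ABD"])` returns `"ABD"`. The number of match points can grow very quickly with the number and length of the sequences, which is the scalability problem the abstract refers to; this sketch makes no attempt to address it.
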
Original language: English
Pages: 1725-1734
DOI: https://doi.org/10.1145/2939672.2939842
Publication status: Published - 2016
Event: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - California, United States
Duration: 13 Aug 2016 to 17 Aug 2016

Conference

Conference: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: California
Period: 13/08/16 to 17/08/16

Bibliographical note

The full text is currently unavailable on the repository.

Keywords

  • Multiple Longest Common Subsequences (MLCS)
  • Non-redundant Common Subsequence Graph (NCSG)
  • Topological Sorting
  • Subsection Calculation and Serialization

Cite this

Li, Y., Li, H., Duan, T., Wang, S., Wang, Z., & Cheng, Y. (2016). A Real Linear and Parallel Multiple Longest Common Subsequences (MLCS) Algorithm. 1725-1734. Paper presented at ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, California, United States. https://doi.org/10.1145/2939672.2939842