A Real Linear and Parallel Multiple Longest Common Subsequences (MLCS) Algorithm

Y. Li, H. Li, T. Duan, Sheng Wang, Z. Wang, Y. Cheng

    Research output: Contribution to conference (Paper)

    10 Citations (Scopus)

    Abstract

    Information in many applications is expressed as character sequences over a finite alphabet (e.g., DNA or protein sequences). In the Big Data era, the lengths and sizes of these sequences are growing explosively, posing major challenges for a classical NP-hard problem: finding the Multiple Longest Common Subsequences (MLCS) of multiple sequences. In this paper, we first show that state-of-the-art MLCS algorithms cannot be applied to alignments of long, large-scale sequences. To overcome these limitations and handle longer, large-scale, or even big sequence alignments, we present a novel parallel MLCS algorithm based on a new problem-solving model and several strategies, including parallel topological sorting, optimal calculating, reuse of intermediate results, and subsection calculation and serialization. Extensive experiments on both synthetic and real-world biological sequences demonstrate that the time and space costs of the proposed algorithm are only linear in the number of dominants from the aligned sequences, and that the algorithm significantly outperforms state-of-the-art MLCS algorithms, making it applicable to longer and larger-scale sequence alignments.
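To make the problem statement concrete, the following is a minimal Python sketch of what an MLCS is. It is a naive memoized search written for illustration only, not the parallel algorithm the paper proposes (the abstract gives no implementation detail), and the function name `mlcs` is an assumption of this sketch. From each tuple of positions it jumps to the next position at which a given character occurs in every sequence, which loosely mirrors the match points that dominant-point methods work with.

```python
from functools import lru_cache


def mlcs(seqs):
    """Return one longest common subsequence of all strings in seqs.

    Naive illustrative baseline (hypothetical helper, not the paper's method).
    """
    seqs = tuple(seqs)
    if not seqs or any(not s for s in seqs):
        return ""
    # Only characters present in every sequence can appear in a common subsequence.
    alphabet = set(seqs[0]).intersection(*map(set, seqs[1:]))

    @lru_cache(maxsize=None)
    def best(pos):
        # pos[i] is the index in seqs[i] from which matching may start.
        result = ""
        for ch in sorted(alphabet):
            nxt = []
            for s, p in zip(seqs, pos):
                j = s.find(ch, p)  # earliest occurrence of ch at or after p
                if j < 0:
                    break  # ch is missing from the rest of one sequence
                nxt.append(j + 1)
            else:
                cand = ch + best(tuple(nxt))
                if len(cand) > len(result):
                    result = cand
        return result

    return best((0,) * len(seqs))


print(mlcs(["ABCD", "ACBD", "ABD"]))  # ABD
```

The number of position tuples this search can visit grows with the product of the sequence lengths, which is the kind of blow-up that motivates more scalable approaches for long inputs.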
    Original language: English
    Pages: 1725-1734
    Publication status: Published - 2016
    Event: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - California, United States
    Duration: 13 Aug 2016 to 17 Aug 2016

    Conference

    Conference: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    Country/Territory: United States
    City: California
    Period: 13/08/16 to 17/08/16

    Bibliographical note

    The full text is currently unavailable on the repository.

    Keywords

    • Multiple Longest Common Subsequences (MLCS)
    • Non-redundant Common Subsequence Graph (NCSG)
    • Topological Sorting
    • Subsection Calculation and Serialization
