Abstract
The Shortest Common Supersequence Problem asks to obtain a shortest string that is a supersequence of every member of a given set of strings. It has applications, among others, in data compression and oligonucleotide microarray production. The problem is NP-hard, and the existing exact solutions are impractical for large instances. In this paper, a new beam search algorithm is proposed for the problem, which employs a probabilistic heuristic and uses the dominance property to further prune the search space. The proposed algorithm is compared with three recent algorithms proposed for the problem on both random and biological sequences, outperforming them all by quickly providing solutions of higher average quality in all the experimental cases. The Java source and binary files of the proposed IBS-SCS algorithm and our implementation of the DR algorithm and all the random and real datasets used in this paper are freely available upon request.
Original language | English |
---|---|
Pages (from-to) | 457-467 |
Number of pages | 11 |
Journal | Engineering Applications of Artificial Intelligence |
Volume | 25 |
Issue number | 3 |
Early online date | 20 Sept 2011 |
DOIs | |
Publication status | Published - Apr 2012 |
Externally published | Yes |
Keywords
- Dynamic programming
- Heuristic
- Microarray production
- Optimization
- Sequence analysis
- Shortest common supersequence
ASJC Scopus subject areas
- Control and Systems Engineering
- Artificial Intelligence
- Electrical and Electronic Engineering