Abstract
Given a set of aligned fragments, haplotype assembly is the problem of reconstructing the haplotypes from which the fragments were read. The problem is important because haplotypes contain SNP information, which is essential to many genomic analyses, such as the study of potential associations between certain diseases and genetic variations. The current state-of-the-art haplotype assembly algorithm, HapSAT, does not exploit genotype information and receives only a read matrix as input. However, the importance of haplotypes and the low cost of genotype information motivate exploiting genotype information to obtain more accurate haplotypes. In this paper, an improved haplotype assembly method, xGenHapSAT, is proposed, which exploits xor genotype information for more accurate haplotype assembly. Xor genotype information is even less expensive to obtain than full genotype information, e.g., using the Denaturing High-Performance Liquid Chromatography (DHPLC) technique. It is shown that using this inexpensively obtainable information significantly improves the accuracy of the assembled haplotypes. In addition, a new, more efficient Max-2-SAT formulation is adopted in xGenHapSAT, which, on average, increases the speed of the algorithm. Moreover, the proposed xGenHapSAT method replaces the current state-of-the-art haplotype assembly method based on genotype information. Finally, our state-of-the-art haplotype assembly software, HapSoft, which includes both xGenHapSAT and HapSAT, is made freely available for research purposes.
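To make the term "xor genotype" concrete, the following is a minimal illustrative sketch, not the xGenHapSAT algorithm or any part of HapSoft. It assumes haplotypes are encoded as binary strings over SNP sites (an encoding chosen here for illustration), and the function names `xor_genotype` and `complete_from_xor` are hypothetical. The xor genotype marks each site as heterozygous (`1`) or homozygous (`0`), so once one haplotype is known, the other follows from it.

```python
def xor_genotype(h1: str, h2: str) -> str:
    """Per-site XOR of two binary haplotypes.

    '1' marks a heterozygous site (the haplotypes differ),
    '0' marks a homozygous site (the haplotypes agree).
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return "".join("1" if a != b else "0" for a, b in zip(h1, h2))


def complete_from_xor(h1: str, xg: str) -> str:
    """Derive the second haplotype from the first and the xor genotype.

    At heterozygous sites the allele is flipped; at homozygous
    sites it is copied unchanged.
    """
    if len(h1) != len(xg):
        raise ValueError("haplotype and xor genotype must have the same length")
    flip = {"0": "1", "1": "0"}
    return "".join(flip[a] if x == "1" else a for a, x in zip(h1, xg))


# Hypothetical four-site example (values chosen for illustration only).
h1, h2 = "0110", "0011"
xg = xor_genotype(h1, h2)
h2_recovered = complete_from_xor(h1, xg)
```

In this sketch the xor genotype halves the unknowns: an assembler only has to choose one haplotype, because the second is fixed by the heterozygous and homozygous pattern.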
Original language | English |
---|---|
Pages (from-to) | 122-130 |
Number of pages | 9 |
Journal | Journal of Theoretical Biology |
Volume | 298 |
Early online date | 12 Jan 2012 |
DOIs | |
Publication status | Published - 7 Apr 2012 |
Externally published | Yes |
Keywords
- Computational biology
- Haplotype assembly
- Single individual haplotyping
- SNP
- Xor genotype
ASJC Scopus subject areas
- Statistics and Probability
- Modelling and Simulation
- Biochemistry, Genetics and Molecular Biology (all)
- Immunology and Microbiology (all)
- Agricultural and Biological Sciences (all)
- Applied Mathematics