FastLSA - A Fast Linear-Space Algorithm for Sequence Alignment

  • Author(s) / Creator(s)
  • Technical report TR01-10. For two DNA or protein sequences of length m and n, dynamic programming alignment algorithms like Needleman-Wunsch and Smith-Waterman take O(m x n) time and use O(m x n) space, so we refer to them as full matrix (FM) algorithms. This space requirement means that large sequences will not completely fit in main memory. It also means that shorter sequences will not completely reside in cache memory. Hirschberg's algorithm reduces the space requirements to O(min(m, n)), but requires approximately twice the number of operations required by the FM algorithms. This paper presents the FastLSA algorithm that is adaptive to the amount of space available. It allows a user to trade space for operations. At one extreme, it uses linear space with approximately 1.5 times the number of operations required by the FM algorithms. At the other extreme, it uses quadratic space with no extra operations compared to the FM algorithms. However, our experiments show that in practice, due to memory caching effects, FastLSA and Hirschberg's algorithm are often faster than FM. This is true even though the number of operations is higher and even when there is enough main memory to hold the entire dynamic programming matrix of the FM algorithm. FastLSA has been incorporated into the newest and fastest commercialproducts like BioTools' ChromaTool. This paper also briefly describes how to parallelize the FastLSA algorithm to further improve its performance. | TRID-ID TR01-10

  • Date created
  • Subjects / Keywords
  • Type of Item
  • DOI
  • License
    Attribution 3.0 International