Generating multiple sequence alignments (MSAs) is one of the most commonly used bioinformatics techniques. The “sequences” to be compared can be DNA (promoters, genes, genomes) or proteins. Note that the length and number of sequences to be aligned have an impact on the methods (algorithms) that can be used; what is suitable for aligning 20 proteins probably won’t work for aligning 5 poxvirus genomes (200 kb each).
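To get a rough feel for why size matters, here is a back-of-the-envelope sketch in Python. It assumes a naive approach in which every pair of sequences is aligned with a full dynamic-programming matrix (one cell per pair of residues), which is only an approximation of what real programs do; most use shortcuts. The function name `pairwise_dp_cells` and the 300-residue protein length are illustrative assumptions, not values from any particular tool.

```python
from math import comb


def pairwise_dp_cells(n_sequences: int, length: int) -> int:
    """Estimate matrix cells filled if every pair is aligned by full dynamic programming.

    Assumes all sequences share the same length, so each pairwise
    alignment fills a length x length matrix.
    """
    n_pairs = comb(n_sequences, 2)  # number of distinct sequence pairs
    return n_pairs * length * length


# 20 proteins; 300 residues each is an assumed, illustrative length
protein_cells = pairwise_dp_cells(20, 300)

# 5 poxvirus genomes at 200 kb each, as in the example above
genome_cells = pairwise_dp_cells(5, 200_000)

print(f"20 proteins: {protein_cells:,} cells")
print(f"5 genomes:   {genome_cells:,} cells")
```

Under these assumptions the protein set needs about 17 million cells and the genome set about 400 billion, even though it contains only a quarter as many sequences.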
Some useful links:
- Wikipedia: multiple sequence alignment
- Wikipedia: sequence alignment
- Wikipedia: list of sequence alignment software
- Protein Multiple Sequence Alignment: Book chapter by Chuong B. Do and Kazutaka Katoh
- Sequence alignment: Lecture notes by Per Kraulis
- Another list of tools
So you see, there are lots of options (did you say: “too many!”?). Further confusion may arise because 1) the same algorithm may be used in many different software programs, and 2) the name of a software package may give no clue to the algorithm it uses. For many molecular biologists, Clustal is synonymous with sequence alignment. However, newer algorithms such as T-Coffee and MUSCLE are often offered in current software packages, and may be faster and more accurate.
Specialized alignment tools are almost always needed for long, genome-sized DNA sequences.
In this set of posts, I’ll provide some information on favorite free, general-purpose MSA tools that should be useful to the average molecular virologist. The lists noted above provide a multitude of tools, but many are for specific analyses.