Viral bioinformatics: Sequence searcher

This week’s addition to the virology toolbox was written by Chris Upton.

Sequence Searcher is a Java program that allows users to search for specific sequence motifs in protein or DNA sequences. For example, it can be used to identify restriction enzyme cleavage sites or find similar sequence patterns among multiple sequences. Most searches run in a few seconds.

Sequence Searcher is part of the suite of programs available at the University of Victoria.

Some of the key features of Sequence Searcher include:

  • Searching through multiple sequences
  • Use of regular expressions or fuzzy search patterns
  • Searching for patterns on both strands of a DNA sequence
  • Graphical representation of results and ability to save search results
  • Ability to run on multiple computer platforms (Java)

For DNA, the program searches for the motif from the 5’ to the 3’ end of the top strand, and then repeats the search from the 5’ to the 3’ end of the bottom strand. As a result, match positions are numbered from 1 starting at the 5’ end of whichever strand, top or bottom, contains the match.
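The two-strand numbering can be illustrated with a minimal Python sketch. This is not Sequence Searcher's own code (the program is written in Java); the function names `reverse_complement` and `find_on_both_strands` are hypothetical, and the sketch assumes an exact, unambiguous DNA motif.

```python
# Illustrative sketch only; names are hypothetical, not Sequence Searcher's API.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the bottom strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_on_both_strands(seq, motif):
    """Yield (strand, start, stop) for every exact match of motif.

    Positions are 1-based and counted from the 5' end of the strand
    on which the match was found.
    """
    for strand, text in (("top", seq), ("bottom", reverse_complement(seq))):
        pos = text.find(motif)
        while pos != -1:
            yield strand, pos + 1, pos + len(motif)
            pos = text.find(motif, pos + 1)  # step by 1 to allow overlapping hits

# GAATTC (the EcoRI site) is palindromic, so it is reported on both strands,
# each time numbered from that strand's own 5' end.
for hit in find_on_both_strands("AAGAATTCGGTCTC", "GAATTC"):
    print(hit)
```

Because each strand is numbered independently, the same physical site gets different coordinates on the top and bottom strands.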

Regular expression and fuzzy pattern searches are available:

Fuzzy searches let the program accept a set number of mismatches, at any position, between the input motif and the sequence. Note that the maximum number of mismatches the program allows is 40% of the length of the sequence motif.
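As a rough illustration of a fuzzy search, the Python sketch below slides the motif along the sequence and counts mismatched positions. It assumes that a mismatch means a substitution only (no insertions or deletions) and that the 40% cap is rounded down; both are assumptions rather than documented behaviour of Sequence Searcher, and `fuzzy_find` is a hypothetical name.

```python
# Illustrative sketch only; fuzzy_find is a hypothetical name.
def fuzzy_find(seq, motif, max_mismatches):
    """Return (start, matched_text, mismatches) for each window of seq
    that differs from motif at no more than max_mismatches positions.
    Start positions are 1-based."""
    # 40% cap from the text; rounding down is an assumption.
    limit = (len(motif) * 2) // 5
    if max_mismatches > limit:
        raise ValueError(f"at most {limit} mismatches allowed for this motif")
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start + 1, window, mismatches))
    return hits

# One exact match and one match with a single mismatch.
print(fuzzy_find("ACGAATTCAGGATTCC", "GAATTC", 1))
# [(3, 'GAATTC', 0), (10, 'GGATTC', 1)]
```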

Regular expression searches allow precise motifs to be entered, with considerable user-specified flexibility at specific positions.
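To show what flexibility at specific positions looks like, here is a short sketch using Python's standard `re` module. The pattern syntax accepted by Sequence Searcher itself may differ in detail; `regex_find` is a hypothetical helper, and the example sequences are made up for illustration.

```python
import re

# Illustrative sketch only; regex_find is a hypothetical name.
def regex_find(seq, pattern):
    """Return (start, stop, matched_text) for each non-overlapping match,
    using 1-based positions."""
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(pattern, seq)]

# Protein: N, then anything except P, then S or T, then anything except P
# (the N-glycosylation consensus motif).
print(regex_find("MKNGSAANPTQNVTL", r"N[^P][ST][^P]"))
# [(3, 6, 'NGSA'), (12, 15, 'NVTL')]

# DNA: a fixed site with one fully degenerate position.
print(regex_find("TTGGTAACCGGTCACC", r"GGT[ACGT]ACC"))
# [(3, 9, 'GGTAACC'), (10, 16, 'GGTCACC')]
```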

Figure 1. The input tab is where you import DNA or protein sequences (which must be in FASTA format) and type in the pattern to search for within the sequence(s). The search type can be set to “Regular expression” or “Fuzzy” using the drop-down menu.
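For readers unfamiliar with FASTA, each record is a header line beginning with `>` followed by one or more lines of sequence. The minimal Python parser below is a sketch of reading such input; it is not how Sequence Searcher reads files, and `parse_fasta` is a hypothetical name.

```python
# Illustrative sketch only; parse_fasta is a hypothetical name.
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}

example = ">seq1 test\nACGT\nTTAA\n\n>seq2\nMKV\n"
print(parse_fasta(example))
# {'seq1 test': 'ACGTTTAA', 'seq2': 'MKV'}
```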

Figure 2. When a search has been completed, the results tab presents the matches as a table. The table can be sorted by any column header (sequence, match, start, stop, confidence, and strand). The results can also be filtered by sequence and strand using the drop-down menus at the top.

Marass, F., & Upton, C. (2009). Sequence Searcher: A Java tool to perform regular expression and fuzzy searches of multiple DNA and protein sequences. BMC Research Notes, 2(1). DOI: 10.1186/1756-0500-2-14

Swine influenza daily update

Here is an update on the global swine flu situation as of 28 April 2009.

There are now 64 laboratory confirmed cases of infection with the H1N1 swine influenza strain in the United States, up from 40 the day before. States reporting cases are California (10), Kansas (2), New York City (45), Ohio (1) and Texas (6). These are the same states that reported isolations on the previous day. There are new laboratory confirmed isolations of the virus in Australia (3), Israel (1, a traveler returning from Mexico), and New Zealand (3). The number of laboratory confirmed cases in Mexico remains the same as the previous day: 26 cases, 7 deaths. This brings the total number of countries reporting laboratory confirmed cases to seven.

Swine influenza virus isolates from the US and Mexico have been given names according to the proper nomenclature, which takes the following form:
Influenza type/geographic origin/isolate number/year (subtype).

Accordingly, the following swine influenza virus strains have been isolated:

A/California/04/2009 (H1N1)
A/California/07/2009 (H1N1)
A/California/08/2009 (H1N1)
A/California/10/2009 (H1N1)
A/Texas/04/2009 (H1N1)
A/Texas/05/2009 (H1N1)
A/Kansas/03/2009 (H1N1)
A/Ohio/07/2009 (H1N1)
A/New York/19/2009 (H1N1)
A/New York/20/2009 (H1N1)
A/Mexico/4482/2009 (H1N1)
A/Mexico/4486/2009 (H1N1)
A/Mexico/4108/2009 (H1N1)
A/Mexico/4115/2009 (H1N1)
A/Mexico/4603/2009 (H1N1)
A/Mexico/4604/2009 (H1N1)

I expect to see many more isolates from different countries in the coming weeks.
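Because the names follow a fixed pattern, they are easy to split into their parts programmatically. The Python sketch below handles only the four-field form used in the list above; the field names (`type`, `origin`, `isolate`, `year`, `subtype`) and the function `parse_strain` are labels chosen here for illustration, not part of any official tool.

```python
import re

# Illustrative sketch only; field names are chosen for this example.
STRAIN = re.compile(
    r"^(?P<type>[ABC])/(?P<origin>[^/]+)/(?P<isolate>[^/]+)/(?P<year>\d{4})"
    r" \((?P<subtype>H\d+N\d+)\)$"
)

def parse_strain(name):
    """Split a name such as 'A/Texas/05/2009 (H1N1)' into its fields."""
    match = STRAIN.match(name.strip())
    if match is None:
        raise ValueError(f"not in the expected form: {name!r}")
    return match.groupdict()

print(parse_strain("A/New York/19/2009 (H1N1)"))
# {'type': 'A', 'origin': 'New York', 'isolate': '19', 'year': '2009', 'subtype': 'H1N1'}
```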

Today the CDC released genome sequences of the viral RNAs of six swine flu isolates from California and Texas (Addendum: sequences of New York, Ohio, and Kansas isolates were added late yesterday). The influenza virus genome consists of eight segments of RNA, each coding for one or more proteins (illustrated). Each RNA segment has a name – PB2, PB1, PA, HA, NP, NA, MP, and NS. Mystery Rays has done a quick analysis of the sequences. The isolates are all the same strain, but they are not identical. Unfortunately we don’t yet have genome sequence from any Mexican isolate – otherwise we could determine if they are significantly different. Such information might provide clues about why the disease in Mexico seems to be more severe than elsewhere.

A comparison with RNA sequences of other influenza virus isolates shows that most of the viral RNAs are from swine influenza viruses, with the possible exception of the PB1 RNA, which may be derived from a human H1N1 virus. This observation is somewhat surprising, because last week we were told that the new swine virus had RNA segments from pig, human, and avian influenza viruses. According to ProMED-mail, the NA and MP genes are related to those of influenza viruses from Asian-European swine, and the other genes appear to originate from swine flu viruses from pigs in North America. The data are in accord with the original assertion of the CDC that all genes of the new isolate were derived from swine viruses.

The fact that A/California/04/2009 and related isolates are pig viruses, with little or no genetic material from human influenza virus strains, is fascinating. Clearly these strains are different from viruses that circulate in pigs because they can be transmitted among humans and cause respiratory disease. It will be very important to compare the sequences of these isolates with viruses obtained from pigs in an attempt to determine what changes enabled the virus to adapt to humans.