A dancing matrix of viruses

Back in 1974, before it was possible to determine the sequence of a viral genome, before we knew much about the origin of viruses and their ability to move genes from organism to organism, Lewis Thomas wrote the following incredibly prescient words in The Lives of a Cell:

The viruses, instead of being single-minded agents of disease and death, now begin to look more like mobile genes. We live in a dancing matrix of viruses; they dart, rather like bees, from organism to organism, from plant to insect to mammal to me and back again, and into the sea, tugging along pieces of this genome, strings of genes from that, transplanting grafts of DNA, passing around heredity as though at a great party. They may be a mechanism for keeping new, mutant kinds of DNA in the widest circulation among us. If this is true, the odd virus disease, on which we must focus so much of our attention in medicine, may be looked on as an accident, something dropped.

When Thomas wrote these words we knew that bacteriophages could move pieces of DNA from bacterium to bacterium, but we had no idea of the global scale of this movement. We did not know that most viruses could carry genes from cell to cell, nor did we appreciate that viruses could be beneficial. I am amazed by the accuracy of his words written at a time when we knew so little.

TWiV 235: Live in Edmonton, eh?

Episode #235 of the science show This Week in Virology was recorded before an audience at the 2nd Li Ka Shing Institute of Virology Symposium at the University of Alberta, where the hosts spoke with Dave, Stan, and Lorne about their work on poxvirus vaccines and recombination, an enveloped picornavirus, antivirals against hepatitis B and C viruses, and supporting virology research in Alberta.

You can find TWiV #235 at www.microbe.tv/twiv.

A DNA virus with the capsid of an RNA virus

Viral genomes are unusual because they can be based on RNA or DNA, in contrast to all cellular life forms, which have DNA as their genetic information. An unusual new virus has been discovered that appears to have sequences from both an RNA and a DNA virus.

The new virus was identified during a study of viral diversity in an extreme environment, Boiling Springs Lake in Lassen Volcanic National Park. You would never want to swim there: it is acidic (pH 2.5) and hot (52 to 95 °C). But the lake is not devoid of living things: it is inhabited by bacteria, archaea, and unicellular eukaryotes. Where there is life, there are viruses, which prompted an expedition to determine what kinds of viruses can be found in Boiling Springs Lake.

To answer this question, Geoff Diemer and Kenneth Stedman sequenced viral DNA extracted from purified viral particles from Boiling Springs Lake water. Their analyses revealed the presence of a virus with a circular, single-stranded DNA genome similar to that found in members of the Circoviridae (this virus family includes porcine circovirus and chicken anemia virus). What surprised the investigators was that the gene encoding the viral capsid protein was similar to that from viruses with single-stranded RNA genomes, including viruses that infect plants (Tombusviridae) or fungi. The authors call it ‘RNA-DNA hybrid virus’, or RDHV. The host of RDHV is unknown but could be one of the eukaryotes that inhabit Boiling Springs Lake.

RDHV probably arose when a circovirus acquired the capsid protein gene of an RNA virus by DNA recombination. This event likely occurred in a cell infected with both viruses. A cellular reverse transcriptase might have converted the RNA virus genome to DNA to allow recombination to occur. RDHV is unusual because genetic exchange among viruses is usually restricted to those with similar genome types.

To determine if RDHV is an oddity, the authors searched the database of DNA sequences obtained from the Global Ocean Survey. They found three RDHV-like genomes, indicating that these viruses exist in the ocean. Whether they are present elsewhere is a question that should certainly be answered. It is important to determine whether recombination between RNA and DNA viruses is a common means of gene exchange, or whether it is a rarity.

The discovery of RDHV could have implications for viral evolution. It has been suggested that the first organisms that evolved on Earth were based on RNA molecules with coding and catalytic capabilities. Later, DNA-based life evolved, and today DNA-based organisms coexist with viruses whose genomes are made of either DNA or RNA. Viruses like RDHV could have emerged during the transition from an RNA to a DNA world, when a new DNA virus captured the gene encoding an RNA virus capsid. In other words, RNA genes that had already evolved were not discarded but appropriated by DNA viruses. This scenario would have required some mechanism for converting RNA into DNA (reverse transcriptases?). The finding of RDHV-like viruses in the ocean suggests that a common ancestor emerged some time ago and then diversified into different environments. More RDHV-like viruses must be isolated and studied before we can determine whether or not these viruses are very old, and to deduce their implications for viral evolution.

Diemer GS, Stedman KM. 2012. A novel virus discovered in an extreme environment suggests recombination between unrelated groups of RNA and DNA viruses. Biology Direct 7:13.

Viral bioinformatics: Recombination

This week’s addition to the virology toolbox was written by Danielle Coulson and Chris Upton.

Comparing genomes of viral strains can provide very useful insight into evolutionary relationships. Recombination, defined by Posada et al. (2001) as the exchange of genetic information between two nucleotide sequences, is quite common in many viruses. Because recombination accounts for much of the genetic diversity observed between viral strains, it is useful to trace the origins of recombinant sequences and to know which viral strains are likely to have undergone recombination. Several programs exist to detect recombination among genomes and to identify breakpoints, the positions that mark the boundaries of recombinant regions. Three recombination-detection programs are described here, using HIV-1 strains isolated from Uganda, where subtypes A and D are prominent. Because these subtypes co-circulate in East Africa, recombinant A-D strains have arisen there.

Recombination analysis tool (RAT)

RAT is a very simple, easy-to-use, cross-platform program for comparing multiple sequences and detecting recombination between them, with a straightforward graphical user interface. It provides a clear graphical output, depicting recombination crossover points by plotting the genetic distance between sequences as a function of position in the alignment. A sequence alignment is used as input (FASTA format works best, although other alignment formats will work), and the default parameters may be kept or changed. The default settings are well optimized for analysis; however, a rule of thumb is that the window size should be 10% of the sequence length, and the increment size should be half the window size.
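The sliding-window comparison behind a plot like RAT's can be sketched in a few lines of Python. This is a minimal illustration, not RAT's code: the function name `sliding_window_identity` is hypothetical, the score is simply the fraction of identical aligned columns (an uncorrected distance would be 1 minus this value), and the defaults apply the rule of thumb above (window = 10% of the length, step = half the window).

```python
def sliding_window_identity(test, other, window=None, step=None):
    """Return (window midpoint, fraction identical) pairs for two aligned sequences.

    Hypothetical helper. Assumes both sequences come from the same alignment
    (equal length). Defaults follow the rule of thumb: window = 10% of the
    alignment length, step = half the window.
    """
    if len(test) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    n = len(test)
    window = window or max(1, n // 10)
    step = step or max(1, window // 2)
    profile = []
    for start in range(0, n - window + 1, step):
        a = test[start:start + window]
        b = other[start:start + window]
        matches = sum(x == y for x, y in zip(a, b))
        # Report the score at the centre of the window, as similarity plots do
        profile.append((start + window // 2, matches / window))
    return profile


print(sliding_window_identity("AAAATTTT", "AAAAAAAA", window=4, step=2))
# [(2, 1.0), (4, 0.5), (6, 0.0)]
```

Plotting one such profile per reference sequence against the same test sequence gives the kind of curves shown in the figures below, where a crossover between curves hints at a breakpoint.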

Figure 1. Sequence input window in RAT, where parameters can be adjusted, or left as default values. Here, a FASTA file containing three HIV sequences is input, and the suspected recombinant is selected as the test sequence, to which other sequences will be compared.

RAT is useful in that it allows the user both to check for recombination between sequences already thought to be recombinants and to conduct an auto search for possible recombination sites. Clicking execute opens a sequence viewer that displays the similarity of all sequences in the alignment to a specified test sequence. A useful feature of this display is the option to select and unselect sequences, either to view all sequences at once to find possible recombination sites, or to view two at a time to pin down specific recombination breakpoints. When conducting an auto search, results can be screened based on customized similarity thresholds. All possible recombination points within the threshold are listed, together with the sequences between which they occur.

Figure 2. Output for a specified sequence search, showing the genetic distance of two HIV strains of clades A and D from the test sequence (the AD recombinant strain) on the Y-axis, and the sequence position on the X-axis. A possible recombination site occurs at position 4874, where the recombinant sequence begins to share higher similarity with clade D than with clade A. This recombinant region appears to end at approximately position 6203.

Finally, the graphical representation of genetic distance between strains can be exported as a JPG file.

RAT works by a distance method, in which pairwise comparisons between sequences are performed as a sliding window moves along the length of the alignment. A score is generated based on the similarity between the nucleotides in the current window of the test sequence and the nucleotides in the same window of each of the other sequences. While RAT provides useful information, it does not provide statistical support for its results. This is also an advantage, however, because it allows the analysis to proceed very quickly, which is extremely useful for getting an overview of potential recombination sites.
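Given window scores against two candidate parents, a rough list of crossover points can be read off wherever the closest parent changes. The sketch below assumes three equal-length aligned sequences; `closest_parent_switches` and its "A"/"B" labels are hypothetical, and, like RAT, it reports positions without any statistical test.

```python
def closest_parent_switches(query, parent_a, parent_b, window, step):
    """List approximate crossover points between two candidate parents.

    Hypothetical helper: slides a window along three aligned sequences,
    records which parent is more identical to the query in each window,
    and returns (position, old_parent, new_parent) wherever that changes.
    """
    def identity(s, t):
        return sum(x == y for x, y in zip(s, t)) / len(s)

    labels = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        ia = identity(q, parent_a[start:start + window])
        ib = identity(q, parent_b[start:start + window])
        if ia > ib:
            labels.append((start + window // 2, "A"))
        elif ib > ia:
            labels.append((start + window // 2, "B"))
        # Tied windows are skipped: they favour neither parent

    switches = []
    for (pos_prev, lab_prev), (pos, lab) in zip(labels, labels[1:]):
        if lab != lab_prev:
            # Place the crossover halfway between the two window midpoints
            switches.append(((pos_prev + pos) // 2, lab_prev, lab))
    return switches


query = "A" * 20 + "C" * 20
print(closest_parent_switches(query, "A" * 40, "C" * 40, window=10, step=5))
# [(20, 'A', 'B')]
```

The reported position is only as precise as the window and step allow, which is why the text recommends viewing two sequences at a time to refine a breakpoint.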

SimPlot

SimPlot (for Windows) is another useful tool for detecting recombination between sequences. Like RAT, it produces similarity plots, but it has more features and is therefore slightly more complex. SimPlot allows the analysis of up to 10 sequences (although the alignment may contain more), each of which can be used as the query sequence against which the rest are compared, or hidden from the analysis. Other useful and unique features of SimPlot are the ability to ignore sites containing gaps in the alignment when generating the similarity plot, and the ability to display the sequence position and exact similarity value at any point on the plot you click. Furthermore, there is a zoom feature, as well as options to include titles, legends, grid lines and other useful information in the display. SimPlot also allows sequences to be grouped together, and analysis to be performed between groups rather than individual sequences.
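The option to ignore gapped sites changes how identity is computed in each window. As a sketch under stated assumptions (hypothetical function `window_identity`, `-` as the only gap character), columns where either of the two compared sequences has a gap are dropped before scoring; whether SimPlot strips gaps per pair or across the whole alignment is not stated here.

```python
def window_identity(query, other, start, end, ignore_gaps=True):
    """Percent identity between two aligned sequences over columns [start, end).

    Hypothetical helper. Assumes '-' is the only gap character. With
    ignore_gaps=True, columns where either sequence has a gap are dropped
    before scoring; otherwise a gap simply counts as a mismatch.
    """
    pairs = list(zip(query[start:end], other[start:end]))
    if ignore_gaps:
        pairs = [(a, b) for a, b in pairs if a != "-" and b != "-"]
    if not pairs:
        return None  # no comparable columns in this window
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


print(window_identity("AC-GT", "ACTGA", 0, 5))                     # 75.0
print(window_identity("AC-GT", "ACTGA", 0, 5, ignore_gaps=False))  # 60.0
```

The same window scores 75% or 60% depending on the gap setting, which is why gap handling can shift where curves cross on a similarity plot.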

Sequences (FASTA format, as well as other common alignment files) are loaded into the program, and those that are desired are selected for analysis. Several options are available for analysis, such as which Distance Model to use, and the number of Bootstrap replicates to use for statistical significance.
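Bootstrap replicates resample alignment columns with replacement to ask how consistently the same result appears. The sketch below is a simplified, distance-only stand-in: SimPlot's bootscanning builds a tree for each replicate under the chosen distance model, whereas this hypothetical `bootstrap_support` just compares uncorrected p-distances within a single window.

```python
import random


def bootstrap_support(query, parent_a, parent_b, replicates=100, seed=1):
    """Fraction of bootstrap replicates in which `query` is closer to `parent_a`.

    Hypothetical, simplified sketch: the columns of one window of a
    three-sequence alignment are resampled with replacement, and the number
    of mismatches to each parent is compared directly (no tree, no
    distance correction).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(query)
    wins = 0
    for _ in range(replicates):
        cols = [rng.randrange(n) for _ in range(n)]
        diff_a = sum(query[i] != parent_a[i] for i in cols)
        diff_b = sum(query[i] != parent_b[i] for i in cols)
        if diff_a < diff_b:
            wins += 1
    return wins / replicates


print(bootstrap_support("A" * 20, "A" * 20, "C" * 20))  # 1.0
```

Values near 1.0 (or near 0.0) indicate that the window consistently groups the query with one parent; intermediate values suggest weak or mixed signal.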

Figure 3. Similarity plot generated in SimPlot using one query sequence (recombinantAD) and two other sequences (HIV-1 strains from clade A1 and clade D). Possible recombination sites are identified where the similarity curves cross. By zooming in for a better view and clicking on crossover regions, breakpoints can be determined, such as at positions 4481 and 5681 in the alignment. Other potential recombinant regions that were not obvious in RAT are also identified.

Finally, SimPlot provides the option of finding specific recombinant sites. After identifying potential recombinant sequences on the similarity plot, specific sequences within each group can be chosen for an informative site analysis. Overall, SimPlot provides a very effective means of detecting recombination, in an easy-to-use interface with fast results.
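An informative site analysis can be illustrated with three aligned sequences: a column is informative when the query matches one putative parent but not the other. This is a sketch under stated assumptions (hypothetical `informative_sites`, gapped columns skipped, no outgroup), not SimPlot's implementation.

```python
def informative_sites(query, parent_a, parent_b):
    """Return 0-based columns that support parent A or parent B.

    Hypothetical helper. A column supports A when the query matches A but
    not B, and supports B in the reverse case; columns where all three
    agree, all differ, or any sequence has a gap are uninformative here.
    """
    support = {"A": [], "B": []}
    for i, (q, a, b) in enumerate(zip(query, parent_a, parent_b)):
        if "-" in (q, a, b):
            continue  # assumption: gapped columns are skipped
        if q == a and q != b:
            support["A"].append(i)
        elif q == b and q != a:
            support["B"].append(i)
    return support


print(informative_sites("ACGTAC", "ACGTTT", "TTGTAC"))
# {'A': [0, 1], 'B': [4, 5]}
```

Here the switch from A-supporting to B-supporting columns places a candidate breakpoint somewhere between columns 1 and 4.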

Recombination Detection Program (RDP)

RDP (for Windows) is yet another program for detecting recombination among aligned sequences. It is unique in that it incorporates several detection methods and analysis algorithms into one well laid-out interface, allowing the user to select the method of recombination detection that is most suitable and provides the best results.

Figure 4. RDP Overall Display

Furthermore, recombination events that are detected are displayed graphically, with statistical evidence provided, and recombination events are also depicted on phylogenetic trees constructed from proposed recombinant regions. These features allow the user to decipher which events are true recombination events, and discard those that have been incorrectly identified. Importantly, possible recombination events are listed with warnings, to indicate when the program is not confident of the proposed recombination event, its location and sequence breakpoints, or its contributing sequences.

Easy navigation through the sequences is possible, as these are displayed alongside the statistical display of recombinant regions and breakpoints, the schematic display of recombinant sequences, and the dendrogram display.

Figure 5. Sequence display

Figure 6. Schematic sequence display of recombinant regions; each recombinant block can be selected in order to view the supporting evidence for recombination.

Figure 7. Recombination information; displayed here are any relevant warnings suggesting why the recombinant may have been misidentified, as well as the statistical evidence supporting the event from each algorithm used.

Figure 8. Graphical representation of the recombinant region (shown in pink) as determined by the RDP algorithm. The Y-axis shows the pairwise identity of each pair of sequences, plotted against position in the alignment on the X-axis.

Clicking on the various potential recombinant sequence blocks in the schematic display of recombinant regions brings up graphical evidence for each in the bottom left display area. The pink region here shows the likely recombinant region. Each potential recombinant region also has its accompanying statistical evidence displayed in the top right corner, in the recombination information display. The tree display is quite useful in determining true recombination events; it provides dendrograms of non-recombinant and recombinant regions so that the two can be compared. While the default setting is to create trees using the neighbor-joining method (which requires less time), RDP can also create trees using UPGMA, least squares, Bayesian and maximum likelihood algorithms.

As mentioned, RDP provides statistical evidence for each recombination event, as determined by several different methods. Although by default it displays the evidence from the method that most strongly suggested recombination in that region, the user can easily view graphical displays from the other methods used to find the breakpoint.

Among the recombination detection algorithms in the tool are RDP, Geneconv, Bootscan, MaxChi, Chimaera, SiScan, 3Seq, LARD and TOPAL, each of which detects recombination in a different way, so that recombination can be detected in a variety of alignments. Furthermore, each type of analysis has several customizable options, set to default values that work well. Manual distance plots, similar to those created by RAT and SimPlot, are also possible, in which any selected sequence can be queried against all others in the alignment.
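To give a flavor of one of these methods, the core idea of the maximum chi-squared (MaxChi) approach can be sketched for a single pair of sequences: score each column as a match or mismatch, then find the split point where the 2x2 table of (left/right of the split) by (match/mismatch) gives the largest chi-squared value. This is a simplified illustration with a hypothetical function name, not RDP's implementation, which differs in ways not reproduced here (for example, restricting the analysis to variable sites and testing significance).

```python
def max_chi_breakpoint(seq1, seq2):
    """Simplified maximum chi-squared scan for one pair of aligned sequences.

    Hypothetical helper. Returns (best split index, chi-squared value); the
    split index k means columns [0, k) are "left" and [k, n) are "right".
    Returns (None, 0.0) if the pair has no mismatches at all.
    """
    match = [1 if a == b else 0 for a, b in zip(seq1, seq2)]
    n = len(match)
    total_m = sum(match)
    total_x = n - total_m
    best_k, best_chi = None, 0.0
    left_m = 0
    for k in range(1, n):
        left_m += match[k - 1]
        left_x = k - left_m
        right_m = total_m - left_m
        right_x = total_x - left_x
        # Pearson chi-squared for a 2x2 table, no continuity correction
        denom = k * (n - k) * total_m * total_x
        if denom == 0:
            continue
        chi = n * (left_m * right_x - left_x * right_m) ** 2 / denom
        if chi > best_chi:
            best_k, best_chi = k, chi
    return best_k, best_chi


print(max_chi_breakpoint("AAAAAAAAAA", "AAAAACCCCC"))  # (5, 10.0)
```

A large maximum indicates that matches and mismatches are unevenly distributed on either side of the split, the pattern expected if one sequence changed parents at that point.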

With the vast array of options and analysis preferences available in RDP, the average run time for an alignment is longer than for the other programs; however, much more information is provided. Knowing which detection algorithms best suit the alignment lets the user run only those, which makes the analysis proceed much faster.

Finally, this program is accompanied by an extremely useful user’s manual, explaining the algorithms that are available and which is best suited to different alignments. The manual also includes a step-by-step guide, which details the process of detecting recombination in sequences, from preliminary hypothesis to conclusive, statistically supported recombinant regions.

Example sequences:

Clade A1: HIV-1 isolate 99UGA07072 from Uganda, partial genome

GenBank: AF484478.1

Clade D: HIV-1 isolate 99UGC06443 from Uganda, partial genome

GenBank: AF484479.1

RecombinantAD: HIV-1 isolate 99UGB21875 from Uganda, partial genome

GenBank: AF484480.1

Recombination between cellular and viral RNA produces a pathogenic virus

Bovine viral diarrhea virus is an economically important animal pathogen that may cause a fatal gastrointestinal disease in beef and dairy herds. Infection of a fetus with this virus during the first trimester leads to the birth of animals that are persistently infected for life. Some animals remain healthy, while others develop severe mucosal disease. The lethal outcome is a consequence of RNA recombination that produces a cytopathic virus.

Pathogenicity of bovine viral diarrhea virus is associated with the synthesis of the viral protein NS3. This protein is not produced by the noncytopathic virus that persistently infects cows for life. Absence of the protein is due to failure to cleave its precursor, called NS2-3. In cells infected with the cytopathic, disease-causing virus, NS3 is produced because the virus has acquired an extra cleavage site. This difference is illustrated in the diagram.

The extra cleavage site in the viral protein is acquired when the viral RNA of the noncytopathic virus recombines with cellular RNA. This exchange of sequence probably occurs when the enzyme copying the viral RNA briefly switches to a cellular RNA, and then back to the viral RNA. The result is a copy of the viral RNA into which a cellular sequence has been inserted.

The cleavage site for NS3 can be created in several ways. One of the most frequent is the insertion of a cellular RNA sequence coding for ubiquitin (UCH in the diagram). This small protein can be cleaved by members of a family of cellular proteases (proteases are enzymes that cut proteins). The insertion of ubiquitin leads to cleavage of NS2-3 and the production of NS3. The recombinant viruses replicate faster than noncytopathic viruses and cause disease in cattle. Why pathogenicity is associated with release of the NS3 protein, which is involved in viral RNA synthesis, is not known.

The production of pathogenic pestiviruses by recombination with cellular RNA is another illustration of the many unexpected pathways of viral evolution.

Meyers, G., Tautz, N., Dubovi, E., & Thiel, H. (1991). Viral cytopathogenicity correlated with integration of ubiquitin-coding sequences. Virology, 180 (2), 602-616 DOI: 10.1016/0042-6822(91)90074-L

A plant virus that switched to vertebrates

Viruses can be transmitted to completely new host species that they have not previously infected. Usually host defenses stop the infection before any replication and adaptation can take place. On rare occasions, a novel population of viruses arises in the new host. These interspecies infections can sometimes be deduced by sequence analyses, providing a glimpse of the amazing and unpredictable paths of virus evolution. One example is a plant virus that switched hosts and infected vertebrates.

Circoviruses infect vertebrates and have small, circular, single-stranded DNA genomes. Nanoviruses have the same genome structure, but infect plants. The genes of the two families that encode one of the viral proteins, called the Rep protein, share significant sequence similarity. The circovirus gene appears to be a hybrid, because it also exhibits homology with a protein encoded by caliciviruses, which are RNA viruses that infect many different vertebrates.

Analysis of the viral DNA sequences suggests that two remarkable events occurred during the evolution of circoviruses and nanoviruses. Not long ago, a nanovirus was transmitted from a plant to a vertebrate. This event might have occurred when a vertebrate fed on an infected plant. The virus adapted to vertebrates, and the circovirus family was established. After the host switch from plants to vertebrates, recombination took place between the circovirus and a vertebrate calicivirus. A reverse transcriptase probably converted the calicivirus RNA genome to DNA to allow recombination to occur.

Similar interspecies transmission events have led to outbreaks of human disease. One notable example is the transfer of simian immunodeficiency virus (SIVcpz) from chimpanzees to humans. This host switch, which is believed to have occurred in the early part of the 20th century, led to the current AIDS pandemic.

Gibbs, M. (1999). Evidence that a plant virus switched hosts to infect a vertebrate and then recombined with a vertebrate-infecting virus. Proceedings of the National Academy of Sciences, 96 (14), 8022-8027 DOI: 10.1073/pnas.96.14.8022

The trajectory of evolution

Scientists and philosophers have long debated the trajectory of evolution. Some of the questions they consider include: is there a predictable direction for evolution, and if there is, what is the pathway? Are there evolutionary dead ends?

Viruses are excellent subjects for the study of evolution: they have short generation times, high yields of offspring, and prodigious levels of mutation, recombination, and reassortment. Furthermore, selection pressures can be readily applied in the laboratory, and can often be identified in nature.

When studying the evolution of viruses, it is important to avoid judging outcomes as ‘good’ or ‘bad’. Anthropomorphic assessments of virus evolution come naturally to humans, but concluding that viruses become ‘better adapted’ to their hosts, for example, fails to recognize the main goal of evolution: survival. Or, in the case of the non-living viruses, existence.

Evolution does not move a viral genome from simple to complex, or along a trajectory aimed at perfection. Change comes about by eliminating those viruses that are not well adapted for the current conditions, not by building something that will fare better tomorrow.