A coronavirus related to SARS-CoV-2 has been isolated from Malayan pangolins illegally imported into Guangdong province. It is not the precursor of SARS-CoV-2, but comparison of viral genome sequences provides further evidence that the virus currently infecting humans was not produced in a laboratory.
Two sequences in the viral spike glycoprotein (pictured) are important for tracing the origin of SARS-CoV-2: a furin cleavage site (discussed last week) and the receptor binding domain (RBD).
Experiments in cultured cells have shown that the SARS-CoV-2 spike glycoprotein binds the cell receptor ACE2. Six amino acids in the RBD are critical for binding to this receptor. Five of these six amino acids differ in the RBD of SARS-CoV-2 compared with the sequence from the bat virus RaTG13, the most closely related virus. The SARS-CoV-2 spike glycoprotein binds ACE2 with high affinity, an outcome not predicted by computational analysis of the RBD sequence. If someone were to engineer an RBD into a bat SARS-like CoV to allow efficient infection of human cells, they would therefore not have chosen the amino acid sequence in the SARS-CoV-2 spike. Rather, the specific sequence was likely selected during replication in cells with human-like ACE2.
As discussed previously, the furin cleavage site in the SARS-CoV-2 spike is not present in the bat virus RaTG13. Its acquisition could allow enhanced infection of human cells. In addition to the furin cleavage site, an extra proline is also present, a change predicted to lead to the addition of O-linked glycans in the vicinity. If someone were to engineer the furin cleavage site into the spike, it is not likely that the extra proline would have been included. Furthermore, the addition of such glycans typically occurs under immune selection.
The genome sequences of CoVs recently isolated from pangolins are not close enough to SARS-CoV-2 to have been its immediate progenitor. However, the RBDs of these pangolin CoVs are identical to that of SARS-CoV-2 at all six of the key amino acids discussed above. This observation indicates that passage of a CoV in a host with human-like ACE2 could select for an RBD with high-affinity binding. Such passage could also select for insertion of the furin cleavage site, which is not present in pangolin CoVs. Once a virus with the appropriate RBD and furin cleavage site arose in an animal (a bat or an intermediate host), it would then replicate once introduced into humans.
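The comparison behind these statements is simple: line up the six key RBD positions and count how many residues match. Here is a minimal sketch in Python. The function name `count_matches` and the residue strings are hypothetical placeholders invented for illustration, not real spike sequences; they are chosen only so that the counts mirror the pattern described above (the bat-like string differs at five of six positions, the pangolin-like string matches at all six).

```python
# Toy illustration only: the residue strings below are invented placeholders,
# NOT real spike protein sequences. Each string holds the amino acid found
# at each of six key RBD positions, in the same order.

def count_matches(reference: str, other: str) -> int:
    """Count positions at which two equal-length residue strings agree."""
    if len(reference) != len(other):
        raise ValueError("sequences must cover the same positions")
    return sum(a == b for a, b in zip(reference, other))


reference_rbd = "ACDEFG"      # placeholder for the human virus
bat_like_rbd = "AHIKLM"       # placeholder differing at five of six positions
pangolin_like_rbd = "ACDEFG"  # placeholder identical at all six positions

for name, seq in [("bat-like", bat_like_rbd), ("pangolin-like", pangolin_like_rbd)]:
    matches = count_matches(reference_rbd, seq)
    print(f"{name}: {matches} of {len(reference_rbd)} key positions match")
```

With real data, the strings would be replaced by the residues read from an alignment of the spike sequences at the six positions critical for ACE2 binding.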
Another possibility is that viruses with the correct RBD have been repeatedly jumping into humans, but efficient human-to-human transmission was not established until the acquisition of the furin cleavage site. Such is the scenario with MERS-CoV, which has jumped multiple times from camels to humans, but each chain of infection is short and soon ends. The virus has never become established in humans because the required mutations have not entered the viral genome. Serological surveys for antibodies specific to SARS-CoV-2 might test this hypothesis for its emergence.
Could laboratory passage of a bat SARS-like virus lead to isolation and accidental emergence of SARS-CoV-2? Passage in cell culture might have selected for the RBD amino acid changes that enable high-affinity ACE2 binding. However, this scenario would require starting with a virus very similar to the current SARS-CoV-2 isolates, and no such isolate is known to be present in any laboratory. Selection of viruses with a furin cleavage site would likely have taken extensive passaging in cells. Finally, it is unlikely that the O-linked glycan addition site would have emerged without immune pressure, which is absent in cell cultures.
Proving or disproving any of these hypotheses for the emergence of SARS-CoV-2 might never be possible. Nevertheless, isolation of SARS-like viruses from a variety of animals might help to clarify the steps to emergence in humans. For MERS-CoV, a priority should be to prevent human infections, perhaps by immunizing camels, to avoid the emergence of another epidemic CoV with sustained transmission in humans.