Pangolins and the origin of SARS-CoV-2 coronavirus

A coronavirus related to SARS-CoV-2 has been isolated from Malayan pangolins illegally imported into Guangdong province. It is not the precursor of SARS-CoV-2, but comparison of viral genome sequences provides further evidence that the virus currently infecting humans was not produced in a laboratory.

Two sequences in the viral spike glycoprotein (pictured) are important for tracing the origin of SARS-CoV-2: a furin cleavage site (discussed last week) and the receptor binding domain (RBD).

Experiments in cultured cells have shown that the SARS-CoV-2 spike glycoprotein binds the cell receptor ACE2. Six amino acids in the RBD are critical for binding to this receptor. Five of these six amino acids differ in the RBD of SARS-CoV-2 compared with the sequence from the bat virus RaTG13, the most closely related virus. The SARS-CoV-2 spike glycoprotein binds ACE2 with high affinity, an outcome not predicted by computational analysis of the RBD sequence. If someone were to engineer an RBD into a bat SARS-like CoV to allow efficient infection of human cells, they would not use the amino acid sequence in the SARS-CoV-2 spike. Rather, the specific sequence was likely selected during replication in cells with human-like ACE2.
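The kind of residue-by-residue comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the two short strings, and the list of positions are placeholders invented for the example, not real spike sequences or real RBD coordinates, and the sequences are assumed to be already aligned.

```python
def compare_key_residues(seq_a, seq_b, positions):
    """Compare two aligned amino acid sequences at selected positions.

    Returns (matches, mismatches): matches is a list of positions where the
    residues are identical; mismatches is a list of (position, residue_a,
    residue_b) tuples where they differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [p for p in positions if seq_a[p] == seq_b[p]]
    mismatches = [(p, seq_a[p], seq_b[p]) for p in positions if seq_a[p] != seq_b[p]]
    return matches, mismatches


# Toy, hypothetical aligned fragments (NOT real spike sequences).
virus_a = "MKLFQSNY"
virus_b = "MKLLYRDH"
key_positions = [2, 3, 4, 5, 6, 7]  # zero-based, hypothetical "key" sites

matches, mismatches = compare_key_residues(virus_a, virus_b, key_positions)
print(f"{len(matches)} of {len(key_positions)} key residues identical")
for pos, a, b in mismatches:
    print(f"position {pos}: {a} vs {b}")
```

With real data, the aligned RBD sequences and the six ACE2-contact positions would replace the placeholders.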

As discussed previously, the furin cleavage site in the SARS-CoV-2 spike is not present in the bat virus RaTG13. Its acquisition could allow enhanced infection of human cells. In addition to the furin cleavage site, an extra proline is also present, a change predicted to lead to the addition of O-linked glycans in the vicinity. If someone were to engineer the furin cleavage site into the spike, it is not likely that the extra proline would have been included. Furthermore, the addition of such glycans typically occurs under immune selection.
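As a rough illustration of how one might check a protein sequence for a furin-like site, the sketch below scans for the minimal R-X-X-R motif often used as a first-pass furin consensus. The motif choice, the function name, and the two short peptide strings are assumptions made for this example; they are illustrative fragments, not authoritative sequence data, and a motif match alone does not show that cleavage actually occurs.

```python
import re

# Minimal furin-like motif R-X-X-R (an assumption for this sketch).
# The lookahead lets overlapping candidate sites be reported.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_like_sites(protein):
    """Return (start_index, motif) for every R-X-X-R match in the sequence."""
    return [(m.start(), m.group(1)) for m in FURIN_MINIMAL.finditer(protein)]


# Hypothetical fragments: one with a short insertion carrying basic residues
# (and a proline), one without it.
with_insert = "TNSPRRARSVAS"
without_insert = "TNSRSVAS"

print(find_furin_like_sites(with_insert))     # [(4, 'RRAR')]
print(find_furin_like_sites(without_insert))  # []
```

A stricter pattern, or a dedicated cleavage-site predictor, would be needed for anything beyond a quick screen.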

The genome sequences of CoVs recently isolated from pangolins are not close enough to SARS-CoV-2 to have been its immediate progenitor. However, the RBDs of these pangolin CoVs are identical to that of SARS-CoV-2 at 6 of 6 of the key amino acids discussed above. This observation indicates that passage of CoV in a host with human-like ACE2 could select for an RBD with high-affinity binding. Such passage could also select for insertion of the furin cleavage site, which is not present in pangolin CoVs. Once a virus with the appropriate RBD and furin cleavage site arose in an animal (a bat or an intermediate host), it would then replicate once introduced into humans.

Another possibility is that viruses with the correct RBD have been repeatedly jumping into humans, but efficient human-to-human transmission was not established until the acquisition of the furin cleavage site. Such is the scenario with MERS-CoV, which has jumped multiple times from camels to humans, but each chain of infection is short and soon ends. The virus has never become established in humans because the required mutations have not entered the viral genome. Serological surveys specific for SARS-CoV-2 might test this hypothesis for its emergence.

Could laboratory passage of a bat SARS-like virus lead to isolation and accidental emergence of SARS-CoV-2? This scenario would require starting with a virus that is very close to the current isolates. Passage in cell culture might have selected for the RBD amino acid changes to enable high-affinity ACE2 binding. However, this virus would have had to be very similar to SARS-CoV-2, and no such isolate is known to be present in any laboratory. Selection of viruses with a furin cleavage site would likely have taken extensive passaging in cells. Finally, it is unlikely that the O-linked glycan addition site would have emerged without immune pressure, which is absent in cell cultures.

Proving or disproving any of these hypotheses for the emergence of SARS-CoV-2 might never be possible. Nevertheless, isolation of SARS-like viruses from a variety of animals might help to clarify the steps to emergence in humans. For MERS-CoV, a priority should be to prevent human infections, perhaps by immunizing camels, to avoid the emergence of another epidemic CoV with sustained transmission in humans.

Comments on this entry are closed.

  • Timothy Takemoto 22 February 2020, 5:40 pm

    You write
    “If someone were to engineer an RBD into a bat SARS-like CoV to allow efficient infection of human cells, they would not use the amino acid sequence in the SARS-CoV-2 spike.”

    Is there any chance that someone may have been aiming to *reduce* SARS binding efficiency?

    SARS binds to ACE2 and to C-type lectins. HIV1 binds to the latter while NCov-19 binds only to the former, which is a genome prevalent in East Asians. Could this have been deliberately achieved by replacing 4 of the 6 SARS RBD with the c type lectin bind sites from mutually incompatible HIV1 viruses?

  • Steve Hawkins 28 February 2020, 4:39 pm

    Thank Heavens we know where to come for the really reliable info on all things virus! Prof.VR for President! 😉

    I’ve been wondering if anyone is working on making an antivirus from antibodies in the blood of the survivors of this coronavirus? What are the prospects of something like this being produced, to tide us over until a vaccine is ready? (I’m a bit behind with reading, so forgive me if you’ve covered this already.)

    Many thanks for all that you do.

  • Phil Meeks 28 February 2020, 9:50 pm

    We all need an antidote asap in case of a pandemic

  • Rossana 29 February 2020, 1:07 pm

    Techniques to manipulate a furin cleavage site are described in Longping et al., 2014, Journal of Virology (A Novel Activation Mechanism of Avian Influenza Virus H9N2 by Furin); ethical implications are discussed.
    “It is unlikely that the O-linked glycan addition site would have emerged without immune pressure, which is absent in cell cultures”, but present after experiments in vivo, I believe.
    “However this virus would have had to be very similar to SARS-CoV-2, and no such isolate is known to be present in any laboratory.” What about the bat virus RaTG13?

  • Anonymous 1 March 2020, 10:02 am
  • Rossana 1 March 2020, 6:24 pm

    Another interesting paper in which to search for possibly better matches:

    https://www.mdpi.com/1999-4915/11/4/379

  • Rossana 2 March 2020, 6:03 pm
  • Rossana 6 March 2020, 3:42 am

    What about producing a chimeric virus combining KP876546 or RaTG13 (or similar) with part of the RBD from CoVs isolated from pangolins (or similar), in search of the missing link between bats and humans? A furin cleavage site might come in addition to enhance virulence.

  • Rossana 6 March 2020, 8:29 am

    A possible procedure to generate chimeric viruses is described in this work:
    Manipulation of the Coronavirus Genome Using Targeted RNA Recombination with Interspecies Chimeric Coronaviruses
    Cornelis A.M. de Haan, Bert Jan Haijema, Paul S. Masters, and Peter J.M. Rottier

  • Rossana 8 March 2020, 5:14 pm

    Another interesting piece of information:
    Title: Methods and compositions for chimeric coronavirus spike proteins United States Patent 9884895 Inventors: Baric, Ralph (Haw River, NC, US) Agnihothram, Sudhakar (Ellicott City, MD, US) Yount, Boyd (Hillsborough, NC, US) Publication Date: 02/06/2018
    Assignee:
    The University of North Carolina at Chapel Hill (Chapel Hill, NC, US)

  • Rossana 8 March 2020, 5:24 pm

    I now have a question: why was the famous RaTG13 sequence, from a sample collected in 2013, submitted to NCBI only on 27-JAN-2020, after the outbreak of SARS-CoV-2?
    From NCBI:
    LOCUS       MN996532 29855 bp RNA linear VRL 24-FEB-2020
    DEFINITION  Bat coronavirus RaTG13, complete genome.
    ACCESSION   MN996532
    VERSION     MN996532.1
    KEYWORDS    .
    SOURCE      Bat coronavirus RaTG13
      ORGANISM  Bat coronavirus RaTG13
                Viruses; Riboviria; Nidovirales; Cornidovirineae; Coronaviridae;
                unclassified Coronaviridae.
    REFERENCE   1 (bases 1 to 29855)
      AUTHORS   Zhu,Y., Yu,P., Li,B., Hu,B., Si,H.R., Yang,X.L., Zhou,P. and
                Shi,Z.L.
      TITLE     Direct Submission
      JOURNAL   Submitted (27-JAN-2020) CAS Key Laboratory of Special Pathogens,
                Wuhan Institute of Virology, Center for Biosafety Mega-Science,
                Chinese Academy of Sciences, No. 44 Xiao Hong Shan, Wuhan, Hubei
                430071, China
         source 1..29855
                /organism="Bat coronavirus RaTG13"
                /mol_type="genomic RNA"
                /isolate="RaTG13"
                /isolation_source="fecal swab"
                /host="Rhinolophus affinis"
                /db_xref="taxon:2709072"
                /country="China"
                /collection_date="24-Jul-2013"

  • Rossana 16 March 2020, 2:14 pm

    I now come back to the mysterious sequence RaTG13 and the hint that I should look closer at the sequence KP876546 that is cited in the article:
    Ge, X., Wang, N., Zhang, W. et al. Coexistence of multiple coronaviruses in several bat colonies in an abandoned mineshaft. Virol. Sin. 31, 31–40 (2016). https://doi.org/10.1007/s12250-016-3713-9

    The sequence KP876546 in NCBI is very short (only 370 bp) and is defined as Rhinolophus bat coronavirus BtCoV/4991 partial RNA-dependent RNA polymerase. This sequence is also analysed in the article Characterization of a New Member of Alphacoronavirus with Unique Genomic Features in Rhinolophus Bats https://doi.org/10.3390/v11040379.

    I ran a BLAST search with the KP876546 sequence in NCBI and got 100% identity with RaTG13 and 99% identity with MT039890 Severe acute respiratory syndrome coronavirus 2 isolate SNU01, complete genome (South Korea). The next closest sequence not from SARS-CoV2 is the pangolin sequence MT084071.
    In my opinion the sequence KP876546 could be the first evidence of the RaTG13 sequence or of a sequence even closer to SARS-CoV2. In Ge et al., it is stated that the 370 bp sequence was extended (816 bp) and the spike protein was sequenced, but this information for this sample has not been made public.

    I have found a publication comparing KP876546 with SARS-CoV2 from before RaTG13 was submitted to NCBI:

    Liangjun Chen, Weiyong Liu, Qi Zhang, Ke Xu, Guangming Ye, Weichen Wu, Ziyong Sun, Fang Liu, Kailang Wu, Bo Zhong, Yi Mei, Wenxia Zhang, Yu Chen, Yirong Li, Mang Shi, Ke Lan & Yingle Liu (2020) RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak, Emerging Microbes & Infections, 9:1, 313-319, DOI: 10.1080/22221751.2020.1725399

    The authors write: “further sequencing of the corresponding PCR product (from SARS-CoV2) surprisingly suggested that the virus discovered is more closely related to BtCoV/4991” (KP876546) “(97.35%) but not SARS-CoV. The genomes of the 2019-nCoV were further analysed to determine its origin and evolutionary history. Full genome comparisons indicated that 2019-nCoV is close to CoVs circulating in Rhinolophus (Horseshoe bats). For example, it shared 98.7% nucleotide identity to bat coronavirus strain BtCoV/4991 (GenBank KP876546, only 370 nt sequence of RdRp gene) and 87.9% nucleotide identity to bat CoV strain bat-SLCoVZC45 and bat-SL-CoVZXC21, indicating that it was quite divergent from the currently known human CoV, including SARS-CoV (79.7%). The close relationship with BtCoV/4991 is quite essential in tracing the potential reservoir host of 2019-nCoV. Unfortunately, the BtCoV/4991 sequence was only partial (373bp in length) and thus no comparisons can be made for the rest of genomes.”

    In the article:
    Zhou, P., Yang, X., Wang, X. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020). https://doi.org/10.1038/s41586-020-2012-7

    where RaTG13 appears for the first time, it is written:

    “We then found a short RdRp region from a bat coronavirus termed BatCoV RaTG13 which we previously detected in Rhinolophus affinis from Yunnan Province showed high sequence identity to nCoV-2019. We did full-length sequencing to this RNA sample. Simplot analysis showed that nCoV-2019 was highly similar throughout the genome to RaTG13, with 96.2% overall genome sequence identity.”

    Interestingly, the article by Ge et al. is not part of the bibliography of this work. In my opinion, further sequencing of KP876546 was so interesting that the results were kept secret and manipulation of this virus was carried on until the outbreak of SARS-CoV2.

    It is important not to forget that the sequence RaTG13 could have been manipulated before submitting it to NCBI.

  • John Smith 17 March 2020, 11:31 am

    So could BatCoV RaTG13 be the previously reported BtCoV/4991 (KP876546, partial cds for RdRp)? I wonder what leading virologists’ opinion on this would be.

  • Chris 19 March 2020, 8:55 am

    @Rosanna what do you think about the counterpoints in this article? : https://www.nature.com/articles/s41591-020-0820-9 It’s a shame there isn’t more discussion going on here, you do bring up interesting points

  • Rosmarina 19 March 2020, 1:10 pm
  • Chris 19 March 2020, 2:44 pm

    @Rosmarina: Thanks, I have. Sorry for misspelling your name! Just to be clear, I share your point of view. I am really surprised no one is mentioning this elsewhere (e.g. why the RaTG13 sequence was shared only now and how reliable it is). Skepticism is good for science. Also apologies, I wasn’t specific. The Nature article I linked is from the 17th of March; I don’t think it was mentioned in another topic. I was referring to the section “3. Selection during passage”:
    “Basic research involving passage of bat SARS-CoV-like coronaviruses in cell culture and/or animal models has been ongoing for many years in biosafety level 2 laboratories across the world [27], and there are documented instances of laboratory escapes of SARS-CoV [28]. We must therefore examine the possibility of an inadvertent laboratory release of SARS-CoV-2.

    In theory, it is possible that SARS-CoV-2 acquired RBD mutations (Fig. 1a) during adaptation to passage in cell culture, as has been observed in studies of SARS-CoV [11]. The finding of SARS-CoV-like coronaviruses from pangolins with nearly identical RBDs, however, provides a much stronger and more parsimonious explanation of how SARS-CoV-2 acquired these via recombination or mutation.

    The acquisition of both the polybasic cleavage site and predicted O-linked glycans also argues against culture-based scenarios. New polybasic cleavage sites have been observed only after prolonged passage of low-pathogenicity avian influenza virus in vitro or in vivo. Furthermore, a hypothetical generation of SARS-CoV-2 by cell culture or animal passage would have required prior isolation of a progenitor virus with very high genetic similarity, which has not been described. Subsequent generation of a polybasic cleavage site would have then required repeated passage in cell culture or animals with ACE2 receptors similar to those of humans, but such work has also not previously been described. Finally, the generation of the predicted O-linked glycans is also unlikely to have occurred due to cell-culture passage, as such features suggest the involvement of an immune system.”

    I don’t think what they mention holds (“The acquisition of both the polybasic cleavage site and predicted O-linked glycans also argues against culture-based scenarios”) given the sources that you cited.

  • John 20 March 2020, 8:36 am

    R and C: Thank you for these enlightening comments. I also have been exploring the linkages and was unable to find any mention of RaTG13 prior to this year. The order of events is intriguing, considering that the previously reported viruses discovered in bats were less likely to have, or did not have at all, affinity for ACE2. Why, if it was discovered years ago and its ACE2 affinity was known, was RaTG13 missing from the GenBank databases?

    Being an amateur at best, I also wonder how likely it is that the cleavage site was introduced in gain-of-function work with something like RaTG13, not in vitro but in vivo. Is it unheard of to conduct GoF research in living organisms? If it is not, I would speculate that, in an attempt to further understand or craft drugs to combat or prevent another SARS analog, something akin to these changes could have been introduced and then leaked due to poor containment practices, à la the SARS outbreak from a couple of years ago in Beijing.