Study: Recombination-aware phylogenetic analysis sheds light on the evolutionary origin of SARS-CoV-2. Image Credit: NicoElNino/Shutterstock

Scientists study the evolutionary origin of SARS-CoV-2 using recombination-aware phylogenetic analysis

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and severe acute respiratory syndrome coronavirus (SARS-CoV) belong to the subgenus Sarbecovirus, whose members have genomes of approximately 30,000 nucleotides. These viruses encode four structural proteins, namely spike (S), envelope (E), membrane (M), and nucleocapsid (N).


The S protein contains a receptor-binding domain (RBD) that binds to the host receptor, angiotensin-converting enzyme 2 (ACE2). The RBD is the most variable part of the coronavirus genome and determines the host range of each member of this group.

To date, the evolutionary origin of SARS-CoV-2 has not been elucidated. Given that genomic sequence analysis indicates that the closest relatives of this virus are bat sarbecoviruses, there is a strong possibility that the SARS-CoV-2 strain infecting humans emerged after a spillover from bats, either directly or via an intermediate host.

Like other coronaviruses, sarbecoviruses recombine frequently, and RBD sequence analysis of SARS-CoV-2 indicates that its origin involved genetic recombination. Of the four key hypotheses about the origin of SARS-CoV-2, three involve recombination.

Detecting recombination events is essential for analyzing the evolutionary history of the S gene, especially in SARS-CoV-2. Although several tools, such as SimPlot, RDP4, and GARD, have been used to identify recombination among sarbecoviruses, they cannot estimate ancestral recombination graphs (ARGs), which characterize the reticulate evolution caused by genetic recombination. As a result, phylogenetic uncertainty remains in estimates of recombination events.

Although reconstructing ARGs from sequence data is extremely difficult, several software packages, such as the Bacter package for BEAST2, have been developed to address the challenge. Bacter's ClonalOrigin model is used to estimate a new type of ARG, known as an ancestral conversion graph (ACG). This method can also be used to estimate recombination events within a phylogeny.

About the study

A study recently posted to the Research Square* preprint server, and under review for publication in Scientific Reports, used Bacter to identify recombination events within the RBD region of the Sarbecovirus genome. The main objective of this study was to determine the origin of the amino acids located in the variable loop of the RBD, which is responsible for the high affinity of SARS-CoV-2 for the ACE2 receptors of human cells.

A total of eighty-seven genomes were obtained from the GenBank and GISAID databases. Sequences were aligned using MAFFT 7.475 with default settings, and the Gblocks program was used to detect misaligned amino acid positions. The region corresponding to the SARS-CoV-2 RBD was then extracted from the whole-genome alignment and analyzed using a Bayesian skyline coalescent model.
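To illustrate what extracting a region "defined by" a reference sequence involves, the following is a minimal Python sketch, not the authors' pipeline. It maps a coordinate range on the ungapped reference onto alignment columns and slices every sequence at those columns. The function names, the toy sequences, and the coordinates are all hypothetical and chosen only for illustration.

```python
def ungapped_to_columns(aligned_ref, start, end):
    """Map a 1-based, inclusive [start, end] range on the ungapped reference
    to a half-open [col_start, col_end) column range in the alignment."""
    pos = 0  # number of non-gap reference characters seen so far
    col_start = col_end = None
    for col, ch in enumerate(aligned_ref):
        if ch == "-":
            continue  # gaps do not advance the reference coordinate
        pos += 1
        if pos == start:
            col_start = col
        if pos == end:
            col_end = col + 1
            break
    if col_start is None or col_end is None:
        raise ValueError("range falls outside the reference sequence")
    return col_start, col_end


def extract_region(alignment, ref_name, start, end):
    """Slice every aligned sequence at the columns covering the reference range."""
    col_start, col_end = ungapped_to_columns(alignment[ref_name], start, end)
    return {name: seq[col_start:col_end] for name, seq in alignment.items()}


# Toy alignment (hypothetical data): reference positions 3..6 are G, C, C, G.
toy = {
    "ref":   "AT-GCCGTA",
    "virus": "ATTGC-GTA",
}
region = extract_region(toy, "ref", 3, 6)
print(region)
```

Working in alignment columns rather than raw sequence positions keeps every extracted sequence the same length, which downstream phylogenetic software generally requires.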

Study results

Recombination-aware phylogenetic analysis of the RBD region was performed on thirty-nine sarbecoviruses. Interestingly, multiple recombination events with posterior probability support greater than 0.5 were detected among viruses associated with different Rhinolophus species, indicating close interaction among the bat populations.
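As a rough illustration of what a posterior support threshold of 0.5 means, the sketch below treats the support of an event as the fraction of posterior samples in which it appears and keeps only events above the threshold. This is a simplification under stated assumptions: the event labels and samples are hypothetical, and real ACG summaries must also match events across samples, which is not attempted here.

```python
from collections import Counter


def posterior_support(samples):
    """Fraction of posterior samples in which each event appears."""
    counts = Counter(event for sample in samples for event in set(sample))
    return {event: n / len(samples) for event, n in counts.items()}


def supported_events(samples, threshold=0.5):
    """Keep events whose support is strictly greater than the threshold."""
    support = posterior_support(samples)
    return {event: s for event, s in support.items() if s > threshold}


# Hypothetical posterior samples; each event is a (donor, recipient) label pair.
samples = [
    {("lineage_A", "lineage_B")},
    {("lineage_A", "lineage_B"), ("lineage_C", "lineage_D")},
    {("lineage_A", "lineage_B")},
    {("lineage_C", "lineage_D")},
]
kept = supported_events(samples)
print(kept)
```

In this toy case the first event appears in three of four samples (0.75) and is kept, while the second appears in exactly half of them and is dropped because the comparison is strict.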

Three Rhinolophus species had overlapping geographical ranges, namely R. pusillus, R. sinicus, and R. affinis. Of these, R. affinis and R. pusillus have been proposed as possible hosts for SARS-CoV-2 progenitors.

A recombination event within the RBD involving RaTG13 supported the common ancestor hypothesis. This hypothesis states that the bat virus lost all but one of the amino acid residues that were present in the common ancestor of SARS-CoV-2, RaTG13, BANAL-103, and GD410721. The ancestral virus could have been a pathogen capable of infecting different mammalian hosts. This observation is consistent with laboratory experiments showing that SARS-CoV-2 can bind to the ACE2 receptors of cats, cattle, and dogs.

The study results strongly support the hypothesis that SARS-CoV-2 emerged naturally. The virus underwent vertical evolution, combined with several recombination events, to become extremely efficient at human transmission.

Notably, recombination is not confined to the RBD region and could have occurred at any point in the Sarbecovirus genome. The current study detected SARS-CoV-2 lineage-associated recombination events at the 5′ and 3′ ends of the S gene.


Recombination-aware phylogenetic analysis of sarbecoviruses has helped elucidate the evolutionary origin of SARS-CoV-2. Nevertheless, the computational approach used in this study prevented analysis of the full data set and allowed only a fragment of the Sarbecovirus genomes to be analyzed. Therefore, further research based on whole-genome sequence analysis is needed to better understand the recombination history of the RBD.
