Finding Shared Genes Between Species
By Claire Asher, on 7 May 2015
A guest blog by Natasha Glover, written for the 2015 Write About Research Competition.
Did you know we share approximately 98% of our protein-coding genes with chimpanzees? Chimps are commonly referred to as our evolutionary “cousins.” This makes sense to anyone who’s seen Planet of the Apes – chimps and humans share many of the same physical characteristics. But did you also know that we share approximately 90% of our genes with mice? About 70% of our genes with zebrafish? Even about 15% of human genes can be found in fruit flies!
These shared genes are evidence of evolution from a common ancestor and the relatedness of all life on Earth. The shared genes are called homologous genes: genes that share a common ancestry, whether between or within species. They can be further classified into two main categories: orthologs, which are pairs of genes that started diverging through speciation, and paralogs, which are pairs of genes that started diverging through gene duplication. Finding and studying homologous genes is important because orthologs (the same gene in two different species) are more likely to have the same cellular function than paralogs (two duplicated genes).
This brings us to the concept of model organisms: representative species studied by many scientists, from which knowledge can be transferred to other, closely related species. This is why, for example, researchers test new drugs on mice rather than on humans. Orthologs between mice and humans allow basic human biological processes to be observed in mice, and the resulting knowledge to be transferred to humans. Orthologs are also applicable to agricultural research. Imagine that a scientist finds an interesting gene in the model plant Arabidopsis thaliana, perhaps a gene controlling an important agronomical trait like seed size, flowering time, or tolerance to drought. It would be useful to find the ortholog of this gene in an economically important crop such as rice, wheat or soybean in order to exploit the trait of interest.
Homologous genes correspond to shared attributes between species. We can often identify shared traits just by looking at the organisms. But how can we identify orthologs and paralogs at the molecular level, that is, by analyzing gene sequences? It’s important to keep in mind that homology is a purely evolutionary concept. Thus, we can deduce orthologous and paralogous relationships between pairs of genes using a phylogenetic tree (See Box 1).
Box 1. This tree represents the relationship between 5 gene sequences. Each node of the tree represents either a speciation event (S1 and S2) or a duplication event (star). To find the relationship between a pair of genes, you just have to trace them back to their shared node (the closest common ancestral copy). In this example, the blue dog and blue human genes are orthologous to each other, because they trace back to a speciation event. The red dog and red human genes are also orthologous to each other. However, all the blue genes are paralogous to all the red genes, because they trace back to a duplication node. All of these red and blue genes are orthologous to the black (frog) gene, an example of a many:1 relationship.
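The tracing rule in Box 1 can be sketched in a few lines of Python. This is only an illustration: the tree below is a hand-written guess at the topology the box describes (frog splitting off at S1, then a duplication, then a dog/human split at S2 in each copy), and the gene names and the nested-tuple encoding are made up for the example. It is not how OMA or any other tool actually infers orthology.

```python
SPECIATION, DUPLICATION = "speciation", "duplication"

# Hypothetical gene tree: an internal node is (event, left, right),
# a leaf is a gene name. Names and topology are assumptions for illustration.
gene_tree = (
    SPECIATION,  # S1: frog lineage splits from the dog/human lineage
    "frog_black",
    (
        DUPLICATION,  # the star: one gene becomes a blue and a red copy
        (SPECIATION, "dog_blue", "human_blue"),  # S2 in the blue copy
        (SPECIATION, "dog_red", "human_red"),    # S2 in the red copy
    ),
)


def leaves(node):
    """Return all gene names found below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, relations=None):
    """Label every gene pair by the event at its closest shared node.

    Two genes on opposite sides of an internal node have that node as
    their closest common ancestor, so the node's event type decides
    whether the pair is orthologous or paralogous.
    """
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    event, left, right = node
    label = "orthologs" if event == SPECIATION else "paralogs"
    for a in leaves(left):
        for b in leaves(right):
            relations[frozenset((a, b))] = label
    classify_pairs(left, relations)
    classify_pairs(right, relations)
    return relations


relations = classify_pairs(gene_tree)
for pair, label in sorted(relations.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), label)
```

Every pair that includes the frog gene comes out as orthologous, which is the many:1 relationship mentioned in the box, while every blue/red pair comes out as paralogous.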
Evolutionary scenarios and relationships become complicated when many lineage-specific gene duplications and losses are involved. This is especially true in plants, whose genomes tend to be much larger and much more duplicated than animal genomes, making ortholog inference in plants very challenging.
Several algorithms and tools are available to predict homologous relationships between genomes. OMA (Orthologous Matrix) is one of them. It’s a method and database for the inference of orthologs and paralogs among completely sequenced genomes. Launched by Dessimoz and colleagues in 2004, OMA has steadily increased the number of species in the database to 1706, including both prokaryotes and eukaryotes. With its many genomes and accurate orthology prediction, OMA is a great starting point for evolutionary biology and genomics analyses. OMA’s recent 17th browser release added a website facelift, gene function prediction, and more support for plant genomes. For plants in particular, over 450 million years of evolution are now represented, with orthology predictions between the species Selaginella moellendorffii (representing early vascular plants) and Physcomitrella patens (representing the non-vascular plants).
The burst of larger, more complex sequenced genomes in the past decade provides a unique challenge in terms of orthology prediction. OMA tackles this problem, and provides a valuable resource to the scientific community. So, want to find out how many genes humans have in common with yeast? Try OMA.
- Altenhoff AM, Dessimoz C. Inferring Orthology and Paralogy. In: Anisimova M, editor. Evolutionary Genomics. Totowa, NJ: Humana Press; 2012. pp. 259–279. Available: http://discovery.ucl.ac.uk/1395519/
- Altenhoff AM, Škunca N, Glover N, Train C-M, Sueki A, Piližota I, et al. The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements. Nucleic Acids Res. 2014; gku1158. doi:10.1093/nar/gku1158
Natasha Glover received her Bachelor of Science and PhD from the Department of Crop and Soil Environmental Science at Virginia Tech in the U.S. Her PhD focused on plant genomics and biotechnology. She received a Marie Curie International Incoming Fellowship for her first postdoc and worked for 3 years at the Institut National de la Recherche Agronomique in Clermont-Ferrand, France. There, she concentrated on computational biology, with a focus on synteny and duplication in the wheat genome. Natasha is currently a postdoc based at Bayer CropScience in Ghent, Belgium as part of the Marie Curie PLANT FELLOWS program. Her co-advisor is Dr. Christophe Dessimoz in the Department of Genetics, Evolution, and Environment at UCL.