This directory contains data pertaining to orthology assignments
among Candida species, and between Candida species and other organisms.
The individual pairwise mappings are contained within the subdirectories.



The ortholog mappings among Candida strains, and between Candida
strains and S. cerevisiae, are derived from the curated syntenic
groupings at the Candida Gene Order Browser (CGOB).

The complete set of CGOB groups is contained in the file
All_Species_Orthologs_from_CGOB.txt (also known as cgob_pillars.txt)
in this directory. This file includes C. glabrata information from the
Yeast Gene Order Browser (YGOB), in
addition to the other species from CGOB.  The current list of species
(as of September 2012) is:

Candida albicans SC5314	
Candida albicans WO-1	
Candida dubliniensis CD36 	
Candida tropicalis MYA-3404	
Candida parapsilosis CDC317	
Candida orthopsilosis Co 90-125	
Lodderomyces elongisporus NRRL YB-4239	
Debaryomyces hansenii CBS767	
Pichia stipitis Pignal	
Candida tenuis NRRL Y-1498	
Spathaspora passalidarum NRRL Y-27907	
Candida guilliermondii ATCC 6260	
Candida lusitaniae ATCC 42720	
Saccharomyces cerevisiae S288C	
Candida glabrata CBS138   

*** Note: to view the ortholog mapping between any pair (or more) of
    the species, you may open the file using spreadsheet software and
    select the columns containing the species of interest.  (Any blank
    lines may then be removed efficiently using the sort function.)

In this tab-delimited file,
each column represents a different strain (identified in the header
row). Each row below the header represents a syntenic/orthologous
group, and contains the gene identifiers for all the strains included
in that group. 
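
As an alternative to spreadsheet software, the column selection described
above can be sketched in a few lines of Python. This is a minimal sketch
under stated assumptions, not a CGD-provided tool: it assumes the header
cells match the species names listed above (after trimming whitespace),
and that a species missing from a group is represented by an empty cell
or by "---". The function name and the placeholder convention are
illustrative; check them against the actual file before relying on them.

```python
import csv
import io


def pairwise_orthologs(handle, species_a, species_b, missing=("", "---")):
    """Yield (gene_a, gene_b) for each group in which both species have a gene.

    Assumptions (not confirmed by the file itself): header cells are species
    names, and absent members appear as empty cells or as "---".
    """
    reader = csv.reader(handle, delimiter="\t")
    # Trim header cells so stray whitespace does not break the lookup.
    header = [name.strip() for name in next(reader)]
    col_a = header.index(species_a)
    col_b = header.index(species_b)
    for row in reader:
        # Pad short rows so missing trailing columns count as empty.
        row = row + [""] * (len(header) - len(row))
        gene_a = row[col_a].strip()
        gene_b = row[col_b].strip()
        if gene_a not in missing and gene_b not in missing:
            yield gene_a, gene_b


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file; identifiers are made up.
    sample = io.StringIO(
        "Candida albicans SC5314\tSaccharomyces cerevisiae S288C\n"
        "calb_1\tscer_1\n"
        "calb_2\t---\n"
    )
    for pair in pairwise_orthologs(
        sample, "Candida albicans SC5314", "Saccharomyces cerevisiae S288C"
    ):
        print("\t".join(pair))
```

To run it against the real file, open All_Species_Orthologs_from_CGOB.txt
with open(path, newline="") and pass the file handle in place of the
in-memory sample.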

Note that the file includes strains that are not
yet in the curated set of strains at CGD. The corresponding protein
sequences for all the strains are available from the CGOB website.
We thank Sarah Maguire and Geraldine Butler of the Conway Institute,
University College Dublin, for making the CGOB data available to CGD.


Please note, regarding the issue of orthology and shared function, that we
use the standard evolutionary definition of orthology: genes descended
from a common ancestral gene sequence. This definition does not
require that the genes remain functionally equivalent after the
speciation event, although they often do.

The following ortholog mappings are provided in their respective subdirectories:

C. albicans SC5314 Assembly 21 (haploid protein complement) against S. cerevisiae S288C
C. albicans SC5314 Assembly 21 against C. glabrata CBS138
C. albicans SC5314 Assembly 21 against C. parapsilosis CDC317
C. albicans SC5314 Assembly 21 against C. dubliniensis
C. albicans SC5314 Assembly 21 against S. pombe

C. glabrata CBS138 against S. cerevisiae S288C
C. glabrata CBS138 against C. albicans SC5314 Assembly 21
C. glabrata CBS138 against C. parapsilosis CDC317
C. glabrata CBS138 against S. pombe

C. parapsilosis CDC317 against S. cerevisiae S288C
C. parapsilosis CDC317 against C. albicans SC5314
C. parapsilosis CDC317 against C. glabrata CBS138
C. parapsilosis CDC317 against S. pombe

Please note that orthologs are not computed or displayed for ORFs that
were present in a previous version of the reference annotation, but
which are designated as "deleted" in the current set.

The pairwise ortholog and Best Hit files (format updated
September 2012) contain three tab-delimited columns per organism.
For each organism, the columns display:
- the systematic name of the gene
- the standard/genetic name of the gene (if one exists)
- the database identifier for the gene
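
The three-column layout above can be read with a short Python sketch.
This is a hedged example, not a description of the files' exact layout:
it assumes each line of a pairwise file holds six tab-delimited fields
(three for the first organism followed by three for the second), that an
absent standard name is an empty field, and that lines starting with "#"
are comments. The names GeneRef and parse_pairwise are illustrative only.

```python
from typing import Iterator, NamedTuple, Optional, TextIO, Tuple


class GeneRef(NamedTuple):
    systematic_name: str
    standard_name: Optional[str]  # None when the gene has no standard name
    database_id: str


def parse_pairwise(handle: TextIO) -> Iterator[Tuple[GeneRef, GeneRef]]:
    """Yield one (first organism, second organism) pair per data line.

    Assumed layout: systematic name, standard name, database identifier
    for the first organism, then the same three fields for the second.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 6:
            continue  # line does not fit the assumed six-field layout
        first = GeneRef(fields[0], fields[1] or None, fields[2])
        second = GeneRef(fields[3], fields[4] or None, fields[5])
        yield first, second
```

Returning named tuples keeps the three fields for each organism together,
so a caller can write pair[0].systematic_name rather than tracking column
offsets.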


The ortholog mappings between Candida strains and S. pombe are made
by pairwise comparisons using the InParanoid software, developed at
the Karolinska Institutet. C. elegans proteins are used as the outgroup.
Stringent cutoffs are set: BLOSUM80 (instead of the default BLOSUM62)
and an InParanoid score of 100% (parameters: -F "m S" -M BLOSUM80).
Note that the InParanoid ortholog pairings are generated automatically,
with no curator intervention. Thus, some pairings may not be reproduced
by other methods or with a different scoring matrix.


The file Calb_Cdub_positional_orthologs_Assem20.txt contains ortholog
mappings between C. albicans and C. dubliniensis, and was provided to
CGD by John Gamble and Matthew Berriman at the Wellcome Trust Sanger
Institute. The C. dubliniensis orthologs
are manually curated positional orthologs based on synteny. Please note:
this is a static file that is not updated.  Mappings were based on
C. albicans Assembly 20.


For proteins that do not have an ortholog in a given species identified 
using these methods, we use BLASTP to identify a best hit in that species.
These results are available for download at: