Index of /download/homology/orthologs/Calb_Scer_by_inparanoid/Assem21orthologs
Name Last modified Size Description
CA_SC_orthologs.txt 09-Aug-2010 15:54 91K
inparanoid_output.08-09-2010.txt 09-Aug-2010 15:54 1.6M
orf_trans_all_Candida_Assembly_21_haploid.08-09-2010.fasta.gz 09-Aug-2010 15:54 2.1M GZIP compressed docume>
orf_trans_all_Saccharomyces.08-09-2010.fasta.gz 09-Aug-2010 15:54 2.4M GZIP compressed docume>
rejected_sequences.wormpep.08-09-2010.fasta.gz 09-Aug-2010 15:54 970 GZIP compressed docume>
wormpep.08-09-2010.fasta.gz 09-Aug-2010 15:54 5.3M GZIP compressed docume>
This directory contains the input sequences that were used to
determine orthology assignments between C. albicans Assembly 21 and
S. cerevisiae, using InParanoid version 3.0 (http://inparanoid.sbc.su.se/)
and the output file that InParanoid generated. A file containing the
processed output, which lists the orthology assignments, is also
provided. The ortholog mappings are updated quarterly so that the
predictions are based on the most up-to-date information.
To run InParanoid, the haploid complement of C. albicans proteins from
CGD was compared to the latest set of S. cerevisiae proteins from SGD,
and the set of C. elegans proteins from the Sanger Institute was used
as an outgroup. Stringent cutoffs were set: BLOSUM80 (instead of the
default BLOSUM62), and an InParanoid score of 100%.
Note that the ortholog pairings were automatically generated, with no
curator intervention. Thus, there will occasionally be pairings that
may not occur with a different scoring matrix. In the interests of
automating the process, we do not intend to hand-curate the ortholog
pairs at this time.
Please also note that, in the Assembly 21-based mapping, orthologs are
not computed or displayed for C. albicans ORFs that were present in
a prior assembly (Assembly 19 or 20) and subsequently deleted from
Assembly 21. The orthologs of these ORFs are present in the Assembly
20-based and/or Assembly 19-based mapping files.
For C. albicans proteins that did not have an ortholog meeting
these criteria, we ran BLASTp with the same parameters as were used
by InParanoid (-F "m S" -M BLOSUM80) and an expectation value (E)
of 1e-5 to identify their best hit in the S. cerevisiae protein
complement. These best-hit data are available here:
http://candidagenome.org/download/homology/best_hits/Calb_Scer_best_hits_Assem21.txt
in the same format as the files containing the ortholog data.
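
The best-hit step described above can be sketched roughly as follows. This is an illustration only, not the pipeline CGD actually ran. It assumes the legacy NCBI blastall program, tabular output (-m 8), and that "best hit" means the highest bit score per query. The function names, the database name, and the sample identifiers are hypothetical.

```python
import csv
import io


def blastp_command(query_fasta, database, evalue="1e-5"):
    """Build an argument list for legacy NCBI blastall running BLASTp."""
    return [
        "blastall",
        "-p", "blastp",
        "-i", query_fasta,
        "-d", database,
        "-F", "m S",        # filter setting quoted in the text above
        "-M", "BLOSUM80",   # scoring matrix quoted in the text above
        "-e", evalue,       # expectation value (E) of 1e-5
        "-m", "8",          # assumption: tabular output for easy parsing
    ]


def best_hits(tabular_text):
    """Return {query_id: (subject_id, bit_score)} for the top-scoring hit.

    Assumes the 12-column tabular layout produced by -m 8, where the
    bit score is the last column. Ranking by bit score is an assumption.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip comments and malformed lines
        query, subject, bits = row[0], row[1], float(row[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best
```

The argument list can be passed to subprocess.run, and the captured standard output handed to best_hits.
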
The following files are available:
orf_trans_all_Candida_Assembly_21_haploid.MM-DD-YYYY.fasta.gz - the C. albicans haploid protein complement
orf_trans_all_Saccharomyces.MM-DD-YYYY.fasta.gz - the S. cerevisiae protein complement
wormpep.MM-DD-YYYY.fasta.gz - the C. elegans protein set used as an outgroup
inparanoid_output.MM-DD-YYYY.txt - the raw output from InParanoid
rejected_sequences.wormpep.MM-DD-YYYY.fasta.gz - the sequences rejected due to the worm outgroup
CA_SC_orthologs.txt - the processed output, with the orf19 id, the SGDID,
and the gene/ORF name from SGD
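
As a minimal sketch of how the processed file might be read, the following assumes that CA_SC_orthologs.txt is tab-delimited with the three fields in the order listed above (orf19 id, SGDID, SGD gene/ORF name). The exact layout is not stated here, so check the file before relying on this. The class and function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, NamedTuple


class Ortholog(NamedTuple):
    orf19_id: str   # C. albicans orf19 identifier
    sgdid: str      # S. cerevisiae SGDID
    sgd_name: str   # gene/ORF name from SGD


def parse_orthologs(lines: Iterable[str]) -> Dict[str, Ortholog]:
    """Map each orf19 id to its ortholog record.

    Assumes tab-delimited rows with at least three columns; shorter
    rows and rows starting with '#' are skipped.
    """
    table = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3 or row[0].startswith("#"):
            continue
        record = Ortholog(row[0].strip(), row[1].strip(), row[2].strip())
        table[record.orf19_id] = record
    return table
```

For example, `with open("CA_SC_orthologs.txt", newline="") as handle: orthologs = parse_orthologs(handle)` would load the whole table into a dictionary keyed by orf19 id.
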
The dates (indicated by MM-DD-YYYY) in the above file names represent the date when
the input files were downloaded and the latest set of ortholog predictions was generated.
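
The date stamp can be recovered from a file name with a short helper such as the one below. This is a sketch that assumes the stamp always appears as .MM-DD-YYYY. between two dots, as in the names listed above. The function name is hypothetical.

```python
import re
from datetime import date

# Matches ".MM-DD-YYYY." embedded in a file name.
_DATE_IN_NAME = re.compile(r"\.(\d{2})-(\d{2})-(\d{4})\.")


def date_from_filename(name):
    """Return the date embedded in a file name, or None if there is none."""
    match = _DATE_IN_NAME.search(name)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    return date(year, month, day)
```

For inparanoid_output.08-09-2010.txt this returns date(2010, 8, 9), and for CA_SC_orthologs.txt, which carries no date stamp, it returns None.
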