Index of /download/gff

Icon   Name                            Last modified      Size  Description
[DIR]  Parent Directory                                     -
[DIR]  C_albicans_WO-1/                11-Dec-2013 12:10    -
[DIR]  C_guilliermondii_ATCC_6260/     11-Dec-2013 12:13    -
[DIR]  C_lusitaniae_ATCC_42720/        11-Dec-2013 12:15    -
[DIR]  C_tropicalis_MYA-3404/          11-Dec-2013 12:18    -
[DIR]  L_elongisporus_NRLL_YB-4239/    11-Dec-2013 12:20    -
[DIR]  C_orthopsilosis_Co_90-125/      11-Dec-2013 15:07    -
[DIR]  D_hansenii_CBS767/              11-Dec-2013 15:17    -
[DIR]  C_glabrata_CBS138/              05-Feb-2017 07:06    -
[DIR]  C_dubliniensis_CD36/            05-Feb-2017 07:07    -
[DIR]  C_parapsilosis_CDC317/          12-Mar-2017 07:06    -
[DIR]  C_albicans_SC5314/              25-Jun-2017 07:03    -

Sub-directories within this directory contain data for various Candida 
species in the Generic Feature Format (GFF). These files describe features
such as chromosomes, ORFs, CDSs, introns, sequence gaps, and intergenic regions.

Please see http://www.sequenceontology.org/gff3.shtml for a detailed description 
of the Generic Feature Format (GFF).


C_albicans_SC5314/ contains the data for the current CGD annotation of the 
C. albicans genome sequence, along with the historic C. albicans genome assembly 
mapping files, gap regions in Assembly 21, introns in 5' UTRs from Mitrovich et al. (2007),
SNPs from Forche et al. (2004), and unannotated transcripts detected in various 
high-throughput sequencing projects. The current annotation files are updated weekly.


C_dubliniensis_CD36/ contains the data for the current CGD annotation of the 
C. dubliniensis genome sequence.  These files are updated weekly.

  
C_glabrata_CBS138/ contains the data for the current CGD annotation of the 
C. glabrata genome sequence.  These files are updated weekly.

  
C_parapsilosis_CDC317/ contains the data for the current CGD annotation of the 
C. parapsilosis genome sequence.  These files are updated weekly.  

The remaining directories contain annotation information for
Candida-related species and strains that are not actively
curated by CGD at this time.  These files are updated sporadically,
as new gene models become available.

#############################################################################


GFF files conform to the canonical GFF3 format specification as of October 2012.

Introns are no longer listed, but can be inferred by comparing exons to
transcripts.
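
As an illustration of inferring introns, the following minimal Python sketch
takes the exon coordinates of a single transcript and reports the gaps between
adjacent exons. The function name and the example coordinates are hypothetical;
coordinates are assumed to be 1-based and inclusive, as in GFF3.

```python
def infer_introns(exons):
    """Return intron intervals lying between the exons of one transcript.

    `exons` is a list of (start, end) pairs in GFF3 coordinates
    (1-based, inclusive). The result uses the same convention.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # A gap of at least one base between adjacent exons is an intron.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical exon coordinates for one transcript.
example_exons = [(100, 200), (301, 400), (451, 500)]
print(infer_introns(example_exons))
```

Because the gaps are computed from sorted coordinates, the same function works
for features on either strand.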

5' and 3' untranslated regions (UTRs) are no longer listed, but can be
inferred by comparing CDS to exons.
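
One way to infer UTRs is sketched below in Python, under the assumption that a
UTR is any part of an exon lying outside the overall CDS span of the same
transcript. The function name, the dictionary keys, and the example coordinates
are hypothetical. Coordinates are 1-based and inclusive, and the strand decides
which side is 5' and which is 3'.

```python
def infer_utrs(exons, cds, strand):
    """Split the non-coding parts of exons into 5' and 3' UTR intervals.

    `exons` and `cds` are lists of (start, end) pairs for one transcript;
    `strand` is "+" or "-". Assumes the transcript has at least one CDS.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # Exon sequence to the left of the coding span.
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Exon sequence to the right of the coding span.
            right.append((max(start, cds_end + 1), end))
    if strand == "+":
        return {"five_prime_UTR": left, "three_prime_UTR": right}
    # On the minus strand the transcript reads right to left.
    return {"five_prime_UTR": right, "three_prime_UTR": left}


# Hypothetical two-exon transcript on the plus strand.
print(infer_utrs([(100, 200), (300, 400)], [(150, 200), (300, 350)], "+"))
```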

The relationship hierarchies for the various feature types are listed below.
The type of a given feature is given in column 3, and its parent feature is
given in column 9.

Protein-coding genes:
Type = 'gene', parent of 'mRNA'
Type = 'mRNA', parent of 'exon'
Type = 'exon', parent of 'CDS'
Type = 'CDS', no children

Upstream open reading frame (uORF):
Listed as CDS of parent gene, with column 9 attribute "parent_feature_type=uORF"
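
A minimal Python sketch for picking out uORF entries follows. It checks that a
line is a CDS feature whose column 9 carries parent_feature_type=uORF. The
function name and the sample lines are hypothetical and are not taken from the
actual files.

```python
def is_uorf_cds(line):
    """Return True if a GFF3 line is a CDS flagged as a uORF in column 9."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "CDS":
        return False
    # Column 9 holds semicolon-separated key=value attributes.
    attributes = dict(
        field.split("=", 1) for field in cols[8].split(";") if "=" in field
    )
    return attributes.get("parent_feature_type") == "uORF"


# Hypothetical sample lines (tab-separated, nine columns each).
sample = [
    "chr1\tCGD\tCDS\t10\t40\t.\t+\t0\tID=x1;Parent=geneA;parent_feature_type=uORF",
    "chr1\tCGD\tCDS\t150\t450\t.\t+\t0\tID=x2;Parent=geneA-T-E1",
]
print([is_uorf_cds(line) for line in sample])
```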

Non-coding RNAs (tRNA, rRNA, snRNA, etc.):
Type = 'gene', parent of 'tRNA' (or rRNA, snRNA, etc.)
Type = 'tRNA' (or rRNA, snRNA, etc.), parent of 'exon'
Type = 'exon', no children

Non-transcribed features (repeat_region, centromere, etc.):
Type = 'repeat_region' (or centromere, etc.), no children

Pseudogenes of protein-coding genes:
Type = 'pseudogene', parent of 'mRNA'
Type = 'mRNA', parent of 'exon'
Type = 'exon', parent of 'CDS'
Type = 'CDS', no children

Pseudogenes of non-coding RNAs (tRNA, rRNA, snRNA, etc.):
Type = 'pseudogene', parent of 'tRNA' (or rRNA, snRNA, etc.)
Type = 'tRNA' (or rRNA, snRNA, etc.), parent of 'exon'
Type = 'exon', no children
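
The hierarchies above can be traced programmatically. The Python sketch below
assumes that the parent relationship in column 9 is expressed through the
standard GFF3 ID and Parent attributes. It reads each feature's type from
column 3 and follows the Parent links up to the top-level feature. The function
names, feature IDs, and sample lines are hypothetical.

```python
def read_features(lines):
    """Map each feature ID to (type, parent ID or None) from GFF3 lines."""
    features = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "ID" in attrs:
            # A feature may list several parents; only the first is kept here.
            parent = attrs.get("Parent", "").split(",")[0] or None
            features[attrs["ID"]] = (cols[2], parent)
    return features


def type_path(feature_id, features):
    """Return the chain of feature types from the top-level feature down."""
    path, seen, current = [], set(), feature_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in Parent links at " + current)
        seen.add(current)
        feature_type, current = features[current]
        path.append(feature_type)
    return " > ".join(reversed(path))


# Hypothetical protein-coding gene following the hierarchy listed above.
sample = [
    "chr1\tCGD\tgene\t100\t500\t.\t+\t.\tID=geneA",
    "chr1\tCGD\tmRNA\t100\t500\t.\t+\t.\tID=geneA-T;Parent=geneA",
    "chr1\tCGD\texon\t100\t500\t.\t+\t.\tID=geneA-T-E1;Parent=geneA-T",
    "chr1\tCGD\tCDS\t150\t450\t.\t+\t0\tID=geneA-P;Parent=geneA-T-E1",
]
features = read_features(sample)
print(type_path("geneA-P", features))
```

Comparing the printed path with the lists above is a quick check that a file
follows the expected structure for a given feature type.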