Index of /download/gff

Icon   Name                          Last modified      Size  Description
[DIR]  Parent Directory                                 -
[DIR]  L_elongisporus_NRLL_YB-4239/  11-Dec-2013 12:20  -
[DIR]  D_hansenii_CBS767/            11-Dec-2013 15:17  -
[DIR]  C_tropicalis_MYA-3404/        11-Dec-2013 12:18  -
[DIR]  C_parapsilosis_CDC317/        01-Aug-2017 09:45  -
[DIR]  C_orthopsilosis_Co_90-125/    11-Dec-2013 15:07  -
[DIR]  C_lusitaniae_ATCC_42720/      11-Dec-2013 12:15  -
[DIR]  C_guilliermondii_ATCC_6260/   11-Dec-2013 12:13  -
[DIR]  C_glabrata_CBS138/            24-Sep-2017 07:01  -
[DIR]  C_dubliniensis_CD36/          01-Aug-2017 09:33  -
[DIR]  C_albicans_WO-1/              11-Dec-2013 12:10  -
[DIR]  C_albicans_SC5314/            31-Jul-2017 16:25  -

Sub-directories within this directory contain data for various Candida 
species in the Generic Feature Format (GFF). These files describe features, 
including chromosomes, ORFs, CDSs, introns, sequence gaps, intergenic regions, etc.
We also provide annotation of protein-coding genes in Gene Transfer Format (GTF).

Canonical GFF3 format is described below. Please see http://www.sequenceontology.org/gff3.shtml
for a detailed description.

Please see http://mblab.wustl.edu/GTF22.html for a description
of the Gene Transfer Format (GTF).

C_albicans_SC5314/ contains the data for the current CGD annotation of the 
C. albicans genome sequence (Assembly 22), as well as for two previous assemblies
(21 and 19). In addition it contains mappings to older genome assemblies, plus data
for introns in 5' UTRs from Mitrovich et al. (2007), SNPs from Forche et al. (2004),
and unannotated transcripts detected in various high-throughput sequencing projects.
Assembly 22 annotation files are updated weekly.


C_dubliniensis_CD36/ contains the data for the current CGD annotation of the 
C. dubliniensis genome sequence.  These files are updated weekly.

  
C_glabrata_CBS138/ contains the data for the current CGD annotation of the 
C. glabrata genome sequence.  These files are updated weekly.

  
C_parapsilosis_CDC317/ contains the data for the current CGD annotation of the 
C. parapsilosis genome sequence.  These files are updated weekly.  


The remaining directories contain annotation information for
Candida-related species and strains which are not actively 
curated by CGD at this time.  These files are updated sporadically,
as new gene models become available.

#############################################################################


GFF files follow the canonical GFF3 format specification as of October 2012.

Introns are no longer listed, but can be inferred by comparing exons to
transcripts.
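
The inference described above can be sketched in Python. This is a minimal
illustration, not CGD's own tooling: the function name infer_introns and the
example coordinates are hypothetical, and it assumes the exons of a single
transcript are supplied as (start, end) pairs in 1-based inclusive
coordinates, as in GFF3 columns 4 and 5, and that the transcript begins and
ends with an exon.

```python
def infer_introns(exons):
    """Return the gaps between consecutive exons of one transcript.

    `exons` is a list of (start, end) pairs, 1-based and inclusive.
    Adjacent or overlapping exons produce no intron.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical exon coordinates for one transcript.
example_exons = [(100, 200), (301, 400), (451, 500)]
print(infer_introns(example_exons))
```

Because the coordinates are genomic, the same function applies to features on
either strand; only the biological order of the introns is reversed on the
minus strand.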

5' and 3' untranslated regions (UTRs) are no longer listed, but can be
inferred by comparing CDS to exons.
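
A rough Python sketch of this comparison follows. The function name
infer_utrs and the example coordinates are hypothetical. It assumes the exon
and CDS rows of a single transcript are given as (start, end) pairs in
1-based inclusive coordinates, and that any CDS rows flagged as uORFs (see
the uORF entry below) have already been excluded.

```python
def infer_utrs(exons, cds, strand):
    """Return (five_prime_utrs, three_prime_utrs) for one transcript.

    UTRs are taken to be the exon portions lying outside the span that
    runs from the first CDS base to the last CDS base.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # Exon sequence to the left of the coding span.
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Exon sequence to the right of the coding span.
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the 5' end is at the higher coordinate.
    if strand == "-":
        return right, left
    return left, right


# Hypothetical coordinates for a two-exon, plus-strand transcript.
five_prime, three_prime = infer_utrs(
    exons=[(100, 200), (301, 400)],
    cds=[(150, 200), (301, 350)],
    strand="+",
)
print(five_prime, three_prime)
```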

The relationship hierarchies for the various feature types are listed below.
The type of a feature is given in column 3, and its parent feature is
given in column 9.

Protein-coding genes:
Type = 'gene', parent of 'mRNA'
Type = 'mRNA', parent of 'exon'
Type = 'exon', parent of 'CDS'
Type = 'CDS', no children
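
As an illustration, the hierarchy above can be recovered from a file by
reading the type from column 3 and following the ID and Parent attributes in
column 9. The sketch below is only an assumption-laden example: the helper
names, the sample rows, and the feature IDs in them (geneA and so on) are
invented for demonstration and do not come from a CGD file.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 column 9 string into a dict of key=value pairs."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def child_types_by_parent_type(lines):
    """Map each parent feature type to the set of its child feature types."""
    id_to_type = {}
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            id_to_type[attrs["ID"]] = cols[2]
        records.append((cols[2], attrs.get("Parent")))
    tree = defaultdict(set)
    for feature_type, parent in records:
        if parent is None:
            continue
        # GFF3 allows several comma-separated parents.
        for parent_id in parent.split(","):
            if parent_id in id_to_type:
                tree[id_to_type[parent_id]].add(feature_type)
    return dict(tree)


# Hypothetical rows; the IDs and coordinates are made up for illustration.
SAMPLE = [
    "##gff-version 3",
    "chr1\tCGD\tgene\t100\t500\t.\t+\t.\tID=geneA",
    "chr1\tCGD\tmRNA\t100\t500\t.\t+\t.\tID=geneA-T;Parent=geneA",
    "chr1\tCGD\texon\t100\t500\t.\t+\t.\tID=geneA-T-E1;Parent=geneA-T",
    "chr1\tCGD\tCDS\t150\t450\t.\t+\t0\tID=geneA-P;Parent=geneA-T-E1",
]

hierarchy = child_types_by_parent_type(SAMPLE)
for parent_type in sorted(hierarchy):
    print(parent_type, "->", sorted(hierarchy[parent_type]))
```

Collecting all IDs before resolving parents means the rows do not have to
appear in parent-before-child order.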

Upstream open reading frame (uORF):
Listed as CDS of parent gene, with column 9 attribute "parent_feature_type=uORF"
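
Since uORFs share the CDS type with ordinary coding regions, a script may
need to tell them apart by the attribute named above. A minimal sketch,
assuming the attribute appears in column 9 as a plain key=value pair; the
function name is_uorf_cds and the sample rows are hypothetical.

```python
def is_uorf_cds(gff_line):
    """Return True for a CDS row whose column 9 marks it as a uORF."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "CDS":
        return False
    attrs = dict(
        field.split("=", 1) for field in cols[8].split(";") if "=" in field
    )
    return attrs.get("parent_feature_type") == "uORF"


# Hypothetical rows; the IDs and coordinates are made up for illustration.
uorf_row = "chr1\tCGD\tCDS\t110\t130\t.\t+\t0\tID=geneA-U;Parent=geneA-T-E1;parent_feature_type=uORF"
main_row = "chr1\tCGD\tCDS\t150\t450\t.\t+\t0\tID=geneA-P;Parent=geneA-T-E1"
print(is_uorf_cds(uorf_row), is_uorf_cds(main_row))
```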

Non-coding RNAs (tRNA, rRNA, snRNA, etc.):
Type = 'gene', parent of 'tRNA' (or rRNA, snRNA, etc.)
Type = 'tRNA' (or rRNA, snRNA, etc.), parent of 'exon'
Type = 'exon', no children

Non-transcribed features (repeat_region, centromere, etc.):
Type = 'repeat_region' (or centromere, etc.), no children

Pseudogene of protein-coding genes:
Type = 'pseudogene', parent of 'mRNA'
Type = 'mRNA', parent of 'exon'
Type = 'exon', parent of 'CDS'
Type = 'CDS', no children

Pseudogene of non-coding RNAs (tRNA, rRNA, snRNA, etc.):
Type = 'pseudogene', parent of 'tRNA' (or rRNA, snRNA, etc.)
Type = 'tRNA' (or rRNA, snRNA, etc.), parent of 'exon'
Type = 'exon', no children