Index of /download/gff/C_albicans_SC5314/Assembly22
Name Last modified Size Description
C_albicans_SC5314_version_A22-s08-m01-r01_intergenic.gff 2024-12-08 07:00 2.6M
C_albicans_SC5314_version_A22-s08-m01-r01_features_with_chromosome_sequences.gff.gz 2024-12-08 07:00 9.7M
C_albicans_SC5314_version_A22-s08-m01-r01_features.gtf 2024-12-08 07:00 4.2M
C_albicans_SC5314_version_A22-s08-m01-r01_features.gff 2024-12-08 07:00 13M
C_albicans_SC5314_haplotype_variations.gff 2023-06-29 09:53 18M
C_albicans_SC5314_A22_current_intergenic.gff 2024-12-08 07:00 2.6M
C_albicans_SC5314_A22_current_features_with_chromosome_sequences.gff.gz 2024-12-08 07:00 9.7M
C_albicans_SC5314_A22_current_features.gtf 2024-12-08 07:00 4.2M
C_albicans_SC5314_A22_current_features.gff 2024-12-08 07:00 13M
A22_Unannotated_transcripts_Tuch_et_al.gff 2023-06-29 09:53 534K
A22_Unannotated_transcripts_Sellam_et_al.gff 2023-06-29 09:53 736K
A22_Unannotated_transcripts_Bruno_et_al.gff 2023-06-29 09:53 262K
A22_Jones_PMID_15123810_Polymorphisms.vcf 2023-06-29 09:53 11M
A22_Jones_PMID_15123810_Polymorphisms.gff 2023-06-29 09:53 15M
A22_Historic_Assemblies.gff 2023-06-29 09:53 14M
A22_ForcheSNPs.gff 2023-06-29 09:53 127K
5prime_utr_intron_A22.gff 2023-06-29 09:53 12K
This directory contains the downloadable CGD files in the Generic
Feature Format (GFF). These files describe features in CGD, including
chromosomes, ORFs, CDSs, introns, sequence gaps, intergenic regions, etc.
We also provide annotation of protein-coding genes in Gene Transfer Format (GTF).
Please see http://www.sequenceontology.org/gff3.shtml for a detailed description
of the Generic Feature Format (GFF).
Please see http://mblab.wustl.edu/GTF22.html for a description
of the Gene Transfer Format (GTF).
The notation "version_A22-sXX-mYY-rZZ" in the filenames indicates the genome version
to which data in the file corresponds. Detailed explanation about the genome
version notation can be found at: http://www.candidagenome.org/help/SequenceHelp.shtml#versions
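As an illustration of the filename notation, the following minimal Python sketch
extracts the version fields from a filename. The names GenomeVersion and
parse_genome_version are hypothetical helpers, not part of any CGD tool, and the
sketch does not interpret the s, m and r numbers (see the help page above for
their meaning).

```python
import re
from typing import NamedTuple, Optional


class GenomeVersion(NamedTuple):
    """Version fields as they appear in a CGD GFF filename (hypothetical helper)."""
    assembly: str  # e.g. "A22"
    s: int         # number following "s"
    m: int         # number following "m"
    r: int         # number following "r"


# Matches the "version_A22-sXX-mYY-rZZ" part of a filename.
_VERSION_RE = re.compile(r"version_(A\d+)-s(\d+)-m(\d+)-r(\d+)")


def parse_genome_version(filename: str) -> Optional[GenomeVersion]:
    """Return the version fields, or None if the filename carries no version."""
    match = _VERSION_RE.search(filename)
    if match is None:
        return None
    assembly, s, m, r = match.groups()
    return GenomeVersion(assembly, int(s), int(m), int(r))
```

Filenames without the notation, such as the "current" files, yield None.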
Information pertaining to each version update for C. albicans SC5314 Assembly 22 can be found at:
http://www.candidagenome.org/cgi-bin/genomeVersionHistory.pl?seq_source=C.%20albicans%20SC5314%20Assembly%2022
Files with "current" in their names are provided as stable filenames for
automated downloads. They are identical to (technically, symbolic links to) the
corresponding versioned files.
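For automated downloads, a script can build the URL of a "current" file from its
suffix. The sketch below is a minimal Python example. BASE_URL is an assumption
inferred from the location of this directory, and current_file_url and
download_current are hypothetical helper names.

```python
from urllib.request import urlretrieve

# Assumption: this directory is served at the following URL.
BASE_URL = "http://www.candidagenome.org/download/gff/C_albicans_SC5314/Assembly22/"
# Stable prefix shared by the "current" files listed in this directory.
CURRENT_PREFIX = "C_albicans_SC5314_A22_current_"


def current_file_url(suffix: str) -> str:
    """Build the URL of a "current" file, e.g. suffix "features.gff"."""
    return BASE_URL + CURRENT_PREFIX + suffix


def download_current(suffix: str, destination: str) -> str:
    """Download a "current" file to a local path and return that path."""
    urlretrieve(current_file_url(suffix), destination)
    return destination
```

For example, download_current("features.gtf", "features.gtf") would fetch the GTF
file under the assumptions above.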
Files for previous genome versions are available in the archive sub-directory,
http://www.candidagenome.org/download/gff/C_albicans_SC5314/archive/.
The following Assembly 22 files are updated weekly:
C_albicans_SC5314_version_A22-sXX-mYY-rZZ_features.gff
This file contains the current CGD annotation of all features in GFF
based on Assembly 22 of the C. albicans SC5314 genome sequence.
C_albicans_SC5314_version_A22-sXX-mYY-rZZ_features.gtf
This file contains the current CGD annotation of protein-coding genes in GTF
based on Assembly 22 of the C. albicans SC5314 genome sequence.
C_albicans_SC5314_version_A22-sXX-mYY-rZZ_features_with_chromosome_sequences.gff.gz
This file contains the current CGD annotation together with the current genomic
sequence of every chromosome. The annotations are the same as those in the
_features.gff file described above. The chromosome sequences are specified
in the "##FASTA" section at the end of this file according to GFF3 file format
specifications (see http://www.sequenceontology.org/gff3.shtml).
C_albicans_SC5314_version_A22-sXX-mYY-rZZ_intergenic.gff
This file lists the intergenic regions between coding regions in the
chromosomes. This file also contains lengths of these intergenic sequences
and GC and AT content of each intergenic region (percent GC and percent AT).
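The "##FASTA" layout and the percent GC and percent AT values described above can
be sketched in Python as follows. This is a minimal illustration, not the
procedure CGD uses to build these files. All function names are hypothetical,
and computing the percentages over the full region length (including any
ambiguous bases) is an assumption.

```python
import gzip
from typing import Dict, Iterable, List, Tuple


def split_gff3_with_fasta(lines: Iterable[str]) -> Tuple[List[List[str]], Dict[str, str]]:
    """Return (feature rows, {sequence id: sequence}) from GFF3 lines."""
    features: List[List[str]] = []
    sequences: Dict[str, str] = {}
    in_fasta = False
    current_id = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True  # everything after this line is FASTA
            elif line and not line.startswith("#"):
                features.append(line.split("\t"))  # nine tab-separated columns
            continue
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif line.strip():
            chunks.append(line.strip())
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return features, sequences


def gc_at_percent(sequence: str) -> Tuple[float, float]:
    """Return (percent GC, percent AT) over the full sequence length."""
    seq = sequence.upper()
    if not seq:
        return 0.0, 0.0
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / len(seq), 100.0 * at / len(seq)


def region_gc_at(sequences: Dict[str, str], seqid: str, start: int, end: int) -> Tuple[float, float]:
    """GC/AT percentages for a region given in GFF3 1-based inclusive coordinates."""
    return gc_at_percent(sequences[seqid][start - 1:end])


def read_gff3_gz(path: str) -> Tuple[List[List[str]], Dict[str, str]]:
    """Read a gzip-compressed GFF3 file that ends with a ##FASTA section."""
    with gzip.open(path, "rt") as handle:
        return split_gff3_with_fasta(handle)
```

Note that read_gff3_gz loads every chromosome sequence into memory, which is
simple but may not suit very large genomes.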
The following files map special features or historic assemblies to current assemblies.
The mappings are only updated following major sequence updates to the current assemblies.
These files are not converted to the canonical GFF format due to the historic nature
of the data represented.
5prime_utr_intron_A22.gff
This file contains annotation of the 5-prime UTR introns described in
Mitrovich et al. (2007) "Computational and experimental approaches double
the number of known introns in the pathogenic yeast Candida albicans."
Genome Res 17:492-502.
A22_ForcheSNPs.gff
This file contains all the SNP locations from Forche A, Magee PT, Magee BB,
May G "Genome-wide single-nucleotide polymorphism map for Candida albicans."
Eukaryotic Cell. 2004 Jun;3(3):705-14. SNP locations were mapped to Assembly 19
contigs using the original marker sequences.
A22_Historic_Assemblies.gff
This file contains mappings of historic assemblies to Assembly 22 chromosomes.
BLAST analysis was performed to map contig and ORF sequences from each of the
older assemblies to the Assembly 21 chromosomes, and liftOver was used to map
to Assembly 22.
A22_Jones_PMID_15123810_Polymorphisms.gff
This file contains all polymorphisms discussed in Jones et al. (2004) "The Diploid
Genome of Candida albicans." PNAS 101:7329-7334. Polymorphism locations were mapped
to Assembly 21 using 50 bp flanking sequence on both sides of each polymorphism to
locate exact matches using BLAST. Locations for "Deletion" type polymorphisms indicate
the region that is deleted, including the start and stop coordinates. Locations for
"Insertion" type polymorphisms indicate that an insertion has been made in the homolog
sequence immediately AFTER the location specified. Locations for "Substitution" type
polymorphisms indicate the site of a single nucleotide substitution.
The liftOver tool was used to map this to Assembly 22.
A22_Jones_PMID_15123810_Polymorphisms.vcf
Same data as A22_Jones_PMID_15123810_Polymorphisms.gff, converted to VCF format.
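The coordinate conventions given for the Jones et al. polymorphism GFF file can be
illustrated with a short Python sketch. It assumes 1-based inclusive coordinates,
as in GFF. The function name apply_polymorphism and the alt parameter are
hypothetical, since this README does not describe how the alternative allele is
stored in the file.

```python
def apply_polymorphism(reference: str, kind: str, start: int,
                       end: int = None, alt: str = "") -> str:
    """Return the homolog sequence implied by one polymorphism.

    Coordinates are assumed to be 1-based and inclusive.
    `alt` is a hypothetical parameter holding the inserted or substituted bases.
    """
    if end is None:
        end = start
    if kind == "Deletion":
        # The region start..end, including both coordinates, is deleted.
        return reference[:start - 1] + reference[end:]
    if kind == "Insertion":
        # The insertion is made immediately AFTER the specified location.
        return reference[:start] + alt + reference[start:]
    if kind == "Substitution":
        # A single nucleotide at the specified site is replaced.
        return reference[:start - 1] + alt + reference[start:]
    raise ValueError("unknown polymorphism type: " + kind)
```

For example, with the reference "ACGTACGT", a "Deletion" at 3..4 gives "ACACGT",
and an "Insertion" of "TT" at location 3 gives "ACGTTTACGT".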
A22_Unannotated_transcripts_Bruno_et_al.gff
This file contains novel transcriptionally active regions detected in high-throughput
sequencing of cDNA (RNA-seq) under several environmental conditions, described in
Bruno VM, Wang Z, Marjani SL, Euskirchen GM, Martin J, Sherlock G, Snyder M (2010)
"Comprehensive annotation of the transcriptome of the human fungal pathogen Candida
albicans using RNA-seq." Genome Res 20(10):1451-8.
A22_Unannotated_transcripts_Sellam_et_al.gff
This file contains novel, unannotated transcripts detected in tiling microarray experiments
from Sellam A, Hogues H, Askew C, Tebbji F, van het Hoog M, Lavoie H, Kumamoto CA, Whiteway M,
Nantel A "Experimental annotation of the human pathogen Candida albicans coding and noncoding
transcribed regions using high-resolution tiling arrays." Genome Biol 2010; 11(7):R71.
A22_Unannotated_transcripts_Tuch_et_al.gff
This file contains novel, unannotated transcriptionally active regions detected by strand-specific
sequencing of RNA from white and opaque cells, described in Tuch BB, Mitrovich QM, Homann OR,
Hernday AD, Monighetti CK, De La Vega FM, Johnson AD (2010) "The transcriptomes of two heritable
cell types illuminate the circuit governing their differentiation." PLoS Genet 6(8).