Index of /download/Assembly20notes/Advisory

Icon         Name                                   Last modified     Size  Description
[PARENTDIR]  Parent Directory                                            -
[TXT]        ORFsWithinSuspectRegions.txt           2023-06-29 09:53   77K
[TXT]        ORFsWithinSuspectRegions_reduced.txt   2023-06-29 09:53   85K
[   ]        suspect_WO1_regions.gff                2023-06-29 09:53   26K
[   ]        suspect_WO1_regions_reduced.gff        2023-06-29 09:53   41K

This directory contains files with additional information related to the Assembly
20 Sequence Advisory.  For the latest information and status updates, please see:
http://www.candidagenome.org/help/Assembly20_Advisory.shtml

------------------------------------

suspect_WO1_regions.gff

Lists the regions that the BRI flagged as potentially derived from WO-1,
with the chromosomal coordinates of each region.

This file is in Generic Feature Format 
(GFF, http://www.sequenceontology.org/gff3.shtml).  
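As an illustration only, the region coordinates could be read from a GFF file
with a short Python sketch like the one below.  It assumes the standard
tab-delimited GFF column order (sequence ID in column 1, start in column 4, end
in column 5, 1-based inclusive coordinates).  The names Region and read_regions,
and the sample line, are hypothetical and are not taken from the actual file.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Region(NamedTuple):
    seqid: str   # column 1: chromosome or contig name
    start: int   # column 4: 1-based start coordinate
    end: int     # column 5: 1-based, inclusive end coordinate


def read_regions(handle: TextIO) -> Iterator[Region]:
    """Yield one Region per feature line of a GFF file."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines (they start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a feature line; ignore rather than guess
        yield Region(fields[0], int(fields[3]), int(fields[4]))


# Hypothetical sample data, invented for this sketch.
sample = io.StringIO(
    "##gff-version 3\n"
    "chr_example\tBRI\tregion\t1001\t2500\t.\t.\t.\tNote=hypothetical\n"
)
regions = list(read_regions(sample))
print(regions)
```

To read the real file, an open file handle (for example, the result of
open("suspect_WO1_regions.gff")) would be passed in place of the sample.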

------------------------------------

ORFsWithinSuspectRegions.txt

Lists the ORFs and non-ORF features (e.g., tRNA) that are affected by the suspect
regions (i.e., fully or partly contained within a suspect region).  Includes
chromosomal coordinates of the ORF/feature and the suspect region that overlaps
it, with additional descriptive information about each ORF.

This file is in tab-delimited text format.
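
The distinction between "fully" and "partly" contained can be sketched in Python
as an interval comparison.  This is an illustration, not the method CGD used to
build the file.  It assumes 1-based inclusive coordinates and that a feature on
the reverse strand may be listed with start greater than end, so both pairs are
normalized first.  The function name classify_overlap is hypothetical.

```python
def classify_overlap(feat_start: int, feat_end: int,
                     reg_start: int, reg_end: int) -> str:
    """Return 'full', 'partial', or 'none' for a feature versus a suspect region."""
    # Normalize so that lo <= hi regardless of strand orientation (assumption).
    f_lo, f_hi = sorted((feat_start, feat_end))
    r_lo, r_hi = sorted((reg_start, reg_end))
    # No shared base: the feature ends before the region starts, or vice versa.
    if f_hi < r_lo or r_hi < f_lo:
        return "none"
    # Every base of the feature lies inside the region.
    if r_lo <= f_lo and f_hi <= r_hi:
        return "full"
    return "partial"


# Hypothetical coordinates, invented for this sketch.
print(classify_overlap(1200, 1500, 1001, 2500))  # feature inside the region
print(classify_overlap(900, 1100, 1001, 2500))   # feature straddles the region edge
```

A feature would be listed in the file when the result is either "full" or
"partial".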

------------------------------------

suspect_WO1_regions_reduced.gff

Lists the regions that are potentially derived from WO-1, with their chromosomal
coordinates, after reduction by CGD.  The BRI identified as "suspect" the gaps between
contigs, which may have been filled with sequence from WO-1, plus the 1 kb regions
flanking each gap, in which the BRI may have changed the SC5314 sequence based on WO-1
sequence.  CGD compared the 1 kb flanking parts of each suspect region to Contig19
sequences, and reduced the size of the suspect region wherever the sequence was
identical to the original sequence from SC5314.

Specifically, this was accomplished as follows:  Beginning from the side of the
suspect flanking region farthest from the gap (the side that abuts the non-suspect
region of the contig), a region of 100 bp was compared to the corresponding Assembly 19
contig by BLAST.  If the sequence matched perfectly, that region was considered "no
longer suspect," and the adjoining 100 bp region of the suspect flanking region was
compared to the Assembly 19 contig.  Iterations continued, and the suspect region was
reduced in 100 bp increments, as long as the 100 bp section of the Assembly 20 flanking
region and the corresponding Assembly 19 contig showed 100% identity.  When a sequence
discrepancy was encountered, that entire 100 bp section, and all of the flanking region
remaining between it and the gap, remained classified as "suspect."  The part of the
flanking region that aligned perfectly with the contig was removed from the suspect
list.  The reduced regions are the ones that now appear with the label "Suspect WO1" in
the Genome Browser on the CGD web site.
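
The iteration described above can be sketched in Python as follows.  This is a
simplified illustration, not the procedure CGD actually ran: it replaces the
BLAST comparison with an exact string comparison, and it assumes the Assembly 20
flank and the corresponding Assembly 19 sequence are already aligned
base-for-base and oriented so that position 0 is the end farthest from the gap.
The function name cleared_length and the test sequences are hypothetical.

```python
def cleared_length(a20_flank: str, a19_ref: str, window: int = 100) -> int:
    """Return how many bases, counted from the far end, are no longer suspect."""
    cleared = 0
    # Walk toward the gap one full window at a time.
    while cleared + window <= len(a20_flank):
        a20_win = a20_flank[cleared:cleared + window]
        a19_win = a19_ref[cleared:cleared + window]
        # Any discrepancy stops the walk; this window and everything
        # between it and the gap stay suspect.
        if a20_win.upper() != a19_win.upper():
            break
        cleared += window
    return cleared


# Hypothetical 1 kb flank with a single mismatch at offset 350.
a19 = "ACGT" * 250
a20 = a19[:350] + "A" + a19[351:]   # a19[350] is "G", so this is a mismatch
kept_suspect = len(a20) - cleared_length(a20, a19)
print(cleared_length(a20, a19), kept_suspect)
```

In this made-up case the first three 100 bp windows match, the fourth contains
the mismatch, and so 300 bp are cleared while the 700 bp nearest the gap remain
suspect.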

------------------------------------

ORFsWithinSuspectRegions_reduced.txt

Lists the ORFs and non-ORF features (e.g., tRNA) that are affected by the suspect
regions (i.e., fully or partly contained within a suspect region) after the
regions have been reduced by CGD, as described above.  Includes chromosomal
coordinates of the ORF/feature and the suspect region that overlaps it, with
additional descriptive information about each ORF.

This file is in tab-delimited text format.

------------------------------------