Index of /download/Assembly20notes

Icon  Name                                          Last modified      Size
[DIR] Parent Directory                                                 -
[DIR] Advisory/                                     12-Jul-2012 10:19  -
[TXT] ClassificationPerGene.txt                     02-Feb-2009 10:34  634K
[   ] ClassificationTablePerGene.xls                02-Feb-2009 10:34  2.2M
[TXT] DeletedFromAssembly20.txt                     02-Feb-2009 10:34  6.6K
[TXT] MergedORFs.txt                                02-Feb-2009 10:34  8.7K
[   ] Missing_contigs.xls                           02-Feb-2009 10:34  92K
[TXT] NewInAssembly20.txt                           02-Feb-2009 10:34  211
[TXT] NoSeqChangeInAssembly20.txt                   02-Feb-2009 10:34  57K
[TXT] OrfsAtEndOfContigInAssembly19.txt             02-Feb-2009 10:34  774
[TXT] OrfsWithInternalStopCodonsInAssembly20.txt    02-Feb-2009 10:34  322K
[TXT] OrfsWithIntrons_Assembly19.txt                02-Feb-2009 10:34  2.9M
[TXT] OrfsWithIntrons_Assembly19_List.txt           02-Feb-2009 10:34  44K
[TXT] OrfsWithIntrons_Assembly20.txt                02-Feb-2009 10:34  4.3M
[TXT] OrfsWithIntrons_Assembly20_List.txt           02-Feb-2009 10:34  47K
[TXT] OrfsWithNonAUGstartInAssembly20.txt           02-Feb-2009 10:34  6.7K
[TXT] OrfsWithPartialTerminalCodonInAssembly20.txt  02-Feb-2009 10:34  351K
[TXT] OrfsWithoutEndStopCodonInAssembly20.txt       02-Feb-2009 10:34  616K
[TXT] SimpleSeqChangesInAssembly20.txt              02-Feb-2009 10:34  394K
[TXT] SplitContig19ToChromosomes.txt                02-Feb-2009 10:34  2.0K
[TXT] SynonymousOnlyChangeInAssembly20.txt          02-Feb-2009 10:34  185K
[TXT] complexSeqChangesInAssembly20.txt             02-Feb-2009 10:34  1.1M
[TXT] intronChangesInAssembly20.txt                 02-Feb-2009 10:34  3.3M
[TXT] intronChangesInAssembly20_OrfList.txt         02-Feb-2009 10:34  4.2K
[TXT] problemContigMappingToChr.txt                 02-Feb-2009 10:34  1.4K
[TXT] problemORFInEMBLfiles.txt                     02-Feb-2009 10:34  17K

Assembly 20 of the C. albicans sequence was a collaborative effort of
groups at the Biotechnology Research Institute of the National
Research Council, Canada; the University of Minnesota, USA; and Chiba
University, Japan.


** PLEASE NOTE: Assembly 20 Sequence Advisory
** 
** posted October 19, 2006, updated October 25, 2006
** 
** The collaborative group who generated Assembly 20 has discovered that 
** the sequence traces that they had been using to fill some of the gaps 
** and determine overlaps between Assembly 19 contigs were derived from 
** strain WO-1, rather than from the reference strain, SC5314. 
** 
** Please see http://www.candidagenome.org/help/Assembly20_Advisory.shtml
** for the latest information and status updates.


Whereas Assembly 19 is a diploid assembly that includes both alleles
of each gene for cases in which they show significant sequence
differences, Assembly 20 is a haploid assembly: in the production of
Assembly 20, updates to Assembly 19 have been made in only one allele
of each pair, though in some cases, genes may have been assembled from
data from the two different alleles. The chromosomes may be thought
of as 'reftigs', where they are mosaics of haplotypes, rather than
representative of a single haploid genome in the sequenced strain.
The process used to generate this assembly is described on the project
web site at URL:
http://candida.bri.nrc.ca/candida/alignments/index.cfm.  The files
generated by these groups are posted at URL:
http://candida.bri.nrc.ca/alignments/editedEMBL/final.  All of the
Assembly 20 data in CGD come from these EMBL-format files.

The Assembly 20 files were processed at CGD to identify and classify
changes that occurred between Assembly 19 and Assembly 20, and to
identify other features in which users may be interested (e.g.,
introns), as described in detail below.  Files containing all of these
analyses (ORF lists, sequences, and/or alignments) are available for
download from CGD
(http://www.candidagenome.org/DownloadContents.shtml).

Assembly 20 ORF Classification:
The entire classification is summarized in the file "ClassificationTablePerGene.xls."

Sequence comparisons between Assembly 19 and Assembly 20 were
performed.  Each ORF from Assembly 20 has been classified according to
how it changed between Assembly 19 and Assembly 20.  The
classifications for each gene appear on its CGD Locus page, next to
the "Feature type" heading.  In addition, ORFs in each category have
explanatory Locus History notes on the CGD web site.

The source of all Assembly 20 information is the set of EMBL-format files posted at
http://candida.bri.nrc.ca/alignments/editedEMBL/final/
Ca20Chr1.03s02.embl  07/21/2006 11:30:42 AM
Ca20Chr2.03s03.embl  05/10/2006 02:09:50 PM
Ca20Chr3.03s01.embl  05/11/2006 09:58:18 AM
Ca20Chr4.02s01.embl  05/11/2006 10:56:56 AM
Ca20Chr5.02s01.embl  05/11/2006 11:10:14 AM
Ca20Chr6.04s04.embl  05/11/2006 01:34:54 PM
Ca20Chr7.02s02.embl  05/11/2006 01:44:44 PM
Ca20ChrR.04s03.embl  05/09/2006 02:16:36 PM
Ca20FinalMay11.zip 05/11/2006 01:47:06 PM

The source of all Assembly 19 sequence information is the Candida Genome Database, July 2006.

Protein and nucleotide local sequence alignments were performed using
bl2seq from the BLAST suite from NCBI.  Global nucleotide sequence
alignments were performed with the MUSCLE (multiple sequence
comparison by log-expectation) software, available at the URL:
http://www.drive5.com/muscle/

The following files are available for download:

1)  New ORFs in Assembly 20
Criteria:  The orf19 name is new in Assembly 20.  This assignment was made computationally.
File contains:  list of all ORFs that are new in Assembly 20, not present in Assembly 19
File name: NewInAssembly20.txt


2) ORFs deleted from Assembly 20
Criteria: The orf19 name is present in Assembly 19 and it is not
present in Assembly 20.  This assignment was made computationally.
File contains: list of all ORFs that were removed during preparation
of Assembly 20; they were present in Assembly 19
File name: DeletedFromAssembly20.txt

Note: A subset of the ORFs on this list were subsumed by, or "merged
into" another ORF in Assembly 20.  Some merged ORFs were combined with
a neighboring ORF on the Contig from Assembly 19 (Contig-19).  In
other cases, an ORF was merged with an ORF that was not adjacent to it
in Assembly 19; that is, the Contig-19s containing the two ORFs were
not associated with each other in Assembly 19 but have been assembled
next to, or overlapping with, each other in Assembly 20.


3) ORFs with no sequence change in Assembly 20
Criteria: The nucleotide sequence of the ORF in Assembly 19 and 20 is
the same (sequence across the whole ORF, including any intronic
sequence).  This assignment was made computationally.
File contains: list of all ORFs with no changes to the nucleotide
sequence between Assembly 19 and Assembly 20
File name: NoSeqChangeInAssembly20.txt

Note: These criteria do not exclude ORFs in which adjustments have
been made to the position of an intron without any change in the
underlying sequence.


4) ORFs with synonymous changes ONLY, between Assembly 19 and Assembly 20
Criteria: The nucleotide sequence of the coding sequence (CDS),
excluding any intronic sequence, is not the same between the two
assemblies; however, the translated amino acid sequence is the same.
This assignment was made computationally.
File contains: list of all ORFs with only synonymous changes between
Assembly 19 and Assembly 20 (the nucleotide sequence has changed, yet
the predicted amino acid translation is unchanged), with nucleotide
alignments between the sequence of the ORF in Assembly 19 and the
sequence of the ORF in Assembly 20
File name: SynonymousOnlyChangeInAssembly20.txt

Note: ORFs classified in the categories "Simple Sequence Changes" and
"Complex Sequence Changes" may have synonymous changes in addition to
other, nonsynonymous sequence changes.

Note: Problem ORFs that have been extended by one or two basepairs in
Assembly 20, in the absence of other sequence changes that affect the
translated sequence, will meet the criteria for inclusion in this
category.



5) ORFs with simple sequence changes in Assembly 20
Criteria: The aligned region encompasses the entire length of the ORF
in both Assembly 19 and Assembly 20, and amino acid identity is 98% or
greater.  This assignment was made computationally.
File contains: list of all ORFs with small changes in protein sequence
between Assembly 19 and Assembly 20, with protein sequence alignments.
File name: SimpleSeqChangesInAssembly20.txt

Note: This category includes ORFs that may contain substitutions,
small insertions, and/or small deletions, yet overall identity between
the two predicted protein sequences is 98% or greater.  Cases in which
only intronic sequence has changed, and the translated sequence has
not been affected, are also included in this category.


6) ORFs with complex sequence changes in Assembly 20
Criteria: ORF has changed in nucleotide sequence, and changes do not
fall into the "synonymous changes only" or "simple amino acid changes"
categories.  This assignment was made computationally.
File contains: list of all ORFs that have changed significantly in
sequence between Assembly 19 and Assembly 20, with protein sequence
alignments.
File name: complexSeqChangesInAssembly20.txt

Note: This category includes ORFs that may contain substitutions,
insertions, deletions, and/or changes to the 5' and/or 3' boundary
(annotation changes, in which the ORF boundary is moved without an
underlying sequence change, or sequence changes).  The protein
alignment may show 100% identity if complex changes have taken place
outside of the aligned region (e.g., if the N- or C-terminal region
has been changed).
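
The decision order implied by categories 3 through 6 can be sketched in
Python as follows.  This is an illustration only, not the CGD pipeline:
the function names are hypothetical, and the percent identity and the
"alignment spans the full length" flag are taken as inputs (CGD derived
them from bl2seq alignments, which are not reproduced here).  The
translation table assumes that CUG is read as serine in C. albicans
(NCBI translation table 12); this document does not state which table
CGD used.

```python
# Standard genetic code, with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumption: CUG encodes serine in C. albicans (translation table 12).
CODON_TABLE["CTG"] = "S"


def translate(cds):
    """Translate a CDS (introns removed); unrecognized codons become 'X'."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def classify(full19, full20, cds19, cds20, identity_pct, full_length_aligned):
    """Assign one of the four change categories described above.

    full19/full20: ORF nucleotide sequence including any intronic sequence.
    cds19/cds20:   coding sequence with introns removed.
    identity_pct and full_length_aligned are hypothetical inputs standing
    in for the result of a protein alignment.
    """
    if full19 == full20:
        return "no change"
    if cds19 != cds20 and translate(cds19) == translate(cds20):
        return "synonymous only"
    # Intron-only changes (same CDS, different full sequence) land here.
    if full_length_aligned and identity_pct >= 98.0:
        return "simple"
    return "complex"
```

Testing the categories in this order means that an ORF whose only
difference lies in intronic sequence falls through to "simple," which
matches the note under category 5.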


7) Excel-format spreadsheet of all Assembly 20 ORFs and a
classification of the type of change, if any, that affected the ORF
between Assembly 19 and Assembly 20

File contains: Excel workbook with two worksheets.   
The first worksheet contains a list of all of the Assembly 20 ORFs 
and their classification into the six categories outlined above. 
The columns in the first worksheet are as follows:
A) Assembly 20 ORF name
B) Complex Sequence changes in Assembly 20
C) New in Assembly 20
D) No change in Assembly 20
E) Simple sequence including substitutions and indels in Assembly 20
F) Synonymous changes Only in Assembly 20
G) Chromosome 
H) Start 
I) Stop
J) Strand
K) Exon segments 
L) Contig19 coordinates
A "1" in columns B through F indicates that the ORF is classified in the category.   
The second worksheet contains a list of all of the Assembly 19 ORFs that are not 
present in Assembly 20, and the Contig19 name and coordinates.
File name: ClassificationTablePerGene.xls
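
As a small illustration of how the "1" flags in columns B through F of
the first worksheet map to category names, here is a hedged Python
sketch.  It assumes that a row has already been read into a dictionary
keyed by column letter (how the workbook is opened is left out), and
the function name is hypothetical.

```python
# Column letters and descriptions follow the list above.
CATEGORY_COLUMNS = {
    "B": "Complex Sequence changes in Assembly 20",
    "C": "New in Assembly 20",
    "D": "No change in Assembly 20",
    "E": "Simple sequence including substitutions and indels in Assembly 20",
    "F": "Synonymous changes Only in Assembly 20",
}


def categories_from_flags(row):
    """Return the names of the categories flagged with a "1" in this row.

    row: dict mapping a column letter ("B".."F") to the cell value, which
    may arrive as the string "1", the integer 1, or the float 1.0
    depending on the spreadsheet reader (an assumption).
    """
    return [
        name
        for column, name in CATEGORY_COLUMNS.items()
        if str(row.get(column, "")).strip() in ("1", "1.0")
    ]
```

A list is returned, rather than a single name, so that the sketch does
not assume that exactly one flag is set per row.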


8) Tab-delimited text file of all Assembly 20 ORFs and a classification of the 
type of change, if any, that affected the ORF between Assembly 19 and Assembly 20
File contains: A list of all of the Assembly 20 ORFs and their classification 
into the six categories outlined above. 
The columns are as follows:
A) Assembly 20 ORF name
B) Classification (into the categories described for files 1-6, above)
C) Chromosome (ORF name appears in this column if ORF is classified as 
"deleted from Assembly 20")
D) Start coordinate on chromosome (Contig coordinates appear in this column 
if ORF is classified as "deleted from Assembly 20")
E) Stop coordinate on chromosome
F) Strand
G) Exon Segments
H) Contig19 coordinates

File Name: ClassificationPerGene.txt
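
The column layout above can be read with a few lines of Python.  The
sketch below is an assumption-laden example, not a CGD tool: the
dictionary keys are my own names for columns A through H, the file is
assumed to have no quoting, and any line that does not begin with an
orf19 name (for example, a header line, if one exists) is skipped.  The
sample rows are made up for illustration and are not real CGD records.

```python
import csv
import io
from collections import Counter

# One key per column A) to H); the key names are hypothetical.
COLUMNS = ["orf", "classification", "chromosome", "start", "stop",
           "strand", "exon_segments", "contig19_coords"]


def read_classification(handle):
    """Return one dict per ORF line of a tab-delimited classification file."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Assumption: data lines start with an orf19 name.
        if not fields or not fields[0].startswith("orf19"):
            continue
        # Pad short lines so that every record has all eight keys.
        padded = fields[:len(COLUMNS)] + [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, padded)))
    return rows


def count_by_classification(rows):
    """Tally how many ORFs carry each classification label."""
    return Counter(row["classification"] for row in rows)


# Made-up rows for illustration only.
SAMPLE = (
    "orf19.EXAMPLE1\tlabel A\tchrX\t100\t400\t+\t100-400\tContig19-EXAMPLE:1-301\n"
    "orf19.EXAMPLE2\tlabel A\tchrX\t900\t500\t-\t900-500\n"
    "orf19.EXAMPLE3\tlabel B\tchrX\t1200\t1500\t+\t1200-1500\tContig19-EXAMPLE:5-305\n"
)
rows = read_classification(io.StringIO(SAMPLE))
counts = count_by_classification(rows)
```

For the real file, the handle would come from something like
open("ClassificationPerGene.txt", newline="").  Remember that for ORFs
classified as "deleted from Assembly 20," columns C and D hold the ORF
name and the contig coordinates rather than a chromosome and a start
coordinate.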


9) Merged ORFs
Criteria: Merged ORFs were evaluated as follows: The Assembly 19
nucleotide sequence, with any introns, of each of the ORFs that were
deleted from Assembly 20 was compared by BLAST against the set of all
Assembly 20 ORFs (nucleotide sequence, with introns).  A strong match
indicates that the deleted ORF may have been subsumed by the Assembly
20 ORF.  Such candidates were evaluated *manually*.  If the orf19
names of the possible merged pair were numerically close to each other
(e.g., orf19.1556 and orf19.1555), the candidate pairs were evaluated
in the GBrowse genome browser.  If the ORFs overlapped on the same
strand, the pair was scored as "merged."  If the ORFs did not overlap,
or were on opposite strands, the pair was scored as "not merged."  The
possible merged pairs with orf19 names that were not close to each
other were evaluated in the GBrowse genome browser displaying the
position of the Assembly 19 contigs overlaid on the Assembly 20
chromosomes.  The ORFs were scored as "merged" if they were located on
the overlapping segments of the adjacent contigs or if they spanned a
junction between the adjacent contigs.
File contains: The Feature name (orf19 name) of the ORF that remains
after the merge, the Locus name (e.g., ABC1) of the ORF that remains
after the merge, the Feature name (orf19 name) of the ORF that is
deleted (subsumed) during the merge, the Locus name (e.g., ABC1) of
the deleted/subsumed ORF.
File name: MergedORFs.txt
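
The scoring rule for candidate pairs with numerically close orf19 names
(overlap on the same strand means "merged") can be written down as a
short Python sketch.  The actual evaluation was done manually in
GBrowse; the record type and function name below are hypothetical, and
both ORFs are assumed to be placed on the same chromosome with 1-based,
inclusive coordinates where start <= end.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    name: str
    start: int   # 1-based, inclusive; assumed start <= end
    end: int
    strand: str  # "+" or "-"


def score_pair(a, b):
    """Score a candidate pair as "merged" or "not merged".

    Assumes both ORFs lie on the same chromosome.
    """
    same_strand = a.strand == b.strand
    overlaps = a.start <= b.end and b.start <= a.end
    return "merged" if same_strand and overlaps else "not merged"
```

This covers only the first case described above.  Pairs whose names are
not close to each other were judged against the contig overlay, which
this sketch does not attempt to model.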


10) ORFs truncated by contig ends in Assembly 19, along with the new
coordinates in Assembly 20
Criteria: In Assembly 19, one terminus of the ORF was a contig end.
File contains: ORF name, chromosomal coordinates in Assembly 20,
contig coordinates in Assembly 19, length of protein in Assembly 20,
length of protein in Assembly 19.  Tab-delimited file.
File name: OrfsAtEndOfContigInAssembly19.txt

Note: This does not include any ORF whose terminus was near, but not
at, the end of a contig in Assembly 19 and which was extended in
Assembly 20.  However, these ORFs are classified as having "complex
sequence changes" as described above.


11) ORFs containing gaps/introns/adjustments in Assembly 19
Criteria: ORFs from Assembly 19 are included in this category if the
coding sequence (CDS) comprises more than one segment.
File contains: ORF name; contig and coordinates; size of the
intron/gap (nucleotides); orthologous gene from S. cerevisiae, if any;
whether or not orthologous gene from S. cerevisiae contains an intron;
global nucleotide alignment of the entire sequence (including the
introns) to the CDS (with introns removed)
File name: OrfsWithIntrons_Assembly19.txt

Note: ***This category includes gaps that are NOT bona fide
introns.*** The Annotation Working Group added small gaps to make
adjustments to the reading frame, or to eliminate stop codons in cases
in which the annotator judged that the sequence was likely to be in
error.  Note that the lengths of some introns/gaps are negative
numbers (i.e., a region of the exon is counted twice).

All intron predictions should be considered to be preliminary, and
these predictions should be subject to further evaluation.

If there are multiple gaps/introns, the sizes of the gaps/introns are separated by commas.
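
To make the negative sizes concrete, here is a minimal Python sketch
that computes intron/gap sizes from a list of CDS segments.  It is not
the code used to build the file: the function names are hypothetical,
and it assumes 1-based, inclusive coordinates listed in order along the
forward strand.

```python
def gap_sizes(segments):
    """Return the size of each gap between consecutive CDS segments.

    segments: list of (start, end) tuples, 1-based and inclusive, in
    order along the forward strand (an assumption of this sketch).
    A negative size means the two segments overlap, so that some
    sequence is counted twice.
    """
    sizes = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        sizes.append(next_start - prev_end - 1)
    return sizes


def format_sizes(sizes):
    """Join multiple gap sizes with commas, as described above."""
    return ",".join(str(size) for size in sizes)
```

For example, segments (1, 100) and (151, 300) give a gap of 50
nucleotides, whereas segments (1, 100) and (99, 200) give -2 because
positions 99 and 100 belong to both segments.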

12) ORFs containing gaps/introns/adjustments in Assembly 19 (without
alignments)
Criteria: ORFs from Assembly 19 are included in this category if the
coding sequence (CDS) comprises more than one segment.  This file is
identical to the file OrfsWithIntrons_Assembly19.txt, except that it
does NOT contain the alignments and is therefore more amenable to
viewing as a spreadsheet.
File contains: ORF name; contig and coordinates; size of the
intron/gap (nucleotides); orthologous gene from S. cerevisiae, if any;
whether or not orthologous gene from S. cerevisiae contains an intron.
The file is in tab-delimited text format.
File name: OrfsWithIntrons_Assembly19_List.txt


13) ORFs containing gaps/introns in Assembly 20
Criteria: ORFs from Assembly 20 are included in this category if the
coding sequence (CDS) comprises more than one segment.
File contains: ORF name; chromosome and coordinates; size of the
intron/gap (nucleotides); orthologous gene from S. cerevisiae, if any;
whether or not orthologous gene from S. cerevisiae contains an intron;
global nucleotide alignment of the entire sequence (including the
introns) to the CDS (with introns removed).  The ortholog assignments
have been updated to reflect the Assembly 20-based mapping generated
on November 26, 2006.
File name: OrfsWithIntrons_Assembly20.txt

Note:  ***This category includes gaps that are NOT bona fide introns.***  

The Annotation Working Group added small gaps to make adjustments to
the reading frame, or to eliminate stop codons in cases in which the
annotator judged that the sequence was likely to be in error.  Some of
the gaps introduced by the Annotation Working Group have a length that
is a negative number; that is, the coding sequence comprises two
overlapping segments, such that some sequence is counted twice.  These
are called "Adjustments," rather than "Introns" on the Locus page of
the affected ORFs.  Like the introns/gaps that are small in size,
these "adjustments" should also be considered flags that indicate that
resequencing of the area is advised.

Please also note: Changes in the position of gaps/introns (a
gap/intron that had "slid" or "slipped"), without other changes to the
annotation of the region, appear to be due to some problem with file
manipulations during generation of Assembly 20.  In several such
cases, an internal stop codon was generated in Assembly 20 in ORFs
that did not have such internal stops in Assembly 19 (and in which the
underlying nucleotide sequence was unchanged between the two
assemblies).  These ORFs are the following: orf19.1261, orf19.130,
orf19.1639, orf19.1693, orf19.2440, orf19.3245, orf19.4136,
orf19.5880.  After the initial loading of the Annotation Working
Group's Assembly 20 data into CGD, CGD adjusted the position of these
gaps to restore their position as defined in Assembly 19.  The other
sequence will remain as-is in CGD until further information is
available.

All intron predictions should be considered to be preliminary, and
these predictions should be subject to further evaluation.  We provide
the size of the intron/gap/adjustment in Assembly 20 and information
about the S. cerevisiae ortholog in this file to facilitate initial
assessment.

If there are multiple gaps/introns, the sizes of the gaps/introns are separated by commas.

14) ORFs containing gaps/introns/adjustments in Assembly 20 (without
alignments)
Criteria: ORFs from Assembly 20 are included in this category if the
coding sequence (CDS) comprises more than one segment.  This file is
identical to the file OrfsWithIntrons_Assembly20.txt, except that it
does NOT contain the alignments and is therefore more amenable to
viewing as a spreadsheet.
File contains: ORF name; contig and coordinates; size of the
intron/gap (nucleotides); orthologous gene from S. cerevisiae, if any;
whether or not orthologous gene from S. cerevisiae contains an intron.
The ortholog assignments have been updated to reflect the Assembly
20-based mapping generated on November 26, 2006.  The file is in
tab-delimited text format.
File name: OrfsWithIntrons_Assembly20_List.txt


15) ORFs with changes to intron/gap/adjustment regions between
Assembly 19 and Assembly 20
Criteria: Assembly 20 ORFs are included if the number or nucleotide
sequence of introns/gaps/adjustments differs between Assembly 19 and
Assembly 20.
File contains: ORF name; coordinates of exons in Assemblies 20 and 19;
alignment of the Assembly 19 genomic nucleotide sequence (coding
sequence plus intron(s)) vs. the Assembly 20 version; alignment of the
Assembly 19 ORF protein sequence vs. the Assembly 20 version.
File name: intronChangesInAssembly20.txt (show alignments: nucleotide
and translations)

Note: Small changes in coordinates may not result in changes at either
the nucleotide or amino acid sequence levels.

Note: ***Not all gaps are bona fide introns.*** The Annotation Working
Group added small gaps to make adjustments to the reading frame, or to
eliminate stop codons in cases in which the annotator judged that the
sequence was likely to be in error.  All intron predictions should be
considered to be preliminary, and these predictions should be subject
to further evaluation.

Please also note: Changes in the position of gaps/introns (a
gap/intron that had "slid" or "slipped"), without other changes to the
annotation of the region, appear to be due to some problem with file
manipulations during generation of Assembly 20.  In several such
cases, an internal stop codon was generated in Assembly 20 in ORFs
that did not have such internal stops in Assembly 19 (and in which the
underlying nucleotide sequence was unchanged between the two
assemblies).  After the initial loading of the Annotation Working
Group's Assembly 20 data into CGD, CGD adjusted the position of these
gaps to restore their position as defined in Assembly 19.


16) ORFs with changes to intron/gap/adjustment regions between
Assembly 19 and Assembly 20 (without alignments)
Criteria: Assembly 20 ORFs are included if the number or the
nucleotide sequence of introns/gaps differs between Assembly 19 and
Assembly 20.  This file is identical to the file
intronChangesInAssembly20.txt, except that it does NOT contain the
alignments and is therefore more amenable to viewing as a spreadsheet.
File contains: ORF names
File name: intronChangesInAssembly20_OrfList.txt


17) Problem ORFs that have internal stop codons (with translation)
Criteria: This set of ORFs has a stop codon within the reading frame,
as presented in the Assembly 20 files from the Annotation Working
Group.
File contains: List of ORFs in this category, with nucleotide sequence
(full, including any intronic sequence), coding sequence (CDS, with
introns removed), and amino acid translation
File name: OrfsWithInternalStopCodonsInAssembly20.txt

Note: Most of the stop codons are near the end of the ORF described in
the Assembly 20 file.  Some are followed by a few residues of
predicted protein sequence; others are followed by additional stop
codons.  After loading the data from the original Assembly 20 file and
archiving these starting data, CGD plans to adjust the boundary of
these ORFs in the database and in the subsequent sequence files.  The
four exceptions are orf19.4384.1, orf19.3813, orf19.359 and
orf19.5775.3 (described in more detail in the file
"problemORFInEMBLfiles.txt"); these ORFs will remain as-is in CGD
until additional data are available.

18) Problem ORFs that are lacking terminal stop codons
Criteria: This set of ORFs lacks terminal stop codons, as presented in
the Assembly 20 files from the Annotation Working Group.
File contains: List of ORFs in this category, with nucleotide sequence
(full, including any intronic sequence), coding sequence (CDS, with
introns removed), and amino acid translation
File name: OrfsWithoutEndStopCodonInAssembly20.txt

Note: In most of these cases, adjusting the end coordinates to extend
the ORF by a few nucleotides, relative to its coordinates in the
initial Assembly 20 release, would append an in-frame stop codon.
After loading the data from the original Assembly 20 file and
archiving these starting data, CGD has adjusted the boundary of these
ORFs.  The new coordinates now appear in the CGD sequence files.
There are two ORFs that end with undetermined sequence ("NNN"),
orf19.2657 and orf19.7398.1, and the termini of these two ORFs will
not be modified by CGD in the absence of additional sequence data.  In
addition, orf19.3073 runs off the end of Assembly 20 Chromosome 4 and
it therefore lacks a terminal stop.

Also included in this file are ORFs that extend downstream of an
in-frame stop codon by a few residues.  (These ORFs are also included
in the category, "Problem ORFs that have internal stop codons," and
are listed in the file OrfsWithInternalStopCodonsInAssembly20.txt, as
described above.)  The coordinates of ORFs with in-frame stops within
a few codons of the terminus have also been adjusted; they have been
truncated so that they end at the stop codon.  These adjustments were
performed after loading the data from the original Assembly 20
EMBL-format files and archiving these starting data at CGD.  The
adjustments are now present in the CGD sequence files.

19) ORFs with partial codons
Criteria: Length of the coding sequence (CDS, with any intronic
sequence removed), in nucleotides, is not a multiple of three
File contains: ORF name, nucleotide sequence of the ORF (any intronic
sequence included), translated sequence
File name: OrfsWithPartialTerminalCodonInAssembly20.txt

Note: Coordinates of ORFs have been adjusted so that the ORF ends at
the stop codon; the extra nucleotides (partial codon) have been
removed from the CGD sequence files. These adjustments were performed
after loading the data from the original Assembly 20 EMBL-format files
and archiving this starting data at CGD.  Please also note that this
query was run after other coordinate adjustments were made; some of
the ORFs with partial codons in Assembly 20 were detected by other
queries and corrected before this list was generated (e.g., ORFs
without terminal stop codons).

20) ORFs with non-AUG start
Criteria: ORF nucleotide sequence does not begin with ATG
File contains: List of ORFs, with nucleotide sequence (including any
intronic sequence)
File name: OrfsWithNonAUGstartInAssembly20.txt

Note: Eight cases in Assembly 20.
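
The four "problem ORF" criteria in items 17 through 20 can be expressed
as simple checks on a coding sequence.  The Python sketch below is an
illustration under stated assumptions, not the queries that CGD ran:
the function name and result keys are hypothetical, the input is the
CDS with introns already removed, and only the three standard stop
codons (TAA, TAG, TGA) are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds):
    """Flag the four problem types for one CDS (introns removed)."""
    cds = cds.upper()
    # Complete codons only; a trailing partial codon is left out here
    # and reported separately through the length check.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return {
        "internal_stop": any(codon in STOP_CODONS for codon in codons[:-1]),
        "missing_terminal_stop": not codons or codons[-1] not in STOP_CODONS,
        "partial_codon": len(cds) % 3 != 0,
        "non_atg_start": not cds.startswith("ATG"),
    }
```

An ORF that extends a few residues past an in-frame stop is flagged
both as having an internal stop and as lacking a terminal stop, which
agrees with the overlap between files 17 and 18 described above.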

21) Missing Contig19s, and the Assembly 19 ORFs that they contain
Criteria: Contig19s are included if they are not listed in the
EMBL-format Assembly 20 files
File contains: Contig19 name, name of ORF contained on the missing
contig, Locus name (if any) of the ORF, Feature Type of ORF, notes
File name: Missing_contigs.xls

Note: The EMBL-format Assembly 20 files released by the Annotation
Working Group/Assembly 20 collaboration specify mapping of some of the
Assembly 19 contigs to the Assembly 20 chromosomes; however, not all
of the Contig19s are included in the EMBL-format files.  The file
"Missing_contigs.xls" contains information about the Contig19s that
are missing from the EMBL-format Assembly 20 files.

Each ORF is contained on a single line; missing Contig19s that
comprise multiple ORFs are listed on multiple lines.  The Feature Type
of each ORF indicates whether it is present in Assembly 20 and, if so,
whether the sequence has changed between Assembly 19 and 20. The notes
were entered based on manual investigation by BLAST.  Excel format
file.


22) Subdivided Contig19s
Criteria: Contig19s that are listed in the EMBL-format file, and which
are split into pieces in Assembly 20
File contains: ID of Contig19 fragment; name of Contig19; Assembly 20
chromosome where contig fragment matches; chromosomal coordinates of
match
File name: SplitContig19ToChromosomes.txt

Note: The subdivided Contig19 fragments are designated numerically,
for example, "Contig19-10070_1," "Contig19-10070_2,"
"Contig19-10070_3."

23) List of other Contig mapping problems
File contains: Notes on some problems with the Contig19 mapping onto
Assembly 20 chromosomes from the EMBL-format files.
File name: problemContigMappingToChr.txt


24) Notes on problematic entries in the Assembly 20 files
File contains: List of problematic ORFs from the Assembly 20
EMBL-format files released by the Annotation Working Group/Assembly 20
collaboration.  Notes on the way in which these issues will be handled
in CGD.
File name: problemORFInEMBLfiles.txt

Note: This file describes the following types of problems in the
EMBL-format files released by the Annotation Working Group/Assembly 20
collaboration: orf19 names that have been used for two different
regions in the EMBL-format Assembly 20 files (4 cases), an orf19 name
that is used as the name of one ORF and also as an allele name of a
different ORF (1 case), an ORF without a name in the EMBL-format
Assembly 20 files (1 case), ORFs with internal stop codons that are
not amenable to correction by a simple adjustment in the terminal
coordinate (4 cases), ORFs that are extremely changed in sequence
between Assembly 19 and Assembly 20 (4 cases), and ORFs that contain a
stop codon in Assembly 20 in the absence of any underlying sequence
changes (the coordinates of an intronic or gap sequence have changed
position ("slipped") between the two assemblies, creating an in-frame
stop codon).