The file orf19_orf6_mapping.txt provides a mapping from the names of
the Open Reading Frames (ORFs) identified in C. albicans SC5314
Assembly 19 to the names of the ORFs in Assembly 6.

This mapping was done by
blasting the haploid set of orf19 predicted proteins (file available at
http://www.candidagenome.org/download/sequence/genomic_sequence/orf_protein/orf_trans_all_haploid.fasta,
as of October 27, 2005) against orf6 predicted proteins (file from the
Stanford Genome Technology Center, downloaded from
http://www.candidagenome.org/download/sequence/genomic_sequence/archived_assemblies/Ca-Assembly6.orf_trans).
The best hit and any hits with >90% identity were retained.  The pairs
were subsequently screened, such that if an orf6 in an orf19-orf6
pairing had a more significant hit to a different orf19, then the less
significant pairing was removed.  In cases where multiple orf6 matches
were observed for a single orf19, some subsequent manual curation was
performed to remove pairs with less significant E values.  An attempt
was made to ensure that adjacent orf6 ORFs aligned with adjacent orf19 ORFs;
however, this approach proved not to be helpful as a measure of
validation due to apparent regions of misassembly in Assembly 6.
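
The automated part of the procedure above can be sketched roughly as
follows. This is only one reading of the retention and screening rules,
not the script actually used to build the file. The function names, the
tuple layout (orf19, orf6, E-value, percent identity), and the
identifiers in the usage example are all hypothetical; the manual
curation step is not represented.

```python
def retain_hits(hits):
    """Keep, for each orf19, its best hit (lowest E-value) plus any
    hit with more than 90% identity.

    hits: list of (orf19, orf6, evalue, pcident) tuples (assumed layout).
    """
    best_e = {}
    for orf19, _orf6, evalue, _pcident in hits:
        if orf19 not in best_e or evalue < best_e[orf19]:
            best_e[orf19] = evalue
    return [h for h in hits if h[2] == best_e[h[0]] or h[3] > 90.0]


def screen_pairs(pairs):
    """Drop a pairing when its orf6 has a more significant hit
    (lower E-value) to a different orf19. Ties are kept.
    """
    best_e = {}
    for _orf19, orf6, evalue, _pcident in pairs:
        if orf6 not in best_e or evalue < best_e[orf6]:
            best_e[orf6] = evalue
    return [p for p in pairs if p[2] == best_e[p[1]]]


# Hypothetical usage with made-up identifiers and scores.
example_hits = [
    ("orf19.A", "orf6.X", 1e-50, 99.0),
    ("orf19.A", "orf6.Z", 1e-5, 40.0),
    ("orf19.B", "orf6.X", 1e-10, 95.0),
    ("orf19.B", "orf6.Y", 1e-30, 80.0),
]
example_pairs = screen_pairs(retain_hits(example_hits))
```

In this made-up example the orf19.B/orf6.X pairing survives the first
step because of its identity, but is removed in the second step because
orf6.X has a more significant hit to orf19.A.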

Note that this is not necessarily a 1-to-1 mapping; some ORFs have multiple matches.  The file of pairings contains the following columns:

Column	Description
1	The orf19 identifier
2	The Assembly 19 Contig from which the orf19 ORF derives
3	The orf6 identifier
4	The Assembly 6 Contig from which the orf6 ORF derives
5	E, the expectation or E-value
6	N, the number of scores considered jointly in computing E
7	Sprime, the normalized alignment score, expressed in units of bits
8	S, the raw alignment score
9	alignlen, the overall length of the alignment including any gaps
10	nident, the number of identical letter pairs
11	npos, the number of letter pairs contributing a positive score
12	nmism, the number of mismatched letter pairs
13	pcident, percent identity over the alignment length (as a fraction of alignlen)
14	pcpos, percent positive letter pairs over the alignment length (as a fraction of alignlen)
15	qgaps, number of gaps in the query sequence
16	qgaplen, total length of all gaps in the query sequence
17	sgaps, number of gaps in the subject sequence
18	sgaplen, total length of all gaps in the subject sequence
19	qframe, the reading frame in the query sequence (+0 for protein sequences in BLASTP and TBLASTN searches)
20	qstart, the starting coordinate of the alignment in the query sequence
21	qend, the ending coordinate of the alignment in the query sequence
22	sframe, the reading frame in the subject sequence (+0 for protein sequences in BLASTP and BLASTX searches)
23	sstart, the starting coordinate of the alignment in the subject sequence
24	send, the ending coordinate of the alignment in the subject sequence
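
As an illustration of how the columns above might be used, the
following is a minimal sketch that reads the file into named fields and
lists the orf19 identifiers that have more than one orf6 match. It
assumes the file is tab-delimited with exactly 24 fields per line and
no header line; neither is stated above. The short column names for
columns 1 to 4 and the function names are hypothetical, and all values
are kept as strings rather than converted to numbers.

```python
import csv
from collections import defaultdict

# Names for the 24 columns, in file order. The first four are
# hypothetical labels; the rest follow the table above.
COLUMNS = [
    "orf19", "orf19_contig", "orf6", "orf6_contig",
    "E", "N", "Sprime", "S", "alignlen", "nident", "npos", "nmism",
    "pcident", "pcpos", "qgaps", "qgaplen", "sgaps", "sgaplen",
    "qframe", "qstart", "qend", "sframe", "sstart", "send",
]


def read_mapping(handle):
    """Read tab-delimited lines into a list of dicts keyed by COLUMNS.

    Lines that do not have exactly 24 fields (for example blank
    lines) are skipped.
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def multi_matches(rows):
    """Return {orf19: [orf6, ...]} for orf19s with more than one match."""
    by_orf19 = defaultdict(list)
    for row in rows:
        by_orf19[row["orf19"]].append(row["orf6"])
    return {k: v for k, v in by_orf19.items() if len(v) > 1}
```

A typical call would be along the lines of
`with open("orf19_orf6_mapping.txt", newline="") as fh: rows = read_mapping(fh)`,
followed by `multi_matches(rows)`.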