CGD Help: GO Slim Mapper


Background and Description

The Gene Ontology (GO) project was established to provide a common language to describe aspects of a gene product's biology. A gene product's biology is represented by three ontologies: molecular function, biological process, and cellular component. The use of a consistent vocabulary allows genes from different species to be compared based on their GO annotations. To provide the most detailed information available, gene products are annotated to the most granular GO term(s) possible. For example, if a gene product is localized to the perinuclear space, it will be annotated to that specific term only and not to the parent term nucleus; here, perinuclear space is a child of nucleus. Parent-child relationships can also be viewed using AmiGO.

However, for many purposes, such as reporting the results of GO annotation of a genome, analyzing microarray expression data, or characterizing a cDNA collection, it is very useful to have a high-level view of the three ontologies. For example, if you wanted to find all the genes in an expression cluster that were localized to the nucleus, it would be useful to be able to map granular annotations such as perinuclear space to general terms like nucleus. GO Slim was created for this purpose. GO Slim is a high-level view of GO: a slice of the broad, high-level terms such as DNA replication, transcription, and transport. Several versions of GO Slim have been created for different genomes, and the GO Slim terms are updated periodically. To view or download other GO Slims, go to the GO Slim FTP site. The GO Slim Mapper tool at CGD uses the GO Slim terms picked by the CGD curators based on annotation statistics and biological significance. The Candida GO Slim may be downloaded from CGD's Download Data page.

The GO Slim Mapper tool at CGD was created to allow you to map the granular annotations of a query set of genes to one or more high-level, parent GO Slim terms. This is possible with GO because parent-child relationships are recorded between granular terms and more general parent (i.e., GO Slim) terms.
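
The mapping described above can be sketched as a walk up the parent-child graph. The following is only an illustration, not the tool's actual implementation: the `PARENTS` table is a tiny made-up fragment rather than the real ontology, and the names `GO_SLIM`, `ancestors`, and `map_to_slim` are hypothetical.

```python
# Toy fragment of a term graph: child term -> list of parent terms.
# Hypothetical data for illustration; the real GO graph is far larger.
PARENTS = {
    "perinuclear space": ["nucleus"],
    "nucleolus": ["nucleus"],
    "nucleus": ["cellular_component"],
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrion": ["cellular_component"],
}

# Hypothetical set of high-level terms chosen as the GO Slim.
GO_SLIM = {"nucleus", "mitochondrion"}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(PARENTS.get(current, []))
    return seen


def map_to_slim(term):
    """Return the GO Slim terms that a granular annotation maps to."""
    return ancestors(term) & GO_SLIM


slim_for_perinuclear = map_to_slim("perinuclear space")
```

A term that is itself in the slim set maps to itself, which corresponds to a direct annotation; a granular term maps indirectly through its parents.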

For more information on GO in general, visit the Gene Ontology website or the GO help page provided by the Saccharomyces Genome Database (SGD).

Query Page

The query page allows you to enter the list of gene names and select your GO Slim terms.

  1. Choose the strain:
    Select a strain name from the pull-down menu.

  2. Enter your gene(s):
    You can either type the names of the genes in the input box or upload a file that contains the gene names. Note that the program requires more time to process a long list (greater than 100 genes) than a short list. Each query can only process gene names from a single species.

  3. Choose your GO Slim terms:
    Select one or more GO Slim terms from one of the three ontologies (biological process, molecular function, or cellular component) by checking the boxes. To minimize search time, this tool searches only one of the three ontologies at a time.

  4. Filter by Annotation Method (optional):
    Select any desired combination of annotation sets (Manually curated, High-throughput, and Computational) to use when mapping annotations to your input set of genes. If you skip this step and click the Search button after Step 3, the tool maps annotations compiled from all three sets.

    Note that the default sets of annotations used by GO Slim Mapper are analogous in CGD and in the Aspergillus Genome Database (AspGD), but different from the set used at SGD. In CGD and AspGD, all GO annotations (Manually curated, High-throughput, and Computational) are used as the default set, while in SGD Computational annotations are excluded from the default set. Computational annotations are included for CGD and AspGD because they augment the GO annotation coverage of the genomes, providing annotation for many uncharacterized genes. In contrast, in SGD the greater extent of characterization of S. cerevisiae genes means that computational annotations are frequently redundant with or less specific than experimentally-derived annotations, and dilute this higher-quality set.
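
The effect of the annotation method filter can be sketched as follows. This is a minimal illustration under assumptions: the `Annotation` record, the `filter_by_method` function, and the gene names are hypothetical and do not reflect how CGD stores or filters its data. Only the three set names and the "all sets by default" behavior come from the description above.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene: str    # hypothetical gene name
    term: str    # granular GO term
    method: str  # one of the three annotation sets


# CGD's default, as described above: all three sets are used.
ALL_METHODS = frozenset({"Manually curated", "High-throughput", "Computational"})


def filter_by_method(annotations, methods=ALL_METHODS):
    """Keep only annotations whose method is in the selected sets."""
    return [a for a in annotations if a.method in methods]


# Made-up example annotations.
EXAMPLE = [
    Annotation("GENE_A", "acid phosphatase", "Manually curated"),
    Annotation("GENE_B", "transcription factor", "Computational"),
    Annotation("GENE_C", "transporter", "High-throughput"),
]

default_set = filter_by_method(EXAMPLE)
experimental_only = filter_by_method(
    EXAMPLE, methods={"Manually curated", "High-throughput"}
)
```

Excluding the Computational set, as in `experimental_only`, mirrors the SGD default described in the note above.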

Results

The results page displays the GO Slim term(s) to which the granular annotations of your gene(s) have been mapped. Click on a locus name to see the details of the GO annotations for that gene/ORF. Also listed is the frequency with which each GO Slim term is used to annotate (directly, or indirectly via a parental relationship with a granular term) the genes in your list.

You can also download the results as a tab-delimited (i.e., Excel-readable) file by clicking on the Download Results link.
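
A tab-delimited file can also be read programmatically. The sketch below assumes a header row and three columns named like those on the results page; the actual layout of the downloaded file may differ, so treat the column names, the sample contents, and the `read_results` function as hypothetical.

```python
import csv
import io

# Hypothetical file contents; the real download may use different columns.
SAMPLE = (
    "GO Slim term\tCluster frequency\tGenes annotated to the term\n"
    "Function: transcription regulator\t2 out of 4 genes, 50%\tPHO2, PHO4\n"
    "Function: transporter\t1 out of 4 genes, 25%\tATP6/PHO1\n"
)


def read_results(handle):
    """Parse tab-delimited rows into dictionaries with a list of genes."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        genes = [g.strip() for g in row["Genes annotated to the term"].split(",")]
        rows.append(
            {
                "term": row["GO Slim term"],
                "frequency": row["Cluster frequency"],
                "genes": genes,
            }
        )
    return rows


rows = read_results(io.StringIO(SAMPLE))
```

For a real file, pass an open file handle (for example, `open(path, newline="")`) instead of the in-memory sample.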

Example

Let's take an example from SGD. Consider a small group of four S. cerevisiae genes: PHO1, PHO2, PHO3, and PHO4. The following are the granular molecular function annotations for these genes in SGD.

Gene Name    Molecular Function Annotation
PHO1         hydrogen-transporting two-sector ATPase
PHO2         transcription factor
PHO3         acid phosphatase
PHO4         transcription factor

Searching for all the GO Slim function terms will map these annotations to the following:

GO Slim term                            Cluster frequency        Genes annotated to the term
Function: transcription regulator       2 out of 4 genes, 50%    PHO2, PHO4
Function: enzyme                        2 out of 4 genes, 50%    ATP6/PHO1, PHO3
Function: transporter                   1 out of 4 genes, 25%    ATP6/PHO1
Function: molecular_function unknown    0 out of 4 genes, 0%     none
Function: structural molecule           0 out of 4 genes, 0%     none
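
The cluster frequencies in this table can be reproduced with a short calculation. In the sketch below, the gene-to-term mapping is copied from the table (with ATP6/PHO1 written as PHO1 for brevity); the names `GENE_TO_SLIM`, `SLIM_TERMS`, and `cluster_frequency` are hypothetical and are not part of the tool.

```python
# GO Slim function terms for each gene, taken from the table above.
GENE_TO_SLIM = {
    "PHO1": {"enzyme", "transporter"},
    "PHO2": {"transcription regulator"},
    "PHO3": {"enzyme"},
    "PHO4": {"transcription regulator"},
}

SLIM_TERMS = [
    "transcription regulator",
    "enzyme",
    "transporter",
    "molecular_function unknown",
    "structural molecule",
]


def cluster_frequency(gene_to_slim, slim_terms):
    """Return (term, count, total, percent, genes) for each slim term."""
    total = len(gene_to_slim)
    rows = []
    for term in slim_terms:
        genes = sorted(g for g, terms in gene_to_slim.items() if term in terms)
        percent = 100 * len(genes) / total
        rows.append((term, len(genes), total, percent, genes))
    return rows


table = cluster_frequency(GENE_TO_SLIM, SLIM_TERMS)
for term, count, total, percent, genes in table:
    gene_list = ", ".join(genes) if genes else "none"
    print(f"Function: {term}\t{count} out of {total} genes, {percent:.0f}%\t{gene_list}")
```

Because a gene is counted once for every slim term it maps to, the percentages are computed per term and are not expected to add up to 100%.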


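From the two tables above, note that a single gene can map to more than one GO Slim term: ATP6/PHO1 is counted under both enzyme and transporter, so the cluster frequencies can add up to more than 100%. You can see the relationships mentioned above and much more using the AmiGO browser.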

