CGD Help: BLAST Results

Contents

Description

Graphic Display
One-line Descriptions
Sequence Alignments
Parameters & Statistics

Accessing BLAST Search Results
Other Relevant Links

Description

The results of a BLAST query are reported in roughly the same format, regardless of the program selected. The first section is a graphical overview of the results, the second is a series of one-line descriptions of matching database sequences, the third is a set of the actual alignments of the query sequence with database sequences, and the last section lists the parameters used and the statistics generated during the search. More information is available in the help document for the BLAST input form.

The graphic display and one-line descriptions give information about database sequences that form a High-Scoring Segment Pair (HSP) with the query sequence. An HSP is created when two sequence fragments (one from the query sequence and the other from a database sequence) show a locally maximal alignment for which the alignment exceeds a pre-defined cutoff score. BLAST uses HSPs to identify hits.

Graphic Display

The above is a reduced example of the new BLAST graphical overview format. The graph's purpose is to give the user a brief summary of the entire result set that provides a broad perspective on the data. Significant features include:

Color coding of P-values
A wider selection of hits
Annotations displayed via JavaScript
A date stamp for archival reference

The graph is meant simply to be a short sketch of the (possibly huge) results, so its size is limited; sometimes not all hits can be shown.

How HSPs are Shown

Each hit may contain one or more high-scoring segment pairs (HSPs). Each HSP is drawn as a line, and is aligned with the query sequence. In the close-up we see two short HSPs, and three long ones running off the right edge. The smallest HSP begins at 185 bp and ends at 233 bp along the query sequence.

HSPs are Directional

In the full-text BLAST results, each HSP is either plus or minus. If the query and HSP strands are the same, the HSP is termed forward. If they differ, the HSP is termed reverse.

HSPs Share a Background Color

All HSPs for a displayed hit are drawn. They share a single background color to signify their relationship. Here are two hits, each containing multiple HSPs:

Thus for the first hit, orf19.13673, the background is white. For the second, orf19.5015, it is gray.

Hits are Color Coded

The hits are color coded according to their P value. A set of five fixed ranges is used to determine a color for each hit. These ranges, from "worst" to "best," are:

1.0	to	1e-10
1e-10	to	1e-50
1e-50	to	1e-100
1e-100	to	1e-200
1e-200	to	0.0

The key shows these colors, and notes the value of the negative exponents in each range. It progresses from "worst" on the left to "best" on the right.

Note that a range might not contain any hits, since the ranges are fixed while the hit P-values are not. When two ranges share a boundary value (e.g., 1e-50), that value falls in the "better" range and is colored accordingly (e.g., green).
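The fixed ranges and the shared-boundary rule can be sketched in a few lines of Python. This is only an illustration, not the code CGD uses: the function name range_index and the choice to number the ranges 0 ("worst") through 4 ("best") are assumptions made for the example.

```python
# Boundaries between the five fixed P-value ranges, from "worst" to "best".
BOUNDARIES = (1e-10, 1e-50, 1e-100, 1e-200)


def range_index(p_value: float) -> int:
    """Return 0 for the "worst" range (1.0 to 1e-10) up to 4 for the "best".

    A P-value equal to a shared boundary is counted in the "better" range,
    which is why the comparison below is <= rather than <.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0.0 and 1.0")
    return sum(1 for boundary in BOUNDARIES if p_value <= boundary)
```

For instance, range_index(1e-50) returns 2, the third range (1e-50 to 1e-100), rather than 1.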

How Hits are Chosen for Display

Often, there will be more data available than can be displayed in the graphic. The current system takes a particular approach to selecting data to include, biased in favour of giving a complete overview of the data rather than showing only the top hits. The rationale is that it can be important to show results further away from identity.

First, the hits are sorted into the color-coded ranges. Next, the top hit from each range is picked, starting with the "best" range. The system keeps track of how much space each hit will take up when drawn; if there is still room left over after including those hits, it iterates once more, picking the next top hit from each range. This process continues until either there are no more hits or there is no room left in the display.

Note that the final drawing of the hits will be in proper order, even though they have been selected in an interleaved fashion: all of the best hits are drawn at the top of the image.
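The interleaved selection described above can be sketched as follows. This is a hedged illustration rather than the actual implementation: the Hit record, its height field (rows needed to draw the hit), the capacity parameter, and the decision to stop as soon as the next candidate does not fit are all assumptions introduced for the example.

```python
from collections import deque
from typing import NamedTuple

BOUNDARIES = (1e-10, 1e-50, 1e-100, 1e-200)


class Hit(NamedTuple):
    name: str
    p_value: float
    height: int  # hypothetical: rows this hit occupies when drawn


def range_index(p_value: float) -> int:
    # 0 is the "worst" range, 4 the "best"; boundary values go to the better range.
    return sum(1 for boundary in BOUNDARIES if p_value <= boundary)


def select_hits(hits: list[Hit], capacity: int) -> list[Hit]:
    # Sort hits into ranges, keeping the top (lowest P-value) hit first in each.
    buckets = [deque() for _ in range(len(BOUNDARIES) + 1)]
    for hit in sorted(hits, key=lambda h: h.p_value):
        buckets[range_index(hit.p_value)].append(hit)

    chosen, used, full = [], 0, False
    while not full and any(buckets):
        # One pass: take the next top hit from each range, best range first.
        for bucket in reversed(buckets):
            if not bucket:
                continue
            if used + bucket[0].height > capacity:
                full = True  # assumed stopping rule: no room for the next hit
                break
            hit = bucket.popleft()
            chosen.append(hit)
            used += hit.height

    # Draw in proper order: best hits at the top of the image.
    return sorted(chosen, key=lambda h: h.p_value)
```

With room for only three rows, a set of two "best"-range hits, one second-best hit, and several "worst"-range hits yields one hit from each populated range, rather than both of the top-range hits.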

Range Counts

If not all hits are shown, range counts will appear at the right side of the graph. In our example, all hits from the top range are shown and thus the annotation says "All." However, not all hits in the next range could be displayed, so "1/3" indicates two omitted hits.

Note that if a range contains no hits, no count is shown (thus, there are no green or cyan notations in our example). If all of the BLAST results fit into the graph, no range counts are displayed at all.
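The range-count rules can be summarized in a small Python sketch. The function name range_count_labels and the use of None for "no count shown" are assumptions made for illustration; both lists are ordered from the "worst" range to the "best."

```python
from typing import Optional


def range_count_labels(shown: list[int], total: list[int]) -> list[Optional[str]]:
    """Return one label per range, or None where no count is displayed."""
    if len(shown) != len(total):
        raise ValueError("shown and total must cover the same ranges")
    if shown == total:
        # Every BLAST result fit into the graph: no range counts at all.
        return [None] * len(total)
    labels: list[Optional[str]] = []
    for drawn, available in zip(shown, total):
        if available == 0:
            labels.append(None)  # empty range: no count
        elif drawn == available:
            labels.append("All")
        else:
            labels.append(f"{drawn}/{available}")
    return labels
```

A range with one of three hits drawn is labeled "1/3", matching the example above.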

Static and JavaScript Annotations

Hit names and P-values are displayed at the left side of the graph.

If you enable JavaScript in your web browser, annotations for each hit will be displayed in a text field just above the graph as you move the mouse; score is included along with P-value.

p=0.0e0 s=7741 YOR326W|MYO2, Chr XV from 925712-930436

One-line Descriptions

The one-line descriptions summarize information about the database sequences that form HSPs with the query sequence. At the left end of each one-line description is the name of the database sequence that forms an HSP with the query sequence. Each description also includes the score and P-value for the hit.

Sequence Alignments

The sequence alignments show the query sequence at the top, with the aligned database sequence (Sbjct, or subject) at the bottom. The starting and ending coordinates of the areas of similarity are shown at the left and right of the aligned sequences. When nucleotide sequences are being aligned, vertical lines between the bases signify identities. Amino acid identities are shown by the repetition of the one-letter code for that amino acid between the residues. Conservative amino acid changes are shown by a "+" sign between the aligned residues. Places where gaps had to be introduced to achieve the alignment are signified by a "-" in the query or subject sequences.
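As an illustration of these conventions, the line drawn between the query and subject sequences can be sketched in Python. This is not BLAST's own code. The function name midline and the is_conservative callback are hypothetical: in practice, whether a substitution counts as conservative depends on the comparison matrix, so the example leaves that decision to the caller.

```python
from typing import Callable, Optional


def midline(
    query: str,
    subject: str,
    protein: bool = False,
    is_conservative: Optional[Callable[[str, str], bool]] = None,
) -> str:
    """Build the line shown between two aligned, equal-length sequences."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            marks.append(" ")  # gap introduced in the query or the subject
        elif q == s:
            # Identity: a vertical line for bases, the letter itself for residues.
            marks.append(q if protein else "|")
        elif protein and is_conservative is not None and is_conservative(q, s):
            marks.append("+")  # conservative amino acid change
        else:
            marks.append(" ")
    return "".join(marks)
```

For two nucleotide fragments such as "ACGT-A" and "ACCTTA", the result marks the four identical positions with vertical lines and leaves the mismatch and the gap blank.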

The database sequences that are similar to the query sequence can be retrieved by using the "Gene/Sequence Resources" link.

Parameters & Statistics

For amino acid sequences, the default filter setting is "seg." This filter removes repetitive sequences. Removed residues are indicated by Xs. For nucleic acid sequences, the default filter setting is "dust." The removed residues are represented as Ns. To turn off this filter, return to the BLAST search page and select "none" as a filter option. You can also use the pull-down "Filter options" menu to select a different type of filter.

If a BLAST search returns few or no matches, there are several ways to try to increase the number of matches. On the BLAST search page, you can change the database searched, change the comparison matrix, or increase the number of alignments shown.

For a more detailed description of BLAST results and the statistical information they provide, please refer to the NCBI BLAST Help Manual.

Accessing BLAST Search Results

BLAST results can be accessed by entering a sequence on the BLAST Search Page, then choosing the appropriate BLAST program. The S-score can be set to 30 to facilitate searches for very short input sequences. For more information about performing a BLAST search, see CGD's BLAST Searches help page.