HPC University: mpiBLAST

Sunday, August 8, 2010

mpiBLAST is a parallel implementation of the BLAST algorithm. BLAST is used to search large nucleotide or protein databases for sequences similar to an input (query) sequence.
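mpiBLAST gets much of its parallelism by splitting the database into fragments that different processes search at the same time, then merging their results. The Python sketch below illustrates only the splitting idea. `segment_database` is a hypothetical function that balances fragments by total sequence length using a greedy longest-first rule; this rule is an assumption for illustration and is not necessarily how mpiBLAST's own formatting tool divides a database.

```python
import heapq


def segment_database(lengths: dict[str, int], n_fragments: int) -> list[list[str]]:
    """Split sequences into n_fragments groups with roughly equal total length.

    Hypothetical illustration: sequences are assigned longest-first, each to
    the fragment that currently holds the fewest residues.
    """
    if n_fragments < 1:
        raise ValueError("n_fragments must be at least 1")
    # Heap entries are (residues assigned so far, fragment index).
    heap = [(0, i) for i in range(n_fragments)]
    fragments: list[list[str]] = [[] for _ in range(n_fragments)]
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)  # least-loaded fragment
        fragments[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return fragments


if __name__ == "__main__":
    # Made-up sequence names and lengths, just to show the mechanics.
    toy_db = {"seqA": 900, "seqB": 700, "seqC": 500, "seqD": 400, "seqE": 300}
    for i, frag in enumerate(segment_database(toy_db, 4)):
        print(f"fragment {i}: {frag}")
```

Each fragment can then be searched independently, which is why a database is formatted for a specific number of processors before a run.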

"Relatedness"

mpiBLAST is often used to gauge how closely related two or more species are by comparing how similar a particular gene is across them. This kind of comparison is common in cladistics, the branch of biology that groups organisms by shared evolutionary ancestry.

You have two databases to work with: the genomes of Drosophila melanogaster (the common fruit fly) and Saccharomyces cerevisiae (yeast). One gene commonly used to evaluate evolutionary similarity is cytochrome c, and you happen to have the sequence of the human version of this gene: Human cytochrome c. Copy this sequence into an input file and compare it against the fruit fly (drosoph.nt) and yeast (yeast.nt) databases. These databases have already been formatted to run on four processors, so be sure to request four processors in your qsub script!
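A minimal Python sketch of the two setup steps, assuming a PBS/TORQUE-style scheduler. `to_fasta` and `pbs_script` are hypothetical helper names, the job name, input file name (`cytc.fa`) and output file names are made up, and the `#PBS -l nodes=1:ppn=4` resource line may need adjusting for your site. The `mpiblast` options used (`-p`, `-d`, `-i`, `-o`) follow the classic blastall-style command line; depending on the mpiBLAST version, `mpirun` may need a few more processes than the number of database fragments, so check the mpiBLAST documentation before submitting.

```python
# Hypothetical helpers; names, job name, and file names are illustrative assumptions.

def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line, then the sequence wrapped to `width`."""
    seq = "".join(sequence.split()).upper()  # drop spaces/newlines pasted from a web page
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def pbs_script(query: str, databases: list[str], nprocs: int = 4) -> str:
    """Return a PBS job script that runs one mpiBLAST search per database."""
    lines = [
        "#!/bin/bash",
        "#PBS -N mpiblast_cytc",
        f"#PBS -l nodes=1:ppn={nprocs}",  # request nprocs cores on one node (site syntax assumed)
        "cd $PBS_O_WORKDIR",              # run from the directory qsub was called in
    ]
    for db in databases:
        out = db.split(".")[0] + "_results.txt"  # e.g. yeast.nt -> yeast_results.txt
        lines.append(f"mpirun -np {nprocs} mpiblast -p blastn -d {db} -i {query} -o {out}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pbs_script("cytc.fa", ["drosoph.nt", "yeast.nt"]))
```

To build the input file, pass the pasted human cytochrome c sequence to `to_fasta("human_cytochrome_c", sequence)` and write the returned string to `cytc.fa`; save the printed script as, say, `mpiblast.pbs` and submit it with `qsub mpiblast.pbs`.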

Resources

mpiBLAST Download/Main site

NCBI Toolkit, required for compiling mpiBLAST

Overview of the Implementation

Results

The first part of the mpiBLAST output is a summary of the matches. Each line lists one match (a database sequence similar to the input sequence) followed by its BLAST score. Points are awarded for nucleotides that match between the input sequence and the matched sequence, and points are deducted where a nucleotide is missing (a gap) or different (a mismatch). A higher score means the two sequences are more similar.
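The scoring idea can be illustrated with a short Python function. This is a simplified sketch, not BLAST's implementation: `alignment_score` is a hypothetical helper that scores two already-aligned sequences (gaps written as `-`) using assumed example values of +1 per match, -3 per mismatch, and an affine gap cost of 5 to open a gap plus 2 per gap position. BLAST's actual defaults depend on the program and version, and the score shown in the summary is a normalized bit score derived from this kind of raw score.

```python
def alignment_score(query: str, subject: str, match: int = 1, mismatch: int = -3,
                    gap_open: int = 5, gap_extend: int = 2) -> int:
    """Raw score of two equal-length aligned strings, with '-' marking gaps.

    The default values are illustrative assumptions, not read from BLAST.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap = False  # True while inside a run of consecutive gap columns
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            # First gap column costs gap_open + gap_extend; each further one costs gap_extend.
            score -= gap_extend if in_gap else gap_open + gap_extend
            in_gap = True
        else:
            score += match if q == s else mismatch
            in_gap = False
    return score


if __name__ == "__main__":
    print(alignment_score("ACGTACGT", "ACGTACGT"))  # eight identical bases
    print(alignment_score("ACGTACGT", "ACCTACGT"))  # one mismatch
    print(alignment_score("ACG-ACGT", "ACGTACGT"))  # one single-base gap
```

Because a mismatch or gap costs more than a single match earns, a short run of identical bases separated by differences scores far lower than one long, clean match.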

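The second number on each summary line is the E-value. The E-value is the number of matches with at least this score that you would expect to find purely by chance when searching a database of this size, so a lower E-value means the match is less likely to be a coincidence. Short matching stretches occur by chance fairly often; long stretches of matching sequence are much less likely.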
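BLAST's E-value is based on the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw score, m and n are the query and database lengths, and K and lambda are constants that depend on the scoring system. The bit score S' = (lambda * S - ln K) / ln 2 lets the same quantity be written as E = m * n * 2^(-S'). The Python sketch below uses assumed, illustrative values for lambda and K and ignores the effective-length corrections BLAST applies; it shows that E falls exponentially as the score of a match grows.

```python
import math


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits scoring at least raw_score."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    LAM, K = 1.374, 0.711       # illustrative constants (assumed), not read from BLAST
    m, n = 300, 100_000_000     # hypothetical query and database lengths
    for s in (20, 30, 40):
        print(s, f"{bit_score(s, LAM, K):.1f} bits", f"E = {evalue(s, m, n, LAM, K):.2e}")
```

Doubling the raw score from 20 to 40 multiplies E by exp(-lambda * 20), which is why long matching stretches are so unlikely to arise by chance.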

Further down in the output, each match is shown as an alignment of the input sequence against the matched sequence. Vertical bars mark positions where the nucleotides in the two sequences match exactly, and each alignment also reports the number and percentage of matching nucleotides (the "Identities" line).
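A small Python sketch of how the vertical-bar line and the identity count relate, assuming the two aligned sequences are given as equal-length strings with `-` for gaps. `identity_summary` is a hypothetical helper; it computes the percentage over the full alignment length, including gap columns, and returns it unrounded rather than guessing BLAST's exact rounding. The sequences in the usage block are invented.

```python
def identity_summary(query: str, subject: str) -> tuple[str, int, int, float]:
    """Return (midline, identities, alignment length, percent identity)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    q, s = query.upper(), subject.upper()
    # '|' where both bases are identical; gap columns never count as identities.
    midline = "".join("|" if a == b and a != "-" else " " for a, b in zip(q, s))
    identities = midline.count("|")
    length = len(q)
    percent = 100 * identities / length if length else 0.0
    return midline, identities, length, percent


if __name__ == "__main__":
    query, subject = "ATGGGTGA-GTA", "ATGGGCGACGTA"  # invented example alignment
    mid, ident, length, pct = identity_summary(query, subject)
    print("Query  ", query)
    print("       ", mid)
    print("Sbjct  ", subject)
    print(f"Identities = {ident}/{length} ({pct:.0f}%)")
```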

How do you interpret your results?

Which species is more closely related to humans, judged by the cytochrome c gene? How do you know?
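One way to make the comparison concrete is to take the best hit from each database's summary and compare bit scores (higher is better) and E-values (lower is better). The Python sketch below assumes each summary line ends with the bit score followed by the E-value, as in classic BLAST text output, where an E-value such as `e-100` may be printed without a leading `1`. `parse_hit`, `best_hit`, and `more_similar` are hypothetical helpers, and the sample lines are invented to show the mechanics, not real results.

```python
def parse_hit(line: str) -> tuple[str, float, float]:
    """Split a summary line into (description, bit score, E-value)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"not a summary line: {line!r}")
    e_text = fields[-1]
    if e_text.startswith("e"):
        e_text = "1" + e_text  # e.g. 'e-100' means 1e-100
    return " ".join(fields[:-2]), float(fields[-2]), float(e_text)


def best_hit(lines: list[str]) -> tuple[str, float, float]:
    """Highest bit score wins; ties go to the lower E-value."""
    hits = [parse_hit(line) for line in lines if line.strip()]
    return max(hits, key=lambda h: (h[1], -h[2]))


def more_similar(results: dict[str, list[str]]) -> str:
    """Name of the database whose best hit has the highest bit score."""
    return max(results, key=lambda db: best_hit(results[db])[1])


if __name__ == "__main__":
    # Invented lines for illustration only; substitute your own summary lines.
    results = {
        "db_one": ["gi|111| example hit A   120   2e-30", "gi|112| example hit B   80   1e-10"],
        "db_two": ["gi|221| example hit C   60   e-5"],
    }
    for db, lines in results.items():
        print(db, best_hit(lines))
    print("higher-scoring database:", more_similar(results))
```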

Note: mpiBLAST is currently not available for Windows; you will need a Mac OS X or Linux system.
