Assignment 4: Database similarity searching
In this assignment you should prepare a report with the answers to various questions.
This report can be written in any text editor or word processor.

In the report, give JUST the answers. DO NOT copy out the questions and write the answers in.

READ THROUGH THE WHOLE ASSIGNMENT BEFORE YOU START. It will help you understand the purpose of the questions!!!!

Add your name and e-mail address at the top of the page!!!


This assignment will deal with the various ways you can change parameters to fine tune a database search.

Our target gene is Lip1, a single-pass transmembrane protein that is required for de novo ceramide (sphingolipid) synthesis in yeast. According to the local campus bioinformatics expert, the gene is limited to a very specific branch of fungi. As the other genes involved in ceramide synthesis are highly conserved in almost all eukaryotes, you find this strange. Your job is to see if what was reported is correct, and to try to find the gene in other species (particularly mammals) as well.

Enter the NCBI web site (this will open a new window): http://www.ncbi.nlm.nih.gov/ and get the protein. The accession number is: NP_014027

Go to the protein blast page: Blast at NCBI

and run a search with all of the default parameters.

  1. How many hits are there?
  2. Open the search summary (link at the top of the graphic): what is the matrix and word size?
  3. What family are most of the hits from? (use the Taxonomy reports link at the top of the graphic) How many hits are not from that family?
  4. How many of the hits are not statistically significant?
  5. Please make a chart of the last 10 hits with the following columns: 1) description 2) accession number 3) percent similarity 4) percent identity 5) length of alignment 6) e-value
  6. Look at the alignments of the sequences in your chart. Which do you think are truly Lip1 proteins? ("No" or "I can't tell" are legitimate answers.)
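
For the percent identity and percent similarity columns in question 5, the figures BLAST reports as "Identities" and "Positives" can be reproduced from the alignment itself. The following is a minimal Python sketch, assuming the three-line alignment display (query, midline, subject) in which the midline repeats the letter for an identical pair and shows "+" for a conservative substitution. The function name and the example strings are made up for illustration and are not Lip1 data.

```python
def alignment_stats(query, midline, subject):
    """Return (percent_identity, percent_similarity, alignment_length)."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("all three alignment lines must have the same length")
    length = len(query)  # alignment length counts gap columns too
    # A letter on the midline marks an identical pair.
    identities = sum(1 for m in midline if m.isalpha())
    # "+" marks a conservative substitution; positives include identities.
    positives = identities + midline.count("+")
    return (100.0 * identities / length, 100.0 * positives / length, length)


# Made-up alignment for illustration only.
query   = "MKT-LLVA"
midline = "MK  L+VA"
subject = "MKWALIVA"

ident, sim, length = alignment_stats(query, midline, subject)
print(f"{ident:.1f}% identity, {sim:.1f}% similarity over {length} columns")
```

Note that both percentages are taken over the alignment length, not over the full length of either protein, so a short alignment can show a high percent identity.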

Now that we have a baseline, we'll start to change parameters. Click "Edit and Resubmit" at the top of the page; do not use the browser's Back button, as it doesn't work in all browsers. Be sure to change the title of each search so that you can keep track of what you've done. The first thing we'll do is change the word size to 2 and run the BLAST again.

  1. How many hits are there now?
  2. Look at the bottom five hits and make a chart like the one in question 5.
  3. Compare the two charts. What has changed? Has anything stayed the same? Why?
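
To see what the word size parameter controls, recall that BLAST begins by breaking the query into short overlapping words and looking for matches to them. The sketch below, in Python, lists the words of a given size in a toy sequence. The function name and the sequence are hypothetical, and real BLAST also considers similar "neighborhood" words that score above a threshold, which this sketch leaves out.

```python
def query_words(sequence, word_size):
    """Return every overlapping word of length word_size in sequence."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]


toy = "MKTLLVAGS"  # made-up peptide, not Lip1

for w in (2, 3, 6):
    words = query_words(toy, w)
    # A sequence of length L contains L - w + 1 overlapping words.
    print(w, len(words), words)
```

Shorter words are both more numerous and easier to match by chance, which is worth keeping in mind when you compare the two charts.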

Now we are going to change databases. Click "Edit and Resubmit" again. Change the word size back to 6, and the database to swissprot.

  1. How many hits are there?
  2. Make a chart of the hits as before (question 5).
  3. Find the hit from the last species in your previous results (you may have to go back to the results pages). What has changed?
  4. Why are these results so different than the others?

Now we'll see what changing the matrix will do to the search. Click "Edit and Resubmit" again. Change the database back to nr, and change the matrix to PAM250.

  1. How many hits are there?
  2. How many hits do we have that are not statistically significant? What species are they from?
  3. Make a chart of the last 10 hits as before.
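
Counting the non-significant hits and pulling out the last 10 rows can be done by hand from the results page, or from a downloaded hit table. The following Python sketch assumes a tab-separated table with the twelve standard BLAST tabular columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The cutoff of 0.01 is an assumption, since the assignment does not set one, and the rows are invented placeholders rather than real search results.

```python
import csv
import io

COLUMNS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Parse tab-separated hit rows, skipping blank lines and '#' comments."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hits.append(hit)
    return hits


def not_significant(hits, cutoff=0.01):
    """Return hits whose E-value is above the (assumed) cutoff."""
    return [h for h in hits if h["evalue"] > cutoff]


sample = io.StringIO(
    "# invented placeholder rows, not real search results\n"
    "QUERY1\tHIT_A\t98.0\t150\t3\t0\t1\t150\t1\t150\t1e-90\t300\n"
    "QUERY1\tHIT_B\t35.5\t120\t70\t4\t10\t125\t8\t127\t2e-05\t48.1\n"
    "QUERY1\tHIT_C\t27.0\t60\t40\t2\t30\t88\t200\t259\t3.1\t29.3\n"
)

hits = read_hits(sample)
weak = not_significant(hits)
print(len(hits), "hits,", len(weak), "above the cutoff")
for h in hits[-10:]:  # the last 10 hits (or fewer, if there are fewer)
    print(h["subject"], h["pident"], h["length"], h["evalue"])
```

To use it on your own results, replace `sample` with an open file handle for the table you saved. Percent similarity and the species still have to be read from the alignments and descriptions, since this column layout does not include them.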

One last time, we'll change the matrix again, this time to PAM30.

  1. How many hits are there? How many are not statistically significant?
  2. How many hits are not from the same family (from question 3)?
  3. Make a chart of the last 5 hits.
  4. Why are these results so different from the previous runs?

To summarize:

  1. Look back over your various charts. In the four nr searches (don't count the swissprot search), there is one sequence that is sometimes significant, sometimes borderline, and sometimes missing altogether. Look at its alignment, its species, and its description. Do you think it is a true Lip1 or not? Explain your reasoning.
  2. What can we do to test whether this sequence is a true hit or not? (Hint: think about what you know about this protein; there are at least two things you can do.)
  3. We started this research to see if there is a mammalian equivalent. What do you think? Use your results to explain.

Hand in the report with all the answers: Assignment #4