Pairwise Alignment (and translation)
In this assignment you should prepare a report with the answers to various questions.
This report can be written in any text editor or word processor.

NOTE - Do NOT hand in the program output - just answer the questions!

Add your name and e-mail address on the front page!!!

Don't forget! Read through the assignment first, so you get a better picture of what it's all about!


In this assignment, we will be working with the Needle, Matcher and Water programs (from the EMBOSS package) and with Translate from Expasy.

(Ludwig van Beethoven, aged 50, by Joseph Karl Stieler)

Back to our musical inspiration: you join a lab working on deafness. It turns out that there is a gene named for Beethoven, who had progressive hearing loss. The only problem, when you look in the Gene database, is that there are two genes called Beethoven, both isolated in screens for hearing loss. One is in mouse and one in Drosophila, but unfortunately they are nothing alike. As mouse is the principal model system for the lab, you decide to continue with the mouse gene and see if it has a homolog in Drosophila. The official gene symbol for the mouse gene is TMC1 (Transmembrane channel-like protein 1). You do a search in the database and come up with a Drosophila sequence, which you can get here.

The mouse sequence accession number is NM_028953.

Take the mouse sequence from NCBI in fasta format. If you give it a simple name, like the species, it will be easier to tell things apart afterwards. You can change the name after you paste, or you can take the sequence without the header line and create your own. Make sure to keep a window with each sequence open; you will have to copy/paste them a few times.
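
If you prefer to rename the header outside the browser, the idea can be sketched in a few lines of Python. This is only an illustration: the function name `rename_fasta` is made up here, and the example sequence is a placeholder, not the real mouse sequence.

```python
def rename_fasta(record: str, new_name: str) -> str:
    """Return a single-record FASTA string whose header is '>new_name'.

    If the input has no header line, one is added.
    """
    lines = record.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop the original database header
    return f">{new_name}\n" + "\n".join(lines) + "\n"


# Placeholder record: the header text and bases below are invented for the demo.
example = ">NM_028953 long database description\nACGTACGT\nACGT\n"
print(rename_fasta(example, "mouse"))
```

The alignment programs copy the header name into their output, which is why a short name such as the species makes the results easier to read.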

First of all, we'd like to see how similar these sequences are overall, so we'll run a global alignment program, Needle. We'll use the EBI website for this: NEEDLE.

Make sure you choose DNA, and check the parameters (but use the defaults):

  1. How similar are these sequences?
  2. Over what length?
  3. How many gaps are there (percentage)?
  4. From where to where (give nucleotide coordinates) are the hits (the aligned segments) in both sequences?
  5. Are there any areas that you think are aligned better? From where to where (give approximate nucleotide coordinates) in the two sequences?
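
To see what a global alignment program is doing when it reports similarity, length and gaps, here is a minimal sketch of the Needleman-Wunsch idea in Python. It is a simplification, not Needle itself: it assumes a flat match/mismatch score and a single linear gap penalty (the values 1, -1 and -2 are arbitrary), whereas Needle uses a scoring matrix and separate gap opening and extension penalties. The function names are made up for this example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); gaps are shown as '-'.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to the top-left corner,
    # so both sequences are used from end to end.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_stats(aln_a, aln_b):
    """Alignment length plus percent identity and percent gap columns."""
    length = len(aln_a)
    identical = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    gaps = sum(x == "-" or y == "-" for x, y in zip(aln_a, aln_b))
    return {
        "length": length,
        "identity_pct": 100 * identical / length,
        "gaps_pct": 100 * gaps / length,
    }


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(top)
print(bottom)
print(score, alignment_stats(top, bottom))
```

Note that the percentages are taken over the alignment length (including gap columns), not over the length of either input sequence.
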
Now we will try the same sequences in a local alignment program, Water.

Open a window at EBI by clicking here

Make sure you choose DNA. Paste in the sequences, and run using the default parameters (open the parameter options so that you know what they are).

  1. How many hits are there? (Note, if there is more than one hit, answer the following questions separately for each one!)
  2. How similar are these sequences?
  3. Over what length?
  4. How many gaps are there (percentage)?
  5. From where to where (give nucleotide coordinates) are the hits (the aligned segments) in both sequences?
  6. Are there any areas that you think are aligned better? From where to where (give approximate nucleotide coordinates) in the two sequences?
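
The difference between a local and a global alignment can be sketched with a small Smith-Waterman example. Again, this is a simplified illustration and not Water itself: the scores (2, -1, -2) and the linear gap penalty are assumptions chosen for the demo, and the function name is made up. The two changes from the global method are that cell scores are never allowed to drop below zero, and that the traceback starts at the best-scoring cell and stops at the first zero.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (score, (start_a, end_a), (start_b, end_b)) with 1-based,
    inclusive coordinates of the best-scoring local segment.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                0,                                # start a fresh alignment here
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i + 1, end_a), (j + 1, end_b)


# Two invented sequences that share only the core "ACGTACG".
print(smith_waterman("TTTTACGTACGTTTTT", "GGGACGTACGGG"))
```

Only the shared core is reported, together with its coordinates in each sequence; the unrelated flanks are simply left out, which a global alignment cannot do.
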
Now we'll look at another local alignment program, Matcher. It is slightly different, and so are its parameters (and yes, this can affect the alignment, but we are more interested in another parameter at this point).

Open a Matcher window here

Make sure you choose DNA, and in the parameters, set alternative matches to 3.

  1. How many hits are there? (Note, if there is more than one hit, answer the following questions separately for each one!)
  2. How similar are these sequences?
  3. Over what length?
  4. How many gaps are there (percentage)?
  5. From where to where (give nucleotide coordinates) are the hits (the aligned segments) in both sequences?
  6. Compare your answer here to the last answer in each of the previous two sets of questions. Did all the programs find the same best region?
Now we'll translate the sequences, and run local alignments changing the parameters.

We'll translate the sequences using Expasy translate, with the new interface, here. Open two windows (so you can get the sequences again if you need them). Paste in the sequences and hit the "TRANSLATE!" button.

  1. For each sequence, which frame is the proper protein in? How long are the proteins?
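
What a translation tool does with the six reading frames can be sketched as follows. This is an illustration under stated assumptions: it uses the standard genetic code, the frame labels are only meant to resemble those on the Expasy page, and "longest stretch from an M to the next stop" is a common rule of thumb for picking the frame, not a guarantee. The function names and the short test sequence are invented for the example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at `offset`; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate the three forward and three reverse-complement frames."""
    rc = reverse_complement(dna)
    frames = {}
    for k in range(3):
        frames[f"5'3' Frame {k + 1}"] = translate(dna, k)
        frames[f"3'5' Frame {k + 1}"] = translate(rc, k)
    return frames


def longest_orf(protein):
    """Longest stretch that starts at M and runs to the next stop (or the end)."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


frames = six_frames("CCATGGCCTAAGG")  # invented toy sequence
for name, protein in frames.items():
    print(name, protein, "longest ORF:", longest_orf(protein) or "(none)")
```

In a real mRNA the correct frame usually stands out because it contains one long open stretch, while the other frames are broken up by frequent stop codons.
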
Click on the methionine of the correct reading frame and it will open in a new panel in fasta format. Take these sequences and go back to the Water program. Now we'll start to play with the parameters. (You may want to run these in separate windows so you can compare.)

We'll start by using the default parameters. Make sure the program is set to protein.

    The standard questions...
  1. How identical are these sequences?
  2. How similar are they?
  3. How long is the alignment?
  4. Describe the alignment. Is this similar or different from the DNA alignment of Water with the default parameters?
Raise the gap opening penalty to 20, and the gap extension penalty to 1.
  1. How did the alignment change from the previous run? (Use your previous answers as guidelines)
  2. Why?
Now we'll raise them a bit more: gap opening to 50 and gap extension to 5.
  1. How did the alignment change from the two previous runs? (Use your previous answers as guidelines)
  2. Why?
Now lower the penalties. Change the gap opening penalty to 1, and the gap extension penalty to 0.1.
  1. How does the alignment change now? (Use your previous answers as guidelines)
  2. Why?
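
When thinking about these runs, it can help to work out what a single gap costs under each setting. The sketch below assumes the affine convention in which the first gap position costs the opening penalty and every further position costs the extension penalty; check the program documentation for the exact convention. The pair (10, 0.5) is assumed here to be the default, so compare it with the values shown on the EBI page. The function name is made up.

```python
def gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of `length` positions under an affine penalty.

    Assumed convention: gap_open for the first position, gap_extend for
    each additional one.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


# (gap opening, gap extension): assumed default, then the three settings above
settings = [(10, 0.5), (20, 1), (50, 5), (1, 0.1)]
for gap_open, gap_extend in settings:
    costs = [gap_cost(n, gap_open, gap_extend) for n in (1, 5, 10)]
    print(f"open={gap_open}, extend={gap_extend}: gaps of 1, 5, 10 cost {costs}")
```

Compare these costs with the score gained from a typical matching residue pair in the scoring matrix: a gap is only worth opening when the matches it makes possible outweigh its cost.
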
Now we'll compare the various alignments, DNA and protein, across the various programs:
  1. In your opinion, are these proteins related? (Yes or No isn't enough, explain!)
  2. What are the advantages and disadvantages of a global vs. a local alignment? Of DNA vs. protein?