Multiple Alignment Assignment


In this exercise we will align proteins using the ClustalW, Clustal Omega and Muscle programs and look at the different outputs. The outputs have to be handed in as well, so please give them identifiable names!

READ THROUGH THE ENTIRE ASSIGNMENT BEFORE YOU START!!


This week, our gene of interest is KiSS1, which was isolated in a lab in Hershey, Pennsylvania, and named in honor of the chocolate kiss.

The protein is known to be involved in metastasis suppression, as well as puberty. We are interested in the puberty-related function, which is due to its neuropeptide activity. The gene is known in mouse and human, but we're looking for a different model system. We decided to try to find homologs in fish, as it is a good model system for studying puberty. * Note: neuropeptides are usually very short and very conserved, though the overall gene in which they sit may not be as conserved. The active peptides are produced by proteolytic cleavage. *

We decide to take our mouse protein and do a BLAST search.

After running BLAST, a number of hits were found. Surprisingly, in zebrafish, there were two different hits, even in the neuropeptide part (10-14 amino acids, depending on which definition you use)! Then we noticed there is a second hit in frog too. We decided to take a representative sequence from various evolutionary groups to try to figure out what's going on. The hits we decided to use are:

A file containing the sequences in fasta format is available by clicking here
(If this doesn't open in a plain HTML window, just copy it into Word and save as text only)
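
If you want to check that the file you saved really is plain-text FASTA before pasting it anywhere, a minimal Python sketch along these lines lists each record's name and length. The function name read_fasta and the toy records are assumptions made up for illustration; they are not the assignment sequences.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep the first word after '>' as the record name.
            words = line[1:].split()
            name = words[0] if words else "unnamed"
            records[name] = []
        elif name is not None:
            # Sequence line: may be wrapped over several lines.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Toy input (made-up sequences, NOT the assignment data).
example = """>seqA toy record
MKTLLV
AAG
>seqB another toy record
MKSLV
"""

for name, seq in read_fasta(example).items():
    print(name, len(seq))
```

If a record shows up with length 0, or the names look wrong, the file was probably saved with formatting rather than as text only.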

We would like to use a graphical viewer to look at some of the alignments. If you don't have Jalview on your computer, please download and install (it's free) the version appropriate to your system from here: http://www.jalview.org/getdown/release/ There is a link under the table to "View all platforms" if you're not on a Mac. Follow the directions for installation.

Open a window to the ClustalW page.

Now we are going to run ClustalW:

Paste in your sequences, and use all of the default parameters. Click on the "Submit" button.

NOTE! This may work smoothly, this may be slow, and it may 'time out' - if it does, just reload the page, the results will come back! (Don't go back, don't send again, just reload. Don't try a different site, they all have different defaults! Very early in the morning tends to work the best, or start it running, and check back a while later...). If all else fails, results (just not as nice looking) are available here

Scroll down to the "Result files (text):" (it's below the alignment), click on the link "CLUSTALW" underneath it, and save the page (this will have to be handed in. Give it a recognizable name!). Look at the output file (or at the colored alignment on the previous page, whichever you find easier) and answer the following questions (Describe the alignment):

  1. Are there conserved regions in the alignment (if yes, where are they)?
  2. Are there gaps, and if so, where are they placed (what causes them, one sequence or many)?
  3. How similar are the various proteins overall?
  4. Which of the zebrafish proteins is closest to our starting sequence?
  5. Did ClustalW organize the sequences in an evolutionarily logical manner?
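
For the "how similar" question, one way to put rough numbers on it is pairwise percent identity computed from the aligned sequences. The sketch below is only one possible definition (identical residues divided by the columns where neither sequence has a gap); other definitions exist and give different values. The function name percent_identity and the toy alignment are assumptions for illustration, not the assignment data.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy alignment (made-up rows of equal length, NOT the assignment data).
alignment = {
    "toy1": "MKT-LVAG",
    "toy2": "MKTALVSG",
    "toy3": "MR--LVSG",
}

for n1, n2 in combinations(alignment, 2):
    print(n1, n2, round(percent_identity(alignment[n1], alignment[n2]), 1))
```

The numbers are only a summary; you still need to look at the alignment itself to describe where the similarity is.
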
Now we will run Clustal Omega, an improved version of the algorithm.

Open a window at EBI here

Paste or upload the same sequences, and run with the default parameters. Save the results here too, from the "Download Alignment File" tab.

  1. How did the alignment change?
  2. Which do you think is better? (explain why)
Now we are going to try to look at the alignment with color.

Click on the "Results Viewers" tab. Click on "View result with Jalview". This should download the file. (On certain Macs it may give a security warning. If it does, you will have to allow the download). Double click on the file, and it should open automatically in Jalview. If you've already opened the program, make sure to close the default sample alignment it opens. You can be sure you are in the right viewer by checking the sequence names.

Stretch the window so that you can see the whole alignment. Go to the "Colour" menu and choose "Percentage Identity". Look at the results, and answer the following:

  1. Do you see subgroups (one or more groups of more closely related sequences)? If so, which proteins can you group together (give names as they appear in the alignment)?

Go to the "Colour" menu (in the Jalview window, not the browser window) and choose "Above Identity Threshold" (NOT Modify) - you will get a slider that will let you control the percent identity of the coloration.

Play around with the slider a bit to answer the following:

  1. How many amino acids are completely conserved in these sequences?
  2. Where do you see conserved regions in this view?
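
If you would like to double-check what you read off the slider, per-column identity can be sketched in a few lines of Python. The sketch assumes that a gap counts against a column (so a column with any gap is never 100% conserved); how Jalview itself treats gaps is not something this sketch tries to reproduce. The function names and the toy alignment are made up for illustration.

```python
from collections import Counter


def column_identities(seqs):
    """For each column, the percentage of sequences sharing the most
    common residue. Gaps are counted in the total but never as a match."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for col in zip(*seqs):
        residues = [c for c in col if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        result.append(100.0 * top / len(col))
    return result


def columns_at_or_above(seqs, threshold):
    """1-based column numbers whose identity is >= threshold percent."""
    return [i + 1 for i, pct in enumerate(column_identities(seqs))
            if pct >= threshold]


# Toy alignment (made-up rows, NOT the assignment data).
toy = ["MKT-LVAG", "MKTALVSG", "MR--LVSG"]
print(columns_at_or_above(toy, 100))
```

Setting the threshold to 100 corresponds to "completely conserved"; lowering it shows which columns join in as the requirement is relaxed.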

Keep your various Clustal windows open. Now we are going to compare to Muscle.

Open the EBI page for Muscle.

Paste in the sequences and click "Submit". Save the alignment file as before.

You can open the result in Jalview (the same way you did for ClustalO) or just use your alignment files to answer the following:

  1. How many completely conserved amino acids are in this alignment?
  2. Where are the conserved regions?
  3. Do you see subgroups? If so, what are they?

Compare the results of Muscle and Clustal Omega:

  1. Are the alignments the same or different? Describe what you see. (conserved regions, gaps, which sequences cause the gaps, order of sequences...)
  2. In Muscle a mistake is made in the neuropeptide region - what is it? Why is it important?
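
When comparing where two programs placed their gaps, it can help to list the gap runs per sequence instead of scanning by eye. The sketch below is a minimal illustration; the function names gap_runs and gap_summary and both toy alignments are assumptions, not output from Muscle or Clustal Omega.

```python
import re


def gap_runs(aligned_seq):
    """Return (start_column, length) for each run of '-' (1-based columns)."""
    return [(m.start() + 1, len(m.group()))
            for m in re.finditer(r"-+", aligned_seq)]


def gap_summary(alignment):
    """Map each sequence name to its list of gap runs."""
    return {name: gap_runs(seq) for name, seq in alignment.items()}


# Two toy alignments of the same made-up sequences (NOT real program output).
toy_first = {"toy1": "MK--TLV", "toy2": "MKAATLV"}
toy_second = {"toy1": "MKT--LV", "toy2": "MKAATLV"}

for label, aln in (("first", toy_first), ("second", toy_second)):
    print(label, gap_summary(aln))
```

A gap run that sits at a different column in the two alignments, for the same sequence, marks a region where the programs disagree and where it is worth looking closely.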

Due to the mistake, and in an attempt to understand the 'extra' sequences better, we decide to add more sequences, particularly of the second group, if we can find them. We find a lizard sequence, and another fish with two sequences, and add them:

Run ClustalW and Muscle on this file. (Here are the links again to make this easier: ClustalW and Muscle, and a link in case ClustalW gets stuck: results)

Print out the alignment files, and answer the following (once again, feel free to use jalview, the plain alignment, the color option, whatever you find easier to deal with):

First we'll compare ClustalW:

  1. What happened to the alignment in ClustalW? (This might be easiest to look at in color. Describe as you did above, use the previous questions as guidelines on how to respond)
  2. Why did this happen?
  3. What can we do to fix this?

Now for Muscle:

  1. What happened to the alignment here? Compare to the previous run of Muscle.
  2. What about the mistake that we had before?
  3. Why is Muscle's second alignment so different from ClustalW's?
Based on the second Muscle alignment:
  1. Please discuss the relatedness of the various subgroups in the family - can we say it's one protein family? Why or why not?
  2. Please define patterns for the various subgroups in the neuropeptide region.
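
As a reminder of what a pattern looks like, the sketch below builds a simplified PROSITE-style pattern from the aligned rows of one subgroup: a single letter where the column is invariant, [..] where several residues occur, and x where the column contains a gap. Treating gap columns as x is a simplifying assumption, and the function name column_pattern and the toy peptides are made up for illustration; they are not the neuropeptide sequences from the assignment.

```python
def column_pattern(seqs):
    """Build a simplified PROSITE-style pattern from aligned sequences."""
    if not seqs:
        return ""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    elements = []
    for col in zip(*seqs):
        residues = sorted(set(col))
        if "-" in residues:
            elements.append("x")            # gap in column: any residue
        elif len(residues) == 1:
            elements.append(residues[0])    # invariant column
        else:
            elements.append("[" + "".join(residues) + "]")  # alternatives
    return "-".join(elements)


# Toy subgroup (made-up peptides, NOT the assignment data).
toy_group = ["ACDWF", "ACEWF", "ACDWY"]
print(column_pattern(toy_group))
```

A pattern built this way only reflects the sequences you fed in, so you still have to decide whether each listed alternative makes biological sense for the subgroup.
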
Now we're ready for motif definition.....
Hand in the outputs you saved - there should be five (don't forget to LABEL them with the name of the program and which input file), and a separate file with the answers to the questions as Assignment #5 (Do NOT paste in parts of the alignment or the jalview in the answer sheet!!!)