Assignment #5

Multiple Alignment Assignment

In this exercise, we will align a set of proteins using the ClustalW, Clustal Omega and Muscle programs and compare their outputs.

READ THROUGH THE ENTIRE ASSIGNMENT BEFORE YOU START!!

This week, our gene of interest is KiSS1, which was isolated in a lab in Hershey, Pennsylvania, and named in honor of the chocolate kiss.

The protein is known to be involved in metastasis suppression as well as puberty. We are interested in the puberty-related function, which is due to its neuropeptide activity. The gene is known in mouse and human, but we're looking for a different model system. We decided to try to find homologs in fish, as fish are a good model system for studying puberty. * Note: neuropeptides are usually very short and highly conserved, though the overall gene in which they sit may not be as conserved. The active peptides are produced by proteolytic cleavage. *

We decided to take our mouse protein and run a BLAST search.

After running BLAST, we found a number of hits. Surprisingly, there were two different hits in zebrafish, which differ even in the neuropeptide part (10-14 amino acids, depending on which definition you use)! Then we noticed a second hit in frog too. We decided to take a representative sequence from various evolutionary groups to try to figure out what's going on. The hits we decided to use are:

Mouse (our input sequence, rodent)
Human (primate)
Opossum (marsupial - "pouch" mammals)
Frog-A (non-mammalian vertebrate)
Zebrafish-A (our target organism!)
Frog-B (the second hit in frog)
Zebrafish-B (the second hit in fish)

A file containing the sequences in FASTA format is available by clicking here.
(If it doesn't open as plain text in your browser, just copy it into Word and save it as text only.)
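If you'd like to inspect the file outside the browser, the FASTA format is simple: each record starts with a '>' header line, followed by one or more sequence lines. The Python sketch below is a minimal reader; the function name `read_fasta` is our own (not from any library), it keeps only the first word of each header as the record name, and the toy records are made up rather than taken from the real KiSS1 file.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a dict mapping each record name to its sequence.

    The name is the first word after '>' on the header line; the sequence
    lines that follow are joined together with all whitespace removed.
    """
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append("".join(line.split()))
    return {n: "".join(parts) for n, parts in records.items()}


# Made-up toy records, NOT the real KiSS1 sequences.
example = StringIO(
    ">Mouse toy record\n"
    "MKTAYIAK\n"
    "QRQISFVK\n"
    ">Human\n"
    "MKTAYLAKQ\n"
)
seqs = read_fasta(example)
for name, seq in seqs.items():
    print(name, len(seq), seq)
```

To read your saved copy instead, pass an open file handle (for example `read_fasta(open(path))`, where `path` is wherever you saved the file).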

Open a window to the ClustalW page.

Now we are going to run ClustalW:

Paste in your sequences, and use all of the default parameters. Click on the "Submit" button.

Scroll down to "Result files (text):", click on the "CLUSTALW" link underneath it, and print the file. Look at the output file (or at the colored alignment on the previous page, whichever you find easier) and describe the alignment by answering the following questions:

Are there conserved regions in the alignment (if yes, where are they)?
Are there gaps, and if so, where are they placed (what causes them, one sequence or many)?
How similar are the various proteins overall?
Which of the zebrafish proteins is closest to our starting sequence?
Did ClustalW organize the sequences in an evolutionarily logical manner?
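In the ClustalW output, the line under each alignment block marks fully conserved columns with '*', columns conserved among strongly similar residues with ':', and weakly similar ones with '.'. If you want to double-check the '*' columns or see which sequence causes each gap, here is a minimal Python sketch. It assumes you have copied the alignment into a dictionary of equal-length gapped strings; the function names and toy sequences are hypothetical, not real KiSS1 data.

```python
def conserved_columns(aln):
    """0-based indices of columns where every sequence has the same residue
    and none has a gap ('-'): the columns ClustalW marks with '*'."""
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [i for i in range(length)
            if seqs[0][i] != "-" and all(s[i] == seqs[0][i] for s in seqs)]


def conservation_line(aln):
    """A '*' line in the style of ClustalW (only the '*' symbol is computed)."""
    length = len(next(iter(aln.values())))
    starred = set(conserved_columns(aln))
    return "".join("*" if i in starred else " " for i in range(length))


def gap_runs(seq):
    """(start, end) 0-based inclusive coordinates of each run of '-'."""
    runs, start = [], None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i  # a gap run begins
        elif ch != "-" and start is not None:
            runs.append((start, i - 1))  # the gap run just ended
            start = None
    if start is not None:
        runs.append((start, len(seq) - 1))  # gap run at the very end
    return runs


# Made-up toy alignment, NOT real KiSS1 sequences.
aln = {
    "SeqA": "MKT-AYIAKQ",
    "SeqB": "MKTGAYLAKQ",
    "SeqC": "MRT--YIAKQ",
}
for name, seq in aln.items():
    print(f"{name:5} {seq}   gaps at {gap_runs(seq)}")
print(f"{'':5} {conservation_line(aln)}")
print("fully conserved columns:", len(conserved_columns(aln)))
```

A gap that appears in only one sequence usually points to an insertion in the other sequences (or a deletion in that one); gaps shared by several sequences suggest a subgroup difference.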

Now we will run Clustal Omega, an improved version of the algorithm.

Open a window at EBI here

Paste or upload the same sequences and run with the default parameters. Print the results here too, either directly from the page or by downloading and printing them (if the columns don't line up when downloaded, make sure the font is Courier, and reduce the size a bit if the lines don't fit nicely).

How did the alignment change?
Which do you think is better? (explain why)

Now we are going to look at the alignment in color.

Click on the "View result with Jalview" tab (it may take a bit of time to load). It will open a new window, which may show up behind your browser window; if you don't see it after about a minute, move your browser window aside and it may be there.

Stretch the window so that you can see the whole alignment. Go to the "Colour" menu and choose "% Identity". Look at the results, and answer the following:

Do you see subgroups (one or more groups of more closely related sequences)? If so, which proteins can you group together (give names as they appear in the alignment)?
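One way to check the subgroups you see in the colors is to compute pairwise percent identity from the alignment. The Python sketch below uses one common convention (identical residues divided by the columns where neither sequence has a gap); Jalview and other tools may use a different denominator, so treat the numbers as approximate. The function name and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two aligned (equal-length) sequences.

    Convention used here: identical residues / columns where neither sequence
    has a gap. Other tools may divide by alignment or sequence length instead.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Made-up toy alignment, NOT real KiSS1 sequences.
aln = {
    "SeqA": "MKT-AYIAKQ",
    "SeqB": "MKTGAYLAKQ",
    "SeqC": "MRT--YIAKQ",
}
for (n1, s1), (n2, s2) in combinations(aln.items(), 2):
    print(f"{n1} vs {n2}: {percent_identity(s1, s2):.1f}%")
```

Sequences that are clearly more identical to each other than to the rest are candidates for a subgroup.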

Go to the "Colour" menu (in the Jalview window, not the browser window) and choose "Above Identity Threshold" (NOT Modify). You will get a slider that lets you control the percent identity used for the coloring.

Play around with the slider a bit to answer the following:

How many amino acids are completely conserved in these sequences?
Where do you see conserved regions in this view?
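The slider roughly corresponds to the following calculation: for each column, find the most common residue and the percentage of sequences that carry it, then keep only the columns at or above the threshold. This Python sketch illustrates the idea and is not Jalview's actual code; in particular, whether gaps count in the denominator and whether the comparison is '>' or '>=' are our assumptions. Setting the threshold to 100 gives the completely conserved columns. Function names and toy sequences are hypothetical.

```python
from collections import Counter


def consensus_identity(aln, ignore_gaps=False):
    """For each alignment column return (consensus residue, percent identity).

    The consensus is the most common non-gap residue. The percentage is taken
    over all sequences, or over non-gap sequences only if ignore_gaps is True.
    """
    result = []
    for column in zip(*aln.values()):
        residues = [c for c in column if c != "-"]
        if not residues:
            result.append(("-", 0.0))  # all-gap column
            continue
        residue, count = Counter(residues).most_common(1)[0]
        total = len(residues) if ignore_gaps else len(column)
        result.append((residue, 100.0 * count / total))
    return result


def above_threshold(aln, threshold, ignore_gaps=False):
    """0-based indices of columns whose consensus identity is >= threshold."""
    return [i for i, (_, pct) in enumerate(consensus_identity(aln, ignore_gaps))
            if pct >= threshold]


# Made-up toy alignment, NOT real KiSS1 sequences.
aln = {
    "SeqA": "MKT-AYIAKQ",
    "SeqB": "MKTGAYLAKQ",
    "SeqC": "MRT--YIAKQ",
}
for t in (100, 80, 60):
    print(f"threshold {t}%: columns {above_threshold(aln, t)}")
```

Moving the threshold down, as with the slider, adds the partially conserved columns back in.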

Keep your ClustalW windows open. Now we are going to compare to Muscle.

Open the EBI page for Muscle.

Paste in the sequences and click "Submit". Print out the alignment file as before.

You can open Jalview or just use your printouts to answer the following:

How many completely conserved amino acids are in this alignment?
Where are the conserved regions?
Do you see subgroups? If so, what are they?

Compare the results of Muscle and Clustal Omega:

Are alignments the same? different? Describe what you see. (conserved regions, gaps, which sequences cause the gaps, order of sequences...)
In Muscle, a mistake is made in the neuropeptide region. What is it? Why is it important?
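To put a number on how similar two alignments are, a common approach is to count the residue pairs that both alignments place in the same column (a sum-of-pairs style comparison). The Python sketch below is one minimal way to do this, assuming both alignments contain the same sequences under the same names; the function names and toy alignments are hypothetical. The score is not symmetric: it is the fraction of the reference alignment's residue pairs that the test alignment reproduces.

```python
from itertools import combinations


def aligned_pairs(aln):
    """Set of residue pairs placed in the same column.

    Each pair is ((name1, i1), (name2, i2)), where i1 and i2 are positions in
    the ungapped sequences, so pairs can be compared across alignments.
    """
    names = sorted(aln)
    # For each sequence, map alignment column -> ungapped residue index (None = gap).
    index_maps = {}
    for name in names:
        pos, mapping = 0, []
        for ch in aln[name]:
            if ch == "-":
                mapping.append(None)
            else:
                mapping.append(pos)
                pos += 1
        index_maps[name] = mapping
    pairs = set()
    for col in range(len(aln[names[0]])):
        for n1, n2 in combinations(names, 2):
            i1, i2 = index_maps[n1][col], index_maps[n2][col]
            if i1 is not None and i2 is not None:
                pairs.add(((n1, i1), (n2, i2)))
    return pairs


def pair_agreement(reference, test):
    """Fraction of the reference alignment's residue pairs also found in test."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Two made-up alignments of the same toy sequences, differing in one gap.
ref = {"SeqA": "MKTAY-K", "SeqB": "MK-AYRK"}
alt = {"SeqA": "MKTAYK-", "SeqB": "MK-AYRK"}
print(f"{pair_agreement(ref, alt):.0%} of the reference pairs are shared")
```

A single misplaced residue in a short, highly conserved stretch such as the neuropeptide barely changes this overall score, which is one reason to inspect that region by eye as well.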

Because of this mistake, and in an attempt to understand the 'extra' sequences better, we decide to add more sequences, particularly from the second group, if we can find them. We find a lizard sequence and another fish with two sequences, and add them:

Mouse (our input sequence, rodent)
Human (primate)
Opossum (marsupial - "pouch" mammals)
Frog-A (non-mammalian vertebrate)
Zebrafish-A (our target organism!)
Frog-B (the second hit in frog)
Zebrafish-B (the second hit in fish)
Lizard (another non-mammalian vertebrate)
Mackerel-a (another fish)
Mackerel-b

Run ClustalW and Muscle on this list (here are the links again to make this easier: ClustalW and Muscle). Print out the alignment files and answer the following (once again, feel free to use Jalview, the plain alignment, the color option, or whatever you find easiest to deal with):

First we'll compare ClustalW:

What happened to the alignment in ClustalW? (This might be easiest to look at in color. Describe it as you did above, using the previous questions as guidelines for your answer.)
Why did this happen?
What can we do to fix this?

Now for Muscle:

What happened to the alignment here? Compare to the previous run of Muscle.
What about the mistake that we had before?
Why is Muscle's second alignment so different from ClustalW's?

Based on the second Muscle alignment:

Please discuss the relatedness of the various subgroups in the family. Can we say it's one protein family? Why or why not?
Please define patterns for the various subgroups in the neuropeptide region.
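Patterns are commonly written in PROSITE syntax: residues are separated by '-', allowed alternatives go in square brackets (e.g. [KR]), 'x' stands for any residue, x(n) for n arbitrary residues, and the pattern ends with a period. The Python sketch below builds such a pattern column by column from the aligned neuropeptide region of one subgroup. It uses only a subset of the syntax, assumes the region has no gaps within the subgroup, and the cutoff `max_alternatives` (above which a column becomes 'x') is an arbitrary choice of ours; the sequences are made up.

```python
def prosite_pattern(seqs, max_alternatives=3):
    """Build a PROSITE-style pattern from gap-free aligned sequences.

    Conserved column -> single residue; up to max_alternatives residues ->
    bracketed set such as [KR]; more variable columns -> 'x'. Consecutive
    'x' elements are collapsed into x(n).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    elements = []
    for column in zip(*seqs):
        if "-" in column:
            raise ValueError("this sketch assumes a gap-free region")
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")
        else:
            elements.append("x")
    # Collapse runs of 'x' into x(n); residues are uppercase, so only our
    # own 'x' elements start with a lowercase x.
    collapsed = []
    for el in elements:
        if el == "x" and collapsed and collapsed[-1].startswith("x"):
            prev = collapsed[-1]
            n = 2 if prev == "x" else int(prev[2:-1]) + 1
            collapsed[-1] = f"x({n})"
        else:
            collapsed.append(el)
    return "-".join(collapsed) + "."


# Made-up aligned region for one hypothetical subgroup.
group = ["GRKLAW", "GRRIAW", "GKRVSW"]
print(prosite_pattern(group))
print(prosite_pattern(group, max_alternatives=2))
```

A stricter `max_alternatives` gives a more permissive pattern; in practice you would also check that each subgroup's pattern does not match the other subgroups' sequences.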

Now we're ready for motif definition...

Hand in the outputs you printed (there should be five; don't forget to LABEL each one with the name of the program and which input file was used), together with a separate sheet with the answers to the questions, as Assignment #5. (Do NOT paste parts of the alignment or the Jalview view into the answer sheet!!!)