Instructions and Requirements for the Final Project
DO NOT include the outputs in your discussion (refer to the results, do NOT copy/paste them). The text should include all the information necessary to understand your project, and the outputs should be in an appendix at the end.
Please print the final project on a laser printer (do not submit a handwritten paper).
The project is due NO LATER THAN Thursday, August 8, 2019, by 4pm. Bring it to the Levine building, Room 111. (You can bring it before the deadline too.)
If you have a problem with the deadline (for example: army reserve duty (miluim) or being out of the country, not "I have another exam"), speak to Shifra BY July 31 (telephone x2470).
The computer classroom will be reserved for us after the end of the semester, through the due date at the regularly scheduled hours.
For the Project
Please report which chromosome your sequence is located on, which strand the gene is on, whether the sequence is draft or finished (if your genome has draft sequence), the exon/intron structure of your gene (as far as you can tell from your results), and any splice variants. Don't forget to explain ALL the hits (even those that are unexpected).
Run a similarity search of your Protein sequence against a Protein database using BLAST.
Please describe in your report:
Which program, database and scoring matrix you used, and whether you used the filter.
Look at the hits list: the distribution of the hits across the various E-values.
Look at the Alignments and report:
For the top 10 hits: discuss their length and % identity (and similarity), in addition to E-value, organism, related proteins, etc.
A summary of the rest of the results (which organisms they come from, whether they are from the same family, etc.).
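One way to look at the distribution of E-values is to count the hits that fall in each E-value range. The following is only a minimal sketch, not a required step. It assumes you saved your hits in BLAST's 12-column tabular format (% identity in column 3, alignment length in column 4, E-value in column 11). The sample rows, the bin boundaries and the function names (parse_hits, evalue_bin) are made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows in BLAST's 12-column tabular layout.
# The hit names and all the numbers are invented for this sketch.
SAMPLE = (
    "query1\thitA\t98.5\t340\t5\t0\t1\t340\t1\t340\t1e-180\t650\n"
    "query1\thitB\t75.2\t335\t80\t3\t3\t337\t5\t338\t3e-120\t420\n"
    "query1\thitC\t41.0\t300\t170\t7\t10\t309\t20\t315\t2e-35\t140\n"
    "query1\thitD\t28.3\t120\t82\t4\t50\t169\t61\t178\t0.004\t38\n"
)

# Arbitrary bin boundaries (an assumption); adjust them to your own hit list.
BINS = [(1e-100, "<= 1e-100"), (1e-50, "<= 1e-50"),
        (1e-10, "<= 1e-10"), (1e-3, "<= 1e-3")]


def parse_hits(text):
    """Return one dict per hit line, skipping blank lines and '#' comments."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append({
            "subject": fields[1],
            "pident": float(fields[2]),
            "length": int(fields[3]),
            "evalue": float(fields[10]),
        })
    return hits


def evalue_bin(evalue):
    """Return the label of the first bin whose upper bound covers the E-value."""
    for upper, label in BINS:
        if evalue <= upper:
            return label
    return "> 1e-3"


hits = parse_hits(SAMPLE)
distribution = Counter(evalue_bin(h["evalue"]) for h in hits)
for label, count in distribution.items():
    print(label, count)
```

The counts only summarize the list. In the report you still need to describe what the hits in each range are.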
You need to test the validity of the last hit on the hits list from the database similarity search.
Please describe in your report:
Which sequences you compared, which program you used, and the % similarity and identity.
What the alignment looks like and how it compares to the database search that found it.
Don't forget that the algorithms have to match! (database search and pairwise - global or local)
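To make clear what % identity and % similarity mean for a pairwise alignment, here is a minimal sketch that computes both from two already-aligned sequences (gaps written as "-"). It rests on several assumptions. The denominator is the full alignment length including gap columns, and other conventions exist. "Similar" is decided by a hypothetical grouping of amino acids (SIMILAR_GROUPS), whereas real programs normally count positions with a positive score in the chosen scoring matrix. The function name and the example sequences are made up. Use the numbers reported by your alignment program in the report, not these.

```python
# Hypothetical groups of amino acids treated as "similar" in this sketch.
# Real alignment programs derive similarity from the scoring matrix instead.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def percent_identity_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) over all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    group_of = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}
    columns = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column with gaps in both sequences carries no information
        columns += 1
        if a == "-" or b == "-":
            continue  # gap columns count in the length but never as matches
        if a == b:
            identical += 1
            similar += 1  # identical residues are also counted as similar
        elif a in group_of and group_of[a] == group_of.get(b):
            similar += 1
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * identical / columns, 100.0 * similar / columns


# Made-up aligned pair: 8 columns, one gap.
identity, similarity = percent_identity_similarity("MKT-LLVA", "MRTALIVA")
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

In this made-up pair, 5 of the 8 columns are identical and 2 more are similar under the assumed grouping.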
When choosing the sequences for multiple alignment, choose sequences that are 80% similar or less (if possible; if you have a particular reason why you want to use more similar sequences, discuss it with us first). If you don't get less similar sequences from your database search, redo it with more hits!
Use at least 5 sequences in addition to yours.
Choose the method for the multiple alignment according to how similar the sequences are to your query sequence. You only have to use ONE method (you can use either clustalw, clustalo or muscle).
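The selection rule above can be sketched as a simple filter. This is only an illustration under stated assumptions: the hits are given as (name, % similarity to the query) pairs that you collected yourself, and the function name, the hit names and the percentages are all hypothetical.

```python
def pick_for_multiple_alignment(hits, max_percent=80.0, min_count=5):
    """Keep hits at or below max_percent similarity to the query.

    Returns (chosen_names, enough), where enough is False if fewer than
    min_count sequences pass the filter, which is a sign that the database
    search should be repeated with more hits.
    """
    chosen = [name for name, percent in hits if percent <= max_percent]
    return chosen, len(chosen) >= min_count


# Made-up hit list: (name, % similarity to the query).
example_hits = [
    ("hitA", 99.0), ("hitB", 92.5), ("hitC", 80.0), ("hitD", 74.1),
    ("hitE", 66.0), ("hitF", 58.3), ("hitG", 45.9),
]
chosen, enough = pick_for_multiple_alignment(example_hits)
print(chosen, enough)
```

The filter does not choose among the sequences that pass. Which of them make a biologically sensible set is still your decision.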
Please describe in your report:
Which sequences were used for the alignment, how similar they are to your sequence, and which program was used.
Describe the results, namely pointing out regions which are conserved and regions which are variable.
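As an aid for spotting conserved and variable regions, the sketch below scores each column of a multiple alignment by the fraction of sequences that share the most common residue, then reports runs of fully conserved columns. It assumes the aligned sequences are already available as equal-length strings with "-" for gaps. The function names, the threshold, the minimum run length and the example alignment are all made up. It is not a substitute for looking at the alignment itself.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Per column: fraction of all sequences sharing the most common residue."""
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned_seqs):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # a column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))  # gaps lower the score
    return scores


def conserved_regions(scores, threshold=1.0, min_length=3):
    """Return 1-based (start, end) runs of columns scoring >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    if start is not None and len(scores) - start >= min_length:
        regions.append((start + 1, len(scores)))
    return regions


# Made-up alignment of four short sequences.
alignment = ["MKTAYIAKQR", "MKTAHIAK-R", "MKTSYLAKQR", "MKTAYIGKER"]
scores = column_conservation(alignment)
print(conserved_regions(scores))
```

Lowering the threshold or the minimum run length makes the sketch report more, and weaker, conserved stretches.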
Describe your results. (Don't forget to list which databases, what the hits are, where they are in your sequence, and the sequence signatures, if they are available.)
Can you say that the "motifs" found in your sequence are represented in the multiple alignment? (If so, how well are they conserved? If not, why not?)
Summarize your findings.
1) your sequence
2) a printout of the genomic viewer with your blat hits (the blat output itself, in other words the full list of hits and the alignment, plus a printout of the genome browser with your hits visible on it for the best match to your gene)
3) the output of your translation program
4) the full hits list and at least the top ten alignments of your database search (you should also include the alignments of any other sequences you use later on, particularly the sequence you use for pairwise analysis)
5) your pairwise alignment output
6) your multiple alignment
7) a printout of your interpro results (the graphical view is enough, not all of the internal pages!)
Shifra shifra.ben-dor@weizmann.ac.il