Instructions and Requirements for the Final Project
to be submitted by August 8, 2019

The project is due NO LATER THAN 4pm on Thursday, August 8, 2019. Bring it to Room 111 of the Levine building. (You can also hand it in before the deadline.)

If you have a problem with the deadline (for example, miluim (army reserve duty) or being out of the country; "I have another exam" does not count), speak to Shifra BY July 31 (telephone x2470).

The computer classroom will be reserved for us after the end of the semester, at the regularly scheduled hours, up to and including the due date.

For the Project

  1. Choose any mRNA (your favorite gene, a project from your lab, something that has always interested you...). The species must be one that is in the UCSC genome browser. Bacterial and viral sequences are not acceptable; if you want to work with plants, ask us first. To make life easier, try not to choose a long gene!

  2. Run a search with your sequence against the appropriate genome database.

    Please report which chromosome your sequence is on, which strand the gene is on, whether the sequence is draft or finished (if your genome includes draft sequence), the exon/intron structure of your gene (as far as you can tell from your results), and any splice variants. Don't forget to explain ALL the hits (even those that are unexpected).

  3. Take the mRNA and run a translation program: find the open reading frame and translate it into a protein.
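To make step 3 concrete, here is a minimal Python sketch of what a translation program does. It is an illustration only, not the tool used in the course: the names `CODON_TABLE` and `longest_orf` are hypothetical, and the sketch assumes the standard genetic code, an ATG start codon, and that the ORF lies on the forward strand of the mRNA (so only three frames are searched).

```python
# Standard genetic code (NCBI translation table 1), written in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def longest_orf(mrna):
    """Return (start, end, protein) for the longest ATG...stop ORF.

    start is the 0-based index of the A of the ATG; end is the index just
    past the stop codon. Returns (0, 0, "") if no complete ORF is found.
    """
    # Remove whitespace, normalise case, and accept RNA (U) or DNA (T) letters.
    seq = "".join(mrna.split()).upper().replace("U", "T")
    best = (0, 0, "")
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            protein = []
            j = i
            stop_found = False
            while j + 3 <= len(seq):
                aa = CODON_TABLE.get(seq[j:j + 3], "X")  # X marks ambiguous codons
                if aa == "*":
                    stop_found = True
                    break
                protein.append(aa)
                j += 3
            if not stop_found:
                break  # no in-frame stop downstream, so this frame has no more complete ORFs
            if len(protein) > len(best[2]):
                best = (i, j + 3, "".join(protein))
            i = j + 3  # continue scanning after the stop codon
    return best
```

For example, `longest_orf("CCATGGCTTGGAAATAGGG")` finds the ORF starting at index 2 and encoding `MAWK`. The longest ORF is only a heuristic; compare it with what your translation program reports and with the annotated coding sequence.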

  4. Database similarity search

    Run a similarity search of your protein sequence against a protein database using BLAST.

    Please describe in your report:

    Which program, database, and scoring matrix you used, and whether you used the filter.

    Look at the hits list: how the hits are distributed across the range of E-values.

    Look at the alignments and report:

    For the top 10 hits: discuss their length and % identity (and similarity), in addition to E-value, organism, related proteins, etc.

    Summarize the rest of the results (which organisms they come from, whether they are from the same family, etc.).
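One way to look at the distribution of E-values in step 4 is to count hits per E-value range. The sketch below is only an illustration and rests on an assumption: that you have saved your hits in BLAST tabular format (the default column order of BLAST+ `-outfmt 6`, where the E-value is the 11th column). The function name `evalue_histogram` and the bin boundaries are hypothetical choices, not course requirements.

```python
# Upper bounds of the E-value bins (an arbitrary illustrative choice).
BINS = [1e-100, 1e-50, 1e-10, 1e-3, 1.0]


def evalue_histogram(tabular_text, bins=BINS):
    """Count hits per E-value bin from tab-separated BLAST output.

    Assumes the default 12-column tabular layout, with the E-value in
    column 11 (index 10). Blank lines and '#' comment lines are skipped.
    """
    counts = {f"<= {b:g}": 0 for b in bins}
    overflow = f"> {bins[-1]:g}"
    counts[overflow] = 0
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        evalue = float(line.split("\t")[10])
        for b in bins:
            if evalue <= b:
                counts[f"<= {b:g}"] += 1  # first (smallest) bin that contains the hit
                break
        else:
            counts[overflow] += 1  # weaker than the last bin boundary
    return counts
```

Each hit is counted once, in the first bin whose upper bound it does not exceed. If the web results page is all you have, the same binning can be done by eye from the hits list.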

  5. Pairwise comparison

    You need to test the validity of the last hit on the hits list from the database similarity search.

    Please describe in your report:

    Which sequences you compared, which program you used, and the % similarity and identity.
    What the alignment looks like and how it compares to the alignment from the database search that found it.
    Don't forget that the algorithms have to match! (If the database search was local, the pairwise comparison must be local too, and likewise for global.)
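For step 5, it helps to know how % identity and % similarity are derived from an alignment. The following Python sketch counts them column by column for two already-aligned sequences (gaps written as `-`). It makes two assumptions that may differ from your alignment program: the percentages are taken over all alignment columns (gaps included), and "similar" means belonging to the same group in `SIMILAR_GROUPS`, a hypothetical, simplified grouping that stands in for a real scoring matrix.

```python
# Simplified amino acid groups used to call two different residues "similar".
# This is an illustrative stand-in for a scoring matrix, not BLOSUM62 itself.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")


def alignment_stats(aligned_a, aligned_b, groups=SIMILAR_GROUPS):
    """Count identical, similar, and gapped columns in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    group_of = {res: idx for idx, grp in enumerate(groups) for res in grp}
    identical = similar = gaps = columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a column with gaps in both sequences carries no information
        columns += 1
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif x in group_of and group_of[x] == group_of.get(y):
            similar += 1
    return {
        "columns": columns,
        "identical": identical,
        "similar": similar,
        "gaps": gaps,
        "pct_identity": 100.0 * identical / columns if columns else 0.0,
        "pct_similarity": 100.0 * similar / columns if columns else 0.0,
    }
```

For the toy alignment `MKV-LSD` against `MRVALTE`, this gives 3 identical and 6 similar columns out of 7. Programs differ in the denominator they use, so always state in your report which program produced the percentages you quote.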

  6. Multiple Alignment

    When choosing the sequences for multiple alignment, choose sequences that are 80% similar or less (if possible; if you have a particular reason to use more similar sequences, discuss it with us first). If your database search does not give you less similar sequences, redo it with more hits!
    Use at least 5 sequences in addition to yours.
    Choose the method for the multiple alignment according to how similar the sequences are to your query sequence. You only have to use ONE method (you can use ClustalW, Clustal Omega (clustalo), or MUSCLE).

    Please describe in your report:

    Which sequences were used for the alignment, how similar they are to your sequence, and which program was used.

    Describe the results, pointing out which regions are conserved and which are variable.
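To help with describing conserved and variable regions in step 6, here is a minimal Python sketch that scores each column of a multiple alignment and lists runs of conserved columns. It assumes the aligned sequences are already available as equal-length strings with `-` for gaps; the function names and the default thresholds are hypothetical, and the score (fraction of sequences sharing the most common residue) is a deliberately simple measure.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue, per column.

    Gaps never count as a residue, so a gapped column scores below 1.0.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in zip(*aligned):
        residues = [c for c in col if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(col))
    return scores


def conserved_regions(scores, threshold=1.0, min_len=3):
    """Return 1-based inclusive (start, end) runs of columns scoring >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start + 1, i))
            start = None
    # Close a run that extends to the last column.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start + 1, len(scores)))
    return regions
```

Column positions refer to the alignment, not to your ungapped query sequence, so convert them before comparing with motif positions. Lowering `threshold` (for example to 0.8) reports regions that are mostly, rather than perfectly, conserved.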

  7. Use InterProScan to look for motifs in your protein sequence.

    Describe your results. (Don't forget to list which databases were searched, what the hits are, where they are in your sequence, and the sequence signatures, if they are available.)

    Can you say that the "motifs" found in your sequence are represented in the multiple alignment? (If so, how well are they conserved? If not, why not?)

    Summarize your findings.

Your outputs should include:

1) your sequence
2) your BLAT results: the BLAT output itself (that is, the full list of hits and the alignment) and a printout of the genome browser with your hits visible on it for the best match to your gene
3) the output of your translation program
4) the full hits list and at least the top ten alignments of your database search (you should also include the alignments of any other sequences you use later on, particularly the sequence you use for pairwise analysis)
5) your pairwise alignment output
6) your multiple alignment
7) a printout of your InterPro results (the graphical view is enough; you do not need all of the internal pages!)


For questions or suggestions please contact:

Shifra shifra.ben-dor@weizmann.ac.il