Assignment 2: Sequence Download and Assembly

Read through the assignment carefully before you start!! It will make answering the questions easier!!

Add your name and e-mail address on top of the page!!!


A family with the STXBP1 mutation requested that you make a mouse model in order to test various treatment options and to study the nature of the disease. You use CRISPR to introduce the mutation and, due to the distance, have to add several silent mutations as well. The first thing you do is sequence the region in mouse to make sure the wild type is the same as the published sequence. You set up a PCR and send it to the sequencing unit with the forward primer to ensure that your sequence is correct. Ideally, we would send it with both forward and reverse primers, but due to primer design considerations, in this case we wouldn't be able to see the change with the reverse primer.

You receive an email that your sequence is waiting - now you have to download it.

The sequences are in the Moodle assignment folder (if you can't get in, send me an email).

Click the link on the main Moodle course page: Sequences for Assignment 2. Download the whole DNASeq folder. On a Mac, it will either expand automatically, or you double click to expand it. On a PC, you have to extract the files (it downloads as a zipped file). Open the DNASeq folder; there should be two subfolders inside.

Go into the wildtype sequence folder, and double click on the file ending in .seq

  1. How long is the sequence?
  2. How clean is the sequence (describe)?
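
If you want to double-check your count, the length and the number of N's can also be tallied with a short script. This is only a minimal sketch: it assumes the .seq file is plain text (with or without a FASTA-style header line starting with ">"), and the function name summarize_sequence and the example read are made up for illustration.

```python
from collections import Counter


def summarize_sequence(text):
    """Return the length and N content of a plain-text sequence."""
    # Assumption: header lines, if any, start with '>' (FASTA style).
    lines = [
        ln.strip()
        for ln in text.splitlines()
        if ln.strip() and not ln.strip().startswith(">")
    ]
    seq = "".join(lines).upper()
    counts = Counter(seq)
    n_count = counts.get("N", 0)
    return {
        "length": len(seq),
        "n_count": n_count,
        "n_fraction": n_count / len(seq) if seq else 0.0,
    }


# Hypothetical example read, not the assignment data.
example = ">hypothetical_read\nNNACGTNACG\nTTGCA\n"
summary = summarize_sequence(example)
print(summary)
```

To try it on your own file, read the file's text and pass it to summarize_sequence. A count of N's is only a rough indicator of how clean the read is; the chromatogram is what you should base your description on.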

Now we are going to look at the chromatograms.

Download a demo copy of Sequencher, unless your lab has a licensed copy, in which case you can use the "real" thing. If you are in Midrasha classroom B, it is already installed.

Click on the link; the page has a download button: http://www.genecodes.com/. Install the demo.

Open Sequencher. From the File menu, choose "Import - Sequences" and choose the .ab1 files from both of the sequence folders (one folder at a time).

You now have a project with two sequences in it. We will start with the wildtype sequence (the one that is called 103). The first thing we have to do is clean it up at the ends. Open up the sequence (double click on it) and then click the "Show Chromatogram" button on top of the window. Cut off the first 20 bases.
Answer the following questions (base your answers on the chromatograms):

  1. Why did we throw out the bases in the beginning of the sequences?
  2. Can you read the rest of the bases? Why or why not?

Now we want to correct what we can of the sequence. Use the chromatogram window to fix the N's when you can tell for sure what the change should be. If it's not clear, leave an N!

  1. Give a list of the corrected bases (or N if not). There should be 14 ambiguous positions.
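
To make sure you have not skipped any N, you can list their positions from the text of the trimmed sequence. The sketch below is one possible way to do it; the function name n_positions, the offset parameter, and the example read are assumptions for illustration. The offset lets you report positions in the coordinates of the untrimmed read (20 if you cut off the first 20 bases).

```python
def n_positions(seq, offset=0):
    """Return 1-based positions of every N in seq.

    offset is the number of bases trimmed from the start, so that
    positions can be reported in the original read's coordinates.
    """
    return [
        index + 1 + offset
        for index, base in enumerate(seq.upper())
        if base == "N"
    ]


# Hypothetical trimmed read, not the assignment data.
read = "ACGNTTANGC"
trimmed_coords = n_positions(read)
original_coords = n_positions(read, offset=20)
print(trimmed_coords)
print(original_coords)
```

The script only tells you where the N's are. Deciding what each one should be still has to come from the peaks in the chromatogram.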

You now have a clean piece of sequence. The next step is to look at the mutant sequence. The mutation is lethal in the homozygous form, so we'll look at a heterozygote. Open the mutant sequence, open the chromatogram, cut off the first 20 bases, and fix the N's in the first half of the sequence.

  1. Do you have the same number of bases to fix as in the wild-type? Why?
  2. Now look at the N's at the end of the sequence, where we introduced the mutations. Can we figure out what the mutations are?

First we want to put them together in a contig. Close the open sequence and chromatogram windows. Choose both sequences from the main project window, and click "assemble automatically" (the button is on top of the project window).
Open the contig (double click on the icon). Click on the bases link at the top of the window. In the consensus line (on the bottom of the contig window) click on the first ambiguous base (it has either a black dot or a + underneath it). Click on the "Show Chromatograms" button on top of the contig window. Skip the first ambiguous base, and go to the end of the sequence, where there are a bunch of differences.

Compare the sequences on the basis of the chromatogram, and use the wild type to find the mutant sequence.

  1. How many positions have doubled peaks?
  2. Some of the doubled peaks have an N, some don't. Some are labeled as ambiguous and some aren't. Can you explain why?
  3. Write down the mutant bases that the CRISPR/Cas9 introduced into the genomic sequence, in mutation format (wildtype > mutant, for example A > G). Make sure you look at all the positions with doubled peaks in the mutant!
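
The logic of reading a heterozygous position can be sketched in code: if a doubled peak is written as a two-base IUPAC ambiguity code (for example R = A or G) and one of those two bases is the wild-type base, the other one is the mutant base. The sketch below assumes two aligned text sequences of equal length and assumes that doubled peaks are exported as IUPAC codes, which may not match what your software shows. The function name call_mutations and the example sequences are hypothetical.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC_PAIRS = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}


def call_mutations(wildtype, het):
    """Compare aligned wild-type and heterozygote reads position by position.

    Returns a list of (1-based position, "wt > mutant") tuples.
    """
    if len(wildtype) != len(het):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    pairs = zip(wildtype.upper(), het.upper())
    for pos, (wt, obs) in enumerate(pairs, start=1):
        if obs == wt:
            continue  # same call in both reads
        if obs in IUPAC_PAIRS and wt in IUPAC_PAIRS[obs]:
            # Doubled peak: one allele is wild type, the other is the mutant.
            mutant = IUPAC_PAIRS[obs].replace(wt, "")
            calls.append((pos, f"{wt} > {mutant}"))
        elif obs in "ACGT":
            # The base caller reported a single, different base.
            calls.append((pos, f"{wt} > {obs}"))
        else:
            # N or any other code cannot be resolved from text alone.
            calls.append((pos, f"{wt} > ? (check chromatogram)"))
    return calls


# Hypothetical aligned reads, not the assignment data.
calls = call_mutations("ACGTAC", "ACRTNT")
for pos, change in calls:
    print(pos, change)
```

A text comparison like this only sees positions where the two base calls differ, so it cannot replace the chromatogram. You still need to check every position with a doubled peak by eye.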


Hand in the answers ONLY (not the questions!) as Assignment #2