Assignment 2: Sequence Download and Assembly

Read through the assignment carefully before you start!! It will make answering the questions easier!!


A colleague of yours has a child who was recently diagnosed with a rare developmental disease. The DNA of the family members was sequenced, and a novel mutation was found. Your job is to make a mouse model of the human mutation in order to study possible treatments. The first thing you do is sequence the region in mouse to make sure the wild type is the same as the published sequence. You set up a PCR and send it to the sequencing unit with primers in both directions to be sure your sequence is correct.

You receive an email that your sequence is waiting - now you have to download it.

From inside the Weizmann firewall you would usually download from the susanc server: http://susanc.weizmann.ac.il
For this assignment or from outside the Weizmann firewall: http://noys.weizmann.ac.il/dlims/

Log in with the userid introbioinfo - the password is the same as the userid.

Click on the DNAseq tab on top, choose all of the sequences, and click download.

If you are working on a Mac, it should download a folder "DNAseq". Some computers will not open it automatically; it will just download. Usually a double click will unzip it and put the files in a folder.
If you are working on a PC, it should download a zipped file whose name starts with "DNAseq". Unzip the file, and then you'll have the folder "DNAseq". Double clicking may be enough, or you may have to use a program such as 7-Zip (if your computer doesn't have it, it can be downloaded free at https://www.7-zip.org).

Open the folder, and there should be four subfolders:

Go into the folder called 9651_45-L-RFLP-L and double click on the file ending in .seq

  1. How long is the sequence?
  2. How clean is the sequence (describe)?
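If you want to double check your answers to the two questions above, the counting can be sketched in a few lines of Python. This is only a sketch: it assumes the .seq file is plain text containing the bases, possibly preceded by a FASTA-style header line that starts with ">". The function name summarize_seq_text and the toy read are made up for illustration and are not part of the downloaded data.

```python
def summarize_seq_text(text):
    """Return (length, n_count) for the plain-text contents of a .seq file."""
    lines = [line.strip() for line in text.splitlines()]
    # Skip blank lines and any header line (assumed to start with '>').
    bases = "".join(
        line for line in lines if line and not line.startswith(">")
    ).upper()
    # N is the base caller's symbol for "could not decide".
    return len(bases), bases.count("N")


# Toy read, NOT taken from the assignment files.
toy_text = ">toy_read\nACGTNNACGT\nACGNAC\n"
length, n_count = summarize_seq_text(toy_text)
print("length:", length, "N count:", n_count)
```

To try it on a real file, read the file into a string first (for example with open(path).read()) and pass that string to the function. The number of N's is only a rough measure of how clean a read is; the chromatogram tells you much more.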

Now we are going to look at the chromatograms.

Download a demo copy of Sequencher, unless your lab has a licensed copy, in which case you can use the "real" thing. If you are in midrasha classroom B, it is already installed.

Go to http://www.genecodes.com/, where there is a button for the download, and install the demo.

Open Sequencher. From the File menu, choose "Import - Sequences" and choose the .ab1 files from all of the sequence folders (one at a time).

You now have a project with four sequences in it. We will start with the wildtype sequences (the ones that start with 9651). The first thing we have to do is clean them up at the ends. Open up the Forward sequence (double click on it) and then click the "Show Chromatogram" button on top of the window. Cut off the first 20 bases. Do the same for the reverse sequence.
Answer the following questions (base your answers on the chromatograms):

  3. Why did we throw out the bases in the beginning of the sequences?
  4. Can you read the rest of the bases? Why or why not?
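The trimming step you just did in Sequencher amounts to cutting a fixed number of bases off the ends of a string. A minimal sketch in Python, assuming the read is held as a plain string (the function name trim_ends and the toy read are hypothetical):

```python
def trim_ends(seq, head=20, tail=0):
    """Drop `head` bases from the start and `tail` bases from the end."""
    end = len(seq) - tail
    # Slicing never raises here: over-trimming simply gives an empty string.
    return seq[head:end]


# Toy read: 20 unreliable leading bases followed by a short readable stretch.
toy_read = "N" * 20 + "ACGTACGT"
print(trim_ends(toy_read))
```

The cut-off of 20 is the value used in this assignment, not a universal rule. In practice you would choose it by looking at where the peaks in the chromatogram become clean.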
Now we want to put them together in a contig. Choose both sequences from the main project window, and click "Assemble automatically" (the button is on top of the project window).
Open the contig (double click on the icon). Click on the Bases link at the top of the window. In the consensus line (on the bottom of the contig window) click on the first ambiguous base (it has either a black dot or a + underneath it). Click on the "Show Chromatograms" button on top of the contig window.

Compare the sequences on the basis of the chromatogram, and correct the wrong bases. Be very careful when you edit contigs!! If you are standing on the base in the consensus, the edit will change both sequences. If you only want to change one, click on that position in the particular sequence!

  5. Give a list of the correct bases (there should be 18 ambiguous positions, though there may be more bases, if a base was skipped.)
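To see why the forward and reverse reads can be compared at all, remember that the reverse read comes from the opposite strand, so it has to be reverse-complemented before it lines up with the forward read. The sketch below shows the idea in Python. It assumes the two reads cover exactly the same stretch and need no gaps, which is a simplification (Sequencher performs a real alignment); the function names and toy reads are made up for illustration.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def disagreements(forward, reverse_read):
    """List (position, forward_base, reverse_base) where the reads differ.

    Positions are 1-based. The reverse read is reverse-complemented first.
    Assumes both reads cover the same region without gaps.
    """
    rc = reverse_complement(reverse_read)
    if len(rc) != len(forward):
        raise ValueError("this sketch needs reads of equal length")
    return [
        (pos, f, r)
        for pos, (f, r) in enumerate(zip(forward, rc), start=1)
        if f != r
    ]


# Toy reads, NOT from the assignment data.
print(disagreements("ACGTTGCA", "TGCNACGT"))
```

Each reported position is one where you would go back to the two chromatograms and decide which read to believe.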

You now have a clean piece of sequence. The next step is to look at the mutant sequence. We'll start with the homozygote to make things easier to see. Open the sequence, open the chromatogram, cut off the first 20 bases, and fix the N's. Make sure you look over the whole sequence, N's aren't always in the beginning!

  6. How many bases did you fix?

Now we'll combine this with the sequences we already have. Close the various open windows except for the main project window. Click on the contig and on the sequence you just corrected, and click on the "Assemble automatically" button. Open the new contig, and the chromatograms.

  7. How many different bases are there?
  8. List them in mutation format (wildtype > mutant, for example A > G).
  9. Most of the mutations are clustered but one is separate - what can you guess about the separate one?
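The mutation format asked for above can be produced mechanically once two sequences are lined up. Here is a minimal sketch in Python, assuming the wildtype and mutant sequences are already aligned and of equal length with no insertions or deletions. The function name list_substitutions and the toy sequences are hypothetical, and the position prefix is an addition for convenience.

```python
def list_substitutions(wildtype, mutant):
    """Return differences as 'position: wildtype > mutant' strings (1-based)."""
    if len(wildtype) != len(mutant):
        raise ValueError("this sketch needs aligned sequences of equal length")
    return [
        f"{pos}: {wt} > {mt}"
        for pos, (wt, mt) in enumerate(zip(wildtype, mutant), start=1)
        if wt != mt
    ]


# Toy sequences, NOT from the assignment data.
for line in list_substitutions("ACGTACGT", "ACATACGG"):
    print(line)
```

Treat the output as a list of candidates only. A difference between two reads may be a real mutation or a base-calling error, so check each one against the chromatograms.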

Go back to the bases window, and click on the overview button on top. Go to the View menu on the top of the program and choose "Show Codon Map".

  10. Given that this is cDNA, which strand has an open reading frame all the way through?

Now we'll add the heterozygote sequence. Open and clean it as you have done before, but correct a base only when you clearly know what it is. This time cut off the first 30 and the last 10 bases, and add it to the contig. It shouldn't go in.

Now we'll try to get it to go in. Click on Assembly parameters and choose large gap. Try to assemble. If it doesn't work, drop the minimum match percentage, 5% at a time, until it does go in. Open the new contig, bases and chromatograms.

  11. Why didn't it go in? (There are at least two reasons. One you can see in the chromatogram, one you have to think about what you know of the sequences).
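The idea of lowering the minimum match percentage step by step can be illustrated with a small Python sketch. This is not Sequencher's algorithm, whose scoring is not described here. It assumes two already aligned sequences of equal length and defines the match percentage simply as the share of identical positions. The function names and the starting value, step and floor are all assumptions chosen for illustration.

```python
def percent_match(a, b):
    """Percentage of positions at which two aligned sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch needs non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def first_passing_threshold(a, b, start=85, step=5, floor=60):
    """Lower the threshold `step` points at a time until the pair passes.

    Returns the first threshold that the pair meets, or None if even the
    floor is not met. All three default values are illustrative assumptions.
    """
    score = percent_match(a, b)
    threshold = start
    while threshold >= floor:
        if score >= threshold:
            return threshold
        threshold -= step
    return None


# Toy sequences, NOT from the assignment data: 8 of 10 positions agree.
print(percent_match("ACGTACGTAC", "ACGTACGTNN"))
print(first_passing_threshold("ACGTACGTAC", "ACGTACGTNN"))
```

Lowering the threshold makes assembly more permissive, so the further you have to drop it, the more carefully you should inspect the resulting contig.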

Scroll to the end of the contig, to the cluster of bases that was different before. Look very carefully at the chromatograms.

  12. How many more mutations are there than what we saw before (questions 7 and 8)?
  13. List the new ones in mutation format as before.
  14. How many of these were listed as N's? Why or why not?

Hand in the answers to the questions ONLY (do NOT write the answers in on a printout of the assignment) as Assignment #2