Working with genome browsers - Assignment #7
In this assignment you should prepare a report with the answers to various questions.
This report can be written in any text editor or word processor.

Read through the assignment carefully before starting!

Add your name and e-mail address to the front page!!!


This week we'll work with a gene from a family of interferon-responsive genes, IFI27 and its relatives. These genes were identified by a lab that studies the response to infection. As the lab's bioinformatics expert, you are given the mRNA sequence, and it's your job to find out what you can about its genomic structure and sequence.

The point of this assignment is to explore some of the functions of genome browsers.

The gene is mouse Ifi27l1 (IFI27 like 1). In mouse it's not clear which gene is the direct homolog of the human IFI27, as different species have different numbers of family members. We'll do this search at the UC Santa Cruz Genome Browser.

We have to go to NCBI and get the mRNA in FASTA format. The accession number of the mRNA is BC128276. Open the record and copy the sequence. Keep this window open in the background; you'll need it at the end.
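If you prefer to grab the sequence with a script instead of copy-pasting it, here is a minimal Python sketch. It assumes NCBI's E-utilities efetch endpoint (db=nuccore, rettype=fasta, retmode=text) and a working network connection; efetch_url, parse_fasta and fetch_fasta are helper names made up for this example, not part of any NCBI library.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: NCBI E-utilities efetch.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build an efetch URL that returns one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urllib.parse.urlencode(params)


def parse_fasta(text: str) -> tuple[str, str]:
    """Split single-record FASTA text into (header without '>', sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA: the first line must start with '>'")
    return lines[0][1:], "".join(lines[1:]).upper()


def fetch_fasta(accession: str) -> tuple[str, str]:
    """Download one record and parse it (needs network access)."""
    with urllib.request.urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

For example, fetch_fasta("BC128276") should return the record's header line and the whole sequence as a single string, ready to paste into Blat.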

Now we'll open a page at UC Santa Cruz.

Because you may be working on computers where either you or others have worked on the genome browser before, we'll reset the browser to make sure everything comes out the way we want it to. Mouse over the "Genome Browser" link in the blue tool bar on top of the page. Then click on the "Reset all user settings" link at the bottom of the pull down menu. It should reset to the latest version (hg38) of the human genome.

Click on Blat (under "Tools" in the top menu bar) and run a search against the mouse genome. Make sure you select the most recent assembly! Paste in the sequence you just copied from NCBI and click the "submit" button.

  1. How many hits are there? On which chromosomes (don't forget to say which hit is which)?
  2. Which one is the true hit?
  3. Which part of the mRNA (start and end positions, in bp) does the true hit cover?
  4. What is the genomic span?
  5. What strand is our gene on?
  6. What do you think the other hits are?
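The genomic span is just the distance from the first to the last genomic coordinate of the hit. A small Python sketch of the arithmetic, assuming the coordinates you read off are 1-based and inclusive (as shown in the browser's position box); data in BED/PSL files is 0-based and half-open, which is why the two helpers differ by one. The coordinates and mRNA positions below are made up.

```python
def span_closed(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval [start, end], as displayed in the browser."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def span_half_open(start: int, end: int) -> int:
    """Length of a 0-based, half-open interval [start, end), as stored in BED/PSL files."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start


# Made-up hit: genomic 15,354,876-15,400,987 (1-based), covering mRNA bases 1-290 of 300.
print(span_closed(15_354_876, 15_400_987))     # 46112
print(span_half_open(15_354_875, 15_400_987))  # same interval written 0-based: 46112
print(span_closed(1, 290) / 300)               # fraction of the mRNA inside the hit
```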
Click on the "details" link of the longest hit.
  1. How many exons are in this gene?
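Counting exons amounts to counting the aligned blocks of the hit, with one caveat: Blat sometimes splits a single exon into two blocks around a small indel. If you set the Blat output type to "psl", each hit is one line whose 19th and 21st columns hold the block sizes and the 0-based genomic block starts. The sketch below turns such a line into 1-based genomic blocks and merges blocks separated by small gaps into exons; the helper names, the 20 bp cutoff and the PSL line itself are all made up for illustration.

```python
def psl_blocks(psl_line: str) -> list[tuple[int, int]]:
    """Genomic aligned blocks of one PSL line, as 1-based inclusive (start, end)."""
    fields = psl_line.split()
    if len(fields) != 21:
        raise ValueError(f"expected 21 PSL columns, got {len(fields)}")
    sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
    starts = [int(x) for x in fields[20].rstrip(",").split(",")]
    # PSL target starts are 0-based: block [s, s + size) is s+1..s+size in 1-based terms.
    return [(s + 1, s + size) for s, size in zip(starts, sizes)]


def merge_small_gaps(blocks: list[tuple[int, int]], max_gap: int = 20) -> list[tuple[int, int]]:
    """Merge neighbouring blocks (sorted by position) whose gap is <= max_gap bp."""
    merged = [list(blocks[0])]
    for start, end in blocks[1:]:
        gap = start - merged[-1][1] - 1  # bases between the previous block and this one
        if gap <= max_gap:
            merged[-1][1] = end
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


# A made-up PSL line: 3 blocks, the first two separated by only 5 bp.
demo = ("300 0 0 0 0 0 2 45962 + demo_mRNA 300 0 300 chrN 120000000 "
        "15354875 15401137 3 100,50,150, 0,100,150, 15354875,15354980,15400987,")
exons = merge_small_gaps(psl_blocks(demo))
print(len(exons), exons)
```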

Now let's figure out our gene structure:

  1. Make a table with the exons, the introns, their locations on the mRNA (for the exons only, of course!) and their genomic regions. Use the "together" section at the bottom of the page to help you.

The chart should look something like this:
Exon/intron   Location in mRNA   Location in genomic DNA
exon 1        1-100              15,354,876-15,354,975
intron 1      -                  15,354,976-15,400,987
NOTE! The values in this table are made up! They're just an example!
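Once you have the genomic start and end of each exon (listed in mRNA order), the introns are simply the gaps between consecutive exons. This Python sketch builds the rows of such a table; the coordinates are hypothetical, 1-based and inclusive, and the Exon type and exon_intron_rows function are names invented for this example. It also handles a gene on the minus strand, where exon 1 sits at the higher genomic coordinates.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    mrna_start: int  # position on the mRNA, 1-based, inclusive
    mrna_end: int
    g_start: int     # genomic position, 1-based, inclusive, g_start <= g_end
    g_end: int


def exon_intron_rows(exons: list[Exon]) -> list[tuple[str, str, str]]:
    """Interleave exons (in mRNA order) with the introns between them."""
    rows = []
    for i, ex in enumerate(exons, start=1):
        rows.append((f"exon {i}", f"{ex.mrna_start}-{ex.mrna_end}",
                     f"{ex.g_start:,}-{ex.g_end:,}"))
        if i < len(exons):
            nxt = exons[i]  # the next exon in mRNA order
            if ex.g_end < nxt.g_start:   # plus strand: next exon lies downstream
                start, end = ex.g_end + 1, nxt.g_start - 1
            else:                        # minus strand: next exon lies upstream
                start, end = nxt.g_end + 1, ex.g_start - 1
            rows.append((f"intron {i}", "-", f"{start:,}-{end:,}"))
    return rows


exons = [Exon(1, 100, 15_354_876, 15_354_975),
         Exon(101, 250, 15_400_988, 15_401_137)]
for row in exon_intron_rows(exons):
    print("{:<10} {:<10} {}".format(*row))
```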

  1. Which is the longest exon? The longest intron?
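With the table filled in, finding the longest exon and intron is a matter of computing end - start + 1 for each row. A Python sketch with made-up, 1-based inclusive coordinates; feature_length and longest are names chosen for this example.

```python
def feature_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


def longest(features: dict[str, tuple[int, int]]) -> tuple[str, int]:
    """Name and length of the longest feature (the first one listed wins a tie)."""
    name = max(features, key=lambda n: feature_length(*features[n]))
    return name, feature_length(*features[name])


# Made-up coordinates, 1-based and inclusive.
exons = {"exon 1": (15_354_876, 15_354_975), "exon 2": (15_400_988, 15_401_137)}
introns = {"intron 1": (15_354_976, 15_400_987)}
print(longest(exons))    # ('exon 2', 150)
print(longest(introns))  # ('intron 1', 46012)
```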

Go back to the previous window and click on the "browser" link.

Now we have to make sure we have the right tracks open. First we'll concentrate on the genomic questions. Scroll down to the bottom of the page.

Go to the section labeled "Mapping and Sequencing tracks." Set the "Contigs", "Assembly", and "Gap" tracks to "full". Make sure that "Blat sequence" is also on "full"; that is our input sequence. Click "refresh" (either at the top of the control panels, or via the link on the right-hand side of any of the blue bars dividing the track groups). (If you want more information on any of the tracks, click on the track title in the control panels and it will take you to an explanation page.)

  1. What version of the genome are you using?
  2. What are the accession numbers of the map contig and "fragment" in this area?
  3. How many gaps are in this region?
Now we'll look at other tracks. Scroll down again, this time to the "Genes and gene prediction tracks" section, and close all of the tracks in this section by setting them to "hide"; it'll make things easier to see. Then scroll down to "mRNA and EST tracks" and make sure that "mouse mRNAs" is on "full" (hide the "Spliced EST" track to make things easier).

Click "refresh".

Now we are going to compare what's available in the database to our gene. To answer these questions, compare the various tracks to "your sequence from blat". Look at the mouse mRNA track. Zoom out once (1.5x) to make sure you see the full length of the mRNAs (by default, the browser opens to the full length of the sequence we submitted to Blat).

  1. How many mRNAs match our gene (including incomplete mRNAs)?
  2. What is the top sequence in the mRNA track?
  3. Which exons in our gene are alternatively spliced?
  4. Looking at the full-length mRNAs only, how many different gene structures do you see? (Write each one out, e.g. exon 1 - exon 2 - ...)
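One way to keep track of the structures is to write each full-length mRNA as the list of exons it contains, then group identical lists. In this sketch the mRNA names and exon lists are made up; you would fill them in by eye from the browser. Exons that appear in some but not all of the mRNAs are the candidates for alternative splicing.

```python
from collections import Counter


def count_structures(mrnas: dict[str, list[int]]) -> Counter:
    """How many mRNAs share each exon structure (an ordered tuple of exon numbers)."""
    return Counter(tuple(exons) for exons in mrnas.values())


def alternative_exons(mrnas: dict[str, list[int]]) -> list[int]:
    """Exons present in at least one mRNA but missing from at least one other."""
    exon_sets = [set(exons) for exons in mrnas.values()]
    return sorted(set.union(*exon_sets) - set.intersection(*exon_sets))


# Hypothetical full-length mRNAs read off the browser.
demo = {"mRNA_A": [1, 2, 3, 4], "mRNA_B": [1, 2, 4], "mRNA_C": [1, 2, 3, 4]}
for structure, n in count_structures(demo).items():
    print(" - ".join(f"exon {e}" for e in structure), f"({n} mRNA(s))")
print("alternative exons:", alternative_exons(demo))
```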

Let's see how the gene prediction programs fared in this region. Go to the Genes and Gene Prediction Tracks section, and open the following tracks to full: SGP, Geneid, and Genscan.

  1. How did the programs do? Describe the output of each program, considering the predicted length, the number of correctly predicted exons, and the number of incorrect exons. What do you think of gene predictions?
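To make the comparison concrete, you can classify each predicted exon against the exons of the real gene: identical boundaries (correct), overlapping but with different boundaries (partial), or not overlapping any real exon (wrong), and list the real exons that no prediction touches (missed). A Python sketch with made-up coordinates; the four categories are a simple scheme chosen for this example, not how the track pages themselves report accuracy.

```python
Interval = tuple[int, int]  # (start, end), 1-based, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_prediction(real: list[Interval], predicted: list[Interval]) -> dict[str, list[Interval]]:
    """Classify predicted exons against the real exons of the gene."""
    real_set = set(real)
    return {
        "correct": [p for p in predicted if p in real_set],
        "partial": [p for p in predicted
                    if p not in real_set and any(overlaps(p, r) for r in real)],
        "wrong": [p for p in predicted if not any(overlaps(p, r) for r in real)],
        "missed": [r for r in real if not any(overlaps(r, p) for p in predicted)],
    }


real = [(100, 200), (500, 650), (900, 1000)]
predicted = [(100, 200), (520, 650), (1200, 1300)]
for category, exons in compare_prediction(real, predicted).items():
    print(f"{category:8} {len(exons)} {exons}")
```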

This gene family differs between organisms, so let's see what the conservation of this member looks like. Go down to the track controls again. In "mRNA and EST tracks", hide the mouse EST tracks if they're open. Go to the Comparative Genomics section and set Conservation to full. Click on the track title (the blue link) to open the track controls. In the boxed section, in the top row, click the + next to Species selection and make sure all species are selected. Click the submit button above the box.

  1. In which organisms do you see conservation of the whole gene? In which organisms do you see conservation of all the coding exons (they start from exon 2)?
  2. Which part of the gene is the most conserved?

Now we want to see what is going on in human, so take the mouse protein (the translation given in the CDS feature of the NCBI record you opened at the beginning) and run a Blat search against the human genome. (When you change the species to human on the Blat page, wait for it to reload, and make sure that you see the Human Blat header at the top.)
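The NCBI record already gives the protein (the /translation line of the CDS feature), so this is only a way to double-check it: a Python sketch that translates the coding region of an mRNA with the standard genetic code, given the CDS start and end as listed in the record (1-based, inclusive). The sequence and CDS positions in the example are made up, and translate_cds is a name chosen for this sketch.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_cds(mrna: str, cds_start: int, cds_end: int) -> str:
    """Translate mrna[cds_start..cds_end] (1-based, inclusive), stopping at the first stop codon."""
    cds = mrna[cds_start - 1:cds_end].upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for codons containing N etc.
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("GGATGGCTTTTTAAGG", 3, 14))  # MAF
```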

  1. How many hits do you have? Which part(s) of your protein do they cover?
  2. Look at the details and at the browser views of the hits (hint: you may want to open some tracks to get more information...). What do you think is going on here, i.e. why are there so many similar hits? Explain your answer.