Assignment 7: Finding protein-coding genes
The purpose of this exercise is to illustrate some of the concepts in the lectures and
readings by using web servers to annotate genes. As with all my assignments, if your
interests lead you in a different direction, you are free to follow that direction as long as
it deals with gene annotation. You may do the assignment on genomic regions from
ANY organism (including bacteria, plants, and fungi) but you will probably have to do
more independent investigation than if you choose to use the assigned sequence. Of
course, please tell me what you did. The report from this exercise should be around two
to four pages, including figures. Quantitative answers are preferable to qualitative ones.
Describe your observations in your own words, and cite your sources for information.
Pick a genetic locus (single gene or multiple genes) that you are interested in. You can
choose the locus from any organism. The following description of the assignment is
based on a gene that almost everyone is interested in at some level, TP53. This gene
encodes a transcription factor, “tumor protein 53”, that regulates several aspects of cell
growth. It is also frequently mutated in many cancers. If you have no better preference,
then work on TP53. It and some adjacent genes are located at chr17:7,550,001-7,608,000 in the GRCh37/hg19 assembly of the human genome. This 58 kb sequence
(in FASTA format) is at the Angel course site.
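Before submitting the sequence to a server, it can help to confirm that the file you downloaded matches the stated interval. The following is a minimal Python sketch, assuming the usual browser convention of 1-based, inclusive coordinates; the function names are illustrative, and the file name in the last comment is hypothetical.

```python
import re

def parse_region(region):
    """Parse 'chrom:start-end' (1-based, inclusive; commas allowed)."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end

def region_length(region):
    """Length in bases of a 1-based, inclusive interval."""
    _, start, end = parse_region(region)
    return end - start + 1

def fasta_length(path):
    """Count sequence characters in a single-record FASTA file."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):  # skip the header line
                total += len(line.strip())
    return total

print(region_length("chr17:7,550,001-7,608,000"))  # 58000
# fasta_length("TP53_region.fa") should give the same number
# (hypothetical file name; use the file from the course site).
```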
(1) Run the sequence through Genscan to find the predicted genes. Genscan and the
associated server were developed by Chris Burge (now at MIT) and it is still supported
there:
http://genes.mit.edu/GENSCAN.html
If you are working with a bacterial sequence, try Glimmer (Salzberg lab); you can use
the server at NCBI:
http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi
Briefly state how the gene predictions were produced, and describe the results of the
gene predictions.
(2) Now compare these results to (a) evidence of transcription and (b) gene models built
by a comprehensive pipeline, such as “UCSC genes” or “GENCODE”. A good way to do
this is to examine tracks in the UCSC Genome Browser for
- A comprehensive pipeline, such as “UCSC genes” or “GENCODE”
- mRNA data
- Genscan predictions
- results of RNA-seq
Describe the gene annotations from these different sources. What similarities and
differences do you see? What is the basis for the differences? (This is asking about the
power and limitations – the good points and not-so-good points – of the different
methods.)
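One way to make this comparison quantitative is to measure, base by base, how much of the predicted exon sequence agrees with a reference annotation. The sketch below is only an illustration: the exon coordinates are made-up toy values, not real TP53 exons, and the function names are assumptions. Sensitivity here is the fraction of reference exon bases that were predicted; precision (often called "specificity" in the gene-prediction literature) is the fraction of predicted exon bases that lie in reference exons.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def covered_bases(intervals):
    """Total number of bases covered by a set of intervals."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))

def shared_bases(a, b):
    """Number of bases covered by both interval sets."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            total += hi - lo + 1
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def exon_accuracy(predicted, reference):
    """Return (sensitivity, precision) at the nucleotide level."""
    tp = shared_bases(predicted, reference)
    ref_total = covered_bases(reference)
    pred_total = covered_bases(predicted)
    sensitivity = tp / ref_total if ref_total else 0.0
    precision = tp / pred_total if pred_total else 0.0
    return sensitivity, precision

# Hypothetical toy exons (NOT real annotation):
reference = [(100, 199), (300, 399), (500, 599)]
predicted = [(100, 199), (320, 419), (700, 749)]
sens, prec = exon_accuracy(predicted, reference)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

In practice the intervals would come from the exon coordinates reported by Genscan and from the reference track you choose, converted to the same coordinate system.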
To help in getting started, I have shared a “browser session” with you. Copying and
pasting the following URL into your internet browser will open a view at the UCSC
Genome Browser.
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=rosshardison&hgS_otherUserSessionName=TP53andFlanks
This is a good starting point, but I encourage you to explore these tracks, change the
settings, open other tracks, etc. This is an opportunity to delve more deeply into the
material we covered.
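If you choose a different locus, you can open the browser directly at your own interval instead of using the shared session. This small Python sketch builds such a link, assuming the browser's `db` and `position` URL parameters; the helper name `browser_url` is illustrative, and the link opens with your default tracks rather than the shared session's settings.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"

def browser_url(db, chrom, start, end):
    """Build a Genome Browser link for an assembly and a 1-based interval."""
    position = f"{chrom}:{start}-{end}"
    # keep ':' unescaped so the position stays readable in the link
    query = urlencode({"db": db, "position": position}, safe=":")
    return f"{UCSC_HGTRACKS}?{query}"

print(browser_url("hg19", "chr17", 7550001, 7608000))
```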