MMG 433
April 15, 2004
THE HUMAN GENOME PROJECT
KAREN FRIDERICI ([email protected])
This exercise is meant to acquaint you with just a small sample of the resources available
through the Human Genome Project. Please keep in mind that these powerful resources are
constantly changing and expanding so the look and links can change from year to year.
In this session we will examine two rather different human genes and learn how to find their
mRNA and protein sequences, functional information, disease information, homology to other
species, and human variation. There are a number of ways to find genes and gene information in
the various databases and browsers that are available. Three browsers currently use the
databases supported by the Human Genome Project. Each browser has a different look and
different strengths in the way it presents the information.
The genes we will be examining are γ-actin and myosin 15. The products of these genes interact
with each other, but the genes are very different in size and in other physical and genomic
features. The easiest entry into the databases is by way of the NCBI home page. Begin by
entering the name of the protein you wish to study in the search bar.
Search Entrez For: gamma actin or myosin 15
This will bring you to a collection of databases that are available for you to explore. We will start
by finding out more about the gene and its structure.
Click on Gene: gene centered information
Find the Homo sapiens entry and click on that. This takes you to a page with lots of important
information. You should be able to fill in a large segment of the worksheet from this page.
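If you would rather script this lookup than click through the pages, the same Entrez search can be
written as a URL for NCBI's E-utilities esearch service. The sketch below only builds the URL and
does not contact NCBI. The helper name build_gene_search_url and the query wording (the symbol
ACTG1 with the [Gene Name] and [Organism] field tags) are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(symbol, organism="Homo sapiens"):
    """Return an esearch URL that queries the Entrez Gene database.

    The field tags used in the term are an assumption for this sketch.
    """
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{ESEARCH_BASE}?{urlencode(params)}"


# Illustrative query for the gamma actin gene symbol used later in the handout.
url = build_gene_search_url("ACTG1")
print(url)
```

Pasting the printed URL into a web browser should return the matching Gene IDs, which you can
compare with the Gene ID you recorded on the worksheet.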
Official Gene Name: A committee assigns official names for genes. When a new gene is
identified and a function is suggested, the investigator can apply for a name. Human gene
symbols are written in all capital letters; mouse gene symbols use the same letters, but only
the first letter is uppercase and the rest are lowercase.
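The capitalization rule can be written as a pair of small helper functions. This is a minimal
sketch of the convention only; the function names are made up for illustration, and the code does
not check that a mouse gene with the converted symbol actually exists.

```python
def human_symbol(symbol):
    """Format a gene symbol the human way: all capital letters."""
    return symbol.upper()


def mouse_symbol(symbol):
    """Format a gene symbol the mouse way: first letter uppercase, rest lowercase."""
    return symbol.capitalize()


# ACTG1 is the human gamma actin symbol that appears later in this exercise.
print(human_symbol("actg1"))   # ACTG1
print(mouse_symbol("ACTG1"))   # Actg1
```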
As you make your way through the various databases you will find that the search engines make
mistakes and link to the wrong genes. This is because many genes are very similar
to each other. To make sure that you don’t get sidetracked, pay attention to the
reference numbers as well as the official names, and know what other similar genes might
be present in the genome. It is therefore a good idea to record the Gene ID, the chromosomal
location, and the base position on the chromosome. It is also useful to know what the genomic
region looks like, specifically what other genes are nearby, so find the nearest gene.
Now it is time to get some information about the mRNA and the protein. From the beginning
and end positions on the chromosome one can determine the size of the primary (unspliced)
transcript. The Entrez Gene page includes a graphic of the exon structure. You can
count the exons and tell which, if any, are non-coding. The direction of transcription is also
marked on the graphic (to the right is toward the q telomere and called forward, to the left is
toward the p telomere and is called reverse).
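The size calculation from the begin and end positions is a simple subtraction, sketched below. The
coordinates are hypothetical placeholders, not the real position of either gene, and the function
names are assumptions. The sketch also assumes the positions are 1-based and inclusive and that
the strand is recorded as "+" for forward and "-" for reverse.

```python
def genomic_span(begin, end):
    """Number of bases covered by 1-based, inclusive chromosome coordinates."""
    low, high = sorted((begin, end))
    return high - low + 1


def describe_direction(strand):
    """Translate a strand symbol into the forward/reverse wording used above."""
    if strand == "+":
        return "forward (toward the q telomere)"
    if strand == "-":
        return "reverse (toward the p telomere)"
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


# Hypothetical coordinates, not the real position of either gene.
begin, end, strand = 1_000_001, 1_003_000, "-"
print(genomic_span(begin, end), "bases,", describe_direction(strand))
```

Because the span includes the introns, it is larger than the spliced mRNA whenever the gene has
more than one exon.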
Reference sequences have been designated by the HGP to standardize the literature by allowing
authors to reference the same sequences. These are individually curated and receive an accession
number starting with N: NP = protein, NM = mRNA, NC = complete genomic sequence such as a
chromosome (contigs use NT). If multiple splice forms are known, then
there will be multiple NM numbers.
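When you copy reference numbers onto the worksheet, the prefix tells you which box each one
belongs in. The sketch below sorts accession numbers by prefix. The accession strings are made-up
placeholders rather than real records, and the function name is an assumption.

```python
# Prefix meanings as described in this handout.
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NP": "protein",
    "NC": "genomic",
}


def classify_refseq(accession):
    """Return the kind of reference sequence indicated by the accession prefix."""
    prefix, separator, number = accession.partition("_")
    if not separator or not number:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    return REFSEQ_PREFIXES.get(prefix.upper(), "unknown")


# Placeholder accessions for illustration only.
for accession in ("NM_000000", "NP_000000", "NC_000000"):
    print(accession, "->", classify_refseq(accession))
```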
Now let’s consider the function of the gene you are studying. There are many ways to find out
the function but one of the easiest for getting started is to look at the summary provided. This
will usually tell you a bit about the function and a bit about other members of the gene family.
Of course for more complete information there are always the PubMed links.
Diseases caused by mutations in a gene tell us a good deal about the function of that gene, and
there is one major database that keeps track of genetic diseases. OMIM (Online Mendelian
Inheritance in Man) is a good source of both disease and gene information. At the bottom of
the page are the UniGene links and the MIM number. The MIM link will tell you more about a disease
caused by mutations in γ-actin.
To explore other databases and the major genome browsers it is easy to link through UniGene
(way at the bottom of the page). If several UniGene entries are listed, be sure the one you pull up
is really for the gene we are studying and not some closely related gene. At UniGene you can find
the protein similarity of the gene in different species. Again, it is important to use caution in
interpreting the similarity because the genes that are chosen for comparison in other species are
not always the orthologs. (Look carefully in the case of myosin.)
One of my favorite ways to get to the various browsers and other links is to use Locus Link.
Click on Locus Link. You will see that some of the information is duplicated from the other
sites. Look first at the homology information for the gene by clicking on Homol (HomoloGene).
Examine the mouse (M. musculus) protein graphic to look for protein domains, then click on the
gene to look at its structure.
Back at Locus Link compare the browsers. The three major browsers are the NCBI browser
(MAP), the UC Santa Cruz browser (UCSC), and the Ensembl browser (e!). From here you can
go to the UCSC database. (Use only the UCSC database for actin; the Ensembl site is confused
about this gene.) Choose the RefSeq gene for the browser to display. This is a very powerful
browser that can display an amazing amount of information about the structure of a gene. To
customize the display, go to the lower part of the page and set all the
tracks to hide, then turn on the following (and whatever else might interest you!):
Base Position = On
Chromosome Band = dense
Map Contigs = full
Gap = pack
Known Genes = pack
RefSeq Genes = pack
MGC Genes = pack (these are IMAGE clones that are available through a number of suppliers)
Human mRNAs = squish
Notice that not all of the ACTG1 Known Genes are full length. Sometimes these represent
splice variants; sometimes these are just incomplete clones or partial sequences. Zoom in and
out from the top bar to see other genes in the region.
Try zooming out using the buttons at the top of the screen. How far away from actin is the first
Gap? This is a gap in the contigs that define the region. For some reason there are no
overlapping clones to link the two assemblies together.
Add the STS markers to the display by clicking on the STS track in the lower part of the display
and choosing pack. The genetic markers are shown in blue. What is the closest marker? Is it in
the same assembly as the actin gene? (You will need to zoom out to find these.)
Remember that this is a browser that is accessing much of the same information as the NCBI
browser. You can try clicking on some of the links to see where they take you. Click on the
RefSeq Gene. The most interesting link here is the mRNA/Genomic Alignment so click on that.
What you see first is the cDNA sequence with the 5’ UTR and 3’ UTR in red and the coding
sequence in blue. The locations of the introns are shown in yellow in the UTRs and in teal in the
coding region. If you click on the Human.chr17 link on the left of the page you can see the
genomic region surrounding the exons. The intron DNA shown here is copied into the RNA transcript
and subsequently spliced out before the mRNA is delivered to the cytoplasm. Review the answers
you gave regarding the number of exons, the occurrence of non-coding exons, the size of the
transcript and the position of the gene in the chromosome.
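One way to check those answers against each other is to work from the exon coordinates. The sketch
below uses three hypothetical exons (not taken from either gene) to derive the intron positions,
the spliced mRNA length, and the length of the unspliced transcript. The function names are
assumptions, and the coordinates are treated as 1-based and inclusive.

```python
def exon_lengths(exons):
    """Length of each exon given (start, end) pairs, 1-based and inclusive."""
    return [end - start + 1 for start, end in exons]


def intron_intervals(exons):
    """Intervals lying between consecutive exons, in chromosome order."""
    ordered = sorted(exons)
    return [
        (previous_end + 1, next_start - 1)
        for (_, previous_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


# Hypothetical exon coordinates for illustration only.
exons = [(1001, 1100), (1501, 1700), (2201, 2500)]

mrna_length = sum(exon_lengths(exons))                 # exons only
introns = intron_intervals(exons)                      # what splicing removes
intron_total = sum(end - start + 1 for start, end in introns)
primary_length = max(e for _, e in exons) - min(s for s, _ in exons) + 1

print("exons:", len(exons), "introns:", introns)
print("spliced mRNA:", mrna_length, "unspliced transcript:", primary_length)
```

In this example the exon and intron lengths add up to the unspliced transcript length, which is
the relationship to check on your worksheet.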
Back at Locus Link there is one more database you should be aware of. There is currently a
good deal of interest in the genetic variation between two individuals and how that variation
might affect a person’s susceptibility to disease. One of the current thrusts of the HGP is to
develop a haplotype map for human populations. The initial stages have begun and information
about variation within a gene can be accessed. From the Locus Link site click on VAR. Here
you will find a graphic of the genomic region. You can mouse over the various SNP sites in the
graphic to see the heterozygosity of each. The color coding tells you what region of the
gene the variation is in. Below is information about each SNP and a link to the sequence. Most
of the SNPs are not yet verified and may exist at very low heterozygosity levels, so it is
important to have this information if one wants to distinguish between alleles of a gene.
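To get a feel for what the heterozygosity numbers mean, it can help to compute them from allele
frequencies. The sketch below uses the standard expected heterozygosity, one minus the sum of the
squared allele frequencies, and then counts SNPs above a threshold as the worksheet asks. The SNP
records and function names are invented for illustration, and the database may estimate its
reported values differently.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def count_snps(snps, region, min_het):
    """Count SNPs in a region whose heterozygosity is above min_het."""
    return sum(
        1
        for snp in snps
        if snp["region"] == region
        and expected_heterozygosity(snp["freqs"]) > min_het
    )


# Invented SNP records for illustration only.
snps = [
    {"id": "snp_A", "region": "coding", "freqs": (0.5, 0.5)},
    {"id": "snp_B", "region": "coding", "freqs": (0.99, 0.01)},
    {"id": "snp_C", "region": "intron", "freqs": (0.9, 0.1)},
    {"id": "snp_D", "region": "intron", "freqs": (0.999, 0.001)},
]

print("coding SNPs with heterozygosity > 0.10:", count_snps(snps, "coding", 0.10))
print("intron SNPs with heterozygosity > 0.01:", count_snps(snps, "intron", 0.01))
```

A SNP with two equally common alleles has the highest possible heterozygosity for two alleles
(0.5), while a SNP whose minor allele is rare has a value close to zero.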
At Locus Link one can access all three of the genome browsers. Please take a few minutes to
explore the graphics and interfaces in the other browsers.
What differences between these two genes did you notice while you were
gathering information?
WORKSHEET
NAME:
γ-Actin
Myosin 15
GENE
Official Gene name
Gene ID
Chromosomal location
Base position
Begin
End
What is the nearest gene?
TRANSCRIPT
Size of transcript
Number of exons
How many are not coding exons?
Direction of transcription?
Is the nearby gene transcribed in the same direction?
What is the Reference mRNA number?
How big is the protein?
FUNCTION
What is the Reference Protein number?
Does this gene belong to a gene family?
Name some other family members.
What is the function of the gene?
Which disease(s) are associated with mutations in the gene?
What is the OMIM number?
OTHER SPECIES INFORMATION
Homology to mouse?
Protein similarity-%
What are the common protein domains?
Is genome structure similar?
# of exons?
(check out nearby genes too)
Homology to Drosophila?
Protein similarity-%
Homology to Yeast?
Protein similarity-%
GENE MAPPING
What is the name of the nearest genetic marker?
Where is the nearest gap in the contig?
Is the marker in the same contig as the gene?
VARIATION INFORMATION
How many coding seq. SNPs with heterozygosity > 0.10?
Are any amino acids changed?
How many intron seq. SNPs with heterozygosity > 0.01?
Interesting differences??