MMG 433 April 15, 2004
THE HUMAN GENOME PROJECT
KAREN FRIDERICI ([email protected])

This exercise is meant to acquaint you with a small sample of the resources available through the Human Genome Project. Please keep in mind that these powerful resources are constantly changing and expanding, so the look and the links can change from year to year. In this session we will examine two rather different human genes and learn how to find mRNA and protein sequence, functional information, disease information, homology to other species, and human variation.

There are a number of ways to find genes and gene information in the various databases and browsers that are available. Three browsers currently use the databases supported by the Human Genome Project. Each browser has a different look and different strengths in the way it presents the information.

The genes we will be examining are γ-actin and myosin 15. These genes interact with each other but are very different in size and in other physical and genomic features.

The easiest entry into the databases is by way of the NCBI home page. Begin by entering the name of the protein you wish to study in the search bar.

Search Entrez For: gamma actin or myosin 15

This will bring you to a collection of databases that are available for you to explore. We will start by finding out more about the gene and its structure. Click on Gene: gene centered information. Find the Homo sapiens entry and click on that. This takes you to a page with a lot of important information. You should be able to fill in a large segment of the worksheet from this page.

Official Gene Name: A committee assigns official names for genes. When a new gene is identified and a function is suggested, the investigator can apply for a name. Human genes are named with capital letters; mouse genes have the same letters, but only the first letter is uppercase and the rest are lowercase.
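The capitalization convention just described can be illustrated with a short sketch. The function names below are hypothetical, and the code encodes only the simple rule stated in this handout (human symbols in capitals, mouse symbols with only the first letter uppercase); real nomenclature may have exceptions that this does not handle.

```python
def human_style_symbol(symbol: str) -> str:
    """Write a gene symbol the way human genes are written: all capitals."""
    return symbol.upper()


def mouse_style_symbol(symbol: str) -> str:
    """Write a gene symbol the way mouse genes are written:
    first letter uppercase, remaining characters lowercase."""
    return symbol[:1].upper() + symbol[1:].lower()


if __name__ == "__main__":
    # ACTG1 is the gamma-actin symbol used later in this exercise.
    print(human_style_symbol("actg1"))  # ACTG1
    print(mouse_style_symbol("ACTG1"))  # Actg1
```

A helper like this is only a convenience for spotting which species a symbol probably refers to; always confirm with the Gene ID.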
As you make your way through the various databases you will find that the search engines make mistakes and link to the wrong genes. This is because many genes are very similar to each other. To make sure that you don’t get sidetracked, pay attention to the reference numbers as well as the official names, and know what other similar genes might be present in the genomes. It is therefore a good idea to record the Gene ID, the chromosomal location, and the base position on the chromosome. It is also useful to know what the genomic region looks like, specifically what other genes are nearby, so find the nearest gene.

Now it is time to get some information about the mRNA and the protein. From the beginning and end positions on the chromosome one can determine the size of the primary transcript (before splicing). A graphic on the Entrez Gene page shows the exon structure. You can count the exons and tell which, if any, are non-coding. The direction of transcription is also marked on the graphic (to the right is toward the q telomere and is called forward; to the left is toward the p telomere and is called reverse).

Reference sequences have been designated by the HGP to standardize the literature by allowing authors to reference the same sequences. These are individually curated and receive an accession number starting with N: NP=protein, NM=mRNA, NC=chromosome (contigs are numbered NT). If multiple splice forms are known, there will be multiple NM numbers.

Now let’s consider the function of the gene you are studying. There are many ways to find out the function, but one of the easiest for getting started is to read the summary provided. This will usually tell you a bit about the function and a bit about other members of the gene family. For more complete information there are always the PubMed links.
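The arithmetic behind the transcript-size and direction questions can be sketched as follows. This is a minimal illustration, not an NCBI tool: the class and function names are hypothetical, the coordinates and accession numbers are made-up placeholders, and only the NM and NP prefixes are classified. Note that the span from begin to end includes introns, so it is an upper bound on the length of the spliced mRNA.

```python
from dataclasses import dataclass


@dataclass
class GeneSpan:
    begin: int   # first base position of the gene on the chromosome (inclusive)
    end: int     # last base position of the gene on the chromosome (inclusive)
    strand: str  # "+" or "-" as drawn on the gene graphic

    def length(self) -> int:
        """Number of bases from begin to end, counting both endpoints."""
        return abs(self.end - self.begin) + 1

    def direction(self) -> str:
        """Forward points toward the q telomere, reverse toward the p telomere."""
        return "forward" if self.strand == "+" else "reverse"


# Only the two prefixes used most in this exercise are listed here.
REFSEQ_KINDS = {"NM": "mRNA", "NP": "protein"}


def refseq_kind(accession: str) -> str:
    """Classify a reference sequence accession by the prefix before the underscore."""
    prefix = accession.split("_", 1)[0].upper()
    return REFSEQ_KINDS.get(prefix, "other")


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    gene = GeneSpan(begin=1_000_001, end=1_003_000, strand="-")
    print(gene.length(), gene.direction())
    print(refseq_kind("NM_000000"), refseq_kind("NP_000000"))
```

Substitute the begin and end positions you recorded on the worksheet to check your own answers.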
Diseases caused by mutations in genes tell us a good deal about the function of a gene, and there is one major database that keeps track of genetic diseases. OMIM (Online Mendelian Inheritance in Man) is a good source of both disease and gene information. At the bottom of the page are the UniGene links and the MIM number. The MIM link will tell you more about a disease caused by mutations in γ-actin.

To explore other databases and the major genome browsers, it is easy to link through UniGene (at the very bottom of the page). If several UniGene entries are listed, be sure the one you pull up is really for the gene we are studying and not for some closely related gene. At UniGene you can find the protein similarity of the gene in different species. Again, use caution in interpreting the similarity, because the genes chosen for comparison in other species are not always the orthologs. (Look carefully in the case of myosin.)

One of my favorite ways to get to the various browsers and other links is to use Locus Link. Click on Locus Link. You will see that some of the information is duplicated from the other sites. Look first at the homology information for the gene by clicking on Homol (HomoloGene). Examine the mouse (M. musculus) protein graphic to look for protein domains, and click on the gene to look at its structure.

Back at Locus Link, compare the browsers. The three major browsers are the NCBI browser (MAP), the University of California, Santa Cruz browser (UCSC), and the Ensembl browser (e!). From here you can go to the UCSC database. (Use only the UCSC database for actin; the Ensembl site is confused about this gene.) Choose the RefSeq gene for the browser to display. This is a very powerful browser that can display an amazing amount of information about the structure of a gene. To customize the information, go to the lower part of the page, set all the tracks to hide, and then turn on the following (and whatever else might interest you!):
Base Position = On
Chromosome Band = dense
Map Contigs = full
Gap = pack
Known Genes = pack
RefSeq Genes = pack
MGC Genes = pack (These are IMAGE clones that are available through a number of suppliers)
Human mRNAs = squish

Notice that not all of the ACTG1 Known Genes are full length. Sometimes these represent splice variants; sometimes they are just incomplete clones or partial sequences. Zoom in and out using the buttons at the top of the screen to see other genes in the region. How far away from actin is the first Gap? This is a gap in the contigs that define the region: for some reason there are no overlapping clones to link the two assemblies together.

Add the STS markers to the display by clicking on the STS track in the lower part of the display and choosing pack. The genetic markers are shown in blue. What is the closest marker? Is it in the same assembly as the actin gene? (You will need to zoom out to find these.)

Remember that this browser accesses much of the same information as the NCBI browser. You can try clicking on some of the links to see where they take you. Click on the RefSeq Gene. The most interesting link here is the mRNA/Genomic Alignment, so click on that. What you see first is the cDNA sequence, with the 5' UTR and 3' UTR in red and the coding sequence in blue. The locations of the introns are shown in yellow in the UTRs and in teal in the coding region. If you click on the Human.chr17 link on the left of the page you can see the genomic region surrounding the exons. This intronic DNA is copied into the RNA transcript and subsequently spliced out before the mRNA is delivered to the cytoplasm. Review the answers you gave regarding the number of exons, the occurrence of non-coding exons, the size of the transcript, and the position of the gene in the chromosome.

Back at Locus Link there is one more database you should be aware of.
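The gap and marker questions above come down to comparing base positions. The sketch below shows one way to do that arithmetic once you have read the coordinates off the browser. All names and coordinates are hypothetical placeholders, not real ACTG1 data, and treating "same assembly" as "no gap lies between the gene and the marker" is an assumption made for this illustration.

```python
from typing import List, Tuple

Feature = Tuple[str, int, int]  # (name, start, end) on a single chromosome


def interval_distance(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Coordinate distance between two intervals; 0 if they overlap."""
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0


def nearest_feature(gene: Feature, features: List[Feature]) -> Feature:
    """Return the feature closest to the gene."""
    return min(
        features,
        key=lambda f: interval_distance(gene[1], gene[2], f[1], f[2]),
    )


def separated_by_gap(gene: Feature, feature: Feature, gaps: List[Feature]) -> bool:
    """True if any gap lies entirely between the gene and the feature."""
    left_end = min(gene[2], feature[2])
    right_start = max(gene[1], feature[1])
    return any(left_end < g[1] and g[2] < right_start for g in gaps)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    gene = ("GENE_X", 5_000, 8_000)
    markers = [("marker_A", 1_000, 1_200), ("marker_B", 9_500, 9_700),
               ("marker_C", 30_000, 30_100)]
    gaps = [("gap_1", 20_000, 25_000)]

    closest = nearest_feature(gene, markers)
    print("closest marker:", closest[0])
    print("same assembly:", not separated_by_gap(gene, closest, gaps))
    first_gap = nearest_feature(gene, gaps)
    print("distance to gap:",
          interval_distance(gene[1], gene[2], first_gap[1], first_gap[2]))
```

Replace the placeholder tuples with the positions shown in the browser for the gene, the STS markers, and the Gap track.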
There is currently a good deal of interest in the genetic variation between individuals and in how that variation might affect a person’s susceptibility to disease. One of the current thrusts of the HGP is to develop a haplotype map for human populations. The initial stages have begun, and information about variation within a gene can be accessed. From the Locus Link site click on VAR. Here you will find a graphic of the genomic region. You can mouse over the various SNP sites in the graphic and it will give you the heterozygosity. The color coding tells you what region of the gene the variation is in. Below is information about each SNP and a link to the sequence. Most of the SNPs are not yet verified and may exist at very low heterozygosity levels, so it is important to have this information if one wants to distinguish between alleles of a gene.

At Locus Link one can access all three of the genome browsers. Please take a few minutes to explore the graphics and interfaces in the other browsers.

What differences between these two genes did you notice while you were gathering information?

MMG 433 WORKSHEET April 15, 2004
NAME:

                                        γ-Actin        Myosin VII
GENE
  Official gene name
  Gene ID
  Chromosomal location
  Base position: Begin
  Base position: End
  What is the nearest gene?
TRANSCRIPT
  Size of transcript
  Number of exons
  How many are not coding exons?
  Direction of transcription?
  Is the nearby gene transcribed in the same direction?
  What is the Reference mRNA number?
  How big is the protein?
FUNCTION
  What is the Reference Protein number?
  Does this gene belong to a gene family?
  Name some other family members.
  What is the function of the gene?
  Which disease(s) are associated with mutations in the gene?
  What is the OMIM number?
OTHER SPECIES INFORMATION
  Homology to mouse?
  Protein similarity (%)
  What are the common protein domains?
  Is genome structure similar? # of exons? (check out nearby genes too)
  Homology to Drosophila?
  Protein similarity (%)
  Homology to Yeast?
  Protein similarity (%)
GENE MAPPING
  What is the name of the nearest genetic marker?
  Where is the nearest gap in the contig?
  Is the marker in the same contig as the gene?
VARIATION INFORMATION
  How many coding-sequence SNPs have heterozygosity > 0.10?
  Are any amino acids changed?
  How many intron-sequence SNPs have heterozygosity > 0.01?
  Interesting differences?
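The variation questions on the worksheet amount to counting SNPs by region and heterozygosity threshold. The following is a minimal sketch of that tally, assuming you have copied the SNP table into a list by hand. The record layout, the function names, and every SNP value shown are hypothetical. The helper `expected_heterozygosity` uses the standard 2p(1 - p) expectation for a two-allele site, which is background knowledge rather than something stated in the exercise.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snp:
    snp_id: str
    region: str                 # e.g. "coding" or "intron"
    heterozygosity: float       # value reported for the SNP
    changes_amino_acid: bool = False


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity for a two-allele site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def select_snps(snps: List[Snp], region: str, min_het: float) -> List[Snp]:
    """SNPs in the given region whose heterozygosity is strictly above min_het."""
    return [s for s in snps if s.region == region and s.heterozygosity > min_het]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    snps = [
        Snp("snp_1", "coding", 0.25, changes_amino_acid=True),
        Snp("snp_2", "coding", 0.05),
        Snp("snp_3", "intron", 0.02),
        Snp("snp_4", "intron", 0.005),
        Snp("snp_5", "intron", 0.40),
    ]
    coding = select_snps(snps, "coding", 0.10)
    intron = select_snps(snps, "intron", 0.01)
    print("coding SNPs with heterozygosity > 0.10:", len(coding))
    print("any amino acid changes:", any(s.changes_amino_acid for s in coding))
    print("intron SNPs with heterozygosity > 0.01:", len(intron))
```

The thresholds match the worksheet questions; change them if you want to explore how many SNPs survive a stricter cutoff.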