Download What is bioinformatics? - The British Association of Sport and
Molecular Exercise Physiology
Bioinformatics
Presentation 5
Henning Wackerhage

Learning outcomes
At the end of this presentation, you should be able to:
• Find information on any DNA sequence, gene, RNA or protein in various species online. This information includes the position of genes in the genome, the function of the proteins and their role in disease.
• Carry out BLAST searches to identify homologous sequences.
• Explain the cause of genetic variability.
• Explain how a microarray experiment is carried out.
This presentation will be supported by a computer practical in bioinformatics. Please revise this presentation carefully before the practical; otherwise you will struggle.

Bioinformatics Part 1
Why study bioinformatics?

Introduction
The human genome and many other genomes have now been sequenced, and the data have been deposited online. In addition, there is a wealth of information on genes and their products on networked computers, and numerous programmes exist that allow you to analyse this data. Most of the data is freely accessible online via user-friendly computer programmes. It is easy to download the DNA sequence of any gene that might respond to exercise or to find reliable information on a protein that is involved in the response to exercise.
In this presentation, you will learn how to find, use and analyse this information. You will mainly learn by doing, and you will sometimes need to be stubborn and click numerous buttons by trial and error to finally get the information you want. It is not rocket science, but it will require stamina and patience at times!

What is bioinformatics?
NIH bioinformatics definition: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyse, or visualise such data.

Why study bioinformatics?
Why should a sports biomedicist study bioinformatics?
The DNA encodes all the information necessary for letting cells develop into a functioning organism. The DNA thus also encodes all the organs involved in exercise and their adaptive response to exercise. In addition, the differences in the DNA between two individuals encode the differences in the structure and function of the two organisms; this includes differences in muscle size, in adaptation to training or in the motor regions of the nervous system. Therefore, bioinformatics will help us, among other things, to:
a) Identify differences in the DNA sequence (i.e. single nucleotide polymorphisms) between individuals that correlate with athletic talent or the extent of adaptation to exercise;
b) Discover the regulatory mechanisms that mediate the adaptation to exercise;
c) Interpret the results of microarray experiments, where the expression of thousands of genes is measured in response to exercise.

Bioinformatics Part 2
Genome viewing

Genomes online
The genomes of many prokaryote, eukaryote, plant, invertebrate and vertebrate model species have now been sequenced, and the DNA sequences of these genomes have been posted online. However, these websites contain much more than just the “naked” DNA sequence, which is of limited use. With the help of special computer algorithms, genes (exons, introns) have been identified, both from available research information and by de novo prediction. Identified genes have been linked to various other sites, including those that list information on the same gene in other species, the gene product (protein databases), PubMed, disease databases etc. Genome browsers are therefore powerful tools not only for the specialist but also for the essay-writing student.
The following slide shows a website with an incomplete tree of sequenced genomes, and the slide after that shows the information available in genome browsers.
Genomes online (incomplete)
http://www.ncbi.nlm.nih.gov/mapview

Genomes online
[Figure: resources linked from the genome websites: Online Mendelian Inheritance in Man (OMIM); PubMed (reference search); full-text electronic journals; nucleotide sequences; 3D structures; maps and genomes; protein sequences (MQKLQLCVY …); taxonomy.]

Genomes online
By far the largest project was the Human Genome Project, the sequencing of our own DNA. Some findings are surprising:
• Human genome size: about 3,200 Mb (megabases).
• Estimated gene numbers: human: 31,000; yeast: 6,000; fly: 13,000; worm: 18,000; plant: 26,000.
• Only 1.1 to 1.4 % of the human sequence encodes protein. The rest is non-coding.
• 28 % of the sequence is transcribed into RNA (5 % of this is translated into proteins).
• Only 94 of 1,278 protein families are specific to vertebrates.
• Why do we differ? Humans differ from one another by about one base pair per thousand: single nucleotide polymorphisms (SNPs).

Human genome project landmarks
1953 Watson-Crick structure of DNA published
1975 F. Sanger, and independently A. Maxam and W. Gilbert, develop methods for sequencing DNA
1981 Human mitochondrial DNA sequenced: 16,569 base pairs
1990 International Human Genome Project launched (target horizon: 15 years)
1991 J.C. Venter and colleagues identify active genes via expressed sequence tags (ESTs)
2000 Joint announcement of the working draft sequence of the human genome
2003 Completion of the Human Genome Project

Major genome browsers
You can browse genome data using one of the following browsers. We will mainly use Ensembl, the European and user-friendly version:
www.ensembl.org
www.ncbi.nlm.nih.gov
http://genome.cse.ucsc.edu/
Task: Visit each of these websites, click through them and see what information you can obtain. We will mainly use the Ensembl website.

Searching for gene information
Browsing the genome browsers and clicking on chromosomes is pretty simple. However, most of the time you will search for a specific gene whose genomic location you do not know.
In these cases, you will have to use a search engine and type in the name of the gene or protein. To do so, open the Ensembl website (www.ensembl.org) and click the species, normally human. At the top of the page it states “Search for anything with”, followed by a box where you type in your search term. Click “Lookup” and you will obtain results.
Worked example: Type in “malate dehydrogenase” and click “Lookup”. Many items will be listed, starting with “9 matches in the homo sapiens disease index”. However, you are interested in the gene. Therefore scroll down until you see “170 matches in the Homo sapiens Gene index”. The first entry under this heading is “Malate dehydrogenase, cytoplasmic (EC 1.1.1.37)”. Other isoforms of this enzyme are listed as well, and you might now have to find out which isoform you are interested in.

Searching for gene information
On the Ensembl human genome website, enter “troponin” into the search box. Find the following gene among the search results: Troponin C, skeletal muscle. Click it and a page with numerous clickable links will appear.
Task 1: Click “Export gene data in EMBL, GenBank or FASTA”. Scroll down, select the output format “text” and click “export”. The DNA sequence of the gene will appear. You can now analyse the sequence or design primers for the polymerase chain reaction (PCR).
Task 2: Return to the “Troponin C, skeletal muscle” page. Now click “MIM” (or OMIM). It stands for “(Online) Mendelian Inheritance in Man”. Read the paragraph. It will inform you about research on the gene. The text on troponin is very short compared to other texts, e.g. those on major disease genes.

Searching for gene information
Task 3: Click “LocusLink”. The following bar will appear:
Click on each window and find the following information:
a) Who has carried out a structural analysis of the human troponin C gene?
b) There is an ion binding motif on the molecule. For what ion?
c) Name a gene that is a neighbour on the chromosome.
d) What is the percent homology (similarity of the DNA sequence) between the human and rat troponin C genes?

Bioinformatics Part 3
Genetic variability

Genetic variation
By now, you may have asked yourself the following question: “How can they list one human genome sequence if we are all different? Surely, our genomes will be different?” Good question and yes, we are different. We differ because of nature and nurture, and the nature bit is due to differences in the DNA between human beings. Most of these differences in the DNA sequence do not occur at random but at fixed positions, on average about one every 1,300 base pairs (bp). They are called single nucleotide polymorphisms (SNPs, pronounced “snips”). There are roughly 2,500,000 SNPs in the human genome.
Variation in the human species: mainly the result of SNPs.

Genetic variation
Worked example: I have used Ensembl and have picked the following SNP. During sequencing (each sequence is sequenced several times), the investigators noted that one base pair was sometimes read as an adenine (A) and sometimes as a thymine (T). The ambiguity code W was used to indicate this in the final sequence:
Alleles: A|T (ambiguity code: W)
Sequence region: CACAACTGCTTGGAWAAAACAGGATAG
SNPs are not the only source of genetic variation. Here is an example of a deletion/insertion mutation, in which some bases are missing from one version of the sequence (dashes mark the missing bases):
Deletion:  TCAAGGTATTCTTCA------AAAAGGTCCCAACCC
Insertion: TCAAGGTATTCTTCAGATTCTAAAAGGTCCCAACCC

Genetic variation
Do all SNPs lead to a change in phenotype? No! Remember that less than 2 % of human DNA encodes proteins and that a lot of DNA is non-coding or intergenic DNA. A SNP or deletion in a DNA sequence with “no” function will probably not have a noticeable effect.
Which of the following SNPs (1-5) are likely to cause a change in the expression or structure of the protein encoded by the gene?
[Figure: schematic of a gene on the DNA, showing enhancer, promoter, start, exon, intron, exon and termination, with the positions of SNPs 1 to 5 marked.]

Genetic variation
Answer:
SNP1. This SNP could affect the binding of transcription factors to the enhancer and thus the expression of the gene.
SNP2. This SNP lies in a non-functional region and will probably have no effect. It could affect histone binding, though!
SNP3. This SNP could affect the binding of the transcriptional machinery (esp. RNA polymerase II) to the promoter.
SNP4. This SNP lies in an exon, within a triplet that codes for an amino acid. However, it will only have an effect if the changed triplet encodes a different amino acid (e.g. AGA and AGG both encode arginine).
SNP5. This SNP lies in an intron, is spliced out and is therefore unlikely to have an effect.

Find a SNP!
Worked example: Find SNPs that lie in the exons of the myostatin gene, whose protein product is a potent muscle growth inhibitor. First search for “myostatin”. There is another abbreviation for myostatin, which is GDF-8. Click “view gene in genomic location”. Lower on the page you will find a features menu. Open it, tick the SNPs box and close the menu again. The following window opens and you see the coding, untranslated (UTR) and intronic SNPs. You can additionally open “human proteins” or “EMBL mRNAs” to see where the myostatin gene lies. There are two SNPs in the myostatin exon.

Find a SNP!
Worked example: If you click on a SNP, a new window appears. You will find the SNP in the genomic sequence GTAARGGCC, where R stands for an A|G polymorphism. You will also find the following figure:
[Figure: myostatin (GDF8) gene (3 exons shown in dark red) with coding SNPs carrying the R (A|G) ambiguity code.]

Find a SNP!
Task: How many SNPs do you find in the exons and introns of the human histidine decarboxylase (EC 4.1.1.22) gene? By the way, what does the EC number stand for?

How to detect genetic variation?
So far, studies investigating the relation between genetic variation and e.g.
disease have focussed on dramatic mutations, such as frameshift and deletion/insertion mutations, rather than on the more subtle SNPs. Larger mutations are easier to detect and their effects are usually more dramatic.

How to detect genetic variation?
Method: DNA can be obtained from nucleated blood cells. The DNA region of interest is then amplified using the polymerase chain reaction with so-called primers that will only amplify a specific DNA sequence. Here, a DNA fragment of the angiotensin-converting enzyme (ACE) gene, carrying either a deletion (D) or an insertion (I) mutation, has been amplified and separated by electrophoresis. Angiotensin II is a known inducer of cardiac hypertrophy. Because we have two copies of each gene, the combinations DD, ID or II are possible. In this study, DD patients had a larger left ventricle (heart) than ID and II patients (figure from Lechin et al. 1995).

Genetic variation and performance
Figure. Montgomery et al. (1998) determined the genotype of the angiotensin-converting enzyme gene, where an insertion/deletion mutation exists. The left figure shows the PCR results for the three genotypes DD, ID and II (taken from Lechin et al. 1995). The right figure shows the relation between the genotype and the increase in repetitive elbow flexion in response to a specific 10-week training programme among British army recruits. The data suggest that a DD genotype is associated with low, an ID genotype with medium and an II genotype with high trainability for this specific task.

Actinin genotype and performance
Actinin (ACTN) is an actin-binding protein, and the two isoforms ACTN2 and ACTN3 are found in skeletal muscle. Yang et al. (2003) reported an association of the ACTN3-RR and ACTN3-RX genotypes with power athletes (these athletes have more ACTN3).
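
The SNP examples in this part can be reproduced in a few lines of Python. The following is a minimal sketch, not a replacement for Ensembl: it expands a sequence containing two-base IUPAC ambiguity codes (such as W for A|T or R for A|G) into its concrete alleles, and it checks whether two DNA triplets encode the same amino acid, as in the SNP4 answer. The function names expand_snp and is_synonymous are hypothetical, and the sketch assumes coding-strand DNA triplets (T rather than U) and the standard genetic code.

```python
from itertools import product

# Two-base IUPAC nucleotide ambiguity codes (subset used for SNPs).
IUPAC = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}

# Standard genetic code, written in the usual T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def expand_snp(region):
    """Return every concrete sequence encoded by a region with ambiguity codes."""
    variants = [""]
    for base in region.upper():
        choices = IUPAC.get(base, base)  # plain bases stand for themselves
        variants = [v + c for v in variants for c in choices]
    return variants


def is_synonymous(codon_a, codon_b):
    """True if both DNA triplets encode the same amino acid."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


if __name__ == "__main__":
    # The W (A|T) SNP from the worked example.
    for allele in expand_snp("CACAACTGCTTGGAWAAAACAGGATAG"):
        print(allele)
    # AGA and AGG both encode arginine, so this change is silent.
    print(is_synonymous("AGA", "AGG"))
```

Whether a coding SNP is synonymous also depends on the reading frame of the exon, which this sketch leaves to the user: the triplets passed to is_synonymous must already be in frame.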
Bioinformatics Part 4 Homology searches Homologies Worked example: You have sequenced the following human DNA fragment and you want to know more about it: AAAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTGAAACATCTTGGGCAATGGAGGGTTAACTT CTCAAAGTTTAATAGGCAAGACCAGCAACCATGCAACAAGGTAAATTGTCCTCACGAGAACTCCAAAGA CTATTTTTCTCTCTCTTTTTTTGAGGCAGGGTCTCGCTATGTTACCCAGGCTGCTCTCGAACTCTTGGG CTCAAGCAATCCCCCCATCTTAACCTCCCCAGCAGCTGGGACTACAGCCACGCGCCACTGCACCCAGCT GACTTTTCCTTCTAAGCATCTTTGGCTGGGCGTGGTGGCTCATGCCTGTAATCCCTGCACTTTGGGAGG CCAAGGTGGGTAGATCACTGGAGGTCAGGAGTTCTAGACCAGCCTGGCCAACATGGTGAAACCTCATCT CTACTAAAAATACAAAAAAATTAGCTGGGCATGGTGGCAGGTGCCTGTAATCCTAGCTACTCGGGAGGC TGAAGCAGGAGAATTGCTTGAACCCAGGAGGTAGAGGTTGCAGTGACCCAAGATTGTGCCACTGCACTC CAGCCTGGGTACACAGCGAGTCTGTCTAAAAAAGAAAAAAAAAAAAGGAAGAGAGAGCATCTTTATCTT CATTTTCTAACCTTTAAGTGTTACTTTCTCCCAGTAACATTTTGCCCAGAAAGAGGTGATGAATATAGA TTTAAGAATAAGATTTTCCCCATGTTGCTGCCTTTCCAGAACAAGTGAGTTCATTCTCATTTGTCTTTC TTCAGAAATCTTTTATCTGTCTTTCTCCCATTAGCTGGAATGGGTGCTCCATGAGAATAAAGACTTGGG TTCCATTCTTCCTATTGTCCCCAGAGCCTACATACTGGCTGGCATTGAGTAGCAATTGAACAGTTTTCT GAATGAATGAATGAATGAATGCTCAAATAAGCACATGAATTAATTATCACTTTCCTTTGAATCTCTCCA TTCTTCTTCCTCACCCAATGGGGCTCGATCCTTATACACAGAAGATACTCTATAAATGATGATTCAATG AATGCCAAGCCCTGTTCTATGCACTGAAGACCAAAAGAAATAAAAGACATCATTCCTGCTCTGTAAGAA Homologies Worked example: To do so, you have to carry out a Blast search. Enter: http://www.ensembl.org/Homo_sapiens/blastview Paste the sequence into the large box, select “homo sapiens” as the database to search against and “blastn” for a nucleotide search. “Blastx” does searches for DNA against protein (amino acid sequence), “blastp” for protein against protein. Homologies Worked example: After you have started the search, click retrieve and the programme will display a “view” button. Click the “view” button and the programme will display a list of matches with a score and a % identity. There is one match with 100% identity on chromosome 10 (red arrow on the chromosome). 
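
The “% identity” reported for each match is the share of aligned positions at which both sequences carry the same base. The following is a minimal Python sketch of that calculation, assuming two sequences that are already aligned, have equal length and contain no gaps; a real BLAST search also finds the alignment and computes a score, which this sketch does not attempt. The function names midline and percent_identity are hypothetical, and the two example sequences are the 55-base stretches taken from the mismatch alignment in this worked example.

```python
def midline(seq_a, seq_b):
    """Return the match line of an alignment: '|' for identical bases, 'x' otherwise."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("|" if a == b else "x" for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which both sequences are identical."""
    line = midline(seq_a, seq_b)
    return 100.0 * line.count("|") / len(line)


if __name__ == "__main__":
    query = "CTCATGCCTGTAATCCCTGCACTTTGGGAGGCCAAGGTGGGTAGATCACTGGAGG"
    match = "CTCATGCCTGTAATCCTAGTACTTTGGGAGGCCAAGGTGAGCAGATCACCTGAGG"
    print(query)
    print(midline(query, match))
    print(match)
    print(f"{percent_identity(query, match):.1f} % identity")
```

Comparing a sequence with itself gives 100 % identity, which corresponds to the perfect match on chromosome 10 in the worked example.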
Clicking “[A]” yields a graphical display of the homology:
AAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTGAAACATCTTGGGCAAT
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
AAACATCTATCTTGCTGTGTTTGGACAGGCCAGCCCCTGAAACATCTTGGGCAAT
If there is not 100% homology, then the alignment looks as follows:
CTCATGCCTGTAATCCCTGCACTTTGGGAGGCCAAGGTGGGTAGATCACTGGAGG
||||||||||||||||xx|x|||||||||||||||||||x|x|||||||xx||||
CTCATGCCTGTAATCCTAGTACTTTGGGAGGCCAAGGTGAGCAGATCACCTGAGG
Each “x” indicates a difference between the two sequences.

Homologies
Task: I have selected a mouse DNA sequence and your task is to see whether there is a homologous human sequence.
GTGTCTTGCACAGTAATAGACCGCAGAGTCCTCAGATGTCAGGCTGCTGAGCTGCATGTA
GGCTGTGCTGGAGGATGTGTCTACAGTCAATGTGGCCTTGCCCTTGAACTTTTGATTGTA
GTTAGTATAGCTATCAGAAGGATCAATCTCTCCGATCCACTCAAGGCCCTGTCCAGGCCT
CTGTTTTACCCACTGCATCCAGTAGCTGGTGAAGGTGTAGCCAGAAGCCTTGCAGGACAG
CTTCACTGAAGCCCCAGGCTTCACAAGCTCAGCCCCAGGCTGCTGCAGTTGGACCTGAGA
GTGGACACCTGTGGAGAGAAAGGCAGAGTGGATGTCATTGTCACTCAAGTGTATGGCCAG
ACATCGAGCCTGCTACTGTGAGCCCCTTACCTGTAGCTGTTGCTACCAAGAAGAGGATGA
TACAGCTCCATCCCATGGCGAGGTCCTGTGTGCTCAGTAACTGTAAAGAGAACAGTGATC
TCATGTTTTTCTGTGTGTGGTATAGACAACCCTATATTTACCATGTAGACTCACAGGATT
TGCATATTCATGAGCAGGATACATATTAGATGAGCACCTACTCCTGCAGGAGAAGAAGAG
ACACCTGGGTCAGGAATCAGGATGCTGAAACCCAAGTCATAGTCTTGTCTGAGGTAATTC
ATCCCATACCTCATCCCTGAACCTTGTGTTGAGGCTATGGATGTAACATTATAGCCTGTG
CACTAAAAAGATTTGCATCCTGAGACAGTGGCCCCACTTGTGACACAGTTGACAGATGGA

Bioinformatics Part 5
Microarrays

Microarrays
Microarrays or biochips are a technique increasingly used by leading research groups in exercise physiology. Microarrays are used to compare the mRNA levels in two samples, e.g. control (no exercise) versus exercise. Importantly, this comparison is done for nearly all mRNAs that can be found in a tissue (e.g. all genes expressed in skeletal muscle).

Microarrays
The method works by printing thousands of DNA dots, which correspond to the genes of the organism, onto a slide.
The experimenter then converts the mRNA into DNA that is labelled with a fluorescent marker, usually green for the control sample and red for the experimental sample. The labelled control and exercise samples are allowed to hybridise with (stick to) the complementary DNA that is printed onto the slide. If a dot appears green, then there was more control mRNA in the sample (the mRNA goes down during exercise). If a dot is red, then the mRNA went up in response to exercise. Yellow dots mean that the amount of mRNA was roughly equal in the control and exercise samples; i.e. the gene’s expression is not affected by exercise. No fluorescence indicates that the gene is not expressed in muscle (e.g. a brain gene). The following slide shows this schematically.

Microarrays
[Figure: workflow, in order: normal mRNA and disease mRNA; RT/PCR; label with fluorescent dye; labelled DNA from mRNA; combine equal amounts; hybridise probe to microarray; scan; informatics (image processing, DBMS, WWW, bioinformatics, data mining and visualization).]

Microarray example
[Figure: example microarray with four kinds of dot: no mRNA; mRNA only expressed in control; mRNA only expressed in response to disease/exercise; expression in both control and disease/exercise.]

Microarray analysis
Microarray experiments usually show the differential expression of hundreds or thousands of genes.
Task: Assume the following two genes are expressed at higher levels in response to 1 h of cycling exercise.
HSPD13982_i_at Cathepsin D (lysosomal aspartyl protease)
NM_006457_r_at LIM protein (similar to rat protein kinase C-binding enigma)
a) What is the function of these genes?
b) Is there any link to exercise (e.g. changes in similar proteins in response to exercise)?

The End
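
As a supplement to the microarray slides, the colour rule described there (green: mRNA goes down with exercise; red: mRNA goes up; yellow: roughly equal; no fluorescence: not expressed) can be sketched in Python. This is only an illustration under stated assumptions: the function name classify_spot, the background cut-off min_signal and the two-fold threshold min_fold are all hypothetical choices, not values from the presentation, and real microarray analysis involves normalisation and statistics that are omitted here.

```python
import math


def classify_spot(control_green, exercise_red, min_signal=100.0, min_fold=2.0):
    """Classify one microarray dot from its green (control) and red (exercise) intensity.

    min_signal and min_fold are assumed, illustrative thresholds.
    """
    # Neither channel above background: the gene is treated as not expressed.
    if control_green < min_signal and exercise_red < min_signal:
        return "none"
    # log2 ratio of exercise to control; +1 avoids taking the log of zero.
    log_ratio = math.log2((exercise_red + 1.0) / (control_green + 1.0))
    limit = math.log2(min_fold)
    if log_ratio >= limit:
        return "red"      # mRNA went up in response to exercise
    if log_ratio <= -limit:
        return "green"    # mRNA went down in response to exercise
    return "yellow"       # roughly equal in both samples


if __name__ == "__main__":
    spots = {"gene A": (2000, 200), "gene B": (200, 2000),
             "gene C": (1000, 1100), "gene D": (10, 20)}
    for name, (green, red) in spots.items():
        print(name, classify_spot(green, red))
```

Using a log ratio makes a two-fold increase and a two-fold decrease symmetric around zero, which is why it is a common way to express such comparisons.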