Do plants have human genes?
Introduction to Gene Mining
Part A: BLASTn-off!

After Part A you will demonstrate your ability to:
• Use the NCBI Gene and BLASTn bioinformatics tools to search for a human gene of interest in a plant model.
• Evaluate the significance of your search results to see how similar human and plant genes might be.

The Arabidopsis Information Portal is funded by a grant from the National Science Foundation (#DBI-1262414) and co-funded by a grant from the Biotechnology and Biological Sciences Research Council (BB/L027151/1). These lessons were developed during the summer of 2015 as education outreach for the www.Araport.org portal in conjunction with the J. Craig Venter Institute, Rockville, MD, 20850, USA.

Contact information
General information: [email protected]
Jason Miller, Grant Co-Principal Investigator, JCVI, [email protected]
This lesson was prepared by Andrea Cobb, Ph.D. ([email protected]) with the help of Margot Goldberg ([email protected]).

The images below are all examples of...?

What science models do you recall?
• Lipid bilayer model
• Lock and key model of enzymes
• Stickleback model of evolution
• Computer models
• Experimental model of osmosis

Why use models instead of the “real thing”?
• To simplify a complex system. Example: Study an enzyme reaction in a test tube rather than in the whole organism, which contains many enzymes.
• To better manipulate and measure an effect. Example: Treat Drosophila with drug X and measure the drug’s effect on Drosophila life span.
• To predict (test the model). Example: Use a computer model to find protein-coding regions in the DNA of a newly sequenced genome.
• Other ideas?

Thanks for volunteering for our study. Your chart says you have problems eating, facial weakness and overall poor muscle tone. Looks like your mother had the same symptoms. Your diagnosis is nemaline myopathy. I am sad to tell you that no known treatment exists, but my researchers and I are working hard to find a treatment.

Thanks for your help, Doctor!
You can find information on this genetic disorder on a website called Online Mendelian Inheritance in Man: http://www.OMIM.org
The OMIM database shows that you might have a mutation in your actin alpha 1 gene. We won’t experiment on you! It is much faster, kinder and less expensive to use a plant model.

Which plant will you use to study a version of my actin alpha 1 (ACTA1) gene?
https://www.youtube.com/watch?v=foHiKrlY9Qc explains why scientists use a certain plant for a model.

https://www.arabidopsis.org/portals/education/aboutarabidopsis.jsp

Can plants really be used as models for studying human diseases?

Xiang Ming Zu and Simon Geir Molier, Current Opinion in Biotechnology, 2011, 22, 300-307.

Before we find out whether plants have human muscle genes, it would be important to know if plants move!
• http://www.bbc.co.uk/programmes/p00lx6cl
• https://www.youtube.com/watch?v=eDA8rmUP5ZM
http://aboutlifting.com/music-helps-plants-grow-and-will-help-muscles-grow/

Why don’t you rest? I am going to search the OMIM database to find out more about your possible gene mutation.
• Use your computer to go to http://www.OMIM.org and learn more about nemaline myopathy and the ACTA1 gene that may be involved.
• After you answer the questions on your handout, search for any human disease that interests you and examine the results.

Use your textbook, open access textbooks, videos and databases to begin to find information about muscle genes and proteins.
https://www.boundless.com/biology/

Usually, a general search engine will give you too many hits for the question below!

108 results. Even a broad scientific database may provide too many unrelated hits! Why are there SO MANY results?
“BIG DATA”
https://en.wikipedia.org/wiki/List_of_RNAs
Biologists are increasingly able to generate enormous amounts of data quickly, but analyzing that data may take weeks or even years. Data transfer protocols are not interchangeable, data storage is expensive, and queries can crash!

What scientific approach finds better information?
• Bioinformatics is an interdisciplinary approach which uses computational, mathematical, and engineering methods to analyze and make discoveries from enormous data sets.

To address the problem of BIG DATA, scientists can share data and analysis with other scientists. This speeds analysis and adds expertise. Scientists can share their data in research-specific portals. These research-specific portals usually have customized bioinformatics tools.

A few examples of how bioinformatics is used...

Use: Basic research
Questions addressed:
• How is DNA organized in chromosomes?
• Are genes related to other genes?
• Given sequence data, how do we find a gene?
• How are genes expressed in response to the environment?

Use: Biomedicine
Questions addressed:
• Will this drug work on this patient?
• Can we cure genetic diseases?
• Which genetic variations are associated with heart disease?
• Which pathogen proteins are best for vaccine development?

Use: Microbiology
Questions addressed:
• Can microbes remove pollution?
• Can microbes decrease the impact of climate change?
• Where did a disease originate?

Use: Agriculture
Questions addressed:
• Can drought resistant plants be identified, bred or engineered?
• Can insect resistant plants improve food supplies?
• Can more healthful food sources be developed?

Scientists are more likely to find useful information in bioinformatics portals that support their particular research.

An example of increasingly more specific research-centered portals:
• National Center for Biotechnology Information: http://www.ncbi.nlm.nih.gov/gene
• Araport: https://www.araport.org/
• FLOR-ID: http://www.phytosystems.ulg.ac.be/florid/

For our plant model to be useful for my research, I must find a similar plant version of the ACTA1 gene involved in nemaline myopathy. Since plants and animals both move, do they use the same types of proteins to move? Do they have the same genes coding for these proteins?

Begin your search on the NCBI portal to find names of human muscle genes. Go to http://www.ncbi.nlm.nih.gov/, enter the information shown, and use the pull-down menu to select Gene. (Note: Araport.org and similar genome browsers will also allow you to search for genes and proteins of interest.)

Could plant and animal versions of this gene have a function in common?

Actin subunits self-assemble to form filaments which have a role in cell structure. Check the “Inner Life of the Cell” video: https://www.youtube.com/watch?v=FzcTgrxMzZk (2:20 until 3:15). This is how your actin should work.
https://www.youtube.com/watch?v=VVgXDW_8O4U is a video showing polymerization of G-actin, a protein similar to alpha actin.

If it is reasonable that plants might have a gene similar to human ACTA1, you will need to find the ACTA1 gene sequence. Click on FASTA to obtain the human ACTA1 gene sequence.

Copy, then paste the ACTA1 gene sequence into a new Word document or the clipboard. We will use this to look for an Arabidopsis thaliana version of this gene. Save the Word document as “human ACTA1 DNA sequence”.

I want to search for a version of the human ACTA1 gene in Arabidopsis thaliana. What bioinformatics tool could I use?
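The FASTA text copied in the steps above has a simple layout: a header line that starts with ">" followed by one or more lines of sequence. The Python sketch below shows one way to read such text back into a single sequence string. The function name and the short example record are made up for illustration; the example is not the real ACTA1 sequence.

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())
    return header, "".join(chunks)


# Hypothetical record for illustration only (NOT the real ACTA1 sequence).
example = """>example_record made-up sequence for illustration
ATGTGCGACG
AAGACGAGAC
"""

header, sequence = parse_fasta(example)
print(header)
print(sequence, len(sequence))
```

Joining the sequence lines matters because the line breaks in a FASTA file are only for display; the query you paste into a search tool is the whole sequence.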
BLAST Types
• BLASTn compares 2 or more DNA sequences.
• BLASTp compares 2 or more protein sequences.
• BLASTX reads a DNA sequence in the 6 possible reading frames, then compares it to a protein sequence database.
• tBLASTX compares 2 or more DNA sequences translated in 6 reading frames.

If I have a known DNA sequence, how can I use BLASTn to look for an unknown similar sequence?
http://www.ncbi.nlm.nih.gov/
There are several ways to access NCBI BLAST. Start at the URL above, then select BLAST. Or just go to the BLAST page URL below.
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Select nucleotide blast.

You found a human gene to compare... Click on FASTA to obtain the human ACTA1 gene sequence.

And you’ve already copied and pasted the ACTA1 gene sequence into a Word document or the clipboard. We will use this to look for an Arabidopsis thaliana version of this gene.

Steps to use BLASTn
1. Paste in your copied ACTA1 sequence.
2. Enter the name of the organism in which we are looking for the same gene (Arabidopsis thaliana).
3. Select the program: use “Somewhat similar sequences” for the broadest search.
4. Check “Show results in a new window”, then click the BLAST button.

What information is provided in an NCBI BLASTn report?
The Graphics Section shows the query sequence in the red bar (green arrow), and aligned sequences are shown in colored tracks below. Each “track” represents a sequence that the BLASTn tool discovered in the database that is similar to your query sequence. The colored sections in each track are blocks of DNA which align with varying similarity (score), shown by the colored bar above. The black lines connecting the colored blocks are poorly aligned sequences (less than 40% identity). Move the mouse over a block to see the definition and score for that sequence result (also called a “hit”). By clicking on a colored box, you will jump to the actual DNA alignment farther down the page.

What information is provided in an NCBI BLASTn report?
The Descriptions Section lists the aligned sequence names and provides information about the alignment. In this search, we are using one gene sequence to find a similar gene sequence. Look at the results that end in “gene”.

What is gene alignment? What BLASTn values tell us whether the alignment is meaningful?

https://www.youtube.com/watch?v=6Udqou3vmng
Go to 31:13-40:15 for a more detailed explanation of alignment.
• Query: starting and ending nucleotides of your query.
• Subject (database used for search): starting and ending nucleotide coordinates for this sequence in its database.

BLASTn seeks to maximize the score for aligning shorter stretches of the query against the database. Local alignment does not require the entire query to align. Matching nucleotides are given a score of +1 and mismatches are given a negative score. There are penalties for gaps. There are different algorithms, but this is the general idea.

“Query cover” tells what percentage of your input sequence (query) is included in the alignment. Note that the query is more than 2750 nucleotides long.

The query coverage is low here (20%) because you are comparing 2 DNA sequences which contain exons (conserved, thus aligned) and introns (not highly conserved, thus non-aligned or poorly aligned).

Although only 20% of the query aligns to a sequence in the Arabidopsis database, 80% of the aligned part is identical to the query (see the “Ident” value of 80% and the color-coded portions of the result track).

Access more info about the sequence by clicking on the sequence ID. “Alignments” provides details about nucleotide locations, matches, gaps or mismatches.

The E-value indicates the number of alignments with an equivalent or better score from this database that would be expected just by chance. For example, a one-in-a-million (1/1,000,000) chance is a very small chance and would be written 1e-6. The lower the E-value, the more significant the score (less likely to be due just to chance).
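The alignment values described above (score, query cover and identity) can be illustrated with a small Python sketch. It assumes a match score of +1, as in the lesson, and picks a mismatch score of -1 and a gap penalty of -2 purely for illustration. It is a textbook Smith-Waterman-style calculation, not the faster heuristic that NCBI BLASTn actually runs, and all function names are hypothetical.

```python
def local_alignment_score(query, subject, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (Smith-Waterman style)."""
    cols = len(subject) + 1
    prev = [0] * cols  # scores for the previous row of the scoring grid
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            same = query[i - 1] == subject[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            up = prev[j] + gap       # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def percent_identity(aligned_query, aligned_subject):
    """Percent of aligned positions that are identical ('-' marks a gap)."""
    pairs = list(zip(aligned_query, aligned_subject))
    matches = sum(1 for q, s in pairs if q == s and q != "-")
    return 100.0 * matches / len(pairs)


def query_cover(aligned_start, aligned_end, query_length):
    """Percent of the query included in the alignment (1-based, inclusive)."""
    return 100.0 * (aligned_end - aligned_start + 1) / query_length


# Made-up sequences and coordinates, for illustration only.
print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))
print(query_cover(1, 550, 2750))
```

The last call mirrors the numbers in the lesson: if about 550 nucleotides of a 2750-nucleotide query take part in the alignment, the query cover is 20%, even though the aligned part itself can be 80% identical.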
E-values are written in scientific notation, e.g. 3e-80 = 3 × 10^-80. In general, an E-value of 1 × 10^-5 or smaller is considered significant (not just aligned by chance).

This is from the Alignments Section and shows the details.

Click on the accession number for more information about the gene that had the most significant alignment. By default, results are arranged from lowest E-value to highest. Compare the E-value, Query cover and % identity for the checked “hits”.

Which GENE is most similar to the human ACTA1 sequence query?
(Figure callouts: link for more info; amino acid sequence.)

Describe the process you used to find a version of the human ACTA1 gene in Arabidopsis thaliana. What information did you use to indicate that the plant version was a meaningful find?

1. Pick a human gene which you think is highly conserved between plants and animals.
2. Follow the procedure you just learned to see if a similar Arabidopsis version exists.
3. Record your info on the scorecard.
4. Repeat for a gene that you predict is unique to humans.

Gene Discovery Scorecard
Human Gene Name: Actin alpha 1
Human Gene ID: ACTA1
Human Gene Function: Cytoskeletal structure
Arabidopsis Gene Name: ACT7
Arabidopsis Gene ID: Actin 7
Arabidopsis Gene Function: Cytoskeletal structure
Outcome evidence (Score, E-value, Similar Function): E-value was 1e-80, not random; both have similar functions...
Prediction?: Yes

• What information so far indicates whether or not plants have animal muscle genes?
• What additional information might you need to be more certain whether ACT7 is a plant version of human ACTA1?
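As a rough illustration of how the E-value rules in this lesson can be applied when filling in the scorecard, the Python sketch below reads E-values written in scientific notation, keeps only hits at or below the 1e-5 cutoff quoted in the lesson, and sorts them from lowest E-value to highest. The hit names and numbers are invented for the example and are not real BLASTn output; the function and variable names are also hypothetical.

```python
SIGNIFICANCE_CUTOFF = 1e-5  # cutoff quoted in the lesson

# Hypothetical hits: (name, E-value as text, query cover %, identity %).
hits = [
    ("hit_B", "2e-3", 5, 70),
    ("hit_A", "3e-80", 20, 80),
    ("hit_C", "1e-6", 12, 75),
]


def significant_hits(hits, cutoff=SIGNIFICANCE_CUTOFF):
    """Keep hits with E-value <= cutoff, sorted from lowest to highest E-value."""
    parsed = [(name, float(e_text), cover, ident)
              for name, e_text, cover, ident in hits]
    kept = [hit for hit in parsed if hit[1] <= cutoff]
    return sorted(kept, key=lambda hit: hit[1])


for name, e_value, cover, ident in significant_hits(hits):
    print(name, e_value, cover, ident)
```

Python's float() reads "3e-80" directly as 3 × 10^-80, so the text shown in a report can be compared against the cutoff without any manual conversion. A low E-value alone does not settle whether two genes share a function, which is why the scorecard also asks for query cover, identity and gene function.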