Applied Bioinformatics
Dr. Jens Allmer
Week 1 (Introduction)

Your Instructor
• Education
– BSc: University of Münster 1996
– MSc: University of Münster 2002
– PhD: University of Münster 2006
• Worked at
– Izmir Institute of Technology (since 2008)
– Izmir University of Economics, Turkey (Feb 2007 – Aug 2008)
– University of Muenster, Germany (Jan 2006 – Feb 2007)
– University of Pennsylvania, USA (Jan 2004 – Dec 2005)
– University of Jena, Germany (Nov 2002 – Dec 2003)

Areas of Interest
• Bioinformatics
– Sequences
– Alignments
• Mass Spectrometry
– De novo sequencing
– Pattern matching
• Annotation
– Integration
– Automatic assessments
• General Automation and Productivity

Course Rules
• Attendance
– Is essential and will be monitored strictly
– if(absence > 12h) Then NA;
• Make-up Work
– None
• Lecture starts on time
– If late, enter QUIETLY
– If more than 5 min late, DO NOT ENTER; wait for the break
• Breaks are 10 min max
– If late after the break, enter QUIETLY
– If more than 5 min late, DO NOT ENTER; wait for the next break
• Early leave
– Announce before the course and leave if granted
• Project
– Parts to be performed are published on the website and/or as slides
– Deadline: 6pm on the day before the next class (you may submit early, of course)
– No extension
– No make-up
– No extra work
• Must be electronically submitted to: [email protected]
– Must be named ????_first_last.eee or it will not be accepted
– Formats include: doc, ppt, odx, txt, html, ...
– Formats that I cannot edit, such as pdf, and similar formats that are not widespread are not allowed
– Must be significantly different from your classmates' submissions
– Otherwise everyone involved will receive zero for that assignment

Grading
• All information is available on the class website
• Grading is individualized
– Quizzes 15%
– Mind Maps 10%
– Midterm 1 25%
– Midterm 2 25%
– Project 25%

Project
• Group Formation 0% (08.10. 18:00)
– Group Size: 4
• First Draft 25% (22.10. 18:00)
• Results 15% (19.11. 18:00)
• Second Draft 20% (03.12. 18:00)
• Presentation 10% (25.12. 18:00)
• Final Version 25% (31.12. 18:00)

Grading
• I am responsible for evaluating you
– I am not responsible for passing everyone or giving great grades
• Make it easy for me
1. Show up and participate
2. Do the homework and pre-course preparations
3. Midterm and Final will be easy for you if you adhere to 1. and 2.

Course Structure
– Start
– 10 min quiz
– 35 min lecture
– 5 min mind mapping
– 10 min break
– 50 min practice
– 10 min break
– 40-50 min lecture
– 10 min break
– 30 min practice

Textbooks
Primary audience: Junior bio majors
Course home page: http://www.biolnk.com/habf
ISBN: 978-605-133-297-0
http://www.idefix.com/kitap/biyoenformatik-1-dizi-kiyaslamalarijens-allmer/tanim.asp?sid=GUFFOI44R7FJ9CIR6STU

Textbooks
Everything you currently need to know about Applied Bioinformatics in regard to practical problems you will encounter during everyday research.

Bioinformatics
[Diagram: Bioinformatics at the center of Chemistry, Biology, Molecular biology, Mathematics, Statistics, Computer Science, Informatics, Medicine, Physics]

Bioinformatics is Multidisciplinary
[Diagram: BIOINFORMATICS connecting Genomics, Drug Design, Computer Science, Molecular Life Sciences, Phylogenetics, Structural Biology, Math, Statistics]

The Pyramid of Life (2000)
– Metabolomics: 1400 Chemicals
– Proteomics: 3,000 Enzymes
– Genomics: 30,000 Genes

The Pyramid of Life
– Protein Interactions?
– 100,000 Proteins
– 30,000 Genes
– 1400 Chemicals

Bioinformatics (or Computational Biology)
• Not just the study of DNA or protein sequence data
• Inclusive definition
– concerns the storage, display, reduction, management, analysis, extraction, simulation, modeling, fitting or prediction of biological, medical or pharmaceutical data

Basis of molecular life sciences
• Hierarchy of relationships (some exceptions):
– Genome
– Gene 1, Gene 2, Gene 3, Gene X
– Protein 1, Protein 2, Protein 3, Protein X
– Function 1, Function 2, Function 3, Function X

How can one use bioinformatics to link diseases to genes?
• Disease, Map, Gene, Function

Positional cloning of genes
1. Find genetic markers associated with disease
2. Sequence DNA next to the markers
3. Compare DNA from afflicted individuals to DNA of normal individuals (database)
4. Find abnormalities
5. Predict gene function from sequence information

Bioinformatics in the old days
• Close to Molecular Biology:
– (Statistical) analysis of protein and nucleotide structure
– Protein folding problem
– Protein-protein and protein-nucleotide interaction
• Many essential methods were created early on
– Protein sequence analysis (pairwise and multiple alignment)
– Protein structure prediction (secondary, tertiary structure)
• Evolution was studied and methods created
– Phylogenetic reconstruction (clustering, e.g., Neighbor Joining (NJ) method)
– Nowadays also part of Datamining

But then the big bang….

The Human Genome - 26 June 2000
Dr. Craig Venter, Celera Genomics -- Shotgun method
Francis Collins (USA) / Sir John Sulston (UK), Human Genome Project

Human DNA
• There are at least 3bn (3 × 10^9) nucleotides in the nucleus of almost all of the trillions (3.2 × 10^12) of cells of a human body (an exception is, for example, red blood cells, which have no nucleus and therefore no DNA)
– a total of ~10^22 nucleotides!
• Many DNA regions code for proteins and are called genes (1 gene codes for 1 protein as a base rule, but the reality is a lot more complicated)
– Name examples
• Human DNA may contain ~27,000 expressed genes
– Problems?
• Deoxyribonucleic acid (DNA) comprises 4 different types of nucleotides: adenine (A), thymine (T), cytosine (C) and guanine (G). These nucleotides are sometimes also called bases
– Ambiguities?

Y-Chromosome
• 50% of the sequence consists of NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
• Not very meaningful
– Explanation .... Same as in the X chromosome
– What about the N's in chr 1?

Human DNA (Cont.)
• All people are different
• but the DNA of different people varies by only 0.2% or less
• So, only up to 2 letters in 1000 are expected to be different.
• Evidence from current genomics studies (Single Nucleotide Polymorphisms, or SNPs) implies that on average only 1 letter out of 1400 differs between individuals.
• Over the whole genome, this means that 2 to 3 million letters would differ between individuals.

Modern bioinformatics is closely associated with genomics
• The aim is to solve the genomics information problem
• Ultimately, this should lead to a biological understanding of how all parts fit together (DNA, RNA, proteins, metabolites) and how they interact (gene regulation, gene expression, protein interaction, metabolic pathways, protein signaling, etc.)

Functional Genomics
From gene to function
[Diagram: Genome, Expressome, Proteome, Interactome?, Metabolome; figure label: TERTIARY STRUCTURE (fold)]

How much of the genome is defined?
[Figure label: Unknown Function]

What is bioinformatics?
[Diagram: Bioinformatics surrounded by Math, Physics, English, Bio, Comp sci, Chem, Stats]
• Machine learning
• Database systems
• Data mining
• Image processing
• Modeling
• Graph theory
• Statistical analysis
• Sequence
• Structure
• Interactions
• Regulation
• Genomes
• Evolution
• E.g. process the spots on a microarray, determine which genes are differentially expressed, link spots to sequence via a database, analyze the sequence using predictive tools, link the genes to related genes to form a network

What is a bioinformatician?
• Somebody who knows everything

What is a bioinformatician?
• A facilitator
– Typically has a background in biology or CS, but is comfortable with concepts from other disciplines
– Brings together ideas (or researchers) from different domains to solve a biological problem
• Conceptualize the problem
– Use language appropriate to the domain
• Identify potential solutions
– Understanding of different fields helps to identify possible approaches at a broad level
• Guide the development process
– Create in-house or find potential collaborators to work on approaches in-depth
• Integrate results into overall solution
– Software/method, results of biological analysis

How is Bioinformatics Used?
Bioinformatics is used to help "focus" the scientist on the bench top experiments
Bioinformatics isn't going to replace lab work anytime soon
Experimental proof is still the "Gold Standard".

Bioinformatics
• Is the application of computational tools in Biology Bioinformatics?
• Not really!
• In this course, however, we will only rarely go into algorithmic details (like today ;)

Mind Mapping
• Have you ever studied a subject or brainstormed an idea, only to find yourself with pages of information, but no clear view of how the pieces fit together?
• Mind mapping
– Helps you learn more effectively
– Improves memorization
– Enhances creativity
– Speeds up analyses
– Gives structure to complex ideas
– Records information for future use
Source: http://www.mindtools.com/pages/article/newISS_01.htm

An Example Mind Map for MicroRNAs

How to Mind Map
1. Identify the central topic and write it in the center
2. Write major parts of the topic on lines in all directions
3. Repeat 2. with ever finer level of detail until satisfied
Source: http://www.mindtools.com/pages/article/newISS_01.htm

Note Taking with Mind Maps
• Capture ideas organized into topics
– What if the central topic which I chose is not the central topic?
– Make a new mind map which captures the topic correctly
• Use Cases
– Note taking in class
– Recapitulation after lecture
– Analysis of a new topic
– Structuring of any intended writing
• When
– During acquisition of new knowledge (faster than writing)
– For review 5m, 1h, 6h, 1d, 7d, 1m after note taking

Mind Mapping Tips
1. Use single words or very short phrases
2. Write clearly and legibly
3. Use color!
4. Separate ideas (color, lines, shading)
5. Draw symbols and images
6. Draw links among elements

A More Elaborate Mind Map
Source: http://www.mindtools.com/pages/article/newISS_01.htm

At the Heart of Bioinformatics
Genomic
>scaffold_1152
GGTGCGGCCGTCCTCCAGCTGCTTGCCGGCGAAGATCAGGCGCTGCTGGT
CCGGGGGGATGCCTGCATCCGGTGAGGAAACGCTCGTGTCAGACAAAGTG
GGTGGGCGCAGGAAGCAGCAATCAACACAGCCCAGTGCAGCTGCAAAGCG
CCCGCCTTACCACTGACCCGCCTGGCCACCCACCCCTACCCCCCGTAAGG
AAAGAGCCCCGACTCACCCTCCTTGTCCTGAATCTTGGCCTTCACGTTCT
CAATGGTGTCCGAAGACTCCACCTCGAGCGTGATGGTCTTGCCCGTCAGG
GTCTTGACGAAGATCTGCATGCCACCGCGCAGGCGCAGCACCAGGTGCAG
…
Translated
>RF1_scaffold_1152
GAAVLQLLAGEDQALLVRGDACIR$GNARVRQSGWAQEAAINTAQCSC
KAPALPLTRLATHPYPP$GKSPDSPSLS$ILARDVAHDFAKSSPR$YA
PLIPQNLRC$SIEMKQPASLLSPIGEGACASHLQCLEKCLLP$GAIVY
MIS$GSGRR$TSWVGIGGCNDGTEKRSEVDSRRGGKGNIHD
>RF2_scaffold_1152
VRPSSSCLPAKIRRCWSGGMPASGEETLVS AATAAKPQTWSPTAWEF
KVGGRRKQQSTQPSAAAKRPPYH$PAWPPTPTPRKERAPTHPPCPESW
SRSQWCPKTPPRA$WSCPSGS$RRSACHRAGAAPGAGSTPSGCCSQPG
CGRPPAACRRRSGAAGPGGCLCVGGGGEGACASHLQCLEGE
…

Try it for yourself
Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT
Your Task
• You may only compare 1 character at a time
• You may create helpful structures
• You should find the location of the Pattern in the Sequence with a minimal number of comparisons

Brute Force Approach
Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT
Comparisons (running total over successive animation frames): 1, 2, 3, 4, 6, 7-16, 17-22

Boyer-Moore Algorithm
• Preprocessing
• Good suffix matrix (m+1)
• Bad character matrix (m+1)
Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG
Pattern: TGATGT
Comparisons (running total over successive animation frames): 1, 2, 3-7, 8, 9-15

Questions

Define Algorithm

Website
• http://mbg305.allmer.de
• Slides
• Homework
• Additional materials and challenges
• Grades

Website
• To see your grades you need to log in
• Some material may need a login as well
• Currently
– UserID = StudentID
– Password = StudentID
• Change now
– UserID = working email address
– Password = whatever you will remember

Login to mbg305.allmer.de
• We will now assist you to log in, add your email address, and change your password.

Assignments
– Research about Mind Maps
• E.g.: http://en.wikipedia.org/wiki/Mind-map
• IYTE library
– Make sure to read the lecture notes for next week (available online on Wednesday)
– Read Chapters 1 and 2 from our textbook
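
For the "Try it for yourself" exercise, the following is a minimal Python sketch that counts single-character comparisons for two search strategies on the sequence and pattern from the slides. The function names are made up for this illustration. The second function is the Horspool simplification of Boyer-Moore, which uses only a bad-character shift table and no good-suffix table, so it is not the full algorithm named on the slides, and its comparison count need not equal the totals shown there.

```python
SEQUENCE = "ACGGTAGTATGTGATGTATGATCGCGAAAGAGG"
PATTERN = "TGATGT"


def brute_force_search(sequence, pattern):
    """Return (index, comparisons); index is -1 if the pattern is absent."""
    comparisons = 0
    n, m = len(sequence), len(pattern)
    for start in range(n - m + 1):
        matched = 0
        # Compare left to right, one character at a time.
        while matched < m:
            comparisons += 1
            if sequence[start + matched] != pattern[matched]:
                break
            matched += 1
        if matched == m:
            return start, comparisons
    return -1, comparisons


def horspool_search(sequence, pattern):
    """Simplified Boyer-Moore (Horspool variant): bad-character shifts only."""
    comparisons = 0
    n, m = len(sequence), len(pattern)
    # Preprocessing: distance from the last occurrence of each character
    # (ignoring the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    start = 0
    while start <= n - m:
        j = m - 1
        # Compare right to left, one character at a time.
        while j >= 0:
            comparisons += 1
            if sequence[start + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return start, comparisons
        # Shift by the table entry for the character under the pattern's end;
        # characters that do not occur in the pattern allow a full shift of m.
        start += shift.get(sequence[start + m - 1], m)
    return -1, comparisons


print("brute force:", brute_force_search(SEQUENCE, PATTERN))
print("horspool:   ", horspool_search(SEQUENCE, PATTERN))
```

Both functions report the same 0-based match position; the preprocessing step is what lets the second one skip alignments and use fewer comparisons on this input.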
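
The "At the Heart of Bioinformatics" slide shows a genomic scaffold next to its translations in reading frames RF1 and RF2. As a rough sketch of how such a translation can be produced, the Python below applies the standard genetic code to the first line of the scaffold. The function name `translate` and the constant names are assumptions made for this example; the `$` stop symbol follows the notation on the slide, and the tool actually used to produce the slide is not stated.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (last base varies fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0, stop_symbol="$"):
    """Translate dna starting at offset frame (0, 1 or 2).

    Incomplete trailing codons are dropped; stop codons are written
    as stop_symbol, matching the '$' used on the slide.
    """
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        protein.append(stop_symbol if amino_acid == "*" else amino_acid)
    return "".join(protein)


# First 50 nucleotides of scaffold_1152 as shown on the slide.
FIRST_LINE = "GGTGCGGCCGTCCTCCAGCTGCTTGCCGGCGAAGATCAGGCGCTGCTGGT"

print("RF1:", translate(FIRST_LINE, frame=0))
print("RF2:", translate(FIRST_LINE, frame=1))
```

The two printed strings can be compared with the beginnings of the RF1 and RF2 entries on the slide. The sketch assumes an uppercase sequence containing only A, C, G and T; ambiguity codes such as N would need extra handling.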