* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Slide 1
Synthetic biology wikipedia , lookup
DNA barcoding wikipedia , lookup
Exome sequencing wikipedia , lookup
Gene expression wikipedia , lookup
Transcriptional regulation wikipedia , lookup
DNA sequencing wikipedia , lookup
History of molecular evolution wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Molecular cloning wikipedia , lookup
Whole genome sequencing wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Genome evolution wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
Homology modeling wikipedia , lookup
Non-coding DNA wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Community fingerprinting wikipedia , lookup
Computational Genomics Lecture 1, Tuesday April 1, 2003 Biology in One Slide: 2 Paradigms Molecular Paradigm Evolution Paradigm High Throughput Biology DNA Sequencing Gene Expression …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT… Biology is becoming an information science Goals of this course • Introduction to Computational Biology Basic biology for computer scientists Breadth: mention many topics & applications • In-depth coverage of Computational Genomics Algorithms for sequence analysis Current applications, trends, and open problems • Coverage of useful algorithms Hidden Markov models Dynamic Programming String algorithms Applications of AI techniques Topics in CS262 Part 1: In-depth coverage of basic computational methods for analysis of biological sequences Sequence Alignment & Dynamic Programming Hidden Markov models These methods are used heavily in most genomics applications: DNA sequencing Comparison of DNA and proteins across organisms Discovery of genes, promoters, regulatory sites Topics in CS262 Part 2: Topics in computational genomics, more algorithms, and areas of active research DNA sequencing & assembly: reading a complete genome such as the human DNA Gene finding: marking genes on the DNA sequence Large-scale comparative genomics: comparing whole genomes from multiple organisms Microarrays & regulation: understanding the regulatory code, and potential disease-causing genes RNA structure: predicting the folding of RNA Phylogeny and evolution: quantifying the evolution of biological sequences Course responsibilities • Homeworks [72%] 4 challenging problem sets, 4-5 problems/pset Collaboration allowed – please give credit Hws due Thursday, solutions explained Friday Two worst problems in all hws do not count • Final [18%] Takehome, 1 day Collaboration not allowed Basic questions – much easier than homeworks • Scribing [10%] Due one week after the lecture, except special permission Reading material • Books “Biological sequence analysis” by Durbin, Eddy, Krogh, Mitchinson • Chapters 1-4, 6, (7-8), (9-10) “Algorithms on strings, trees, and sequences” by Gusfield • Chapters (5-7), 11-12, (13), 14, (17) • Papers • Lecture notes Topic 1. Sequence Alignment Complete genomes Evolution Evolution at the DNA level C …ACGGTGCAGTCACCA… …ACGTTGCAGTCCACCA… SEQUENCE EDITS REARRANGEMENTS Evolutionary Rates next generation OK OK OK Changes in non-functional sites are OK, so will be propagated X X Still OK? Most changes in functional sites are deleterious and will be rejected Sequence conservation implies function 100% 40% Interleukin region in human and mouse Sequence Alignment AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Definition Given two strings x = x1x2...xM, y = y1y2…yN, an alignment is an assignment of gaps to positions 0,…, M in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gap in the other sequence What is a good alignment? Alignment: The “best” way to match the letters of one sequence with those of the other How do we define “best”? Alignment: A hypothesis that the two sequences come from a common ancestor through sequence edits Parsimonious explanation: Find the minimum number of edits that transform one sequence into the other Scoring Function • Sequence edits: AGGCCTC Mutations AGGACTC Insertions AGGGCCTC Deletions AGG.CTC Scoring Function: Match: +m Mismatch: -s Gap: -d Score F = (# matches) m - (# mismatches) s – (#gaps) d