Download Slide 1

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Synthetic biology wikipedia , lookup

DNA barcoding wikipedia , lookup

Exome sequencing wikipedia , lookup

Gene expression wikipedia , lookup

Transcriptional regulation wikipedia , lookup

DNA sequencing wikipedia , lookup

Mutation wikipedia , lookup

History of molecular evolution wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Molecular cloning wikipedia , lookup

Whole genome sequencing wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Genome evolution wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Gene wikipedia , lookup

Promoter (genetics) wikipedia , lookup

Endogenous retrovirus wikipedia , lookup

Bisulfite sequencing wikipedia , lookup

Homology modeling wikipedia , lookup

RNA-Seq wikipedia , lookup

Non-coding DNA wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Community fingerprinting wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Molecular evolution wikipedia , lookup

Transcript
Computational Genomics
Lecture 1, Tuesday April 1, 2003
Biology in One Slide: 2 Paradigms
Molecular Paradigm
Evolution Paradigm
High Throughput Biology
DNA Sequencing
Gene Expression
…ACGTGACTGAGGACCGTG
CGACTGAGACTGACTGGGT
CTAGCTAGACTACGTTTTA
TATATATATACGTCGTCGT
ACTGATGACTAGATTACAG
ACTGATTTAGATACCTGAC
TGATTTTAAAAAAATATT…
Biology is becoming an information science
Goals of this course
• Introduction to Computational Biology
 Basic biology for computer scientists
 Breadth: mention many topics & applications
• In-depth coverage of Computational Genomics
 Algorithms for sequence analysis
 Current applications, trends, and open problems
•
Coverage of useful algorithms




Hidden Markov models
Dynamic Programming
String algorithms
Applications of AI techniques
Topics in CS262
Part 1: In-depth coverage of basic computational methods
for analysis of biological sequences


Sequence Alignment & Dynamic Programming
Hidden Markov models
These methods are used heavily in most genomics
applications:



DNA sequencing
Comparison of DNA and proteins across organisms
Discovery of genes, promoters, regulatory sites
Topics in CS262
Part 2: Topics in computational genomics, more algorithms,
and areas of active research

DNA sequencing & assembly: reading a complete genome such
as the human DNA

Gene finding: marking genes on the DNA sequence

Large-scale comparative genomics: comparing whole genomes
from multiple organisms

Microarrays & regulation: understanding the regulatory code, and
potential disease-causing genes

RNA structure: predicting the folding of RNA

Phylogeny and evolution: quantifying the evolution of biological
sequences
Course responsibilities
• Homeworks




[72%]
4 challenging problem sets, 4-5 problems/pset
Collaboration allowed – please give credit
Hws due Thursday, solutions explained Friday
Two worst problems in all hws do not count
• Final
[18%]
 Takehome, 1 day
 Collaboration not allowed
 Basic questions – much easier than homeworks
• Scribing
[10%]
 Due one week after the lecture, except special permission
Reading material
• Books
 “Biological sequence analysis” by Durbin, Eddy, Krogh,
Mitchinson
• Chapters 1-4, 6, (7-8), (9-10)
 “Algorithms on strings, trees, and sequences” by Gusfield
• Chapters (5-7), 11-12, (13), 14, (17)
• Papers
• Lecture notes
Topic 1. Sequence Alignment
Complete genomes
Evolution
Evolution at the DNA level
C
…ACGGTGCAGTCACCA…
…ACGTTGCAGTCCACCA…
SEQUENCE EDITS
REARRANGEMENTS
Evolutionary Rates
next generation
OK
OK
OK
Changes in non-functional sites are OK, so will be propagated
X
X
Still OK?
Most changes in functional sites are deleterious and will be rejected
Sequence conservation implies function
100%
40%
Interleukin region in human and mouse
Sequence Alignment
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition
Given two strings
x = x1x2...xM, y = y1y2…yN,
an alignment is an assignment of gaps to positions
0,…, M in x, and 0,…, N in y, so as to line up each
letter in one sequence with either a letter, or a gap
in the other sequence
What is a good alignment?
Alignment:
The “best” way to match the letters of one sequence with
those of the other
How do we define “best”?
Alignment:
A hypothesis that the two sequences come from a
common ancestor through sequence edits
Parsimonious explanation:
Find the minimum number of edits that transform one
sequence into the other
Scoring Function
• Sequence edits:
AGGCCTC
 Mutations
AGGACTC
 Insertions
AGGGCCTC
 Deletions
AGG.CTC
Scoring Function:
Match:
+m
Mismatch: -s
Gap:
-d
Score F = (# matches)  m - (# mismatches)  s – (#gaps)  d