Dot plot - TeachLine
Sequence Similarity Searching
75321 Class 4
March 2010

Why Compare Sequences?
Identify sequences found in lab experiments
   What is this thing I just found?
Compare new genes to known ones
Compare genes from different species
   information about evolution
Guess functions for entire genomes full of new gene sequences

Are there other sequences like this one?
1) Huge public databases - GenBank, Swissprot, etc.
2) Sequence comparison is the most powerful and reliable method to determine evolutionary relationships between genes
3) Similarity searching is based on alignment
4) BLAST and FASTA provide rapid similarity searching
   a. rapid = approximate (heuristic)
   b. false positive (+) and false negative (-) scores

Similarity ≠ Homology
1) 25% similarity over ≥ 100 AAs is strong evidence for homology
2) Homology is an evolutionary statement which means "descent from a common ancestor"
   common 3D structure
   usually common function
   homology is all or nothing; you cannot say "50% homologous"

How to Compare Sequences?
GATGCCATAGAGCTGTAGTCGTACCCT
<-- -->
CTAGAGAGC-GTAGTCAGAGTGTCTTTGAGTTCC
Manually line them up and count?
   an alignment program can do it for you
   or just use a text editor
Dot Plot
   shows regions of similarity as diagonals

Global vs Local similarity
1) Global similarity uses complete aligned sequences - total % matches
   GCG GAP program, Needleman & Wunsch algorithm
2) Local similarity looks for best internal matching region between 2 sequences
   GCG BESTFIT program, Smith-Waterman algorithm, BLAST and FASTA
3) dynamic programming
   optimal computer solution, not approximate

Search with Protein, not DNA Sequences
1) 4 DNA bases vs. 20 amino acids - less similarity by chance
2) can have varying degrees of similarity between different AAs
   - # of mutations, chemical similarity, PAM matrix
3) protein databanks are much smaller than DNA databanks

Similarity is Based on Dot Plots
1) two sequences on vertical and horizontal axes of graph
2) put dots wherever there is a match
3) diagonal line is region of identity (local alignment)
4) apply a window filter - look at a group of bases, must meet % identity to get a dot

Simple Dot Plot
[Figure: dot plot with axis sequences GATCAACTGACGTA and GTTCAGCTGCGTAC]

Dot plot filtered with 4 base window and 75% identity
[Figure: filtered dot plot with axis sequences GATCAACTGACGTA and GTTCAGCTGCGTAC]

Dot plot of real data

Global vs. Local Alignments
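
As a recap of the dot plot slides, the four steps (two sequences on the axes, a dot for every match, diagonals as regions of identity, and a window filter) can be sketched in Python. This is a minimal illustration, not the program used to make the slides: the function names `dot_plot` and `render` are hypothetical, the window is assumed to be anchored at its first base rather than centered, and "75% identity" is assumed to mean "at least 75% of positions in the window match".

```python
def dot_plot(seq_x, seq_y, window=1, min_identity=1.0):
    """Return the set of (i, j) positions that receive a dot.

    A dot at (i, j) means the window starting at seq_x[i] and the window
    starting at seq_y[j] agree in at least min_identity of their positions.
    With window=1 and min_identity=1.0 this is the simple, unfiltered plot.
    (Anchoring the window at its first base is an assumption of this sketch.)
    """
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            # Count matching positions inside the two windows.
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            if matches / window >= min_identity:
                dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the plot as text: seq_x across the top, seq_y down the side."""
    lines = ["  " + " ".join(seq_x)]
    for j, base in enumerate(seq_y):
        row = ["*" if (i, j) in dots else "." for i in range(len(seq_x))]
        lines.append(base + " " + " ".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    # The two axis sequences from the "Simple Dot Plot" slide.
    x = "GATCAACTGACGTA"
    y = "GTTCAGCTGCGTAC"
    print(render(x, y, dot_plot(x, y)))
    print()
    # Same sequences, filtered with a 4-base window at 75% identity.
    print(render(x, y, dot_plot(x, y, window=4, min_identity=0.75)))
```

Running the script prints the unfiltered plot followed by the filtered one. The filter exists to suppress isolated chance matches: with only four DNA bases, single-base matches are common, whereas a window that must match in 3 of 4 positions mostly leaves the diagonals.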