Identifying sequences with …
Speaker: S. Gaj
BioInformatics Lunch Meeting, 04-03-2005

Annotation

Background
• Annotation: the best possible description available for a given sequence at the current time.

How to annotate?
• Combining:
  • Alignment tools
  • Databases
  • Datamining (scripts)
  • Microarrays

Part I: Sequence Alignment

Introduction

Global alignment
• Optimal alignment of two sequences over their entire length, containing as many characters of the query as possible.
• Example: predicting the evolutionary relationship between genes, …

Local alignment
• Optimal alignment of the most similar region(s) of two sequences.
• Example: identifying key molecular structures (disulfide (S-S) bonds, α-helices, …)

Global vs local alignment

Global alignment: score -42 at (seq1)[1..90] : (seq2)[1..90]

  1 MA-----STVTSCLEPTEVFMDLWPEDHSNWQELSPLEPSDPLNPPTPPRAAPSPVVPST
    |      ||        |    |          |         |    | |  | |
  1 MSHGIQMSTIKKRRSTDEEVFCLPIKGREIYEILVKIYQIENYNMECAPPAGASSVSVGA

 56 EDYGGDFDFRVGFVEAGTAKSVTCTYSPVLNKVYC
            |     |            |
 61 TEAEPTEVFMDLWPEDHSNWQELSPLEPSD-----

• Covers the total length of both sequences
• Highest-scoring alignment over that full length

Local alignment: score 148 at (seq1)[10..36] : (seq2)[64..90]

 10 EPTEVFMDLWPEDHSNWQELSPLEPSD
    |||||||||||||||||||||||||||
 64 EPTEVFMDLWPEDHSNWQELSPLEPSD

• Highest-scoring alignment between subsequences
• Stops extending the alignment when extension is no longer profitable (would lower the score)

BLAST

Introduction
BLAST: Basic Local Alignment Search Tool
• Aligns an unknown sequence (the query) against all sequences in a chosen database and ranks the matches by score.
• Aim: obtaining structural or functional information on the unknown sequence.

Programs
• Different BLAST programs are available, depending on the query and database types:

  Query \ Database   Nucleic   Protein
  Nucleic            BlastN    BlastX
  Protein            -         BlastP

Parameters
• Maximum E-value, Gap Opening Penalty (GOP), Gap Extension Penalty (GEP), …

Terms
• Query: the sequence to be aligned
• Subject: a sequence present in the database
• Hit: an alignment result

BLAST: Matrices

Substitution matrices – what?
• A substitution matrix estimates the rate at which each possible residue in a sequence changes to each other residue over time.
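To make the definition concrete, here is a minimal Python sketch (an illustration, not BLAST's implementation) that hard-codes a six-residue excerpt of BLOSUM62 and plugs it into the two classic dynamic-programming algorithms behind the global and local alignments shown earlier: Needleman-Wunsch (global) and Smith-Waterman (local). Assumptions: only the residues I, L, V, K, D and E are supported, gaps cost a hypothetical flat -4 per position (BLAST itself uses separate opening and extension penalties), only the best score is returned (no traceback), and all names are illustrative.

```python
# Minimal sketch: a six-residue excerpt of BLOSUM62 plus score-only global
# (Needleman-Wunsch) and local (Smith-Waterman) alignment, linear gap penalty.

_ORDER = "ILVKDE"  # I, L, V hydrophobic; K, D, E charged
_ROWS = [
    #  I   L   V   K   D   E
    [ 4,  2,  3, -3, -3, -3],  # I
    [ 2,  4,  1, -2, -4, -3],  # L
    [ 3,  1,  4, -2, -3, -2],  # V
    [-3, -2, -2,  5, -1,  1],  # K
    [-3, -4, -3, -1,  6,  2],  # D
    [-3, -3, -2,  1,  2,  5],  # E
]
BLOSUM62_EXCERPT = {
    (a, b): _ROWS[i][j]
    for i, a in enumerate(_ORDER)
    for j, b in enumerate(_ORDER)
}

GAP = -4  # hypothetical penalty per gap position (BLAST uses separate GOP/GEP)


def sub(a: str, b: str) -> int:
    """Substitution score; raises KeyError for residues outside the excerpt."""
    return BLOSUM62_EXCERPT[(a, b)]


def global_score(a: str, b: str, gap: int = GAP) -> int:
    """Needleman-Wunsch: best score aligning a and b over their full length."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b against gaps only
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                prev[j - 1] + sub(a[i - 1], b[j - 1]),  # residue vs residue
                prev[j] + gap,                          # a[i-1] vs gap
                cur[j - 1] + gap,                       # gap vs b[j-1]
            )
        prev = cur
    return prev[-1]


def local_score(a: str, b: str, gap: int = GAP) -> int:
    """Smith-Waterman: best score of any pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,  # restart instead of extending an unprofitable alignment
                prev[j - 1] + sub(a[i - 1], b[j - 1]),
                prev[j] + gap,
                cur[j - 1] + gap,
            )
            best = max(best, cur[j])
        prev = cur
    return best


print(global_score("KKILV", "DDILV"))  # 10
print(local_score("KKILV", "DDILV"))   # 12
```

The global score for KKILV vs DDILV (10) is lower than the local score (12) because the global alignment must also pay for the mismatched KK/DD prefix, while the local alignment simply leaves it out.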
• Example: a hydrophobic residue is more likely to be replaced by another hydrophobic residue than by a non-hydrophobic one.
• Each matrix is tailored to look for certain types of sequences: KNOW WHAT YOU ARE LOOKING FOR!

Substitution matrices – why?
1. Determine the likelihood of homology between two sequences.
2. Substitutions that are more likely should get a higher score.
3. Substitutions that are less likely should get a lower score.

Matrices: PAM
• Point Accepted Mutation
• Mostly used in global amino acid alignments
• PAM1 represents 1% change: it applies to a time period over which 1% of the amino acids are expected to undergo accepted point mutations within the species of interest.
• PAM250 = (PAM1)^250, i.e. the PAM1 mutation probability matrix multiplied by itself 250 times.

Matrices: BLOSUM
• Mostly used in local amino acid alignments
• Based on observed alignments, not extrapolated ones
• BLOSUM80, BLOSUM62, BLOSUM45
• Default: BLOSUM62, calculated from comparisons of sequences sharing no more than 62% identity.

PAM vs BLOSUM
• Closely related sequences: low PAM, high BLOSUM
• Distantly related sequences: high PAM, low BLOSUM

BlastN example

Common BLAST problems: BlastN

Sequencing error

Clone seq  CGATACGCCAGG-ATATACC
           |||||||||||| |||||||
mRNA       CGATACGCCAGGGATATACC

• Solution: low penalties for GOP and GEP (= 1)

Translation problems: 6-frame translation

>embl|J03801|HSLSZ Human lysozyme mRNA, complete cds with an Alu repeat in the 3' flank.

     ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct...
+1   L  A  L  *  P  S  S  Q  H  E  G  S  H  C  S  G  A
+2    *  H  S  D  L  A  V  N  M  K  A  L  I  V  L  G
+3     S  T  L  T  *  Q  S  T  *  R  L  S  L  F  W  G
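The six-frame translation shown above can be generated with a few lines of code. A minimal Python sketch, assuming the standard genetic code, "X" for codons containing ambiguous bases, and reverse frames numbered from the first base of the reverse complement (tools differ on this convention, so the -1/-2/-3 labels may not match the sixframe.pl output). The function names are hypothetical.

```python
# Minimal sketch of six-frame translation with the standard genetic code.
BASES = "tcag"
# Standard code, codons enumerated in TCAG order (first, second, third base).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (returned in lower case)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int) -> str:
    """Translate whole codons starting at offset; '*' = stop, 'X' = unknown."""
    seq = seq.lower()
    return "".join(
        CODON_TABLE.get(seq[p:p + 3], "X") for p in range(offset, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Frames +1..+3 on the given strand, -1..-3 on its reverse complement."""
    rc = reverse_complement(seq)
    frames = {f"+{k + 1}": translate(seq, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc, k) for k in range(3)})
    return frames


fragment = "ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct"
for name, protein in six_frames(fragment).items():
    print(name, protein)
```

On this fragment, frame +2 is the one that reads through the ATG start codon into M K A L I V L G, the start of the human lysozyme precursor; without knowing the frame in advance, all six have to be tried.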
Frames -1, -2 and -3 are read in the same way on the reverse-complement strand.
Six-frame translation tool: http://searchlauncher.bcm.tmc.edu/cgi-bin/seq-util/sixframe.pl

Common BLAST problems: introns and exons
Figure: gene X (introns, exons) → splicing → mRNA → translation

Common BLAST problems: coding vs non-coding region
• An mRNA contains a coding region and non-coding region(s); clones derived from it may cover either or both.
• Running BlastX on such clones against a protein sequence gives 3 possible hit situations. In the analogy below, JOKER stands for non-coding sequence and BATMANROBIN for the protein:

1. Clone from the non-coding region only
   JOKER
   Yields no protein hit.

2. Clone from the coding region only
   BATMANROBIN
   |||||||||||
   BATMANROBIN
   Aligns with the protein in 1 of the 6 frames.

3. Clone spanning the non-coding and coding regions
   JOKERBATMAN
        ||||||
        BATMANROBIN
   Partial, but perfect, alignment.
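The three hit situations can be mimicked with the letter analogy itself. The toy Python sketch below is not how BlastX works (BlastX translates the clone in six frames and scores it with a substitution matrix); it only labels each clone by the longest run of letters it shares exactly with the protein, using a hypothetical minimum hit length of 4. All names are illustrative.

```python
# Toy illustration of the three BlastX hit situations using the letter
# analogy: exact matching only, no translation, no substitution matrix.

MIN_HIT = 4  # hypothetical minimum length for a shared run to count as a hit


def longest_common_substring(a: str, b: str) -> str:
    """Longest run of consecutive letters present in both a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending one letter earlier
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def hit_situation(clone: str, protein: str, min_hit: int = MIN_HIT) -> str:
    """Classify a clone as no hit, full alignment, or partial alignment."""
    shared = longest_common_substring(clone, protein)
    if len(shared) < min_hit:
        return "no protein hit"
    if len(shared) == len(clone):
        return "full alignment"
    return f"partial alignment ({shared})"


protein = "BATMANROBIN"
for clone in ("JOKER", "BATMANROBIN", "JOKERBATMAN"):
    print(clone, "->", hit_situation(clone, protein))
```

The threshold stands in for BLAST's score and E-value cut-offs: a single shared letter (the O or R in JOKER) should not count as a hit.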