Tutorial 3: BLAST

What is BLAST?
• Basic Local Alignment Search Tool
• A set of similarity search programs designed to explore sequence databases.

What are similarity searches good for?
• One sequence by itself is not informative; it must be analyzed by comparative methods against existing sequence databases to develop hypotheses concerning its relatives and function.

[Diagram: Query → BLAST program → Database]

BLAST programs and databases
Name      Query type           Database
blastn    Genomic              Genomic
blastp    Protein              Protein
blastx    Translated genomic   Protein
tblastn   Protein              Translated genomic
tblastx   Translated genomic   Translated genomic

http://www.ncbi.nlm.nih.gov/BLAST/

[Screenshot: BLAST web form. Labels: Place query; Choose database]

BLASTN Databases
Name                   Contents
Gene collection        GenBank, EMBL, DDBJ, PDB and NCBI reference sequences (RefSeq)
Genomic + Transcript   Complete human and mouse genome + transcriptome
EST                    Expressed sequence tags
mito                   Mitochondrial sequences
vector                 Vector subset of GenBank
month                  GenBank, EMBL, DDBJ, PDB from the last 30 days
Envi                   Environmental samples

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases

Search options (labels on the BLAST web form)
• Place query
• Choose database
• Optimize the similarity level of the search
• Limit output size
• Threshold for significance of results
• Primary word match (16-64 nt)
• Reward and penalty for matching and mismatching bases
• Cost to create and extend a gap
• Remove low-information content
• Limit search to a specific organism

Example: search for homologs of the chick “olfactory receptor 6” gene

[Figure: Global alignments vs. local alignments]

[Figure: Query sequence and matched areas of database sequences]

Results table columns: Sequence description, Sequence identifier, Score (bits), Coverage, E value, Identity

Alignment view labels: Score and E value, Identities and gaps, Strand, Multiple hits on the same subject

Design of the BLAST survey
Consider your research question:
• Are you looking for a particular gene in a particular species? BLAST against the genome of that species.
• Are you looking for additional members of a gene family across all species? BLAST against the gene collection database.
• Are you looking for exact motif matches? Increase the gap penalty or use megablast.

Score and E-value
Score (S): (identities + mismatches) - gaps
Bit score (S'): $S' = \frac{\lambda S - \ln K}{\ln 2}$
E-value (E): $E = K \cdot (\text{query length}) \cdot (\text{database length}) \cdot e^{-\lambda S}$
• $K$ and $\lambda$ depend on the scoring system.
• Query length (bp) × database length (bp) is the search space.

Score and E-value
• The score is a measure of the similarity of the query to the sequence shown.
• The E-value is a measure of the reliability of the score.
• Definition of the E-value: the number of alignments with a score of at least S that are expected to occur by chance in a search of this size.

The Size of the E-value
• The typical threshold for a good E-value from a BLAST search is $E = 10^{-6}$ (1e-6) or lower.
• The reason for such a low threshold is that a chance probability of 0.001 per entry in a million-entry database would still leave about 1000 entries due to chance. A value of 1e-6 would leave only about one entry due to chance.

Exercise
Calculate S, S' and E for the following BLAST hit:

ACGTCGATCGAGCT
|||||||| |||||
AGGTCGTC-GAGGT

Given the following parameters:
Query length: 150
λ = 1.37
K = 0.711
Average sequence length in database: 270
Number of sequences in database: 4,554,026

Solution:
$S = (\mathrm{Id} + \mathrm{MM}) - \mathrm{GP} = 13 - 1 = 12$
$S' = \frac{1.37 \times 12 - \ln(0.711)}{\ln 2} = \frac{16.44 + 0.341}{0.693} \approx 24.2$
$E = 0.711 \times 150 \times 270 \times 4{,}554{,}026 \times e^{-1.37 \times 12}$
$E \approx 131135455683 \times 7.2477 \times 10^{-8}$
$E \approx 9504.3$

Exercise
What is the minimal score needed to achieve a significant E-value (1e-6, i.e. $10^{-6}$)?

$131135455683 \cdot e^{-1.37 S} = 10^{-6}$
$\ln(131135455683 \cdot e^{-1.37 S}) = \ln(10^{-6})$
$\ln(131135455683) + \ln(e^{-1.37 S}) = -13.81$
$25.6 - 1.37 S = -13.81$
$S = \frac{-13.81 - 25.6}{-1.37}$
$S \approx 28.77$

1. Search for sequences homologous to the CFTR gene in human.
2. Find additional family members of the CFTR gene present in other organisms.
3. Search for additional genes that are members of the ABC transporters family.
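
The score and E-value exercises in this tutorial can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are illustrative, the raw score uses the tutorial's simplified definition (aligned columns minus gaps), and the database length is assumed to be the average sequence length times the number of sequences, as in the worked solution.

```python
import math

# Parameters taken from the exercise in this tutorial.
LAMBDA = 1.37                 # scoring-system parameter (lambda)
K = 0.711                     # scoring-system parameter (K)
QUERY_LEN = 150               # query length (bp)
DB_LEN = 270 * 4_554_026      # average sequence length x number of sequences


def raw_score(aligned_columns: int, gaps: int) -> int:
    """Tutorial's simplified raw score: (identities + mismatches) - gaps."""
    return aligned_columns - gaps


def bit_score(s: float, lam: float = LAMBDA, k: float = K) -> float:
    """Bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * s - math.log(k)) / math.log(2)


def e_value(s: float, lam: float = LAMBDA, k: float = K,
            query_len: int = QUERY_LEN, db_len: int = DB_LEN) -> float:
    """E = K * query length * database length * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * s)


def min_score(target_e: float, lam: float = LAMBDA, k: float = K,
              query_len: int = QUERY_LEN, db_len: int = DB_LEN) -> float:
    """Solve E = K * m * n * exp(-lambda * S) for S at a target E-value."""
    return math.log(k * query_len * db_len / target_e) / lam


# The hit in the exercise has 13 aligned columns and 1 gap.
s = raw_score(aligned_columns=13, gaps=1)
print(f"S  = {s}")
print(f"S' = {bit_score(s):.1f}")
print(f"E  = {e_value(s):.2f}")
print(f"Minimal S for E = 1e-6: {min_score(1e-6):.2f}")
```

Because a raw score is a whole number under this scoring, the smallest score that actually reaches the 1e-6 threshold is the next integer above the value returned by `min_score`.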