* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Gapped Blast and PSI
Extrachromosomal DNA wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Metagenomics wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Deoxyribozyme wikipedia , lookup
United Kingdom National DNA Database wikipedia , lookup
DNA vaccination wikipedia , lookup
Non-coding DNA wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
Expanded genetic code wikipedia , lookup
Microsatellite wikipedia , lookup
Genetic code wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Helitron (biology) wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Point mutation wikipedia , lookup
Multiple sequence alignment wikipedia , lookup
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle The BLAST Topics • • • • • Exactly What is BLAST? A Quick Recap of Profiles A Few Statistics Behind the BLAST Program The Progression to Gapped BLAST The advancements in PSI BLAST Exactly What Is BLAST? • • • • • Blast Programs are used for searching both protein and DNA databases for sequence similarities. BLAST programs can compare protein to protein, DNA to DNA, Protein to DNA, or DNA to protein. The DNA sequences used in comparison are usually conceptually transcribed before comparison. BLAST programs use a threshold value which can be adjusted to alter speed and probability. A higher value of T will give greater speed, but also a larger probability of missing weaker similarities. Can use various substitution matrices such as Blosum(62) or PAM 250. A Quick Recap of Profiles • • • • A sequence profile is a position specific scoring matrix generated from a group of aligned sequences and a basic scoring matrix. A profile will have L rows and 22 columns or vice versa. Amino acid matrix scores are multiplied by the ratio of that amino acid in the sequences being compared over the entire number of amino acid possibilities in the matrix. A consensus sequence or profile is then derived and used in future comparisons. A Few Statistics Used in BLAST • • • • • Firstly we require that the expected score for two random amino acids ΣPiPjSij to be negative. Now we can calculate two parameters λ and K. These two variables allow for a normalized scoring system through the equation S‘ = (λS – ln K) / (ln 2). S’ can now be plugged into the equation E = N/2^s’. E-Value > 0.01 = will return more loosely related similarities. E-Value <= 1*10^-5 will return more strictly related similarities. The Progression to Gapped BLAST • • • • • Original BLAST program did not take gaps into account. BLAST used to look for single alignments of at least length T. Each positive alignment “hit” was then extended. Gapped BLAST now allows for two non-overlapping alignments of length T within distance A of one another. These alignments “hits” are then extended. Gapped BLAST allows for gap initiation and extension. ABCDE ABCDE ACD - A–CD– (Original Blast) (Gapped Blast) PSI BLAST • • • • Position-Specific Iterated BLAST Incorporates position specific matrices “profiles” Often much better at detecting weak similarities Before PSI BLAST the same techniques were used, but a large degree of expertise and human intervention was required Score Matrix Architecture • Profiles very similar to scoring matrix – Protein or nucleotide aligns to profile position – New profile created with every iteration • • • Gap costs may be position-specific with profiles. How position specific protein score matrices draw their power – – • Profiles created in turn i used in turn i+1 Improved estimation of the probabilities with which amino acids occur at various pattern positions Relatively precise definition of the boundaries of important motifs Every matrix constructed has a length exactly the same as the original query sequence Multiple Alignment Construction & Sequence Weights • All database sequences whose aligned E-value is below a specific threshold are added to the query Any row (or column) which is >= 98% identical to a previously added alignment is kept out of the profile • – • • Allows for better searching on later iterations Poor restrictions could lead to large scale profile sequence insertion Sequences are given different weights depending on evolutionary importance PSI BLAST Overview • Start off with query and initial score matrix (BLOSUM 62) – – • A profile(p1) is constructed from the passing sequences and score matrix – – • Homologs are found using BLAST (align DB to query) E-Value is used as criteria for sequence insertion into profile Once again search for homologs using BLAST(align DB to profile) Once again use E-Value as criteria for insertion into profile A profile(p2) is constructed from the approved sequences and score matirx.