Biology and computers
NCBI data, sliding window programs and dot plots
Sept. 25, 2012

Learning objectives
• Become familiar with OMIM and PubMed.
• Understand the difference between a research article and a review article.
• Understand the concept of sliding window programs.
• Understand the difference between identity, similarity and homology.
• Appreciate that proteins can be modular.

Workshop
• Learn how to use OMIM and obtain DNA and protein sequences associated with diseases.
• Perform a sliding window analysis to compute %(G+C) as a function of position in a sequence.
Homework due Tuesday, Oct. 2nd.

Primary public domain bioinformatics servers
Public Domain Bioinformatics Facilities
• National Center for Biotechnology Information (NCBI), United States: databases, analysis tools
• European Bioinformatics Institute (EBI), United Kingdom: databases, analysis tools
• GenomeNet (KEGG & DDBJ), Japan: databases, analysis tools

NCBI ENTREZ
A platform that provides access and links to databases with biological information.
Databases reached through ENTREZ: PubMed/MedLine, GenBank, Protein databases, Genomes, PopSet, Taxonomy, OMIM

NCBI ENTREZ
Database             Contents
MedLine              Literature database
OMIM                 Database of human genes and genetic disorders
GenBank              Database of all publicly available DNA sequences
Protein databases    Database of amino acid sequences from UniProt, Protein Research Foundation, PDB
Genomes              Database of genomes from organisms and viruses
PopSet               Database of DNA sequences that have been collected to analyze the evolutionary relatedness of a population
Taxonomy             Database of names of organisms with sequences in GenBank

Literature Databases
• Medline/PubMed
• OMIM
• CSULA Library
• Bookshelf (from NCBI)
• Melvyl (Books at UC Libraries)
Other molecular life science databases
• Science Direct
• PubMed Central
• Free Medical Journals
• LinkOut Journals
• Wiley InterScience

OMIM: Online Mendelian Inheritance in Man
• A catalog of human genes linked to diseases
• Victor A. McKusick at Johns Hopkins University
• A good place to start when you want to research a certain disease or biological molecule
• This database is cross-referenced to PubMed and other NCBI-based databases

Sliding window
A sliding window gathers information about properties of nucleotides or amino acids.
[Figure: the sequence GCATATGCGCATATCCCGTCAATACCA shown three times, labeled 4, 5 and 6, with the window shifted along the sequence.]
A simple example is to calculate the %(G+C) content within a window, then move the window one nucleotide and repeat the calculation.

Sliding window
If the window is too small, it is difficult to detect the trend of the measurement. If it is too large, you could miss meaningful data.
[Figure: two plots of %(G+C) versus sequence number, one with a small window size and one with a large window size. Adapted from Zhao et al., BMC Genomics. 2007 Nov 7;8:403.]

Amino acid characteristics
Amino acid   Hydropathy value
A             1.8
C             2.5
D            -3.5
E            -3.5
F             2.8
G            -0.4
H            -3.2
I             4.5
K            -3.9
L             3.8
M             1.9
N            -3.5
P            -1.6
Q            -3.5
R            -4.5
S            -0.8
T            -0.7
V             4.2
W            -0.9
Y            -1.3

Four levels of protein structure
1) Primary: linear sequence
   AGHIPLLQ
2) Secondary: initial folding patterns
   AGHIPLLQ
   aaaTTTbb
3) Tertiary: complex folding patterns
4) Quaternary: interactions between polypeptides

Kyte-Doolittle hydropathy: a sliding window software program [J. Mol. Biol. 157:105-132 (1982)]
[Figure: hydropathy plot. The seven known membrane-spanning regions are numbered 1-7 in red on the plot.]
Note that this particular software program averaged the hydropathy values in the window (http://www.vivo.colostate.edu/molkit/hydropathy/index.html). The original program by Kyte and Doolittle summed the hydropathy values.

Dot Plot with window = 1
[Figure: dot plot comparing two short DNA sequences with window = 1; a dot (●) marks each cell where the two nucleotides match.]
Note that about 25% of the table will be filled by random chance alone.
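The dot plot idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used in the workshop: the function names `dot_plot`, `fill_fraction` and `render`, the `*` marker, and the example sequences are all assumptions made for this example. A cell gets a dot only when the whole window matches exactly, and the sketch assumes both sequences are at least `window` characters long.

```python
import random


def dot_plot(seq_x, seq_y, window=1):
    """Return the set of (row, col) cells where the windows match exactly.

    row indexes seq_y (left side of the table), col indexes seq_x (top).
    """
    dots = set()
    for row in range(len(seq_y) - window + 1):
        for col in range(len(seq_x) - window + 1):
            if seq_y[row:row + window] == seq_x[col:col + window]:
                dots.add((row, col))
    return dots


def fill_fraction(seq_x, seq_y, window=1):
    """Fraction of the comparable cells that contain a dot."""
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return len(dot_plot(seq_x, seq_y, window)) / (rows * cols)


def render(seq_x, seq_y, dots, mark="*"):
    """Draw the table as text: seq_x across the top, seq_y down the side."""
    lines = ["  " + " ".join(seq_x)]
    for row, base in enumerate(seq_y):
        cells = [mark if (row, col) in dots else "." for col in range(len(seq_x))]
        lines.append(base + " " + " ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Two made-up sequences that differ at one position.
    x, y = "GCATATGC", "GCATCTGC"
    print(render(x, y, dot_plot(x, y, window=1)))

    # Unrelated random sequences show how much of the table fills by chance.
    rng = random.Random(0)
    rand_x = "".join(rng.choice("ACGT") for _ in range(200))
    rand_y = "".join(rng.choice("ACGT") for _ in range(200))
    for w in (1, 2, 3):
        print("window", w, "fill fraction", fill_fraction(rand_x, rand_y, w))
```

Running the random comparison with different `window` values is one way to see the background noise shrink as the window grows.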
There is a 1 in 4 chance of a match at each position.

Dot Plot with window = 3
[Figure: dot plot comparing two short DNA sequences with window = 3; a dot (●) is placed only where all three nucleotides in the window match.]
The larger the window, the more noise can be filtered out.
What is the percent chance that you will get a match at random? One in 4^3: (1/4)^3 × 100 ≈ 1.56%.

Do workshop #2
Answer questions 1-3.

Evolutionary Basis of Sequence Alignment
1. Identity: Quantity that describes how much two sequences are alike in the strictest terms.
2. Similarity: Quantity that relates how much two amino acid sequences are alike.
3. Homology: A conclusion drawn from data suggesting that two genes share a common evolutionary history.

Purpose of finding differences and similarities of amino acids in two proteins
• Infer structural information
• Infer functional information
• Infer evolutionary relationships

Modular nature of proteins
• Proteins possess local regions of similarity.
• Proteins can be thought of as assemblies of modular domains.
[Figure: two proteins that are similar in certain regions, tissue plasminogen activator (PLAT) and coagulation factor 12 (F12). Baxevanis and Ouellette, Bioinformatics, Wiley-Interscience, New York, 2001.]

The Dotter Program
Program consists of three components:
• Sliding window
• A table that gives a score for each amino acid match
• A graph that converts the score to a dot of certain density (the higher the score, the higher the dot density)
[Figure: dot plot of sequence alignment highlighting Kringle domain alignments. Adapted from Baxevanis, Ouellette: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd Edition.]
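The three components listed for Dotter can be mimicked in a small Python sketch. This is not Dotter's actual code or scoring scheme: the residue groups in `GROUPS`, the scores in `pair_score` (2 for identical residues, 1 for residues in the same group, 0 otherwise), the shading characters, and all function names are hypothetical choices for illustration. A real program would use a published substitution matrix in place of this toy table.

```python
# Hypothetical grouping of amino acids by broad chemical character.
# This is a toy stand-in for a real scoring table, chosen only for illustration.
GROUPS = ["AVLIMFW", "STNQYC", "DE", "KRH", "GP"]


def pair_score(a, b):
    """Toy score table: identity scores 2, same-group similarity scores 1."""
    if a == b:
        return 2
    if any(a in group and b in group for group in GROUPS):
        return 1
    return 0


def window_scores(seq_x, seq_y, window):
    """Sliding window: sum the pair scores along each diagonal window."""
    scores = {}
    for row in range(len(seq_y) - window + 1):
        for col in range(len(seq_x) - window + 1):
            scores[(row, col)] = sum(
                pair_score(seq_y[row + k], seq_x[col + k]) for k in range(window)
            )
    return scores


def density(score, window, shades=" .:*#"):
    """Convert a window score to a character; higher scores give darker marks."""
    max_score = 2 * window  # every position identical
    return shades[score * (len(shades) - 1) // max_score]


def render(seq_x, seq_y, window):
    """Text version of the plot, one character per window start position."""
    scores = window_scores(seq_x, seq_y, window)
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return "\n".join(
        "".join(density(scores[(row, col)], window) for col in range(cols))
        for row in range(rows)
    )


if __name__ == "__main__":
    # Made-up peptide sequences sharing a similar stretch.
    print(render("AGHIPLLQDEKR", "AGHVPLIQEDRK", window=3))
```

Because similar but non-identical residues still earn a partial score, the sketch also shows the difference between identity and similarity: a diagonal can stay visible across positions where the residues differ but belong to the same group.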