InterPro/PROSITE
UCSC Genome Browser
Exercise 3

Turning information into knowledge
  The outcome of a sequencing project is masses of raw data.
  The challenge is to turn this raw data into biological knowledge.
  A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be streamlined.

From sequence to function
  Nature tends to innovate rather than invent.
  Proteins are composed of functional elements: domains and motifs.
  Domains are structural units that carry out a certain function.
  The same domains are shared between different proteins.
  Motifs are shorter sequences with a certain biological activity.

InterPro
  http://www.ebi.ac.uk/interpro/
  An integrated documentation resource for protein families, domains and sites.
  Groups signatures describing the same protein family or domain.
  Combines a number of databases that use different methodologies to derive protein signatures:
    UniProt: UniProtKB (Swiss-Prot, TrEMBL), UniRef, UniParc
    PROSITE: a documented DB of domains, families and functional sites
    Pfam: a DB of protein families represented by MSAs

Member databases
  Sequence-motif methods: protein signature DBs with different focus
  Sequence-cluster methods: hierarchically clustered sequence/structure DBs

InterPro search

PROSITE
  http://www.expasy.ch/prosite/
  A method for determining the function of uncharacterized translated protein sequences.
  Consists of a DB of annotated, biologically important sites/patterns/motifs/signatures/fingerprints.

PROSITE
  Entries are represented with patterns or profiles.

  Pattern: [AC]-A-[GC]-T-[TC]-[GC]

  Profile:
         1     2     3     4     5
    A    0.66  1     0     0     .
    T    0     0     0     1     .
    C    0.33  0     0.66  0     .
    G    0     0     0.33  0     .
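The pattern and profile representations above can be illustrated with a small Python sketch. It is a minimal illustration, not PROSITE's own scanner: the function names (`prosite_to_regex`, `find_matches`, `column_frequencies`) are made up for this example, only the basic pattern syntax is handled (`-` separators, `[..]` allowed residues, `{..}` forbidden residues, `x` for any residue, and `(n)` or `(n,m)` repeats), and the three aligned sequences are hypothetical, chosen so that the first four columns give the frequencies shown on the slide (2/3 and 1/3 appear there as 0.66 and 0.33).

```python
import re
from collections import Counter


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as "x(2,4)" into its core and optional repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                      # any residue
            piece = "."
        elif core.startswith("["):           # one of the listed residues
            piece = core
        elif core.startswith("{"):           # any residue except those listed
            piece = "[^" + core[1:-1] + "]"
        else:                                # a single literal residue
            piece = re.escape(core)
        if lo and hi:
            piece += "{" + lo + "," + hi + "}"
        elif lo:
            piece += "{" + lo + "}"
        parts.append(piece)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (0-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


def column_frequencies(aligned):
    """Build a simple frequency profile: one {residue: frequency} dict per column."""
    n = len(aligned)
    return [{res: count / n for res, count in Counter(col).items()}
            for col in zip(*aligned)]


if __name__ == "__main__":
    print(prosite_to_regex("[AC]-A-[GC]-T-[TC]-[GC]"))
    print(find_matches("[AC]-A-[GC]-T-[TC]-[GC]", "GGAACTTGCC"))
    # Hypothetical alignment, used only to illustrate how a profile is derived.
    for i, col in enumerate(column_frequencies(["AACT", "AACT", "CAGT"]), 1):
        print(i, col)
```

A pattern either matches or it does not, while the profile keeps the per-position frequencies, so it can score sequences that deviate from the consensus.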
  Profiles are used in PROSITE when the motif is relatively divergent and is difficult to represent as a pattern.

Scanning PROSITE
  Query: sequence. Result: all patterns found in the sequence.
  Query: pattern. Result: all sequences that adhere to this pattern.

Patterns with a high probability of occurrence
  Entries describing commonly found post-translational modifications or compositionally biased regions.
  Found in the majority of known protein sequences.

PROSITE sequence query

PROSITE pattern query

UCSC Genome Browser

UCSC Genome Browser Gateway
  Reset all settings of previous user.

UCSC Genome Browser query results

UCSC Genome Browser annotation tracks
  Tracks: base position, UCSC Genes, RefSeq, mRNA (GenBank), vertebrate conservation, single species compared, SNPs, repeats
  UCSC gene display: UTR, intron, exon, gene direction

UCSC Genome Browser: movement
  Zoom x3 + center

UCSC Genome Browser: base view

Annotation track options
  dense, squish, pack, full
  Another way to toggle between 'pack' and 'dense' view is to click on the track title.

[Maps: sickle-cell anemia distribution; malaria distribution]

BLAT
  BLAT = BLAST-Like Alignment Tool
  BLAT is designed to find sequences with >95% similarity for DNA and >80% for protein.
  Rapid search by indexing the entire genome.
  Good for:
    1. Finding genomic coordinates of cDNA
    2. Determining exons/introns
    3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence

BLAT on UCSC Genome Browser

BLAT results
  Match
  Non-match (mismatch/indel)
  Indel boundaries

BLAT results on the browser

Getting DNA sequence of region
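
The BLAT slide attributes the tool's speed to indexing the entire genome. The toy Python sketch below illustrates the general idea of a k-mer index; it is not BLAT's actual implementation. The function names, the tiny genome string, the choice of k = 4 and the non-overlapping stride are all assumptions made for this example. It indexes the genome's k-mers once, looks up each k-mer of the query, and lets the hits vote for the genomic offset at which the query most likely aligns.

```python
from collections import Counter, defaultdict


def build_index(genome: str, k: int):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def candidate_offsets(index, query: str, k: int):
    """Look up every k-mer of the query and vote for genome offsets.

    A hit of query position q at genome position g suggests that the query
    starts at offset g - q in the genome. Returns (offset, votes) pairs,
    most-voted offset first.
    """
    votes = Counter()
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            votes[g - q] += 1
    return votes.most_common()


if __name__ == "__main__":
    genome = "ACGTTGCAAGCTTAGGCCATGACT"   # hypothetical toy "genome"
    query = genome[6:18]                  # a fragment we pretend is a cDNA
    index = build_index(genome, k=4)
    print(candidate_offsets(index, query, k=4))
```

Because the index is built once and each query k-mer is a dictionary lookup, the cost of a search depends mainly on the query length and the number of hits, not on rescanning the genome. A real tool would then extend and align around the candidate offsets to handle mismatches, indels and introns.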