Tutorial 9
Protein and Function Databases

Protein and Function Databases
- UniProt (Swiss-Prot/TrEMBL)
- PROSITE
- Pfam
- Gene Ontology
- DAVID

Glossary
• Domain: a structural unit which can be found in multiple protein contexts.
• Repeat: a short unit which is unstable in isolation but forms a stable structure when multiple copies are present.
• Family: a collection of related proteins.

UniProt
http://www.uniprot.org/
• The Universal Protein Resource (UniProt) is a central repository of protein sequence, function, classification and cross-reference data.
• It was created by joining the information contained in Swiss-Prot and TrEMBL.

UniProt input
• Protein search
• Reviewed protein
• Sequence download

UniProt output
• Accession number
• Protein status
• Organism
• Length

Information for one protein
• General information
• Annotations
• General keywords
• GO annotation (MF, BP, CC)
• Alternative splicing isoforms
• Features in the sequence
• Sequences
• References

Alignment for two or more proteins
• MSA
• BLAST

Pfam
• http://pfam.sanger.ac.uk/
• Pfam is a database of multiple alignments of protein domains or conserved protein regions.

What kind of domains can we find in Pfam?
• Trusted domains
• Repeats
• Fragment domains
• Nested domains
• Disulfide bonds
• Important residues (e.g. active sites)
• Transmembrane domains
• Context domains: domains that, despite not scoring above the family threshold, are expected to be real, based on the other domains found in the protein.
• Signal peptides: indicate a protein that will be secreted.
• Low complexity regions
• Coiled coils: two or three alpha helices that wind around each other.

Pfam input

Pfam output
• Domains
• Domain range and score
• Description
• Structure info
• Gene Ontology
• Links

PROSITE
• http://www.expasy.org/tools/scanprosite
• PROSITE is a database of protein domains and motifs that can be searched by either regular expression patterns or sequence profiles.
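To make "searched by regular expression patterns" concrete, here is a minimal Python sketch that translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. The function names prosite_to_regex and find_sites and the example sequence are made up for illustration. The converter handles only the basic pattern syntax (x, [..], {..}, repeat counts, and the < and > anchors) and is not the ScanProsite implementation. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern.

```python
import re

# One pattern element: x (any residue), a single residue, [allowed], {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    body = pattern.strip().rstrip(".")      # patterns end with a period
    at_start = body.startswith("<")         # '<' anchors to the N-terminus
    at_end = body.endswith(">")             # '>' anchors to the C-terminus
    body = body.lstrip("<").rstrip(">")

    parts = []
    for element in body.split("-"):         # elements are separated by '-'
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: %r" % element)
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                   # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # any residue except these
        if low and high:
            residue += "{%s,%s}" % (low, high)
        elif low:
            residue += "{%s}" % low
        parts.append(residue)

    return ("^" if at_start else "") + "".join(parts) + ("$" if at_end else "")


def find_sites(pattern, sequence):
    """Return (1-based position, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical sequence: the first N fits the pattern, the second is followed by P.
print(find_sites("N-{P}-[ST]-{P}.", "MKNGSAQNPTA"))   # [(3, 'NGSA')]
```

The lookahead in find_sites is a design choice so that overlapping sites are all reported; a plain finditer on the pattern would skip hits that overlap an earlier match.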
Search results: domain architecture

Gene Ontology (GO)
http://www.geneontology.org/
• GO is a database of biological processes, molecular functions and cellular components.
• GO contains neither sequence information nor gene or protein descriptions.
• GO is linked to gene and protein databases.
• GO is organized hierarchically. Strictly, the structure is a directed acyclic graph (DAG) rather than a tree, so a term can have more than one parent.

Search by AmiGO
http://www.geneontology.org/amigo/
• Three principal branches: biological process, molecular function, cellular component

GO sources (evidence codes)
ISS   Inferred from Sequence/Structural Similarity
IDA   Inferred from Direct Assay
IPI   Inferred from Physical Interaction
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IMP   Inferred from Mutant Phenotype
IGI   Inferred from Genetic Interaction
IEP   Inferred from Expression Pattern
IC    Inferred by Curator
ND    No Data available
IEA   Inferred from Electronic Annotation

Results for alpha-synuclein

DAVID
Functional Annotation Bioinformatics Microarray Analysis
• Identify enriched biological themes, particularly GO terms
• Discover enriched functionally related gene/protein groups
• Cluster redundant annotation terms
• Explore gene names in batch

Annotation, classification, ID conversion

Functional annotation
• Upload
• Annotation options
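As an illustration of what an "enriched" GO term means, the following Python sketch computes a one-sided hypergeometric p-value: the probability of seeing at least the observed number of annotated genes in a list drawn at random from the background. This is a generic textbook test given under stated assumptions, not DAVID's own scoring, and the function name, parameter names and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(at least `hits` annotated genes) when `sample` genes are drawn
    without replacement from `population` genes, `annotated` of which
    carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 background genes, 200 annotated with the term,
# a gene list of 100, of which 10 carry the term.
p_value = hypergeom_enrichment_p(20000, 200, 100, 10)
print("enrichment p-value: %.3g" % p_value)
```

A small p-value suggests the term appears in the list more often than chance would predict. When many terms are tested at once, the p-values also need a multiple-testing correction.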