UniProtKB/Swiss-Prot: Questions, Answers and a few Tips

UniProtKB: Questions and answers Fortaleza 31.VII.2006

Everything you always wanted to know about UniProtKB/Swiss-Prot… and others were not afraid to ask!

Two main contact points: [email protected] [email protected]

Some have problems finding a protein…

Troubles finding a protein…

I cannot find the IgG protein from Lama pacas in your server.

"Lama pacas" = Lama guanicoe pacos (Alpaca) (Lama pacos)
- 40 entries in UniProtKB (5 Swiss-Prot, 35 TrEMBL), but no IgG;
- 98 entries at the EMBL database, no IgG.

In addition:
- Ig are not annotated in UniProtKB/Swiss-Prot (currently many Ig sequences are stored only in UniParc);
- Lama pacos is not an annotation priority.

UniProtKB/Swiss-Prot annotation priorities (see poster SP106)

Model-organism oriented annotation
1. Complete microbial proteomes and plastid-encoded proteins (HAMAP)
2. Human proteins and their orthologs in other mammals (HPI) (SP129)
3. Plant proteins (A. thaliana and rice) (PPAP) (SP133)
4. Fungal proteomes (FPAP) (SP134)
5. Proteomes of representative subsets of viral strains (SP135)
6. Toxins and anti-microbial peptides (ToxProt) (SP139)
7. Drosophila proteome (SP137)
8. C. elegans proteome (SP138)
9. Xenopus proteome (SP136)
… (SP131&132)

Priorities shared by all organisms
1. Post-Translational Modifications (PTMs) (SP126)
2. 3D structures (SP128)
3. Protein-protein interactions (SP)
…

Troubles finding a protein…

Dear Folks, I cannot find an entry for human apolipoprotein B100 in Swiss-Prot/TrEMBL. Am I doing something wrong?

In the future, our search engines will cope with dashes, Roman/Arabic figures, etc. In the annotation process, we try to add all synonyms found for a given protein/gene in the literature and other databases.

Troubles finding a protein…

I am trying to locate the entry for the human beta-2 adrenoreceptor protein, but I don't seem to get any entries. Can you help me to locate this entry, please?

The missing synonym was added.

Troubles finding a protein…

I could not find the information of protein gi/34906958

From the NCBI documentation, gi numbers are:
1. restricted to GenBank (not agreed upon with EMBL and DDBJ)
2. not stable identifiers

Of note, cross-references to RefSeq will soon be available from UniProtKB.

http://www.pir.uniprot.org/search/idmapping.shtml

… eventually they find it!

help #12995
This is my new question: Do all the Swiss-Prot proteins of human and Arabidopsis have CDS nucleotide sequences in the database? What should I do to get them?

From EMBL to UniProtKB/TrEMBL (diagram labels: CDS, Ref.)
In the current UniProt release (8.4, 25-Jul-2006), 8'133 of the 230'133 UniProtKB/Swiss-Prot entries (3.5%) have no cross-references to EMBL/GenBank/DDBJ.

http://www.ebi.ac.uk/swissprot/Submissions/submissions.html

I found that the UNIPROT entry for human MAPKAKK3 is still a TREMBL entry (since 1996) and could not be found in SWISSPROT. Is there a specific reason why certain entries do not enter the SWISSPROT section and get a 'correct UNIPROT ID'?

MAPKAKK3 is not a valid gene name; the corresponding TrEMBL entry was not found and could not be annotated. Please use the update request form (or cite accession numbers)!

UniProtKB: From TrEMBL to Swiss-Prot

~60 superannotators at SIB and EBI, supported by a dedicated programming team

UniProtKB: From TrEMBL to Swiss-Prot
- Sequence merge & analysis
- High performance bioinformatics tools
- Sequence annotation

1 gene / 1 species = 1 Swiss-Prot entry
- Alternative splicing?
- Same gene?
- Polymorphisms?
- Alternative initiation?
- RNA editing?
- Usage of an alternative promoter?
- Fragment?
- Sequencing errors?
- Selenocysteine?
-> Annotation and documentation of all the differences

UniProtKB: From TrEMBL to Swiss-Prot
- Sequence merge & analysis
- High performance bioinformatics tools
- Literature information (>1'700 journals cited)
- Databases and external scientific expertise
- Annotation and sequence check

In order to avoid redundancy, once manually annotated and integrated into Swiss-Prot, the entry is deleted from TrEMBL.

Dear Curator, I am the main author of the paper describing two new phosphorylation sites for human growth hormone (P01241) published in Proteomics 4:587-598(2004). One of the two phosphorylation sites, Ser 176, described by us in the paper is not listed on the ExPASy web site. If the curator simply missed the site, please make the necessary update. If Ser 176 was not included in the feature table for other reasons, please let us know.

www.expasy.org

The reference has been added… and the modifications described.

Searching UniProtKB/Swiss-Prot

I wish to retrieve separately all the bacterial and viral protein sequences with virulence factors, but what I manage to get when I type "virulence" as a keyword are all the protein sequences with virulence as a keyword. Are the sequences I got here only from bacteria and viruses? Do any other organisms have these virulence factors? How could I specify the sequences, based on viral and bacterial virulence factors? I would really appreciate it if you could help me. Thank you.
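The request above combines two filters: the keyword "Virulence" and the organism's taxonomic lineage. A minimal Python sketch of that logic follows. It assumes the entries are at hand as flat-file text, where KW lines hold keywords, OC lines hold the lineage, and "//" closes an entry. The sample entries and their identifiers are invented for illustration, and the function names are hypothetical. This is not how SRS itself is queried.

```python
def parse_entries(flat_text):
    """Split flat-file text into entries; map each line code to its content lines."""
    entries = []
    current = {}
    for line in flat_text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            if current:
                entries.append(current)
            current = {}
            continue
        code, content = line[:2], line[5:].strip()
        if code.strip():
            current.setdefault(code, []).append(content)
    return entries


def split_terms(lines):
    """Turn 'A; B; C.' style lines into a list of clean terms."""
    joined = " ".join(lines).rstrip(".")
    return [term.strip() for term in joined.split(";") if term.strip()]


def virulence_by_kingdom(entries):
    """Group names of entries carrying the 'Virulence' keyword by top-level taxon."""
    result = {"Bacteria": [], "Viruses": [], "Other": []}
    for entry in entries:
        if "Virulence" not in split_terms(entry.get("KW", [])):
            continue
        lineage = split_terms(entry.get("OC", []))
        top = lineage[0] if lineage else "Other"
        bucket = top if top in ("Bacteria", "Viruses") else "Other"
        result[bucket].append(entry["ID"][0].split()[0])
    return result


# Invented sample data, not real UniProtKB entries.
SAMPLE = """\
ID   TOXA_EXMPL     Reviewed;   120 AA.
OC   Bacteria; Proteobacteria.
KW   Toxin; Virulence.
//
ID   VIRF_EXMPL     Reviewed;   200 AA.
OC   Viruses; dsDNA viruses, no RNA stage.
KW   Virulence.
//
ID   FUNG_EXMPL     Reviewed;   310 AA.
OC   Eukaryota; Fungi.
KW   Secreted; Virulence.
//
ID   ENZY_EXMPL     Reviewed;   95 AA.
OC   Bacteria; Firmicutes.
KW   Hydrolase.
//
"""

if __name__ == "__main__":
    print(virulence_by_kingdom(parse_entries(SAMPLE)))
```

The "Other" bucket addresses the second part of the question: any entry that carries the keyword but belongs to neither bacteria nor viruses ends up there.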
Currently: Sequence Retrieval System (SRS)

(PR#6943) Dear Sir/Madame, I have a question concerning selection of data from UniProt protein database. I wonder if there are any examples of two or more protein entries, which concern exactly the same protein of two or more individuals representing the same species. In other words, I would like to know, if each protein of a given species is represented by exactly one amino acids sequence. If there are some proteins of a given species which are represented by more than one amino acids sequence, which line of the entry should I use to group such entries together?

UniProtKB/Swiss-Prot is non-redundant:
One Swiss-Prot entry = all protein products encoded by one gene in one species (including fragments, variations/polymorphisms, splice variants, sequencing errors…)

Increase in complexity
- Genome: ~25'000 human genes (with polymorphisms)
- Transcriptome: ~100'000 human transcripts
- Proteome: ~1'000'000 human proteins
Sources of diversity: alternative promoter usage, alternative splicing, mRNA editing, post-translational modifications (PTMs), etc.

Multiple alignment of the C-terminus of available GCR sequences
- 13 sequences (complete or partial)
- derived from mRNA (n=6) or genomic DNA (n=7)

Annotation of the sequence differences
- Sequencing error (frameshift)?
- Alternative splicing?
- Polymorphism?
- Disease mutation?
- Sequencing error (conflict)?
- RNA editing?
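The non-redundancy rule above (all sequences from one gene in one species end up in one entry) can be illustrated with a toy grouping in Python. The records, accession strings and gene names below are hypothetical, and the real merge is a manual curation step involving sequence analysis, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict

# Hypothetical source records: (source accession, gene, species, note)
RECORDS = [
    ("SRC0001", "GENE1", "Homo sapiens", "mRNA, full length"),
    ("SRC0002", "GENE1", "Homo sapiens", "genomic DNA, splice variant"),
    ("SRC0003", "GENE1", "Homo sapiens", "fragment"),
    ("SRC0004", "GENE1", "Mus musculus", "mRNA, full length"),
    ("SRC0005", "GENE2", "Homo sapiens", "mRNA, full length"),
]


def merge_by_gene_and_species(records):
    """Collect all source sequences under one key per (gene, species) pair."""
    merged = defaultdict(list)
    for accession, gene, species, note in records:
        merged[(gene, species)].append((accession, note))
    return dict(merged)


if __name__ == "__main__":
    for (gene, species), sources in merge_by_gene_and_species(RECORDS).items():
        print(f"{gene} / {species}: {len(sources)} source sequence(s)")
        for accession, note in sources:
            print(f"  {accession}: {note}")
```

Each key stands for one would-be entry; the differences between the sequences grouped under it (variant, fragment, conflict) are what the annotation then has to document.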
Where to find the annotation about alternative splicing in UniProtKB/Swiss-Prot?

View "by default" on the ExPASy server
- Identifier & accession nr. (ID, AC, DT)
- Protein and gene names, taxonomy (DE, GN, OC, OS, OG)
- Cross-references (DR)
- References (RN, RP, RC, RX, RA, RL)
- Keywords (KW)
- Sequence description (Feature Table)
- Comments (CC)
- Sequence (SQ)

P04150 (GCR_HUMAN) …

All the alternative sequences are available for Blast searches and protein identification tools (on the ExPASy server).

Currently in UniProtKB/Swiss-Prot, for Homo sapiens:
- 14'445 entries (~ as many genes)
- 7'975 alternative splicing isoforms
-> 22'420 human sequences described, not taking into account other diversity-generating events…

How to download the sequences?

And if bioinformatics is not funded properly, we could start a new business…

Dear Sirs, We need deacetylase for the following purposes:
1. Deacetylation of fiber obtained from chitin.
2. Chitin deacetylation for obtaining chitosan oligosaccharides.
Evidently, these will be different types of deacetylase, because in the case of the fiber a decrease of molecular weight is not allowed, while in the case of chitin deacetylation it is allowable and even desirable for oligomerisation of the product during deacetylation. We ask you to send us an example deacetylase for chitin and its price.

Dear, At this moment I am looking for: bovine TGF beta1. I saw on the web that you have this product with part# P18341. Could you inform me of the price and delivery time?