NCBI Molecular Biology Resources, part 2
NCBI, Nov 6, 2001

WWW BLAST

NCBI Web BLAST

Protein Databases
nr          Non-redundant GenBank CDS translations+PDB+SwissProt+SPupdate+PIR
            567,860 sequences; 178,533,065 letters
swissprot   Non-redundant SwissProt sequences
            88,934 sequences; 32,001,993 letters
pdb         PDB protein sequences
            22,726 sequences; 5,068,254 letters

Nucleotide Databases
nr (nt)     GenBank+EMBL+DDBJ+PDB sequences
            759,631 sequences; 2,714,918,430 letters
dbest       Expressed Sequence Tags (EST Division)
            7,309,361 sequences; 3,100,444,103 letters
htgs        High-Throughput Genome Sequences (HTG Division)
            84,374 sequences; 4,355,661,355 letters

Protein BLAST Page
Identifier or sequence:
>Mutated in Colon Cancer
IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILER
VQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKSTTSLTSSSTSGSS
DKVYAHQMVRTDSREQKLDAFLQPLSKPLSS
Protein database: swissprot

BLAST Formatting Page

BLAST Output: Graphic
(mouse over)

BLAST Output: Descriptions
• Taxonomy Reports (TaxBLAST)
• Sorted by E value
• Link to record in Entrez
• E value cut-off = 10
• An E value of 2e-68 means 2 x 10^-68

Sequences producing significant alignments:                      Score (bits)  E Value
sp|P40692|MLH1_HUMAN  MUTL PROTEIN HOMOLOG 1 (DNA MISMATCH REPAIR...   255     2e-68
sp|P38920|MLH1_YEAST  MUTL PROTEIN HOMOLOG 1 (DNA MISMATCH REPAIR...    67     2e-11
sp|P44494|MUTL_HAEIN  DNA MISMATCH REPAIR PROTEIN MUTL                  52     6e-07
sp|P23367|MUTL_ECOLI  DNA MISMATCH REPAIR PROTEIN MUTL                  45     8e-05
sp|P14161|MUTL_SALTY  DNA MISMATCH REPAIR PROTEIN MUTL                  43     2e-04
sp|P49850|MUTL_BACSU  DNA MISMATCH REPAIR PROTEIN MUTL                  38     0.006
sp|P14160|HEXB_STRPN  DNA MISMATCH REPAIR PROTEIN HEXB                  38     0.010
sp|P70754|MUTL_AQUPY  DNA MISMATCH REPAIR PROTEIN MUTL                  36     0.031
sp|P54280|PMS1_SCHPO  DNA MISMATCH REPAIR PROTEIN PMS1                  35     0.069
sp|O67518|MUTL_AQUAE  DNA MISMATCH REPAIR PROTEIN MUTL                  35     0.069
sp|P54278|PMS2_HUMAN  PMS1 PROTEIN HOMOLOG 2 (DNA MISMATCH REPAIR...    33     0.20
sp|P54279|PMS2_MOUSE  PMS1 PROTEIN HOMOLOG 2 (DNA MISMATCH REPAIR...    33     0.27
sp|P54277|PMS1_HUMAN  PMS1 PROTEIN HOMOLOG 1 (DNA MISMATCH REPAIR...    31     1.0
sp|P02239|LGB1_LUPLU  LEGHEMOGLOBIN I                                   28     6.8
sp|P14242|PMS1_YEAST  DNA MISMATCH REPAIR PROTEIN PMS1                  28     9.0

TaxBLAST: Taxonomy Reports

Homo sapiens (human) [mammals] taxid 9606
  sp|P40692|MLH1_HUMAN  MUTL PROTEIN HOMOLOG 1 (DNA MISMATCH ...   1459   0.0
  sp|P54278|PMS2_HUMAN  PMS1 PROTEIN HOMOLOG 2 (DNA MISMATCH ...    168   3e-41
  sp|P54277|PMS1_HUMAN  PMS1 PROTEIN HOMOLOG 1 (DNA MISMATCH ...    132   2e-30
Saccharomyces cerevisiae (baker's yeast) [fungi] taxid 4932
  sp|P38920|MLH1_YEAST  MUTL PROTEIN HOMOLOG 1 (DNA MISMATCH ...    487   5e-137
  sp|P14242|PMS1_YEAST  DNA MISMATCH REPAIR PROTEIN PMS1            152   2e-36
Escherichia coli [enterobacteria] taxid 562
  sp|P23367|MUTL_ECOLI  DNA MISMATCH REPAIR PROTEIN MUTL            208   3e-53
Haemophilus influenzae [g-proteobacteria] taxid 727
  sp|P44494|MUTL_HAEIN  DNA MISMATCH REPAIR PROTEIN MUTL            208   4e-53
Salmonella typhimurium [enterobacteria] taxid 602
  sp|P14161|MUTL_SALTY  DNA MISMATCH REPAIR PROTEIN MUTL            200   9e-51
Streptococcus pneumoniae [low GC Gram+] taxid 1313
  sp|P14160|HEXB_STRPN  DNA MISMATCH REPAIR PROTEIN HEXB            189   1e-47
Bacillus subtilis [low GC Gram+] taxid 1423
  sp|P49850|MUTL_BACSU  DNA MISMATCH REPAIR PROTEIN MUTL            187   7e-47
Rickettsia prowazekii [a-proteobacteria] taxid 782
  sp|Q9ZC88|MUTL_RICPR  DNA MISMATCH REPAIR PROTEIN MUTL            178   3e-44
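The Descriptions table lists each hit with a bit score and an E value. As a rough illustration of how the two are related, the sketch below applies the standard Karlin-Altschul relation E ≈ m · n · 2^(-S'), where S' is the bit score, m the query length, and n the number of letters in the database. It assumes raw lengths (the 131-residue query as shown and the 32,001,993 letters listed for swissprot); BLAST itself uses corrected effective lengths, so the results only approximate the reported values. The function name `approx_expect` is hypothetical.

```python
def approx_expect(bit_score: float, query_len: int, db_letters: int) -> float:
    """Approximate E value from a bit score: E ~= m * n * 2**(-S')."""
    return query_len * db_letters * 2.0 ** (-bit_score)


# Query sequence as shown on the Protein BLAST Page slide.
QUERY = (
    "IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILER"
    "VQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKSTTSLTSSSTSGSS"
    "DKVYAHQMVRTDSREQKLDAFLQPLSKPLSS"
)

# Database size listed for swissprot on the databases slide.
SWISSPROT_LETTERS = 32_001_993

if __name__ == "__main__":
    # (bit score, E value reported in the Descriptions table)
    rows = [(255, "2e-68"), (67, "2e-11"), (28, "9.0")]
    for bits, reported in rows:
        estimate = approx_expect(bits, len(QUERY), SWISSPROT_LETTERS)
        print(f"{bits:>4} bits  estimate {estimate:.1e}  reported {reported}")
```

For these three rows the raw-length estimates fall within an order of magnitude of the reported E values, which is about as close as this simplification should be expected to get.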
BLAST Output: Alignments

>sp|P40692|MLH1_HUMAN MUTL PROTEIN HOMOLOG 1 (DNA MISMATCH REPAIR PROTEIN MLH1)
Length = 756
Score = 255 bits (645), Expect = 2e-68
Identities = 126/140 (90%), Positives = 126/140 (90%)
<alignment edited for brevity>
Query: 61  RMYFTQTLLPGLAGPSGEMVKXXXXXXXXXXXXXXDKVYAHQMVRTDSREQKLDAFLQPL 120
           RMYFTQTLLPGLAGPSGEMVK              DKVYAHQMVRTDSREQKLDAFLQPL
Sbjct: 341 RMYFTQTLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPL 400
Note: low complexity sequence filtered (shown as X in the query)

>sp|P23367|MUTL_ECOLI DNA MISMATCH REPAIR PROTEIN MUTL
Length = 615
Score = 44.5 bits (103), Expect = 8e-05
Identities = 25/59 (42%), Positives = 33/59 (55%), Gaps = 8/59 (13%)
Query: 4   LPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHF-----LHE---ESILERVQQHIESKL 54
           L  +  P   L LEI P  VDVNVHP KHEV F     +H+   + +L  +QQ +E+ L
Sbjct: 280 LGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLVHDFIYQGVLSVLQQQLETPL 338

Results from nr

Sequences producing significant alignments:                          Score (bits)  E Value
gb|AAA85687.1| (U17857) hMLH1 gene product [Homo sapiens]                 238     2e-62
gb|AAA17374.1| (U07418) human homolog of E. coli mutL gene ...            238     2e-62
ref|NP_000240.1| mutL homolog 1 >gi|730028|sp|P40692|MLH1_H...            238     2e-62
gb|AAB38506.1| (U80054) mismatch repair protein [Rattus nor...            217     4e-56
gb|AAF64514.1|AF250844_1 (AF250844) MutL homolog 1 protein ...            217     4e-56
gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila mel...            129     1e-29
gb|AAC19117.1| (AF068257) mutL homolog [Drosophila melanoga...            129     1e-29
emb|CAA10163.1| (AJ012747) MLH1 protein [Arabidopsis thalia...             84     4e-16
emb|CAB66448.1| (AL136536) putative DNA mismatch repair pro...             73     1e-12
ref|NP_013890.1| MutL homolog, forms a complex with Pms1p a...             72     2e-12
gb|AAA16835.1| (U07187) Mlh1p [Saccharomyces cerevisiae]                   71     4e-12
sp|P44494|MUTL_HAEIN DNA MISMATCH REPAIR PROTEIN MUTL >gi|1...             55     2e-07
gb|AAB09596.1| (U71053) DNA mismatch repair protein [Thermo...             49     1e-05
pir||H72427 DNA mismatch repair protein - Thermotoga mariti...             49     1e-05

>ref|NP_000240.1| mutL homolog 1
 sp|P40692|MLH1_HUMAN MUTL PROTEIN HOMOLOG 1 (DNA MISMATCH REPAIR PROTEIN MLH1)
 pir||S43085 DNA mismatch repair protein MLH1 - human
 gb|AAC50285.1| (U07343) hMLH1 [Homo sapiens]
 gb|AAA82079.1| (U40978) DNA mismatch repair protein homolog [Homo sapiens]
 prf||2007430A DNA mismatch repair protein [Homo sapiens]
Length = 756
Score = 238 bits (601), Expect = 2e-62
Identities = 117/131 (89%), Positives = 117/131 (89%)
Query: 1   IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL 60
           IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL
Sbjct: 276 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL 335

tblastn Results Against ESTs

gb|N32729|N32729 yx75d09.r1 Homo sapiens cDNA clone 267569 5' similar to SW:MLH1_HUMAN P40692 MUTL PROTEIN HOMOLOG 1 ;.
Length = 537

Score = 221 bits (557), Expect(2) = 4e-60
Identities = 120/146 (82%), Positives = 122/146 (83%), Gaps = 3/146 (2%)
Frame = +3
Query: 384 VRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAK 443
           VRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAK
Sbjct: 3   VRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAK 182

Query: 444 NQSLEGDTTKGTSEMSEKRGPTSSNPRKRHRXXXXXXXXXXXXRKEMTAACTPRRRIINL 503
           NQSLEGDTTKGTSEMSEKRGPTSSNPRKRHR            RKEMTAACTPRRRIINL
Sbjct: 183 NQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINL 362

Query: 504 TSVLSL-QEEINEQG--HEVLREMLHNHS 529
           T    + QEEIN  G    +    LHNHS
Sbjct: 363 T*CFGVSQEEIN*AGXMRVLPGRXLHNHS 449

Score = 35.2 bits (79), Expect(2) = 4e-60
Identities = 14/23 (60%), Positives = 16/23 (68%)
Frame = +1
Query: 533 CVNPQWALAQHQTKLYLLNTTKL 555
           C +P WAL QH T+  L NTTKL
Sbjct: 463 CESPSWALEQHPTQFXLFNTTKL 531

Results against PDB - Finding a model template

Sequences producing significant alignments:                   Score (bits)  E Value
pdb|1B62|A  Chain A, Mutl Complexed With Adp                        45     1e-05
pdb|1BKN|A  Chain A, Crystal Structure Of An N-Terminal 40kd..      45     1e-05
pdb|1B63|A  Chain A, Mutl Complexed With Adpnp                      43     4e-05
pdb|2GDM|   Leghemoglobin (Oxy) >gi|999936|pdb|1GDJ| Leg..          27     2.0

Cn3D

BLAST Alignment

Alignment by BLAST 2 Sequences

PSI-BLAST
Confirming relationships of purine nucleotide metabolism proteins

PSI BLAST
>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE AMINOH
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGFLAKFDYY
VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVDLVNQGLQ
EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAYEGAVKNG
RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGAWDPKTTH
VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKKELLERLY

PSI RESULTS: Initial BLAST Run

First PSSM Search
Other purine nucleotide metabolizing enzymes not found by ordinary BLAST

Third PSSM Search: Convergence
Just below threshold, another nucleotide metabolism enzyme

PHI BLAST
>gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4
MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASE
LIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHV
IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI
LKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEI
ASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK
Pattern: [GA]xxxxGK[ST]

Conserved Domain Search
Drosophila Corkscrew CDS from genome
>gi|7290263|gb|AAF45724.1| CG3954 gene product [alt 2] [Drosop
MSSRRWFHPTISGIEAEKLLQEQGFDGSFLARLSSSNPGAFTLSVRRGNEVTHIKIQNNGDF
FDLYGGEKFATLPELVQYYMENGELKEKNGQAIELKQPLICAEPTTERWFHGNLSGKEAEKL
ILERGKNGSFLVRESQSKPGDFVLSVRTDDKVTHVMIRWQDKKYDVGGGESFGTLSELIDHY
KRNPMVETCGTVVHLRQPFNATRITAAGINARVEQLVKGGFWEEFESLQQDSRDTFSRNEGY
KQENRLKNRYRNILPYDHTRVKLLDVEHSVAGAEYINANYIRLPTDGDLYNMSSSSESLNSS
VPSCPACTAAQTQRNCSNCQLQNKTCVQCAVKSAILPYSNCATCSRKSDSLSKHKRSESSAS
SSPSSGSGSGPGSSGTSGVSSVNGPGTPTNLTSGTAGCLVGLLKRHSNDSSGAVSISMAERE

CDD Results

Sequences producing significant alignments:                                 Score (bits)  E value
gnl|Pfam|pfam00102   Y_phosphatase, Protein-tyrosine phosphatase                  236     3e-63
gnl|Pfam|pfam00102   Y_phosphatase, Protein-tyrosine phosphatase                  55.4    1e-08
gnl|Smart|DSPc       Dual specificity phosphatase, catalytic domain               236     3e-63
gnl|Smart|DSPc       Dual specificity phosphatase, catalytic domain               70.2    4e-13
gnl|Smart|PTPc       Protein tyrosine phosphatase, catalytic domain               102     9e-23
gnl|Smart|PTPc_DSPc  Protein tyrosine phosphatase, catalytic domain, un...        102     9e-23
gnl|Smart|SH2        Src homology 2 domains; Src homology 2 domains bi...         88.2    1e-18
gnl|Smart|SH2        Src homology 2 domains; Src homology 2 domains bind          76.9    4e-15
gnl|Pfam|pfam00017   SH2, Src homology domain 2                                   78.0    2e-15
gnl|Pfam|pfam00017   SH2, Src homology domain 2                                   71.8    1e-13

Specialized BLAST Pages
• Microbial Genomes
• Trace Archive
• Human Genome

Microbial Genomes BLAST
>APE0122
MVGVFGRLSRHVWVKRWYSILWAPWRMKYIKQAGSREGCVFCEAPSMGDDAKAYNSGHIMVTPYRH
VAELEDLTMDEIVEMAKLVRASVKALKRVYAPHGFNIGVNVPRWRGDSNFMLTVGGTKVIPESLED
TFKKLKPAVEEEARKEGV

Hits to Unfinished Genome

Human Genome BLAST
>gi|11877232|emb|AJ289857.1|HSA289857 Homo sapiens mRNA for adracalin (ADRACALA
AATCTAGCCCGGGAACCGAGTTGCGGGAGTGCGGTCTGTGCCGTTCCGGCCAGGAGTTTGCCGACTGCAG
ACGTCCTGCGAACCGGCAAGATGTGCTCTCTGGGGTTGTTCCCTCCTCCACCGCCTCGGGGTCAAGTCAC
CCTATATGAGCACAATAACGAGCTGGTGACGGGCAGTAGCTATGAGAGCCCGCCCCCCGACTTCCGGGGC
CAGTGGATCAATCTTCCTGTCCTACAACTGACAAAGGATCCCCTAAAGACCCCTGGAAGGCTGGACCATG
GCACAAGAACTGCCTTCATCCATCACCGGGAGCAAGTGTGGAAGAGATGCATCAACATTTGGCGTGATGT
GGGCCTTTTTGGGGTGCTAAATGAAATTGCAAACTCAGAAGAAGAGGTGTTTGAGTGGGTGAAGACGGCA
TCCGGCTGGGCCCTGGCACTCTGTCGATGGGCCTCTTCCCTCCATGGGTCCCTGTTCCCCCATCTGTCTC

Genomic Context of BLAST Hits

Shotgun Reads

Trace Archive BLAST - MEGABLAST
>gi|563511|emb|X81593.1|MMFKHN M.musculus mRNA for winged
CAGACGGTCGGAGCTCCTGGCCCCCCAGACCCAGGCCCCCACGCCGACCTGCTTCAC
TTCTTCGAGGCCAGGACTGGGTGATGGTGTCGCTACTCCCTCCGCAGTCTGACGTCA
CACCCGACTGGAGGGCGAACCCCAAGGGGACCTCATGCAGGCTCCGGGCCTCCCAGA
CAGAACAAGCATGCTAACTTCAGCTGCTCGTCGTTTGTGCCTGACGGCCCTCCAGAG

Whole Genome Shotgun
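The PHI BLAST slide pairs the CED4 query with the pattern [GA]xxxxGK[ST]. As a small illustration of what such a pattern selects, the sketch below turns it into a regular expression and scans the query as shown on the slide. It assumes the usual reading of the syntax (square brackets list the allowed residues, a lowercase x matches any residue) and handles only those two constructs. The helper name `pattern_to_regex` is hypothetical, and this is not a description of how PHI-BLAST itself is implemented.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate a simple residue pattern into a regular expression.

    Only bracketed residue sets and lowercase 'x' wildcards are supported;
    residues are uppercase, so replacing 'x' cannot touch them.
    """
    return pattern.replace("x", ".")


# CED4_CAEEL query sequence as shown on the PHI BLAST slide.
CED4 = (
    "MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASE"
    "LIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHV"
    "IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI"
    "LKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEI"
    "ASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK"
)

PATTERN = "[GA]xxxxGK[ST]"

if __name__ == "__main__":
    regex = re.compile(pattern_to_regex(PATTERN))
    for match in regex.finditer(CED4):
        # Report 1-based coordinates, as sequence positions usually are.
        print(match.group(), match.start() + 1, match.end())
```

On the sequence as shown, the pattern occurs once, at the segment GRAGSGKS.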
NCBI Genomic Resources
• Microbial Genomes
• The Draft Human Genome

Microbial Genomes in GenBank (Sept. 26 2001)
Viruses      >650
Archaea      11
Bacteria     50
Eukaryotae   1

Bacterial Genomes

M. tuberculosis Complete Genome

Coding Regions

Genome Annotations

M. tuberculosis vs. E.coli

COGS

Complex Genomes in GenBank
• Caenorhabditis elegans
• Drosophila melanogaster
• Homo sapiens
• Arabidopsis thaliana

The Human Genome
The NCBI annotation effort

The Draft Human Genome

Human Genome Resources
• LocusLink: a central resource
• Human Genome BLAST
• Human Maps
• UniGene: Expressed Sequences

What Data is Available?
• Assembled annotated genomic contigs
• Genome project data
• Other primary data
• Reference sequences - mRNA, proteins, transcripts
• Genome Scan gene models
• Mapped variation data
• Integrated maps - RH, genetic, cytogenetic, and sequence
• Clustered and mapped expressed sequences
• Links to outside data sources

How to access it?
Resource             Type of query
Human genome BLAST   Sequence Similarity
LocusLink            Gene name
Map Viewer           Map Location
UniGene              Database ID

LocusLink
A single query interface to …
•Sequences
  - RefSeqs
  - GenBank Accessions
•Maps: the Human Genome Map
  - RH
  - Cytogenetic
  - Assembled Genomic Sequence
•Genome annotations
•Entrez links
Linked resources: UniGene, HomoloGene, PubMed, Map Viewer, GenBank, OMIM, RefSeq, dbSNP
Full report: inositol polyphosphate 1 phosphatase
Available for: Hs human, Mm mouse, Rn rat, Dr zebrafish, Dm fruit fly

What is UniGene?
A gene-oriented view of sequence entries
•MegaBlast based automated sequence clustering
•Nonredundant set of gene oriented clusters
•Each cluster a unique gene
•Information on tissue types and map locations
•Includes well-characterized genes and novel ESTs
•Useful for gene discovery and selection of mapping reagents
http://www.ncbi.nlm.nih.gov/UniGene/

EST hits
INPP1 mRNA

Hs UniGene Statistics
UniGene Build 140, Sept 17th, 2001

     67,109   mRNAs + gene CDSs
  1,145,547   EST, 3'reads
  1,088,566   EST, 5'reads
+   631,105   EST, other/unknown
  ---------
  2,932,237   total sequences in clusters

Final Number of Clusters (sets)
===============================
96,479   sets total
20,200   sets contain at least one known gene
95,289   sets contain at least one EST
19,010   sets contain both genes and ESTs
80% uncharacterized transcripts

UniGene Collections, Sept 26, 2001

                                       Sequences    Clusters
Animals
Homo sapiens          human            2,932,237      96,479
Mus musculus          mouse            1,825,043      89,242
Rattus norvegicus     rat                298,003      59,265
Danio rerio           zebrafish           56,938      10,642
Bos taurus            cow                 87,310       7,367
Xenopus laevis        frog                58,133      11,984
Plants
Arabidopsis thaliana  thale cress        131,068      25,997
Oryza sativa          rice                47,841      12,836
Triticum aestivum     wheat               31,826       2,744
Hordeum vulgare       barley              34,812       4,041
Zea mays              maize (corn)        69,231       7,161

Cluster Hs.32309: Links and Homology

Cluster Hs.32309: Mapping Data

Cluster Hs.32309: Expression Data

Cluster Hs.32309: Sequences

Human Genome Map Viewer
SEQUENCE MAPS
•Clone •Contig •UniGene (EST) •GenBank •GenomeScan •Gene_Sequence •STS •Variation
CYTOGENETIC MAPS
•Ideogram •FISHClone •Genes_Cytogenetic •Mitelman •Morbid
GENETIC_LINKAGE_MAPS
•Genethon •Marshfield
RADIATION_HYBRID_MAPS
•GeneMap99_G3 •GeneMap99_GB4 •NCBI_RH •Stanford_G3 •Whitehead_RH

Fanconi Renal Syndrome
Finding a candidate gene

Fanconi Syndrome
D15S182 and D15S143

Finding Map Location
D15S182 OR D15S143

ePCR - sequence map

Genetic Map

Map Viewer Display
• ePCR results (bp)
• Marshfield map (cM)
• Annotated Genes (bp)

Sequence Maps
• Annotated Genes - master
• EST hits (UniGene)
• Genome Scan models
• Genomic contigs

Finding Candidate Gene
LocusLink
Disease Gene Candidate, supported by
•EST hits
•Gene predictions

LocusLink Entry

But what could it do?

BLink

BLink Results
Hit to yeast

Entrez: Saccharomyces RefSeq Protein

Saccharomyces Genome Database

Yeast Homologue Function

Human Function?
Regulator of membrane pump expression in renal tubules?

Polymorphisms

Mouse Genome Resources - Sequencing
• BLAST htgs
• Trace Archive

NCBI Service Addresses
•General Help              [email protected]
•Updates to records        [email protected]
•Questions about BLAST     [email protected]
•Sequin submissions        [email protected]
•Batch Submissions         [email protected]

E-mail Servers
•BLAST Server              [email protected]
•Query Server              [email protected]