UCSC Genome Tools and Databases
Jim Kent - Genome Bioinformatics Group, University of California Santa Cruz

Behind the Genome Browser
• ‘Genome’ database, one for each assembly of each genome.
  – hg17 (human genome assembly 17)
  – mm6 (Mus musculus 6)
  – canFam1 (Canis familiaris 1)
• hg17 has 1616 tables, but not really
  – Some tables split across chromosomes for speed
  – 228 logical tables
  – Only ~30 different types of tables

Selected fields from related tables
Results: Ensembl Gene (ensGene) and Superfamily Description (sfDescription).

Custom Track Output
• Useful for visualizing results of queries in genome browser
• The way to produce more complex queries.

681/3329 (20%) of Ensembl not known also not conserved
1728/33,666 (5%) of Ensembl in general not conserved

Meta-data behind Table Browser
• The trackDb table describes each track.
• Table and field descriptions are in AutoSql .as files, which also generate SQL code and C code to load/save from the database and tab-separated files.
• Descriptions of how tables are connected are in the all.joiner file, which along with the joinerCheck program checks database integrity.

.as Files - table and field docs

    table cpgIsland
    "Describes the CpG Islands"
        (
        string chrom;      "Human chromosome or FPC contig"
        uint chromStart;   "Start position in chromosome"
        uint chromEnd;     "End position in chromosome"
        string name;       "CpG Island"
        uint length;       "Island Length"
        uint cpgNum;       "Number of CpGs in island"
        uint gcNum;        "Number of C and G in island"
        float perCpg;      "Percentage of island that is CpG"
        float perGc;       "Percentage of island that is C or G"
        )

autoSql generates code from these. They also serve as documentation.
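To make "generates code from these" a little more concrete, here is a minimal Python sketch that reads the cpgIsland definition and emits a CREATE TABLE statement. It is not the autoSql program: the function names (parse_as, to_create_table), the regular expressions, and the type mapping in SQL_TYPES are all assumptions made for illustration, and only the three field types used on this slide are handled.

```python
import re

# Hypothetical type mapping for illustration only; the real autoSql tool
# supports more field types and may emit different SQL.
SQL_TYPES = {"string": "varchar(255)", "uint": "int unsigned", "float": "float"}

AS_TEXT = '''
table cpgIsland
"Describes the CpG Islands"
    (
    string chrom;      "Human chromosome or FPC contig"
    uint chromStart;   "Start position in chromosome"
    uint chromEnd;     "End position in chromosome"
    string name;       "CpG Island"
    uint length;       "Island Length"
    uint cpgNum;       "Number of CpGs in island"
    uint gcNum;        "Number of C and G in island"
    float perCpg;      "Percentage of island that is CpG"
    float perGc;       "Percentage of island that is C or G"
    )
'''

# One field per line: <type> <name>; "<comment>"
FIELD_RE = re.compile(r'^\s*(\w+)\s+(\w+);\s*"([^"]*)"\s*$')


def parse_as(text):
    """Return (table_name, table_comment, [(type, name, comment), ...])."""
    header = re.search(r'table\s+(\w+)\s*"([^"]*)"', text)
    if header is None:
        raise ValueError("no table declaration found")
    fields = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields.append(match.groups())
    return header.group(1), header.group(2), fields


def to_create_table(text):
    """Build a CREATE TABLE statement, keeping field docs as SQL comments."""
    name, comment, fields = parse_as(text)
    lines = [f"-- {comment}", f"CREATE TABLE {name} ("]
    for i, (ftype, fname, fcomment) in enumerate(fields):
        sep = "," if i < len(fields) - 1 else ""
        lines.append(f"    {fname} {SQL_TYPES[ftype]} NOT NULL{sep}  -- {fcomment}")
    lines.append(");")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_create_table(AS_TEXT))
```

Running the script prints one column definition per field, with each field's description carried along as a trailing SQL comment, which is the sense in which a single .as file can serve as both schema and documentation.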
all.joiner - basic example

    identifier softberryGeneName
    "Link together Fgenesh++ gene structure, peptide, and homolog"
        $gbd.softberryGene.name
        $gbd.softberryPep.name
        $gbd.softberryHom.name

• The central concept is an identifier that appears in fields in multiple tables, sometimes even multiple databases.
• $gbd is a variable that contains a comma-separated list of databases.
• An identifier record ends with a blank line.

    # Genbank/trEMBL Accessions and meaningful subsets thereof
    identifier genbankAccession external=genbank
    "Generic Genbank Accession. More specific Genbank accessions follow"
        $gbd.seq.acc

    identifier bacEndAccession typeOf=genbankAccession
    "Genbank accession of a BAC end read."
        $gbd.all_bacends.qName dupeOk
        $gbd.bacEndPairs.lfNames comma
        $hg.fishClones.beNames comma minCheck=0.70

• typeOf - allows joins between parent and child, but not between siblings.
• dupeOk - allows more than one row with the same identifier in the primary table.
• comma - indicates the field is a comma-separated list of identifiers.
• minCheck - indicates that only a portion of the identifiers in the field is in the primary table.

    identifier hugoName external=HUGO fuzzy
    "International Human Gene Identifier"
        $hg.refLink.name
        $hg.atlasOncoGene.locusSymbol
        $hg.kgAlias.alias
        $hg.kgXref.geneSymbol
        $hg.refFlat.geneName
        $hg.jaxOrtholog.humanSymbol
        hg13,hg15.geneBands.name

“Biological” names for human genes are so messy that no validation is done (note the ‘fuzzy’ keyword).

Other Databases
• Genome databases - one for each assembly of each organism: hg17, mm6, canFam1, etc.
• hgCentral - home to dbDb and user settings info. One database shared by all web servers.
• hgFixed - mostly microarray data.
• uniProt - relationalized SwissProt/trEMBL database.
• go - Gene Ontology terms and term/gene associations.
• genePix - gene image database.

Gene Pix
• Image browser for in-situ and other gene-oriented pictures.
• Hopefully in the long run it will have a million images covering almost all vertebrate genes.
• (Needs new name, Gene Pix is a microarray analysis program. VisiGene?)

Data Sets
• Paul Gray - ~1000 mouse transcription factor genes - whole embryo & sections. These are in the database now.
• Other potential sources:
  – German AxelDB frog in situs
  – Japanese NIBB frog in situs (have nice browser)
  – Genepaint.org - mouse stuff
  – EMAGE and Jackson Lab mouse images
• From development and other journals, copyright issues.
  – Nathaniel Heintz BAC expression constructs
  – Eddy Rubin lab mouse embryos
  – UCSF cell-localization stuff?

Types of images
• Whole animal vs. sectioned tissues vs. single cell.
• Single vs. multiple probes within same image.
• Single image vs. image series (movies even).
• RNA, antibody, fusion protein.
[Image: mitotic cell, 3 stains]

Gene Pix Programs
• genePixLoad - loads the SQL database from a well-defined format involving a .ra file and a tab-separated file. See genePixLoad.doc
• loadMahoney - converts the Paul Gray (Mahoney center) spreadsheet and image directory into genePixLoad format
• Hg/lib/genePix.c - interface with the SQL database.
• hgGenePix - CGI script to display images
• knownToGenePix - makes a table in the mm5 (or other) genome database to connect known genes to genePix IDs.

Gene Pix Database
• Just a single database for all assemblies of all organisms.
• A knownToGenePix table in the assembly database.

GenePix tables
• fileLocation - directory
• bodyPart - whole, brain, etc.
• sliceType - transverse, sagittal
• treatment - tech details
• contributor - who done it
• Journal - scientific journal
• submissionSet - info about a whole set of images from one author
• sectionSet - links together separate sections of the same specimen.
• Gene - gene info
• geneSynonym
• Antibody - info on an antibody
• probeType - antibody, RNA, fusion protein
• Probe - links gene, primers, sequence, Ab.
• probeColor - color of the probe
• imageFile - file containing image
• Image - a single image.
• imageProbe - links image and probe

Some Anatomy Required [images]
Especially with slices [image]
Edinburgh mouse atlas [image]
Theiler Stages [image]
Later Stages [image]
NIBB Japanese Frog Site [image]
Earlier Stages [image]

Who you gonna call?
• Angie Hinrichs - developer of 2nd and 4th versions of Table Browser. Genome browser hacker extraordinaire.
• Hiram Clawson - main mouse man at the moment. Developed ‘wiggle’ tracks.
• Kate Rosenbloom - ENCODE project and multiple alignment display.
• Bob Kuhn - Software and database quality assurance.
• David Haussler - Ideas. Money. Comparative genomics.

More Acknowledgements
• UCSC - Robert Baertsch, Gill Bejerano, Galt Barber, Ron Chao, Mark Diekhans, Jorge Garcia, Patrick Gavin, Rachel Harte, Fan Hsu, Yontoa Lu, Crystal Lynch, Donna Karolchik, Jennifer Jackson, Ann Pace, Jacob Pedersen, Andy Pohl, Katie Pollard, Ali Sultan-Qurraie, Brian Raney, Krishna Roskin, Adam Siepel, Chuck Sugnet, Paul Tatarsky, Daryl Thomas, Heather Trumbower
• Penn State - Scott Schwartz, Laura Elnitski, Belinda Giardine, Ross Hardison, Minmei Hou, Webb Miller, Anton Nekrutenko
• Funding - NHGRI, HHMI, NCI, UCSC