Introduction to biological databases
Basic Genomic Characteristics
AIM: to collect as much general information as possible about your gene.
Nucleotide sequence databases:
○ NCBI GenBank
○ EMBL Nucleotide Sequence Database
○ DDBJ
For protein sequences:
○ UniProtKB
○ NCBI Reference Sequence (RefSeq)

Nucleotide sequence DB
The three databases form an international collaboration. Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups daily. You do not need to check all of them!

NCBI Entrez
Presents all the information available at NCBI for a gene. Entrez is an integrated search tool across all the databases.

Genome Browsers
○ NCBI Sequence Viewer
○ UCSC Genome Browser
○ ENSEMBL

NCBI Sequence Viewer
Example view: the human beta globin region on chr11.

ENSEMBL – genome view
ENSEMBL – Gene tree

NCBI OMIM database
Nucleotide databases and genome browsers provide information on the gene's nucleotide sequence (exons, introns, alternative splicing sites…) but give you very little information on gene function. The OMIM database provides a summary of all the literature concerning a gene.

Protein Databases
Protein databases provide useful information about the function of a gene, e.g. conserved protein domains.
○ UniProt is the reference database
○ InterPro offers automatic protein annotation based on conserved domains
○ RefSeq

Protein databases - UniProt

Similarity search
If your gene has no protein information:
○ Protein sequence available: BLASTP against a non-redundant protein database
○ Protein sequence unavailable: BLASTX against a non-redundant protein database

Protein 3D structure
Many proteins have an experimentally determined 3D structure.
The biggest databases are:
○ PDB
○ NCBI Structure Group
○ Dali
They offer tools for visualization.

PDB database
The visualization tools allow you to see the structure and the ligands (if present), rotate the image and zoom in.

3D structure prediction
○ Structures are still available for only a limited number of proteins
○ There is an effort to predict protein structures based on sequence similarities
○ Still not very accurate!
Tools: SwissModel, PSIPRED, PredictProtein

Swiss-Model

Protein interaction databases
AIM: find proteins that interact with your target.
○ IntAct: an EBI resource to find interactors
○ BioGRID: a freely available interaction database from model organisms and humans

IntAct

Regulatory and metabolic pathways
The classic "KEGG"

miRNA specific resources
Databases:
○ miRNAMap: presents useful information such as secondary structure, tissue-specific expression and predicted target genes
○ HMDD: specific for disease-miRNA associations
○ miRBase: a searchable database of published miRNA sequences and annotation
Target prediction tools:
○ miRecords: a good repository that shows confirmed target genes and predictions from several other programs

C. elegans specific tools
○ WormBase: the main resource of information on C. elegans
○ Expression pattern databases: Hope lab Expression Pattern Database; The Nematode Expression Pattern DataBase
○ Caenorhabditis elegans Genetics and Genomics: provides links to many useful resources for C. elegans

Expression databases
○ Allow exploratory analyses of multiple experiments
○ Experiments need to be linked
○ Require much information about how experiments were conducted (= sources of variation)
○ Very different from genomic databases
○ MIAME standard

MIAME
○ Experimental design
○ Microarray design
○ Extraction, preparation and labelling
○ Hybridisation conditions
○ Measurements: images, quantifications, parameters
○ Systematic error adjustments and transformations

Gene Expression Omnibus
○ NCBI administered
○ ~280,000 samples
○ >100 organisms
○ >1,000,000,000 measurements

ArrayExpress
○ EBI administered
○ >7000 experiments
○ Provides p-values
○ Bioconductor package

GEO and ArrayExpress
Databases provide:
○ The raw data for each hybridization (e.g., CEL or GPR files)
○ The final processed (normalized) data for the set of hybridizations in the experiment (study) (e.g., the gene expression data matrix used to draw the conclusions from the study)
○ The essential sample annotation, including experimental factors and their values (e.g., compound and dose in a dose response experiment)
○ The experimental design, including sample data relationships (e.g., which raw data file relates to which sample, which hybridizations are technical replicates, which are biological replicates)
○ Sufficient annotation of the array (e.g., gene identifiers, genomic coordinates, probe oligonucleotide sequences or reference commercial array catalog number)
○ The essential laboratory and data processing protocols (e.g., what normalization method has been used to obtain the final processed data)
Problems:
○ Difficult to compare experiments
○ Significant genes not highlighted
○ Poor visualization of results
ArrayExpress is trying to solve these problems with its Atlas.

Genevestigator
○ A Java visualization tool that summarizes results from thousands of high-quality transcriptomic experiments
○ Much easier to compare samples
○ Open access to only some of the data and 1 probeset/gene

ONCOMINE
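
The similarity-search rule from the slides (BLASTP when a protein sequence is available, BLASTX when only the nucleotide sequence is) can be written as a small decision helper. This is only a sketch: the function name `choose_blast_program` and its arguments are hypothetical, and it returns the program name without running any search.

```python
from typing import Optional


def choose_blast_program(protein_seq: Optional[str],
                         nucleotide_seq: Optional[str]) -> str:
    """Return the BLAST program suggested by the slide rule.

    Hypothetical helper for illustration; it does not contact NCBI.
    """
    if protein_seq:
        # Protein query against a protein database.
        return "blastp"
    if nucleotide_seq:
        # Nucleotide query, translated in all six frames,
        # against a protein database.
        return "blastx"
    raise ValueError("need a protein or a nucleotide sequence")


if __name__ == "__main__":
    print(choose_blast_program("MVHLTPEEKS", None))
    print(choose_blast_program(None, "ATGGTGCACCTGACTCCTGAG"))
```

In both cases the search target is a non-redundant protein database; only the type of query sequence changes.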
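
The six MIAME components listed earlier can be treated as a checklist when preparing or evaluating a dataset. The sketch below assumes a submission is described by a plain dictionary; the key names and the function `missing_miame_components` are illustrative assumptions, not part of any official MIAME, GEO or ArrayExpress tooling.

```python
# Hypothetical keys, one per MIAME component named in the slides.
MIAME_COMPONENTS = [
    "experimental_design",
    "array_design",
    "sample_preparation",   # extraction, preparation and labelling
    "hybridisation",
    "measurements",         # images, quantifications, parameters
    "normalisation",        # systematic error adjustments and transformations
]


def missing_miame_components(submission: dict) -> list:
    """List the components that are absent or empty in the submission."""
    return [key for key in MIAME_COMPONENTS if not submission.get(key)]


if __name__ == "__main__":
    example = {
        "experimental_design": "dose response, 3 biological replicates",
        "measurements": "raw CEL files",
    }
    print(missing_miame_components(example))
```

An empty result means every component has at least some description; it says nothing about whether that description is adequate.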