Introduction to biological databases

Basic Genomic Characteristics

AIM: to collect as much general
information as possible about your gene:
• Nucleotide sequence databases
○ NCBI GenBank
○ EMBL Nucleotide Sequence Database
○ DDBJ
• Protein sequence databases
○ UniProtKB
• NCBI Reference Sequence (RefSeq)
Nucleotide sequence DB
The three databases (GenBank, EMBL and DDBJ)
form an international collaboration. Each of
the three groups collects a portion of the
total sequence data reported worldwide, and
all new and updated database entries are
exchanged between the groups on a daily basis.
• Because the entries are exchanged daily, you
do not need to check all of them!

NCBI Entrez
Presents all the
information available
at NCBI for a gene.
Entrez is an integrated
search tool across
all the NCBI databases.
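
As a minimal sketch of how an Entrez search can be scripted, the example below builds a query URL for the NCBI E-utilities `esearch` endpoint and pulls the record IDs out of a JSON reply. The helper names `build_esearch_url` and `extract_ids`, the example gene (HBB) and the query string are illustrative assumptions, not part of the original slides; no network request is made here.

```python
import json
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=5):
    """Return an esearch URL for one Entrez database (hypothetical helper)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def extract_ids(esearch_json_text):
    """Return the list of record IDs from an esearch JSON reply."""
    payload = json.loads(esearch_json_text)
    return payload.get("esearchresult", {}).get("idlist", [])


if __name__ == "__main__":
    # Example query (an assumption for illustration): the human beta globin gene.
    print(build_esearch_url("gene", "HBB[Gene Name] AND Homo sapiens[Organism]"))
```

Changing `db` (for example to `nucleotide` or `protein`) reuses the same query against another Entrez database, which is the point of an integrated search tool.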
Genome Browsers

NCBI Sequence Viewer

UCSC Genome Browser

ENSEMBL
NCBI Sequence Viewer
This is an example view of the human beta globin region on chr11
ENSEMBL – genome view
ENSEMBL – Gene tree
NCBI OMIM database
Nucleotide databases and genome
browsers provide information on the gene
nucleotide sequence (exons, introns,
alternative splicing sites…) but give you
very little information on gene function.
• The OMIM database provides a summary of
the literature concerning a gene.

Protein Databases

Protein databases provide useful
information about the function of a gene,
e.g. conserved protein domains.
• UniProt is the reference database
• InterPro offers automatic protein
annotation based on conserved domains
• RefSeq

Protein databases - UniProt
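
A UniProt entry can also be retrieved programmatically as FASTA text. The sketch below builds the download URL for one accession and parses FASTA text into (header, sequence) pairs. The helper names `fasta_url` and `parse_fasta` are hypothetical, and the accession P68871 (human hemoglobin subunit beta) is only an example chosen for illustration; the code does not contact the server.

```python
# Base URL of the UniProt REST service for UniProtKB entries.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def fasta_url(accession):
    """Return the FASTA download URL for a UniProtKB accession."""
    return f"{UNIPROT_REST}/{accession}.fasta"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    print(fasta_url("P68871"))  # example accession, see the note above
```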
Similarity search

If your gene has no protein information:

• Protein sequence available: run BLASTP
against a non-redundant protein database

• Protein sequence unavailable: run BLASTX
(translated nucleotide query) against a
non-redundant protein database
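
The choice between the two programs, and what BLASTX does with a nucleotide query before searching, can be sketched as follows. This is an illustration under stated assumptions, not BLAST itself: `choose_blast_program` is a hypothetical helper that guesses the sequence type from its alphabet (a heuristic that can misclassify very short peptides), and `six_frames` uses the standard genetic code to produce the six conceptual translations that a translated search compares against the protein database.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def choose_blast_program(sequence):
    """Guess the program: nucleotide query -> blastx, protein query -> blastp."""
    letters = set(sequence.upper())
    return "blastx" if letters <= set("ACGTUN") else "blastp"


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the six conceptual translations, keyed '+1'..'+3' and '-1'..'-3'."""
    forward, reverse = dna.upper(), reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    query = "ATGGCCTAA"  # toy sequence for illustration
    print(choose_blast_program(query), six_frames(query))
```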
Protein 3D structure

Many proteins have an experimentally
determined 3D structure. The main resources are:
• PDB
• NCBI Structure Group
• Dali

They offer tools for visualization.
PDB database
The visualization tools
allow you to see the
structure and the
ligands (if present),
rotate the image and
zoom in.
3D structure prediction
Structures are still available for only a
limited number of proteins
• Efforts are made to predict protein structures
based on sequence similarities
• Still not very accurate!

Tools:
• SwissModel
• PSIPRED
• PredictProtein

Protein interaction databases

AIM: find proteins that interact with your
target

IntAct: EBI resource to find interactors

BioGRID: a freely available interaction
database covering model organisms and
humans.
Regulatory and metabolic pathways

The classic resource: KEGG
miRNA specific resources

Databases:
• miRNAMap: presents useful information
such as secondary structure, tissue-specific
expression and predicted target genes
• HMDD: specific for disease-miRNA
associations
• miRBase: a searchable database of published
miRNA sequences and annotation.

Target prediction tools:
• miRecords: a repository of confirmed
target genes and predictions from
several other tools
C. elegans specific tools
WormBase: the main resource of
information on C. elegans.
• Expression pattern database (Hope lab)
• Expression Pattern Database
• The Nematode Expression Pattern
DataBase
• Caenorhabditis elegans Genetics and
Genomics: provides links to many useful
resources for C. elegans

Expression databases
Allow exploratory analyses of multiple
experiments
• Experiments need to be linked
• Require detailed information about how
experiments were conducted (the sources
of variation)
• Very different from genomic databases
• MIAME standard (Minimum Information About
a Microarray Experiment)

MIAME
• Experimental design
• Microarray design
• Extraction, preparation and labelling
• Hybridisation conditions
• Measurements: images, quantifications,
parameters
• Systematic error adjustments and
transformations
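
The MIAME items above work as a checklist, which can be sketched as a small completeness check. The key names in `MIAME_SECTIONS` and the function `missing_sections` are hypothetical labels invented for this illustration; they are not field names used by GEO, ArrayExpress or the MIAME specification.

```python
# One hypothetical key per MIAME item listed on the slide.
MIAME_SECTIONS = [
    "experimental_design",
    "microarray_design",
    "sample_preparation",       # extraction, preparation and labelling
    "hybridisation_conditions",
    "measurements",             # images, quantifications, parameters
    "error_adjustments",        # systematic error adjustments and transformations
]


def missing_sections(submission):
    """Return the MIAME sections that are absent or empty in a submission dict."""
    return [name for name in MIAME_SECTIONS if not submission.get(name)]


if __name__ == "__main__":
    draft = {
        "experimental_design": "dose response, 3 doses x 3 replicates",
        "microarray_design": "commercial array, catalog number on file",
        "measurements": "",
    }
    print(missing_sections(draft))
```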

Gene Expression Omnibus
• Administered by NCBI
• ~280,000 samples
• >100 organisms
• >1,000,000,000 measurements

ArrayExpress
• Administered by EBI
• >7,000 experiments
• Provides p-values
• Bioconductor package

GEO and ArrayExpress databases
provide:

• The raw data for each hybridization (e.g., CEL or GPR files)
• The final processed (normalized) data for the set of hybridizations in the
experiment (study) (e.g., the gene expression data matrix used to draw the
conclusions from the study)
• The essential sample annotation including experimental factors and their
values (e.g., compound and dose in a dose response experiment)
• The experimental design including sample data relationships (e.g., which
raw data file relates to which sample, which hybridizations are technical,
which are biological replicates)
• Sufficient annotation of the array (e.g., gene identifiers, genomic
coordinates, probe oligonucleotide sequences or reference commercial
array catalog number)
• The essential laboratory and data processing protocols (e.g., what
normalization method has been used to obtain the final processed data)
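
The "sample data relationships" item can be made concrete with a small sketch that reads a table linking raw data files to samples and experimental conditions, then reports technical and biological replicates. The row layout (`raw_file`, `sample`, `condition`), the file names and the function `summarize_design` are assumptions made for this illustration and do not reflect the actual GEO or ArrayExpress submission formats.

```python
from collections import defaultdict


def summarize_design(rows):
    """Group raw files by sample and samples by condition.

    Returns (technical, biological):
      technical  - samples hybridized more than once, with their raw files
      biological - conditions covered by more than one distinct sample
    """
    files_by_sample = defaultdict(list)
    samples_by_condition = defaultdict(set)
    for row in rows:
        files_by_sample[row["sample"]].append(row["raw_file"])
        samples_by_condition[row["condition"]].add(row["sample"])
    technical = {s: f for s, f in files_by_sample.items() if len(f) > 1}
    biological = {
        c: sorted(s) for c, s in samples_by_condition.items() if len(s) > 1
    }
    return technical, biological


if __name__ == "__main__":
    # Toy design table; all names are made up for the example.
    rows = [
        {"raw_file": "a1.CEL", "sample": "mouse1", "condition": "dose_10"},
        {"raw_file": "a2.CEL", "sample": "mouse1", "condition": "dose_10"},
        {"raw_file": "b1.CEL", "sample": "mouse2", "condition": "dose_10"},
        {"raw_file": "c1.CEL", "sample": "mouse3", "condition": "control"},
    ]
    print(summarize_design(rows))
```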
Problems:
• Difficult to compare experiments
• Significant genes not highlighted
• Poor visualization of results

ArrayExpress is trying to solve these
problems with its Atlas.
Genevestigator

A Java visualization tool that
summarizes results from thousands of
high-quality transcriptomic experiments

• Much easier to compare samples

• Open access to only some of the data
and one probeset per gene
ONCOMINE