Biological Databases
Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado
What can be discovered about a gene by a database search?

A little or a lot, depending on the gene
Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.
Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.
Structural information: associated protein structures, fold types, structural domains
Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases
Using a database

How to get information out of a database:
Browsing: no targeted information to retrieve
Search: looking for particular information

Searching a database:
Must have a key that identifies the element(s) of the database that are of interest:
Name of gene
Sequence of gene
Other information

Helps to have particular informational goals
Searching for information about genes and their products

Gene and gene product databases are often organized by sequence
Genomic sequence encodes the heritable basis of an organism's traits.
Gene products are uniquely described by their sequences.
Similar sequences among biomolecules often indicate both similar function and an evolutionary relationship.
Macromolecular sequences provide biologically meaningful keys for searching databases
Searching sequence databases

Start from sequence, find information about it
Many kinds of input sequences:
Could be amino acid or nucleotide sequence
Genomic, mRNA/cDNA, or protein sequence
Complete or fragmentary sequences
Exact matches are rare (and uninteresting in many cases), so the goal is often to retrieve a set of similar sequences.

Within a set of "similar" sequences, both small differences (e.g., mutations) and large ones (e.g., in regions required for function) can be interesting.
What might we want to know about a sequence?

Is this sequence similar to any known genes? How close is the best match? How significant is it?

What do we know about that gene?
Genomic (chromosomal location, allelic information, regulatory regions, etc.)
Structural (known structure? structural domains? etc.)
Functional (molecular, cellular & disease)
Evolutionary information:
Is this gene found in other organisms?
What is its taxonomic distribution?
NCBI and Entrez

One of the most useful and comprehensive sources of databases is the National Center for Biotechnology Information (NCBI), part of the U.S. National Library of Medicine.

NCBI provides interesting summaries, browsers for genome data, and search tools

Entrez is their database search interface
http://www.ncbi.nlm.nih.gov/Entrez

Can search on gene names, sequences, chromosomal location, diseases, keywords, ...
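Entrez searches can also be scripted through NCBI's E-utilities web service. The sketch below uses its ESearch endpoint and only the Python standard library. It builds a query URL and pulls the list of matching record IDs out of the JSON reply. The search term is an illustrative assumption. A real client should follow NCBI's published E-utilities usage guidelines, including the request-rate limits, which this sketch does not enforce.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for database `db` that requests a JSON reply."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)

def parse_esearch_ids(payload: str) -> List[str]:
    """Extract matching record IDs from an ESearch JSON reply."""
    return json.loads(payload)["esearchresult"]["idlist"]

def esearch(db: str, term: str, retmax: int = 20) -> List[str]:
    """Run a live search (needs network access); returns record IDs."""
    with urlopen(build_esearch_url(db, term, retmax)) as resp:
        return parse_esearch_ids(resp.read().decode("utf-8"))

# Example (not run here; needs network):
# ids = esearch("gene", "alcohol dehydrogenase")
```

Building the URL and parsing the reply are kept separate from the network call. That makes the query construction easy to inspect and test offline.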
BLAST: Searching with a sequence

The goal is to find other sequences that are more similar to the query than would be expected by chance (and are therefore likely to be homologous).
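"More similar than expected by chance" is what BLAST's E-value measures. Under Karlin-Altschul statistics, a raw score S is first converted into a bit score S' = (λS − ln K) / ln 2. The expected number of chance hits scoring at least S' against a database of total length n is then E = m · n · 2^(−S'), where m is the query length. The sketch below just evaluates these formulas. λ and K depend on the scoring matrix and gap penalties, so none are assumed here. Real BLAST also uses edge-corrected "effective" lengths, which this sketch ignores.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring >= bits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)

# Hypothetical search: 350-residue query against a 1e9-residue database.
for bits in (30.0, 50.0):
    print(bits, expect_value(bits, 350, 10**9))
```

Here a 30-bit hit would be expected by chance roughly 326 times, while a 50-bit hit has E of about 3e-4. Only the second is good evidence of homology. The same bit score also becomes less significant as the database grows.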

Can start with a nucleotide or amino acid sequence, and search nucleotide or protein databases (or both)
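Each query/database combination corresponds to a standard BLAST program. Mixed combinations are handled by translating the nucleotide side into protein. The small lookup below sketches that mapping. The function and its parameters are a hypothetical illustration, but the program names are the standard NCBI ones.

```python
from typing import Literal

SeqType = Literal["nucleotide", "protein"]

def choose_blast_program(query: SeqType, database: SeqType,
                         translate_both: bool = False) -> str:
    """Map a query/database sequence-type pair to the standard BLAST program."""
    if query == "nucleotide" and database == "nucleotide":
        # tblastx translates both query and database before comparing.
        return "tblastx" if translate_both else "blastn"
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "nucleotide" and database == "protein":
        return "blastx"   # translated nucleotide query vs protein database
    if query == "protein" and database == "nucleotide":
        return "tblastn"  # protein query vs translated nucleotide database
    raise ValueError(f"unsupported combination: {query!r} vs {database!r}")
```

For a cDNA query whose product may only be catalogued as protein, `choose_blast_program("nucleotide", "protein")` gives `blastx`. This is the "Translation" choice that appears among the major options.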

Many options
E.g., filter out low-complexity (repetitive) sequence, set the significance threshold (E-value cutoff)
Defaults are not always appropriate: READ THE NCBI EDUCATION PAGES!
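Low-complexity filtering exists because repetitive stretches produce high-scoring but biologically meaningless matches. NCBI BLAST uses dedicated filters for this (SEG for proteins, DUST for nucleotides). The sketch below is not either of those. It is a simpler stand-in that masks every position covered by a sliding window whose Shannon entropy falls below a threshold. The window size, threshold, and mask character are hypothetical choices.

```python
import math
from collections import Counter

def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())

def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every position covered by a low-entropy window with mask_char."""
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else ch for ch, m in zip(seq, masked))

# A poly-Q run flanked by diverse sequence: the run (and a few neighbours) gets masked.
print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12 + "PRSTVWYACDEF"))
```

Masking slightly more than the repeat itself is a side effect of the window approach. Raising the threshold masks more aggressively, which is why the defaults deserve a second look.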

Major choices:
Translation
Database
Filters
Restrictions
Matrix
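The "Matrix" choice sets how each aligned pair of residues is scored; for blastp the default is BLOSUM62. The sketch below shows where that scoring function enters a local alignment. It is Smith-Waterman dynamic programming with a linear gap penalty, which BLAST's faster seed-and-extend heuristic approximates. The match/mismatch/gap values are hypothetical placeholders, not a real substitution matrix or BLAST's affine gap costs.

```python
from typing import Callable

def smith_waterman_score(a: str, b: str,
                         score: Callable[[str, str], int],
                         gap: int = -4) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            curr[j] = max(0,                                          # restart: local alignment
                          prev[j - 1] + score(a[i - 1], b[j - 1]),    # align a[i-1] with b[j-1]
                          prev[j] + gap,                              # a[i-1] against a gap
                          curr[j - 1] + gap)                          # b[j-1] against a gap
            best = max(best, curr[j])
        prev = curr
    return best

def simple_score(x: str, y: str) -> int:
    """Hypothetical stand-in for a substitution matrix: +5 match, -4 mismatch."""
    return 5 if x == y else -4

print(smith_waterman_score("ACGTACGT", "ACGACGT", simple_score))  # 7 matches, 1 gap → 31
```

Swapping `simple_score` for a lookup into a real substitution matrix changes which hits score well without changing the algorithm. That is why the matrix is listed as a separate choice.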

Close hit: Rat ADH alpha
Distant hit: Human sorbitol dehydrogenase
Parameters (at bottom!)
Click on: Taxonomy report (link from “Results of BLAST” page)
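A taxonomy report essentially regroups the hit list by source organism. The sketch below reproduces that idea on a list of hits, returning each organism with its number of hits and its best E-value. The `Hit` record and the example values are hypothetical. The layout of NCBI's actual report is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

class Hit(NamedTuple):
    accession: str   # hypothetical identifier of the matching sequence
    organism: str    # source organism of the hit
    evalue: float    # E-value reported for the alignment

def taxonomy_report(hits: List[Hit]) -> List[Tuple[str, int, float]]:
    """Summarize hits per organism as (organism, hit count, best E-value),
    ordered from the most significant best hit to the least."""
    by_org: Dict[str, List[float]] = defaultdict(list)
    for hit in hits:
        by_org[hit.organism].append(hit.evalue)
    rows = [(org, len(evals), min(evals)) for org, evals in by_org.items()]
    return sorted(rows, key=lambda row: (row[2], row[0]))

# Hypothetical hits, for illustration only.
hits = [Hit("h1", "Species A", 1e-50), Hit("h2", "Species B", 1e-80),
        Hit("h3", "Species A", 1e-10)]
for row in taxonomy_report(hits):
    print(row)
```

Sorting by best E-value puts the closest relatives first. The hit count shows organisms with several related genes (paralogs), as alcohol dehydrogenases often have.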
What did we just do?

Identified loci (genes) associated with the sequence; the input was an alcohol dehydrogenase sequence.

For each particular “hit”, we can look at that sequence and its alignment in more detail.

We saw similar sequences, and the organisms in which they are found.

But there’s much more that can be found on these genes, even just inside NCBI…
More from Entrez Gene
And more…
PubMed
Gene Expression
Detailed expression information
NCBI is not all there is...

Links to non-NCBI databases
Other important gene/protein resources not linked to:
UniProt (most carefully annotated)
PDB (main macromolecular structure repository)
Other key biological data sources
Reactome & KEGG for pathways
HGNC for nomenclature
UCSC Human Genome Browser
Gene Ontology/Open Biological Ontologies
ENZYME (enzyme nomenclature)
Scientific society: iscb.org
Journals, Conferences…
Gene Names:
Harder than you think…
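One reason gene names are hard is that a single alias can belong to more than one gene. An old symbol may also survive in the literature after a gene is renamed. The sketch below builds a case-insensitive index from every official symbol and alias to the official symbols it could mean, so an ambiguous name returns several candidates instead of silently picking one. All symbols and aliases here are hypothetical. Real mappings would come from a nomenclature authority such as HGNC.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def build_alias_index(records: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, Set[str]]:
    """Map each official symbol and alias (uppercased) to the official symbols it may refer to."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for symbol, aliases in records:
        for name in (symbol, *aliases):
            index[name.upper()].add(symbol)
    return dict(index)

def resolve(name: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return every official symbol a name could mean (empty set if unknown)."""
    return index.get(name.upper(), set())

# Hypothetical records: "ABC1" is an alias shared by two different genes.
index = build_alias_index([("GENEA", ["ABC1", "P40"]), ("GENEB", ["ABC1"])])
print(resolve("abc1", index))  # two candidates: ambiguous
print(resolve("p40", index))   # one candidate
```

Returning a set makes the ambiguity explicit. This is one argument for the take-home point that a sequence can be a more reliable key than a name.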
Take home messages

There are a lot of molecular biology databases, containing a lot of valuable information
Not even the best databases have everything (or the best of everything)
These databases are moderately well cross-linked, and there are “linker” databases
Sequence is a good identifier, maybe even better than gene name!
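Using a sequence as an identifier can be sketched by normalizing it and hashing it. The same residues then always map to the same fixed-length key, regardless of line breaks, letter case, or what the gene is called. The choice of SHA-256 and the normalization rules (drop whitespace, uppercase) are assumptions made for this illustration. Identical sequences from different genes or organisms will share a key, so the key identifies a sequence, not a gene.

```python
import hashlib

def sequence_id(seq: str) -> str:
    """Stable identifier for a sequence: SHA-256 of its normalized form
    (all whitespace removed, uppercased)."""
    normalized = "".join(seq.split()).upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same sequence written differently yields the same identifier.
print(sequence_id("acgt\nACGT") == sequence_id("ACGTACGT"))  # True
```

A key like this can be used to spot the same sequence filed under different names in different databases.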