Project Proposals
• Due Monday Feb. 12
• Two parts:
– Background: describe the question
• Why is it important and interesting?
• What is already known about it?
– Proposed work
• What will you do?
• How will you do it?
• Include references and figures as needed
Phylogeny
• Reread background papers from weeks 3 & 4
– Delsuc et al.
– Holder and Lewis
The twenty amino acids
Protein Weight Matrices
Two main kinds of weight matrices
PAM (Point Accepted Mutation): based on an explicit evolutionary model. Derived from mutations observed throughout global alignments (which include both highly conserved and highly mutable regions) of a small protein dataset.
BLOSUM (Blocks Substitution Matrix): based only on highly conserved regions in a series of alignments that are forbidden to contain gaps. Sensitive for local alignment of related sequences. Based on a larger dataset than PAM.
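The BLOSUM description can be made concrete with a small calculation. The sketch below follows the general log-odds recipe of Henikoff and Henikoff on one hypothetical, gap-free toy block (each string is one aligned sequence): count residue pairs within each column, compare the observed pair frequencies with those expected from overall residue composition, and express the ratio in half-bit units (2 × log2), the units BLOSUM62 uses. Real BLOSUM construction also clusters sequences above a percent-identity cutoff (62% for BLOSUM62) and pools many blocks; both steps are omitted here, so treat this as an illustration rather than a reimplementation.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block):
    """Half-bit log-odds scores from one ungapped alignment block.

    Simplified sketch: no percent-identity clustering, single block only.
    Residue pairs never observed in the block are omitted (their log-odds
    would be minus infinity).
    """
    pair_counts = Counter()
    for column in zip(*block):  # walk the block column by column
        counts = Counter(column)
        residues = sorted(counts)
        for a in residues:  # identical pairs: n choose 2
            n = counts[a]
            pair_counts[(a, a)] += n * (n - 1) // 2
        for a, b in combinations(residues, 2):  # mixed pairs
            pair_counts[(a, b)] += counts[a] * counts[b]

    total = sum(pair_counts.values())
    q = {pair: c / total for pair, c in pair_counts.items() if c > 0}

    # Background frequency of each residue: half of every mixed pair counts
    # toward each of its two residues.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


if __name__ == "__main__":
    toy_block = ["ALA", "ALA", "ALL", "ALL"]  # hypothetical block
    print(blosum_like_scores(toy_block))
    # {('A', 'A'): 1, ('L', 'L'): 1, ('A', 'L'): -2}
```

In this toy block identities score positive and the A/L substitution scores negative because A and L are paired less often than their overall frequencies would predict.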
Matrix equivalents, from more divergent to less divergent sequences:
BLOSUM45 ≈ PAM 250 (more divergent)
BLOSUM62 ≈ PAM 160
BLOSUM90 ≈ PAM 100 (less divergent)
BLOSUM62 is the BLASTP default
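The ordering of PAM 100, PAM 160 and PAM 250 from less to more divergent follows from PAM's evolutionary model: a PAM1 matrix holds substitution probabilities for a distance at which about 1% of residues have changed, and PAM n is obtained by multiplying PAM1 by itself n times. A minimal sketch, assuming a hypothetical PAM1-like matrix over a toy 3-residue alphabet (real PAM matrices are 20 × 20, and the published scoring matrices are log-odds of these probabilities against background frequencies, a step omitted here):

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, exponent):
    """Raise square matrix m to a non-negative integer power by repeated squaring."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = m
    while exponent > 0:
        if exponent & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        exponent >>= 1
    return result


# Hypothetical PAM1-like matrix for a toy 3-residue alphabet.
# Row i holds P(residue i is replaced by residue j) over 1 PAM;
# about 1% of positions change and each row sums to 1.
PAM1_TOY = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.004, 0.006, 0.990],
]

if __name__ == "__main__":
    for n in (100, 160, 250):
        pam_n = mat_power(PAM1_TOY, n)
        unchanged = [round(pam_n[i][i], 3) for i in range(3)]
        print(f"PAM {n}: P(residue unchanged) = {unchanged}")
```

The probability that a residue is unchanged shrinks as n grows, which is why higher-numbered PAM matrices suit more divergent sequences.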
Other Types of BLAST
• MegaBLAST (nt)
– MegaBLAST uses a greedy algorithm for nucleotide sequence alignment search. It is optimized for aligning sequences that differ only slightly, for example as a result of sequencing or other similar "errors". It can also handle much longer DNA sequences more efficiently than the blastn program of the traditional BLAST algorithm.
• Discontiguous MegaBLAST (nt)
– Designed specifically for comparing diverged sequences, especially sequences from different organisms, whose alignments have a low degree of identity and for which the original MegaBLAST is not very effective.
• See also MUMmer at TIGR
Other BLAST options
• Search for short, nearly exact matches
– Nucleotide (nt) or protein (aa) queries
– Special page with altered parameters
• Expect value threshold increased, because short hits generally score large E values
• Word size decreased to optimize for short hits
• For proteins, a different scoring matrix is used, optimized for smaller evolutionary distances
• Low-complexity sequence
– Regions of biased composition, including homopolymeric runs, short-period repeats, and more subtle overrepresentation of one or a few residues
– Examples: AAATAAAAAAAATAAAAAAT or PPCDPPPPPKDKKKKDDGPP
– Filters are used to remove low-complexity sequence because it can cause artifactual hits
• Filtering replaces those regions of your query with strings of Ns (nucleotides) or Xs (proteins)
– Without a filter:
• Some hits may be reported with high scores only because of the presence of a low-complexity region.
• Such hits are usually not the result of homology shared by the sequences.
• Rather, it is as if the low-complexity region is "sticky" and pulls out many sequences that are not truly related.
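BLAST's own filters are DUST (nucleotides) and SEG (proteins). The sketch below is not either of those algorithms; it is a simplified stand-in that shows the general idea of masking: slide a window along the query, compute the Shannon entropy of each window, and replace residues in low-entropy windows with N or X. The window size and entropy threshold are arbitrary illustrative values, not BLAST defaults.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=1.5, mask_char="N"):
    """Replace residues covered by any low-entropy window with mask_char.

    Simplified entropy filter for illustration only; use mask_char="X"
    for protein sequences.
    """
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else ch for ch, m in zip(seq, masked))


if __name__ == "__main__":
    print(mask_low_complexity("AAATAAAAAAAATAAAAAAT"))  # NNNNNNNNNNNNNNNNNNNN
    print(mask_low_complexity("ACDEFGHIKLMNPQRSTVWY", mask_char="X"))  # unchanged
```

Masking rather than deleting keeps the query coordinates intact, so hits elsewhere in the query still map to the right positions.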
Phylogenetic Profiling
Pattern of presence or absence of genes across genomes
Idea: proteins that function in the same cellular context frequently
have similar phylogenetic profiles
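A phylogenetic profile can be written as a vector of 1s (present) and 0s (absent), one entry per genome, and genes with matching vectors become candidates for a shared cellular context. A minimal sketch, assuming hypothetical genes and presence/absence calls across five genomes and using Hamming distance as the similarity measure; this is one simple choice, and other measures (for example mutual information) are also used.

```python
# Hypothetical presence (1) / absence (0) calls for four genes across five genomes.
PROFILES = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 0),
    "geneC": (1, 0, 1, 0, 1),
    "geneD": (1, 1, 0, 0, 0),
}


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def profile_neighbors(query, profiles, max_distance=1):
    """Genes whose profiles differ from the query's in at most max_distance genomes."""
    target = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query:
            continue
        d = hamming(target, profile)
        if d <= max_distance:
            hits.append((d, gene))
    return sorted(hits)


if __name__ == "__main__":
    print(profile_neighbors("geneA", PROFILES))  # [(0, 'geneB'), (1, 'geneD')]
```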
Environmental Genomic Datasets
Sargasso Sea
Station Aloha
Acid Mine Drainage
Whale Fall
Sludge
Soils
Marine viromes
Human Gut
Global Ocean Survey: phase I
Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA)
Online since Jan. 23rd!
Today’s Lab
• Use IMG (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi) to explore:
– precomputed homologs for your gene of interest
– genomic neighborhoods for your gene of interest
– the phylogenetic profile of your gene of interest
– genes that fit a specific phylogenetic profile across a subset of genomes of interest to you
• Register as a CAMERA user: http://cameradev.calit2.net/index.php
• See if you can find homologs of your gene of interest in one of the available databases
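Finding genes that fit a chosen phylogenetic profile amounts to a set query: keep genes present in every genome you require and absent from every genome you exclude. A minimal offline sketch, assuming a hypothetical table that maps each gene to the set of genomes in which a homolog was detected; it does not use IMG's interface or any IMG export format.

```python
def genes_matching_profile(presence, present_in, absent_from):
    """Return genes found in every genome of present_in and in none of absent_from.

    presence maps a gene identifier to the set of genomes in which a homolog
    was detected (hypothetical data layout for illustration).
    """
    present_in, absent_from = set(present_in), set(absent_from)
    return sorted(
        gene
        for gene, genomes in presence.items()
        if present_in <= genomes and not (absent_from & genomes)
    )


# Hypothetical presence data for four genes across four genomes.
PRESENCE = {
    "gene1": {"genomeA", "genomeB", "genomeC"},
    "gene2": {"genomeA", "genomeB"},
    "gene3": {"genomeA", "genomeD"},
    "gene4": {"genomeB", "genomeC", "genomeD"},
}

if __name__ == "__main__":
    # Genes shared by genomeA and genomeB but missing from genomeD.
    print(genes_matching_profile(PRESENCE, ["genomeA", "genomeB"], ["genomeD"]))
    # ['gene1', 'gene2']
```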