Genomics of sensory systems

Lecture #4: Comparing genes
9/14/09
This week
- Homework #2 due on Wed
  - Email with questions
  - Email me answers or hand in in class
- Wed
  - I will be at Dept of Biology retreat
  - Lecture will be given by Kelly O’Quin - expert in phylogenetics
  - He will go over homework so it must be done before class
Questions for today
0. More BLAST
1. Where do we get high quality gene sequences?
2. How do genes evolve?
3. How do we compare genes?
How to find genes
- Start with genes which are known from model organisms
- Use these to pull out genes from genomes
- Compare genes to learn about sensory evolution
Blast - Genbank
- What database do you want to search?
- What do you want to compare?
- What program do you want to do the searching?
Types of blast queries
Query                  Database               Type
Nucleotide             Nucleotide             Blastn, Megablast, Discont megablast
Protein                Protein                Blastp, Psi-blast, Phi-blast
Translated nucleotide  Protein                Blastx
Protein                Translated nucleotide  Tblastn
Translated nucleotide  Translated nucleotide  Tblastx
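The query/database pairings above amount to a simple lookup. A minimal sketch in Python follows; the name `choose_blast_program` and the lowercase type labels are made up for illustration and are not part of any NCBI tool.

```python
# Lookup built from the query/database pairings in this lecture.
# Keys are (query type, database type); the labels are illustrative only.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def choose_blast_program(query, database):
    """Return the BLAST program for a query/database pairing, or raise ValueError."""
    key = (query.lower(), database.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program for query={query!r}, database={database!r}")
    return BLAST_PROGRAMS[key]


# A protein query searched against a translated nucleotide database.
print(choose_blast_program("Protein", "Translated nucleotide"))
```

For nucleotide vs nucleotide searches the sketch returns only blastn; megablast and discontiguous megablast are variants of the same pairing.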
Defaults
Database
Program
Confirm
Nucleotide BLAST = DNA nucleotide query vs nucleotide database
Choices for programs
- Megablast
  - Highly similar sequences >95%
  - Word length 28
- Discontiguous megablast
  - Pretty similar seqs
  - Word length 11
- Blastn
  - Dissimilar seqs
  - Word length 11
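Word length is the size of the exact match a search needs before it will consider a hit, which is why a long word only suits highly similar sequences. The sketch below only counts shared exact words between two made-up sequences; `shared_words` is a hypothetical helper, and real BLAST does far more (extension, scoring, statistics).

```python
def shared_words(query, subject, word_length):
    """Return the set of exact substrings of length word_length found in both sequences."""
    def words(seq):
        return {seq[i:i + word_length] for i in range(len(seq) - word_length + 1)}
    return words(query) & words(subject)


# Two made-up 32-nucleotide sequences that differ at a single position (13).
a = "ATGCCGTGACGTATGCCGTGACGTATGCCGTG"
b = "ATGCCGTGACGTTTGCCGTGACGTATGCCGTG"

print(len(shared_words(a, b, 28)))  # words of length 28
print(len(shared_words(a, b, 11)))  # words of length 11
```

With these two sequences no word of length 28 is shared, because every 28-letter window spans the one difference, while several words of length 11 still match.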
Translated blast = protein query vs translated database
BLAST a genome
Request ID
AWJ4D4B7012
BLASTing is fun
- This is meant to be enjoyable
- Be a genome explorer
  - Find out what kind of data is out there
  - Find out what kind of data isn’t there
QUESTIONS?????
Q1.
- There is so much data in Genbank. How do you find GOOD data?
- Example
  - Bovine rhodopsin - 1st G protein coupled receptor to be sequenced
Search Genbank with text
49 entries
Bovine opsin
Bovine rhodopsin
Searching for genes
- Searching by text is fraught with peril
  - Genbank has too many links
  - Pull up many things that are not what you want
- BLAST is better approach
- NCBI has also made records which combine all similar sequences into one
NCBI has done some of the work
- They have hand-curated data for some species to make a set of reference sequences
  - Nucleotide sequences NM_xxxxxx
  - Protein sequences NP_xxxxxx
  - For human rhodopsin: NM_000539, NP_000530
- These are the gold standard for sequences
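Reference sequence accessions are written with an underscore after the prefix (NM_000539, NP_000530), so the prefix alone says whether a record is nucleotide or protein. A small sketch, assuming only the two prefixes named in this lecture matter; `refseq_kind` is a made-up helper, and other RefSeq prefixes exist that it does not handle.

```python
import re

# Two-letter prefix, underscore, digits, and an optional ".version" suffix.
REFSEQ_PATTERN = re.compile(r"^(NM|NP)_(\d+)(\.\d+)?$")


def refseq_kind(accession):
    """Return 'nucleotide' for NM_, 'protein' for NP_, or None for anything else."""
    match = REFSEQ_PATTERN.match(accession.strip())
    if match is None:
        return None
    return "nucleotide" if match.group(1) == "NM" else "protein"


for accession in ["NM_000539", "NP_000530", "not-an-accession"]:
    print(accession, refseq_kind(accession))
```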
Homologene
Homologs
- Two genes which arise in the common ancestor of two organisms and are passed down
- Implies genes perform same function in two organisms
- Therefore they can be compared to learn about evolution
These 4 primates have many genes which are homologs and have been passed down from primate ancestor: Human, Chimp, Macaque, Bushbaby
Homologene search for rhodopsin
Homologene
Three primary sequence portals:
1. NCBI
2. Ensembl - European Bioinformatics Institute (EBI)
3. DNA Data Bank of Japan (DDBJ)
Select just genes
Scroll down to find the gene you want
Location
Links to transcript and protein
Orthologues are predicted and linked
OMIM - Online Mendelian Inheritance in Man
Good places to find genes
- Model organisms: NCBI homologene
- Genes from models and other organisms: Sanger Ensembl gene families
  - NOTE: These are often predicted from genome sequences
  - If there is a sequence in NCBI homologene, it may be different (and more accurate) than Sanger predictions
- OMIM is a good reference
Q2. How do genes change through time?
- Change in actual sequence
  - Mutation
  - Recombination
- Change in frequency of a sequence
  - Selection - “survive” better
  - Drift - get passed on by chance
  - Migration - move between populations
Mutation vs selection
- Mutation = sequence change
  ATGCCGTGACGT
  ATGCCTTGACGT
- Selection/drift/migration = sequence frequency changes across a number of individuals
  ATGTG ATGTG ATGTG ATGTG ATGTG
  ATGTG ATGTG ATGTG ATGTG ATGTG
  ATGTT ATGTG ATGTG ATGTG ATGTG

  ATGTG ATGTG ATGTG ATGTG ATGTG
  ATGTT ATGTT ATGTT ATGTT ATGTT
  ATGTT ATGTT ATGTT ATGTT ATGTT
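The two ideas on this slide can be put in numbers: where a mutation sits in a sequence, and how common each sequence is in a group of individuals. A rough sketch in Python; the function names are made up, the counts are taken from the two groups of 15 sequences shown, and reading them as "earlier" and "later" is an assumption.

```python
from collections import Counter


def first_difference(a, b):
    """Return the 1-based position of the first mismatch, or None if identical."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    for position, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return position
    return None


def frequencies(population):
    """Return each sequence's share of the population as a fraction of 1."""
    counts = Counter(population)
    return {seq: n / len(population) for seq, n in counts.items()}


# Counts from the two groups on the slide; "earlier"/"later" is an assumed reading.
earlier = ["ATGTG"] * 14 + ["ATGTT"] * 1
later = ["ATGTG"] * 5 + ["ATGTT"] * 10

print(first_difference("ATGCCGTGACGT", "ATGCCTTGACGT"))
print(frequencies(earlier))
print(frequencies(later))
```

The mutation example differs at position 6 (G to T). In the two groups, ATGTT goes from 1 of 15 sequences to 10 of 15: the sequence itself is unchanged, only its frequency shifts.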
Evolution as tinkerer
- Changes are typically small
- Mutation is source of new sequence
  - Not all mutations are created equal
  - Some occur more often than others
- Other forces shift frequency of particular sequence
Triplet amino acid code
TTT  F, phe    TCT  S, ser    TAT  Y, tyr     TGT  C, cys
TTC  F, phe    TCC  S, ser    TAC  Y, tyr     TGC  C, cys
TTA  L, leu    TCA  S, ser    TAA  O, stop    TGA  J, stop
TTG  L, leu    TCG  S, ser    TAG  B, stop    TGG  W, trp

CTT  L, leu    CCT  P, pro    CAT  H, his     CGT  R, arg
CTC  L, leu    CCC  P, pro    CAC  H, his     CGC  R, arg
CTA  L, leu    CCA  P, pro    CAA  Q, gln     CGA  R, arg
CTG  L, leu    CCG  P, pro    CAG  Q, gln     CGG  R, arg

ATT  I, ile    ACT  T, thr    AAT  N, asn     AGT  S, ser
ATC  I, ile    ACC  T, thr    AAC  N, asn     AGC  S, ser
ATA  I, ile    ACA  T, thr    AAA  K, lys     AGA  R, arg
ATG  M, met    ACG  T, thr    AAG  K, lys     AGG  R, arg

GTT  V, val    GCT  A, ala    GAT  D, asp     GGT  G, gly
GTC  V, val    GCC  A, ala    GAC  D, asp     GGC  G, gly
GTA  V, val    GCA  A, ala    GAA  E, glu     GGA  G, gly
GTG  V, val    GCG  A, ala    GAG  E, glu     GGG  G, gly
Mutation causes nucleotide change
- What about AA sequence?
- Synonymous change
  - Syn = same
  - AA stays same
- Nonsynonymous change
  - Not same
  - AA changes
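Whether a nucleotide change is synonymous can be checked by translating the codon before and after. A minimal sketch using the standard genetic code; `classify_change` is a made-up name, and stops are written as `*` (the common convention) rather than the letters used on the slide. In the earlier mutation example, reading from the first base, the second codon changes from CCG to CCT.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before, codon_after):
    """Label a codon change as 'synonymous' or 'nonsynonymous'."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


print(classify_change("CCG", "CCT"))  # pro -> pro
print(classify_change("CCG", "CTG"))  # pro -> leu
```

A change in the third position of a codon is often synonymous, as with CCG to CCT here, while the second-position change CCG to CTG swaps proline for leucine.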
Amino acid (AA) types
- Non-polar: A, F, G, I, L, M, P, V, W
- Polar: N, Q, S, T, Y
- Charged, +: H, K, R
- Charged, -: D, E
- Other: C
Often changing AA within a group does not affect protein function
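The grouping above can be used to ask whether an amino acid substitution stays within a group. A short sketch, with group labels and function names chosen here for illustration; the group membership follows this slide, which is one of several grouping schemes in use.

```python
# Amino acid groups as listed on this slide (one-letter codes).
AA_GROUPS = {
    "non-polar": set("AFGILMPVW"),
    "polar": set("NQSTY"),
    "charged +": set("HKR"),
    "charged -": set("DE"),
    "other": set("C"),
}


def group_of(aa):
    """Return the name of the group containing the one-letter amino acid code."""
    for name, members in AA_GROUPS.items():
        if aa.upper() in members:
            return name
    raise ValueError(f"unknown amino acid: {aa!r}")


def same_group(a, b):
    """True when both amino acids fall in the same group."""
    return group_of(a) == group_of(b)


print(same_group("I", "V"))  # both non-polar
print(same_group("K", "E"))  # charged + vs charged -
```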
Selection
- Stabilizing selection - Acts to keep protein function the same
  - Synonymous change more frequent than nonsynonymous
- Amino acid changes within a group are much more common than changes between groups
  - Non-polar -> non-polar
  - Polar -> polar
Similarity matrix
A = alanine
C = cysteine
D = aspartic acid
E = glutamic acid
F = phenylalanine
G = glycine
H = histidine
Comparing sequences
- Can do at either nucleotide or AA level
- Gather sequences from a bunch of different organisms
- Need to align them so that sites which perform the same function can be compared
Aligning sequences
- Sequences may differ in length
  - Often have differences at amino- or carboxy-terminus of the protein
  - Need a way to align parts of protein that are performing the same function
Example - RH2 opsin in fishes
Goldfish   MNGTEGNNFYVPLSNR
Medaka     MENGTEGKNFYIPMNNR
Zebrafish  MNGTEGSNFYIPMSNR
Killifish  MGYGPNGTEGNNFYIPMSNK
Trout      MQNGTEGSNFYIPMSNR
Halibut    MVWDGGIEPNGTEGKNFYIPMSNR
Cod        MRMEANGTEGKNFYIPMSNR
Tetraodon  MVWDGGIEPNGTEGKNFYIPMSNR
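These sequences differ in length at the amino-terminus, so gaps have to be placed before matching sites line up. One standard way to do this for a pair of sequences is global alignment by dynamic programming (Needleman-Wunsch). The sketch below is only an illustration: the scores (match +1, mismatch -1, gap -1) are arbitrary choices, and this is not the program used to align all eight sequences at once.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


zebrafish = "MNGTEGSNFYIPMSNR"
cod = "MRMEANGTEGKNFYIPMSNR"
aligned_zebrafish, aligned_cod = global_align(zebrafish, cod)
print(aligned_zebrafish)
print(aligned_cod)
```

With this scoring the zebrafish sequence receives four gaps and the cod sequence none, so the two gapped strings have the same length. Where the gaps land near the start can differ from a published alignment, since several placements score equally.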
Align sequences
Zebrafish  M--------NGTEGSNFYIPMSNR
Trout      M------Q-NGTEGSNFYIPMSNR
Medaka     M------E-NGTEGKNFYIPMNNR
Cod        M----RMEANGTEGKNFYIPMSNR
Halibut    MVWDGGIEPNGTEGKNFYIPMSNR
Tetraodon  MVWDGGIEPNGTEGKNFYIPMSNR
Goldfish   M--------NGTEGNNFYVPLSNR
Killifish  M---GYG-PNGTEGNNFYIPMSNK
           *        *****.***:*:.*:
* identical
: conserved
. semi-conserved
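The `*` marks can be recomputed directly from the aligned rows: a column gets a star when every sequence has the same residue there. A minimal sketch follows; `identical_columns` is a made-up name, and it reproduces only the `*` marks. The `:` and `.` marks depend on the alignment program's own amino acid groupings and are not reproduced here.

```python
alignment = {
    "Zebrafish": "M--------NGTEGSNFYIPMSNR",
    "Trout":     "M------Q-NGTEGSNFYIPMSNR",
    "Medaka":    "M------E-NGTEGKNFYIPMNNR",
    "Cod":       "M----RMEANGTEGKNFYIPMSNR",
    "Halibut":   "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Tetraodon": "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Goldfish":  "M--------NGTEGNNFYVPLSNR",
    "Killifish": "M---GYG-PNGTEGNNFYIPMSNK",
}


def identical_columns(rows):
    """Return a string with '*' where every row has the same residue, ' ' elsewhere."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


marks = identical_columns(list(alignment.values()))
for name, row in alignment.items():
    print(f"{name:<10} {row}")
print(f"{'':<10} {marks}")
```

For these eight rows, 11 of the 24 columns are identical: the initial M and ten positions in the NGTEG...NR region.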