Bioinformatics Practical for Biochemists
Andrei Lupas, Birte Höcker, Steffen Schmidt
WS 2013/2014
01. DNA & Genomics
Description
• Lectures about general topics in Bioinformatics & History
• Tutorials will provide you with a toolbox of bioinformatics programs to analyse data
• Hands-On sessions will give you the opportunity to use these tools
Course Outline
• Mon – DNA & Genomics
• Tue – Introduction to Proteins
• Wed – Annotation of Sequence Features
• Thu – Protein Classification
• Fri – Evolution & Design
Course Material:
eb.mpg.de/research/departments/protein-evolution/teaching
Course Outline
• 13:00-14:00 Presentation
• 14:15-17:30 Tutorial (2 x 30min) & hands-on practical
• You will need to keep an electronic lab notebook
• Fri afternoon: Test Exercises
Software Requirements
• Browser (e.g. Firefox)
• “Advanced” Word Processor
• PyMOL (www.pymol.org – free for teaching)
DNA & Genomics
1953 Model of DNA (F. Crick)
What is the “genetic material”?
• 1865 Gregor Mendel
  • basic rules of heredity
• 1869 Friedrich Miescher
  • discovery of ‘nuclein’ (DNA), Hoppe-Seyler repeated all experiments
• 1881 Edward Zacharias
  • chromosomes are composed of nuclein
• 1889 Richard Altmann
  • renaming nuclein to nucleic acid
wikipedia.org
DNA is the “transforming material”
• 1928 Frederick Griffith
  • “transforming principle” - Str. pneumoniae experiment
• 1944 Avery & McCarty
  • Griffith’s “transforming principle” is DNA
history.nih.gov / wikipedia.org
DNA is the genetic material
• 1950 Erwin Chargaff
  • A and T, and C and G, occur in equal amounts; base composition is the same in different tissues
• 1952 Hershey & Chase
  • showed that DNA is the genetic material, using the 32P/35S phage/E. coli experiment
bacteriophagetherapy.info / www.lifesciencesfoundation.org
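In hindsight, Chargaff's A/T and C/G equalities follow directly from base pairing in a double strand. A minimal Python sketch of that reasoning is given here; the sequence and the function names are made up for illustration and are not part of the course material.

```python
from collections import Counter

# Watson-Crick pairing partners
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def duplex_base_counts(seq: str) -> Counter:
    """Count bases over both strands of a double-stranded molecule."""
    return Counter(seq) + Counter(reverse_complement(seq))

# Hypothetical single-strand sequence, chosen only for illustration
strand = "ATGCGCGTTAAGGC"
counts = duplex_base_counts(strand)
print(counts["A"], counts["T"], counts["C"], counts["G"])
```

Whatever the single-strand composition, the duplex counts of A and T (and of C and G) come out equal, because every base on one strand is matched by its partner on the other.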
Solving the DNA structure
• 1952/53 Linus Pauling
  • beat Cavendish Lab in discovery of α-helix
• Cavendish Lab (Cambridge): Watson & Crick allowed to work full-time on DNA
• Pauling shared manuscript with Cavendish Lab before publication (via his son Peter Pauling)
http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/dna/notes/1952a.22-ms-01.html
Solving the DNA structure
• 1951/1952 Franklin & Wilkins
  • 1951 Lecture with Watson attending
    • A-DNA / B-DNA
    • periodicity, phosphates are outside
• 1953 X-ray of B-DNA (Photo 51)
  • Wilkins showed image to Watson
  • Perutz showed a confidential committee report to Watson & Crick
Solving the DNA structure
Nature, 1953
DNA structure
Getting the “code”
• 1953 George E. Palade
  • “RNA organelles” (ribosomes)
• 1957 Crick et al.
  • suggest non-overlapping triplets
  • only 20 out of 64 triplets code for an amino acid
  • “comma-free code”
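The number 20 in the comma-free code is a counting result: the four homopolymer triplets cannot be codewords, and the remaining 60 triplets fall into classes of three rotations, from each of which at most one triplet can be used. A short Python sketch of that count follows; the function names are illustrative assumptions.

```python
from itertools import product

BASES = "ACGU"

def cyclic_class(triplet: str) -> frozenset:
    """All rotations of a triplet, e.g. ACG -> {ACG, CGA, GAC}."""
    return frozenset(triplet[i:] + triplet[:i] for i in range(3))

def max_comma_free_size() -> int:
    """Upper bound on the size of a comma-free triplet code.

    Homopolymers (AAA, ...) are excluded because XXX XXX reads as XXX
    in every frame; from each remaining class of three rotations at
    most one triplet can be a codeword.
    """
    triplets = ("".join(p) for p in product(BASES, repeat=3))
    classes = {cyclic_class(t) for t in triplets}
    return sum(1 for c in classes if len(c) == 3)

def is_comma_free(code: set) -> bool:
    """True if no out-of-frame triplet of two adjacent codewords is a codeword."""
    for a, b in product(code, repeat=2):
        pair = a + b
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True

print(max_comma_free_size())
print(is_comma_free({"ACG", "CGA"}))
```

The second call shows why two rotations of the same triplet cannot coexist: ACG followed by ACG contains CGA out of frame.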
Getting the “code”
[Slide image: excerpt from the 1961 paper by Crick, Brenner, Barnett & Watts-Tobin, section “The Reading of the Code”. Legible passages:]
“(d) The code is probably ‘degenerate’; that is, in general, one particular amino-acid can be coded by one of several triplets of bases.”
“In an overlapping triplet code, an alteration to one base will in general change three adjacent amino-acids in the polypeptide chain.”
“Fig. 1. To show the difference between an overlapping code and a non-overlapping code. The short vertical lines represent the bases of the nucleic acid. The case illustrated is for a triplet code.”
• 1961 Nirenberg & Matthaei
  • polyU mRNA produces polyF protein
  • complete genetic code
• 1961 Sydney Brenner
  • no overlapping codes
  • concept of mRNA
  • triplet code (Crick, Brenner, Barnett, Watts-Tobin)
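The argument against overlapping codes can be made concrete: in a fully overlapping triplet code one base change alters three adjacent triplets, whereas in a non-overlapping code it alters only one. A minimal Python sketch, using a made-up sequence and illustrative function names:

```python
def overlapping_triplets(seq: str) -> list:
    """Fully overlapping code: a new triplet starts at every base."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

def non_overlapping_triplets(seq: str) -> list:
    """Non-overlapping code: triplets are read in consecutive blocks of three."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def changed_positions(before: list, after: list) -> list:
    """Indices of triplets that differ between two readings."""
    return [i for i, (x, y) in enumerate(zip(before, after)) if x != y]

original = "ACGUACGUACGU"                     # made-up sequence
mutant = original[:5] + "G" + original[6:]    # single base change at index 5

overlap_hits = changed_positions(
    overlapping_triplets(original), overlapping_triplets(mutant))
block_hits = changed_positions(
    non_overlapping_triplets(original), non_overlapping_triplets(mutant))
print(overlap_hits)
print(block_hits)
```

Observing only single amino-acid changes after point mutation is therefore what one expects from the non-overlapping reading.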
Getting the “code” – incl. start & stop codons
E. coli
• Alternative start codons
  • AUG (83%)
  • GUG (14%)
  • UUG (3%)
• Alternative stops
  • UAA (63%, ‘ochre’)
  • UGA (29%, ‘opal’) / or Sec (selenocysteine)
  • UAG (8%, ‘amber’)
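Start and stop codons are what make it possible to scan a sequence for open reading frames. The sketch below is a deliberately simple Python illustration with a made-up sequence; it scans one strand only, treats UGA always as a stop (ignoring the Sec case), and its function name and `min_codons` parameter are assumptions for illustration.

```python
START_CODONS = {"AUG", "GUG", "UUG"}
STOP_CODONS = {"UAA", "UGA", "UAG"}

def find_orfs(mrna: str, min_codons: int = 2) -> list:
    """Return (start, end, sequence) for each ORF on the given strand.

    An ORF starts at the first start codon seen in a frame and runs to
    the next in-frame stop codon (inclusive). UGA -> Sec recoding is
    ignored: UGA is always treated as a stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, mrna[start:i + 3]))
                start = None
    return orfs

example = "CCAUGGCUUAAGG"   # made-up sequence
print(find_orfs(example))
```

Real gene finders also weigh codon usage, ribosome binding sites and both strands; this sketch only shows the role of the start and stop signals.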
Gene Structure
• 1977 Sharp & Roberts
  • pre-mRNA is processed
• 1982 Cech
  • ribo(nucleic en)zymes
• 1980 Joan A. Steitz
  • role of snRNPs in splicing
wikipedia.org / yale.edu
Gene Structure – Eukaryotes / Prokaryotes
lac Operon
1: Regulatory gene
3: β-galactosidase
4: β-gal permease
8: β-gal transacetylase
Promoter region
Gene Structure – Polysomes in Prokaryotes
• EM picture of polysomes on a chromosome
[Figure labels: mRNA with ribosomes; DNA; transcription initiation]
Miller, O. L. et al. Visualization of bacterial genes in action. Science 169, 392–395
Gene Structure – Prokaryotes
u-tokyo.ac.jp
Gene Structure – Prokaryotic Operons
lac Operon
1: Regulatory gene
3: β-galactosidase
4: β-gal permease
8: β-gal transacetylase
Promoter region
Griswold, A. (2008) Nature Education 1(1)
Understanding Bioinformatics, Zvelebil & Baum, 2007
Gene Structure – Eukaryotes
zazzle.com
Gene Structure – Gene density in Eukaryotes
[Genome browser view, hg19, 10 Mb scale bar. Tracks: RefSeq Genes; 100 vertebrates Basewise Conservation by PhyloP; Repeating Elements by RepeatMasker]
Gene Structure – Comparison
Genes
• Eukaryote
  • Often have introns
  • Intraspecific gene order and number generally relatively stable
  • Many non-coding (RNA) genes
  • There is NOT generally a relationship between organism complexity and gene number
• Prokaryote
  • No introns
  • Gene order and number may vary between strains of a species
Gene regulation
• Eukaryote
  • Promoters, often with distal long-range enhancers/silencers, MARs, transcriptional domains
  • Generally mono-cistronic
• Prokaryote
  • Promoters
  • Enhancers/silencers rare
  • Genes often regulated as polycistronic operons
Repetitive sequences
• Eukaryote
  • Generally highly repetitive with genome-wide families from transposable element propagation
• Prokaryote
  • Generally few repeated sequences
  • Relatively few transposons
Organelle (subgenomes)
• Eukaryote
  • Mitochondrial (all)
  • Chloroplasts (in plants)
• Prokaryote
  • Absent
Genomic era
• 1977 Frederick Sanger
  • dideoxy sequencing
• 1986 Human Genome Initiative
• Genomes
  • 1995 H. influenzae 1.8 Mb 1.7k genes
  • 1997 E. coli 4.6 Mb 4.3k genes
  • 1996 S. cerevisiae 12.5 Mb 5.7k genes
  • 1998 C. elegans 100 Mb 21.7k genes
  • 2000 D. melanogaster 121 Mb 17k genes
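The genome sizes and gene counts listed above can be turned into a gene density (genes per Mb), which makes the difference between the compact bacterial genomes and the larger eukaryotic ones explicit. A small Python sketch using only the numbers from the slide; the variable and function names are illustrative.

```python
# (organism, genome size in Mb, gene count) as listed on the slide
GENOMES = [
    ("H. influenzae", 1.8, 1_700),
    ("E. coli", 4.6, 4_300),
    ("S. cerevisiae", 12.5, 5_700),
    ("C. elegans", 100.0, 21_700),
    ("D. melanogaster", 121.0, 17_000),
]

def genes_per_mb(size_mb: float, genes: int) -> float:
    """Average number of genes per megabase of genome."""
    return genes / size_mb

for name, size_mb, genes in GENOMES:
    print(f"{name:16s} {genes_per_mb(size_mb, genes):7.0f} genes/Mb")
```

With these figures the two bacteria come out at roughly one gene per kilobase, while the density drops steadily from yeast to worm to fly.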
Prokaryotic Genome
• E. coli
  • 6 Mbp
  • 1 by 2 µm cell size
Kavanoff, Nature Education : Supercoiled chromosome of E. coli.
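A rough calculation shows why the chromosome has to be compacted. Assuming B-form DNA with an axial rise of about 0.34 nm per base pair (a textbook value, not stated on the slide), the stretched-out length of the genome can be compared with the cell size:

```python
RISE_PER_BP_NM = 0.34  # axial rise per base pair in B-DNA, in nanometres (assumed)

def contour_length_um(base_pairs: float) -> float:
    """Length of a fully stretched B-DNA molecule in micrometres."""
    return base_pairs * RISE_PER_BP_NM / 1000.0

genome_bp = 6e6        # 6 Mbp, the value given on the slide
cell_length_um = 2.0   # long axis of the 1 by 2 µm cell

length_um = contour_length_um(genome_bp)
ratio = length_um / cell_length_um
print(f"{length_um:.0f} µm of DNA, about {ratio:.0f}x the cell length")
```

Under these assumptions the DNA is about two millimetres long, on the order of a thousand times the length of the cell that contains it.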
The human genome
• 2001 Draft H. sapiens 2.9 Gb 20-30k genes
Science (2001), Nature (2001)
The human genome
Gene content
Genome Structure – Comparison
Size
• Eukaryote
  • Large (10 Mb – 100,000 Mb)
  • There is not generally a relationship between organism complexity and its genome size (many plants have larger genomes than human!)
• Prokaryote
  • Generally small (<10 Mb; most < 5 Mb)
  • Complexity (as measured by # of genes and metabolism) generally proportional to genome size
Content
• Eukaryote
  • Most DNA is non-coding
• Prokaryote
  • DNA is “coding gene dense”
Telomeres / Centromeres
• Eukaryote
  • Present (linear DNA)
• Prokaryote
  • Circular DNA, doesn't need telomeres
  • Don’t have mitosis, hence no centromeres
Number of chromosomes
• Eukaryote
  • More than one, (often) including those discriminating sexual identity
• Prokaryote
  • Often one, sometimes more, but plasmids are not true chromosomes
Chromatin
• Eukaryote
  • Histone bound (which serves as a genome regulation point)
• Prokaryote
  • No histones
  • Uses supercoiling to pack genome
Gene content
Human Genome Content
• LINEs 20.4%
• SINEs 13.1%
• LTR retrotransposons 8.3%
• DNA transposons 2.9%
• Simple sequence repeats 3%
• Segmental duplications 5%
• Miscellaneous heterochromatin 8%
• Miscellaneous unique sequences 11.6%
• Introns 25.9%
• Protein-coding genes 1.5%
Gregory (2005), Nature
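Adding up the slices of the pie chart shows how much of the genome derives from transposable elements compared with protein-coding sequence. The Python sketch below assumes a particular assignment of percentages to labels (LINEs 20.4%, SINEs 13.1%, and so on), since the labels and values are not paired unambiguously in the slide text.

```python
# Percentages of the human genome, as read from the pie chart (assumed mapping)
CONTENT = {
    "LINEs": 20.4,
    "SINEs": 13.1,
    "LTR retrotransposons": 8.3,
    "DNA transposons": 2.9,
    "Simple sequence repeats": 3.0,
    "Segmental duplications": 5.0,
    "Miscellaneous heterochromatin": 8.0,
    "Miscellaneous unique sequences": 11.6,
    "Introns": 25.9,
    "Protein-coding genes": 1.5,
}

# The four transposable element classes named on the chart
TRANSPOSABLE = ["LINEs", "SINEs", "LTR retrotransposons", "DNA transposons"]

transposable_share = sum(CONTENT[k] for k in TRANSPOSABLE)
total = sum(CONTENT.values())
print(f"transposable elements: {transposable_share:.1f}%")
print(f"protein-coding genes:  {CONTENT['Protein-coding genes']:.1f}%")
print(f"sum of all slices:     {total:.1f}%")
```

Under this reading, transposable elements account for a bit under half of the genome, roughly thirty times the protein-coding share.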
Gene Structure – Eukaryotic Gene
[Genome browser view of SMG5 on chr1 (hg19), 156,225,000-156,250,000, 10 kb scale bar. Tracks: UCSC Genes (RefSeq, UniProt, CCDS, Rfam, tRNAs & Comparative Genomics); Placental Mammal Basewise Conservation by PhyloP; Simple Nucleotide Polymorphisms (dbSNP 135) Found in >= 1% of Samples; Repeating Elements by RepeatMasker]
Transposable Element - Mobile Elements / Jumping genes
• Barbara McClintock (1902 - 1992)
  • studies in the 40’s & 50’s of spotted kernels in maize
  • discovery of “controlling elements”
  • Nobel prize in 1983
  • initially thought to be unique to maize but later also found in other eukaryotes, bacteria, viruses, phages & plasmids
wikipedia.org
Transposable Element - Mobile Elements / Jumping genes
• DNA Transposons
  • transposase cuts out transposon & inserts it at the target site
  • “cut-and-paste” mechanism
  • prokaryotes & eukaryotes
• Retrotransposons
  • transposon DNA transcribed to RNA
  • insertion to genome by reverse transcription
  • LTR, LINEs, SINEs
  • eukaryotes only
wikipedia.org
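The two mechanisms differ in what happens to the original copy. A toy Python sketch on a string "genome" makes this visible; the function names, the marker `TE` and the insertion positions are illustrative assumptions, and "copy-and-paste" is used here as the common shorthand for the retrotransposon route.

```python
def cut_and_paste(genome: str, element: str, target: int) -> str:
    """DNA transposon: excise the element, then insert it at `target`.

    `target` is an index into the genome after excision.
    """
    start = genome.index(element)            # raises ValueError if absent
    excised = genome[:start] + genome[start + len(element):]
    return excised[:target] + element + excised[target:]

def copy_and_paste(genome: str, element: str, target: int) -> str:
    """Retrotransposon: the original stays in place; a copy (made via an
    RNA intermediate and reverse transcription) is inserted at `target`."""
    if element not in genome:
        raise ValueError("element not found")
    return genome[:target] + element + genome[target:]

genome = "aaaaTEaaaaaaaa"   # toy genome, 'TE' marks the mobile element
moved = cut_and_paste(genome, "TE", 10)
copied = copy_and_paste(genome, "TE", 12)
print(moved.count("TE"), len(moved))
print(copied.count("TE"), len(copied))
```

Cut-and-paste leaves copy number and genome length unchanged, while each retrotransposition adds a copy and lengthens the genome, which fits the large share of LINEs and SINEs in eukaryotic genomes.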