Genome Anatomy - K

General Introduction to the Genome
An Outline
• Molecular Biology Major Events
• DNA, RNA
• Protein Synthesis (Transcription & Translation)
• Genome Anatomy
• Bioinformatics
• Genomics Signal Processing
Molecular Biology Major Events
• 1865, Gregor Mendel: inheritance is controlled by unit factors.
• 1869, Johann Friedrich Miescher: discovery of DNA.
• 1881: chromosomes are composed of DNA.
• 1911, Thomas Hunt Morgan: genes on chromosomes are the discrete units of heredity.
• 1941, George Beadle and Edward Tatum: identified that genes make proteins.
The Central Dogma
[Figure: three-step analogy with panels labeled Nucleus, Book shelves, Book, and Target]
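What is life made of?
Eukaryotes vs Prokaryotes
• Cells: prokaryotes are single-celled; eukaryotes are single-celled or multicellular.
• Nucleus: absent in prokaryotes; present in eukaryotes.
• Organelles: absent in prokaryotes; present in eukaryotes.
• DNA: one piece of circular DNA in prokaryotes; chromosomes in eukaryotes.
• mRNA processing: no post-transcriptional modification in prokaryotes; exon/intron splicing in eukaryotes.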
The Cell: Chemical Composition
– 70% water
– 7% small molecules
• Salts
• Amino acids (building blocks of proteins)
• Nucleotides (building blocks of DNA and RNA)
– 23% macromolecules
• Proteins
• Polysaccharides
• Lipids
The Cell: The 3 Critical Molecules
• DNA: holds genetic information.
• RNA (mRNA, rRNA, tRNA): transfers information and synthesizes protein.
• Protein: forms enzymes and the body's components.
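DNA: the Nucleotide
[Figure: a nucleotide, made of a sugar, a phosphate group, and a nitrogenous base (here A)]
DNA: Nitrogenous Bases
• Purines: A (adenine), G (guanine)
• Pyrimidines: T (thymine), C (cytosine)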
DNA: Polymerization Reaction
[Figure: nucleotides (A, C, G, T) polymerized into a strand with a 5′-phosphate end and a 3′-OH end]
DNA: Hydrogen Bonds
[Figure: two strands whose bases pair A with T and G with C through hydrogen bonds, each strand on a sugar-phosphate backbone]
• Number of base pairs = genome size
• Human genome (HG) = 3200 Mbp (Mb)
DNA: Watson-Crick Model (1953)
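The pairing rule in the Watson-Crick model (A with T, G with C, always one purine opposite one pyrimidine, strands antiparallel) can be expressed as a small lookup table. The sketch below is illustrative only; `complement`, `reverse_complement` and `base_class` are hypothetical helper names, not part of any library.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def complement(strand: str) -> str:
    """Partner bases aligned position by position (partner read 3'->5')."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written 5'->3' (the strands are antiparallel)."""
    return complement(strand)[::-1]


def base_class(base: str) -> str:
    """Classify a DNA base as a purine or a pyrimidine."""
    base = base.upper()
    if base in PURINES:
        return "purine"
    if base in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not a DNA base: {base!r}")


if __name__ == "__main__":
    top = "ATGCGT"
    print(complement(top))          # TACGCA
    print(reverse_complement(top))  # ACGCAT
    # Every base pair joins one purine to one pyrimidine.
    print(all(base_class(b) != base_class(COMPLEMENT[b]) for b in top))  # True
```

Reading the partner strand 5′→3′ requires reversing the complement, which is why `reverse_complement` differs from `complement`.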
RNA versus DNA
• RNA: sugar "ribose", phosphate, nitrogenous bases G, A, C, U
• DNA: sugar "deoxyribose", phosphate, nitrogenous bases G, A, C, T
Protein structure
• 1902: Emil Hermann Fischer wins the Nobel Prize in Chemistry; he showed that amino acids are linked together to form proteins.
[Figure: a protein drawn as a chain of amino acids, e.g. A-F-G-N-S-T-D-K-G-S-A]
Amino acid: Basic unit of protein
• An amino acid has a central carbon (C) bonded to a hydrogen (H), an amino group (NH3+), a carboxylic acid group (COO−), and a side chain (R).
• The different side chains (R) determine the properties of the 20 amino acids.
Protein structure
• Primary structure
• Secondary structure
• Super-secondary structure
• Tertiary structure
• Quaternary structure
Protein Structure: Prediction Problem
• Protein sequence (e.g. A F G N S T) → protein 3D structure → protein function
The Central Dogma: genes are the blueprints for proteins.
[Figure: a genome (DNA) containing many genes, each gene encoding a protein]
Protein Synthesis: DNA, RNA, and the Flow of Information
• Replication: DNA → DNA
• Transcription: DNA → RNA
• Translation: RNA → protein
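Replication and transcription can both be modeled as base-pairing operations on strings. This is a minimal sketch assuming the input is the coding (sense) strand written 5′→3′; it models only the pairing logic, not promoters or polymerases, and the function names are hypothetical.

```python
# Base-pairing rules used in this sketch.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base


def replicate(strand: str) -> str:
    """Replication: the new complementary DNA strand, written 5'->3'."""
    return "".join(DNA_PAIR[b] for b in reversed(strand.upper()))


def transcribe(coding_strand: str) -> str:
    """Transcription: RNA is built complementary to the template strand,
    pairing U (not T) with A."""
    template = replicate(coding_strand)  # template strand, 5'->3'
    # Read the template 3'->5' so the RNA comes out 5'->3'.
    return "".join(DNA_TO_RNA_PAIR[b] for b in reversed(template))


if __name__ == "__main__":
    coding = "ATGGCC"
    print(replicate(coding))   # GGCCAT
    print(transcribe(coding))  # AUGGCC (the coding strand with T replaced by U)
```

The result illustrates why the mRNA has the same sequence as the coding strand, with U in place of T.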
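Protein Synthesis: Gene Expression
[Figure: a gene (segments 1, 2, 3) transcribed into pre-mRNA, processed into mRNA, and translated]
Splicing
[Figure: pre-mRNA segments 1, 2, 3 spliced into mRNA between transcription and translation]
Alternative Splicing
[Figure: one pre-mRNA (segments 1, 2, 3) processed into different mRNAs]
mRNA Editing
[Figure: pre-mRNA edited into mRNA between transcription and translation]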
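Splicing can be pictured as cutting the introns out of the pre-mRNA and joining the exons. The sketch below assumes the exon coordinates are already known (hypothetical half-open `(start, end)` intervals) and models alternative splicing only as exon skipping, which is just one of several real mechanisms.

```python
from itertools import combinations

# Exons are (start, end) half-open intervals into the pre-mRNA.
# The coordinates used below are hypothetical.


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Remove introns by concatenating the exon segments in order."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def exon_skipping_isoforms(pre_mrna: str, exons: list[tuple[int, int]]) -> list[str]:
    """Toy alternative splicing: always keep the first and last exons and
    include every possible subset of the internal exons."""
    ordered = sorted(exons)
    if len(ordered) < 2:
        return [splice(pre_mrna, ordered)]
    first, *middle, last = ordered
    isoforms = []
    for k in range(len(middle) + 1):
        for chosen in combinations(middle, k):
            isoforms.append(splice(pre_mrna, [first, *chosen, last]))
    return isoforms


if __name__ == "__main__":
    # Exons in upper case, introns in lower case, for readability only.
    pre = "AAAgggCCCgggUUU"
    exons = [(0, 3), (6, 9), (12, 15)]
    print(splice(pre, exons))                  # AAACCCUUU
    print(exon_skipping_isoforms(pre, exons))  # ['AAAUUU', 'AAACCCUUU']
```

A gene with n exons has n - 1 introns between them, so the three-exon toy gene above has two introns.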
Translation
[Figure: an mRNA read codon by codon from the start codon (AUG) to a stop codon, with amino acids labeled M, S, A, K, V]
Protein Synthesis: The Genetic Code
[Figure: codon table with the start and stop codons marked]
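The genetic code can be written as a 64-entry lookup table. This sketch builds the standard code (NCBI translation table 1) from its compact one-letter string and translates an mRNA from the first AUG to the first in-frame stop codon. It ignores alternative genetic codes and the scanning rules real ribosomes use; `translate` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are listed with
# each position running through U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODON = "AUG"  # also codes for methionine (M)


def translate(mrna: str) -> str:
    """Translate from the first AUG up to, not including, the first in-frame
    stop codon. Returns '' if there is no start codon."""
    mrna = mrna.upper()
    start = mrna.find(START_CODON)
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGUCUGCUAAAGUUUAAGG"))  # MSAKV
```

In the example, AUG (M), UCU (S), GCU (A), AAA (K) and GUU (V) are read in turn, and translation ends at the stop codon UAA.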
Gene Regulation
[Figure: a regulatory protein acting on a gene with segments 1, 2, 3]
[Figure: regulatory proteins acting on Gene 1 and Gene 2]
• We have only limited knowledge about regulatory mechanisms.
How big is the genome?
• In 12-point font, approximately 60 nucleotides of DNA sequence can be written on a line 10 cm long.
• Genome size = total number of nucleotide base pairs,
– typically given in millions of base pairs, or megabases (abbreviated Mb or Mbp).
Written out this way, the human genome sequence would stretch for 5000 km, the distance from Montreal to London, Los Angeles to Panama, Tokyo to Calcutta, Cape Town to Addis Ababa, or Auckland to Perth.
The sequence would fill about 3000 books of 600 pages each.
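The 5000 km and 3000-book figures can be checked with simple arithmetic. This sketch assumes a human genome of about 3200 Mb and the 60-nucleotides-per-10-cm line stated above; the lines-per-page result is a derived consistency check, not a figure from the slides.

```python
# Assumed inputs, taken from the slides above.
GENOME_BP = 3_200_000_000   # human genome, ~3200 Mb
NT_PER_LINE = 60            # nucleotides per 10 cm line in 12-point font
LINE_CM = 10
BOOKS = 3000
PAGES_PER_BOOK = 600

CM_PER_KM = 100_000

length_km = GENOME_BP / NT_PER_LINE * LINE_CM / CM_PER_KM
nt_per_page = GENOME_BP / (BOOKS * PAGES_PER_BOOK)
lines_per_page = nt_per_page / NT_PER_LINE

print(f"{length_km:,.0f} km")                  # 5,333 km, i.e. roughly 5000 km
print(f"{nt_per_page:,.0f} nt per page")       # 1,778 nt per page
print(f"{lines_per_page:.0f} lines per page")  # 30 lines per page
```

About 30 lines of 60 nucleotides per page is a plausible page layout, so the two comparisons are consistent with each other.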
Genome sizes differ between organisms
Genome size is not a good indicator of gene number
• Space is saved in the genomes of less complex
organisms because the genes are more closely
packed together.
C-value paradox
• The lack of a precise correlation between the complexity of an organism and the size of its genome was looked on as a bit of a puzzle.
Genome Anatomy
Human Genome Anatomy
• The human genome consists of the nuclear genome and the mitochondrial genome.
Human Mitochondrial Genome Anatomy
• It is much smaller than the nuclear genome (~17 kb) and contains just 37 genes.
• 13 genes code for proteins and 24 specify non-coding RNAs.
• It contains no introns.
• It is typical of the mitochondrial genomes of other animals.
Nuclear Human Genome Anatomy
[Figure: composition of the nuclear genome, with one component labeled 62%]
Nuclear Human Genome Anatomy: Protein-Coding Genes
[Figure: a gene with five exons, separated by four introns]
• Average: nine exons per gene.
[Figure: two gene segments (V28 and V29-1)]
Nuclear Human Genome Anatomy: Pseudogenes
• Nonfunctional genes
Nuclear Human Genome Anatomy: genome-wide repeats
• Tandemly repeated DNA
– Minisatellite DNA
– Microsatellite DNA
• Interspersed genome-wide repeats
– SINEs
– LINEs
– LTR elements
– DNA transposons
Nuclear Human Genome Anatomy: genome-wide repeats, minisatellite DNA
• We are most familiar with minisatellite DNA because of its association with structural features of chromosomes, for example:
• Telomeric DNA, which in humans comprises hundreds of copies of the motif 5′-TTAGGG-3′:
5′-TTAGGG TTAGGG TTAGGG ...-3′
3′-AATCCC AATCCC AATCCC ...-5′
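A telomere is a tandem array: the same motif repeated head to tail. This minimal sketch counts uninterrupted copies of a motif and prints the complementary strand; `tandem_copies` is a hypothetical helper, and the 300-copy array is an arbitrary example of "hundreds of copies".

```python
# Hypothetical helper for counting tandem copies of a motif.
MOTIF = "TTAGGG"  # human telomeric repeat, 5'->3'
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tandem_copies(seq: str, motif: str, start: int = 0) -> int:
    """Count uninterrupted copies of `motif` beginning at index `start`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    n = 0
    while seq[start + n * k : start + (n + 1) * k] == motif:
        n += 1
    return n


if __name__ == "__main__":
    telomere = MOTIF * 300  # arbitrary example of "hundreds of copies"
    print(tandem_copies(telomere, MOTIF))       # 300
    # Base-by-base complement, as drawn on the slide (read 3'->5'):
    print(telomere.translate(COMPLEMENT)[:18])  # AATCCCAATCCCAATCCC
```

Read 5′→3′, the complementary strand's repeat unit is CCCTAA, the reverse of AATCCC.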
The content of the human nuclear genome: genome-wide repeats, microsatellite DNA
• Microsatellites with a CA repeat (runs of the dinucleotide CA) make up 0.25% of the genome, 8 Mb in all.
• Single base-pair repeats (runs of one repeated nucleotide) make up another 0.15%.
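Microsatellites can be located with regular expressions. The thresholds below (at least 5 CA copies, or at least 10 identical bases) are hypothetical choices for this sketch; real repeat-finding tools use their own, more careful criteria. The last lines check that 0.25% of a 3200 Mb genome is the 8 Mb quoted above.

```python
import re

# Hypothetical thresholds for this sketch only.
CA_REPEAT = re.compile(r"(?:CA){5,}")       # at least 5 CA copies
MONO_REPEAT = re.compile(r"([ACGT])\1{9,}")  # at least 10 identical bases


def find_repeats(pattern: re.Pattern, seq: str) -> list[tuple[int, str]]:
    """Return (start index, matched run) for every non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    seq = "GG" + "CA" * 6 + "TT" + "A" * 12 + "GC"
    print(find_repeats(CA_REPEAT, seq))    # [(2, 'CACACACACACA')]
    print(find_repeats(MONO_REPEAT, seq))  # [(16, 'AAAAAAAAAAAA')]

    # Genome-wide totals implied by the percentages, for a 3200 Mb genome.
    genome_mb = 3200
    print(f"{genome_mb * 0.25 / 100:.1f} Mb")  # 8.0 Mb of CA repeats
    print(f"{genome_mb * 0.15 / 100:.1f} Mb")  # 4.8 Mb of single-base repeats
```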
Nuclear Human Genome Anatomy: genome-wide repeats, interspersed repeats
Gene Classification: Gene function
• This system has the advantage that its fairly broad functional categories can be further subdivided to produce a hierarchy of increasingly specific functional descriptions for smaller and smaller sets of genes.
• The weakness: functions have not yet been assigned to many eukaryotic genes.
Gene Classification: Gene function
• The gene catalog cannot tell us why we are human.
• It may still not be possible, simply from genome comparisons with the chimpanzee genome, to determine what makes us human.
Gene Classification: Gene function
• The major categories of protein coding genes
represent the most studied areas of cell
biology, which means that many of the
relevant genes can be recognized because
their protein products are known.
• Genes whose products have not yet been
identified are more likely to be involved in the
less well studied areas of cellular activity.
Gene classification: Protein Domain
• A more powerful method is to base the
classification not on the functions of genes
but on the structures of the proteins that they
specify.
• A protein molecule is constructed from a
series of domains, each of which has a
particular biochemical function.
What is Bioinformatics?
• Integration of computational and biological methods
to convert biological information into general theories.
[Figure: a long stretch of raw DNA sequence (a, c, g, t)]
Bioinformatics draws on several disciplines:
• Computer Science: data structures, software engineering (C, C++, Perl)
• Biology: cell structure, genome and genes, DNA and RNA
• Chemistry: protein structure, molecular bonds
• Statistics: Markov models, neural networks
Bioinformatics Subareas
• The subareas within bioinformatics include genomics and proteomics, covering topics such as:
– Genome comparison
– Evolutionary trees
– Microarray analysis
– Gene prediction
– Gene classification
– Gene regulation
– Protein 3D structure prediction
– Protein-protein interaction
– Protein alignment
What is GSP?
Genomic signal processing (GSP) is the analysis and processing of genomic data using the theory and methods of signal processing, to gain a global understanding of the genome.
[Figure: a raw DNA sequence as the input to analysis and processing]
GSP Labs
• The Genomic Signal Processing Laboratory at Texas A&M University, and the Computational Biology Division of the Translational Genomics Research Institute in Phoenix, Arizona (Edward R. Dougherty). Aim: to model genomic regulatory mechanisms for the purposes of diagnosis and therapy.
• Columbia's Genomic Information Systems Laboratory at Columbia University (Dimitris Anastassiou).
• DSP Group, Department of Electrical Engineering, California Institute of Technology (P. P. Vaidyanathan).
Mapping Character Strings to Numerical Sequences
[Figure: example DNA strings (AAAA, TTTT, CCCG, GGTA, GCTT, TCCC, GGGT) and binary numerical sequences (0001, 1101, 0101, 0101, 1111, 1111, 1000)]
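One widely used way to turn a DNA character string into numbers is the binary indicator (Voss) mapping: one 0/1 sequence per base, with a 1 wherever that base occurs. Whether this is exactly the mapping drawn on the slide is an assumption; the sketch simply illustrates the technique, and `indicator_sequences` is a hypothetical name.

```python
# Binary indicator (Voss) mapping: one 0/1 sequence per base.
def indicator_sequences(dna: str) -> dict[str, list[int]]:
    dna = dna.upper()
    return {base: [1 if b == base else 0 for b in dna] for base in "ACGT"}


if __name__ == "__main__":
    for base, bits in indicator_sequences("GGTA").items():
        print(base, "".join(map(str, bits)))
    # A 0001
    # C 0000
    # G 1100
    # T 0010
```

For a sequence containing only A, C, G and T, exactly one of the four indicator sequences is 1 at each position, so no information is lost.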
Research Areas of GSP
• Gene prediction
– Hidden Markov Models (HMM)
– Fourier Transform
– Wavelet Transform
• Resonant Recognition Model (RRM): to identify the common hot spots of many protein molecules using Fourier transform methods.
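Fourier-based gene prediction commonly relies on the period-3 property: protein-coding regions tend to show a spectral peak at frequency 2π/3 (one cycle per codon) in the Fourier transform of the base indicator sequences. The sketch below computes that single-frequency power; it assumes the binary indicator mapping, and the function names are hypothetical. For a length N divisible by 3, this frequency is DFT bin N/3.

```python
import cmath

# Period-3 measure: S = sum over A, C, G, T of |U_b(2*pi/3)|^2,
# where U_b is the Fourier transform of base b's indicator sequence.


def indicator(dna: str, base: str) -> list[int]:
    """Binary indicator sequence: 1 where dna[m] == base, else 0."""
    return [1 if b == base else 0 for b in dna.upper()]


def dft_at(x: list[int], omega: float) -> complex:
    """Discrete-time Fourier transform of x evaluated at one frequency omega."""
    return sum(v * cmath.exp(-1j * omega * m) for m, v in enumerate(x))


def period3_power(dna: str) -> float:
    omega = 2 * cmath.pi / 3
    return sum(abs(dft_at(indicator(dna, b), omega)) ** 2 for b in "ACGT")


if __name__ == "__main__":
    print(round(period3_power("ATG" * 100)))  # 30000: strong period-3 signal
    print(round(period3_power("ATGC" * 75)))  # 0: period 4, no peak at 2*pi/3
```

In practice this measure is computed over a sliding window along the genome, and windows with high period-3 power are flagged as candidate coding regions.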
Thank you for your attention