Download Slides #5B (Green)

BIOM5010
Intro to Molecular Biology
James Green
Systems and Computer Engineering
Carleton University
References/sources

*1: “Introduction To Molecular Biology” by Salwa Hassan Teama (M.D.), slideshare.net
D.O.E. Human Genome Program, http://www.ornl.gov/hgmis
BIOC3101 slides, Prof Bill Willmore (CU)
*2: “Molecular Biology for Computer Scientists” by Lawrence Hunter in Artificial Intelligence & Molecular Biology
*3: BIOC3102 slides, Prof Bill Willmore (CU)
*4: http://bix.ucsd.edu/bioalgorithms/slides.php
2
Overview
• Central Dogma (DNA --> RNA --> Protein)
• DNA
• RNA
• Protein


Quick intro:
http://www.genome.gov/Pages/EducationKit/video/qt/3D.mov
3
*1
Central Dogma of Molecular Biology
4
DNA
• DNA as a molecule (bases)
• DNA sequencing
• Transcription
• Exon/Intron
• Gene finding
• Chromosomes (centromeres)
• Chromosomal structure and impact on expression
• Epigenomics (Hamilton-Stelco story)

5
25K??
Courtesy of U.S. D.O.E. Human Genome Program, http://www.ornl.gov/hgmis
The human genome
6
*1
7
*1

There are four different types of nucleotides found in DNA,
differing only in the nitrogenous base: adenine (A), guanine (G),
cytosine (C), and thymine (T).

These bases are classified by chemical structure into two
groups: adenine and guanine are double-ring structures termed
purines; thymine and cytosine are single-ring structures termed
pyrimidines.

The bases pair in a specific way: adenine (A) with thymine (T)
(two hydrogen bonds) and guanine (G) with cytosine (C) (three
hydrogen bonds).

Within double-stranded DNA, the number of thymines therefore
always equals the number of adenines, and the number of cytosines
always equals the number of guanines.

In contrast to DNA, RNA is single-stranded, the pyrimidine
base uracil (U) replaces thymine, and ribose sugar replaces
deoxyribose.
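
The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are assumptions made here and do not come from the slides.

```python
# Watson-Crick pairing: A-T (2 hydrogen bonds), G-C (3 hydrogen bonds).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def reverse_complement(dna):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def hydrogen_bonds(dna):
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(H_BONDS[base] for base in dna)

def to_rna(dna):
    """Write a DNA sequence in the RNA alphabet (U replaces T)."""
    return dna.replace("T", "U")

strand = "GTCCAGTCAATAGCGGTC"  # hypothetical example sequence
duplex = strand + reverse_complement(strand)

# Across both strands, the A count equals the T count and G equals C.
print(duplex.count("A") == duplex.count("T"))
print(duplex.count("G") == duplex.count("C"))
print(to_rna(strand))
```

Counting over both strands is what makes the A = T and G = C equalities hold; a single strand on its own need not satisfy them.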
*1
DNA as a molecule
*1
Genomic DNA Organization
Sequencing a genome
Sanger sequencing:
http://www.youtube.com/view_play_list?p=F0701633C91835BF
• Now using next-generation sequencing
• Fragment DNA, then sequence individual fragments
• Re-assemble into contigs to get full sequence
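
The fragment-then-reassemble idea can be illustrated with a toy greedy assembler. This is a minimal sketch under simplifying assumptions (error-free reads, all from the same strand, no repeats). Real assemblers are far more involved, and the function names and reads below are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: the rest stay separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags

reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]  # hypothetical reads
print(greedy_assemble(reads))
```

With these three reads the overlaps chain into a single contig; if no overlap of at least `min_len` bases remains, the function returns several contigs instead.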
11
RNA
• Creation of mature mRNA
• Translation (genetic code)
• Other types of RNA

12
*3
13
*3
14
Courtesy of U.S. D.O.E. Human Genome Program, http://www.ornl.gov/hgmis
15
*4
Splicing to produce mature mRNA
*1
The Genetic Code

The purine and pyrimidine bases of the DNA molecule are the
letters, or alphabet, of the genetic code. All information
contained in DNA is represented by four letters: A, T, C, G.

Three consecutive nucleotides (1st, 2nd and 3rd positions) form
a triplet codon. There are 64 possible codons, and most amino
acids have more than one. Of the 64 possible 3-base codons, 61
specify amino acids; the other three are stop signals (UAG, UAA,
or UGA in mRNA).

The sequence of codons in the mRNA defines the primary
structure of the final protein.
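
As a rough illustration of how codons map to amino acids, the sketch below builds the standard codon table and translates an mRNA string. The function name and the example sequences are assumptions for illustration; the table itself is the standard genetic code.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
# '*' marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(len(CODON_TABLE))                     # number of codons in the table
print(list(CODON_TABLE.values()).count("*"))  # number of stop codons
print(translate("AUGGCCUAA"))
```

Reading starts at the first base here; a real translation would first locate the start codon and the correct reading frame.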
*1
http://www.accessexcellence.org/RC/VL/GG/genetic.php
*1
Series of codons in part of an mRNA molecule. Each codon consists of three
nucleotides, representing a single amino acid.
Other RNAs
• RNA has structure
• Can be functional on its own (e.g. microRNA, ribozymes, aptamers)
Hammerhead ribozyme
20
Protein
• Chain of amino acids
• Protein folding/structure
• PTMs (cleavage, localization, AA modifications e.g. hydroxylation, etc)
• Sequence evolution/MSA
• MS for identifying proteins in a mixture
• Protein interactions
• Important types of proteins

21
*1
The Protein

Proteins are the basic building materials of a cell, made by the
cell itself; they are the final product of most genes.

Proteins are chain-like polymers of a few to many thousands of
amino acids. Each amino acid is specified by a codon, a
3-nucleotide RNA sequence. Amino acids are joined together by
peptide bonds to form a polypeptide. Proteins can be composed of
one or more polypeptide chains.

Proteins have many functions: they provide structure that maintains
cell integrity and shape (e.g. collagen in bone), serve as enzymes
and hormones, bind and carry substances, and control the
activities of genes.
Polypeptide chain
[Figure: backbone of a polypeptide chain running from the amino terminus towards the carboxyl terminus, with the dihedral angles φ, ψ, and ω marked; ω = 0° is cis and ω = 180° is trans.]
23
24
The big picture of protein structure
…GTC CAG TCA ATA GCG GTC …
Genomic DNA
Transcription
&
Translation
DNA
Protein
Amino acid sequence
(Primary protein structure)
…C A W V Q S I A W S Y D R M A…
Local protein folding
Secondary protein structure
…T T H H H H H T T E E E E…
Protein folding continues…
Tertiary protein structure
25
Tertiary structure

Definition: locations of all atoms in the protein chain
Data source:
• Can be resolved through experimental techniques (X-ray crystallography, NMR spectroscopy)
• Unreliable, costly, not always possible
• Computational methods can be applied sometimes (comparative modeling, fold recognition, ab-initio predictions)
26
Secondary structure


• Regions of local repeating regular structure
• Helices: α, 3₁₀, π
• β-strands which form β-sheets
• 46% of residues form non-regular structure
• Connecting chain between regular structures
• Turns, bends, binding sites
Data source: can be derived from tertiary structure
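
A secondary-structure assignment is usually written as one letter per residue (H = helix, E = strand, T = turn), as in the "T T H H H H H T T E E E E" string shown earlier in the deck. The sketch below summarizes such a string; the helper names are illustrative assumptions.

```python
from itertools import groupby

def segments(ss):
    """Collapse a per-residue string into (state, run_length) segments."""
    return [(state, len(list(run))) for state, run in groupby(ss)]

def composition(ss):
    """Fraction of residues in each secondary-structure state."""
    return {state: ss.count(state) / len(ss) for state in sorted(set(ss))}

ss = "TTHHHHHTTEEEE"  # H = helix, E = beta-strand, T = turn
print(segments(ss))
print(composition(ss))
```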
27
Alpha-helices
• Corkscrew shape
• H-bonds between residue i and i+4
• Most compact structure
• Most abundant regular structure (32-38% of residues)
From S-Star.org lecture 6, http://www.s-star.org
28
Beta-sheets
From S-Star.org lecture 6, http://www.s-star.org
29
30
The importance of protein structure
• Proteins are involved in almost all biological processes
• Function is determined by structure
• Determining protein structure is of fundamental importance to biology
• Example: current drugs target only ~500 of 30,000 proteins
31
*2
Review...
32
*1
Types of gene expression control in eukaryotes
• Transcriptional: prevents transcription, i.e. prevents mRNA from being synthesized (regulatory regions, chromosome structure).
• Posttranscriptional: controls mRNA after it has been produced (microRNAs).
• Translational: prevents translation; involves protein factors needed for translation.
• Posttranslational: acts after the protein has been produced (many…).
*1
Genomic DNA Organization
Transcriptional Control
35
Microarrays


• Measure gene expression by quantifying mRNA levels for each gene
• Matrix of wells, each with probes inside
• Complementary DNA sequence “catches” mRNA
• Wash rest away
• DNA tagged with fluorescent beads
• Quantify using brightness of well (spot)
• Being replaced with next generation sequencing
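
Spot brightness is commonly compared between two samples as a log2 ratio. The sketch below assumes a hypothetical treated-versus-control experiment with made-up intensities; the gene names, the pseudocount, and the two-condition design are assumptions, not details from the slides.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two spot intensities; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Hypothetical spot intensities: gene -> (treated, control)
spots = {
    "geneA": (1023.0, 255.0),
    "geneB": (127.0, 127.0),
    "geneC": (63.0, 255.0),
}

for gene, (treated, control) in spots.items():
    print(gene, round(log2_fold_change(treated, control), 2))
```

A positive value suggests higher mRNA levels in the treated sample, a negative value lower levels, and zero no change.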
36
37
38
Protein evolution
• Mutations
• Phylogenetics
• Multiple sequence alignment
• Identify conserved regions
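
Once sequences are aligned, conserved regions can be read off column by column. The following is a minimal sketch that assumes the alignment has already been computed (gaps written as "-"); the sequences and function names are invented for illustration.

```python
def conserved_columns(alignment):
    """Indices of columns where every sequence has the same non-gap residue."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        i for i, column in enumerate(zip(*alignment))
        if len(set(column)) == 1 and column[0] != "-"
    ]

def conserved_regions(alignment):
    """Group conserved columns into (start, end) runs, end inclusive."""
    regions = []
    for i in conserved_columns(alignment):
        if regions and regions[-1][1] == i - 1:
            regions[-1] = (regions[-1][0], i)
        else:
            regions.append((i, i))
    return regions

alignment = [  # hypothetical, already-aligned protein fragments
    "MKVLA-G",
    "MKILAEG",
    "MRVLA-G",
]
print(conserved_columns(alignment))
print(conserved_regions(alignment))
```

Requiring strict identity is the simplest possible criterion; practical tools usually score similarity with a substitution matrix instead.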
39
*1
DNA Mutation

Mutations include both gross alterations of chromosomes and more
subtle alterations to specific gene sequences.

Gross chromosomal aberrations include large deletions, additions,
and translocations (reciprocal and nonreciprocal).

A mutation in a gene's DNA sequence can alter the amino acid
sequence of the protein encoded by the gene. Point mutations are the
result of the substitution of a single base. Frame-shift mutations occur
when the reading frame of the gene is shifted by the addition or deletion
of one or more bases (any number not divisible by three).
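
The difference between a point mutation and a frame-shift can be seen by translating a short mRNA before and after each change. This is an illustrative sketch: the sequence and positions are made up, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, ..., GGG; '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def substitute(seq, pos, base):
    """Point mutation: replace the base at pos (0-based)."""
    return seq[:pos] + base + seq[pos + 1:]

def delete(seq, pos):
    """Single-base deletion at pos (0-based), which shifts the reading frame."""
    return seq[:pos] + seq[pos + 1:]

original = "AUGGCAUGGAAAUAA"  # hypothetical mRNA
print(translate(original))
print(translate(substitute(original, 4, "A")))  # changes one codon only
print(translate(delete(original, 3)))           # changes every downstream codon
```

The substitution alters a single amino acid, while the deletion changes every codon after it and here also removes the in-frame stop codon.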

Mutations can have harmful, beneficial, neutral, or uncertain effects on
health and may be inherited as autosomal dominant, autosomal
recessive, or X-linked traits. Mutations that cause serious disability
early in life are usually rare because of their adverse effect on life
expectancy and reproduction.

GREAT site about SNPs, personalized medicine:
http://learn.genetics.utah.edu/content/health/pharma/snips/
41
Multiple sequence alignment
42
43
*2
44
Mass Spectrometry
• Important proteomic tool
• Analytic technique
• Identifies proteins from sample

45
Mass Spectrometry
…WDQYTDFUEFAGDUDDALLVKLKLKLMNEFLQWKEQWDGHQW…
[Figures: charged peptide ions (+); survey ion spectrum (abundance vs. m/z ratio) with peaks labelled 1, 2, 3; product ion spectrum (abundance vs. m/z ratio) from which the peptide sequence “LLVK” is read.]
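
To make the m/z axis concrete, the sketch below cuts a protein into peptides and computes the m/z of a peptide ion. It assumes a simplified trypsin rule (cleave after every K or R, ignoring the proline exception) and uses standard monoisotopic residue masses. The function names are illustrative, and the slides do not say which protease or mass convention was used.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of each added charge

def tryptic_digest(protein):
    """Cleave after every K or R (simplified rule)."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER

def mz(peptide, charge=1):
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge

fragment = "DDALLVKLKLKLMNEFLQWKEQWDGHQW"  # part of the sequence on the slide
for pep in tryptic_digest(fragment):
    print(pep, round(mz(pep, 1), 3), round(mz(pep, 2), 3))
print("LLVK", round(mz("LLVK", 1), 3))
```

Leucine and isoleucine have identical masses, which is one reason a spectrum alone cannot always pin down a unique sequence.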
56
The Central Dogma
DNA
RNA
Proteins
57
Post-translational modifications
• Phosphorylation
• Glycosylation
• Ubiquitination
• Methylation
• Others!
58
59
Random neat stuff
• Protein-protein interactions
• Functional genomics
• GFP
• Prions and viroids
• PCR

60
Protein-protein interactions
• Proteins often physically interact to perform function
• Can detect complexes in vitro or in silico

61
PIPE
62
PIPE: Homo sapiens Global Scan
• First ever “complete” human interactome!
• 242M pairs --> 170K PPIs
• Other methods can only examine ~25% of protein pairs
• Computational complexity (PIPE <1s per pair)
• Availability of input features (e.g. structure)
• Used HPCVL’s Victoria Falls cluster
• 1168 Sun UltraSparc T2+ cores
• Total runtime: three months
Homo sapiens (Human)*
* Image from BrainMaps.org
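
The scale of an all-against-all scan follows from simple counting: n proteins give n(n-1)/2 unordered pairs. The sketch below assumes roughly 22,000 proteins, an illustrative figure that does not appear in the slides but yields about 242 million pairs.

```python
def unordered_pairs(n, include_self=False):
    """Number of distinct protein pairs among n proteins."""
    return n * (n + 1) // 2 if include_self else n * (n - 1) // 2

n_proteins = 22_000  # assumed for illustration only
pairs = unordered_pairs(n_proteins)
print(pairs)
print(unordered_pairs(n_proteins, include_self=True))

# Fraction of all pairs that the slide reports as predicted interactions.
print(170_000 / 242_000_000)
```

Because the pair count grows quadratically with the number of proteins, the per-pair cost dominates the feasibility of a proteome-wide scan.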
63
PIPE: Seasonal Allergic Rhinitis (SAR)

Collaborative project with:
• Department of Pediatrics, Gothenburg University, Gothenburg, Sweden.
• The Centre for Individualized Medication, Linköping University, Linköping, Sweden.
• Banting and Best Department of Medical Research, Donnelly Centre, University of Toronto, Toronto, Canada.
“Hay fever”
Study to find new biomarkers to identify SAR in patients.
Results were supported by patient data.
64
Cross-Organism Predictions

• PIPE is capable of cross-species predictions
• Makes it possible to predict PPIs in a newly sequenced organism, something most methods cannot do.
• Can predict host-pathogen interactions (HIV, Zika, Hepatitis)
65
PIPE: Volvox/Chlamy/Gonium

Collaborative project with:
• Bradley Olson (Olson Lab, Kansas State)
• Pierre Durand (Wits University, South Africa)
• Jonathan Featherston (Agricultural Research Council, South Africa)
• Richard E. Michod (University of Arizona)
Chlamydomonas (C. reinhardtii)
• Unicellular (undifferentiated cells).
Goniaceae (G. pectorale)
• Unicellular, but forms colonies.
Volvocaceae (V. carteri)
• Multicellular.
Richard E. Michod, Evolution of individuality during the transition from unicellular to multicellular life, PNAS, 2007
66
In-Silico Protein Synthesizer (InSiPS)
PIPE + Genetic Algorithms + IBM Blue Gene/Q
• Create novel proteins
• Bind strongly with target protein
• Don’t bind to any other proteins (side-effects)
UV Light Exposure (DNA damage)
Fitness: 0.465
Target: 0.718
Max off-target: 0.352
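
The three scores above are consistent with a fitness of the form target × (1 − max off-target), since 0.718 × (1 − 0.352) ≈ 0.465. The slides do not state the formula, so the sketch below is an assumption that merely reproduces those numbers; the function name is hypothetical.

```python
def fitness(target_score, off_target_scores):
    """Assumed fitness: reward target binding, penalize the worst off-target.

    Scores are taken to be predicted interaction scores in [0, 1].
    """
    worst_off_target = max(off_target_scores, default=0.0)
    return target_score * (1.0 - worst_off_target)

# Values from the slide; only the largest off-target score is reported there.
print(round(fitness(0.718, [0.352]), 3))
```

Under this form a candidate scores highly only if it binds the target strongly and binds no other protein strongly, which matches the two design goals listed on the slide.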
67
Functional genomics
• ~4500 gene deletion strains of yeast
• Apply treatment --> effect on colony growth

68
69
70
71
72
PCR
73
74
75
Questions?
76