Download C - NCSU Bioinformatics Research Center

BI I: Bioinformatics and Basic Molecular Biology
August 23, 2011
1
Announcements
• Remember: course website up and running
• http://statgen.ncsu.edu/st590a/index.php
• Login: students
• Password: st590a
2
Sequences are letters in an alphabet
• Class focuses on biological sequence analysis
• Sequences are strings of “letters”
• Come from a biological “alphabet”
• Most biological “alphabets” look like the alphabet
• DNA:
A,C,G,T
• RNA:
A,C,G,U
• Amino acids: A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
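The three alphabets above can be written down directly as sets. The following is a minimal sketch, not part of the original slides; the helper name `possible_alphabets` is hypothetical. It shows which alphabets a given string could have been drawn from.

```python
# Alphabets exactly as listed on the slide.
DNA = set("ACGT")
RNA = set("ACGU")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")


def possible_alphabets(seq):
    """Return the names of the alphabets that contain every letter of seq."""
    letters = set(seq.upper())
    named = {"DNA": DNA, "RNA": RNA, "protein": PROTEIN}
    return [name for name, alphabet in named.items() if letters <= alphabet]


print(possible_alphabets("GTAAAGTCCCGTTAGC"))
print(possible_alphabets("GACUUA"))
print(possible_alphabets("MKWVTF"))
```

Note that A, C, G, and T are also amino acid letters, so a DNA string is always a syntactically valid protein string; the alphabet alone does not identify the molecule.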
3
Sequences are words in a sentence
• Proper function depends on:
• Both spelling and grammar
Class is interesting but it moves
• Both tell us how a sentence should look
• But change either one and you change the meaning
4
Goals of today
• Topics
• Cells
• Chromosomes
• Basic alphabets
• DNA, RNA, amino acids
• Basic grammatical structures
• Genes, proteins
• The genome, transcriptome, and proteome
5
Cells
• Complex system enclosed in a
membrane
• Organisms can be unicellular
• e.g. bacteria, baker’s yeast
• Organisms can be multicellular
• Humans:
• 60 trillion cells
• 320 cell types
Example Animal Cell
Image source: www.ebi.ac.uk/microarray/biology_intro.htm
6
A dichotomy of organisms, at least
• Eukaryotes
• contain a membrane-bound nucleus and organelles
(plants, animals, fungi,…)
• Prokaryotes
• lack a true membrane-bound nucleus and organelles
(single-celled, includes bacteria)
• Not all single celled organisms are prokaryotes
7
Chromosomes
• In eukaryotes, the nucleus contains one or several
double stranded DNA molecules organized as
chromosomes
• Humans:
• 22 Pairs of autosomes
• 1 pair sex chromosomes
Human Karyotype
http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html
8
A peek inside the cell
Image source: www.biotec.or.th/Genome/whatGenome.html
9
DNA densely packed into chromosomes
• How long is an unwound human chromosome? ~5 cm
10
Ex: Bioinformatics informs biology
• Discover a new grammar, publish in Nature
11
What is DNA?
• DNA: Deoxyribonucleic Acid
• Single stranded molecule (oligomer, polynucleotide)
chain of nucleotides
• Four different nucleotides:
• Adenine (A)
• Cytosine (C)
• Guanine (G)
• Thymine (T)
• Our first alphabet
• Question: Does a small alphabet require long words?
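One way to make the question concrete: with an alphabet of size a, there are a^k distinct words of length k, so distinguishing n meanings needs the smallest k with a^k >= n. The sketch below is an illustration only; `min_word_length` is a hypothetical helper, not something from the course.

```python
def min_word_length(alphabet_size, n_meanings):
    """Smallest k such that alphabet_size ** k >= n_meanings."""
    if alphabet_size < 2:
        raise ValueError("need at least two letters to encode choices")
    k = 0
    words = 1  # number of distinct words of length k
    while words < n_meanings:
        words *= alphabet_size
        k += 1
    return k


# Four nucleotides, twenty amino acids: two letters give only 16 words.
print(min_word_length(4, 20))
# A 20-letter alphabet needs just one letter per amino acid.
print(min_word_length(20, 20))
```

So a four-letter alphabet needs words of at least three letters to name twenty different things, which is why a smaller alphabet pushes word length up.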
12
Nucleotide bases
• Purines (A and G)
• Pyrimidines (C and T)
• Difference is in base structure
Image source: www.ebi.ac.uk/microarray/biology_intro.htm
13
Nucleotides chain together to form DNA
Figure labels: phosphate group; base; ribose or deoxyribose (shown here)
• A nucleotide unit consists of a pentose sugar, a phosphate
moiety (containing up to 3 phosphate groups) and a base.
• Subunits are linked together by phosphodiester bonds to
form a ‘sugar-phosphate backbone’
14
Single-stranded DNA polynucleotide
• Example polynucleotide:
5’ G→T→A→A→A→G→T→C→C→C→G→T→T→A→G→C 3’
• Or more commonly:
GTAAAGTCCCGTTAGC
15
Double-stranded DNA
• DNA can be single-stranded or double-stranded
• Double-stranded DNA
• second strand is the “reverse complement” strand
• Reverse complement runs in opposite direction and
bases are complementary
• Complementary bases:
• A complements T
• C complements G
16
Double-stranded sequence
• Example double-stranded polynucleotide:
5’ G→T→A→A→A→G→T→C→C→C→G→T→T→A→G→C 3’
| | | | | | | | | | | | | | | |
3’ C←A←T←T←T←C←A←G←G←G←C←A←A←T←C←G 5’
• Or more commonly
GTAAAGTCCCGTTAGC
CATTTCAGGGCAATCG
• Chromosomal DNA is double-stranded
• Why store redundant information?
17
The Double Helix
• Two complementary DNA strands form a stable
DNA double helix
• Spring ‘03 was the 50th anniversary of its discovery
18
DNA Replication
• “It has not escaped our notice that the specific
pairing we have postulated immediately suggests a
possible copying mechanism for the genetic material”
• Watson & Crick, Nature (1953)
19
Think: Unwind and copy
20
Replication yields two identical(?) double-helices
21
H-bonding between complementary bases
Why store redundant information?
• Nomenclature: sequence hybridizes to its reverse complement
22
BI: Integrating your knowledge
• Hydrogen bonding holds the two strands together
• 2 bonds between A and T
• 3 bonds between G and C
• How much of an organism’s DNA is G-C vs. A-T?
• Fact: Heat can denature molecules, DNA included
• Would bacteria in a hot environment benefit from
an excess of G-C base pairs?
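Both quantities in this slide are simple counts. The sketch below computes the G-C fraction of a strand and the number of hydrogen bonds in the corresponding double-stranded molecule, using the 2-bond and 3-bond figures above. The function names are illustrative, and the input is assumed to contain only A, C, G, and T.

```python
def gc_content(dna):
    """Fraction of bases in the strand that are G or C."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return (dna.count("G") + dna.count("C")) / len(dna)


def hydrogen_bonds(dna):
    """Hydrogen bonds holding the strand to its complement:
    2 per A-T pair and 3 per G-C pair."""
    dna = dna.upper()
    at_pairs = dna.count("A") + dna.count("T")
    gc_pairs = dna.count("G") + dna.count("C")
    return 2 * at_pairs + 3 * gc_pairs


strand = "GTAAAGTCCCGTTAGC"
print(gc_content(strand))      # 8 of the 16 bases are G or C
print(hydrogen_bonds(strand))  # 8 A-T pairs and 8 G-C pairs
```

A strand of the same length made only of G and C would have 48 bonds instead of 40, which is the intuition behind asking whether heat-loving bacteria favor G-C pairs.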
23
An interesting research question
24
Comparative analysis of prokaryotes requires:
• To answer the research question, first we must ask
• Where do we get the sequences?
• Which subset of the genome should we use?
• For which prokaryotes are genome sequences known?
• What if the prokaryotic genomes are different lengths?
• How do we account for evolutionary relationships?
• What are the other constraints on G-C content?
• …
25
Replication yields two identical(?) double-helices
• What if replication makes a mistake?
• The birth of sequence variation!
• This is the subject of our next class
26
Replication precision in humans
• Error rate:
• Once for every 10 billion operations! (???)
• DNA Polymerase
• Makes sure complementary bases are placed with one
another as a strand of bases becomes duplicated DNA
• Before the DNA polymerase adds the next
nucleotide, previous nucleotide pair is “checked”
• Incorrect pairs are clipped off and replaced
27
Broad summary
• Thus far:
• Storage of genetic information
• Replication of genetic information
• Next up:
• Execution of genetic information
28
Central dogma of molecular biology
Flow of Information in Living Systems
• Genome = blueprint
• Proteins = building blocks
DNA → (transcription) → RNA → (translation) → Protein
DNA Sequence Implies Structure Implies Function
29
Dogma detailed and depicted
Figure: transcription → mRNA → transport → translation at the ribosome (nascent polypeptide) → post-translational modification → functional protein
30
RNA
• Ribonucleic Acid
• Another alphabet
• Similar to DNA
• Thymine (T) is replaced by uracil (U)
• Note that U is also complementary to A
• RNA can be:
• Single stranded
• Double stranded
• Hybridized with DNA
31
RNA
• RNA is generally single stranded
• Forms secondary or tertiary structures
• When spelled correctly!
• RNA folding will be discussed later
• Important in a variety of ways, including protein
synthesis
32
mRNA
• Messenger RNA
• Linear molecule encoding genetic information copied
from DNA molecules
• Transcription
• Process in which DNA is copied into an RNA molecule
• mRNA is complementary to the DNA from which
it is transcribed:
DNA: CTGAAT
mRNA: GACUUA
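The slide's example can be reproduced by pairing each template base with its RNA complement, with U taking the place of T. This is a sketch under the assumption that the input is the DNA template strand and that the output is written position by position beneath it, as on the slide; `transcribe` is an illustrative name.

```python
# Template DNA base -> complementary RNA base (A pairs with U in RNA).
TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template):
    """Return the mRNA complementary to a DNA template strand,
    aligned position by position with the template."""
    return template.upper().translate(TO_RNA)


print(transcribe("CTGAAT"))
```

Because the two strands run in opposite directions, reading this mRNA in its own 5’ to 3’ direction means reading the printed string from right to left when the template is written 5’ to 3’.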
33
Types of RNA
• mRNA
• Messenger RNA
• tRNA
• Transfer RNA
• rRNA
• Ribosomal RNA
• snRNA
• Small nuclear RNA
34
RNA self-complementarity
• A single-stranded RNA can fold back on itself
• Complementary bases are “sticky”
Linear sequence: GGUGCGGUAAGAGCGCACC
Folded (hairpin): stem GGUGCG paired with CCACGC, loop GUAAGAG
5’ GGUGCG GUA
   ||||||     A
3’ CCACGC GAG
• Folded molecules “make things work”
• We will see this with proteins
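Whether the two ends of an RNA can form a stem is a reverse-complement check between the 5’ end and the 3’ end. The sketch below tests this for the sequence on this slide. It is a simplification: it only allows A-U and G-C pairs (real RNA also forms G-U wobble pairs), and `ends_form_stem` is a hypothetical helper, not a folding algorithm.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def ends_form_stem(rna, stem_length):
    """True if the first stem_length bases pair, antiparallel,
    with the last stem_length bases."""
    rna = rna.upper()
    if stem_length < 1 or 2 * stem_length > len(rna):
        raise ValueError("stem does not fit in the sequence")
    five_prime = rna[:stem_length]
    three_prime = rna[-stem_length:][::-1]  # read the 3' end backwards
    return all(pair in PAIRS for pair in zip(five_prime, three_prime))


hairpin = "GGUGCGGUAAGAGCGCACC"
print(ends_form_stem(hairpin, 6))  # GGUGCG against CCACGC
print(ends_form_stem(hairpin, 7))  # a seventh pair would be G against G
```

The six-base stem pairs perfectly and the seven bases in between (GUAAGAG) are left over as the loop.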
35
RNA secondary structure
• E. coli RNase P RNA secondary
structure
• How might one predict whether
or not a sequence of bases
(A,C,G,U) is likely an RNA?
Image source: www.mbio.ncsu.edu/JWB/MB409/lecture/lecture05/lecture05.htm
36
Life depends on three critical molecules
• DNAs
• Hold information on how cell works
• RNAs
• Act to transfer short pieces of information to cell parts
• Provide templates for protein synthesis
• Proteins
• Form enzymes, send signals to other cells, and
regulate gene activity
• Form body’s major components (e.g. hair, skin, etc.)
37
DNA, RNA, and the flow of information
Replication
Transcription
Translation
38
Genes make proteins
• Proteins do all sorts of things
• Catalysis, structure, movement, defense, regulation,
transport, storage, stress response,…
39
But what is a gene?
40
Gene structure
• Genes must have:
• Exons (usually, protein-coding DNA to be translated)
• Start site
• Control region
Eukaryotic gene structure
• Proper gene function requires:
• Parts spelled correctly in grammatically correct order
41
Eukaryotic genes
Figure: enhancer, promoter, transcribed region, and terminator. RNA Polymerase II transcribes the primary transcript (5’ Exon1, Intron1, Exon2 3’), which is capped (7mG), spliced, and cleaved/polyadenylated (An), then transported and translated into a polypeptide (N to C).
42
Prokaryotic genes
Figure: promoter, Cistron1, Cistron2, CistronN, and terminator. RNA polymerase transcribes the mRNA (5’ to 3’), and the ribosome, tRNAs, and protein factors translate it into polypeptides 1, 2, 3 (each N to C).
• Prokaryotic and eukaryotic genes have very
different grammars
43
Prokaryotes versus eukaryotes
• In prokaryotes
• The transcribed mRNA is ready to be translated into
protein (polypeptide) product
• In eukaryotes
• The transcribed mRNA (pre-mRNA) must first be
processed into mature mRNA
• The protein-coding regions (exons) are interspersed
with non-coding regions (introns) which must be excised
44
Key feature of Eukaryotic gene structure
• Most eukaryotic genes are split, containing large
sequences that are transcribed but not translated
• Exon
• Part of the gene contributing to mature mRNA
• Intron
• Part of the gene that is transcribed but spliced out
of the mature mRNA
• Introns are found in many kinds of genes, including
those coding for RNAs
45
The structure of human PSA
• How might you find the exons?
• There are grammatical rules
• And exons and introns are spelled differently!
46
How does the gene “become” a protein?
DNA
Transcription
RNA
Translation
Protein
47
Translation
• mRNA is used as a template to
make proteins
• mRNA is bound by tRNA
three bases at a time
• Occurs at the ribosome
• The bound tRNA also binds
to a specific amino acid
• This amino acid extends the
nascent protein sequence
48
mRNA encodes the protein message in DNA
• Exonic DNA is interpreted in groups of 3 (codons)
AGTTTTGGGCCCAAA
• The 64 (4 × 4 × 4) codons correspond to actions
to be taken at the ribosome
• Start translation (begin a protein)
• Add one of twenty amino acids (extend a protein)
• Stop translation (end a protein)
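Reading a sequence "in groups of 3" amounts to slicing it into consecutive non-overlapping triplets. A minimal sketch, using the example sequence from this slide; `codons` is an illustrative name, and any trailing bases that do not fill a complete codon are dropped.

```python
def codons(seq):
    """Split a sequence into consecutive, non-overlapping triplets,
    ignoring leftover bases at the end."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


print(codons("AGTTTTGGGCCCAAA"))
```

Starting one base later would give a completely different set of triplets, so the choice of starting position (the reading frame) matters as much as the letters themselves.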
49
Genetic Code
• From RNA triplets to amino acids
• 4 possible bases (A, C, G, U)
• 3 bases in the codon
• 4 x 4 x 4 = 64 possible codon sequences
• Start codon: AUG
• Stop codons: UAA, UAG, UGA
• 61 codons to code for amino acids (AUG as well)
• 20 amino acids – redundancy in genetic code
• Yet another alphabet!
• This one spells proteins
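The counts above can be checked by building the standard genetic code as a lookup table. The sketch below is one compact way to do so, not the course's own code: it lists the 64 amino acid letters in U, C, A, G order with `*` marking a stop, and the names `CODON_TABLE` and `translate` are illustrative. The translation function assumes the input contains only A, C, G, and U.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG up to the first in-frame stop codon
    (or the end of the sequence)."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE), stops)
print(len(set(CODON_TABLE.values()) - {"*"}))
print(translate("AUGAGUUUUGGGCCCAAAUGA"))
```

The example message is a start codon, the RNA version of the five codons from the earlier slide, and a stop codon.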
50
Genetic code: 64 triplets code 22 tasks
http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/RNA.htm
51
Amino acids
• Building blocks for proteins (20 different)
• vary by side chain groups
• Side chain group gives amino acid physical and chemical
properties
• e.g. hydrophilic amino acids are water soluble (vs. Hydrophobic)
• Linked together via a single chemical bond (peptide bond)
• Peptide: Short linear chain of amino acids (< 30)
• Polypeptide: long chain of amino acids (which can be
upwards of 4000 residues long).
52
The 20 naturally occurring amino acids
• Glycine (G, GLY)
• Alanine (A, ALA)
• Valine (V, VAL)
• Leucine (L, LEU)
• Isoleucine (I, ILE)
• Phenylalanine (F, PHE)
• Proline (P, PRO)
• Serine (S, SER)
• Threonine (T, THR)
• Cysteine (C, CYS)
• Methionine (M, MET)
• Tryptophan (W, TRP)
• Tyrosine (Y, TYR)
• Asparagine (N, ASN)
• Glutamine (Q, GLN)
• Aspartic acid (D, ASP)
• Glutamic acid (E, GLU)
• Lysine (K, LYS)
• Arginine (R, ARG)
• Histidine (H, HIS)
53
Peptide bonds link amino acids
54
Side-chains distinguish amino acid properties
• Each AA has a unique side-chain
• Unique molecular properties
• Molecular properties of AAs
determine protein structure
• Another alphabet!
• More like a linguistic alphabet
55
Proteins
• Polypeptides having a three dimensional structure.
• Proteins can form interactions with:
• Proteins (complexes, oligomers)
• mRNA
• DNA
• Proteins can bind to each other depending on their
relative charges and structures
56
Four levels of protein structure
57
Secondary structure
• Parts of a protein may fold on themselves to form
• α-helix, β-sheet, random coil
58
Domains
• Domains are structural and/or functional modules within
the protein that are usually separately folded
Leucine zipper
E-F Hand
Zinc finger
59
Recognizing domains
• Predict protein fold (secondary structure) based on
primary sequence
• Requires knowledge of spelling and grammar
• Key point:
• Similar spelling often implies similar function
60
Tertiary interactions
61
Predicting protein function and structure
• Given the sequence of amino acids (primary
structure) of an anonymous protein
• How might one predict its function?
• How might one predict its folded structure?
62
Review of alphabetic BI data
• DNA
• String from 4-letter alphabet of nucleotides (A,G,C,T)
• RNA
• String from 4-letter alphabet of nucleotides (A,G,C,U)
• Coding sequence
• String from 64-letter alphabet of nucleotide triplets (AAA,…)
• Proteins
• String from 20-letter alphabet of amino acids (Ala, Cys, …)
63
From now until December
• Explore and exploit the grammar of the genome
• How does evolution shape sequence diversity?
• Change over time subject to constraints on grammar
and spelling
• Constraints themselves change over time!
• Distinguish random sequences from those
structured by function
• One million monkeys on one million typewriters?
• Or one Shakespeare with vellum?
64
Biological sequence analysis in one sentence
Use biological knowledge of
(1) grammar
(2) spelling
(3) evolution
to identify sequences with too much
structure to have occurred by chance
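To give "occurred by chance" a number, one common baseline is to compare how often a word appears with how often it would be expected in a random sequence. The sketch below assumes the simplest possible null model, independent bases that are all equally likely; that model and both function names are assumptions made for illustration, not the method taught in the course.

```python
def expected_occurrences(motif_length, sequence_length, alphabet_size=4):
    """Expected count of one specific word in a uniformly random sequence."""
    positions = sequence_length - motif_length + 1
    if positions <= 0:
        return 0.0
    return positions / alphabet_size ** motif_length


def count_occurrences(motif, seq):
    """Observed count of motif in seq, allowing overlapping matches."""
    last_start = len(seq) - len(motif)
    return sum(seq.startswith(motif, i) for i in range(last_start + 1))


seq = "GTAAAGTCCCGTTAGC"
print(count_occurrences("CCC", seq))
print(expected_occurrences(3, len(seq)))
```

A single short sequence like this one proves nothing on its own; the comparison only becomes meaningful over long sequences and with a null model that reflects real base composition and evolutionary history.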
66
Looking forward
• Two goals of this class
• What are the important problems?
• What tools and techniques can we use to address them?
• Next time
• Origins of sequence diversity
• Reading?
67