BI I: Bioinformatics and Basic Molecular Biology
August 27, 2013
1
Announcements
• Remember: course website up and running
• http://statgen.ncsu.edu/st590a/index.php
• Login: students
• Password: st590a
2
Sequences are letters in an alphabet
• Much of class focuses on biological sequence
analysis
• Sequences are strings of “letters”
• Come from a biological “alphabet”
• Most biological “alphabets” look like the alphabet
• DNA: A,C,G,T
• RNA: A,C,G,U
• Amino acids: A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y
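A minimal sketch of treating these alphabets as sets of letters. The alphabets are copied from the slide; the function name `possible_alphabets` is hypothetical and chosen only for illustration.

```python
# Alphabets as listed on the slide.
DNA = set("ACGT")
RNA = set("ACGU")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")

def possible_alphabets(seq):
    """Return the name of every alphabet that contains all letters of seq."""
    letters = set(seq.upper())
    alphabets = {"DNA": DNA, "RNA": RNA, "protein": PROTEIN}
    return [name for name, alphabet in alphabets.items() if letters <= alphabet]

print(possible_alphabets("GTAAAGTCCCGTTAGC"))
print(possible_alphabets("GACUUA"))
```

Note that A, C, G and T are also amino acid letters, so a string over A,C,G,T is a valid "word" in both the DNA and the protein alphabet. Letters alone do not always tell us which kind of molecule a sequence represents.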
3
Sequences are words in a sentence
• Proper function depends on:
• Both spelling and grammar
Class is interesting but it moves
• Both tell us how a sentence should look
• But change either one and you change the meaning
4
Cells
• Complex system enclosed in a
membrane
• Organisms can be unicellular
• e.g. bacteria, baker’s yeast
• Organisms can be multicellular
• Humans:
• 60 trillion cells
• 320 cell types
Example Animal Cell
Image source: www.ebi.ac.uk/microarray/biology_intro.htm
5
A dichotomy of organisms, at least
• Eukaryotes
• contain a membrane-bound nucleus and organelles
(plants, animals, fungi,…)
• Prokaryotes
• lack a true membrane-bound nucleus and organelles
(single-celled, includes bacteria)
• Not all single celled organisms are prokaryotes
6
Chromosomes
• In eukaryotes, the nucleus contains one or several
double stranded DNA molecules organized as
chromosomes
• Humans:
• 22 Pairs of autosomes
• 1 pair sex chromosomes
Human Karyotype
http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html
7
A peek inside the cell
Image source: www.biotec.or.th/Genome/whatGenome.html
8
What is DNA?
• DNA: Deoxyribonucleic Acid
• Single-stranded molecule (oligomer, polynucleotide): a chain of nucleotides
• Four different nucleotides, named by their bases:
• Adenine (A)
• Cytosine (C)
• Guanine (G)
• Thymine (T)
• Our first alphabet
• Question: Does a small alphabet require long words?
9
Nucleotide bases
• Purines (A and G)
• Pyrimidines (C and T)
• Difference is in base structure
Image source: www.ebi.ac.uk/microarray/biology_intro.htm
10
Nucleotides chain together to form DNA
[Figure labels: phosphate group; base; ribose or deoxyribose (shown here)]
• A nucleotide unit consists of a pentose sugar, a phosphate moiety (containing up to 3 phosphate groups) and a base.
• Subunits are linked together by phosphodiester bonds to form a ‘sugar-phosphate backbone’.
11
Single-stranded DNA polynucleotide
• Example polynucleotide:
5’ G→T→A→A→A→G→T→C→C→C→G→T→T→A→G→C 3’
• Or more commonly:
GTAAAGTCCCGTTAGC
12
Double-stranded DNA
• DNA can be single-stranded or double-stranded
• Double-stranded DNA
• second strand is the “reverse complement” strand
• Reverse complement runs in opposite direction and
bases are complementary
• Complementary bases:
• A complements T
• C complements G
13
Double-stranded sequence
• Example double-stranded polynucleotide:
5’ G→T→A→A→A→G→T→C→C→C→G→T→T→A→G→C 3’
| | | | | | | | | | | | | | | |
3’ C←A←T←T←T←C←A←G←G←G←C←A←A←T←C←G 5’
• Or more commonly
GTAAAGTCCCGTTAGC
CATTTCAGGGCAATCG
• Chromosomal DNA is double-stranded
• Why store redundant information?
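The complement rules above (A with T, C with G) can be sketched in a few lines. The function names are illustrative, not from the course; the example sequence is the one on the slide.

```python
# Map each DNA base to its complement: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Position-by-position complement (the bottom strand as drawn, read 3'->5')."""
    return seq.upper().translate(COMPLEMENT)

def reverse_complement(seq):
    """The same second strand, but read in its own 5'->3' direction."""
    return complement(seq)[::-1]

top = "GTAAAGTCCCGTTAGC"
print(complement(top))
print(reverse_complement(top))
```

Taking the reverse complement twice returns the original strand, which is one way to see that each strand carries enough information to rebuild the other.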
14
H-bonding between complementary bases
Why store redundant information?
• Nomenclature: sequence hybridizes to its reverse complement
15
BI: Integrating your knowledge
• Hydrogen bonding holds the two strands together
• 2 bonds between A and T
• 3 bonds between G and C
• How much of an organism’s DNA is G-C vs. A-T?
• Fact: Heat can denature molecules, DNA included
• Would bacteria in a hot environment benefit from an excess of G-C base pairs?
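A small sketch of the two quantities this slide talks about: the G-C fraction of a sequence and the number of hydrogen bonds holding it to its complement (3 per G-C pair, 2 per A-T pair, as stated above). Function names are hypothetical, and the bond count is only a crude proxy for stability, not a melting-temperature model.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

def hydrogen_bonds(seq):
    """Hydrogen bonds between seq and its complementary strand."""
    seq = seq.upper()
    gc_pairs = sum(base in "GC" for base in seq)
    at_pairs = sum(base in "AT" for base in seq)
    return 3 * gc_pairs + 2 * at_pairs

example = "GTAAAGTCCCGTTAGC"
print(gc_fraction(example), hydrogen_bonds(example))
```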
16
An interesting research question
17
Comparative analysis of prokaryotes requires:
• To answer the research question, first we must ask
• Where do we get the sequences?
• Which subset of the genome should we use?
• For which prokaryotes are genome sequences known?
• What if the prokaryotic genomes are different lengths?
• How do we account for evolutionary relationships?
• What are the other constraints on G-C content?
• …
18
Replication yields two identical(?) double-helices
• What if replication makes a mistake?
• The birth of sequence variation!
• This is the subject of our next class
19
Broad summary
• Thus far:
• Storage of genetic information
• Replication of genetic information
• Next up:
• Execution of genetic information
20
Central dogma of molecular biology
Flow of Information in Living Systems
• Genome = blueprint
• Proteins = building blocks
DNA → (transcription) → RNA → (translation) → Protein
DNA Sequence Implies Structure Implies Function
21
Dogma detailed and depicted
[Figure labels: Transcription → mRNA → Transport → Translation (mRNA, ribosome, nascent polypeptide) → Post-translational modification → functional protein]
22
RNA
• Ribonucleic Acid
• Another alphabet
• Similar to DNA
• Thymine (T) is replaced by uracil (U)
• Note that U is also complementary to A
• RNA can be:
• Single stranded
• Double stranded
• Hybridized with DNA
23
RNA
• RNA is generally single stranded
• Forms secondary or tertiary structures
• When spelled correctly!
• RNA folding will be discussed later
• Important in a variety of ways, including protein
synthesis
24
mRNA
• Messenger RNA
• Linear molecule encoding genetic information copied
from DNA molecules
• Transcription
• Process in which DNA is copied into an RNA molecule
• mRNA is complementary to the DNA from which it is transcribed:
DNA: CTGAAT
mRNA: GACUUA
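The example above can be reproduced with a position-by-position complement in which U takes the place of T. This is a minimal sketch; the function name is illustrative, and strand direction (5'/3') is deliberately ignored, as it is in the slide's example.

```python
# Template DNA base -> complementary RNA base (A pairs with U in RNA).
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")

def transcribe(template_dna):
    """Return the RNA complementary to a DNA template, position by position."""
    return template_dna.upper().translate(DNA_TO_RNA)

print(transcribe("CTGAAT"))
```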
25
(Some) types of RNA
• mRNA
• Messenger RNA
• tRNA
• Transfer RNA
• rRNA
• Ribosomal RNA
• snRNA
• Small nuclear RNA
26
RNA self-complementarity
• A single-stranded RNA can fold back on itself
• Complementary bases are “sticky”
Linear sequence: GGUGCGGUAAGAGCGCACC
Folded (hairpin): the 5’ end GGUGCG pairs with the 3’ end CGCACC, leaving GUAAGAG as an unpaired loop
• Folded molecules “make things work”
• We will see this with proteins
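A rough sketch of how one might check that a sequence can fold back on itself: pair bases from the two ends inward until a pair fails to match. Assumptions not in the slides: only A-U and G-C pairs count, at least `min_loop` bases are left unpaired, and the function name is hypothetical. This is not a real RNA folding algorithm.

```python
# Watson-Crick pairs only (an assumption; real RNA also forms other pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def terminal_stem(seq, min_loop=3):
    """Pair the 5' and 3' ends inward; return (stem_length, loop_sequence)."""
    seq = seq.upper()
    i, j = 0, len(seq) - 1
    # Keep pairing while the bases match and enough loop bases remain between them.
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS:
        i += 1
        j -= 1
    return i, seq[i:j + 1]

print(terminal_stem("GGUGCGGUAAGAGCGCACC"))
```

For the sequence on the slide this finds a six-base stem around the loop GUAAGAG.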
27
RNA secondary structure
• E. coli RNase P RNA secondary structure
• How might one predict whether or not a sequence of bases (A,C,G,U) is likely an RNA?
Image source: www.mbio.ncsu.edu/JWB/MB409/lecture/lecture05/lecture05.htm
28
Life depends on three critical molecules
• DNAs
• Hold information on how cell works
• RNAs
• Act to transfer short pieces of information to cell parts
• Provide templates to synthesize into protein
• Proteins
• Form enzymes that send signals to other cells and
regulate gene activity
• Form body’s major components (e.g. hair, skin, etc.)
29
DNA, RNA, and the flow of information
Replication
Transcription
Translation
30
Genes make proteins
• Proteins do all sorts of things
• Catalysis, structure, movement, defense, regulation,
transport, storage, stress response,…
31
But what is a gene?
32
Gene structure
• Genes must have:
• Exons (usually, protein-coding DNA to be translated)
• Start site
• Control region
Eukaryotic gene structure
• Proper gene function requires:
• Parts spelled correctly in grammatically correct order
33
Eukaryotic genes
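[Figure: eukaryotic gene expression. Gene layout: Enhancer, Promoter, Transcribed Region, Terminator. RNA Polymerase II transcribes the gene into a primary transcript (5’ Exon1, Intron1, Exon2 3’). The transcript is capped (7mG), spliced, and cleaved/polyadenylated (An), then transported and translated into a polypeptide (N to C).]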
34
Prokaryotic genes
[Figure: prokaryotic gene expression. Gene layout: Promoter, Cistron1, Cistron2, …, CistronN, Terminator. RNA Polymerase transcribes the cistrons into one mRNA (5’ to 3’). Ribosomes, tRNAs, and protein factors translate it into separate polypeptides (1, 2, 3; each N to C).]
• Prokaryotic and eukaryotic genes have very different grammars
35
Prokaryotes versus eukaryotes
• In prokaryotes
• The transcribed mRNA is ready to be translated into protein (polypeptide) product
• In eukaryotes
• The transcribed mRNA (pre-mRNA) must first be processed into mature mRNA
• The protein-coding regions (exons) are interspersed with non-coding regions (introns) which must be excised
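A toy sketch of excising introns, assuming the exon coordinates are already known. The sequence, the coordinates, and the function name are all made up for illustration; finding the exons is the hard part and is not attempted here.

```python
def splice(pre_mrna, exons):
    """Join exon slices (0-based start, end-exclusive) in order to form mature mRNA."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical pre-mRNA: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))
```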
36
Key feature of Eukaryotic gene structure
• Most eukaryotic genes are split, containing large non-coding sequences (introns)
• Exon
• Part of the gene contributing to mature mRNA
• Intron
• Part of the gene that is transcribed but spliced out of the mature mRNA
• Introns found in all types of genes, including those coding for RNAs
37
The structure of human PSA
• How might you find the exons?
• There are grammatical rules
• And exons and introns are spelled differently!
38
How does the gene “become” a protein?
DNA
Transcription
RNA
Translation
Protein
39
Translation
• mRNA is used as a template to make proteins
• mRNA is bound by tRNA three bases at a time
• Occurs at the ribosome
• The bound tRNA also binds to a specific amino acid
• This amino acid extends the nascent protein sequence
40
mRNA encodes the protein message in DNA
• Exonic DNA is interpreted in groups of 3 (codons)
AGTTTTGGGCCCAAA
• The 64 (4 × 4 × 4) codons correspond to actions to be taken at the ribosome
• Start translation (begin a protein)
• Add one of twenty amino acids (extend a protein)
• Stop translation (end a protein)
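Reading a sequence in groups of three, and counting the possible codons, can be sketched as follows. The function name is illustrative, and the sketch assumes the reading frame starts at the first base.

```python
from itertools import product

def codons(seq):
    """Split a coding sequence into consecutive, non-overlapping triplets."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

# Every possible triplet over the 4-letter DNA alphabet: 4 x 4 x 4 of them.
all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]

print(codons("AGTTTTGGGCCCAAA"))
print(len(all_codons))
```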
41
Genetic Code
• From RNA triplets to amino acids
• 4 possible bases (A, C, G, U)
• 3 bases in the codon
• 4 x 4 x 4 = 64 possible codon sequences
• Start codon: AUG (also encodes methionine)
• Stop codons: UAA, UAG, UGA
• 61 codons to code for amino acids (AUG as well)
• 20 amino acids – redundancy in genetic code
• Yet another alphabet!
• This one spells proteins
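A minimal sketch of translation with the standard genetic code. The table is packed into a 64-letter string in U, C, A, G codon order, with `*` marking a stop codon. Starting at the first AUG and stopping at the first in-frame stop codon is a simplification assumed here; the names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("GGAUGUUUUGGUAGCC"))
```

Counting the entries of `CODON_TABLE` recovers the numbers on the slide: 3 stop codons and 61 codons spread over 20 amino acids.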
42
Genetic code: 64 triplets code 22 tasks
http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/RNA.htm
43
Amino acids
• Building blocks for proteins (20 different)
• vary by side chain groups
• Side chain group gives amino acid physical and chemical properties
• e.g. hydrophilic amino acids are water soluble (vs. hydrophobic)
• Linked together via a single chemical bond (peptide bond)
• Peptide: short linear chain of amino acids (< 30)
• Polypeptide: long chain of amino acids (which can be upwards of 4000 residues long).
44
The 20 naturally occurring amino acids
• Glycine (G, GLY)
• Alanine (A, ALA)
• Valine (V, VAL)
• Leucine (L, LEU)
• Isoleucine (I, ILE)
• Phenylalanine (F, PHE)
• Proline (P, PRO)
• Serine (S, SER)
• Threonine (T, THR)
• Cysteine (C, CYS)
• Methionine (M, MET)
• Tryptophan (W, TRP)
• Tyrosine (Y, TYR)
• Asparagine (N, ASN)
• Glutamine (Q, GLN)
• Aspartic acid (D, ASP)
• Glutamic acid (E, GLU)
• Lysine (K, LYS)
• Arginine (R, ARG)
• Histidine (H, HIS)
45
Peptide bonds link amino acids
46
Side-chains distinguish amino acid properties
• Each AA has a unique side-chain
• Unique molecular properties
• Molecular properties of AAs
determine protein structure
• Another alphabet!
• More like a linguistic alphabet
47
Proteins
• Polypeptides having a three dimensional structure.
• Proteins can form interactions with:
• Proteins (complexes, oligomers)
• mRNA
• DNA
• Proteins can bind to each other depending on their relative charges and structures
48
Four levels of protein structure
49
Secondary structure
• Parts of a protein may fold on themselves to form
• α-helix, β-sheet, random coil
50
Domains
• Domains are structural and/or functional modules within
the protein that are usually separately folded
Leucine zipper
E-F Hand
Zinc finger
51
Recognizing domains
• Predict protein fold (secondary structure) based on
primary sequence
• Requires knowledge of spelling and grammar
• Key point:
• Similar spelling often implies similar function
52
Tertiary interactions
53
Predicting protein function and structure
• Given the sequence of amino acids (primary
structure) of an anonymous protein
• How might one predict its function?
• How might one predict its folded structure?
54
Review of alphabetic BI data
• DNA
• String from 4-letter alphabet of nucleotides (A,G,C,T)
• RNA
• String from 4-letter alphabet of nucleotides (A,G,C,U)
• Coding sequence
• String from 64-letter alphabet of nucleotide triplets (AAA,…)
• Proteins
• String from 20-letter alphabet of amino acids (Ala, Cys, …)
55
From now until December
• Explore and exploit the grammar of the genome
• How does evolution shape sequence diversity?
• Change over time subject to constraints on grammar
and spelling
• Constraints themselves change over time!
• Distinguish random sequences from those
structured by function
• One million monkeys on one million typewriters?
• Or one Shakespeare with vellum?
56
Biological sequence analysis in one sentence
Use biological knowledge of (1) grammar, (2) spelling, and (3) evolution to identify sequences with too much structure to have occurred by chance
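One very simple way to put a number on "by chance": assume every base is drawn independently and uniformly from the 4-letter alphabet, and ask how often a specific word of a given length is expected to appear. Both the random model and the function names are assumptions made for this sketch; real genomes are not uniform, which is part of what the course goes on to address.

```python
def chance_probability(word_length, alphabet_size=4):
    """Probability that one position spells a specific word under a uniform model."""
    return alphabet_size ** -word_length

def expected_occurrences(sequence_length, word_length, alphabet_size=4):
    """Expected number of times the word appears in a random sequence."""
    positions = max(sequence_length - word_length + 1, 0)
    return positions * chance_probability(word_length, alphabet_size)

# A specific 16-base word in a random sequence of one million bases.
print(expected_occurrences(1_000_000, 16))
```

Under this model, a word that turns up far more often than expected is a candidate for having been structured by function.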
58
Looking forward
• Two goals of this class
• What are the important problems?
• What tools and techniques can we use to address them?
• Next time
• Origins of sequence diversity
• Reading?
59