Introduction to Bioinformatics
Doç. Dr. Nizamettin AYDIN
“INTRODUCTION TO BIOINFORMATICS” “SPRING 2005” “Dr. N AYDIN”
Recommended Texts
Source: www.amazon.com
• Bioinformatics for Dummies. Jean Claverie, Cedric Notredame
• Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Andreas D. Baxevanis, B. F. Francis Ouellette
• Instant Notes in Bioinformatics. D. R. Westhead, Richard M. Twyman, J. H. Parish
• Bioinformatics: Sequence and Genome Analysis, Vol. 5. David W. Mount
• Developing Bioinformatics Computer Skills. Cynthia Gibas, Per Jambeck, Lorrie LeJeune (Editor)
• Discovering Genomics, Proteomics, and Bioinformatics. A. Malcolm Campbell, Laurie J. Heyer
• Structural Bioinformatics. Philip E. Bourne (Editor), Helge Weissig
• Beginning Perl for Bioinformatics. James Tisdall
• Mastering Perl for Bioinformatics. James D. Tisdall
What is Bioinformatics?
Diagram of related fields: Computational Biology, Bioinformatics, Genomics, Proteomics, Functional genomics, Structural bioinformatics
• Bioinformatics: collection and storage of
biological information
• Computational biology: development of
algorithms and statistical models to analyze
biological data
• The terms bioinformatics and computational biology will be used interchangeably
Why is Bioinformatics Important?
• Application areas include
– Medicine
– Pharmaceutical drug design
– Toxicology
– Molecular evolution
– Biosensors
– Biomaterials
– Biological computing models
– DNA computing
Why should I care?
• SmartMoney ranks
Bioinformatics as #1 among
next HotJobs
• Business Week 50 Masters of
Innovation
• Jobs available, exciting research
potential
• Important information waiting
to be decoded!
Why is bioinformatics hot?
• Supply/demand: few people adequately trained in
both biology and computer science
• Genome sequencing, microarrays, etc. lead to large
amounts of data to be analyzed
• Leads to important discoveries
• Saves time and money
The Role of Computational Biology
Figure: GenBank base pair growth, in millions of base pairs, 1982 to 1999; the plotted values rise from 1 in 1982 to 3,841 in 1999. Source: GenBank
Figure: Growth in 3D structures. Source: http://www.rcsb.org/pdb/holdings.html
Fighting Human Disease
• Genetic / Inherited
– Diabetes
• Viral
– Flu, common cold
• Bacterial
– Meningitis, Strep throat
Drug Development Life Cycle
• Discovery (2 to 10 years)
• Preclinical testing (lab and animal testing)
• Phase I (20-30 healthy volunteers used to check for safety and dosage)
• Phase II (100-300 patient volunteers used to check for efficacy and side effects)
• Phase III (1000-5000 patient volunteers used to monitor reactions to long-term drug use)
• FDA review & approval
• Post-marketing testing
• Overall: 7 to 15 years and $600-700 million! (timeline axis runs from 0 to 16 years)
Drug lead screening
• 5,000 to 10,000 compounds screened
• 250 lead candidates in preclinical testing
• 5 drug candidates enter clinical testing
– 80% pass Phase I
– 30% pass Phase II
– 80% pass Phase III
• One drug approved by the FDA
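The attrition figures on this slide can be checked with a little arithmetic. The sketch below multiplies the quoted pass rates in sequence; the variable names are illustrative, and treating the three phases as independent filters is an assumption made only for this example.

```python
# Expected number of candidates left after each clinical phase,
# using the pass rates quoted on the slide.
candidates_entering_clinical = 5
pass_rates = {"Phase I": 0.80, "Phase II": 0.30, "Phase III": 0.80}

remaining = float(candidates_entering_clinical)
for phase, rate in pass_rates.items():
    remaining *= rate
    print(f"after {phase}: {remaining:.2f} candidates expected")

# Odds that a single compound entering clinical testing gets through all phases
overall = remaining / candidates_entering_clinical
print(f"overall clinical pass rate: {overall:.1%}")
```

The product comes to just under one candidate, which is consistent with the single approved drug at the bottom of the funnel.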
What skills are needed?
• Well-grounded in one of the following areas:
– Computer science
– Molecular biology
– Statistics
• Working knowledge and appreciation of the others!
Where Can I Learn More?
• ISCB: http://www.iscb.org/
• NCBI: http://ncbi.nlm.nih.gov/
• http://www.bioinformatics.org/
• Journals
• Conferences
Overview of Molecular Biology
• Cells
• Chromosomes
• DNA
• RNA
• Amino Acids
• Proteins
• Genome/Transcriptome/Proteome
Cells
• Complex system enclosed in a membrane
• Organisms are unicellular (bacteria, baker’s yeast) or multicellular
• Humans:
– 60 trillion cells
– 320 cell types
Figure: Example animal cell. Source: www.ebi.ac.uk/microarray/biology_intro.htm
Organisms
• Classified into two types:
• Eukaryotes: contain a membrane-bound nucleus and
organelles (plants, animals, fungi,…)
• Prokaryotes: lack a true membrane-bound nucleus and
organelles (single-celled, includes bacteria)
• Not all single-celled organisms are prokaryotes!
Chromosomes
• In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes
• Humans:
– 22 pairs of autosomes
– 1 pair of sex chromosomes
Figure: Human karyotype. Source: http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html
Chromosomes
Image source: www.biotec.or.th/Genome/whatGenome.html
DNA is the blueprint for life
• DNA: Deoxyribonucleic Acid
• Nearly every cell in your body has 23 pairs of chromosomes in the nucleus
• The genes in these chromosomes determine your inherited physical attributes.
• A single strand (oligomer, polynucleotide) is a chain of nucleotides
• 4 different nucleotides, named for their bases:
– Adenine (A)
– Cytosine (C)
– Guanine (G)
– Thymine (T)
Mapping the Genome
• The human genome project has provided us
with a draft of the entire human genome.
• Four bases: A, T, C, G
• 3.12 billion base pairs
• More than 99% of these are the same from one person to the next
• Polymorphisms = positions where they differ
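To make "where they differ" concrete: a 1% difference across 3.12 billion base pairs would already amount to roughly 31 million positions. The sketch below finds differing positions between two short, already aligned sequences. Both fragments are invented for illustration, and `polymorphic_sites` is a hypothetical helper name, not a function from any library.

```python
def polymorphic_sites(seq_a: str, seq_b: str) -> list:
    """Return the 0-based positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Two made-up 20-base fragments standing in for the same region in two people
person_1 = "GTAAAGTCCCGTTAGCATGC"
person_2 = "GTAAAGTCACGTTAGCTTGC"

sites = polymorphic_sites(person_1, person_2)
identity = 1 - len(sites) / len(person_1)
print(f"differing positions: {sites}, identity: {identity:.0%}")
```

Real comparisons need an alignment step first, since insertions and deletions shift positions; this example assumes that step has been done.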
Nucleotide Bases
• Purines (A and G)
• Pyrimidines (C and T)
• Difference is in base structure: purines have two fused rings, pyrimidines have one
Image Source: www.ebi.ac.uk/microarray/biology_intro.htm
DNA
• Can be thought of as an alphabet with 4 characters
• A 4-letter alphabet with sufficiently long words contains enough information to create complex organisms
• Not unlike a computer, which works with an even smaller alphabet (0 and 1)
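The alphabet analogy can be taken literally: four symbols need exactly two bits each, so a DNA word packs into an ordinary integer. The sketch below is one possible encoding; the particular bit assignment (A=00, C=01, G=10, T=11) is an arbitrary choice made for this example.

```python
# Each of the 4 bases fits in 2 bits (assignment chosen arbitrarily here).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value

def decode(value: int, length: int) -> str:
    """Unpack an integer produced by encode() back into `length` bases."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))

word = "GATTACA"
packed = encode(word)
print(f"{word} -> {packed:0{2 * len(word)}b}")
print(f"distinct words of length {len(word)}: {4 ** len(word)}")
```

The number of possible words grows as 4 to the power of the word length, which is why long sequences can carry so much information.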
DNA polynucleotides (oligomers)
• Different nucleotides are strung together to form polynucleotides
• The two ends of a polynucleotide are different
• A directionality is therefore present
• Convention is to label the coding strand from 5’ to 3’
http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html
Single Strand Polynucleotide
Example polynucleotide:
5’ GTAAAGTCCCGTTAGC 3’
Double Stranded DNA
• DNA can be single-stranded or double-stranded
• Double-stranded DNA: the second strand is the “reverse complement” strand
• The reverse complement runs in the opposite direction and its bases are complementary
• Complementary bases:
– A pairs with T
– C pairs with G
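The pairing rules translate directly into code. A minimal sketch, applied to the example strand GTAAAGTCCCGTTAGC from these slides; the function names are illustrative, and only the four unambiguous upper-case bases are handled.

```python
# A pairs with T, C pairs with G
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Base paired with each position, in the same left-to-right order."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq: str) -> str:
    """The second strand, written in its own 5' to 3' direction."""
    return complement(seq)[::-1]

strand = "GTAAAGTCCCGTTAGC"
print(complement(strand))          # the partner strand as drawn 3' to 5' under the original
print(reverse_complement(strand))  # the same partner strand read 5' to 3'
```

Taking the reverse complement twice returns the original strand, which is a handy sanity check for any implementation.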
Double Stranded Sequence
Example double stranded polynucleotide:
5’ GTAAAGTCCCGTTAGC 3’
   ||||||||||||||||
3’ CATTTCAGGGCAATCG 5’
http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html
Double Stranded DNA
Double Helix
• Two complementary DNA strands form a stable DNA double helix
• The double helix structure was published in spring 1953; spring 2003 marked the 50th anniversary of its discovery
Image source: www.ebi.ac.uk/microarray/biology_intro.htm
How does the code work?
• Template for construction of proteins
Proteins: Molecular machinery
• Proteins in your muscles allow you to move: myosin and actin
Proteins: Molecular machinery
• Enzymes (digestion, catalysis)
• Structure (collagen)
Proteins: Molecular machinery
• Signaling (hormones, kinases)
• Transport (energy, oxygen)
Image source: Crane digital, http://www.cranedigital.com/
Example Case: HIV Protease
1. Exposure & infection
2. HIV enters your cell
3. Your own cell reads the HIV “code” and creates the HIV proteins.
4. New viral proteins prepare HIV for infection of other cells.
© George Eade, Eade Creative Services, Inc.
http://whyfiles.org/035aids/index.html
HIV Protease & Inhibition
HIV Protease as a drug target
• Many drugs bind to protein active sites.
• This HIV protease can no longer prepare HIV proteins for infection, because an inhibitor is already bound in its active site.
HIV Protease + Peptidyl inhibitor (1A8G.PDB)
Drug Discovery
• Target Identification
– What protein can we attack to stop the disease
from progressing?
• Lead discovery & optimization
– What sort of molecule will bind to this protein?
• Toxicology
– Does it kill the patient?
– Does it have side effects?
– Does it get to the problem spots?
Drug discovery: past & present
• Put some of the infectious agent into
thousands of tiny wells
• Add a known drug lead compound into each
well.
– Try nearly every drug lead known.
• See which ones kill the agent…
– Too small to see, so we have to use chemical
tests called assays
Finding drug leads
• Once we have a target, how do we find some
compounds that might bind to it?
• The old way: exhaustive screening
• The new way: computational screening!
Drug Lead Screening & Docking
• Complementarity
– Shape
– Chemical
– Electrostatic
Problems in Bioinformatics
• Genomics
– Gene finding
– Annotation
• Sequence alignment and database search
– Functional genomics
• Microarray expression, “gene chips”
• Proteomics
– Structure prediction
• Comparative modeling
– Function prediction
• Structural bioinformatics
– Molecular docking, screening, etc.
RNA
• Ribonucleic Acid
• Similar to DNA
• Thymine (T) is replaced by uracil (U)
• RNA can be:
– Single stranded
– Double stranded
– Hybridized with DNA
RNA
• RNA is generally single stranded
• Forms secondary or tertiary structures
• RNA folding will be discussed later
• Important in a variety of ways, including protein
synthesis
RNA secondary structure
• E. coli RNase P RNA secondary structure
mRNA
• Messenger RNA
• Linear molecule encoding genetic
information copied from DNA molecules
• Transcription: process in which DNA is
copied into an RNA molecule
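At the sequence level, transcription can be sketched in two equivalent ways. The mRNA reads like the coding strand with U in place of T, and the cell builds it by pairing RNA bases against the other (template) strand. The function names below are illustrative, and both input strands are assumed to be written 5' to 3' in upper case.

```python
def transcribe(coding_strand: str) -> str:
    """mRNA for a DNA coding strand: same sequence, with U replacing T."""
    return coding_strand.replace("T", "U")

def transcribe_from_template(template_strand: str) -> str:
    """Same mRNA, obtained by base-pairing against the template strand."""
    pairing = str.maketrans("ACGT", "UGCA")  # DNA base -> paired RNA base
    return template_strand.translate(pairing)[::-1]

coding = "GTAAAGTCCCGTTAGC"    # example strand used earlier in the slides
template = "GCTAACGGGACTTTAC"  # its reverse complement
print(transcribe(coding))
print(transcribe_from_template(template))
```

Both calls print the same mRNA, which is why sequence databases can store the coding strand alone.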
mRNA processing
• Eukaryotic genes can be pieced together
– Exons: coding regions
– Introns: non-coding regions
• mRNA processing removes introns, splices exons
together
• Processed mRNA can be translated into a protein
sequence
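Splicing amounts to cutting out the intron intervals and joining what remains. In the sketch below the transcript, the exon coordinates, and the `splice` helper are all hypothetical; in practice exon boundaries come from gene annotation or from gene-finding software.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exon segments given as (start, end) pairs, 0-based and end-exclusive."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Invented transcript: two exons (upper case) around one intron (lower case)
pre_mrna = "AUGGUAAAG" + "guaaguuuucag" + "UCCCGUUAG"
exons = [(0, 9), (21, 30)]

mature = splice(pre_mrna, exons)
print(mature)
```

The lower-case intron is written that way only to make it visible; the slicing itself ignores case.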
mRNA Processing
Image source: http://departments.oxy.edu/biology/Stillman/bi221/111300/processing_of_hnrnas.htm
tRNA
• Transfer RNA
• Well-defined three-dimensional structure
• Critical for creation of proteins
tRNA structure
tRNA
• A specific amino acid is attached to each tRNA
• Which amino acid is determined by the 3-base anticodon sequence (complementary to the mRNA codon)
• Translation: process in which the nucleotide
sequence of the processed mRNA is used in order
to join amino acids together into a protein with the
help of ribosomes and tRNA
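The codon-anticodon relationship is the same reverse-complement operation seen for DNA, with U in place of T. A minimal sketch follows; the function name is illustrative, and it assumes strict Watson-Crick pairing, ignoring the wobble pairing that lets one tRNA recognize several codons.

```python
RNA_PAIRING = str.maketrans("ACGU", "UGCA")

def anticodon(codon: str) -> str:
    """Anticodon, written 5' to 3', that base-pairs with an mRNA codon."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly 3 bases")
    return codon.translate(RNA_PAIRING)[::-1]

print(anticodon("AUG"))  # anticodon that pairs with the start codon
```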
Genetic Code
• 4 possible bases (A, C, G, U)
• 3 bases in the codon
• 4 * 4 * 4 = 64 possible codon sequences
• Start codon: AUG
• Stop codons: UAA, UAG, UGA
• 61 codons code for amino acids (including AUG)
• 20 amino acids, so the genetic code is redundant
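The counts on this slide can be reproduced by building the codon table in code. The sketch below uses the standard genetic code written as a 64-letter string in U, C, A, G order, with `*` marking a stop codon. The `translate` helper is illustrative and deliberately simple: it starts at the first AUG and stops at the first in-frame stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

stops = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stops)
amino_acid_count = len(set(CODON_TABLE.values()) - {"*"})
print(len(CODON_TABLE), sense_codons, stops, amino_acid_count)
print(translate("AUGGUAAAGUCCCGUUAG"))
```

With 61 sense codons mapping onto 20 amino acids, most amino acids are encoded by more than one codon, which is the redundancy the slide refers to.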
20 Amino Acids
• Glycine (G, GLY)
• Alanine (A, ALA)
• Valine (V, VAL)
• Leucine (L, LEU)
• Isoleucine (I, ILE)
• Phenylalanine (F, PHE)
• Proline (P, PRO)
• Serine (S, SER)
• Threonine (T, THR)
• Cysteine (C, CYS)
• Methionine (M, MET)
• Tryptophan (W, TRP)
• Tyrosine (Y, TYR)
• Asparagine (N, ASN)
• Glutamine (Q, GLN)
• Aspartic acid (D, ASP)
• Glutamic Acid (E, GLU)
• Lysine (K, LYS)
• Arginine (R, ARG)
• Histidine (H, HIS)
• START: AUG
• STOP: UAA, UAG, UGA
Amino Acids
• Building blocks for proteins (20 different)
• Vary by side chain groups
• Hydrophilic amino acids are water soluble
• Hydrophobic amino acids are not
• Linked via a single chemical bond (peptide bond)
• Peptide: short linear chain of amino acids (< 30)
• Polypeptide: long chain of amino acids (which can be upwards of 4000 residues long)
Proteins
• Polypeptides having a three dimensional structure.
• Primary: sequence of amino acids constituting the polypeptide chain
• Secondary: local organization into secondary structures such as α helices and β sheets
• Tertiary: three dimensional arrangement of the amino acids as they interact with one another due to the polarity of their side chains
• Quaternary: number and relative positions of the protein subunits
Protein Structure
Central Dogma
DNA → RNA → PROTEIN
Central Dogma
What is a Gene?
• the physical and functional unit of heredity
that carries information from one generation
to the next
• DNA sequence necessary for the synthesis of
a functional protein or RNA molecule
Genome
• Chromosomal DNA of an organism
• Number of chromosomes and genome size vary quite significantly from one organism to another
• Genome size and number of genes do not necessarily determine organism complexity
Genome Comparison
ORGANISM                             CHROMOSOMES (haploid)  GENOME SIZE (base pairs)  GENES
Homo sapiens (Humans)                23                     3,200,000,000             ~30,000
Mus musculus (Mouse)                 20                     2,600,000,000             ~30,000
Drosophila melanogaster (Fruit Fly)  4                      180,000,000               ~18,000
Saccharomyces cerevisiae (Yeast)     16                     14,000,000                ~6,000
Zea mays (Corn)                      10                     2,400,000,000             ???
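One way to see that genome size does not track gene count is to compute gene density from the figures above. This is a rough sketch using only the approximate numbers in the comparison; Zea mays is left out because its gene count is not given, and `genes_per_megabase` is an illustrative helper name.

```python
# (genome size in base pairs, approximate gene count) from the comparison above
genomes = {
    "Homo sapiens": (3_200_000_000, 30_000),
    "Mus musculus": (2_600_000_000, 30_000),
    "Drosophila melanogaster": (180_000_000, 18_000),
    "Saccharomyces cerevisiae": (14_000_000, 6_000),
}

def genes_per_megabase(size_bp: int, genes: int) -> float:
    """Approximate number of genes per million base pairs."""
    return genes / (size_bp / 1_000_000)

density = {name: genes_per_megabase(*row) for name, row in genomes.items()}
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} {value:7.1f} genes per Mb")
```

On these numbers yeast packs its genes far more densely than human or mouse, so a larger genome mostly means more sequence between and within genes rather than proportionally more genes.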
Transcriptome
• complete collection of all possible mRNAs
(including splice variants) of an organism.
• regions of an organism’s genome that get
transcribed into messenger RNA.
• transcriptome can be extended to include all
transcribed elements, including non-coding RNAs
used for structural and regulatory purposes.
Proteome
• the complete collection of proteins that can
be produced by an organism.
• can be studied as either a static entity (sum of all proteins possible) or a dynamic entity (all proteins found at a specific time point)