Machine Learning
& Bioinformatics
Tien-Hao Chang (Darby Chang)
Machine Learning & Bioinformatics
1
Molecular biology

Nucleic acid
– DNA
– RNA

Central dogma
– Transcription
– Translation

Protein
– Amino acid
– Primary structure
– Secondary structure
– Tertiary structure
Nucleic acid

A nucleic acid is a macromolecule composed
of chains of monomeric nucleotides

In biochemistry these molecules carry genetic
information or form structures within cells

The most common nucleic acids are
deoxyribonucleic acid (DNA) and ribonucleic
acid (RNA)
http://juang.bst.ntu.edu.tw/BC2008/images/NA%20Fig1.jpg
Nucleic acid components
Sugar
http://www.mun.ca/biology/scarr/Fg10_09b_revised.gif
Nucleic acid components
Base

Purine
– Adenine (A) and guanine (G)

Pyrimidine
– Thymine (T), cytosine (C)
– Uracil (U, only in RNA)
http://www.elmhurst.edu/~chm/vchembook/images/580bases.gif
http://fig.cox.miami.edu/~cmallery/150/chemistry/sf3x14a.jpg
DNA

Chemically, DNA is a long polymer of simple units
called nucleotides, with a backbone made of sugars and
phosphate groups joined by phosphodiester bonds

Attached to each sugar is one
of four types of molecules
called bases

It is the sequence of these four
bases along the backbone that
encodes information
http://upload.wikimedia.org/wikipedia/commons/8/87/DNA_orbit_animated_small.gif
DNA
Base pairing



Each type of base on one strand forms a bond
with just one type of base on the other strand
Here, purines form hydrogen bonds to
pyrimidines, with A bonding only to T, and C
bonding only to G
DNA sequence
– 5’CpGpCpGpApApTpT
3’TpTpApApGpCpGpC
– CGCGAATT
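The pairing rule on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the names `PAIRS`, `complement`, and `reverse_complement` are hypothetical, and the example reuses the sequence CGCGAATT shown above, assuming it is read 5'→3'.

```python
# Watson-Crick pairing: A pairs only with T, and C pairs only with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for every position (partner strand read 3'->5')."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "CGCGAATT"                 # assumed to be written 5'->3'
    print(complement(top))           # partner strand, read 3'->5'
    print(reverse_complement(top))   # same partner strand, read 5'->3'
```

Because each base has exactly one partner, either strand fully determines the other, which is why a single sequence is enough to describe a double-stranded molecule.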
http://www.ucl.ac.uk/~sjjgsca/NucleotidePairing.jpg
Double helix
http://www.coe.drexel.edu/ret/personalsites/2005/dayal/curriculum1_files/image001.jpg
Hydrogen bond



A hydrogen bond exists between an electronegative atom
and a hydrogen atom bonded to another electronegative
atom
This type of force always involves a hydrogen atom, hence the name
Typical hydrogen bonds (roughly 5–30 kJ/mol) are much weaker than
covalent bonds; only the strongest (about 155 kJ/mol) approach the
energy of weak covalent bonds
Biological functions
– DNA/RNA base pairing
– protein secondary/tertiary structure formation
– some properties of water
– antibody-antigen (and other protein-protein) binding
Hydrogen bonds result from
differences in electronegativity
http://upload.wikimedia.org/wikipedia/commons/4/43/Liquid_water_hydrogen_bond.png
Grooves
http://courses.biology.utah.edu/horvath/biol.3525/1_DNA/Fig2/marty_1.jpg
DNA structure
http://www.youtube.com/watch?v=qy8dk5iS1f0&NR=1
Any Questions?
About DNA
Central dogma
http://fig.cox.miami.edu/~cmallery/255/255hist/mcb4.1.dogma.jpg
Central dogma

The process by which information is extracted
from the nucleotide sequence of a gene and then
used to make a protein is essentially the same for
all living things on Earth
and is described by the grandly
named central dogma of
molecular biology

Information in cells passes from
DNA to RNA to proteins
http://upload.wikimedia.org/wikipedia/commons/3/3a/Crick's_1958_central_dogma.svg
RNA


Information stored in DNA is used to make a more
transient, single-stranded polynucleotide called RNA
(Ribonucleic Acid)
RNA is very similar to DNA, but differs in a few
important structural details
– in the cell RNA is usually single stranded, while DNA is
usually double stranded
– RNA nucleotides contain ribose while DNA contains
deoxyribose (a type of ribose that lacks one oxygen atom)
– in RNA the base uracil takes the place of thymine, which
is present in DNA
http://www.dadamo.com/wiki/dna-rna.png
Central dogma
Transcription

Transcription is the synthesis of RNA under
the direction of DNA

Both nucleic acid sequences use the same
language, and the information is simply
transcribed, or copied

A DNA sequence is copied by RNA polymerase
to produce a complementary RNA strand; a transcript
that encodes a protein is called messenger RNA (mRNA)
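As a rough sketch of the copying step described above, the snippet below builds an RNA strand that is complementary to a DNA template, with uracil (U) in place of thymine. It is not from the slides: the function name `transcribe`, the mapping `DNA_TO_RNA`, and the example template are hypothetical, and the sketch assumes the template strand is written 3'→5' so that the result reads 5'→3'. Real transcription also involves promoters, termination, and RNA processing, which are ignored here.

```python
# Template-strand DNA base -> complementary RNA base.
# In RNA, uracil (U) pairs with adenine where DNA would use thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA complementary to a DNA template strand.

    Assumption: the template is given 3'->5', so the returned RNA
    is already in the conventional 5'->3' orientation.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


if __name__ == "__main__":
    template = "TACCGGACT"        # hypothetical template strand, 3'->5'
    print(transcribe(template))   # complementary RNA, 5'->3'
```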
DNA transcription
http://www.youtube.com/watch?v=vJSmZ3DsntU
Transcription detail
http://wwwclass.unl.edu/biochem/gp2/m_biology/animation/m_animations/gene2.swf
RNA
Various types

mRNA
– messenger RNA (mRNA) is the RNA that carries
information from DNA to the ribosome
– the coding sequence of the mRNA determines the
amino acid sequence in the protein that is produced

Non-coding RNA
Various RNA types
Non-coding RNA

Many RNAs do not code for protein

These ncRNAs are encoded by their own genes (RNA
genes) or derived from mRNA introns

The most common ncRNAs are transfer RNA
(tRNA) and ribosomal RNA (rRNA)

Other ncRNAs such as microRNA (miRNA)
are involved in post-transcriptional gene regulation
http://eurheartj.oxfordjournals.org/content/vol0/issue2010/images/large/ehp57301.jpeg
Central dogma
Translation

Translation is the second stage of protein
biosynthesis

Translation occurs in the cytoplasm where the
ribosomes are located

In translation, mRNA is decoded to produce a
specific polypeptide according to the rules
specified by the genetic code
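The decoding step can be sketched as a table lookup, three bases at a time. The snippet below is an illustrative sketch rather than anything from the slides: the names `CODON_TABLE` and `translate` are hypothetical, the table is the standard genetic code written with one-letter amino acid codes (`*` marks a stop codon), and the sketch assumes the reading frame starts at the first base of the input.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# for the first, second, and third codon positions ("*" = stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Decode an mRNA codon by codon until a stop codon or the end.

    Assumption: the reading frame begins at the first base.
    """
    mrna = mrna.upper()
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # stop codon: release the polypeptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate("AUGGCCUGA"))  # Met-Ala, then a stop codon
```

A dictionary is used because the genetic code is a fixed mapping from 64 codons to amino acids or stop signals, so translation reduces to repeated lookups.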
From RNA to protein
synthesis
http://www.youtube.com/watch?v=NJxobgkPEAo
Protein translation
http://www.youtube.com/watch?v=nl8pSlonmA0
http://biology.kenyon.edu/courses/biol114/Chap05/code.gif
Any Questions?
About central dogma
Protein
Protein

Proteins are large organic compounds made of amino
acids arranged in a linear chain and joined together by
peptide bonds between the carboxyl and amino
groups of adjacent amino acid residues

Proteins can also work together to achieve a
particular function, and they often associate to form
stable complexes
Protein
Amino acid

In chemistry, an amino acid is a molecule that
contains both amine and carboxyl functional
groups

In biochemistry, this term refers to alpha-amino acids with the general formula
H2NCHRCOOH, where R is an organic
substituent
http://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/AminoAcidball.svg/702px-AminoAcidball.svg.png
Amino acid
Various side chains

The various alpha amino acids differ in which
side chain (R group) is attached to their alpha
carbon

They can vary in size from just a hydrogen
atom in glycine through a methyl group in
alanine to a large heterocyclic group in
tryptophan
http://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Aa.svg/2000px-Aa.svg.png
http://juang.bst.ntu.edu.tw/BC2008/images/Amino%281%29%202007/A1-7.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Amino%281%29%202007/A1-9.JPG
http://www.russell.embl-heidelberg.de/aas/other_images/lb3.gif
Amino acid
The building blocks of proteins

Amino acids combine in a condensation
reaction, and the resulting “amino acid residues” are
held together by peptide bonds

Proteins are defined by their unique sequence
of residues (primary structure)

Just as letters form many different words, amino acids
form a vast variety of sequences/proteins
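To give a feel for how vast this variety is, the short sketch below counts the possible sequences of a given length. It is not from the slides: the function name `count_sequences` is hypothetical, and the count assumes the 20 standard amino acids, each allowed at every position.

```python
NUM_AMINO_ACIDS = 20  # assumption: the 20 standard amino acids


def count_sequences(length: int) -> int:
    """Number of distinct amino acid sequences of the given length."""
    return NUM_AMINO_ACIDS ** length


if __name__ == "__main__":
    for n in (2, 3, 10, 100):
        print(n, count_sequences(n))
```

Even a short chain of 10 residues allows 20^10 (about 10^13) different sequences, so the sequences found in nature are a tiny fraction of what is possible.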
http://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Peptidformationball.svg/2000px-Peptidformationball.svg.png
http://juang.bst.ntu.edu.tw/BC2008/images/Amino(1)%202007/A1-11.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Amino(1)%202007/A1-13.JPG
Protein
From amino acids to proteins

Amino acids form short polymer chains called
peptides or longer chains called either
polypeptides or proteins

The process of such formation from an mRNA
template (obeying genetic code) is known as
translation, which is part of protein
biosynthesis
Protein structure hierarchy
http://cropandsoil.oregonstate.edu/classes/css430/lecture%209-07/figure-09-03.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-4.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-8.JPG
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-9.JPG
Protein structure hierarchy
Secondary structure

In biochemistry and structural biology,
secondary structure is the general three-dimensional
form of local segments of
biopolymers such as proteins and nucleic acids

It does not, however, describe specific atomic
positions in three-dimensional space, which
are considered to be tertiary structure
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(2)%202007/P2-3.JPG
Protein structure hierarchy
Tertiary structure

The three-dimensional structure of a protein or
any other macromolecule, as defined by the
atomic coordinates

Describes the spatial relations among its
secondary structures

Tertiary structure is considered to be largely
determined by the protein’s primary sequence
Protein tertiary structure
Experimental techniques

The majority of protein structures have been
solved with X-ray crystallography

The second most common method is NMR (nuclear
magnetic resonance) spectroscopy
– lower resolution
– limited to small proteins
– provides time-dependent information in solution
http://campusapps.fullerton.edu/news/arts/2003/photos/protein-art.jpg
Protein structure hierarchy
Quaternary structure

Many proteins are actually
assemblies of more than one
polypeptide chain, which in the
context of the larger assemblage
are known as protein subunits

In addition to the tertiary structure
of the subunits, multiple-subunit
proteins possess a quaternary
structure, which is the arrangement
into which the subunits assemble
http://courses.cm.utexas.edu/jrobertus/ch339k/overheads-1/ch6_quat-struct1.jpg
Protein sub-structure
Protein sub-structure
Domain

A part of protein sequence
and structure that can
evolve, function, and exist
independently

About 25–500 amino acids (aa) long

Often forms a functional
unit
http://upload.wikimedia.org/wikipedia/commons/6/67/1pkn.png
Zinc fingers are small protein structural motifs that can
coordinate zinc ions to help stabilize their folds
http://upload.wikimedia.org/wikipedia/commons/7/79/Zinc_finger_DNA_complex.png
Protein sub-structure
Motif

A sequence motif is a nucleotide or
amino-acid sequence pattern that is widespread
and often has biological significance

For proteins, a sequence motif is distinguished
from a structural motif, a motif formed by the
three dimensional arrangement of amino acids,
which may not be adjacent
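A sequence motif can often be written as a pattern and searched for with a regular expression. The sketch below is an illustration only, not part of the slides: it uses the well-known N-glycosylation site pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline) as the example motif, and the function name `find_motif` and the test sequence are hypothetical.

```python
import re

# N-glycosylation site motif N-{P}-[ST]-{P}, written as a regular expression.
# The lookahead (?=...) lets overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence: str, pattern: re.Pattern = N_GLYCOSYLATION):
    """Return (0-based position, matched residues) for every motif hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    protein = "MKNGSAANPTQNVTE"   # hypothetical sequence for illustration
    for position, hit in find_motif(protein):
        print(position, hit)
```

A pattern like this only captures residues that are adjacent in the sequence; a structural motif, whose residues may be far apart in the chain, cannot be found this way and needs three-dimensional coordinates.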
Protein sub-structure
Structural motif

A 3D structural element or fold that also
appears in a variety of other molecules

In the context of proteins, the term is
sometimes used interchangeably with
“structural domain,” although a domain need
not be a motif and, if it contains motifs, need
not be made up of only one
http://www.biomedcentral.com/content/figures/1471-2164-8-60-8.jpg
http://juang.bst.ntu.edu.tw/BC2008/images/Protein(1)%202007/P1-3.JPG
Molecular biology
Reference

Website of Prof. Juang (莊榮輝), National Taiwan University
– http://juang.bst.ntu.edu.tw/BC2008/index.htm

Molecular biology website, National Chiao Tung University
– http://www.life.nctu.edu.tw/~mb/c40101.htm
Any Questions?
About molecular biology