Genomics: The Technology behind the Human Genome Project
Shu-Ping Lin, Ph.D.
Institute of Biomedical Engineering
E-mail: [email protected]
Website: http://web.nchu.edu.tw/pweb/users/splin/
O.J. Simpson capital murder case, 1/95-9/95
Odds of blood in Ford Bronco not being R. Goldman’s: 6.5 billion to 1
Odds of blood on socks in bedroom not being N. Brown-Simpson’s: 8.5 billion to 1
Odds of blood on glove not being from R. Goldman, N. Brown-Simpson, and O.J. Simpson: 21.5 billion to 1
Number of people on planet Earth: 6.1 billion
Odds of being struck by lightning in the U.S.: 2.8 million to 1
Odds of winning the Illinois Big Game lottery: 76 million to 1
Odds of getting killed driving to the gas station to buy a lottery ticket: 4.5 million to 1
Odds of seeing 3 albino deer at the same time: 85 million to 1
Odds of having quintuplets: 85 million to 1
Odds of being struck by a meteorite: 10 trillion to 1


DNA Technology and Genome




Genome: the collection of DNA molecules that carries the hereditary information of an organism
Genomics: the study of the sequence, content, and history of the genome
Random mutations in the DNA sequence and DNA duplication play a significant role in the evolution of genomes.
Sequence similarity: used in comparing genomes and in constructing a tree of life that indicates kinship relationships among species
Technology of Genome Sequencing
1. Restriction enzymes are used to make recombinant DNA: Enzymatic
techniques for cutting DNA into pieces and combining DNA from different
sources (Recombinant DNA)
2. Separating DNA fragments according to size (Electrophoresis)
3. Making copies of DNA using cells’ machinery for DNA replication (Cloning)
4. Polymerase chain reaction (PCR): amplifying DNA directly in vitro
Causing genomic perturbations by gene mutation, insertion, and deletion
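A minimal Python sketch of steps 1, 2, and 4 above, under simplifying assumptions: EcoRI (recognition site GAATTC, cut between G and AATTC on the top strand) is used only as a familiar example enzyme, just the top strand is tracked (sticky ends are ignored), electrophoresis is modeled as ordering fragments by length, and PCR is modeled as ideal exponential amplification. The function names are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition site (example enzyme)
ECORI_CUT_OFFSET = 1   # EcoRI cuts after the first G of the site (top strand)


def digest(seq, site=ECORI_SITE, offset=ECORI_CUT_OFFSET):
    """Cut the top strand at every occurrence of `site`; return the fragments."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments


def gel_order(fragments):
    """Model electrophoresis: shorter fragments migrate farther, so list them first."""
    return sorted(fragments, key=len)


def pcr_copies(initial, cycles, efficiency=1.0):
    """Copies after `cycles` PCR cycles; efficiency 1.0 means perfect doubling."""
    return initial * (1 + efficiency) ** cycles


fragments = digest("GAATTCAAGAATTC")
print(fragments)             # ['G', 'AATTCAAG', 'AATTC']
print(gel_order(fragments))  # ['G', 'AATTC', 'AATTCAAG']
print(pcr_copies(1, 30))     # 1073741824.0
```

Real amplification falls short of perfect doubling, which is why the efficiency is exposed as a parameter.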
Tree of Life




Constructing a timeline (history) and kinship for millions of life forms on Earth
Darwin's On the Origin of Species includes an evolutionary tree describing the hierarchical structure relating species to their most recent common ancestors
More recently, gene-sequence similarity has been used as a criterion to uncover hierarchical kinship relations between organisms
The tree divides organisms into 3 domains: bacteria, archaea, and eukaryota.
Genes shared by different organisms have similar, but often not identical, nucleotide sequences.
Family trees of organisms that are based on gene similarity are called gene trees
The three-domain tree is based on sequence-similarity analysis of the gene for 16S ribosomal RNA, which specifies a component of the machinery (the ribosome) that translates the nucleotide sequence of a gene into protein
RNA can function as a template for both DNA (reverse transcription) and protein (translation), and it has enzymatic roles in fundamental cellular activities.

Universal ancestor cells form the trunk, and their descendants form the branches and leaves (present-day life forms). Working back to the ancestor using the sequence of DNA letters is similar to the way linguists reconstruct words of ancient mother tongues.
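The slides describe building family trees from gene-sequence similarity without naming a method. As a minimal sketch, the Python below uses UPGMA, one of the simplest distance-based clustering methods, swapped in here for illustration only (it is not necessarily how the trees shown were built). It assumes the sequences are already aligned to equal length, measures distance as the fraction of differing positions (p-distance), and repeatedly merges the closest pair of clusters. The sequence names and data are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences (dict: name -> sequence) into a nested-tuple tree."""
    # Each cluster label maps to (subtree, number of leaves in the subtree)
    clusters = {name: (name, 1) for name in seqs}
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)  # pair of clusters with the smallest distance
        x, y = sorted(closest)             # sorted for a deterministic merge order
        (tree_x, n_x), (tree_y, n_y) = clusters.pop(x), clusters.pop(y)
        del dist[closest]
        merged = f"({x},{y})"
        for z in clusters:
            # UPGMA rule: new distance is the size-weighted average of the old ones
            d_xz = dist.pop(frozenset((x, z)))
            d_yz = dist.pop(frozenset((y, z)))
            dist[frozenset((merged, z))] = (n_x * d_xz + n_y * d_yz) / (n_x + n_y)
        clusters[merged] = ((tree_x, tree_y), n_x + n_y)
    return next(iter(clusters.values()))[0]


seqs = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTTAAAAA"}
print(upgma(seqs))  # (('A', 'B'), 'C')
```

UPGMA assumes a roughly constant rate of change along all branches; when that does not hold, other tree-building methods are generally preferred.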



Tree construction based on molecular similarity depends on which molecule or cluster of molecules is compared
In animals, genes are passed vertically from parent to offspring
Gene-sequence similarities suggest that both archaea and eukaryotes have acquired metabolic genes from bacteria by lateral gene transfer, also called horizontal gene transfer
[Figure: tree of life (Woese et al. (1990)) with branches labeled All life, Viruses, Bacteria, Archaea, Eukaryotes, Protists, Fungi, Green Plants, Animals, Invertebrates, Vertebrates, Fish, Amphibians, Reptiles, Birds, Mammals, Monotremata, Marsupials, Primates, …]
Archaea



Have no nuclei or other organelles
Bacteria and archaea together form prokaryotes.
Include microbial species that grow at 95℃ in the highly acidic conditions found in hot sulfur beds
The extreme resistance of archaeal enzymes to heat and acid makes them highly attractive to biotechnology companies for potential industrial use
Bacteria






Ubiquitous single-celled organisms (millions
everywhere)
Their membranes are typically made of material different from that of eukaryotic membranes
Have no nuclei or other organelles
Almost all they do is make more bacteria
Include disease-causing germs and symbiotic organisms
Escherichia coli (E. coli) is a bacterium that lives in
human intestines and is required for normal
digestion

Well-studied and easy to grow
Viruses

Obligate parasites




They rely on the biochemical machinery of their
host cell to survive and reproduce
Consist of just a small amount of genetic
material surrounded by a protein coat
A small virus can have as few as 5,000 bases (nucleotides) in its genetic material.
Actively studied because of their simplicity and their role in human disease
Eukaryotes

Eukaryotes have cells that contain:





DNA organization
Nuclei: a specialized area in the cell that holds the genetic
material
Other organelles (specialized cellular areas): mitochondria, where respiration takes place, and chloroplasts (in plants), which capture energy from sunlight
Cytoskeleton: genes for cytoskeletal proteins such as actin, myosin, and tubulin (the subunit of microtubules)
All multicellular organisms (e.g., people, mosquitoes, maple trees) are eukaryotes, as are many single-celled organisms (e.g., yeasts)
Living Parts





Tissues, cells, compartments, and organelles
Groups of cells specializing in a particular function are tissues, and their cells are said to be differentiated
Once differentiated, a cell normally cannot change from one type to another
Yet all cells of an organism carry essentially the same genome (DNA sequence)
Differences come from differences in gene expression, that is, whether the product a gene codes for is produced and how much of it is produced
Mutation




Mutation: any change in the base sequence of a gene or a noncoding DNA segment
Causes: imperfections in replication, repair, and quality-control processes; external environmental factors such as radiation and chemical insult
Point mutation: a change affecting a single nucleotide in a gene
Some point mutations cause a corresponding change in the amino-acid sequence of the protein the gene produces, e.g., GGC (codon CCG, Proline, P) to CGC (codon GCG, Alanine, A)
Others do not affect the amino-acid sequence, e.g., TGC (ACG) to TGT (ACA): both codons correspond to Threonine (T)
Point mutations are observed much more frequently in noncoding regions of the genome, because mutations in genes that lead to nonfunctional proteins tend to be eliminated by natural selection
Rearrangement mutations affect a large region of DNA: insertion of additional material, a shift in the order of codons, or deletion of a gene
Occurring within the sequence of a gene, they may prevent gene expression or produce a gene product that the cell does not recognize; some mutated genes survive and contribute to the diversity of species
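The point-mutation examples above treat each listed triplet as template-strand DNA and take its base-by-base complement as the codon (strand orientation is ignored, as on the slide). A minimal Python sketch of that convention, using the standard genetic code; the function names are hypothetical, and the labels "silent", "missense", and "nonsense" are standard terms that the slide itself does not use.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' = stop)
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codon_from_template(triplet):
    """Base-by-base complement of a template triplet (orientation ignored)."""
    return triplet.upper().translate(COMPLEMENT)


def amino_acid(template_triplet):
    """One-letter amino acid encoded by a template-strand triplet."""
    return CODON_TABLE[codon_from_template(template_triplet)]


def classify_point_mutation(before, after):
    """Compare the amino acids encoded before and after a single-base change."""
    aa_before, aa_after = amino_acid(before), amino_acid(after)
    if aa_before == aa_after:
        return "silent"
    if aa_after == "*":
        return "nonsense"
    return "missense"


print(amino_acid("GGC"), amino_acid("CGC"))   # P A
print(classify_point_mutation("GGC", "CGC"))  # missense
print(classify_point_mutation("TGC", "TGT"))  # silent
```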
Sequence Similarity



The sequence shared by the pair with the highest S value, GTAATCG, is the predicted ancestor sequence; the 2 sequences with S = 13 are said to be homologous to the ancestor sequence
Global alignment of a pair of genes (or proteins): alignment over their entire lengths, in which every base is aligned with another base or with a gap
Local alignment: does not need to align all the bases of the sequences; it can provide information on sequence motifs of proteins found at sites of interaction with other proteins
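Global alignment as defined above is classically computed with the Needleman-Wunsch dynamic-programming algorithm. The sketch below is a minimal version of it, assuming a linear gap penalty (every gap position costs `gap`), which is simpler than the gap-opening scores used on these slides; the function name and default scores are illustrative choices, and "_" marks a gap as in the slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-5):
    """Best global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), using "_" for gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align two bases
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("_"); i -= 1
        else:
            top.append("_"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT", match=1, mismatch=-1, gap=-2))  # (1, 'ACGT', 'A_GT')
```

A local alignment (Smith-Waterman) uses the same table but floors each cell at 0 and traces back from the highest-scoring cell instead of the corner.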
Similarity Index
+3 for each matching base, -5 for each mismatch, -6 for each gap opening
Homework
[Figure: example alignments scored S = -13, S = -21, and S = ??]
+1 for each matching base, -1 for each mismatch, -5 for each gap opening
GTAACTGCTAGA__
GTAC__GC__GTCG
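The similarity index can be computed mechanically once an alignment is written out. A minimal Python sketch, assuming that "gap opening" means each run of consecutive gap characters in a row is charged once, with no extra cost for extending it; if every gap position were charged instead, totals would differ. The function name is hypothetical, and the scores are passed in so either scheme above can be used.

```python
def alignment_score(top, bottom, match, mismatch, gap_open, gap_char="_"):
    """Similarity index S of a pairwise alignment written as two equal-length rows.

    Each run of consecutive gap characters in a row is charged `gap_open` once
    (assumption: no separate gap-extension penalty).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    in_gap_top = in_gap_bottom = False
    for a, b in zip(top, bottom):
        if a == gap_char or b == gap_char:
            # Charge only when a new gap run starts in that row
            if a == gap_char and not in_gap_top:
                score += gap_open
            if b == gap_char and not in_gap_bottom:
                score += gap_open
        elif a == b:
            score += match
        else:
            score += mismatch
        in_gap_top = a == gap_char
        in_gap_bottom = b == gap_char
    return score


# Scheme 1 (+3 / -5 / -6 per gap opening) on a small hypothetical alignment
print(alignment_score("AC_T", "ACGT", 3, -5, -6))  # 3
# Scheme 2 (+1 / -1 / -5 per gap opening) on the alignment shown above
print(alignment_score("GTAACTGCTAGA__", "GTAC__GC__GTCG", 1, -1, -5))
```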
Probability of Sequence Similarity

Nucleotide sequences of length r:

Nucleotide sequences of length r, random sequence of length m:
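A minimal sketch of these two chance calculations, assuming the simplest random model, in which each position is A, C, G, or T with probability 1/4 independently (an assumption, not necessarily the slide's exact formulas): there are 4^r possible sequences of length r, so a given one appears at a given position with probability (1/4)^r, and a random sequence of length m has m - r + 1 starting positions, giving an expected count of (m - r + 1)/4^r. The function names are hypothetical.

```python
def num_sequences(r):
    """Number of distinct nucleotide sequences of length r."""
    return 4 ** r


def match_probability(r):
    """Chance that a fixed length-r sequence appears at one given position."""
    return 0.25 ** r


def expected_occurrences(r, m):
    """Expected exact matches of a fixed length-r sequence in a random length-m sequence."""
    if r > m:
        return 0.0
    return (m - r + 1) * match_probability(r)


# The 7-base ancestor sequence GTAATCG is one of 4**7 possible 7-mers
print(num_sequences(7))            # 16384
print(expected_occurrences(2, 5))  # 0.25
```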

E-value means expectation value

The E-value is the measure most commonly used for assessing the significance of sequence similarity

It answers: how many matches at least this good are expected to occur by chance?

This estimate is based on the similarity measure
If a match is highly unexpected, it probably results from something other than chance

Common origin is the most likely explanation

This is how homology is inferred
Low E-value → good hit

1 = bad E-value

10^-3 = borderline E-value

10^-4 = good E-value

10^-10 = very good E-value

E-values lower than 10^-4 indicate possible homology

E-values higher than 10^-4 require extra evidence to support homology
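The slides do not give a formula for the E-value. A widely used one, from the Karlin-Altschul statistics behind BLAST, is E = K·m·n·e^(-λS), where S is the raw alignment score, m and n are the query and database lengths, and K and λ are constants that depend on the scoring scheme and sequence composition. The sketch below assumes that formula; the K and λ values in the example call are placeholders, not real BLAST parameters. It also converts S to a bit score S' = (λS - ln K)/ln 2, for which E = m·n·2^(-S'), and applies the slide's 10^-4 threshold.

```python
import math


def e_value(raw_score, m, n, K, lam):
    """Karlin-Altschul expected number of chance hits scoring >= raw_score."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, K, lam):
    """Normalized score in bits; independent of the particular scoring scheme."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value_from_bits(bits, m, n):
    """Same E-value expressed through the bit score."""
    return m * n * 2.0 ** (-bits)


def suggests_homology(e, threshold=1e-4):
    """Slide's rule of thumb: E-values below 10^-4 indicate possible homology."""
    return e < threshold


# Placeholder parameters for illustration only
S, m, n, K, lam = 40, 300, 1_000_000, 0.1, 0.3
print(e_value(S, m, n, K, lam))
print(e_value_from_bits(bit_score(S, K, lam), m, n))  # same value as above
```

Because E grows linearly with m·n, the same raw score is less significant when searched against a larger database.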
Comparison of Amino-Acid Sequences of Proteins



Also yields information about their origin and evolution
The most prevalent replacements occur between amino acids with similar side chains, such as substitutions within the following groups: (G, A); (A, S); (S, T); (I, V, L)
Replacements involving chemically similar amino acids appear more often than would be expected from random mutations.
Tryptophan is not chemically similar to any of the other 19
amino acids found in proteins, and its presence is largely
conserved during evolution.
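To make the grouping concrete, a minimal Python sketch that tallies substitutions between two aligned protein sequences, counting a replacement as "conservative" only when both residues fall in one of the groups listed above. The group list is taken from the slide and is not a full substitution matrix such as BLOSUM; the function names and example peptides are hypothetical, and gaps are not handled.

```python
# Groups of amino acids with similar side chains, as listed on the slide
SIMILAR_GROUPS = [{"G", "A"}, {"A", "S"}, {"S", "T"}, {"I", "V", "L"}]


def is_conservative(x, y):
    """True if x and y differ but belong to one of the similar-side-chain groups."""
    return x != y and any(x in group and y in group for group in SIMILAR_GROUPS)


def classify_substitutions(p, q):
    """Count identical, conservative, and other positions in two aligned proteins."""
    if len(p) != len(q):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"identical": 0, "conservative": 0, "other": 0}
    for x, y in zip(p, q):
        if x == y:
            counts["identical"] += 1
        elif is_conservative(x, y):
            counts["conservative"] += 1
        else:
            counts["other"] += 1
    return counts


print(classify_substitutions("GASIW", "ATTVF"))
# {'identical': 0, 'conservative': 3, 'other': 2}
```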
Defining Homolog, Ortholog and Paralog



Homolog: A gene related to a second gene by descent from a common
ancestral DNA sequence. The term "homolog" may apply to the
relationship between genes separated by speciation (ortholog), or to the
relationship between genes originating via genetic duplication (paralog).
Ortholog: Orthologs are genes in different species that have evolved from
a common ancestral gene via speciation. Orthologs often (but certainly not always) retain the same function(s) in the course of evolution, so functions may be lost or gained when comparing a pair of orthologs.
Paralog: Paralogs are genes produced via gene duplication within a
genome. Paralogs typically evolve new functions or else eventually become
pseudogenes.
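The definitions above come down to one question: was the event at which two genes last shared an ancestor a speciation or a duplication? A minimal Python sketch, assuming a gene tree whose internal nodes are already labeled with that event; the class, function, and gene names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes


def lineage(node):
    """The node itself followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relationship(gene_a, gene_b):
    """'ortholog' or 'paralog', based on the event at the genes' last common ancestor."""
    ancestors_a = {id(n) for n in lineage(gene_a)}
    for node in lineage(gene_b):
        if id(node) in ancestors_a:  # first shared node = last common ancestor
            return {"speciation": "ortholog", "duplication": "paralog"}[node.event]
    raise ValueError("genes do not share an ancestor in this tree")


# Hypothetical tree: a duplication, followed by speciation into human and mouse
dup = Node("duplication", event="duplication")
sp_a = Node("speciation_a", parent=dup, event="speciation")
sp_b = Node("speciation_b", parent=dup, event="speciation")
human_a, mouse_a = Node("human_geneA", sp_a), Node("mouse_geneA", sp_a)
human_b = Node("human_geneB", sp_b)

print(relationship(human_a, mouse_a))  # ortholog
print(relationship(human_a, human_b))  # paralog
```

Both pairs are homologs under the definitions above; only the type of the splitting event differs.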
http://lh6.ggpht.com/_Z6TlOmziVoM/SS69ycCKcCI/AAAAAAAAGUA/QtGd2QwekE/s720/%5BUNSET%5D.png
Things You Share With:
Mouse: http://www.youtube.com/watch?v=VhgSReb4RRY&NR=1
Zebra fish: http://www.youtube.com/watch?v=DF5CG_p1TCw
Fruit fly: http://www.youtube.com/watch?v=mw5SPcEc5Q
