DNA Properties
CSE, Marmara University
mimoza.marmara.edu.tr/~m.sakalli/cse546
Oct/19/09
Related fields: computational molecular biology, bioinformatics, genomics, proteomics, functional genomics, structural bioinformatics
There is no simple definition of being alive (life).
Reproducing itself is a default mechanism of every living being.
But what about computer programs, crystals, and self-building, self-learning robots and computers?
Life on Earth is the result of an evolutionary process; the idea is that all
living things have a common ancestor and are related through…
Basic components of evolution:
Inheritance
Variation: defined legal moves in genotype space.
Selection: a probabilistic evaluation function.
In computer science terms, DNA is a string of symbols from the alphabet
{A,C,G,T}.
Evolution is a search through a very large space of possible organism
characteristics.
The words built from this four-letter alphabet encode all the inherited
characteristics (called the genotype) of all organisms.
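To give a feel for how large this search space is, here is a minimal Python sketch. The function name `sequence_space_size` is made up for illustration; it simply counts the distinct strings of a given length over the four-letter alphabet.

```python
ALPHABET = "ACGT"

def sequence_space_size(length):
    """Number of distinct DNA strings of the given length (4 ** length)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return len(ALPHABET) ** length

# Compare the growth for a few lengths (a codon has length 3).
for n in (1, 3, 10, 100):
    print(n, sequence_space_size(n))
```

Even a 100-letter sequence already has more possible variants than could ever be enumerated one by one, which is why evolution is described here as a search rather than a lookup.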
The Central Dogma in molecular biology
http://proquestcombo.safaribooksonline.com/0596002998/blast-CHP-2
Three processes: replication, transcription, and translation.
Almost every cell in our body has 23 pairs of chromosomes (46 in total) in the nucleus, and the genes on these chromosomes are
responsible for almost all of our characteristics (not merely the physical ones).
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mboc4.figgrp.600, by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff,
Keith Roberts, and Peter Walter
Figure 4-5. The DNA double helix. (A) A space-filling model of 1.5 turns of the DNA double helix. Each turn of DNA is made up of 10.4
nucleotide pairs and the center-to-center distance between adjacent nucleotide pairs is 0.34 nm. The coiling of the two strands around
each other creates two grooves in the double helix. As indicated in the figure, the wider groove is called the major groove, and the smaller
the minor groove. (B) A short section of the double helix viewed from its side, showing four base pairs. The nucleotides are linked
together covalently by phosphodiester bonds through the 3'-hydroxyl (-OH) group of one sugar and the 5'-phosphate (P) of the next.
Thus, each polynucleotide strand has a chemical polarity; that is, its two ends are chemically different. The 3' end carries an unlinked -OH
group attached to the 3' position on the sugar ring; the 5' end carries a free phosphate group attached to the 5' position on the sugar ring.
DNA structure and base pairing
Polymer of:
Deoxyribose sugar (ribose in RNA)
Phosphate
Nitrogenous base
Bases
A, C, G, T (DNA)
Uracil (U) replaces T in RNA
Pairing rule
A (R) - T (Y)
G (R) - C (Y)
R = purine, Y = pyrimidine
Why double-stranded?
Chemically and biophysically more stable; the second strand also acts as a backup that allows some error correction
if one strand is accidentally damaged (e.g. by UV irradiation).
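The pairing rule can be sketched in a few lines of Python. This is only an illustration under simple assumptions (uppercase or lowercase A, C, G, T input, no ambiguity codes); the helper names `complement` and `reverse_complement` are chosen here and do not come from the slides.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(dna):
    """Return the base-by-base complement of a DNA string."""
    return dna.upper().translate(PAIRING)

def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return complement(dna)[::-1]

print(complement("GAATTC"))          # CTTAAG
print(reverse_complement("GAATTC"))  # GAATTC
```

Because each strand fully determines the other, either one can serve as the template for repairing its partner, which is the "backup" idea mentioned above.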
RNA - Translation
Genes (less than 5% of all the DNA)
provide the coding information:
instructions for protein synthesis
and regulatory functions.
Redundancy translates to
robustness:
Synonymous codons
Dual strands
Diploid genomes
In translation, the information now
encoded in RNA is deciphered
(translated) into instructions for
making a protein.
Codon: a set of three nucleotides.
Each codon determines which amino
acid is added next to the protein
chain.
For example, GCU, the first codon
in the figure, codes for alanine.
The table gives the nucleotide triplets (codons) and their corresponding amino acids (aa). In RNA, uracil (U) is substituted for
thymine (T). The code is nearly universal across organisms.
The RNA alphabet is A, C, G, and U, e.g. GAAUUC.
The third position of a codon is often insignificant.
ATG (AUG in RNA): start codon; it also codes for methionine.
A T (U) in the middle position gives a hydrophobic aa.
There are 64 possible codons but only 20 aa. Three codons are stop signals, which act as a kind of punctuation or regulatory function; the remaining codons are redundant (synonymous).
Codon table axes: 1st nt position (U, C, A, G), 2nd nt position (U, C, A, G), 3rd nt position (U, C, A, G).
SNP (single nucleotide polymorphism): wobble in the code allows neutral,
synonymous mutations.
Some changes at the third position of a codon, such as the point mutation
shown below, do not change the amino acid sequence or the
protein produced. For example, GCU and GCC both code for alanine, so a
point mutation from U to C at that position makes no difference.
GCUAGGAUCUCAGGCUCA
Point mutation (U to C at the third position)
GCCAGGAUCUCAGGCUCA
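The two sequences above can be checked with a small translation sketch in Python. The table is the standard genetic code written in the usual U, C, A, G ordering, with `*` marking stop codons; the names `CODON_TABLE` and `translate` are illustrative choices, and the function simply reads the string in frame from its first base.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(rna):
    """Translate an RNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))

original = "GCUAGGAUCUCAGGCUCA"
mutated = "GCCAGGAUCUCAGGCUCA"   # U -> C at the third position of codon 1
print(translate(original))
print(translate(mutated))
print(translate(original) == translate(mutated))
```

Both strings translate to the same peptide, so this particular point mutation is synonymous.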
Protein-coding sequences are called exons.
The intervening non-coding DNA segments
are introns. Both introns and exons are
transcribed into pre-mRNA (see next slide) but
only exons remain in the final transcript.
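A toy Python sketch of this step follows. The sequence and the exon coordinates are invented for illustration (real exon boundaries come from annotation or from splice-site signals), and `splice` is a hypothetical helper name.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a transcript.

    exons: list of (start, end) pairs, 0-based, end exclusive,
    given in transcript order. Everything outside them is intron.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: exon 1 + intron + exon 2.
exon1, intron, exon2 = "AUGGCU", "GUAAGUCCAG", "UCAUAA"
pre_mrna = exon1 + intron + exon2
exon_coords = [(0, 6), (16, 22)]

print(splice(pre_mrna, exon_coords))
```

Only the exon segments survive in the result, mirroring the statement that introns are transcribed but removed from the final transcript.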
Reading frames: a double-stranded sequence has 6 possible
reading frames (3 on each strand). Therefore it is important to
know which codon to start translation
with, and where to stop.
http://en.wikipedia.org/wiki/Gene
A protein-coding region that starts with
Met (ATG) and ends with a stop codon (TAA,
TAG, or TGA) is called an open reading frame (ORF).
Splicing of pre-mRNA to
eliminate introns
An example of an ORF:
….TCGAATGGCATTCGCAGTC…………..T
ACTTGCACGCTTGACCGTCATAAGCA….
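Scanning for open reading frames can be sketched as follows in Python. This is a simplified illustration, not a gene finder: it assumes uppercase A, C, G, T input, reports the stretch from the first ATG in a frame to the next in-frame stop, and ignores minimum-length filters. The test sequence is made up, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
PAIRING = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Opposite strand, read 5' to 3'."""
    return dna.translate(PAIRING)[::-1]

def orfs_in_strand(dna):
    """ORFs (ATG ... stop, inclusive) in the three frames of one strand."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                found.append(dna[start:i + 3])  # close it at the stop codon
                start = None
    return found

def find_orfs(dna):
    """ORFs in all six reading frames (three per strand)."""
    return orfs_in_strand(dna) + orfs_in_strand(reverse_complement(dna))

print(find_orfs("CCATGGCATTCTAAGG"))
```

Checking both strands matters because a gene may lie on either one, which is where the six reading frames come from.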
In addition, each of the 20 aa has
different chemical properties, which
cause the protein chains to fold into
different 3D shapes and so determine
their particular functions in the cell.
For example, certain folding patterns
(called tertiary structures) make it
possible for specific enzymes to bind
in a particular place. One change in
the DNA sequence could change an
amino acid, which could change the
protein structure and, in turn, which
enzymes can bind.
A Science Primer http://www.ncbi.nlm.nih.gov/About/primer/est.html
Levels and types of genome variations
Plant genomes may differ from one another in different ways:
http://www.igd.cornell.edu/Comparative%20Genomics
1. Amount of DNA in the nucleus. Quantified in picograms (also called the C-value),
it varies over 1000-fold.
2. Number and size of chromosomes.
3. Differences at the sequence level, both in the absolute order of the bases and in
the type and number of different classes of sequences.
Organisms that diverged millions of years ago from the same ancestral sequence should still share
sequence structure, which reflects their family tree (phylogeny).
Some of the mechanisms of genetic variation:
Point mutations
Insertions and deletions
Translocations
Transposons (mobile "jumping genes") and retrotransposons, which copy
themselves from RNA back to DNA using reverse transcriptase
Splicing, transcription and translation errors
DNA: contains non-genic material
RNA: unstable
cDNA: stable and mainly genes
Finding genes with cDNA: The genetic sequence could
be analyzed directly from the DNA, but DNA contains too much non-genic
"junk" material. One could study just the mRNA instead; however, mRNA and
protein are very unstable and therefore difficult to work
with.
Instead, scientists use special enzymes (reverse transcriptases) to convert RNA into
complementary DNA (cDNA), which is a much more
stable compound. Because it is generated from an
mRNA in which the introns have been removed, cDNA
represents only the expressed DNA sequence, the genes.
Genetic mapping: Also called linkage mapping. It uses the
concepts of Mendelian inheritance and recombination
frequencies to determine the chromosomal location of markers and genes by
analyzing their inheritance patterns. Markers are scored either by Southern
blot (fragments separated by electrophoresis and subsequently
detected by probe hybridization) or, more recently, by
polymerase chain reaction (PCR, which uses thermal cycling)
based methods.
Example: a tomato F2 population used to calculate recombination
frequencies, and genetic distances, between a selection of
simple sequence repeats (SSRs, microsatellites) and
other molecular markers.
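As a rough illustration of how a recombination frequency becomes a genetic distance, here is a Python sketch. It assumes the simplest case, where recombinant offspring can be counted directly (as in a backcross); F2 data normally need a likelihood-based estimate instead. The Haldane mapping function is one common choice and is an assumption here, not something the slide specifies. The counts are invented.

```python
import math

def recombination_frequency(recombinants, total):
    """Fraction of offspring that are recombinant between two markers."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total

def haldane_distance_cM(r):
    """Map distance in centimorgans from recombination frequency r,
    using Haldane's function d = -50 * ln(1 - 2r), valid for 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)

# Hypothetical counts: 12 recombinants among 100 offspring.
r = recombination_frequency(12, 100)
print(r, round(haldane_distance_cM(r), 2))
```

For small r the distance is close to 100 × r; the correction grows as r approaches 0.5, where markers behave as unlinked.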
Comparative mapping: Among related but
sexually incompatible species, heterologous
(between-species) DNA markers can be used to
generate comparative maps and to infer linkage
conservation and the position of orthologous
loci (homologous loci that diverged through speciation). This
requires a minimal amount of sequence similarity between
the target and probe species, and so cannot be
used with more distantly related species. Most
Gramineae genomes (i.e. grass species: maize,
rice, wheat, barley, millet, etc.) are connected
through comparative genetic maps. While
genome size varies dramatically among grass
species, gene content and gene order
remain highly conserved.
Packing of DNA
in the nucleus
http://employees.csbsju.edu/hjakubowski/classes/ch331/DNA/oldnastructure.html
An archaebacterium living in a
superheated sulphur vent at the
bottom of the ocean
A two-ton polar bear roaming the
Arctic Circle
Genome size (length of DNA)
varies from about 5,000 base pairs (SV40 virus)
to 3 × 10^9 (humans) and 10^11 (higher
plants)
All organisms share basic properties:
Made of cells (membrane-enclosed
sacs of chemicals)
Carry out basic reactions (e.g. core
metabolic and developmental
pathways)
Figure 1-38. Genome sizes compared. Genome size is measured in nucleotide pairs of DNA per haploid genome, that is, per single copy of the
genome. (The cells of sexually reproducing organisms such as ourselves are generally diploid: they contain two copies of the genome, one inherited
from the mother, the other from the father.) Closely related organisms can vary widely in the quantity of DNA in their genomes, even though they
contain similar numbers of functionally distinct genes. (Data from W.-H. Li, Molecular Evolution, pp. 380-383. Sunderland, MA: Sinauer, 1997.)
Tree of Life
Three major groups:
Archaea (recently discovered)
Bacteria (germs, blue-green algae, symbiotic organisms)
Eukaryotes: animals, green plants, fungi, protists
Viruses
Figure 1-21. The three major divisions (domains) of the living world. Note that traditionally the word bacteria has been used to refer to
procaryotes in general, but more recently has been redefined to refer to eubacteria specifically. Where there might be ambiguity, we use the term
eubacteria when the narrow meaning is intended. The tree is based on comparisons of the nucleotide sequence of a ribosomal RNA subunit in the
different species. The lengths of the lines represent the numbers of evolutionary changes that have occurred in this molecule in each lineage.