Download Introduction - Computer Science

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

DNA replication wikipedia , lookup

DNA polymerase wikipedia , lookup

Microsatellite wikipedia , lookup

United Kingdom National DNA Database wikipedia , lookup

Helicase wikipedia , lookup

DNA nanotechnology wikipedia , lookup

Replisome wikipedia , lookup

Helitron (biology) wikipedia , lookup

Transcript
Introduction to Bioinformatics
Yana Kortsarts
References: An Introduction to Bioinformatics
Algorithms
bioalgorithms.info
What is Bioinformatics?

Bioinformatics is a relatively new
interdisciplinary field that integrates computer
science, mathematics, biology, and information
technology to manage, analyze, and understand
biological, biochemical and biophysical
information.

Bioinformatics is a computational science and
the subset of larger field of Computational
Biology.
What is Bioinformatics?




Bioinformatics is the use of computers to study
biology
Bioinformatics is the science of using
information to understand biology
Bioinformatics is integration of information
technology (IT) and biology
Bioinformatics is the development of
computational methods for studying structure,
function and evolution of genes, proteins and
whole genomes
Course Curriculum


Ethics, Computing and Genomics
Review of Molecular Biology and
Biochemistry Concepts




DNA and protein structure
Gene expression (transcription and translation)
Molecular Biology Central Dogma
Biological Research on the Web


Public Biological Databases and Data Formats
Searching Biological Databases
Course Curriculum

Introduction to Bioinformatics Algorithms








Sequence alignments, scoring, gaps
Algorithm Design Techniques: Exhaustive Search,
Dynamic Programming
The Needleman and Wunsch Algorithm
The Smith-Waterman Algorithm
Introduction to BLAST
Multiple Sequence Alignment
Phylogenetic Trees
Introduction to Python and Biopython in UNIX
environment
Some Terminology
Cell is a primary unit of life
 Cell consists of molecules, chemical
reactions and a copy of the genome
for that organism
 All life on this planet depends on
three types of molecules: DNA,
RNA and proteins

Some Terminology



DNA
 Holds information on how cell works
RNA
 Acts to transfer short pieces of information to
different parts of cell
 Provide templates to synthesize into protein
Proteins
 Form enzymes that send signals to other cells
and regulate gene activity
 Form body’s major components (e.g. hair, skin,
etc.)
DNA - Deoxyribonucleic Acid



Genetic material
Consists of two long strands
Each strand is made of:
 Phosphates
 Sugar
 Nucleotides
 A (adenine)
 G (guanine)
 C ( cytosine)
 T (thymine)
DNA – Double Helix Structure
Discovery of DNA


DNA Sequences
 Chargaff and Vischer, 1949
 DNA consisting of A, T, G, C
 Adenine, Guanine, Cytosine, Thymine
 Chargaff Rule
 Noticing #A#T and #G#C
 A “strange but possibly meaningless”
phenomenon.
Wow!! A Double Helix
 Watson and Crick, Nature, April 25, 1953


1 Biologist
1 Physics Ph.D. Student
900 words
Nobel Prize
Rich, 1973
 Structural biologist at MIT.
 DNA’s structure in atomic resolution.
Crick
Watson
Watson & Crick – “…the secret of life”




Watson: a zoologist, Crick: a physicist
“In 1947 Crick knew no biology and
practically no organic chemistry or
crystallography..” – www.nobel.se
Applying Chagraff’s rules and the X-ray
image from Rosalind Franklin, they
constructed a “tinkertoy” model showing
the double helix
Watson & Crick with DNA model
Their 1953 Nature paper: “It has not
escaped our notice that the specific pairing
we have postulated immediately suggests
a possible copying mechanism for the
genetic material.”
Rosalind Franklin with X-ray image of DNA
DNA: The Basis of Life

Deoxyribonucleic Acid (DNA)


Double stranded with complementary strands A-T, C-G
DNA is a polymer


Sugar-Phosphate-Base
Bases held together by H bonding to the opposite strand
DNA, continued
Sugar
Phosphate
Base (A,T, C or G)
http://www.bio.miami.edu/dana/104/DNA2.jpg
DNA, continued
DNA has a double helix structure. However,
it is not symmetric. It has a “forward” and
“backward” direction. The ends are labeled 5’
and 3’ after the Carbon atoms in the sugar
component.
5’ AATCGCAAT 3’
3’ TTAGCGTTA 5’
DNA always reads 5’ to 3’ for transcription
replication

Double helix of DNA
The Central Dogma of Molecular
Biology
transcription
DNA

translation
RNA
Protein
Information has been transferred from DNA
(information storage molecule) to RNA
(information transfer molecule) to a specific
protein (a functional, non-coding product)
DNA, RNA, and the Flow of Information
Replication
Transcription
Translation



Central Dogma
(DNARNAprotein)
The paradigm that DNA
directs its transcription
to RNA, which is then
translated into a protein.
Transcription
(DNARNA) The
process which transfers
genetic information from
the DNA to the RNA.
Translation
(RNAprotein) The
process of transforming
RNA to protein as
specified by the genetic
code.
RNA


RNA is similar to DNA chemically. It is usually only
a single strand. T(hyamine) is replaced by U(racil)
Some forms of RNA can form secondary structures
by “pairing up” with itself. This can have change its
properties
dramatically.
DNA and RNA
can pair with
each other.
tRNA linear and 3D view:
http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif
More Terminology

Transcription of DNA




DNA transcribed into RNA
RNA exits as a single-strand unit and as a double-helix
as well
RNA consist of A, C, G and U (uracil)
Types of RNA



Messenger RNA – mRNA
Transfer RNA – tRNA
Ribosomal RNA – rRNA
More Terminology

Translation of Messenger RNA (mRNA):


Proteins:


mRNA is translated into protein
linear polymers built from amino acids
The transfer of information from DNA to specific
protein via RNA takes place according to the
genetic code.



The RNA sequence is divided into blocks of three
letters
This block is called CODON
Each codon corresponds to the specific amino acid
More Terminology





Four different nucleotides are used to build DNA
and RNA molecules – A, G, C, T and A, G, C, U
20 different amino acids are used in protein
synthesis
Four nucleotides can be arranged in 64 different
combinations of three.
There are 64 = 4*4*4 different codons
Some codons are redundant and some have
special function – to terminate the translation
process
Translation



The process of going
from RNA to
polypeptide.
Three base pairs of
RNA (called a codon)
correspond to one
amino acid based on a
fixed table.
Always starts with
Methionine and ends
with a stop codon
Cell Information: Instruction book of Life


DNA, RNA, and
Proteins are examples
of strings written in
either the four-letter
nucleotide of DNA and
RNA (A C G T/U)
or the twenty-letter
amino acid of proteins.
Each amino acid is
coded by 3 nucleotides
called codon. (Leu, Arg,
Met, etc.)
Protein Synthesis: Summary

There are twenty amino
acids, each coded by threebase-sequences in DNA,
called “codons”


The central dogma
describes how proteins
derive from DNA


This code is degenerate
DNA  mRNA  (splicing?)
 protein
The protein adopts a 3D
structure specific to it’s amino
acid arrangement and
function
Proteins



Complex organic molecules made up of amino acid
subunits
20 different kinds of amino acids. Each has a 1 and
3 letter abbreviation.
http://www.ncbi.nlm.nih.gov/Class/MLACours
e/Modules/MolBioReview/iupac_aa_abbreviat
ions.html
Proteins are often enzymes that catalyze reactions.
 Also called “poly-peptides”
*Some other amino acids exist but not in humans.