DNA alphabet
• DNA is the principal constituent of the genome. It
may be regarded as a complex set of instructions
for creating an organism.
• Four different bases appear in DNA, one in each nucleotide:
adenine (A), guanine (G), cytosine (C) and thymine (T).
• Rule for basepairs (bp): A  T, T  A ,
G  C, C  G (four bp configuration). Each
comprises a single piece of information in the DNA
molecule (for the creation of amino acid)
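The pairing rule is easy to state as a lookup table. The minimal sketch below, assuming a hypothetical helper named `pairs_correctly`, checks whether two equal-length strands written one above the other obey the rule at every position. For simplicity it compares the strands position by position as written and ignores strand direction.

```python
# Pairing rule from the slide: A-T, T-A, G-C, C-G.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pairs_correctly(strand: str, partner: str) -> bool:
    """Return True if every position of two equal-length strands
    forms one of the four allowed base pairs (case-insensitive)."""
    if len(strand) != len(partner):
        return False
    return all(PAIR.get(a) == b for a, b in zip(strand.upper(), partner.upper()))


print(pairs_correctly("AGA", "TCT"))  # True
print(pairs_correctly("AGA", "TGT"))  # False: G cannot pair with G
```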
DNA – double helix
Because base pairing is complementary, either strand fully determines the other, so the DNA molecule can be reconstructed from just one of its two strands.
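A short sketch of that reconstruction, assuming a hypothetical function `partner_strand`: since the two strands of the helix run in opposite directions (antiparallel), the partner strand read in the conventional 5'→3' direction is the complement of each base taken in reverse order, often called the reverse complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand: str) -> str:
    """Rebuild the opposite strand from one strand.

    Complements every base and reverses the order, because the two
    strands are antiparallel. Raises KeyError on letters other than A, C, G, T.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(partner_strand("AAAGTCTGAC"))  # GTCAGACTTT
```

Applying `partner_strand` twice returns the original strand, which mirrors the statement that one strand is enough to recover the whole molecule.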
CODONS
• The basic unit of DNA sequence is the base pair (bp). Human
genes typically range in size from thousands to hundreds of
thousands of bp.
• Human DNA comprises approximately 3 billion bp (the Human
Genome Project was the effort to determine the sequence of all
3 billion nucleotide base pairs).
• Three consecutive DNA base pairs form a codon, which codes for
one amino acid (a low-level instruction); for example, the codon
AGA, read on one strand, corresponds to the base pairs A-T, G-C, A-T.
• Sequences of codons specify the order in which amino acids are
assembled into polypeptides and proteins; other DNA sequences code
for functional RNA.
• The products so formed mediate the growth and
development of the organism.
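Reading a sequence codon by codon can be sketched as follows. This is an illustration under stated assumptions: `CODON_TABLE` holds only a handful of the 64 codons of the standard genetic code (not a complete table), codons are read directly from the DNA coding strand rather than modelling transcription to RNA, and `split_codons` and `translate` are hypothetical names.

```python
# Small excerpt of the standard genetic code (DNA codons -> amino acids).
# Deliberately incomplete: codons not listed here translate to "?".
CODON_TABLE = {
    "ATG": "Met", "AGA": "Arg", "AAA": "Lys", "GGC": "Gly",
    "TGG": "Trp", "GCT": "Ala",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def split_codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq: str) -> list[str]:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for codon in split_codons(seq.upper()):
        amino = CODON_TABLE.get(codon, "?")
        if amino == "Stop":
            break
        protein.append(amino)
    return protein


print(translate("ATGAGAAAATGGTAA"))  # ['Met', 'Arg', 'Lys', 'Trp']
```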
DNA SEQUENCE
• A DNA sequence is a succession of letters representing the structure
of a DNA molecule or strand. The possible letters are A, C, G, and T,
representing the four nucleotide subunits of a DNA strand (adenine,
cytosine, guanine, thymine), and typically these are printed abutting
one another without gaps, as in the sequence AAAGTCTGAC. This
coded sequence is sometimes referred to as genetic information. Any
succession of more than four nucleotides may be called a sequence.
• In genetics terminology, DNA sequencing is the process of
determining the nucleotide order of a given DNA fragment.
• The sequence of DNA encodes the necessary information for living
things to survive and reproduce. Determining the sequence is therefore
useful in 'pure' research into why and how organisms live, as well as
in applied subjects.
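Since a DNA sequence is just a string over the letters A, C, G and T, a program can check that a string is a plausible sequence and count its bases. A minimal sketch, assuming hypothetical helpers `is_dna` and `base_counts` and treating lowercase letters as equivalent to uppercase:

```python
from collections import Counter

VALID_BASES = set("ACGT")


def is_dna(seq: str) -> bool:
    """True if the string is non-empty and uses only A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= VALID_BASES


def base_counts(seq: str) -> dict[str, int]:
    """Count each nucleotide, reporting all four letters even if absent."""
    counts = Counter(seq.upper())
    return {base: counts[base] for base in "ACGT"}


print(is_dna("AAAGTCTGAC"))       # True
print(is_dna("AAAGTXTGAC"))       # False
print(base_counts("AAAGTCTGAC"))  # {'A': 4, 'C': 2, 'G': 2, 'T': 2}
```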
String Searching Algorithms
• A DNA or RNA molecule can be represented as a string of
nucleotide letters.
• String searching algorithms find the places where one or
several pattern strings occur within a larger string.
• Naïve string search: The simplest and least
efficient way to see where one string occurs inside
another is to check each place it could be, one by
one, to see if it's there. So, first we see if there's a
copy of the substring in the first few characters of
the text; if not, we look to see if there's a copy
starting at the second character of the text; if not,
we look starting at the third character, and so
forth.
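The naive search described above can be sketched in a few lines; `naive_search` is a hypothetical name. It tries every starting position in turn and compares the pattern character by character, so for a text of length n and a pattern of length m it performs on the order of n·m comparisons in the worst case. Overlapping occurrences are all reported.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based position where pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        # Compare the pattern against the window beginning at `start`.
        j = 0
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(start)
    return hits


print(naive_search("AAAGTCTGAC", "TC"))   # [4]
print(naive_search("GATATATGCA", "ATA"))  # [1, 3]
```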
DNA Sequence alignment
• Sequence alignment is an arrangement of two or more sequences,
highlighting their similarity. The sequences are padded with gaps
(usually denoted by dashes) so that wherever possible, columns
contain identical or similar characters from the sequences involved:
Example:
    tcctctgcctctgccatcat---caaccccaaagt
    |||| ||| ||||| |||||   ||||||||||||
    tcctgtgcatctgcaatcatgggcaaccccaaagt
• Alignment is often used to study how DNA sequences have evolved
from a common ancestor. Mismatches in the alignment correspond to
mutations, and gaps correspond to insertions or deletions.
• The term sequence alignment may also refer to the process of
constructing such alignment or finding significant alignments in a
database of potentially unrelated sequences.
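The slides do not say how an alignment is constructed. One standard technique for global pairwise alignment is Needleman-Wunsch dynamic programming, sketched below with assumed scores of +1 per match, -1 per mismatch and -1 per gap column (a linear gap penalty). `global_align` is a hypothetical name, and the scoring values are illustrative choices, not taken from the source.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with gaps written as '-'.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("AC", "A"))  # (0, 'AC', 'A-')
```

Several alignments can tie for the best score, and the result depends on the scoring values, so running this on the example sequences may not reproduce the hand-drawn alignment exactly.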
BIOINFORMATICS
• Bioinformatics was born of the need for high-powered computing
ability to help organize, analyze, and store biological information;
primarily DNA and protein sequence data.
• In the United States, the main gene sequence database is GenBank,
administered by the National Center for Biotechnology Information (NCBI).
• Besides storing biological information, the database can be used to
help analyze genes, their functions, and evolution.
• A DNA sequence that has been cloned and sequenced can be entered
into a search program called BLAST (Basic Local Alignment Search Tool)
to determine whether
1) it has already been cloned;
2) it is related to an already known gene (if it is a new gene sequence, its
relatedness to other known sequences may help determine its
biological function).
• The BLAST program aligns the query sequence against sequences
in the database and, in each alignment it reports, connects identical
nucleotides with a vertical line. The degree of similarity gives an
estimate of gene relatedness.
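The "connecting line" display can be illustrated with a small sketch that draws `|` under identical columns of two already-aligned sequences and reports percent identity. This is not BLAST's implementation; `match_line` and `percent_identity` are hypothetical helpers, and identity is computed here as identical columns over all alignment columns (including gaps), which is one common definition among several that tools use. The input is the alignment example from the sequence-alignment slide.

```python
def match_line(top: str, bottom: str) -> str:
    """Draw '|' under identical aligned bases; spaces under mismatches and gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "|" if x == y and x != "-" else " "
        for x, y in zip(top.lower(), bottom.lower())
    )


def percent_identity(top: str, bottom: str) -> float:
    """Identical columns as a percentage of all alignment columns."""
    line = match_line(top, bottom)
    if not line:
        return 0.0
    return 100.0 * line.count("|") / len(line)


top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
print(top)
print(match_line(top, bottom))
print(bottom)
print(f"{percent_identity(top, bottom):.1f}% identity")  # 82.9% identity
```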