What is Life?
• life – for living beings, simply their own existence,
• life – a phenomenon that relies on dynamic, self-organizing structures able to multiply and evolve,
• life – a computational process whose algorithm is encoded in DNA memory and executed by the protein engine.
Elementary instructions of protein subroutines
Nature uses 20 standard amino acids to build proteins:
Ala, A – alanine
Arg, R – arginine
Asn, N – asparagine
Asp, D – aspartic acid
Cys, C – cysteine
Gln, Q – glutamine
Glu, E – glutamic acid
Gly, G – glycine
His, H – histidine
Ile, I – isoleucine
Leu, L – leucine
Lys, K – lysine
Met, M – methionine
Phe, F – phenylalanine
Pro, P – proline
Ser, S – serine
Thr, T – threonine
Trp, W – tryptophan
Tyr, Y – tyrosine
Val, V – valine
Program memory – DNA double helix
The two strands run in opposite directions and are held together by complementary base pairs:
• A – T
• C – G
Each living cell possesses the complete genetic information about the organism.
DNA – information coding
Every protein has its gene – a fragment of a DNA strand in which the order of the symbols A, C, T, G describes the sequence of amino acids.
Genetic code – each amino acid is determined by three successive symbols on the DNA strand (a codon).
Transcription → Translation → Folding
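The codon rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table lists just a handful of the 64 codons of the standard genetic code, the function name `translate` is hypothetical, and the coding strand is read directly, skipping the RNA step.

```python
# Partial codon table (standard genetic code, DNA alphabet).
# Assumption: only a few codons are listed, enough for the example below.
CODON_TABLE = {
    "ATG": "M",                          # methionine, also the usual Start codon
    "GCT": "A", "GCC": "A",              # alanine
    "TGG": "W",                          # tryptophan
    "AAA": "K",                          # lysine
    "GGT": "G",                          # glycine
    "TAA": "*", "TAG": "*", "TGA": "*",  # Stop codons
}


def translate(dna):
    """Read a coding-strand DNA string codon by codon until a Stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        amino = CODON_TABLE.get(codon, "?")  # "?" = codon missing from this partial table
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


print(translate("ATGGCTTGGAAATAA"))  # ATG GCT TGG AAA TAA
```

Each three-letter window maps to one one-letter amino acid symbol from the list of 20 given earlier; a complete table would cover all 64 codons.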
DNA – gene regulatory network
[Figure: a single gene as a network element. IN: activation or inhibition acting on the regulatory region; OUT: transcription of the coding region.]
• nodes: genes / protein products
• arcs: regulatory relations
[Figure: a module of the gene regulatory network controlling one stage of sea urchin embryogenesis.]
Replication of DNA
• occurs before every cell division,
• specialized enzymes unwind the double helix and build new strands complementary to the existing ones,
• the template strand is read in the 3'→5' direction, and copying includes error correction,
• two identical (almost …) double helices are made.
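The pairing rule A – T, C – G is enough to derive one strand from the other. A minimal Python sketch follows; the function name `complementary_strand` is hypothetical, and the result is reversed so that it also reads 5'→3', since the two strands run in opposite directions.

```python
# Complementary base pairs of DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Return the complementary strand, written in its own 5'->3' direction."""
    # Reverse first (the strands are antiparallel), then swap each base.
    return "".join(COMPLEMENT[base] for base in reversed(strand))


print(complementary_strand("ATCG"))
```

Applying the function twice returns the original strand, which is the property replication relies on: each old strand is a template from which its partner can be rebuilt.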
Bioinformatic databases
Continually developed collections of information on genetic maps, DNA sequences, proteins, their interactions, spatial structures, functions in the organism, etc.
Entrez: www.ncbi.nlm.nih.gov/Entrez
Ensembl: www.ensembl.org
GenBank – nucleotide sequence database; current size 1.1·10^11 letters in 1.2·10^8 reported sequences; its size doubles every 18 months.
Bioinformatic databases
Data available to the public on the Internet
Sequences comparison
We have learned the sequence of a gene:
AGAGTCAATCCATAG
Question: what is its function?
Clue: check what is already known about the counterparts (homologues) of this gene in other evolutionarily related species.
How to find them? We need a program that searches other known genomes for fragments very similar to the given input (similar enough that evolution could have transformed one into the other).
Sequences comparison
We have two DNA substrings with a common origin. How did evolution transform one into the other?
AGAGTCAATCCATAG
CAGAGGTCCATCATG
Possible histories of the process can be shown as sequence alignments:
-AGAG-TCAATCCATAG
CAGAGGTCCATC-ATG-
or, alternatively:
------AGAGTCAATCCATAG
CAGAGG----TCCATCATG--
Which one is more probable?
Sequences comparison
We introduce costs for editing operations, e.g.:
• character unchanged: 0
• replacement: 2
• indel (insertion or deletion): 3
We take the "cheapest" alignment as the most plausible one.
-AGAG-TCAATCCATAG
CAGAGGTCCATC-ATG-
Cost: 4×3 + 2×2 = 16
------AGAGTCAATCCATAG
CAGAGG----TCCATCATG--
Cost: 12×3 + 4×2 = 44
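The two costs can be checked mechanically. The sketch below scores a given alignment column by column with the costs listed above; the function name `alignment_cost` is hypothetical, and both rows are written with trailing gaps so that they have equal length.

```python
def alignment_cost(top, bottom, replace_cost=2, indel_cost=3):
    """Sum the editing costs over the columns of a gapped alignment."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    cost = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            cost += indel_cost       # insertion or deletion
        elif a != b:
            cost += replace_cost     # replacement
        # identical characters cost nothing
    return cost


print(alignment_cost("-AGAG-TCAATCCATAG", "CAGAGGTCCATC-ATG-"))
print(alignment_cost("------AGAGTCAATCCATAG", "CAGAGG----TCCATCATG--"))
```

The first alignment has 4 indel columns and 2 replacements, the second 12 indel columns and 4 replacements, which reproduces the two totals.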
Sequences comparison
Problem: how to find the "cheapest" alignment of two sequences?
Example: sequences CTG and CCCG.
Alignments correspond to all possible paths Start → Stop.
The problem is reduced to finding the cheapest connection on the map.
[Figure: grid map from Start to Stop, with columns labelled C, T, G and rows labelled C, C, C, G.]
Solution:
-CTG
CCCG
Cost: 5
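The cheapest Start → Stop path can be found by dynamic programming over the grid: each cell stores the cheapest cost of aligning two prefixes. The Python sketch below is one standard way to do this, not necessarily the method used on the slide; the function name `cheapest_alignment_cost` is hypothetical, and the costs (0, 2, 3) are the ones introduced earlier.

```python
def cheapest_alignment_cost(s, t, replace_cost=2, indel_cost=3):
    """Cost of the cheapest alignment of s and t (cheapest path on the grid)."""
    rows, cols = len(s) + 1, len(t) + 1
    # cost[i][j] = cheapest cost of aligning s[:i] with t[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * indel_cost          # only gaps in t
    for j in range(1, cols):
        cost[0][j] = j * indel_cost          # only gaps in s
    for i in range(1, rows):
        for j in range(1, cols):
            step = 0 if s[i - 1] == t[j - 1] else replace_cost
            cost[i][j] = min(
                cost[i - 1][j - 1] + step,       # diagonal: match or replacement
                cost[i - 1][j] + indel_cost,     # gap in t
                cost[i][j - 1] + indel_cost,     # gap in s
            )
    return cost[-1][-1]


print(cheapest_alignment_cost("CTG", "CCCG"))
```

The table has (len(s)+1)·(len(t)+1) cells and each is filled in constant time, so the search avoids enumerating all paths explicitly.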
Sequences comparison
BLAST (Basic Local Alignment Search Tool) – a finder of homologous sequences.
Biological sequence
analysis
The way to understanding …
Gene prediction
Gene finding in DNA – a sophisticated computational problem.
• some (quite diverse) sequences mark the beginning and the end of the gene transcription area ... they are difficult to find,
• a eukaryotic gene is divided into alternating coding and non-coding fragments (the latter called introns), and the introns are cut out of the RNA transcript (splicing) before protein synthesis,
• the genome contains many pseudogenes, i.e. "broken" copies of old genes that have lost the ability to be transcribed – useless relics of evolution.
Gene prediction
• only ~1.5% of the human genome encodes proteins, and ~80% is not related to genes or their regulation.
Open Reading Frame (ORF) in DNA – a sequence of nucleotide triplets beginning with a Start codon and ending with a Stop codon, longer than would be expected by chance. A potentially coding sequence.
A similar issue: finding regulatory sequences and other functional motifs.
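The ORF definition translates into a simple scan. The Python sketch below is a simplified illustration: it uses ATG as the Start codon and TAA, TAG, TGA as Stop codons (the standard genetic code), searches only the given strand in its three reading frames, and uses a hypothetical `min_codons` threshold to stand in for "longer than would be expected by chance". The function name `find_orfs` is also hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of ORFs on the given strand.

    The length of an ORF is counted in codons, including Start and Stop.
    """
    orfs = []
    for frame in range(3):                      # three possible reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first Start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # look for the next ORF in this frame
    return orfs


dna = "CCATGAAATGGTAACC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

A practical gene finder would also scan the complementary strand and choose the length threshold from the statistics of the genome; this sketch does neither.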
Genomic trashcan
• Transposons – "jumping genes" that encode enzymes able to cut them out and move them to another place on a chromosome. They often produce "large" mutations.
• Reverse transcriptase – an enzyme performing "transcription", but from RNA into new double-stranded DNA (which can then be integrated into the genome). It is used by retroviruses, but some retrotransposons (parasitic genes "integrated" with the genome) are also capable of self-replication in cell nuclei.
Phylogenetic analysis
What is the similarity between an archaeologist and a geneticist?
Both like to dig in garbage ☺
– trash left behind by extinct ancestors is a valuable source of information,
– for example, a parasitic sequence (e.g. a transposon) duplicated in the genome leaves many copies of itself in the DNA of descendant species – evidence of common origin.
Phylogenetic analysis
How do we know that whales and dolphins are … even-toed ungulates? ☺
By examining the presence or absence of 20 different repetitive sequences (arrows) in the DNA of a group of present-day species, we see that this information fits only one phylogenetic tree:
Phylogenetic analysis
Retroviruses that have been inactive for millions of years, now integrated with the genomes of modern … monkeys, document human evolution:
Phylogenetic analysis
Phylogenetics based on comparison of homologous gene sequences: among the many possible hypothetical phylogenetic trees we search for the history under which the probability of the observed genes appearing is the greatest.
Number of phylogenetic trees for n species: 3·5·...·(2n–3)
n=5: 105
n=10: ~3.5·10^7
n=15: ~2.1·10^14
n=20: ~8.2·10^21
… hard computer analysis, methods of artificial intelligence…
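The growth of this number is easy to reproduce. The Python sketch below multiplies the odd numbers 3·5·...·(2n–3); the function name `count_trees` is hypothetical, and the product is assumed to count rooted binary trees with n labelled leaves, which is the standard interpretation of this formula.

```python
def count_trees(n):
    """Product 3·5·...·(2n-3): rooted binary trees with n labelled leaves."""
    total = 1
    for k in range(3, 2 * n - 2, 2):   # odd factors 3, 5, ..., 2n-3
        total *= k
    return total


for n in (5, 10, 15, 20):
    print(n, count_trees(n))
```

Python integers have arbitrary precision, so the exact values are printed even for n=20, where the count exceeds 10^21 and exhaustive enumeration is clearly out of reach.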
Phylogenetic analysis
Tree of Life Project
www.tolweb.org/tree/
What is Life?
Professor Donald Knuth, author of "The Art of Computer Programming":
"I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on."
… impossible without relying on computer analysis.