Introduction to Bioinformatics
2. Genetics Background
Course 341
Department of Computing
Imperial College, London
© Simon Colton
Coursework
1 coursework, worth 20 marks
  – Work in pairs
Retrieving information from a database
Using Perl to manipulate that information
The Robot Scientist
Performs experiments
Learns from results
  – Using machine learning
Plans more experiments
Saves time and money
Team member:
  – Stephen Muggleton
Biological Nomenclature
Need to know the meaning of:
  – Species, organism, cell, nucleus, chromosome, DNA
  – Genome, gene, base, residue, protein, amino acid
  – Transcription, translation, messenger RNA
  – Codons, genetic code, evolution, mutation, crossover
  – Polymer, genotype, phenotype, conformation
  – Inheritance, homology, phylogenetic trees
Substructure and Effect (Top Down/Bottom Up)
[Diagram. Top down: Species, Organism, Cell, Nucleus, Chromosome, DNA strand, Gene, Base. Bottom up: Gene prescribes Amino Acid, which folds into Protein, which affects the function of the Cell, which affects the behaviour of the Organism.]
Cells
Basic unit of life
Different types of cell:
  – Skin, brain, red/white blood
  – Different biological function
Cells produced by cells
  – Cell division (mitosis)
  – 2 daughter cells
Eukaryotic cells
  – Have a nucleus
Nucleus and Chromosomes
Each cell has a nucleus
Rod-shaped particles inside
  – Are chromosomes
  – Which we think of in pairs
Different number for each species
  – Human (46), tobacco (48)
  – Goldfish (94), chimp (48)
  – Usually paired up
Sex chromosomes
  – Humans: Male (XY), Female (XX)
  – Birds: the reverse pattern, Male (ZZ), Female (ZW)
DNA Strands
Chromosomes are the same in every cell of an organism
  – Supercoiled DNA (deoxyribonucleic acid)
Take a human, take one cell
  – Determine the structure of all chromosomal DNA
  – You've just read the human genome (for 1 person)
  – Human genome project
13 years, 3.2 billion chemicals (bases) in human genome
Other genomes being/been decoded:
  – Pufferfish, fruit fly, mouse, chicken, yeast, bacteria
DNA Structure
Double helix (Crick & Watson)
  – 2 coiled matching strands
  – Backbone of sugar-phosphate pairs
Nitrogenous base pairs
  – Roughly 20 atoms in a base
  – Adenine pairs with Thymine [A,T]
  – Cytosine pairs with Guanine [C,G]
  – Weak bonds (can be broken)
  – Form long chains called polymers
Read the sequence on 1 strand
  – GATTCATCATGGATCATACTAAC
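Because A always pairs with T and C with G, reading one strand is enough to know the other. A minimal Python sketch of that rule, applied to the sequence on the slide (the name `complement` is illustrative, not from the course):

```python
# Base-pairing rule from the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the matching strand, base by base (hypothetical helper)."""
    return "".join(PAIRS[base] for base in strand)

strand = "GATTCATCATGGATCATACTAAC"
partner = complement(strand)
print(partner)
```

Applying `complement` twice gives back the original strand. The two strands of a real helix run in opposite directions, which this sketch ignores.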
Differences in DNA
DNA differentiates:
  – Species/race/gender
  – Individuals
We share DNA with
  – Primates, mammals
  – Fish, plants, bacteria
Genotype
  – DNA of an individual
  – Genetic constitution
Phenotype
  – Characteristics of the resulting organism
  – Nature and nurture
Genes
Chunks of DNA sequence
  – Between 600 and 1200 bases long
  – 32,000 human genes, 100,000 genes in tulips
Large percentage of human genome
  – Is "junk": does not code for proteins
"Simpler" organisms such as bacteria
  – Are much more evolved (have hardly any junk)
  – Viruses have overlapping genes (zipped/compressed)
Often the active part of a gene is split into exons
  – Separated by introns
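To make the exon/intron split concrete, here is a small Python sketch that joins the exons of a gene and discards the introns. The gene string and the exon positions are invented for illustration (introns are written in lower case only to make them visible); `splice` is a hypothetical helper name.

```python
# Toy gene, invented for illustration: three exons separated by two introns.
gene = "ATGGCC" + "gtaagt" + "TTTGAA" + "gtcag" + "TAA"

# (start, end) positions of each exon, end exclusive; assumed known in advance.
exons = [(0, 6), (12, 18), (23, 26)]

def splice(sequence: str, exon_spans: list[tuple[int, int]]) -> str:
    """Keep only the exon stretches, in order, and join them."""
    return "".join(sequence[start:end] for start, end in exon_spans)

coding = splice(gene, exons)
print(coding)
```

In practice the hard part is finding where the exons are; this sketch assumes the positions are already given.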
The Synthesis of Proteins
Instructions for generating amino acid sequences
  – (i) DNA double helix is unzipped
  – (ii) One strand is transcribed to messenger RNA
  – (iii) RNA acts as a template
      Ribosomes translate the RNA into the sequence of amino acids
Amino acid sequences fold into a 3D molecule
Gene expression
  – Every cell has every gene in it (has all chromosomes)
  – Which ones produce proteins (are expressed) & when?
Transcription
Take one strand of DNA
Write out the counterparts to each base
  – G becomes C (and vice versa)
  – A becomes T (and vice versa)
Change Thymine [T] to Uracil [U]
You have transcribed DNA into messenger RNA
Example:
  Start:        GGATGCCAATG
  Intermediate: CCTACGGTTAC
  Transcribed:  CCUACGGUUAC
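The two-step recipe above (swap each base for its counterpart, then write U for T) can be sketched in a few lines of Python. The function name `transcribe` is an illustrative choice, and the sketch follows the simplified slide procedure rather than modelling the cell's machinery.

```python
# Step 1 of the slide recipe: each base is replaced by its counterpart.
DNA_COMPLEMENT = {"G": "C", "C": "G", "A": "T", "T": "A"}

def transcribe(dna: str) -> str:
    """Turn one DNA strand into messenger RNA, following the slide recipe."""
    intermediate = "".join(DNA_COMPLEMENT[base] for base in dna)
    # Step 2: RNA uses Uracil where DNA has Thymine.
    return intermediate.replace("T", "U")

mrna = transcribe("GGATGCCAATG")
print(mrna)  # CCUACGGUUAC, as in the example above
```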
Genetic Code
How the translation occurs
Think of this as a function:
  – Input: triples of base letters (codons)
  – Output: amino acid
  – Example: ACC becomes threonine (T)
Gene sequences end with:
  – TAA, TAG or TGA
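Treating the genetic code as a function suggests a lookup table from codons to amino acids. The Python sketch below builds the standard table from a compact 64-letter string (one letter per codon, with bases ordered T, C, A, G, and `*` marking a stop). The names `GENETIC_CODE` and `amino_acid` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every three-letter codon (TTT, TTC, ...) to its one-letter amino acid.
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def amino_acid(codon: str) -> str:
    """The genetic code as a function: codon in, amino acid out."""
    return GENETIC_CODE[codon]

print(amino_acid("ACC"))  # T (threonine)
stops = sorted(c for c, aa in GENETIC_CODE.items() if aa == "*")
print(stops)  # ['TAA', 'TAG', 'TGA']
```

The table has 64 entries but only 20 amino acids plus stop, so several codons map to the same output.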
Genetic Code
A=Ala=Alanine
C=Cys=Cysteine
D=Asp=Aspartic acid
E=Glu=Glutamic acid
F=Phe=Phenylalanine
G=Gly=Glycine
H=His=Histidine
I=Ile=Isoleucine
K=Lys=Lysine
L=Leu=Leucine
M=Met=Methionine
N=Asn=Asparagine
P=Pro=Proline
Q=Gln=Glutamine
R=Arg=Arginine
S=Ser=Serine
T=Thr=Threonine
V=Val=Valine
W=Trp=Tryptophan
Y=Tyr=Tyrosine
Example Synthesis
TCGGTGAATCTGTTTGAT
Transcribed to:
  AGCCACUUAGACAAACUA
Translated to:
  SHLDKL
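The whole example can be reproduced by chaining transcription and translation. This Python sketch assumes the simplified slide procedure (complement, then T to U) and the standard genetic code written over the RNA alphabet; the function names are illustrative.

```python
from itertools import product

COMPLEMENT = {"G": "C", "C": "G", "A": "T", "T": "A"}
# Standard genetic code over the RNA alphabet, codons in UCAG order; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}

def transcribe(dna: str) -> str:
    """Complement every base, then write U in place of T."""
    return "".join(COMPLEMENT[b] for b in dna).replace("T", "U")

def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

mrna = transcribe("TCGGTGAATCTGTTTGAT")
protein = translate(mrna)
print(mrna)     # AGCCACUUAGACAAACUA
print(protein)  # SHLDKL
```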
Proteins
DNA codes for
  – Strings of amino acids
Amino acid strings
  – Fold up into complex 3D molecule
  – 3D structures: conformations
  – Between 200 & 400 "residues"
  – Folds are proteins
Residue sequences
  – Always fold to same conformation
Proteins play a part
  – In almost every biological process
Evolution of Genes: Inheritance
Evolution of species
  – Caused by reproduction and survival of the fittest
But actually, it is the genotype which evolves
  – Organism has to live with it (or die before reproduction)
  – Three mechanisms: inheritance, mutation and crossover
Inheritance: properties from parents
  – Embryo has cells with 23 pairs of chromosomes
  – Each pair: 1 chromosome from father, 1 from mother
  – Most important factor in offspring's genetic makeup
Evolution of Genes: Mutation
Genes alter (slightly) during reproduction
  – Caused by errors, from radiation, from toxicity
  – 3 possibilities: deletion, insertion, substitution (alteration)
Deletion: ACGTTGACTC → ACGTGACTC
Insertion: ACGTTGACTC → AGCGTTGACTC
Substitution: ACGTTGACTC → ACGATGACTT
Mutations are almost always deleterious
  – A single change can have a massive effect on translation
  – Can cause a different protein conformation
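The three kinds of mutation are simple string edits. The Python sketch below reproduces the three examples on the slide; the helper names and the zero-based positions are choices made here for illustration.

```python
def delete(seq: str, pos: int) -> str:
    """Remove the base at position pos (counting from 0)."""
    return seq[:pos] + seq[pos + 1:]

def insert(seq: str, pos: int, base: str) -> str:
    """Insert a base so that it ends up at position pos."""
    return seq[:pos] + base + seq[pos:]

def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at position pos with another base."""
    return seq[:pos] + base + seq[pos + 1:]

original = "ACGTTGACTC"
deleted = delete(original, 3)          # ACGTGACTC
inserted = insert(original, 1, "G")    # AGCGTTGACTC
# The slide's substitution example changes two bases (positions 3 and 9).
substituted = substitute(substitute(original, 3, "A"), 9, "T")  # ACGATGACTT
print(deleted, inserted, substituted)
```

A deletion or insertion shifts every later codon by one base, which is one way a single change can alter the whole translation.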
Evolution of Genes: Crossover (Recombination)
DNA sections are swapped
  – From male and female genetic input to offspring DNA
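Swapping sections can be sketched as a single-point crossover on two strings. This is a deliberate simplification: the two toy sequences and the crossover point are invented, and real recombination is not limited to one swap point.

```python
def crossover(maternal: str, paternal: str, point: int) -> tuple[str, str]:
    """Single-point crossover: exchange everything from `point` onwards."""
    child_a = maternal[:point] + paternal[point:]
    child_b = paternal[:point] + maternal[point:]
    return child_a, child_b

# Toy sequences chosen so the swapped sections are easy to see.
child_a, child_b = crossover("AAAAAAAA", "CCCCCCCC", 3)
print(child_a)  # AAACCCCC
print(child_b)  # CCCAAAAA
```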
Bioinformatics Application #1
Phylogenetic trees
Understand our evolution
Genes are homologous
  – If they share a common ancestor
By looking at DNA seqs
  – For particular genes
  – See who evolved from who
Example:
  – Mammoth most related to African or Indian elephants?
LUCA:
  – Last Universal Common Ancestor
  – Roughly 4 billion years ago
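One very rough way to ask "who is most related to whom" is to count the positions at which two versions of a gene differ. The Python sketch below does this for three invented toy sequences (they are not real data from any species); real phylogenetic methods align the sequences first and use models of evolution, which this sketch does not attempt.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

# Invented toy sequences standing in for one gene in three species.
genes = {
    "species_a": "ACGTTGACTC",
    "species_b": "ACGTTGACTT",
    "species_c": "ACGATGTCTT",
}

distances = {
    (p, q): hamming(genes[p], genes[q]) for p, q in combinations(genes, 2)
}
closest = min(distances, key=distances.get)
print(distances)
print(closest)  # the pair with the fewest differences
```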
Genetic Disorders
Disorders have fuelled much genetics research
  – Remember that genes have evolved to function, not to malfunction
Different types of genetic problems
Down's syndrome: three chromosome 21s
Cystic fibrosis:
  – A small mutation (most commonly a three-base deletion) disables a protein
  – Restricts the flow of ions into certain lung cells
  – Lung is less able to expel fluids
Bioinformatics Application #2
Predicting Protein Structure
Proteins fold to set up an active site
  – Small, but highly effective (sub)structure
  – Active site(s) determine the activity of the protein
Remember that translation is a function
  – Always same structure given same sequence of codons
  – Is there a set of rules governing how proteins fold?
  – No one has found one yet
  – "Holy Grail" of bioinformatics
Protein Structure Knowledge
Both protein sequence and structure
  – Are being determined at an exponential rate
1.3+ million protein sequences known
  – Found with projects like Human Genome Project
20,000+ protein structures known
  – Found using techniques like X-ray crystallography
Takes between 1 month and 3 years
  – To determine the structure of a protein
  – Process is getting quicker
Sequence versus Structure
[Chart: number of known protein sequences and known protein structures by year, 1985 to 2000; vertical axis "Number" from 0 to 500,000.]
Database Approaches
Slow(er) rate of finding protein structure
  – Still a good idea to pursue the Holy Grail
Structure is much more conservative than sequence
  – 1.3m genes, but only 2,000 to 10,000 different conformations
First approach to structure prediction:
  – Store [sequence, structure] pairs in a database
  – Find ways to score similarity of residue sequences
  – Given a new sequence, find closest matches
      A good match will possibly mean similar protein shape
      E.g., sequence identity > 35% will give a good match
  – Rest of the first half of the course is about these issues
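The database approach can be sketched in Python as a nearest-match lookup scored by sequence identity. Everything concrete here is an assumption made for illustration: the stored sequences and structure labels are invented, sequences are taken to be already aligned and of equal length, and the 35% cut-off is the rule of thumb from the slide.

```python
def identity(a: str, b: str) -> float:
    """Fraction of positions holding the same residue."""
    if len(a) != len(b):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical [sequence, structure] pairs; labels stand in for real structures.
DATABASE = [
    ("SHLDKLAGVR", "fold_1"),
    ("MKTWYPQNCE", "fold_2"),
    ("SHIDKLGGVK", "fold_3"),
]

def closest_match(query: str, threshold: float = 0.35):
    """Return (structure, score) for the best match, or (None, score) if weak."""
    best_seq, best_structure = max(
        DATABASE, key=lambda pair: identity(query, pair[0])
    )
    score = identity(query, best_seq)
    return (best_structure, score) if score > threshold else (None, score)

structure, score = closest_match("SHLDKLAGVK")
print(structure, score)
```

A query with no close relative in the database, such as `closest_match("MMMMMMMMMM")`, falls below the threshold and returns `None` for the structure.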
Potential (Big) Payoffs of Protein Structure Prediction
Protein function prediction
  – Protein interactions and docking
Rational drug design
  – Inhibit or stimulate protein activity with a drug
Systems biology
  – Putting it all together: "E-cell" and "E-organism"
  – In-silico modelling of biological entities and processes
Further Reading
Human Genome Project at Sanger Centre
  – http://www.sanger.ac.uk/HGP/
Talking glossary of genetic terms
  – http://www.genome.gov/glossary.cfm
Primer on molecular genetics
  – http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/toc.html