CS5263 Bioinformatics
Lecture 2: Introduction to molecular biology

Polymer    Monomer
DNA        Deoxyribonucleotides
RNA        Ribonucleotides
Protein    Amino acids

DNA
• DNA forms the genetic material of all living organisms
• A string over the alphabet {A, C, G, T}
  – e.g. ACAGAACGTAGTGCCGTGAGCG
• Each letter is called a base
  – Each base corresponds to one deoxyribonucleotide

DNA
[Figure: a single DNA strand drawn from 5’ to 3’ with bases A, G, C, G, A, C, T, G, written 5’-AGCGACTG-3’ or simply AGCGACTG. Each nucleotide is made of a phosphate, a sugar (carbons numbered 1 to 5), and a base.]
• Many biological processes go from 5’ to 3’, e.g. DNA replication, transcription, etc.

[Figure: the two strands of a DNA molecule]
• Base pairs: A=T, G=C
• Forward (+) strand:  5’-AGCGACTG-3’
• Backward (-) strand: 3’-TCGCTGAC-5’
• One strand is said to be reverse-complementary to the other

DNA double helix
[Figure: DNA double helix]

RNA
• Carries information from DNA to protein
  – Other functions have been found
• A string over the alphabet {A, C, G, U}
  – e.g. ACAGAACGUAGUGCCGUGAGCG
• Each letter is called a base
  – Each base corresponds to one ribonucleotide

RNA
[Figure: a single RNA strand drawn from 5’ to 3’ with bases A, G, U, G, A, C, U, G, written 5’-AGUGACUG-3’ or simply AGUGACUG. Each nucleotide is made of a phosphate, a sugar (carbons numbered 1 to 5), and a base.]
• Many biological processes go from 5’ to 3’, e.g. transcription.

RNA secondary structures
• RNAs are normally single-stranded
• Can form complex structures by self-base-pairing
• A=U, C=G

Protein
• The actual “worker” for almost all processes in the cell
• A string built from 20 letters
  – e.g. MGDVEKGKKIFIMKCSQCHTVEKGGKH
• Each letter is called an amino acid

Protein zoom-in
• Composed of a chain of amino acids.
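The string view of DNA, RNA, and protein used so far, together with the reverse-complement relationship between the two DNA strands, can be sketched in a few lines of Python. This is only an illustrative sketch: the names `is_over` and `reverse_complement` are made up for this example and are not part of the lecture.

```python
# Alphabets as described in the slides (the 20 one-letter amino acid codes
# are A-Y without B, J, O, U, X).
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")

# Watson-Crick base pairs: A=T, G=C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_over(seq, alphabet):
    """Return True if seq is a non-empty string using only letters of alphabet."""
    return len(seq) > 0 and set(seq) <= alphabet


def reverse_complement(dna):
    """Return the opposite strand of dna, also written 5' to 3'."""
    if not is_over(dna, DNA_ALPHABET):
        raise ValueError("expected a non-empty string over {A, C, G, T}")
    # Complement every base, then reverse so the result reads 5' -> 3'.
    return "".join(COMPLEMENT[base] for base in reversed(dna))


if __name__ == "__main__":
    forward = "AGCGACTG"                      # 5'-AGCGACTG-3'
    backward = reverse_complement(forward)    # 5'-CAGTCGCT-3'
    print(backward)                           # CAGTCGCT
    print(backward[::-1])                     # TCGCTGAC, i.e. 3'-TCGCTGAC-5'
```

Reading the result backwards gives 3’-TCGCTGAC-5’, the backward strand exactly as it is drawn under the forward strand in the slide.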
Amino acid

            R            (side chain)
            |
     H2N -- C -- COOH
            |
            H
  (amino group)  (carboxyl group)

• 20 amino acids, which differ only in their side chains
  – Each can be written with three letters
  – Or a single letter: A-Y, except B, J, O, U, X
  – Alanine = Ala = A
  – Arginine = Arg = R
  – Asparagine = Asn = N
  – Lysine = Lys = K

Amino acids => peptide

            R                        R
            |                        |
     H2N -- C -- COOH    +    H2N -- C -- COOH
            |                        |
            H                        H

                   R               R
                   |               |
     =>     H2N -- C -- CO--NH --  C -- COOH
                   |               |
                   H               H
                     (CO--NH: peptide bond)

Protein

     H2N - R - R - R - R - R - … - COOH
     N-terminal                    C-terminal

• Has an orientation
• Usually recorded from N-terminal to C-terminal
• Peptide vs protein: basically the same thing
• Conventions
  – A peptide is shorter (< 50 aa), while a protein is longer
  – Peptide refers to the sequence, while protein has 2D/3D structure

Protein structure
• The linear sequence of amino acids folds to form a complex 3-D structure.
• The structure of a protein is intimately connected to its function.

Genome and chromosome
• Genome: the complete DNA sequence of an organism
  – May contain one (in prokaryotes) or more (in eukaryotes) chromosomes
• Chromosome: a single large DNA molecule in an organism
  – May be circular or linear
  – Contains genes as well as “junk DNA”
  – Highly packed!

Formation of chromosome
[Figure: formation of a chromosome; the packed chromosome is 50,000 times shorter than the extended DNA]

Gene
• Gene: unit of heredity in living organisms
  – A segment of DNA with information to make a protein

Some statistics

                    Chromosomes   Bases         Genes
  Human             46            3 billion     20k-25k
  Dog               78            2.4 billion   ~20k
  Corn              20            2.5 billion   50-60k
  Yeast             16            20 million    ~7k
  E. coli           1             4 million     ~4k
  Marbled lungfish  ?             130 billion   ?

Human genome
• 46 chromosomes: 22 pairs + 2 sex chromosomes (X, Y)
• In each pair: 1 from mother, 1 from father
• Female: X + X
• Male: X + Y

Human genome
• Every cell contains the same genomic information
  – Except sperm and egg cells
  – They only contain half of the genome
• Otherwise your children would have 46 + 46 chromosomes
• How does biology achieve that?
Cell division: meiosis
• A reproductive cell divides into four cells, each containing only half of the genome
  – Diploid => haploid
• Two haploid cells (sperm + egg) form a zygote
  – Which will then develop into a multi-cellular organism by mitosis

Cell division: mitosis
• A cell duplicates its genome and divides into two identical cells
• These cells build up different parts of your body

Central dogma of molecular biology
[Figure: central dogma of molecular biology]
• DNA replication is critical in both mitosis and meiosis

DNA replication
• The process of copying a double-stranded DNA molecule
  – Semi-conservative

               5’-ACATGATAA-3’
               3’-TGTACTATT-5’
                      |
                      v
     5’-ACATGATAA-3’      5’-ACATGATAA-3’
     3’-TGTACTATT-5’      3’-TGTACTATT-5’

• Mutation: changes in DNA base pairs
• Proofreading and error-correcting mechanisms exist to ensure extremely high fidelity

DNA synthesis
• Creating DNA synthetically in a laboratory
• Chemical synthesis
  – Chemical reactions
  – Arbitrary sequences
  – Maximum length 160-200 bases
• Cloning: make copies based on a DNA template
  – Biological reactions
  – Requires a template
  – Many copies of a long DNA in a short time

In vivo cloning
• Connect a piece of DNA to bacterial DNA, which can then be replicated together with the host DNA

In vitro cloning
• Polymerase chain reaction (PCR)
[Figure: PCR cycle: the double strand is denatured, primers (< 30 bases) bind, and DNA polymerase extends them using dNTPs]

              Chemical synthesis   In vivo cloning                 In vitro cloning
  Reaction    Chemical             Biological                      Biological
  Template    No                   Yes                             Yes
  Speed       Fast                 Varies (relies on host cell)    Fast
  Length      Very short           Long                            Medium

Some terms
• Denaturation: a DNA double strand is separated into two strands
  – By raising the temperature
• Renaturation: the process by which two denatured DNA strands re-form a double strand
  – By cooling down slowly
• Hybridization: two heterogeneous DNAs form a double strand
  – May have mismatches
  – The rationale behind many molecular biology techniques, including the DNA microarray

Central dogma of molecular biology
[Figure: central dogma of molecular biology]

Transcription
• The process by which a DNA sequence is copied to produce a complementary RNA
  – Called messenger RNA (mRNA) if the RNA carries instructions on how to make a protein
  – Called non-coding RNA if the RNA does not carry instructions on how to make a protein
  – Only consider mRNA for now
• Similar to replication, but
  – Only one strand is copied

Transcription
• DNA-RNA pairs: A=U, C=G, T=A, G=C

  Coding strand:    5’-ACGTAGACGTATAGAGCCTAG-3’   (where genetic information is stored)
  Template strand:  3’-TGCATCTGCATATCTCGGATC-5’   (for making mRNA)
  mRNA:             5’-ACGUAGACGUAUAGAGCCUAG-3’

• The coding strand and the mRNA have the same sequence, except that T’s in DNA are replaced by U’s in mRNA.

The genetic code
• There are four bases in DNA (A, C, G, T), and four in RNA (A, C, G, U), but 20 amino acids in protein
• How are amino acids encoded in mRNA?
  – 4^1 = 4: one base per amino acid is too few
  – 4^2 = 16: two bases are still too few
  – 4^3 = 64: three bases are enough to cover 20 amino acids
• The actual genetic code used by the cell is a triplet code.
  – Each triplet is called a codon
  – Redundancy: 64 codons for 20 amino acids, so most amino acids have several codons
  – Universal

The genetic code
[Figure: the genetic code table, indexed by first, second, and third letter of the codon]

Translation
• The sequence of codons is translated to a sequence of amino acids
• Gene:    - GCT TGT TTA CGA ATT -
• mRNA:    - GCU UGU UUA CGA AUU -
• Peptide: - Ala - Cys - Leu - Arg - Ile -
  – Start codon: AUG
  – Also codes for Met
  – Stop codons: UGA, UAA, UAG

Translation
• Transfer RNA (tRNA): a different type of RNA.
  – Floats freely in the cytoplasm.
  – Every amino acid has its own type of tRNA that binds to it alone.
• Anticodon-codon binding is crucial.
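The transcription and translation rules above can be sketched in Python. This is a minimal sketch under stated assumptions: it uses the standard genetic code (built from the usual compact 64-letter listing, with `*` marking stop codons), and the function names `transcribe` and `translate` are chosen for this example only. It reads codons from the first base and does not search for a start codon.

```python
from itertools import product

# Standard genetic code, listed with the codon bases in the order U, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with U
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """mRNA has the same sequence as the coding strand, with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon: UAA, UAG, or UGA
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    gene = "GCTTGTTTACGAATT"           # GCT TGT TTA CGA ATT from the slide
    mrna = transcribe(gene)
    print(mrna)                        # GCUUGUUUACGAAUU
    print(translate(mrna))             # ACLRI = Ala-Cys-Leu-Arg-Ile
```

The table has 4^3 = 64 entries for 20 amino acids plus stop, which is the redundancy mentioned in the slide.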
tRNA
[Figures: tRNA]

More complexity
[Figure: a promoter upstream of a gene, with a transcription factor and RNA polymerase bound near the transcription starting site]
• RNA polymerase binds to a certain location on the promoter to initiate transcription
• Transcription factors bind to specific sequences on the promoter to regulate transcription
  – Recruit RNA polymerase: induce
  – Block RNA polymerase: repress
  – Multiple transcription factors may coordinate

More complexity
[Figure: promoter, transcription starting site, and gene; transcription of the gene produces a pre-mRNA]
• Pre-mRNA needs to be “edited” to form mature mRNA
[Figure: a pre-mRNA with a 5’ UTR, exons separated by introns, and a 3’ UTR; splicing yields the mature mRNA, whose open reading frame (ORF) runs from the start codon to the stop codon]

DNA sequencing: Basic idea
• PCR primer extension

     5’-TTACAGGTCCATACTA
     3’-AATGTCCAGGTATGATACATAGG-5’

• We need to supply A, C, G, T for the synthesis to continue
• Besides A, C, G, T, we add some A*, C*, G*, and T*
  – Very similar to A, C, G, T in all aspects, except that
  – Extension stops as soon as one of them is incorporated

DNA sequencing, cont.
[Figures: DNA sequencing, continued]
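The primer-extension idea can be sketched as a small, idealized simulation in Python. It rests on assumptions the text above does not spell out: every possible stopping position is observed exactly once, and fragments are then ordered by length, as in standard chain-termination (Sanger) sequencing. The function names are hypothetical and chosen only for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extension_product(template_3to5, primer):
    """Full strand synthesized 5'->3' along a template written 3'->5'."""
    full = "".join(COMPLEMENT[base] for base in template_3to5)
    if not full.startswith(primer):
        raise ValueError("primer does not pair with the 3' end of the template")
    return full


def terminated_lengths(template_3to5, primer):
    """For each terminator A*, C*, G*, T*, list the fragment lengths it produces.

    Idealized: extension may stop at every position after the primer, and the
    fragment ending at position n ends with the starred version of base n.
    """
    full = extension_product(template_3to5, primer)
    lengths = {base: [] for base in "ACGT"}
    for n in range(len(primer) + 1, len(full) + 1):
        lengths[full[n - 1]].append(n)
    return lengths


def read_sequence(lengths):
    """Recover the newly synthesized bases by sorting fragments by length."""
    base_at_length = {n: base for base, ns in lengths.items() for n in ns}
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    template = "AATGTCCAGGTATGATACATAGG"   # 3'-...-5', from the slide
    primer = "TTACAGGTCCATACTA"            # 5'-...-3', from the slide
    lengths = terminated_lengths(template, primer)
    print(lengths)                 # fragment lengths grouped by terminating base
    print(read_sequence(lengths))  # TGTATCC, the bases added after the primer
```

Reading the fragments from shortest to longest gives the bases added after the primer, which is the complement of the unread part of the template.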