BINF6201/8201 Basics of Molecular Biology
08-26-2016

Linear structure of nucleic acids

- Nucleic acids are polymers of nucleotides.
- There are two kinds of nucleic acids: deoxyribonucleic acids (DNA) and ribonucleic acids (RNA).
- A nucleotide consists of a phosphate and a nucleoside; a nucleoside consists of a sugar (ribose or deoxyribose) and a base.
- Bases:
  • Purines: adenine (A), guanine (G)
  • Pyrimidines: thymine (T, in DNA), uracil (U, in RNA), cytosine (C)

Mono-, di-, and tri-phosphate nucleotides

       Base           Nucleoside        Nucleotide (mono-, di-, tri-phosphate)
DNA    Adenine (A)    Deoxyadenosine    dAMP    dADP    dATP
       Guanine (G)    Deoxyguanosine    dGMP    dGDP    dGTP
       Cytosine (C)   Deoxycytidine     dCMP    dCDP    dCTP
       Thymine (T)    Deoxythymidine    dTMP    dTDP    dTTP
RNA    Adenine (A)    Adenosine         AMP     ADP     ATP
       Guanine (G)    Guanosine         GMP     GDP     GTP
       Cytosine (C)   Cytidine          CMP     CDP     CTP
       Uracil (U)     Uridine           UMP     UDP     UTP

The pairing rule of the bases in nucleic acids: A-T/U and G-C

- A-T/U pairing forms two hydrogen bonds (the weaker pair).
- G-C pairing forms three hydrogen bonds (the stronger pair).
- Therefore, G-C pairing is more stable than A-T/U pairing.

The double helical structure of DNA

- Two complementary DNA strands run antiparallel and are coiled around each other, forming a double helix.
- There are two grooves on the surface of the double helix: the major groove and the minor groove.
- Regulatory molecules bind to DNA in these grooves, changing the structure and function of the DNA.
- Cytosine residues in some regions of DNA can be modified by methylation, thereby changing the functional state of those regions.

Higher-level structures of DNA

- In eukaryotic cells, DNA molecules are highly compacted by wrapping around a histone protein core, forming nucleosomes.
- The histone core is made up of 2 copies of each of the four histone proteins (H2A, H2B, H3 and H4).
- Nucleosomes are further coiled to form supercoils.
- The N-terminal tails of histones can be modified by methylation (of lysine or arginine) or acetylation (of lysine), which controls whether chromatin is open or closed, and thus its function.

[Figure: 3D structure of a nucleosome]

Structure of RNAs

- RNAs are generally single-stranded.
- However, complementary regions within an RNA molecule can pair to form local double-stranded structures, leaving the non-complementary regions as loops.

[Figure: structure of a tRNA-Ala, with the 5' and 3' ends labeled]

- There are at least four major functional types of RNAs:
  1. Messenger RNA (mRNA)
  2. Transfer RNA (tRNA)
  3. Ribosomal RNA (rRNA)
  4. Small regulatory RNAs, e.g., micro RNA (miRNA) and small interfering RNA (siRNA)

Protein structure

- Proteins are polymers of amino acids linked by peptide bonds.
- There are twenty standard amino acids found in proteins.
- Amino acids differ in their side chains (R groups).
- The linear sequence of amino acids in a protein is called its primary structure.

Classification of amino acids according to the structure of side chains

[Figure: classification of amino acids according to the structure of their side chains]

Higher-level structures of proteins

- Secondary structure: the way stretches of the linear amino acid sequence fold into regular local structures, namely α-helices and β-sheets.

[Figure: α-helix and β-sheet]

- Tertiary structure: the way a polypeptide chain folds into a specific 3D structure that supports a specific function.
- Prediction of the 3D structure of a protein from its sequence is a challenging problem in computational biology.

The Central Dogma of Molecular Biology

- Genetic information is stored in DNA and passed from DNA to RNA to protein.

[Diagram: DNA -> (transcription) -> mRNA -> (translation) -> protein; replication copies DNA to DNA; reverse transcription copies RNA back to DNA]

What is a gene?

- A gene is a segment of DNA that contains the information necessary to make a functional RNA or polypeptide molecule.
- According to this definition, a gene includes the transcribed sequence and the non-transcribed regulatory sequences that control the transcription and translation of the gene product.
- Genes can be classified as protein-coding genes and RNA-specifying genes.
- In bioinformatics and computational biology, a gene often refers only to the DNA sequence that specifies the sequence of a protein (an open reading frame, ORF) or of an RNA molecule; its regulatory sequences are treated separately.

Structure of genes in prokaryotes

- Adjacent genes with the same orientation in prokaryotes can be transcribed together as a single unit, called an operon.
- A typical operon contains the following elements:
  1. Open reading frames;
  2. Upstream regulatory elements;
  3. A downstream transcriptional terminator.

[Figure: layout of an operon, showing the upstream regulatory region with a TF binding site (around -300), the promoter region (-35 and -10), the TSS (+1), the ribosome binding site, and the terminator]

- Prediction of genes (ORFs) in prokaryotes has reached high accuracy using machine learning algorithms.

Structure of genes in eukaryotes

- Due to the complexity of gene structures in eukaryotes, accurate prediction of genes in these organisms is still a challenging problem.

DNA replication

- Chromosomes are replicated before each cell division.
- DNA replication is semi-conservative: each of the two resulting DNA molecules contains one original strand and one newly synthesized complementary strand.
- The leading strand is synthesized continuously, while the lagging strand is produced in fragments (Okazaki fragments), which are later joined.
- Major enzymes involved:
  1. Primase for the synthesis of RNA primers;
  2. DNA polymerase III for extension;
  3. DNA polymerase I for the excision of primers and filling the gaps;
  4. Ligase for joining fragments.
- Although both polymerase III and polymerase I can proofread, incorrectly paired bases can still be incorporated, which is a major source of mutations.

Transcription

- Transcription is catalyzed by RNA polymerase using one of the DNA strands as the template; this strand is called the template strand or non-coding strand.
- The opposite strand is called the non-template strand or coding strand, because it has the same sequence as the transcribed RNA, except that each T in the DNA appears as a U in the RNA.
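The relationship between the coding strand, the template strand, and the transcript can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function names and the example sequence are hypothetical, and both strands are assumed to be written 5' to 3' using only A, C, G and T.

```python
# Complement of each DNA base, used to derive one strand from the other.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand of `dna`, also written 5' to 3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def transcribe_from_coding(coding: str) -> str:
    """The RNA has the coding-strand sequence with every T replaced by U."""
    return coding.upper().replace("T", "U")


def transcribe_from_template(template: str) -> str:
    """RNA polymerase reads the template strand, so the RNA is the
    reverse complement of the template, with U in place of T."""
    return transcribe_from_coding(reverse_complement(template))


coding_strand = "ATGGCCTAA"  # hypothetical example sequence
template_strand = reverse_complement(coding_strand)
rna = transcribe_from_template(template_strand)

print("coding strand:  ", coding_strand)
print("template strand:", template_strand)
print("RNA transcript: ", rna)
```

For this example, transcribing from the template strand and substituting U for T in the coding strand give the same transcript, AUGGCCUAA, which is why sequence databases usually report the coding strand.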
[Figure: the coding strand and the non-coding strand during transcription]

- Transcription is controlled by the interaction of trans-acting factors, called transcription factors (TFs), with cis-acting elements in the DNA.
- Prediction of cis-acting elements or TF binding sites is a challenging problem in computational biology.

Regulation of transcription in prokaryotes

[Figure: transcription factors TF1 and TF2 bound at a TF binding site (around -300) upstream of the promoter region (-35 and -10), with the α, β and σ subunits of RNA polymerase at the promoter; transcription from the TSS (+1) to the terminator produces an RNA with a 5' UTR, a ribosome binding site, and a 3' UTR]

RNA processing in eukaryotes

- A "cap" is added to the 5' end, consisting of a methylated guanosine and cap-binding proteins.
- A string of about 200 adenosines is added to the 3' end. This poly-A tail is bound by poly-A binding proteins.
- Splicing: introns are cut out, and exons are joined.
  • A transcript can be spliced in more than one way, generating different mRNAs (alternative splicing), so one gene can code for many proteins.
  • Splicing can be mediated by the spliceosome or by the RNA itself.
  • Prediction of alternative splicing sites is a challenging problem in computational biology.

Translation

- Translation starts with the association of a ribosome with the ribosome binding site in the mRNA molecule. The following components are involved:
  1. Ribosome: consists of a small and a large subunit, each composed of rRNA and dozens of protein molecules.
  2. tRNA: carries a specific amino acid and recognizes a codon using its anticodon through base pairing.
  3. Aminoacyl-tRNA synthetase: attaches an amino acid to its tRNA.
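Codon recognition by a tRNA anticodon can be illustrated with a small Python sketch. It assumes strict Watson-Crick pairing (A-U and G-C) and deliberately ignores wobble pairing; the function names are hypothetical, and both codon and anticodon are written 5' to 3'.

```python
# Watson-Crick complement of each RNA base.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anticodon_for(codon: str) -> str:
    """Return the anticodon (5' to 3') that pairs with `codon` (5' to 3').

    The two RNAs pair antiparallel, so the anticodon is the reverse
    complement of the codon. Wobble pairing is not modeled here.
    """
    codon = codon.upper()
    if len(codon) != 3 or set(codon) - set("ACGU"):
        raise ValueError(f"not an RNA codon: {codon!r}")
    return codon.translate(RNA_COMPLEMENT)[::-1]


def pairs_with(codon: str, anticodon: str) -> bool:
    """True if `anticodon` matches `codon` under strict base pairing."""
    return anticodon.upper() == anticodon_for(codon)


print(anticodon_for("AUG"))      # anticodon pairing with the start codon
print(pairs_with("GCU", "AGC"))  # an alanine codon and a matching anticodon
```

Under these assumptions the start codon AUG pairs with the anticodon CAU, and the alanine codon GCU pairs with AGC. Real tRNAs can often read more than one codon because pairing at the third codon position is less strict.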
Transcription and translation are two highly coupled processes in prokaryotes

[Figure: a prokaryotic gene with its TF binding site (around -300), promoter region (-35 and -10), TSS (+1), ribosome binding site, and terminator; ribosomes bind the ribosome binding site of the RNA (between the 5' UTR and 3' UTR) and translate it into proteins while transcription is still in progress]

Standard genetic codons

- There are 61 sense codons and 3 nonsense (stop) codons.
- Degeneracy of codons: most amino acids are specified by more than one codon.
- Some codons for the same amino acid are used more frequently than others, a phenomenon called codon bias.
- Mutations in the 1st and 2nd nucleotides of a codon often change the amino acid, while a mutation in the 3rd nucleotide often does not; the 3rd nucleotide is therefore called the wobble base.

http://www.nature.com/scitable/content/The-genetic-code-consists-of-64-codons-42614
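The codon counts and the wobble-position claim above can be checked directly against the standard genetic code. The sketch below is an illustration, not part of the original slides: it builds the standard table from the usual one-letter amino acid string in UCAG order (with `*` marking stop codons), and the helper name `synonymous_fraction` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]

# Degeneracy: number of codons that specify each amino acid.
degeneracy = Counter(CODON_TABLE[c] for c in sense_codons)


def synonymous_fraction(position: int) -> float:
    """Fraction of single-base changes at `position` (0, 1 or 2) of a
    sense codon that leave the encoded amino acid unchanged."""
    same = total = 0
    for codon in sense_codons:
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == CODON_TABLE[codon]
    return same / total


fractions = [synonymous_fraction(p) for p in range(3)]

print(len(sense_codons), "sense codons;", "stop codons:", stop_codons)
print("codons per amino acid:", dict(degeneracy))
for pos, frac in enumerate(fractions, start=1):
    print(f"position {pos}: {frac:.2f} of single-base changes are synonymous")
```

Running it should report 61 sense codons and the stop codons UAA, UAG and UGA, with the third position showing by far the largest share of synonymous changes. Codon bias, by contrast, cannot be derived from the table alone; it has to be measured from the codon frequencies in the genes of a particular organism.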