Molecular Biology Primer for CS and
engineering students
Alan Qi
January, 2010
CS69000 Advanced Bioinformatics
• Content description:
This seminar course will include
lectures by the instructor, presentations of readings by
students, and guest lectures by invited speakers. Students will
be expected to present at least one major paper during the
term. This course covers statistical learning algorithms for
computational and systems biology and studies important
research problems in these areas, with a focus on biological
network modeling and analysis.
Topics
• Biological problems including motif discovery, gene expression
analysis, biological network reconstruction, network analysis,
and phylogenetics.
• Fundamental algorithmic techniques including hidden Markov
models, exponential-family random graph models (ERGM),
hierarchical clustering by hierarchical Dirichlet processes,
generalized belief propagation, Markov networks, structure
learning for network models, and approximate inference on
graphical models.
Grading policy
• Class participation: 15%
• Class presentations: 25%
• Paper evaluations: 30%
• Research project: 30%
– Preliminary report: 5%
– Class presentation: 5%
– Final report: 20%
Project report
• You are encouraged to collaborate on the
project. We expect a four-page write-up about
the project, which should clearly and
succinctly describe the project goal, methods,
and your results. Each group should submit
only one copy of the write-up and describe
the contributions of each group member to
the project. A two-person group will have 6
pages, a three-person group will have 8 pages,
and so on.
Central Dogma
DNA → RNA → Protein
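The DNA → RNA step can be pictured as simple string manipulation. This is a simplified sketch, not a model of the biochemistry: the function names are hypothetical, both strands are assumed to be written 5'→3', and real transcription also depends on promoters, RNA polymerase and (in eukaryotes) later processing of the pre-mRNA.

```python
# Minimal sketch of transcription (DNA -> mRNA), assuming sequences are
# plain 5'->3' strings over A, C, G, T with no ambiguity codes.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (read 5'->3')."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe_coding_strand(coding: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding.upper().replace("T", "U")


def transcribe_template_strand(template: str) -> str:
    """mRNA is complementary to the template strand, so reverse-complement it first."""
    return transcribe_coding_strand(reverse_complement(template))


print(transcribe_coding_strand("ATGGCC"))    # AUGGCC
print(transcribe_template_strand("GGCCAT"))  # AUGGCC
```

Both calls yield the same mRNA because "GGCCAT" is the template strand paired with the coding strand "ATGGCC".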
Genes control the making of cell parts
• The gene is a fundamental unit of inheritance
– The genome (e.g., in humans) contains tens of thousands of genes
– Each gene governs the making of one functional element, one
“part” of the cell machine
– Every time a “part” must be made, a piece of the genome is
copied, transported, and used as a blueprint
• RNA is a temporary copy
– The medium for transporting genetic information from the DNA
information repository to the protein-making machinery is an
RNA molecule
– The more of a part is needed, the more copies are made
– Each mRNA lasts only a limited time before it is degraded
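The "more copies" and "limited lifetime" points can be tied together with a standard first-order model of mRNA abundance. This is a sketch under assumptions the slide does not state: a constant transcription rate `k`, first-order degradation with rate constant `gamma`, deterministic dynamics, and made-up parameter values; `simulate_mrna` is a hypothetical helper.

```python
# Minimal sketch, assuming a constant transcription rate k (copies/min) and
# first-order degradation with rate constant gamma (1/min):
#     dm/dt = k - gamma * m
# The parameter values below are made up for illustration.
import math


def simulate_mrna(k: float, gamma: float, t_end: float, dt: float = 0.01) -> float:
    """Integrate dm/dt = k - gamma*m with forward Euler, starting from m(0) = 0."""
    m = 0.0
    for _ in range(int(round(t_end / dt))):
        m += dt * (k - gamma * m)
    return m


k, gamma = 2.0, 0.1             # hypothetical rates
half_life = math.log(2) / gamma  # time for half of the existing copies to decay
steady_state = k / gamma         # copy number at which production balances decay

m_late = simulate_mrna(k, gamma, t_end=200.0)
print(f"half-life ~ {half_life:.1f} min, steady state = {steady_state:.0f} copies")
print(f"simulated after 200 min: {m_late:.1f} copies")
```

Under these assumptions the copy number settles at k/gamma: doubling the transcription rate doubles the number of copies, while a shorter mRNA lifetime (larger gamma) lowers it.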
RNA: messenger
From pre-mRNA to mRNA: Splicing
• In some organisms (e.g., eukaryotes), not every part
of a gene is coding
– Functional exons interrupted by non-translated
introns
– During pre-mRNA maturation, introns are
spliced out
– In humans, a primary transcript can be ~10^6 bp
long
– Alternative splicing can yield different exon
subsets for the same gene, and hence different
protein products
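Splicing and alternative splicing amount to keeping a chosen subset of exon segments and concatenating them in order. The sketch below assumes hypothetical 0-based, half-open exon coordinates and a made-up toy pre-mRNA; it ignores how the spliceosome actually recognizes intron boundaries.

```python
# Minimal sketch of splicing, assuming exons are given as hypothetical
# 0-based, half-open (start, end) coordinates on the pre-mRNA.
from typing import List, Tuple

Exon = Tuple[int, int]


def splice(pre_mrna: str, exons: List[Exon]) -> str:
    """Join the exon segments in transcript order; everything else (introns) is dropped."""
    ordered = sorted(exons)
    for (_, e1), (s2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:
            raise ValueError("exons must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Toy pre-mRNA: exon1 | intron | exon2 | intron | exon3 (lowercase marks introns)
pre = "AUGGCC" + "guaaag" + "CCCAAA" + "guccag" + "GGGUAA"
exon1, exon2, exon3 = (0, 6), (12, 18), (24, 30)

full = splice(pre, [exon1, exon2, exon3])  # all exons kept
skipped = splice(pre, [exon1, exon3])      # exon 2 skipped (alternative splicing)
print(full)     # AUGGCCCCCAAAGGGUAA
print(skipped)  # AUGGCCGGGUAA
```

Because exon 2 in this toy example is 6 nt long, skipping it preserves the reading frame, so the two mature mRNAs would encode two different complete protein products from the same gene.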
Eukaryotes and prokaryotes
• Eukaryotes, which include animals, plants and fungi, are
organisms whose cells are organized into complex
structures enclosed within membranes. The defining
membrane-bound structure that differentiates
eukaryotic cells from prokaryotic cells is the nucleus.
• The presence of a nucleus gives these organisms
their name, which comes from the Greek ευ (eu),
meaning "good/true," and κάρυον (karyon), "nut."
http://en.wikipedia.org/wiki/Eukaryote
RNA can be functional
• Being single-stranded allows RNA to fold into complex structures
– Self-complementary regions form helical
stems
– The three-dimensional structure lets RNA
carry out functions
• Active research area: non-coding RNAs…
– Once upon a time, before DNA and proteins,
RNA did it all (the "RNA world" hypothesis)
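How self-complementary regions form a stem can be illustrated with a brute-force search for the longest run of consecutive base pairs closing a hairpin loop. This is a toy sketch, not a structure-prediction method: it assumes Watson-Crick pairs plus the G-U wobble pair, a minimum loop of 3 nt, and hypothetical helper names; real predictors use thermodynamic energy models.

```python
# Minimal sketch: find the longest helical stem (run of consecutive base pairs
# closing a hairpin loop) in an RNA string. Assumptions: Watson-Crick pairs
# plus the G-U wobble pair, a minimum hairpin loop of 3 nt, brute-force search.
from typing import Tuple

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_stem(rna: str, min_loop: int = 3) -> Tuple[int, int, int]:
    """Return (stem_length, i, j): bases i..i+L-1 pair with bases j..j-L+1."""
    best = (0, -1, -1)
    n = len(rna)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            length = 0
            # Extend inward while the bases pair and the enclosed loop keeps >= min_loop nt.
            while (j - length) - (i + length) - 1 >= min_loop and \
                    (rna[i + length], rna[j - length]) in PAIRS:
                length += 1
            if length > best[0]:
                best = (length, i, j)
    return best


def dot_bracket(n: int, stem: Tuple[int, int, int]) -> str:
    """Render one stem in dot-bracket notation: '(' pairs with ')', '.' is unpaired."""
    length, i, j = stem
    chars = ["."] * n
    for k in range(length):
        chars[i + k], chars[j - k] = "(", ")"
    return "".join(chars)


seq = "GGGAAAACCC"
stem = longest_stem(seq)
print(stem)                         # (3, 0, 9)
print(dot_bracket(len(seq), stem))  # (((....)))
```

Here the GGG run pairs with the CCC run, leaving the AAAA segment as the hairpin loop.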
Codon
• The genetic code defines a mapping between trinucleotide sequences
called codons and amino acids.
• The reading frame is defined by the initial nucleotide from which translation starts.
– For example, the string GGGAAACCC, if read from the first position, contains
the codons GGG, AAA and CCC; and if read from the second position, it
contains the codons GGA and AAC; if read starting from the third position,
GAA and ACC.
– Every sequence can thus be read in three reading frames. With double-stranded DNA there are six possible reading frames: three in the forward
orientation on one strand and three in the reverse orientation on the opposite strand.
– If the DNA is eukaryotic, the reading frame may contain introns.
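The six reading frames can be enumerated directly: three offsets on the given strand and three on its reverse complement. This sketch assumes the input is the forward strand written 5'→3', drops trailing bases that do not fill a codon, and uses a "+1/-1" frame-naming convention chosen for illustration; the function names are hypothetical.

```python
# Minimal sketch of the six reading frames of a double-stranded DNA sequence.
# Assumption: the input is the forward strand written 5'->3'; trailing
# bases that do not fill a complete codon are dropped.
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codons(seq: str, offset: int) -> List[str]:
    """Split seq into consecutive triplets starting at 0-based offset."""
    return [seq[p:p + 3] for p in range(offset, len(seq) - 2, 3)]


def six_frames(dna: str) -> Dict[str, List[str]]:
    """Frames +1..+3 read the given strand; -1..-3 read the opposite strand."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # opposite strand, read 5'->3'
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(dna, offset)
        frames[f"-{offset + 1}"] = codons(reverse, offset)
    return frames


for name, frame in six_frames("GGGAAACCC").items():
    print(name, frame)
```

For the slide's example GGGAAACCC, frames +1, +2 and +3 give GGG AAA CCC, GGA AAC and GAA ACC, matching the codons listed above; the reverse frames read the opposite strand, GGGTTTCCC.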
• Start/stop codons
Translation starts at a start codon. The most common start codon
is AUG, which codes for methionine, so most amino acid chains start with
methionine. Nearby sequences and initiation factors are also required to
start translation.
Stop codons: UAG (amber), UGA (opal, also called umber), and UAA (ochre).
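A crude way to locate a coding region is to start at an AUG and read codons in that frame until a stop codon. This sketch deliberately ignores the surrounding sequence context and initiation factors mentioned above, so it only approximates where translation would begin; `first_orf` is a hypothetical helper name.

```python
# Minimal sketch of locating an open reading frame (ORF) in an mRNA: start at
# an AUG and read codons in that frame until a stop codon. Sequence context
# and initiation factors are ignored.
from typing import List, Optional

START = "AUG"
STOPS = {"UAG", "UGA", "UAA"}  # amber, opal/umber, ochre


def first_orf(mrna: str) -> Optional[List[str]]:
    """Return codons from the first AUG that has an in-frame stop, excluding the stop."""
    mrna = mrna.upper()
    start = mrna.find(START)
    while start != -1:
        orf = []
        for p in range(start, len(mrna) - 2, 3):
            codon = mrna[p:p + 3]
            if codon in STOPS:
                return orf
            orf.append(codon)
        # No in-frame stop after this AUG; try the next AUG.
        start = mrna.find(START, start + 1)
    return None


print(first_orf("GGAUGAAAGGGUAGCC"))  # ['AUG', 'AAA', 'GGG']
print(first_orf("AUGAAA"))            # None
```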
Degeneracy of the genetic code
• The genetic code has redundancy but no ambiguity.
– Both codons GAA and GAG specify glutamic acid (redundancy),
and neither of them specifies any other amino acid (no
ambiguity).
• The codons encoding one amino acid may differ in any
of their three positions.
– the amino acid glutamic acid is specified by GAA and GAG
codons (difference in the third position),
– the amino acid leucine is specified by UUA, UUG, CUU,
CUC, CUA, CUG codons (difference in the first or third
position)
– the amino acid serine is specified by UCA, UCG, UCC, UCU,
AGU, AGC (difference in the first, second or third position).
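Redundancy without ambiguity is exactly the difference between a function and its inverse. The sketch below builds the standard genetic code from a 64-character string listing amino acids in U, C, A, G codon order (the same layout NCBI uses for its translation table 1, with T in place of U; '*' marks a stop). The names `CODE`, `SYNONYMS` and `differing_positions` are hypothetical.

```python
# Sketch: build the standard genetic code from a compact 64-letter encoding
# (codons enumerated with bases in U, C, A, G order; '*' marks a stop codon),
# then inspect redundancy and the absence of ambiguity.
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# codon -> amino acid: each key maps to exactly one value (no ambiguity).
CODE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# amino acid -> codons: several codons can share one amino acid (redundancy).
SYNONYMS: Dict[str, List[str]] = defaultdict(list)
for codon, aa in CODE.items():
    SYNONYMS[aa].append(codon)


def differing_positions(codons: List[str]) -> Set[int]:
    """1-based codon positions at which the given synonymous codons differ."""
    return {pos + 1 for pos in range(3) if len({c[pos] for c in codons}) > 1}


for aa, name in [("E", "glutamic acid"), ("L", "leucine"), ("S", "serine")]:
    codons = SYNONYMS[aa]
    print(name, sorted(codons), "differ at", sorted(differing_positions(codons)))
```

The output reproduces the three cases above: glutamic acid codons differ only at position 3, leucine codons at positions 1 and 3, and serine codons at positions 1, 2 and 3.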
Proteins carry out the cell’s chemistry
• Proteins are more complex polymers
– Nucleic acids have 4 building blocks (nucleotides)
– Proteins have 20 (amino acids), which gives them greater versatility
– Each amino acid has specific properties
• Sequence -> Structure -> Function
– The amino acid sequence determines the
three-dimensional fold of protein
– The protein’s function largely depends on
the features of the 3D structure
• Proteins play diverse roles
– Catalysis, binding, cell structure, signaling,
transport, metabolism
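Counting possible chains makes the "greater versatility" point concrete. The figures below are plain arithmetic; the last line adds the standard counting argument for why codons are three nucleotides long.

```latex
\begin{aligned}
\text{distinct chains of length } n &: \quad 4^{n}\ \text{(nucleic acid)} \quad \text{vs.} \quad 20^{n}\ \text{(protein)} \\
n = 10 &: \quad 4^{10} = 1{,}048{,}576, \qquad 20^{10} = 1.024 \times 10^{13} \\
\text{codon length} &: \quad 4^{2} = 16 < 20 \le 64 = 4^{3}
\end{aligned}
```

Two-nucleotide codons could name at most 16 amino acids, while triplets give 64 codons: 61 specify amino acids and 3 are stops, which leaves room for the redundancy of the genetic code.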
Protein structures
• Primary structure - the amino acid sequence of the peptide
chains.
• Secondary structure - highly regular sub-structures (alpha
helix and strands of beta sheet) which are locally defined,
meaning that there can be many different secondary motifs
present in one single protein molecule.
• Tertiary structure - the three-dimensional structure of a single
protein molecule; the spatial arrangement of its secondary
structures.
• Quaternary structure - complex of several protein
molecules or polypeptide chains, usually called protein
subunits in this context, which function as part of the larger
assembly or protein complex.