Genomics and Personalized
Care in Health Systems
Lecture 9 RNA and Protein Structure
Leming Zhou, PhD
School of Health and Rehabilitation Sciences
Department of Health Information Management
Outline
• RNA structure
• Protein structure
• Pharmacogenomics
Two Types of Genes
• Protein coding genes
– Common patterns: promoter region, start codon, codons, stop
codon
– Translated to protein sequence
• RNA genes
– No consistent patterns common to all RNA genes
– Not translated to proteins
– Functional as RNA molecules
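The start codon / codons / stop codon pattern of protein coding genes can be illustrated with a minimal sketch that scans the three forward reading frames for open reading frames. The function name `find_orfs`, the `min_codons` parameter, and the example sequence are assumptions made for illustration; real gene finders also consider the reverse strand, promoters, and splicing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) slices of ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    min_codons counts the codons before the stop (start codon included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index includes the stop codon
                start = None
    return orfs

example = "CCATGGCTTGATAA"   # hypothetical sequence
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

RNA genes have no comparable universal signal, which is why a simple scan of this kind does not find them.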
Types of RNA
• mRNA: messenger RNA
• tRNA: transfer RNA, which carries amino acids and recognizes
mRNA codons through its anticodon
• rRNA: ribosomal RNA, the RNA component of the ribosome, where translation takes place
• miRNA: microRNAs are small (about 22 nucleotides) non-coding RNA gene products that seem to regulate
translation
• snRNAs: small nuclear RNAs
– Spliceosomal RNAs are found in the spliceosome, which is involved in
splicing
– Small nucleolar RNAs (snoRNAs) are located in the nucleolus
RNA Genes
• RNA has various functions
• Software has been developed to search for RNA genes in
the genome.
– tRNAscan searches for tRNA genes
RNA Databases
• Ribosomal RNA database
– Ribosomal Database Project: http://rdp.cme.msu.edu/
• tRNA Databases
– Genomic tRNA Database: http://gtrnadb.ucsc.edu/
• snoRNA Databases
– Yeast snoRNA Database:
http://people.biochem.umass.edu/fournierlab/snornadb/main.php
Secondary and Tertiary Structure
• RNA sequence → RNA structure
– folding and pairing of bases within the sequence
• Canonical pairing: G-C and A-U
– G-C pairing gives more energetic stability (3 hydrogen bonds, versus 2 for A-U)
• Non-canonical pairing: G-U (very common), A-C, A-G,
etc.
• Double stranded regions and loop regions are the
secondary structure elements
• Tertiary structure is the interaction between secondary
structure elements
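The pairing rules above can be written down as a small lookup. This is only a sketch: the names `classify_pair` and `CANONICAL_BONDS` are hypothetical, and only G-U is treated as a named non-canonical (wobble) pair; every other combination is reported as "other".

```python
# Hydrogen bonds per canonical pair: G-C has 3, A-U has 2.
CANONICAL_BONDS = {frozenset("GC"): 3, frozenset("AU"): 2}
WOBBLE = frozenset("GU")

def classify_pair(a, b):
    """Classify two RNA bases as a canonical, wobble, or other pair."""
    pair = frozenset((a.upper(), b.upper()))
    if pair in CANONICAL_BONDS:
        return "canonical"
    if pair == WOBBLE:
        return "wobble"
    return "other"

for a, b in [("G", "C"), ("A", "U"), ("G", "U"), ("A", "C")]:
    print(a, b, classify_pair(a, b))
```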
RNA Secondary Structure
• For RNAs, secondary structures are conserved, but
primary sequences are not necessarily conserved
http://rnajournal.cshlp.org/content/10/10/1541/F1.expansion
RNA Structure Prediction Methods
• Sequence and base pairing patterns
• Energy minimization
– Find the energetically most stable structure
– Energy calculations based on base pairings
– All possible structures are sampled using the Monte Carlo method
– Zuker and Stiegler (1981) used dynamic programming and energy
rules to get the energetically most favorable structure.
– Mfold is software developed by Zuker and co-workers. It is very
computationally expensive and can be used on a maximum of
about 1000 nucleotides.
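To give a feel for how dynamic programming finds an optimal secondary structure, here is a minimal sketch of the Nussinov algorithm, which maximizes the number of base pairs. It is a simplified stand-in, not the Zuker-Stiegler energy model and not mfold itself: every pair counts as 1 instead of contributing a free energy. The function names, the `min_loop` parameter, and the example sequence are assumptions.

```python
def can_pair(a, b):
    """Allow canonical G-C and A-U pairs plus the G-U wobble pair."""
    return {a, b} in ({"G", "C"}, {"A", "U"}, {"G", "U"})

def nussinov(seq, min_loop=3):
    """Return (max pair count, dot-bracket structure) for one sequence.

    min_loop is the minimum number of unpaired bases inside a hairpin loop.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of pairs in the subsequence i..j
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case 1: j is unpaired
            for k in range(i, j - min_loop):       # case 2: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)

print(nussinov("GGGAAAACCC"))   # hypothetical hairpin: (3, '(((....)))')
```

The table has n × n entries and each entry scans up to n split points, so the running time grows roughly with the cube of the sequence length. That is one reason this family of methods becomes expensive for long RNAs.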
Exercises
Use mfold to predict the secondary structure of an
RNA sequence:
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCTCACACGCGAAAGG
TCCCCGGTTCGAAACCGGGCGGAAACA
http://mfold.rna.albany.edu/?q=mfold
Protein Structure
Four Levels of Protein Structure
• Primary Structure – Sequence of amino acids
• Secondary Structure – Local Structure such as
alpha-helices and beta-sheets
• Tertiary Structure – Arrangement of the secondary
structural elements to give 3D structure of a protein
• Quaternary Structure – Arrangement of the subunits to
give a protein complex its 3D structure
Protein Basic Structure
• A protein is made of a chain of amino acids
• An amino acid sequence is generally reported from the N-terminal end to the C-terminal end
J. Biol. Chem. 1973, 248, p. 7670
Secondary Structure (Helices)
Helix Examples
Secondary Structure (Beta-sheets)
Beta Sheet Examples
Parallel beta sheet
Anti-parallel beta sheet
Beta Sheet Examples (Cont’d)
Protein Structure Example
Beta Sheet
Helix
Loop
ID: 12as
2 chains
Protein Classification
Domain and Motif
• Domain: a discrete portion of a protein assumed to fold
independently of the rest of the protein and possessing
its own function.
– Most proteins have multiple domains
• Motif:
– Frequently occurring structure patterns among multiple proteins
Protein Classification
• Family: the proteins in the same family are homologous,
evolved from the same ancestor. Usually, the sequence identity
between two members is very high.
• Superfamily: distantly homologous sequences, evolved
from the same ancestor. Sequence identity is around 25%-30%.
• Fold: only shapes are similar, no homologous
relationship. Usually, sequence identity is very low.
• Protein classification databases: SCOP, CATH
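The classification levels above are described partly in terms of sequence identity. A minimal sketch of how percent identity can be computed from two already-aligned sequences follows. The function name and example alignment are hypothetical, and the denominator (columns without a gap in either sequence) is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue                 # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments: 4 identical residues out of 6 compared columns.
print(round(percent_identity("MKT-AYI", "MRTLAYV"), 1))   # 66.7
```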
SCOP
• The SCOP database aims to provide a detailed and
comprehensive description of the structural and
evolutionary relationships between all proteins whose
structure is known.
• Proteins are classified to reflect both structural and
evolutionary relatedness.
– Many levels exist in the hierarchy
– The principal levels are family, superfamily and fold
CATH
• CATH is a novel hierarchical classification of protein
domain structures, which clusters proteins at four major
levels:
– Class
– Architecture
– Topology
– Homologous superfamily
CATH-Protein Structure Classification
Class
Architecture
Topology
Protein Structure Determination
Experimental Methods for Protein
Structure Determination
• X-ray crystallography
– Crystallize proteins
– Measure X-ray diffraction pattern
• NMR spectroscopy
– Use nuclear magnetic resonance to estimate distances
between different functional groups in a protein in solution.
– Calculate possible structures using these distances.
• Neutron diffraction
• Electron microscopy
• Atomic force microscopy
Limitations of Experimental Methods
• X-ray Diffraction
– Only a small number of proteins can be made to form crystals
– A crystal is not the protein’s native environment
– Very time consuming
• NMR Distance Measurement
– Not all proteins are found in solution
– This method generally looks at isolated proteins rather than
protein complexes
– Very time consuming
Computational Structure Prediction
• The function of a protein is determined by its structure.
• Experimental methods to determine protein structure are
time-consuming and expensive.
• There is a big gap between the number of available protein
sequences and the number of known structures.
Observations
• Sequences determine structures
• Proteins fold into a minimum energy state.
• Structures are more conserved than sequences. If two
protein sequences share 30% identical residues, then they
have a very good chance of having the same fold.
Prediction Methods
• Ab initio folding: build a structure without
referring to an existing structure
• Homology Modeling: sequence-based method
• Protein Threading: sequence-structure alignment
• Consensus Method: vote a prediction from some
candidates generated by several prediction
programs
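The consensus method can be sketched for per-residue secondary structure predictions as a simple majority vote. This is an illustration only: the function name, the tie-breaking rule (fall back to coil, "L"), and the three example predictions are assumptions, and real consensus servers may weight programs differently.

```python
from collections import Counter

def consensus(predictions, tie_state="L"):
    """Majority vote per position over equal-length H/E/L prediction strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        winners = [state for state, c in counts.items() if c == top]
        # A unique winner is taken; ties fall back to tie_state (an assumption).
        result.append(winners[0] if len(winners) == 1 else tie_state)
    return "".join(result)

# Three hypothetical outputs from different prediction programs.
print(consensus(["HHHEEL", "HHLEEL", "HLLEHL"]))   # HHLEEL
```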
Ab Initio Folding
• Based on first principles
• Build structures purely from protein sequences, no
templates used
• Unaffordable computing demands
• The paradigm is changing: knowledge-based methods have
been proposed
Secondary Structure Prediction
• Three-state model: helix (H), strand (E), coil (L)
• Given a protein sequence:
– NWVLSTAADMQGVVTDGMASGLDKD…
• Predict a secondary structure sequence:
– LLEEEELLLLHHHHHHHHHHLHHHL…
– Accuracy: 50-85%
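Accuracy for the three-state model is commonly reported as the percentage of residues whose predicted state matches the observed one, often called Q3. A minimal sketch, with a hypothetical function name and made-up example strings:

```python
def q3_accuracy(predicted, observed):
    """Percent of positions where the predicted state (H, E, L) equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical example: 8 of 10 residues are predicted correctly.
print(q3_accuracy("HHHHLLEEEE", "HHHLLLEEEL"))   # 80.0
```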
Predict Protein Secondary Structure
Using PredictProtein
• Protein Sequence
>gi|22330039|ref|NP_683383.1| unknown protein; protein id: At1g45196.1 [Arabidopsis thaliana]
MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDD
SLISAWKEEFEVKKDDESQNLDSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDL
SNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGWSSERVPLRSNGGRSPPN
AGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAY
YSLYSPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIH
GCSETLASSSQDDIHESMKDAATDAQAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPS
PLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM
• PredictProtein web server:
– http://www.predictprotein.org
Read the Results
Evolutionary Methods
• Taking into account related sequences helps in
identification of “structurally important” residues.
• Algorithm:
– Find similar sequences
– Construct multiple alignment
– Use alignment profile for secondary structure prediction
• Additional information used for prediction
– Mutation statistics
– Residue position in sequence
– Sequence length
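The "alignment profile" step can be sketched as a table of per-column residue frequencies built from a multiple alignment. Highly conserved columns are candidates for "structurally important" residues. The function names, the 0.8 threshold, and the toy alignment are assumptions for illustration; actual predictors feed such profiles into trained statistical models.

```python
from collections import Counter

def alignment_profile(alignment):
    """Return one {residue: frequency} dict per alignment column, ignoring gaps."""
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({r: c / total for r, c in counts.items()} if total else {})
    return profile

def conserved_positions(profile, threshold=0.8):
    """Indices of columns whose most frequent residue reaches the threshold."""
    return [i for i, freqs in enumerate(profile)
            if freqs and max(freqs.values()) >= threshold]

# Hypothetical multiple alignment of four related sequences.
msa = ["MKVLA", "MRVLA", "MKILG", "MKV-A"]
profile = alignment_profile(msa)
print(profile[1])                      # {'K': 0.75, 'R': 0.25}
print(conserved_positions(profile))    # [0, 3]
```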
Sequence Similarity Methods for
Structure Prediction
• These methods can be very accurate if there is >50%
sequence similarity
• They are rarely accurate if the sequence similarity is <30%
• They use methods similar to those used for sequence alignment,
such as the dynamic programming algorithm, hidden
Markov models, and clustering algorithms.
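As an illustration of the dynamic programming algorithm mentioned above, here is a minimal sketch of Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for the example; protein alignment in practice uses substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a prefix of a aligned to gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # a prefix of b aligned to gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACGT", "AGT"))   # (1, 'ACGT', 'A-GT')
```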