Intro to Bioinformatics

Welcome to Chem 434
Bioinformatics
Sept 20, 2012
Review of course prerequisites
Review of syllabus
Review of CSULA Bioinformatics Course website
Course logistics
Course Website
(http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html)
Rationale for offering bioinformatics
1. Need to understand how popular bioinformatics algorithms operate (Clustal W, BLAST, PSIPRED).
2. A programming assignment gives a taste of what it is like to be a developer.
Definition of Bioinformatics

Use of computers to catalog and organize
biological information into meaningful entities.
Learning Outcomes
1) Retrieve gene sequence information from GenBank.
2) Use BLAST to conduct gene similarity searches.
3) Align multiple sequences with Clustal W software.
4) Predict secondary structures with PSIPRED.
5) Display and compare protein structures.
6) Write software programs that query a database with a protein sequence.
7) Understand the theory that led to the development of scoring methods commonly used to measure sequence similarities.
How is Bioinformatics Used?
Bioinformatics is used to help “focus” the experiments of the benchtop scientist.
Bioinformatics isn’t going to replace lab work anytime soon.
Experimental proof is still the “Gold Standard”.
Useful textbooks on the subject
Beginning Python: From Novice to Professional, Apress 2008
ISBN: 1-59059-982-9
Bioinformatics – Why to Do It
Richard Karp’s Motivation:
"Find genetic basis of complex diseases so
that we can develop more effective modes
of treatment."
Bioinformatics – How to Do It
“… solving biological problems requires far
more than clever algorithms: it involves a
creative partnership between biologists and
mathematical scientists to arrive at an
appropriate mathematical model, the
acquisition and use of diverse sources of data,
and statistical methods to show that the
biological patterns and regularities that we
discover could not be due to chance."
-- Richard Karp
Who is Richard Karp?
UC Berkeley Professor
Recipient of:
Turing Award (1985)
The Benjamin Franklin Medal in Computer and Cognitive Science (2004)
The Kyoto Prize (2008)
Turing award citation

For his continuing contributions to the theory of
algorithms … most notably, contributions to the theory
of NP-completeness. Karp introduced the now standard
methodology for proving problems to be NP-complete
which has led to the identification of many theoretical
and practical problems as being computationally
difficult.
Recent work on transcriptional regulation of genes,
discovering conserved regulatory pathways,
analyzing genetic variations in humans.
Basis of molecular life sciences
Hierarchy of relationships (some exceptions):
Genome
Gene 1 → Protein 1 → Function 1
Gene 2 → Protein 2 → Function 2
Gene 3 → Protein 3 → Function 3
Gene X → Protein X → Function X
Structure of a nucleotide within DNA
[Figure: panels A and B]
The structure of DNA.
5’ACTG
3’TGAC
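The pairing shown on this slide (ACTG on one strand, TGAC on the other) can be reproduced with a few lines of Python. This is only a minimal sketch; the function names `complement` and `reverse_complement` are made up for illustration and do not come from the course materials.

```python
# Watson-Crick pairing for the four DNA bases: A-T and C-G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the paired bases in the same left-to-right order (reads 3' to 5')."""
    return strand.upper().translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """Return the opposite strand written in the usual 5' to 3' direction."""
    return complement(strand)[::-1]

top = "ACTG"
print("5'", top)
print("3'", complement(top))
print("opposite strand, 5' to 3':", reverse_complement(top))
```

Databases normally store only one strand, written 5' to 3', so the reverse complement is the form usually needed when looking at the other strand.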
Table 1.1. Single-letter abbreviations used for DNA nucleotide sequences

One-letter abbreviation   Nucleotide name           Base name   Category
A                         Adenosine monophosphate   Adenine     Purine
C                         Cytidine monophosphate    Cytosine    Pyrimidine
G                         Guanosine monophosphate   Guanine     Purine
T                         Thymidine monophosphate   Thymine     Pyrimidine
N                         Any nucleotide            Any base    NA
R                         A or G                    A or G      Purine
Y                         C or T                    C or T      Pyrimidine
- or *                    --------                  -----       Gap
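One way to see how the ambiguity codes in Table 1.1 behave is to treat each code as a set of allowed bases. The sketch below assumes uppercase input and covers only the codes listed in the table; the names `ALLOWED` and `matches` are hypothetical.

```python
# Each code from Table 1.1 mapped to the set of bases it can stand for.
ALLOWED = {
    "A": {"A"},
    "C": {"C"},
    "G": {"G"},
    "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "N": {"A", "C", "G", "T"},  # any nucleotide
}

def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base in `sequence` is allowed by the code
    at the same position in `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(base in ALLOWED[code] for code, base in zip(pattern, sequence))

print(matches("RYN", "ACT"))  # R allows A, Y allows C, N allows T
print(matches("RYN", "CCT"))  # R does not allow C
```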
human       GCTGTCCCTCACTGTTGAATTTTCTCTAACTTCAAGGCCCATATCTGTGAAATGCT
drosophila  GCTATTAGT--ATCTTAAGTTTGTATTA--------GTCCTTGTTCGTAAGGCGTT
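A simple way to put a number on an alignment like the one above is percent identity: the share of aligned columns in which both sequences carry the same base. The sketch below is one possible definition, not the scoring method used in the course. It assumes that gap columns count toward the alignment length but never count as matches; `percent_identity` is a hypothetical name.

```python
human = "GCTGTCCCTCACTGTTGAATTTTCTCTAACTTCAAGGCCCATATCTGTGAAATGCT"
drosophila = "GCTATTAGT--ATCTTAAGTTTGTATTA--------GTCCTTGTTCGTAAGGCGTT"

def percent_identity(a: str, b: str, gap_chars: str = "-*") -> float:
    """Percentage of alignment columns where both sequences have the same base.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if not a:
        raise ValueError("alignment is empty")
    identical = sum(1 for x, y in zip(a, b) if x == y and x not in gap_chars)
    return 100.0 * identical / len(a)

print(f"{percent_identity(human, drosophila):.1f}% identical")
```

Other conventions divide by the number of ungapped columns or by the length of the shorter sequence, so the definition should be stated whenever a value is reported.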
RNA: the intermediary
[Figure: panels A and B; label: OH]
Central Dogma of Molecular Biology
Reverse transcription
The genetic code
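The two forward steps of the central dogma, transcription and translation, can be sketched in Python using the standard genetic code. The example assumes the input is the coding (sense) strand of DNA, read in frame from its first base. The function names are hypothetical, and stop codons are written as `*`.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna: str) -> str:
    """Return the mRNA with the same sequence as the coding strand (T becomes U)."""
    return coding_dna.upper().replace("T", "U")

def translate(coding_dna: str) -> str:
    """Translate codon by codon from the first base; '*' marks a stop codon."""
    seq = coding_dna.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

dna = "ATGTGGTAA"
print(transcribe(dna))
print(translate(dna))
```

Any trailing bases that do not fill a complete codon are ignored, and translation does not halt at a stop codon; both are simplifications made for this sketch.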
Amino acids: the building blocks of proteins
V I L N Q E F M H K R D G A S T W Y C P
Table 1.2. Abbreviations used for ambiguous and rare amino acids

1-letter abbreviation   3-letter abbreviation   Meaning
B                       Asx (Asn or Asp)        Asparagine or aspartic acid
J                       Xle                     Isoleucine or leucine
O                       Pyl                     Pyrrolysine
U                       Sec                     Selenocysteine
Z                       Glx (Gln or Glu)        Glutamine or glutamic acid
X                       Xaa                     Any amino acid
- or *                  ---                     No corresponding residue (gap)
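When reading protein sequences from a database, it can help to expand the codes in Table 1.2 into the residues they may represent. The following is a minimal sketch under the assumption that each code is a single character; `possible_residues` is a hypothetical helper name.

```python
# The 20 standard amino acids, one-letter codes.
STANDARD = "ACDEFGHIKLMNPQRSTVWY"

# Ambiguous codes from Table 1.2 and the residues each may represent.
AMBIGUOUS = {
    "B": "ND",  # asparagine or aspartic acid
    "J": "IL",  # isoleucine or leucine
    "Z": "QE",  # glutamine or glutamic acid
}

def possible_residues(code: str) -> set:
    """Return the set of one-letter residues a single code can stand for."""
    code = code.upper()
    if len(code) != 1:
        raise ValueError("expected a single-character code")
    if code in "-*":
        return set()            # gap: no corresponding residue
    if code == "X":
        return set(STANDARD)    # any amino acid
    if code in AMBIGUOUS:
        return set(AMBIGUOUS[code])
    return {code}               # standard residue, or the rare O and U

print(sorted(possible_residues("B")))
print(len(possible_residues("X")))
```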
Levels of protein structure
Levels of protein structure II
Sickle cell anemia
Paper chromatography separation of hemoglobin peptides.