Introduction to
bioinformatics
Bioinformatics course
10.09.15
Proposed room and time
1. Thu 12-14 in room 107 and Thu 14-16 in room 224, Jakobi 2.
2. Mon 8-10 and 10-12 in room 405, Liivi 2.
3. Wed 14-16 and 16-18 in room 405, Liivi 2.
Life on earth
LIFE: the condition that distinguishes animals and
plants from inorganic matter, including the capacity
for growth, reproduction, functional activity, and
continual change preceding death.
Life on earth
https://letstalkaboutscience.wordpress.com/2012/07/30/understanding-deep-time/
Biology
is a natural science concerned with the study of life
and living organisms, including their structure, function,
growth, evolution, distribution, and taxonomy
AEROBIOLOGY, AGRICULTURE, ANATOMY, ASTROBIOLOGY, BIOCHEMISTRY, BIOENGINEERING,
BIOINFORMATICS, BIOMATHEMATICS OR MATHEMATICAL BIOLOGY, BIOMECHANICS, BIOMEDICAL RESEARCH,
BIOPHYSICS, BIOTECHNOLOGY, BUILDING BIOLOGY, BOTANY, CELL BIOLOGY, CONSERVATION BIOLOGY,
CRYOBIOLOGY, DEVELOPMENTAL BIOLOGY, ECOLOGY, EMBRYOLOGY, ENTOMOLOGY, ENVIRONMENTAL
BIOLOGY, EPIDEMIOLOGY, ETHOLOGY, EVOLUTIONARY BIOLOGY, GENETICS, HERPETOLOGY, HISTOLOGY,
ICHTHYOLOGY, INTEGRATIVE BIOLOGY, LIMNOLOGY, MAMMALOGY, MARINE BIOLOGY, MICROBIOLOGY,
MOLECULAR BIOLOGY, MYCOLOGY, NEUROBIOLOGY, OCEANOGRAPHY, ONCOLOGY, ORNITHOLOGY,
POPULATION BIOLOGY, POPULATION ECOLOGY, POPULATION GENETICS, PALEONTOLOGY, PATHOBIOLOGY OR
PATHOLOGY, PARASITOLOGY, PHARMACOLOGY, PHYSIOLOGY, PHYTOPATHOLOGY, PSYCHOBIOLOGY,
SOCIOBIOLOGY, STRUCTURAL BIOLOGY, VIROLOGY
Stamp collecting
Domain - Eukaryota
Kingdom - Animalia
Phylum - Chordata
Vertebrata (Subphylum)
Class - Mammalia
Order - Primates
Anthropoidea (Suborder)
Hominoidea (Superfamily)
Family - Hominidae
Genus - Homo
Species - sapiens
Species
• Defined as a group of living organisms consisting of similar individuals capable of exchanging genes or interbreeding
http://www.nature.com/news/2011/110823/full/news.2011.498.html
Evolution
Connection between species
Life evolved from “simple” into
more complex systems
Levels of complexity
http://www.nature.com/scitable/topicpage/biological-complexity-and-integrative-levels-of-organization-468#
Biology
• CELL - basic unit of life
• GENE - basic unit of heredity
• EVOLUTION - driving engine
https://en.wikipedia.org/wiki/Biology
Molecular biology
Cell size
http://learn.genetics.utah.edu/content/cells/scale/
Eukaryotic cell
https://bhavanajagat.files.wordpress.com/2012/02/cell-structure-and-functions.jpg
DNA
The Watson and Crick paper, entitled “A
Structure for Deoxyribose Nucleic Acid”, was
written on the 2nd of April 1953 and
published in “Nature” on the 25th of April 1953
http://www.ba-education.com/for/science/dnadiscovery.html
Bioinformatics
[Figure: bioinformatics at the overlap of biology and computer science]
Definitions of Bioinformatics
• The term bioinformatics was coined in 1978
• Bioinformatics is the application of information technology and computer science to the field of molecular biology
• The science of using / developing computer software and algorithms to record, analyse and merge biologically related data
• Using computer technology to manage large amounts of biological data
• Bioinformatics involves the use of techniques including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually on the molecular level
http://www.google.com/search?q=define%3ABioinformatics
Definitions of Bioinformatics
• The collection, organisation, storage, analysis, and integration of large amounts of biological data using networks of computers and databases
• Bioinformatics involves the integration of computers, software tools, and databases in an effort to address biological questions
• In summary, the use of computer science to solve biological problems
http://www.google.com/search?q=define%3ABioinformatics
Bioinformatic focus
ANALYSIS AND INTERPRETATION OF VARIOUS
TYPES OF BIOLOGICAL DATA INCLUDING:
NUCLEOTIDE AND AMINO ACID SEQUENCES,
PROTEIN DOMAINS, AND PROTEIN STRUCTURES.
http://bip.weizmann.ac.il/course/introbioinfo/lecture1/
Bioinformatic focus
DEVELOPMENT OF NEW ALGORITHMS AND
STATISTICS WITH WHICH TO ACCESS
BIOLOGICAL INFORMATION, SUCH AS
RELATIONSHIPS AMONG MEMBERS OF LARGE
DATA SETS.
http://www.nature.com/msb/journal/v3/n1/images/msb4100163-f4b.jpg
http://bip.weizmann.ac.il/course/introbioinfo/lecture1/
Bioinformatic focus
DEVELOPMENT AND IMPLEMENTATION OF
TOOLS THAT ENABLE EFFICIENT ACCESS AND
MANAGEMENT OF DIFFERENT TYPES OF
INFORMATION, SUCH AS VARIOUS DATABASES,
INTEGRATED MAPPING INFORMATION
http://www.jofwidata.com/images/database-design-development.jpg
http://wolfson.huji.ac.il/expression/detective.jpg
http://bip.weizmann.ac.il/course/introbioinfo/lecture1/
M.Alroy Mascrenghe
Bioinformatic challenges
• Explosion of information
• Need for faster, automated analysis to process large amounts of data
• Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels etc…)
• Need for “smarter” software to identify interesting relationships in very large data sets
• Lack of “bioinformaticians”
• Software needs to be easier to access, use and understand
• Biologists need to learn about the software, its limitations, and how to interpret its results
Examples of biological data
Name the numbers
Examples of biological data
Central dogma of molecular
biology
http://www2.chemistry.msu.edu/faculty/reusch/VirtTxtJml/nucacids.htm
Central dogma of molecular
biology
http://www.uic.edu/classes/bios/bios100/lectures/centraldogma.jpg
Examples of biological data
Genome
• Is the entirety of an organism’s hereditary information
• The genome includes both the genes and the non-coding sequences of DNA/RNA
• In July 1995, Haemophilus influenzae became the first free-living organism to have its genome sequenced
• 1 830 140 base pairs of DNA in a single circular chromosome that contains 1740 protein-coding genes, 58 transfer RNA genes and 18 other RNA genes
Genome sizes
Completely sequenced
genomes
Human genome
Relative proportions (%) of
bases in DNA
DNA vs RNA
http://www2.chemistry.msu.edu/faculty/reusch/VirtTxtJml/Images3/dna_rna1.gif
• Raw DNA sequence
• Coding or non-coding
• Parses into genes
• 4 nucleotide bases ATGC
>ENST00000539570 cdna:known chromosome:GRCh37:15:63889592:63893885:1 gene:ENSG00000259662 gene_biotype:protein_coding transcript_biotype:protein_coding
ATGTGGCCACTGCTCACCATGCACATAACCCAGCTCAACCGGGAGTGCCTGCTGCACCTCTTCTCCTTCCTA
GACAAGGACAGCAGGAAGAGCCTTGCCAGGACCTGCTCCCAGCTCCACGACGTGTTTGAGGACCCCGCA
CTCTGGTCCCTGCTGCACTTCCGTTCCCTCACTGAACTCCAGAAGGACAACTTCCTCCTGGGCCCGGCACTC
CGCAGCCTCTCCATCTGCTGGCACTCCAGCCGCGTGCAGGTGTGCAGCATTGAGGACTGGCTCAAGAGTG
CCTTCCAGAGAAGCATCTGCAGCCGGCACGAGAGCCTGGTCAATGATTTCCTCCTCCGGGTGTGCGACAG
GCTTTCTGCTGTGCGCTCCCCACGGAGGCGGGAGGCGCCTGCACCGTCCTCGGGGACTCCGATCGCCGTT
GGACCGAAATCACCTCGGTGGGGAGGACCTGACCACTCGGAGTTCGCCGACTTGCGCTCGGGGGTGACG
GGGGCCAGGGCTGCCGCGCGCAGGGGTCTGGGGAGCCTCCGGGCGGAGCGACCCAGCGAGACCCCGC
CGGCTCCCGGAGTGTCCTGGGGACCGCCACCTCCAGGAGCCCCGGTGGTGATCTCGGTGAAGCAGGAGG
AGGGGAAGCAGGGGCGCACGGGCAGAAGGAGCCACCGAGCCGCTCCTCCTTGCGGTTTTGCCCGCACG
CGCGTCTGCCCGCCCACCTTTCCTGGGGCGGATGCGTTCCCGCAGTGA
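The record above is in FASTA format: one header line starting with ">" followed by the sequence wrapped over several lines. As a minimal sketch of how such a record could be read and its base composition (%) computed, assuming plain Python with no external libraries; the function names `read_fasta` and `base_composition` and the short toy record are illustrative assumptions, not part of the course material:

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are collected and joined later.
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def base_composition(seq):
    """Return the percentage of A, C, G and T among the A/C/G/T bases of seq."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


# Hypothetical toy record, used only to show the format.
example = """>toy_record hypothetical example, not a real transcript
ATGTGGCCAC
TGCTCACCAT
"""

for name, seq in read_fasta(example).items():
    print(name.split()[0], len(seq), base_composition(seq))
```

Other symbols such as N are ignored when computing the percentages, which is one possible choice; a stricter script might reject them instead.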
A Gene
DNA
• Protein-coding genes cover only about 1.5% of the human genome
• What does the rest do?
DNA
• Simple sequence analysis
• database searching
• pairwise analysis…
• Regulatory regions
• Gene finding
• Whole genome annotations
• Comparative genomics (analysis between species and strains)
http://bip.weizmann.ac.il/course/introbioinfo/lecture1/introbioinfo11.htm
Examples of biological data
Transcription
http://www.youtube.com/watch?v=ztPkv7wc3yU
Alternative splicing
Types of RNA
RNA
• Splice variants
• Tissue specific expression
• Structure
• Single gene analysis (various cloning techniques…)
• Experimental data involving thousands of genes simultaneously
• DNA chips, microarray and expression array analyses
Examples of biological data
From transcription to
translation
Translation
Translation initiation
Translation termination
Amino acids - the protein
building blocks
(IUPAC nomenclature)
http://biology.stackexchange.com/questions/19314/essential-amino-acid-codons
http://mcmanuslab.ucsf.edu/node/276
Amino acids
Codon Wheel
T == U
http://sciencewords.tumblr.com/post/78190871261/x-for-all-you-biochemists-this-should-help
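The note "T == U" means that a codon can be written in DNA letters (T) or in RNA letters (U) and is read the same way. A minimal sketch of that conversion and of splitting a sequence into codons, assuming the input is the coding strand; the function names and the toy sequence are illustrative assumptions:

```python
def transcribe(coding_dna):
    """Return the mRNA spelling of a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")


def back_transcribe(mrna):
    """Return the DNA spelling of an mRNA sequence (U -> T)."""
    return mrna.upper().replace("U", "T")


def codons(seq):
    """Split a sequence into consecutive three-letter codons.

    Any incomplete codon at the end is dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy sequence: five codons, ending in a stop codon.
dna = "ATGTGGCCACTGTGA"
mrna = transcribe(dna)
print(mrna)
print(codons(mrna))
```

This only swaps letters; it does not model the template strand or any step of real transcription.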
Protein sequence
>sp|P48431|SOX2_HUMAN Transcription factor SOX-2 OS=Homo sapiens GN=SOX2 PE=1 SV=1
MYNMMETELKPPGPQQTSGGGGGNSTAAAAGGNQKNSPDRVKRPMNAFMVWSRGQRRKMA
QENPKMHNSEISKRLGAEWKLLSETEKRPFIDEAKRLRALHMKEHPDYKYRPRRKTKTLM
KKDKYTLPGGLLAPGGNSMASGVGVGAGLGAGVNQRMDSYAHMNGWSNGSYSMMQDQLGY
PQHPGLNAHGAAQMQPMHRYDVSALQYNSMTSSQTYMNGSPTYSMSYSQQGTPGMALGSM
GSVVKSEASSSPPVVTSSSHSRAPCQAGDLRDMISMYLPGAEVPEPAAPSRLHMSQHYQS
GPVPGTAINGTLPLSHM
Protein domains
Protein
• Proteome of an organism
• Mass spec
• 2D structure
• 3D structure
• 4D structure (interactions)
Summary
GENE
TRANSCRIPTION,
TRANSLATION
AND PROTEIN
SYNTHESIS
http://compbio.pbworks.com/f/central_dogma.jpg
Central Dogma
Bioinformatic questions
• To identify an unknown gene of interest
• Sequence matching
• Is there a match to a known sequence in the database?
• Which protein family does it match?
• How to identify more family members?
• I have a similar structure; how do I identify its potential ligands?
• How do I find out whether my gene/protein is also present in other species?
• How can I identify genes that are inherited together in a specific region?
Bioinformatic questions
• I have to construct an artificial gene; how do I design the primers, and how do I check that I have the right sequence?
• To know the structure of a poorly expressed RNA sequence
• To identify the structure and function of a protein sequence
• To cluster protein sequences into families of related sequences and develop models
• To generate phylogenetic trees to identify the evolutionary relationships using similar proteins/DNA
• To identify which other proteins interact with the sequence of interest
Bioinformatic questions
• Find genes that have similar expression in specific conditions
• Find transcription factors that regulate specific genes
• Visualise different gene and protein networks
• Describe the regulation of genes
Practice session
Write a script or two that take either a DNA or an mRNA
sequence as input and perform the “translation” step, i.e. output
the protein sequence (single-letter codes)
For example, use
http://www.ncbi.nlm.nih.gov/nuccore/NC_007362.1?from=22&to=1728&report=fasta
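One possible starting point for the exercise, as a minimal sketch rather than a reference solution. It assumes the standard genetic code (NCBI translation table 1), reads from the first base without searching for a start codon, and accepts either DNA or mRNA letters. The names `translate` and `to_stop`, the use of "X" for unrecognised codons, and the toy input are illustrative assumptions:

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base changing fastest.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, to_stop=True):
    """Translate a DNA or mRNA sequence into one-letter amino acid codes.

    Whitespace is removed and U is read as T, so both alphabets work.
    Unrecognised codons (for example ones containing N) become "X".
    With to_stop=True translation ends at the first stop codon;
    otherwise stop codons are written as "*".
    """
    seq = "".join(seq.split()).upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Toy input: ten codons followed by a stop codon.
print(translate("ATGTGGCCACTGCTCACCATGCACATAACCTGA"))
print(translate("AUGUGGUGA"))
```

To use it on a downloaded FASTA file, the header line (starting with ">") would need to be removed first and the sequence lines joined into one string.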