What is a Genome? - Mainlab Bioinformatics

Bioinformatics for Research
Module 1
Introduction to Genomics
and Bioinformatics
January 12, 2017
Mainlab Bioinformatics
Washington State University
1
Introduction to Genomics
Learning Outcomes
• Refresh your knowledge of basic
genomic concepts and terminology
• Understand conceptually the different
areas of genomic research
• Know the basic tools of genomics
2
Prokaryotes vs. Eukaryotes
Prokaryote | Eukaryote
No nucleus | Nucleus
Circular or linear chromosomes, plasmids | Chromosomes in nucleus; mitochondria and chloroplasts also have genomes
Polycistronic operons (multiple genes controlled by a single promoter) | Monocistronic genes (one gene, one promoter)
No introns | Introns
3
The central dogma of genetics
DNA → mRNA (transcription) → protein (translation) → trait (expression)
4
DNA nucleotides
Standard Bases
Abbreviation | Base
A | Adenine
C | Cytosine
G | Guanine
T | Thymine
U | Uracil (RNA)
Degenerate Bases
Abbreviation | Bases
W | A, T
S | C, G
M | A, C
K | G, T
R | A, G
Y | C, T
B | C, G, T
D | A, G, T
H | A, C, T
V | A, C, G
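As a rough illustration of how the degenerate base codes listed above are used in practice, the sketch below expands a degenerate pattern into a regular expression and searches a sequence with it. The function names (`degenerate_to_regex`, `find_sites`) and the example motif are hypothetical, chosen only for this example.

```python
import re

# Degenerate base codes from the table above, mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}

def degenerate_to_regex(pattern):
    """Turn a pattern such as 'GAWTC' into a regex such as 'GA[AT]TC'."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError for codes not in the table
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_sites(pattern, sequence):
    """Return 0-based start positions of every (possibly overlapping) match."""
    regex = re.compile("(?=" + degenerate_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]

# Hypothetical motif and sequence, for illustration only.
print(degenerate_to_regex("GAWTC"))
print(find_sites("GAWTC", "TTGAATCCGATTCGAGTC"))
```

The lookahead `(?=...)` is used so that overlapping matches are also reported.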
5
Genes and ORFs
• Gene
• A DNA segment that encodes a specific
protein that contributes to the expression
of a trait
• Open Reading Frame (ORF)
• Section of mRNA without stop codons that
is translated
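The ORF definition above can be sketched in code. This is a minimal example, assuming the sequence is given as DNA (coding strand, T instead of U), that an ORF starts at ATG, and that only the three forward reading frames are scanned; the function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand.

    start and end are 0-based, end-exclusive; the stop codon is included.
    min_codons is the minimum number of codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None  # look for the next ORF in this frame
    return orfs

print(find_orfs("CCATGGCCTTTTAACC"))
```

A fuller tool would also scan the reverse complement, which gives six reading frames in total.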
6
Structure of a Gene
• Regulatory regions: up to 50 kb upstream of +1 site
• Exons: protein coding and untranslated regions (UTR)
• 1 to 178 exons per gene (mean 8.8)
• 8 bp to 17 kb per exon (mean 145 bp)
• Introns: splice acceptor and donor sites, other DNA
• average 1 kb – 50 kb per intron
7
DNA to RNA to Protein
http://www.carolguze.com/images/Human%20Genome/dna-rna-protein.jpg
8
Amino Acids
• mRNA is translated
into protein which
is a series of amino
acids
• Each amino acid is
coded for by a 3
nucleotide codon
• Each amino acid
has a unique
structure and
chemical properties
http://www.carolguze.com/text/102-3biomolecules2.shtml
9
Amino Acid Codon Table
http://upload.wikimedia.org/wikipedia/commons/c/cc/Codontable1.PNG
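To make the codon table concrete, here is a small sketch that translates a coding sequence using the standard genetic code. It assumes DNA input (T instead of U) read from the first base; the helper name `translate` is hypothetical.

```python
# Standard genetic code, built from the usual T, C, A, G ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate DNA codon by codon; '*' marks a stop codon.

    Trailing bases that do not form a full codon are ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[pos:pos + 3]] for pos in range(0, len(dna) - 2, 3)
    )

print(translate("ATGGCCTAA"))
```

One-letter amino acid codes are used; for example, ATG encodes methionine (M) and TAA is a stop codon.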
10
Substitution Matrices
http://swift.cmbi.ru.nl/teach/ALIGN/Align_8.html
11
Protein structure
• Properties of amino
acids determine the
structure of the
protein
• Structure is important
for protein function
• Mutations that alter
structure can
destabilize/inactivate
the protein
http://www.accessexcellence.org/RC/VL/GG/images/protein.gif
12
What is a Genome?
• The DNA content of an organism. Contains all
the biological information needed to construct
and maintain an organism
• In eukaryotic organisms, it is measured in
haploid equivalents
• Size is most commonly measured in base pairs
(e.g. Mb)
• Genome sizes vary widely and do not
correspond to the complexity of an organism
13
Basic Genome Statistics
• Chromosome number and ploidy
• GC content
• Genome size
• Codon bias
• Gene content and order
What is the chromosome number of your favorite
organism and how many genes does it have?
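Some of the statistics listed above are simple to compute once sequences are available as strings. The sketch below assumes a hypothetical dictionary mapping chromosome names to sequences; the function names and the toy data are illustrative only.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)

def genome_stats(chromosomes):
    """Summarize a genome given as {chromosome name: sequence}."""
    return {
        "chromosome_number": len(chromosomes),
        "genome_size_bp": sum(len(seq) for seq in chromosomes.values()),
        "gc_content": gc_content("".join(chromosomes.values())),
    }

# Toy "genome" for illustration; real chromosomes are far longer.
print(genome_stats({"chr1": "ATGC", "chr2": "GGCC"}))
```

Ploidy, codon bias, and gene order need annotation or additional data and are not covered by this sketch.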
14
Genomics vs Genetics
• Genomics is the study of
• Genetics is the study of
15
Genomics Comprises
• Structural Genomics
The study of genome structure and organization on a large
scale
• Functional Genomics
The study of gene (and protein) function on a large scale
• Translational Genomics
The adaptation of information derived from genome
technologies for organism improvement
What about Comparative Genomics?
16
Structural Genomics
• The study of genome structure and organization on a
large scale
• Tools of Structural Genomics
1.
2.
3.
17
Functional Genomics
• The study of gene function and expression on
a large scale
• Tools of Functional Genomics
• EST libraries (cDNAs)
• RNA-Seq technology
• Next Generation Sequencing Technology
• Real time PCR
18
Translational Genomics
• Transferring the knowledge gained from one species
to another or translating basic knowledge to applied
knowledge.
• As we identify gene(s) associated with interesting
traits, markers can be identified for marker-assisted
selection.
19
Traditional Breeding
Wild species (undesirable fruit, low yield, disease resistance) X Cultivar (desirable fruit, high yield, disease susceptible)
Successive backcrosses, waiting years to select for trees
Improved cultivar
20
Molecular Breeding
Wild species (undesirable fruit, low yield, disease resistance) X Cultivar (desirable fruit, high yield, disease susceptible)
Select desired progeny long before any fruit is grown, using molecular markers for the trait
Improved cultivar
21
Assignment – Extra Credit (10 Pts)
• There are many different types of “-omics” that have emerged in
the last few years. Please define each of the types below in one
paragraph (do not copy from a website). Email to Jodi by Friday
Sept 4.
• Transcriptomics is the study of …
• Proteomics is the study of …
• Metabolomics is the study of …
• Phenomics is the study of …
22
Overview of Bioinformatics
Learning Outcomes
• Understand the broad concept and
approaches used in bioinformatics
23
What is Bioinformatics?
Bioinformatics Working Definition
• The application of information technology, computer science,
mathematics and statistics to the organization, processing,
storage, analysis, visualization and dissemination of genomic,
genetic and breeding data.
What is the Range of Bioinformatics?
• Mathematical modeling of biological systems
• Developing algorithms for sequence and network analysis
• Building databases and web tools
24
Bioinformatics Approach
• Mathematical Modeling: Abstraction of biological
systems
- DNA is a “String”
• Developing Algorithms for Sequence Analysis
- Analysis of “Strings”
• Sequence alignment
• Sequence composition
• Building databases and web tools
- Dissemination and data mining of “Strings”
25
Bioinformatics Approach
• Mathematical Modeling: Abstraction of biological systems
- DNA is a “String”
TAAGTTATTATTTAGTTAATACTTTTAACAATATT
ATTAAGGTATTTAAAAAATACTATTATAGTATTTA
ACATAGTTAAATACCTTCCTTAATACTGTTAAATT
ATATTCAATCAATACATATATAATATTATTAAAAT
ACTTGATAAGTATTATTTAGATATTAGACAAATAC
TAATTTTATATTGCTTTAATACTTAATAAATACTA
CTTATGTATTAAGTAAATATTACTGTAATACTAAT
AACAATATTATTACAATATGCTAGAATAATATTGC
TAGTATCAATAATTACTAATATAGTATTAGGAAAA
TACCATAATAATATTTCTACATAATACTAAGTTAA
TACTATGTGTAGAATAATAAATAATCAGATTAAAA
AAATTTTATTTATCTGAAACATATTTAATCAATTG
AACTGATTATTTTCAGCAGTAATAATTACATATGT
ACATAGTACATATGTAAAATATCATTAATTTCTGT
TATATATAATAGTATCTATTTTAGAGAGTATTAAT
TATTACTATAATTAAGCATTTATGCTTAATTATAA
GCTTTTTATGAACAAAATTA
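Treating DNA as a string means ordinary string operations already answer simple biological questions. A minimal sketch, using the first line of the sequence shown above as input; the function names are hypothetical.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def base_composition(seq):
    """Count how often each character occurs in the sequence."""
    return Counter(seq.upper())

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

first_line = "TAAGTTATTATTTAGTTAATACTTTTAACAATATT"
print(base_composition(first_line))
print(reverse_complement(first_line))
```

The composition count makes the A/T richness of this fragment visible at a glance.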
26
Sequence Alignment
• Pairwise Sequence Comparison is the cornerstone of bioinformatics
• Infer function (homology)
  • Orthologs (occur in separate species, common ancestors)
  • Paralogs (gene duplication independent of speciation)
• Build Evolutionary Trees
• Do whole genome comparisons
• Infer structure
[Figure: dot-matrix plot comparing two nucleotide sequences; X marks positions where the bases match]
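Pairwise comparison can be sketched with a classic global alignment by dynamic programming (Needleman-Wunsch). This is a teaching sketch, not the method of any particular tool: the scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration, whereas real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a aligned against leading gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b aligned against leading gaps

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))

total, row_a, row_b = needleman_wunsch("ACGT", "ACT")
print(total)
print(row_a)
print(row_b)
```

When several alignments tie for the best score, the traceback here prefers the diagonal move, so only one of them is reported.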
28
Assembly
Algorithms: Newbler, Velvet, Mira, Celera, CAP3, PHRAP, etc.
e.g. GDR Unigenes
29
Multiple Sequence Alignment
30
Phylogenetic Analysis
Fragaria: member of the distantly related Rosoideae (x = 7)
Malus and Prunus: members of the Spiraeoideae (x = 17 and x = 8, respectively)
31
Domain Prediction
32
Genome Mapping/Comparison
• The innermost circle represents the nine ancestral chromosomes of Rosaceae.
• The eight chromosomes of peach are repeated, each section showing the regions that are orthologous to each ancestral chromosome.
• Concentric circles enable us to identify the ancestral relationships and origins (breakage and fusion).
[Figure labels: Rosaceae ancestor (N = 9); Prunus (PC 1 to PC 8); Malus (MC 1 to MC 17); Fragaria (FC 1 to FC 7)]
33
Genome Annotation
34
Visualization Tools
Comparative Mapping: CMap
Genome Browsers: GBrowse/JBrowse, etc.
35
Structural Bioinformatics
36
Statistical Analysis of Functional Genomics Data
Arrays and RNA-Seq
• What statistical measures can be used to quantify up- and down-regulation of genes?
• Technical and biological error
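One commonly used measure of up- and down-regulation is the log2 fold change between two conditions. The sketch below is only an illustration: the pseudocount, the threshold of 1 (a two-fold change), and the function names are assumptions, and a real analysis would also use replicates and a statistical test to account for technical and biological error.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of expression values; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

def classify(treated, control, threshold=1.0):
    """Label a gene 'up', 'down' or 'unchanged' by its log2 fold change."""
    lfc = log2_fold_change(treated, control)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"

# Hypothetical expression values (e.g. read counts) for three genes.
for gene, treated, control in [("geneA", 7, 1), ("geneB", 1, 7), ("geneC", 5, 5)]:
    print(gene, round(log2_fold_change(treated, control), 2), classify(treated, control))
```

Working on the log2 scale makes a doubling (+1) and a halving (-1) symmetric around zero.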
37
Database Resources
39
Database Similarity Searching
Primary databases
and community databases
40
NCBI : Primary Database for Genomics Data
www.ncbi.nih.gov
41
Querying Databases
42
Querying Databases
43