Genome variation informatics: SNP discovery, demographic

Computational Biology and Genomics at Boston College Biology
Gabor T. Marth
Department of Biology, Boston College
[email protected]
http://clavius.bc.edu/~marthlab/MarthLab
Computational research labs
Prof. Peter Clote
RNA secondary structure and energy landscape
Protein motif recognition
Prof. Jeffrey Cheung
Human mutation landscape
Regulatory networks
Prof. Gabor Marth
Genetic polymorphism discovery
Population Genetics
Medical Genetics
Resources
• CLAVIUS – a multi-CPU UNIX computer cluster
• UNIX development servers
• A teaching laboratory equipped with PC laptop computers running Linux under VMware
• A new professional server room under construction
The CompBio teaching program
• Currently part of the Biology graduate program (PhD only)
• We have 2 Bioinformatics graduate students, with a larger class expected for Fall 2006
• Curriculum combines Biology, Computer Science, Math and Statistics courses
• We are working towards an inter-departmental Bioinformatics / Computational Biology PhD program
The Computational Genetics Lab
http://clavius.bc.edu/~marthlab/MarthLab
Sequence variations (polymorphisms)
The Human Genome Project has determined a reference sequence of the human genome.
However, every individual is unique and differs from others at millions of nucleotide locations; these differences are sequence polymorphisms.
Why are sequence variations important?
• They are a source of phenotypic differences
• They cause inherited diseases
• They allow us to trace ancestral human history
1. Polymorphism discovery tools
Polymorphism discovery in clonal sequences
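$$
P(\mathrm{SNP}) = \frac{\displaystyle\sum_{\text{all variable}} \frac{P(S_1 \mid R_1)\cdots P(S_N \mid R_N)\;P_{\mathrm{Prior}}(S_1,\ldots,S_N)}{P_{\mathrm{Prior}}(S_1)\cdots P_{\mathrm{Prior}}(S_N)}}{\displaystyle\sum_{S_{i_1}\in\{A,C,G,T\}}\cdots\sum_{S_{i_N}\in\{A,C,G,T\}} \frac{P(S_{i_1} \mid R_1)\cdots P(S_{i_N} \mid R_N)\;P_{\mathrm{Prior}}(S_{i_1},\ldots,S_{i_N})}{P_{\mathrm{Prior}}(S_{i_1})\cdots P_{\mathrm{Prior}}(S_{i_N})}}
$$
Here S_j is the nucleotide of sequence j at the aligned position, R_j is the corresponding read data, and the numerator sums only over the assignments (S_1, ..., S_N) that are not all identical.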
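To make the formula concrete, here is a minimal Python sketch that evaluates P(SNP) by brute-force enumeration of all 4^N nucleotide assignments, so it is only practical for a handful of sequences. The Phred-based base probabilities and the symmetric prior (parameter `theta`: monomorphic with probability 1 - theta, otherwise uniform over variable assignments) are illustrative assumptions, not the prior used in the original method. Under this prior every single-sequence marginal P_Prior(S_j) equals 1/4.

```python
from itertools import product

BASES = "ACGT"


def base_probs(called, phred):
    """Assumed P(S_j | R_j): the called base gets 1 - 10^(-q/10),
    and the error mass is split evenly among the other three bases."""
    err = 10 ** (-phred / 10)
    return {b: (1 - err) if b == called else err / 3 for b in BASES}


def joint_prior(assignment, theta):
    """Hypothetical P_Prior(S_1, ..., S_N): monomorphic with probability
    1 - theta (each of the four bases equally likely), otherwise uniform
    over the 4^N - 4 variable assignments."""
    n = len(assignment)
    if len(set(assignment)) == 1:
        return (1 - theta) / 4
    return theta / (4 ** n - 4)


def p_snp(reads, theta=1e-3):
    """P(SNP) for N aligned clonal reads, each given as (called_base, phred)."""
    probs = [base_probs(b, q) for b, q in reads]
    marginal = 0.25  # P_Prior(S_j) is 1/4 for every base under this prior
    numerator = denominator = 0.0
    for assignment in product(BASES, repeat=len(reads)):
        term = joint_prior(assignment, theta)
        for p, s in zip(probs, assignment):
            term *= p[s] / marginal
        denominator += term
        if len(set(assignment)) > 1:  # the "all variable" assignments
            numerator += term
    return numerator / denominator
```

Two high-quality reads that disagree (C and T at Phred 40) yield a much higher P(SNP) than the same disagreement at Phred 20, because low-quality disagreements are better explained by sequencing error.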
Marth et al., Nature Genetics 1999
Automated detection of somatic mutations in diploid individual samples
[Figure: example genotypes: homozygous C, heterozygous C/T, homozygous T]
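The slides do not describe how the three genotypes are distinguished. As an illustrative sketch only, assuming base calls at a C/T site with a fixed per-base error rate `err` and a hypothetical prior that favors homozygous C, the posterior probability of each genotype shown can be computed with a simple binomial model:

```python
def genotype_posteriors(bases, err=0.01, priors=None):
    """Hypothetical diploid caller for a C/T site.
    bases: observed base calls; only 'C' and 'T' are counted.
    Returns posterior probabilities for CC, CT and TT."""
    if priors is None:
        # assumed prior: C is the reference base, variants are rare
        priors = {"CC": 0.998, "CT": 0.001, "TT": 0.001}
    # probability that a single observed base reads 'T' under each genotype
    p_t = {"CC": err, "CT": 0.5, "TT": 1 - err}
    n_t = sum(1 for b in bases if b == "T")
    n_c = sum(1 for b in bases if b == "C")
    unnorm = {g: priors[g] * p_t[g] ** n_t * (1 - p_t[g]) ** n_c for g in priors}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}
```

Running the same calculation on a normal and a tumor sample from one individual and flagging a confident genotype change is one simple way to screen for somatic mutations; it is not necessarily how the original tool works.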
2. Mining genetic variation data
Cataloguing all naturally occurring normal sequence polymorphisms
Marth et al., Nature Genetics 2001
Genetic and epigenetic changes in cancer
• nucleotide changes, short insertions / deletions
• copy number changes, chromosomal rearrangements
• DNA methylation, histone modification
3. Demographic inference
Data – statistical distributions
1. marker density (MD): distribution of the number of SNPs in pairs of sequences
Clone 1     Clone 2     # SNPs
AL00675     AL00982     8
AS81034     AK43001     0
CB00341     AL43234     2
[Figure: MD histogram; x-axis: number of SNPs per sequence pair (0 to 10)]
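The MD statistic can be sketched in a few lines of Python: count mismatches in each aligned sequence pair, then tabulate the proportion of pairs carrying k SNPs. Treating `-` and `N` as uninformative positions, and ignoring differences in overlap length between pairs, are simplifying assumptions of this sketch; the function names are hypothetical.

```python
from collections import Counter


def count_snps(seq1, seq2):
    """Mismatching positions between two aligned sequences of equal length,
    skipping positions where either sequence has a gap '-' or an 'N'."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq1.upper(), seq2.upper())
        if a != b and a not in "-N" and b not in "-N"
    )


def marker_density(snp_counts, max_k=10):
    """Proportion of sequence pairs carrying k SNPs, for k = 0..max_k.
    Pairs with more than max_k SNPs still count toward the total."""
    tally = Counter(snp_counts)
    total = len(snp_counts)
    return [tally[k] / total for k in range(max_k + 1)]
```

Applied to the three example pairs in the table (8, 0 and 2 SNPs), `marker_density([8, 0, 2])` puts one third of the pairs at each of 0, 2 and 8 SNPs.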
2. allele frequency spectrum (AFS): distribution of SNPs according to allele frequency in a set of samples
SNP     Minor allele     Allele count
A/G     A                1
C/T     T                9
A/G     G                3
[Figure: AFS histogram; x-axis: allele count (1 to 10), from “rare” to “common”]
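The AFS can be sketched the same way: find the minor allele and its count at each SNP, then tabulate the proportion of SNPs at each count. This assumes biallelic SNPs and that every sampled chromosome is scored at every SNP; the function names are hypothetical.

```python
from collections import Counter


def minor_allele_count(alleles):
    """(minor allele, its count) for one biallelic SNP, given the allele
    observed on each sampled chromosome."""
    counts = Counter(alleles).most_common()
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles at a biallelic SNP")
    return counts[-1]


def allele_frequency_spectrum(minor_counts, max_count=10):
    """Proportion of SNPs whose minor allele is seen k times, k = 1..max_count."""
    tally = Counter(minor_counts)
    total = len(minor_counts)
    return {k: tally[k] / total for k in range(1, max_count + 1)}
```

With the three SNPs in the table (allele counts 1, 9 and 3), each of those three bins receives one third of the SNPs.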
Models – mathematical and simulation
[Figure: population-size histories from past to present for four models (stationary, collapse, expansion, bottleneck), each with its predicted MD (by simulation) and AFS (in direct form)]
Marth et al., PNAS 2003
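For the stationary (constant-size) scenario the expected AFS has a well-known closed form: under the standard neutral model, the expected number of SNPs whose derived allele appears k times among n sampled chromosomes is proportional to 1/k. The sketch below computes that baseline and folds it into minor-allele counts; it covers only the stationary case and does not reproduce the source's calculations for the collapse, expansion or bottleneck histories.

```python
def expected_afs_stationary(n):
    """Expected unfolded AFS for a sample of n chromosomes under a
    constant-size neutral model: P(k derived copies) is proportional to 1/k."""
    weights = {k: 1 / k for k in range(1, n)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def fold(afs, n):
    """Fold an unfolded spectrum into minor-allele counts 1..n // 2."""
    folded = {}
    for k, p in afs.items():
        m = min(k, n - k)
        folded[m] = folded.get(m, 0.0) + p
    return folded
```

Comparing an observed spectrum with this baseline is a first check: departures in the rare-variant bins are what the non-stationary histories are meant to explain.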
Conclusions based on model fitting
• European data: bottleneck
• African data: modest but uninterrupted expansion
Marth et al., Genetics 2004
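The slides do not state how the models were scored against the data. One common choice, assumed here purely for illustration, is a multinomial log-likelihood of the observed AFS bin counts under each model's expected proportions, with the best-fitting model being the one with the highest value. The model names and numbers in the test are toy values, not data from the study.

```python
import math


def afs_log_likelihood(observed_counts, expected_props):
    """Multinomial log-likelihood (up to a constant) of observed SNP counts
    per allele-count bin, given a model's expected proportions per bin."""
    ll = 0.0
    for k, c in observed_counts.items():
        if c == 0:
            continue
        p = expected_props.get(k, 0.0)
        if p <= 0:
            return float("-inf")  # the model cannot produce this observation
        ll += c * math.log(p)
    return ll


def best_fitting_model(observed_counts, models):
    """Name of the model whose expected AFS gives the highest log-likelihood."""
    return max(models, key=lambda name: afs_log_likelihood(observed_counts, models[name]))
```

An observed spectrum dominated by rare variants favors whichever candidate model predicts more rare variants, and vice versa.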
4. Medical Genetics
The polymorphism structure of individuals follows strong patterns
http://pga.gs.washington.edu/
3. An international project is under way to map out human polymorphism structure…
However, the variation structure observed in the reference DNA samples…
… often does not match the structure in another set of samples, such as those used in a clinical case-control association study to find disease genes and disease-causing genetic variants.
… we build computational tools to test sample-to-sample variability for clinical studies
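The slides do not specify how polymorphism structure is compared between sample sets. A common measure, assumed here, is pairwise linkage disequilibrium r² computed from phased haplotypes coded 0/1; computing it for the same SNP pair in a reference panel and in a clinical sample shows whether the two sets agree. The function name is hypothetical.

```python
def r_squared(haplotypes, i, j):
    """Linkage disequilibrium r^2 between biallelic SNPs i and j,
    from phased haplotypes given as lists of 0/1 alleles."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n
    p_b = sum(h[j] for h in haplotypes) / n
    p_ab = sum(h[i] * h[j] for h in haplotypes) / n
    d = p_ab - p_a * p_b  # the LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in this sample")
    return d * d / denom
```

A SNP pair in perfect LD in one sample (r² = 1) but nearly independent in another is exactly the kind of mismatch that can mislead a marker-based association study.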
Instead of genotyping additional sets of (clinical) samples with costly experiments and comparing the variation structure of these consecutive sets directly…
… we generate additional samples by computational means, based on our population genetic models of demographic history. We then use these samples to test the efficacy of gene-mapping approaches for clinical research.
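Generating samples by computational means is typically done with coalescent simulation. The sketch below is a minimal standard neutral coalescent with infinite-sites mutation and a constant population size (theta = 4Nμ, time measured in units of 2N generations). It assumes none of the demographic changes (bottleneck, expansion) that the source's models include, and the function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_sample(n, theta, rng):
    """n haplotypes under a constant-size neutral coalescent with
    infinite-sites mutation. Returns n rows of 0/1 alleles, one column per SNP."""
    lineages = [frozenset([i]) for i in range(n)]  # samples below each lineage
    mutations = []  # each SNP is stored as the set of samples carrying the new allele
    while len(lineages) > 1:
        k = len(lineages)
        # waiting time until the next coalescence: exponential with rate k(k-1)/2
        t = rng.expovariate(k * (k - 1) / 2)
        # each lineage accumulates Poisson(theta / 2 * t) mutations over that time
        for lin in lineages:
            mutations.extend([lin] * poisson(theta / 2 * t, rng))
        # merge two randomly chosen lineages into their common ancestor
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] | lineages[j]
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return [[1 if s in m else 0 for m in mutations] for s in range(n)]
```

Drawing many replicate samples and computing a statistic on each (for example, the number of SNPs or r² for a SNP pair) gives the sample-to-sample spread expected from chance alone. Under this model the mean number of SNPs per sample approaches theta times the sum of 1/i for i = 1 to n - 1.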
5. We develop methods to connect genotype and clinical outcome in simple gene systems
Diagram linking genotype to clinical outcome:
• genetic marker (haplotype) in genome regions of drug-metabolizing enzyme (DME) genes
• computational prediction based on haplotype structure
• functional allele (known metabolic polymorphism)
• molecular phenotype (drug concentration measured in blood plasma)
• clinical endpoint (adverse drug reaction)
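The "computational prediction based on haplotype structure" step can be illustrated, under strong simplifying assumptions, by estimating from a reference panel how often each marker haplotype carries the functional allele and then looking up a new individual's haplotype. The panel format and function names are hypothetical; a real predictor would also need to handle unphased genotypes and haplotypes absent from the panel.

```python
from collections import defaultdict


def train_haplotype_predictor(reference):
    """reference: iterable of (marker_haplotype, carries_functional_allele)
    pairs, e.g. ("ACT", True). Returns {haplotype: P(functional | haplotype)}."""
    counts = defaultdict(lambda: [0, 0])  # haplotype -> [carriers, total]
    for hap, functional in reference:
        counts[hap][0] += int(functional)
        counts[hap][1] += 1
    return {hap: carriers / total for hap, (carriers, total) in counts.items()}


def predict(model, haplotype, default=None):
    """Estimated probability that a haplotype carries the functional allele;
    `default` is returned for haplotypes never seen in the reference panel."""
    return model.get(haplotype, default)
```

The predicted functional-allele status is what would then be related to the molecular phenotype (plasma drug concentration) and the clinical endpoint.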