Aspects of Genetics and
Genomics in Cancer Research
Li Hsu
Biostatistics and Biomathematics Program
Fred Hutchinson Cancer Research Center
Outline
• Cancer facts
• Linkage analysis of family studies
• Genome-wide association studies
Etiology of Cancer
• The etiology of cancer is multifactorial, with
genetic, environmental, medical, and lifestyle
factors interacting to produce a given
malignancy.
• Breakthroughs in high-throughput
genotyping technologies have made it possible
to systematically identify genes that are
responsible for disease occurrence.
BRCA1 and Breast Cancer
• BRCA1 (breast cancer 1) is a human gene that
belongs to a class of genes known as tumor
suppressors, which maintain genomic integrity
to prevent uncontrolled proliferation. Variations
in the gene have been implicated in a number of
hereditary cancers, namely breast, ovarian, and
prostate. The BRCA1 gene is located on the
long (q) arm of chromosome 17 at 38 Mb.
[Figure: Probability of developing breast cancer by age, carriers vs. non-carriers (Chen et al. 2009)]
Probability of Developing Breast Cancer for BRCA1 Carriers

Age    Average Person         BRCA1 Carrier
50     2.1% (1.7%-2.7%)       18.8% (8.2%-2.3%)
60     4.1% (3.4%-5.0%)       31.3% (14.3%-61.2%)
70     7.2% (6.0%-9.0%)       45.4% (22.7%-74.3%)
80     10.2% (8.4%-12.5%)     54.9% (30.4%-81.4%)
• How was BRCA1 found?
Linkage Analysis
[Figure: pedigree of ten individuals with marker genotypes 3/4, 1/2, 1/3, 2/4, 3/4, 3/2, 1/4, 1/4, 1/2, 3/2; each individual is also labeled with a disease genotype, D/d or d/d]
Assume disease gene (D) is rare with full
penetrance
Linkage Analysis (continued)
• Disease allele (D) originally on the
chromosome carrying allele 3
• How often does D co-segregate with
allele 3 (non-recombinant)?
– 5 meioses
• How often is D separated from allele 3
(recombinant)?
– 1 meiosis
Likelihood function
• Let θ be the recombination fraction, which measures the
distance between allele 3 and D by how
frequently they recombine.
• The likelihood function is L(θ) = (1 - θ)^5 θ
(5 non-recombinant meioses and 1 recombinant meiosis).
• The maximum likelihood estimate is θ = 1/6.
• LOD = log10 [L(1/6)/L(1/2)] = 0.63
• LOD for 7 such families = 7 × 0.63 = 4.41
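The LOD arithmetic on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names `likelihood` and `lod_score` are illustrative and not from the original slides, and the 7-family total assumes every family contributes the same 5 non-recombinant and 1 recombinant meioses.

```python
import math

def likelihood(theta, nonrecombinants=5, recombinants=1):
    """L(theta) = (1 - theta)^nonrecombinants * theta^recombinants."""
    return (1 - theta) ** nonrecombinants * theta ** recombinants

def lod_score(theta, nonrecombinants=5, recombinants=1):
    """log10 likelihood ratio against free recombination (theta = 1/2)."""
    ratio = (likelihood(theta, nonrecombinants, recombinants)
             / likelihood(0.5, nonrecombinants, recombinants))
    return math.log10(ratio)

# The maximum likelihood estimate is the observed fraction of
# recombinant meioses: 1 out of 6.
theta_hat = 1 / (5 + 1)

lod_one_family = lod_score(theta_hat)

# Assumption: 7 families with identical counts, so the LODs simply add.
# The per-family LOD is rounded first, as on the slide.
lod_seven_families = 7 * round(lod_one_family, 2)

print(f"theta_hat = {theta_hat:.4f}")
print(f"LOD (one family) = {lod_one_family:.2f}")
print(f"LOD (7 families) = {lod_seven_families:.2f}")
```

A total LOD above 3 is the conventional threshold for declaring linkage, which is why pooling several families matters here.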
Issues
• Linkage analysis narrowed the search to a region
of about 1 Mb. However, it took another four years
before the BRCA1 gene was identified.
• Reduced penetrance, phenocopy, and genetic
heterogeneity are among the factors that limit
the success of linkage analysis.
• The relevance of the findings to the population at
large is unclear.
Genome-Wide Association Studies (GWAS)
• The Human Genome Project began in 1990 and
was completed in 2003.
Part of sequence from Chromosome 7
AGACGGAGTTTCACTCTTGTTGCCAACCTGGAGTGCAGTGGCGTGATCTCAGCTCACTGCACACTCCGCTTTCC/TGG
TTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTACAGTCACACACCACCACGCCCGGCTAATTTTTG
TATTTTTAGTAGAGTTGGGGTTTCACCATGTTGGCCAGACTGGTCTCGAACTCCTGACCTTGTGATCCGCCAGCCTCT
GCCTCCCAAAGAGCTGGGATTACAGGCGTGAGCCACCGCGCTCGGCCCTTTGCATCAATTTCTACAGCTTGTTTTCTT
TGCCTGGACTTTACAAGTCTTACCTTGTTCTGCCTTCAGATATTTGTGTGGTCTCATTCTGGTGTGCCAGTAGCTAAAA
ATCCATGATTTGCTCTCATCCCACTCCTGTTGTTCATCTCCTCTTATCTGGGGTCACA/CTATCTCTTCGTGATTGCATTC
TGATCCCCAGTACTTAGCATGTGCGTAACAACTCTGCCTCTGCTTTCCCAGGCTGTTGATGGGGTGCTGTTCATGCCT
CAGAAAAATGCATTGTAAGTTAAATTATTAAAGATTTTAAATATAGGAAAAAAGTAAGCAAACATAAGGAACAAAAAG
GAAAGAACATGTATTCTAATCCATTATTTATTATACAATTAAGAAATTTGGAAACTTTAGATTACACTGCTTTTAGAGAT
GGAGATGTAGTAAGTCTTTTACTCTTTACAAAATACATGTGTTAGCAATTTTGGGAAGAATAGTAACTCACCCGAACA
GTGTAATGTGAATATGTCACTTACTAGAGGAAAGAAGGCACTTGAAAAACATCTCTAAACCGTATAAAAACAATTACA
TCATAATGATGAAAACCCAAGGAATTTTTTTAGAAAACATTACCAGGGCTAATAACAAAGTAGAGCCACATGTCATTT
ATCTTCCCTTTGTGTCTGTGTGAGAATTCTAGAGTTATATTTGTACATAGCATGGAAAAATGAGAGGCTAGTTTATCAA
CTAGTTCATTTTTAAAAGTCTAACACATCCTAGGTATAGGTGAACTGTCCTCCTGCCAATGTATTGCACATTTGTGCCC
AGATCCAGCATAGGGTATGTTTGCCATTTACAAACGTTTATGTCTTAAGAGAGGAAATATGAAGAGCAAAACAGTGCA
TGCTGGAGAGAGAAAGCTGATACAAATATAAATGAAACAATAATTGGAAAAATTGAGAAACTACTCATTTTCTAAATT
ACTCATGTATTTTCCTAGAATTTAAGTCTTTTAATTTTTGATAAATCCCAATGTGAGACAAGATAAGTATTAGTGATGGT
ATGAGTAATTAATATCTGTTATATAATATTCATTTTCATAGTGGAAGAAATAAAATAAAGGTTGTGATGATTGTTGATTA
TTTTTTCTAGAGGGGTTGTCAGGGAAAGAAATTGCTTTTTTTCATTCTCTCTTTCCACTAAGAAAGTTCAACTATTAATT
TAGGCACATACAATAATTACTCCATTCTAAAATGCCAAAAAGGTAATTTAAGAGACTTAAAACTGAAAAGTTTAAGATA
GTCACACTGAACTATATTAAAAAATCCACAGGGTGGTTGGAACTAGGCCTTATATTAAAGAGGCTAAAAATTGCAATA
AGACCACAGGCTTTAAATATGGCTTTAAACTGTGAAAGGTGAAACTAGAATGAATAAAATCCTATAAATTTAAATCAA
AAGAAAGAAACAAACTA/GAAATTAAAGTTAATATACAAGAATATGGTGGCCTGGATCTAGTGAACATATAGTAAAGA
TAAAACAGAATATTTCTGAAAAATCCTGGAAAATCTTTTGGGCTAACCTGAAAACAGTATATTTGAAACTATTTTTAAA
Genome-Wide Association Study
• 550,000 SNPs on an array
• 2000 diseased individuals (colon cancer
cases) and 2000 normal individuals
• Genotype all DNA samples for 550,000 SNPs
• That is 4000 × 550,000 = 2.2 billion genotypes!
GWAS on Type 2 Diabetes (Steinthorsdottir
et al., 2007, Nature Genetics)
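Observed genotype counts
          Cases   Controls   Total
AA          751       3107    3858
Aa          539       1887    2426
aa          108        277     385
Total      1398       5271    6669

Expected genotype counts if genotype is not associated with the disease
          Cases   Controls   Total
AA          809       3049    3858
Aa          509       1917    2426
aa           81        304     385
Total      1398       5271    6669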
• Expected count for cases if AA is not associated with the disease: first,
calculate the frequency of the AA genotype in cases and controls
combined:
freq = 3858/6669 = 57.85%
• For 1398 cases, we expect to see 1398 × 57.85% = 809 individuals with
genotype AA.
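The same calculation extends to every cell of the table. The sketch below is illustrative rather than the authors' code: the helper name `expected_counts` is an assumption, and the observed counts (cases, controls) are those from the genotype table on this slide.

```python
# Observed genotype counts from the slide: (cases, controls).
observed = {
    "AA": (751, 3107),
    "Aa": (539, 1887),
    "aa": (108, 277),
}

def expected_counts(table):
    """Expected cell counts if genotype and disease status are independent."""
    n_cases = sum(cases for cases, _ in table.values())
    n_controls = sum(controls for _, controls in table.values())
    n_total = n_cases + n_controls
    expected = {}
    for genotype, (cases, controls) in table.items():
        # Genotype frequency in cases and controls combined.
        freq = (cases + controls) / n_total
        expected[genotype] = (n_cases * freq, n_controls * freq)
    return expected

expected = expected_counts(observed)
for genotype, (exp_cases, exp_controls) in expected.items():
    print(f"{genotype}: cases {exp_cases:.1f}, controls {exp_controls:.1f}")
```

Rounding the AA row gives the 809 expected cases worked out above.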
GWAS on Type 2 Diabetes
• The chi-square statistic is calculated by finding the
difference between the observed and expected counts for
each cell, squaring it, dividing by the expected count,
and summing the results over all cells:
(751-809)^2/809 + (3107-3049)^2/3049 + …
• Compare the value to a standard chi-square distribution
with (# rows - 1) × (# columns - 1) = 2 degrees of freedom.
• The p-value for this SNP is 6.772e-5.
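As a rough check of the test described above, the statistic and p-value can be recomputed from the observed counts. This is a sketch using only the standard library: the function name `chi_square` is illustrative, and the p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper tail probability is exp(-x/2). A general-purpose routine from a statistics library would be the usual choice for other table sizes.

```python
import math

# Observed counts from the slide, one row per genotype: (cases, controls).
observed = [
    (751, 3107),  # AA
    (539, 1887),  # Aa
    (108, 277),   # aa
]

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of genotype and status.
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square(observed)

# Valid only because dof == 2: the upper tail is exactly exp(-x/2).
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.2f}, df = {dof}, p = {p_value:.3e}")
```

The result agrees with the p-value quoted on the slide to within rounding.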
Issues
• Too many SNPs!
• Identifying gene-gene and gene-environment interactions is now
possible.
• Germline mutations account for only a small portion of
cancer cases.
http://envirocancer.cornell.edu/FactSheet/General/fs48.inheritance.cfm
Summary
• The amount of data being generated has
increased exponentially in the
last few years.
• This creates a great demand for efficient
and valid computational and statistical
methods and tools for picking needles
out of a haystack.