Transcriptome Profiling in Human Congenital Heart Disease

Transcriptome Profiling of Human Cardiac Tissues in Hypoplastic Left Heart Syndrome
Karl D. Stamm, MS
Donna K. Mahnke, MS; Mary A. Goetsch, MS;
D. Woodrow Benson, MD, PhD; Xing Li, PhD;
Aoy Tomita-Mitchell, PhD; Timothy J. Nelson, MD, PhD;
James S. Tweddell, MD; Michael E. Mitchell, MD
September 2013 Research Update
Overview
• Medical Research
• Trouble with humans
• Rare diseases are common in a large enough population
• Next-Generation Sequencing Tech
• Illumina HiSeq methodology
• Differential expression
• Further Mining
• Principal component analyses
• Gene profiles and the self-organizing-map
Trouble with Humans
• Small sample sizes
• Low statistical power
• High interpersonal variability
• Ethnic backgrounds imply metabolic differences
• Phenocopy
• Multiple distinct diseases showing identical presentation
• Confounds clustering or association studies
• Ruins Case/Control study power
• PHI – Private/Protected Health Information
• Data security is paramount
• Cross-disciplinary collaborations are limited
• DNA is theoretically but not practically identifiable
Congenital Heart Defect
Incidence
• Down Syndrome 1:700 live births
• 50-60% have some structural heart defect
• 22q11.2 Deletion Syndrome (22qD) 1:4000 live births
• 75-90% have some structural heart defect
• ‘Healthy’ 99:100 live births
• 0.8% have some structural heart defect
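The incidence figures above can be combined into a rough estimate of how many heart defects each group contributes. This is only a sketch: it assumes the midpoint of each quoted range (55% and 82.5%) and treats the three groups as non-overlapping, neither of which is stated on the slide.

```python
# Rough share of congenital heart defects (CHD) per group, per 10,000 live births.
# Assumption: range midpoints stand in for the quoted percentage ranges.
groups = {
    "Down syndrome": (1 / 700, 0.55),    # incidence, fraction with CHD (50-60%)
    "22qD syndrome": (1 / 4000, 0.825),  # 75-90%
    "'Healthy'": (99 / 100, 0.008),      # 0.8%
}

def chd_per_10k(groups):
    """Return {group: expected CHD cases per 10,000 live births}."""
    return {name: 10_000 * inc * frac for name, (inc, frac) in groups.items()}

cases = chd_per_10k(groups)
total = sum(cases.values())
for name, n in cases.items():
    print(f"{name:15s} {n:6.1f} per 10,000  ({100 * n / total:4.1f}% of CHD)")
```

Under these assumptions most defects occur outside the two named syndromes, which is the point of the "Proportion Explained" slide that follows.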
Proportion
Explained:
C.H.D. in particular
Hypoplastic Left Heart Syndrome
1 in 40 CHD cases are HLHS
2.5 : 10000 of all births
• Complex developmental disorder
• 100% fatal before the invention of the Norwood Procedure (1981)
• No multigenerational pedigrees
• Spontaneous mutation: immune to detection by genetic linkage
All sequencing costs for this study provided by
Generate Reads – Illumina Tech
10 to 500 million short reads are generated in pairs, 2x50 to 2x100 bp each.
http://seqanswers.com/forums/showthread.php?t=21
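To give a feel for these read counts, the sketch below converts them into average fold coverage. It assumes the quoted 10 to 500 million refers to read pairs, that reads are spread uniformly, and that the target is a 3.1-billion-base genome; for RNA-Seq the real target is the much smaller transcriptome, and coverage is far from uniform.

```python
def mean_depth(n_pairs, read_len, target_bases):
    """Average fold coverage if paired reads were spread uniformly over the target."""
    return n_pairs * 2 * read_len / target_bases

GENOME = 3.1e9  # assumed target size in bases

# The two extremes quoted above: 10M pairs of 2x50 and 500M pairs of 2x100.
for n_pairs, read_len in [(10e6, 50), (500e6, 100)]:
    depth = mean_depth(n_pairs, read_len, GENOME)
    print(f"{n_pairs:.0e} pairs of 2x{read_len} bp: about {depth:.2f}x")
```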
Align Reads to Reference
• Which one?
• NCBI #37.3 has 3.1 billion bases across 190 contiguous scaffolds
• UCSC hg19 has 3.2 billion bases across 163 contiguous scaffolds
• Haploid reference contains disease alleles and chimeric sequence, like an A+B+O blood type.
Image of patches modifying the CHR17 reference from 2011 according to Ensembl
http://www.ensembl.info/blog/2011/05/20/accessing-non-reference-sequences-in-human/
Millions of Variants
• The 1000 Genomes project found 38 million SNPs, 1.4 million
short insertions or deletions, and more than 14 thousand
larger deletions
• The NHLBI Exome Sequencing Project targeted 22 Mbases
across 2,440 individuals and found 563,700 variants, 82% of
which were novel. They averaged 200 novel, coding mutations
per person.
• We find about 150-300 thousand SNVs in an exome, 10% of
which are nonsynonymous
• SAMTOOLS is the software of choice for variant calling relative
to your reference genome.
• CCG/Proline -> CTG/Leucine
• HOPX is a gene known to regulate heart development!
• Very common mutation
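The codon change above can be checked mechanically. The sketch below uses only an excerpt of the standard genetic code (the proline and leucine codons needed here); the helper name `classify_snv` is made up for illustration and is not part of SAMtools.

```python
# Excerpt of the standard genetic code: CCN encodes proline, CTN encodes leucine.
CODON_TABLE = {
    "CCA": "Pro", "CCC": "Pro", "CCG": "Pro", "CCT": "Pro",
    "CTA": "Leu", "CTC": "Leu", "CTG": "Leu", "CTT": "Leu",
}

def classify_snv(ref_codon, pos, alt_base):
    """Apply a single-base change at 0-based position `pos` and classify it."""
    alt_codon = ref_codon[:pos] + alt_base + ref_codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return alt_codon, ref_aa, alt_aa, kind

print(classify_snv("CCG", 1, "T"))  # the change shown on the slide
print(classify_snv("CCG", 2, "A"))  # a third-position change, for contrast
```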
RNA-Seq vs. Whole Genome
1. Extract and purify mRNA by poly(A) selection
2. Convert spliced mRNA to DNA fragments
3. Run standard genome sequencing on the product
4. Result: Expression-level-dependent sequence coverage
Image found at
http://www.pacificu.edu/optometry/ce/courses/20591/armdpg3.cfm
RNA-Seq Reconstructs Transcripts
From the CuffLinks paper, Trapnell et al.
http://www.nature.com/nbt/journal/v28/n5/abs/nbt.1621.html
Nature Biotechnology 28, 511–515 (2010)
IGV – aligned reads viewer
CoverageBED
Simple arbitrary feature read depth counting.
-Count by gene, exon, whatever
BEDTOOLS : a flexible suite of utilities for comparing genomic features.
http://code.google.com/p/bedtools/
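The idea behind per-feature read counting can be sketched in a few lines. This is not BEDTools' implementation, just a naive illustration that assumes 0-based, half-open intervals as in the BED format; the feature and read coordinates are invented.

```python
def count_overlaps(features, reads):
    """Count reads overlapping each feature.

    features: list of (name, chrom, start, end)
    reads:    list of (chrom, start, end)
    Coordinates are 0-based and half-open, as in BED files.
    """
    counts = {name: 0 for name, _, _, _ in features}
    for r_chrom, r_start, r_end in reads:
        for name, f_chrom, f_start, f_end in features:
            # Two half-open intervals overlap when each starts before the other ends.
            if r_chrom == f_chrom and r_start < f_end and f_start < r_end:
                counts[name] += 1
    return counts

features = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 300, 400)]
reads = [("chr1", 90, 140), ("chr1", 150, 250), ("chr1", 200, 260),
         ("chr1", 350, 360), ("chr2", 100, 150)]
depth = count_overlaps(features, reads)
print(depth)
```

The nested loop is quadratic and only suitable for toy inputs; real tools index the features first.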
Example of bad alignment
Variance and mean linked by local regression - for robust parameter estimation.
• Negative Binomial
• Models counts as the number of ‘binomial successes until a set number of failures’, which better fits RNA-Seq fragment generation (limited reagent)
• Allows/captures the ‘overdispersion’ seen in RNA-Seq experiments.
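Overdispersion can be seen in a small simulation. The sketch below draws negative binomial counts as a gamma-Poisson mixture, an equivalent formulation of the distribution, with variance mu + alpha * mu^2. The mean (20) and dispersion (0.25) are arbitrary illustrative values, not estimates from this study.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def neg_binomial(mu, alpha, rng):
    """Gamma-Poisson mixture: the Poisson rate itself varies from draw to draw."""
    lam = rng.gammavariate(1 / alpha, alpha * mu)  # shape, scale; mean is mu
    return poisson(lam, rng)

rng = random.Random(0)
mu, alpha, n = 20.0, 0.25, 20000  # illustrative values only
pois = [poisson(mu, rng) for _ in range(n)]
nb = [neg_binomial(mu, alpha, rng) for _ in range(n)]

print("Poisson: mean %.1f, variance %.1f" % (statistics.mean(pois), statistics.variance(pois)))
print("NegBin:  mean %.1f, variance %.1f" % (statistics.mean(nb), statistics.variance(nb)))
print("NegBin theoretical variance: %.1f" % (mu + alpha * mu ** 2))
```

Both samples share the same mean, but the negative binomial variance is several times larger, which a Poisson model cannot represent.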
Scale the totals for compatible means
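One common way to scale samples to compatible means is the median-of-ratios size factor used by DESeq. The sketch below is a plain-Python illustration of that idea on invented counts, not the DESeq code itself; genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
import statistics

def size_factors(counts):
    """counts: list of per-gene rows, each a list of per-sample read counts."""
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples.
    log_geo = [sum(math.log(c) for c in row) / n_samples for row in usable]
    # Per sample: median over genes of the ratio count / geometric mean.
    return [
        math.exp(statistics.median(
            [math.log(row[j]) - g for row, g in zip(usable, log_geo)]))
        for j in range(n_samples)
    ]

# Toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
sf = size_factors(counts)
normalized = [[c / f for c, f in zip(row, sf)] for row in counts]
print(sf)
print(normalized[:3])
```

Dividing each sample by its size factor puts the first three genes on the same scale in both samples.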
Mean-Variance Connection
Detection in Low Values
Per-gene mean by difference ratio
DESeq
• Starting from 18,000 Rsids minus 1200 NA
• 1000 entries p<0.05
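For context on the numbers above: with 18,000 - 1,200 = 16,800 tests, about 840 would fall below p<0.05 by chance alone if no gene were truly different. The slide does not say whether the p-values were adjusted; the sketch below shows one standard adjustment, the Benjamini-Hochberg procedure, on invented p-values.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

expected_by_chance = (18_000 - 1_200) * 0.05
print("Expected p<0.05 under the null:", expected_by_chance)

toy = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
adj = bh_adjust(toy)
print(adj)
```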
Theme
• Big lists
• Noisy data
• Complex correlation
• Heterogeneous background
Precious Tissue Samples
• Collecting tissue during surgery is an extra burden placed on overloaded surgical teams.
• Samples must be processed carefully to avoid degradation of sensitive molecules.
• Many steps and costs prior to gene sequencing.
• Collaborators have provided 35 patients’ atrial septal tissues.
• Still no ethical source of healthy control.
• Hope to see separation between red/notred or solid/notsolid points
• Lack of discrimination in major variation dimensions
• Implying uncontrolled heterogeneity dominates
Therefore, there is more difference from person to person than between subtypes
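For readers unfamiliar with the method behind these plots, the sketch below finds the first principal component by power iteration on centered data. It is a minimal stand-in that assumes nothing about the software actually used for the analysis, and the toy data are invented.

```python
import random

def first_pc(rows, iters=200, seed=0):
    """rows: samples x features. Returns (unit direction, per-sample scores)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # center each feature
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        scores = [sum(x[j] * v[j] for j in range(d)) for x in X]            # X v
        w = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(d)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    scores = [sum(x[j] * v[j] for j in range(d)) for x in X]
    return v, scores

# Toy data lying exactly on the line y = 2x.
rows = [[t, 2 * t] for t in range(-5, 6)]
v, scores = first_pc(rows)
print(v)
```

Plotting the scores with one color per subtype gives the kind of picture discussed above; when the subtypes do not separate, other sources of variation dominate the leading components.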
Top25 Consistent Genes
• Anyone know what it means when Adducin2 and HomeoboxA4 are
overexpressed? Is it significant that a dehydrogenase is under-expressed?
Group Profiles
at Selected Dimensions
Self-Organizing Map
• Kohonen 1990
• Halfway between neural networks and k-means (horrible oversimplification)
• Enforced grid layout and local neighborhood similarity
• Data points (here 25-dimensional vectors) lay out in natural organization
Stochastic - Iteration
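A self-organizing map fits in a short sketch. The version below is a minimal illustration of the stochastic update described above, with a 3x3 grid, a Gaussian neighborhood, and linearly decaying learning rate and radius; all of these settings are assumptions chosen for the toy example, not the parameters used in this study.

```python
import math
import random

def best_match(grid, x):
    """Grid node whose weight vector is closest to x."""
    return min(grid, key=lambda node: sum((a - b) ** 2 for a, b in zip(grid[node], x)))

def train_som(data, rows=3, cols=3, epochs=50, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialised from random data points.
    grid = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    steps = epochs * len(data)
    for t in range(steps):
        frac = t / steps
        lr = 0.5 * (1 - frac)                            # learning rate decays to 0
        radius = max(rows, cols) / 2 * (1 - frac) + 0.5  # neighborhood shrinks
        x = rng.choice(data)                             # stochastic: one random sample per step
        bmu = best_match(grid, x)
        for node, w in grid.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2 * radius ** 2))        # grid neighbors move too
            for j in range(dim):
                w[j] += lr * h * (x[j] - w[j])
    return grid

# Two well-separated toy clusters in 2 dimensions.
data = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
som = train_som(data)
print(best_match(som, [0, 0]), best_match(som, [5, 5]))
```

Because samples are drawn at random, different seeds give different layouts, which is what makes repeated runs useful for the co-clustering comparison on the next slide.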
Pairwise Similarity
• Co-clustering frequency determines sample similarity
• Sub-clusters are identified organically
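Co-clustering frequency can be computed directly from the cluster labels of repeated runs. In the sketch below each run is a list of labels, one per sample; label values are arbitrary and only compared within a run. The three runs shown are invented.

```python
from itertools import combinations

def cocluster_frequency(runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    runs: list of label lists, one per clustering run, each of length n_samples.
    """
    n = len(runs[0])
    together = [[0] * n for _ in range(n)]
    for labels in runs:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    freq = [[count / len(runs) for count in row] for row in together]
    for i in range(n):
        freq[i][i] = 1.0  # a sample always clusters with itself
    return freq

runs = [[0, 0, 1, 1], [1, 1, 0, 2], [0, 1, 1, 1]]  # made-up labels for 4 samples
freq = cocluster_frequency(runs)
for row in freq:
    print(["%.2f" % f for f in row])
```

The resulting matrix can be fed to any hierarchical clustering method to pull out sub-clusters.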
Results
• Lists of genes differential across conditions
• Many conditions, uncertain homogeneity
• List cutoff subjective
• No healthy control group
• We can mine these lists for pathways or biological processes
• Resulting in more lists of more complex results
Transcriptome Project
Future Work
• A few more samples are coming… Can we build a classifier?
• Predict non-measured variables? Signatures of immune
response point towards treatment targets.
• Predict compensatory effects? Samples are taken just days
after birth, but 8 months after the heart started beating.
• How else could we look at this rich, unique dataset?
Thanks for listening