Introduction to Next-Generation Sequence Analysis
From genetics to next-generation sequencing
McGill University and Génome Québec Innovation Centre
Mathieu Bourgey, Ph.D., Senior Bioinformatician
June 2nd 2013

Introduction to Genetics

Genetics is the Key to Biology
• Genetics
 – The scientific study of heredity
 – Geneticists study how traits and diseases are passed from one generation to the next
 – Understanding what genes are, how they are passed from one generation to the next, and how they work is essential to understanding life

What Are Genes and How Do They Work?
• Gene
 – The fundamental unit of heredity, made of DNA
 – DNA is a polymer (linked string) of chemical subunits called nucleotides
• Genetic code
 – There are four different nucleotides in DNA
   • Adenine = A
   • Thymine = T
   • Guanine = G
   • Cytosine = C
 – Combinations of these four nucleotides define which amino acids will be used to make specific proteins in the cell

DNA (DeoxyriboNucleic Acid)
• Genes are sequences of nucleotides contained on a double-stranded helical DNA molecule

Three-Dimensional Structure of a Protein
http://mach7.bluehill.com/proteinc/graphics/alphpro.jpg

Traits
• Any observable property of an organism is a trait
 – Actions of gene products (proteins) produce visible traits such as eye color and hair color

How Are Traits Transmitted from Parents to Offspring?
• Gregor Mendel’s experiments showed that genes are passed from parents to offspring
 – Each parent carries two genes that control a trait
 – Each parent contributes one copy from each pair
 – Pairs of genes separate from each other during the formation of egg and sperm (meiosis)
 – When egg and sperm fuse during fertilization, genes from mother and father become a new gene pair

Chromosomes
• Genes are contained on chromosomes
 – Chromosomes are found in the nucleus of the cells of humans and other higher organisms
 – Meiosis separates chromosome pairs during formation of egg and sperm
Image: dream designs / FreeDigitalPhotos.net

How Do Scientists Study Genes? (1)
• Many different model organisms have been used, ranging from bacteria to plants to insects to humans
• Transmission genetics
 – Study of inheritance patterns and how traits are passed from generation to generation
• Pedigree analysis
 – Construction of family trees used to follow transmission of genetic traits in families (inheritance)

Pedigree Analysis
• A pedigree represents the inheritance of a trait through several generations of a family

How Do Scientists Study Genes? (2)
• Cytogenetics
 – Study of the organization and arrangement of genes on a chromosome
 – Study of chromosome number and structure
• Karyotype
 – A complete set of chromosomes from a cell that has been photographed during cell division and arranged by size and shape in a standard order

Karyotype
• A karyotype arranges the chromosomes in a standard format so they can be evaluated for abnormalities

How Do Scientists Study Genes?
(3)
• Molecular genetics
 – The study of genetic events at the molecular level
 – Identification, isolation, and analysis of specific genes
• Population genetics
 – The study of inherited variation in populations of individuals
 – Forces, such as environment, that result in changing gene frequencies over generations

Genetics in Basic and Applied Research
• Recombinant DNA technology
 – Techniques whereby DNA fragments are linked to self-replicating vectors, which are replicated in a host cell, often bacteria
• Genetically modified organisms
 – Carry and express genes from another species
• Clone
 – Genetically identical molecules, cells, or organisms, all derived from a single source or parent
• Gene therapy
 – Normal genes are transplanted into humans with defective copies to treat genetic diseases

Applied Biotechnologies
• Medicine
 – Vaccines
 – Customized proteins for treating disease
• Agriculture
 – Increased crop yields
 – Lower fat content
 – Disease-resistant crops

Genetic Testing
• Genes associated with hundreds of genetic diseases have been cloned and are used to develop genetic tests
 – Cystic fibrosis
 – Sickle cell anemia
 – Muscular dystrophy
 – Phenylketonuria (PKU)

From Genetics to Genomics

Origin of the terms Genome and Genomics
• The term genome was used by German botanist Hans Winkler in 1920
• It originally meant the collection of genes in a haploid set of chromosomes
• Now it encompasses all DNA in a cell
• In 1986 mouse geneticist Thomas Roderick used Genomics for “mapping, sequencing and characterizing genomes”

Why should we study genomes?
Each and every one of us is a unique creation!
Life’s little book of instructions
DNA: the blueprint of life!
• The human body has 10^13 cells and each cell has 6 billion base pairs (A, C, G, T)
• A hidden language/code determines which proteins should be made and when
• This language is common to all organisms

Genome sequence can tell us…
• Everything about the organism's life
• Its developmental program
• Disease resistance or susceptibility
• How do we struggle, survive and die?
• Where are we going and where did we come from?
• How similar are we to apes, trees, and yeast?

Genomics is the study of all genes present in an organism

Science of Genomics?
• A marriage of molecular biology, robotics, and computing
• Tools and techniques of recombinant DNA technology
 – e.g., DNA sequencing, making libraries and PCR
• High-throughput technology
 – e.g., robotics for sequencing
• Computers are essential for processing and analyzing the large quantities of data generated

Genomics relies on high-throughput technologies
• Automated sequencers
• Fluorescent dyes
• Robotics
 – Microarray spotters
 – Colony pickers
• High-throughput genetics

Technology revolution
• Sequencing genomes in months and years
 – Project cost: billions of dollars
• Sequencing genomes in hours or minutes
 – Project cost: thousands of dollars

Bioinformatics: computational analysis of genomics data
• Uses computational approaches to solve genomics problems
 – Sequence analysis
 – Gene prediction
 – Modeling of biological processes and networks

Introduction to next-generation sequencing

Four Major Players
• Roche: 454
• Life Technologies: SOLiD / Ion Torrent
• Illumina: Genome Analyzer / HiSeq / MiSeq
• Pacific Biosciences: PacBio

Technology comparison
• PacBio
 – Method: single-molecule real-time
 – Read length: 3 kb average
 – Error type: indel
 – Single-pass error rate: 13%
 – Reads per run: 35,000 to 75,000
 – Time per run: 30 minutes to 2 hours
 – Cost per 1 million bases: US$2
 – Advantages: longest read length; fast
 – Disadvantages: low yield at high accuracy; equipment can be very expensive
• Ion Torrent
 – Method: ion semiconductor
 – Read length: 200 bp
 – Error type: indel
 – Single-pass error rate: ~1%
 – Reads per run: up to 4M
 – Time per run: 2 hours
 – Cost per 1 million bases: US$1
 – Advantages: less expensive equipment; fast
 – Disadvantages: homopolymer errors
• 454
 – Method: pyrosequencing
 – Read length: 700 bp
 – Error type: indel
 – Single-pass error rate: ~0.1%
 – Reads per run: 1M
 – Time per run: 24 hours
 – Cost per 1 million bases: US$10
 – Advantages: long read size; fast
 – Disadvantages: runs are expensive; homopolymer errors
• Illumina
 – Method: synthesis
 – Read length: 50 to 250 bp
 – Error type: substitution
 – Single-pass error rate: ~0.1%
 – Reads per run: up to 3.2G
 – Time per run: 1 to 10 days
 – Cost per 1 million bases: US$0.05 to US$0.15
 – Advantages: high sequence yield, cost, accuracy
 – Disadvantages: equipment can be very expensive
• SOLiD
 – Method: ligation
 – Read length: 50+35 or 50+50 bp
 – Error type: A-T bias
 – Single-pass error rate: ~0.1%
 – Reads per run: 1.2 to 1.4G
 – Time per run: 1 to 2 weeks
 – Cost per 1 million bases: US$0.13
 – Advantages: low cost per base
 – Disadvantages: slower than other methods, read length, longevity of the platform

Applications (equipment, number at Génome Québec, applications)
• 454, number: 3 (1)
 – Small de novo genome sequencing
 – Amplicon sequencing
 – Metagenomics
• Ion Torrent, number: 1
 – Small de novo genome sequencing
 – Amplicon sequencing
 – Metagenomics
• SOLiD, number: 0
 – Transcriptome sequencing (RNA-Seq)
 – Whole Exome Sequencing
 – Whole Genome Sequencing
• Illumina MiSeq, number: 1
 – Small de novo genome sequencing
 – Amplicon sequencing
 – Metagenomics
 – Validation
• Illumina HiSeq 2000/2500, number: 12
 – Transcriptome sequencing (RNA-Seq)
 – Whole Exome Sequencing
 – Whole Genome Sequencing
• Pacific Biosciences, number: 1
 – Small genomes
 – Long haplotype sequencing
 – Epigenomics

Current Applications

Different types of sequencing libraries

From Glenn TC, Mol Ecol Resour.
2011, adapted for 2013

What is the NGS problem about?
• Strings of 100 to ≈1,000 letters (reads of 100 bp to ≈1 kb)
• A puzzle of 3,000,000,000 letters
• You usually have 120,000,000,000 letters that you need to fit
• Many pieces don’t fit:
 – Sequencing error / SNP / structural variant
• Many pieces fit in many places:
 – Low-complexity region / microsatellite / repeat

Basecalling
• How do we translate the machine readouts to base calls?
• How do we estimate and represent sequencing errors?
From Michael Strömberg

Trimming based on qualities
• Will generate input sequence data of various sizes
• Low-quality bases can bias subsequent analysis (e.g., SNP and SV calling)

Assembly vs mapping
[Figure: assembly compares all reads vs all reads to build contigs (contig1, contig2); mapping compares all reads vs a reference]
• Mapping:
 – Useful for interrogating the “known” genome
• Assembly:
 – Essential if there is no genome sequence
 – Unbiased ascertainment of variation in a known genome

SNP Discovery: Goal
[Figure: distinguishing sequencing errors from SNPs]
Accurate SNP discovery depends closely on good base quality and sufficient depth of coverage
Modified from Bioinformatics.ca

Structural variation
• Indel:
 – Short insertion or deletion events < 50 bp
• Structural variations:
 – Large insertion
 – Large deletion
 – TE insertion
 – Inversion
 – Interspersed duplication
 – Tandem duplication

Structural variant detection
From Alkan et al. 2011

Conclusion
• NGS offers a variety of technologies and methods
• NGS is still an open field where many areas are under construction
• NGS analysis requires both mathematics and informatics skills
• The major challenge is currently linked to compute and storage capacities
Lincoln Stein (http://goo.gl/TD4tE)

Acknowledgment
• Guillaume Bourque
• Louis Létourneau

“The $1,000 genome, the $100,000 analysis?” Elaine R. Mardis
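
Worked example (illustrative)

The slides on the NGS puzzle, basecalling and quality trimming can be made concrete with a small Python sketch. It is not taken from the presentation. It assumes base qualities are Phred-scaled (error probability P = 10^(-Q/10)) and stored as ASCII characters with an offset of 33, which is a common FASTQ convention but is not stated in the slides. The function names, the threshold of 20 and the naive "cut from the 3' end" rule are hypothetical choices for illustration; real trimming tools use more robust rules.

```python
from typing import List, Tuple


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred scores.

    Assumption: Phred+33 encoding (one character per base).
    """
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence: str, quality_string: str,
                min_quality: int = 20) -> Tuple[str, str]:
    """Drop trailing bases whose quality is below min_quality.

    Naive illustrative rule: walk back from the 3' end and stop at the
    first base with quality >= min_quality. Reads therefore come out
    with different lengths, as the trimming slide notes.
    """
    scores = decode_qualities(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


def mean_coverage(total_sequenced_bases: int, genome_size: int) -> float:
    """Average depth of coverage: sequenced letters divided by genome letters."""
    return total_sequenced_bases / genome_size


if __name__ == "__main__":
    # Hypothetical read: the last two bases have very low quality ('#' and '!').
    read, quals = trim_3prime("ACGTACGTAC", "IIIIIIII#!")
    print(read, quals)

    # Figures from the NGS puzzle slide: 120 billion letters on a 3 billion letter genome.
    print(mean_coverage(120_000_000_000, 3_000_000_000))
```

With the figures from the puzzle slide, the mean coverage works out to 120,000,000,000 / 3,000,000,000 = 40, meaning each position of the genome is covered by about 40 reads on average. That redundancy is what makes it possible to tell sequencing errors apart from true SNPs.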