Introduction to your genome
CSE291: Personal Genomics for Bioinformaticians
01/10/17

The personal genomics revolution
23andMe: >1 million customers ($200)
Genographic Project: >800,000 customers ($150)
Family Tree DNA: >800,000 in database ($99)
Genome sequencing is quickly becoming a commodity!

The power of commercial genome databases
Survey: are you a morning or a night person?
Survey: what color are the stripes?
Can perform a GWAS on hundreds of thousands of people in a matter of days!
Hu et al. Nature Communications 2016

I have a long-standing interest in genetics…
Age: 1, Age: 20
Extra credit: which one is me?

Outline
• Why analyze your genome?
• Course overview
• History of analyzing genomes
• Basic biology intro
• Basic human genetics intro
• Discuss problem set 1

Why analyze your genome?

Mutations have implications in human health
Example: Cystic Fibrosis
- Caused by mutations in the gene CFTR; the most common mutation is ΔF508.
- Results in salty skin, poor growth, accumulation of thick, sticky mucus, frequent chest infections.
- Life expectancy: 37 years
- ~1 in 25 Europeans is a carrier
Pre-natal carrier testing of parents can now identify couples at risk and inform reproductive options
Parents: 0 1 and 0 1
Children: 1 1: 25%, 0 1: 50%, 0 0: 25%

Our genomes contain a record of human history
Recent history: Familial relationships (parents, siblings, cousins, etc.)
Populations
Ancient history: Human migration, ancient humans
https://aliciarmartin.com/research/migration_map_revised-2/
Novembre, et al. 2008

Your genome is uniquely identifying

Your genome can help science!
Interpreting one genome requires tens of thousands of genomes - Daniel MacArthur
vs. e.g.
latest schizophrenia genome-wide association study used >100,000 control genomes

Course overview

Course objectives
• Gain basic bioinformatics skills needed to analyze a personal genome using the UNIX command line
• Gain the ability to critically read and interpret basic science and translational literature relevant to personal genomics
• Demonstrate knowledge and understanding of the social impacts of the personal genomics revolution
• Gain skills and experience necessary to carry out original research related to personal genomics

Grading
• Participation 10%
• Attendance 10%
• Problem set 1 5%
• Problem set 2 10%
• Problem set 3 10%
• Problem set 4 10%
• Problem set 5 10%
• Project proposal 5%
• Final Project 30%

Analyzing your own genome
• You are welcome and encouraged to explore your own genome (e.g. from 23andMe) through the problem sets.
• If you want to do that, order ASAP; it takes several weeks to get the data back.
• Your grade does not depend in any way on whether you analyze your own genome.
• You do not need to tell me if you analyze your own genome.
• We cannot offer to pay for the test or provide any counseling.

A whirlwind history of human genetics

Mendel establishes heredity as a principle (~1865)
Parents: Green peas (GG) x Yellow peas (YY)
F1 Generation: 100% Yellow (YG, YG, YG, YG)
F2 Generation: 75% Yellow, 25% Green (YY, YG, GY, GG)
Conclusions:
1. Inheritance is determined by “units” (now called genes)
2. An individual inherits one such unit from each parent for each trait
3.
A trait may “skip” a generation

mid-1900s: DNA is the genetic material
• Griffith experiment (1928): showed bacteria can transfer genetic information
• Avery-MacLeod-McCarty experiment (1944): showed that DNA was the key component of Griffith’s experiment
• Hershey-Chase experiment (1952): used radioactive labeling to show DNA, not protein, transfers genetic information
• DNA structure identified (1953) by Watson, Crick (using data from Rosalind Franklin)

First disease gene mapped (1983)
George Huntington’s paper (1872)
Huntington’s Disease
• Progressive neurodegenerative disease
• Loss of motor control, jerky movements
• Age of onset: typically 30-45 years old
• Caused by expansion of a CAG repeat, encoding polyglutamine, in the gene HTT
Gusella et al. 1983

The human genome is sequenced (2001)
• $3 Billion public project beginning in 1990
• In 1998, Craig Venter started a competing private project at Celera
• “Draft” announced in 2000 and published in 2001. We still do not have a complete genome sequence!
• >70% from a single male donor from Buffalo, NY (RP11). At least 4 individuals included.
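The Huntington’s Disease slide above attributes the disease to expansion of a CAG repeat in HTT. As a rough illustration of what "counting the repeat" means computationally, here is a minimal sketch that reports the longest uninterrupted run of CAG units in a DNA string. The function name `longest_cag_run` and the toy sequences are hypothetical and are not taken from the slides or from real HTT data.

```python
import re

def longest_cag_run(dna):
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    # (?:CAG)+ matches one or more back-to-back CAG units.
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)

# Toy sequences (made up for illustration, not real HTT sequence).
print(longest_cag_run("TTCAGCAGCAGGG"))         # 3
print(longest_cag_run("CAGCAGTTCAGCAGCAGCAG"))  # 4
print(longest_cag_run("ACGT"))                  # 0
```

Real repeat genotyping works from sequencing reads and has to handle interruptions and alignment, so treat this only as a picture of the underlying idea.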
Toward the $1000 Genome

The personal genomics revolution
>1 million customers
$200 to genotype 1.5 million genomic positions
Hair color, Eye color, Ancestry

Biology Intro

Bird’s eye view of the human genome
Cell, Nucleus, Autosomes, Sex chromosomes
http://missinglink.ucsf.edu/lm/genes_and_genomes/content.html

DNA (deoxyribonucleic acid) structure
Bases: Cytosine (C), Guanine (G), Adenine (A), Thymine (T)
Watson-Crick base pairing: C-G, A-T
Other components: Phosphate, Deoxyribose (sugar)
Forward strand: 5’-TGAC-3’
Reverse strand: 5’-GTCA-3’ (reverse complement)

The central dogma
DNA (gene) -> Transcription -> RNA (mRNA) -> Translation -> Protein

The genetic code
http://www.chemguide.co.uk/organicprops/aminoacids/dna4.html

The structure of a gene
DNA: TF, Promoter, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3
Transcription -> RNA:
ACACUAUCGAUGCAGAUAAAGUUGAGUAGCUGUCUCGGUCGAGCGUACGUAUAAAUCACUAC
Splicing -> mRNA (5’ UTR, coding sequence, 3’ UTR):
ACACUAUCGAUGCAGAUAAAUAGCUGUCUCGCGUACGUAUAAAUCACUAC
Translation -> Protein:
M Q I N S C L A Y V *
Start codon (AUG=Methionine)
Stop codon (UGA, UAA, UAG)

Organization of the human genome
~20,000 protein coding genes in the human genome
http://book.bionumbers.org/how-many-genes-are-in-a-genome/

Cell division – mitosis (somatic)
DNA replication -> Mitosis -> Two diploid cells

Cell division – meiosis (germline)
DNA replication -> Homologous recombination -> Meiosis I -> Meiosis II -> Four haploid cells

Recombination
https://www.reddit.com/r/askscience/comments/3hq4zl/does_crossover_occur_in_all_4_nonsister/

Human genetics intro

Mutations – the bread and butter of genetics!
SNP: ACGACTCGAGCG -> ACGACACGAGCG (μSNP: 1.20 × 10^-8 /loc/gen)
Short indel (1-20bp): ACGACTCGAGCG -> ACGAC-CGAGCG (μINDEL: 0.68 × 10^-9 /loc/gen)
Short tandem repeat: CAGCAG---CAGCAGCA vs. CAGCAGCAGCAGCAGCA (μSTR: 10^-2 to 10^-5 /loc/gen)
Alu retrotransposition
Struct. Var / CNV (>20bp)
[Figure: # de novo mutations per generation for SNP, Indel, STR, SV and Alu; value labels in the figure: ~75+, 3, 0.2, 0.05]

How do mutations affect proteins?
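One way to make this question concrete is to translate the spliced mRNA from the gene-structure slide and then see what single-base changes do to the protein. The sketch below is illustrative only: the names `translate` and `snp_effect` are hypothetical, the codon table is the standard genetic code, and the 3’ UTR is assumed to be written with U throughout (RNA has no T).

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Spliced mRNA from the gene-structure slide, split into its three parts.
# Assumption: the 3' UTR is written with U throughout, since RNA contains no T.
FIVE_PRIME_UTR = "ACACUAUCG"
CDS = "AUGCAGAUAAAUAGCUGUCUCGCGUACGUAUAA"
THREE_PRIME_UTR = "AUCACUAC"
MRNA = FIVE_PRIME_UTR + CDS + THREE_PRIME_UTR

def translate(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def snp_effect(cds, pos, alt):
    """Classify a single-base substitution at 0-based position `pos` of a coding sequence."""
    offset = pos % 3
    ref_codon = cds[pos - offset:pos - offset + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"

print(translate(MRNA))          # MQINSCLAYV
print(snp_effect(CDS, 5, "A"))  # CAG -> CAA: synonymous
print(snp_effect(CDS, 4, "U"))  # CAG -> CUG: missense
print(snp_effect(CDS, 3, "U"))  # CAG -> UAG: nonsense
```

The three example substitutions all hit the second codon (CAG, glutamine) and show how the position within a codon can change the outcome from no amino-acid change to a premature stop.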
But also…
• Regulatory regions
• Large structural variations
• Alternative splicing
• Many others…
http://www.nbs.csudh.edu/chemistry/faculty/nsturm/CHEMXL153/DNAMutationRepair.htm

Intro to Mendelian genetics

Back to Mendel’s peas…
Cross: YG x YG

              Parent 2
              Y     G
Parent 1  Y   YY    YG
          G   GY    GG

F2 Generation: 75% Yellow, 25% Green

Modes of inheritance - dominant
Pedigree: aa x Aa, with children Aa, Aa, aa, aa
Example – Marfan Syndrome
• Tall and slender build
• Long arms, legs, and fingers
• Heart murmurs, other cardiovascular defects
• Nearsightedness
Caused by loss of function mutations in FBN1
>=1 copies of dominant allele: affected
0 copies of dominant allele: unaffected
Unless de novo, at least one parent is affected
http://www.mayoclinic.org/diseases-conditions/marfan-syndrome/symptoms-causes/dxc-20195415

Modes of inheritance - recessive
Pedigree: Aa x Aa, with children AA, Aa, aA, aa
Example – Cystic Fibrosis
• Caused by mutations in the gene CFTR; the most common mutation is ΔF508 (in frame deletion).
• Results in salty skin, poor growth, accumulation of thick, sticky mucus, frequent chest infections.
• Life expectancy: 37 years
• ~1 in 25 Europeans is a carrier
Caused by loss of function mutations in CFTR
2 copies of recessive allele: affected
<=1 copies of recessive allele: unaffected
Often, both parents unaffected
https://hutchbio.wordpress.com/2012/11/07/cystic-fibrosis/

Modes of inheritance – X linked recessive
Pedigree: XX’ x XY, with children X’Y, XY, XX’, XX
Example – Hemophilia A
• Blood doesn’t clot properly
• Heavy bleeding even from small cuts
• Bruise easily
• Some female carriers show symptoms
Caused by loss of function mutations in clotting Factor VIII
Need at least one unaffected copy of X to be unaffected
X’Y, X’X’ affected (X’X’ lethal for some disorders)
Typically affects only males
Heterozygous females are called “carriers”
http://reference.medscape.com/features/slideshow/hemophilia-a

Example recessive trait – red hair
https://blog.23andme.com/health-traits/no-im-not-irish/

Example recessive trait – blue eyes
IrisPlex: predicts eye color from 6 SNPs
All blue-eyed people share a single common ancestor with a regulatory change in HERC2
Walsh, et al. 2010
Sturm, et al. 2008

Beyond Mendelian – complex traits
Example: height
Fisher hypothesized that Mendelian inheritance could explain continuous traits if many genes each contribute additively to a phenotype.
Sir Ronald Fisher

Example complex trait: schizophrenia
Heritability: 80% (estimated from twin concordance for SCZ status)
Schizophrenia Working Group of the Psychiatric Genomics Consortium

Problem set 1

SNP array data
• This is the type of data you’ll get from 23andMe and other companies
• As opposed to whole genome sequencing, which sequences the entire genome, genotype arrays genotype a pre-determined set of known polymorphic positions
• E.g. 23andMe genotypes ~1.5 million variants
• Probes for allele “A” and “B”
• By comparing intensities, can infer genotype (e.g.
AA, AB, BB)

Getting started
https://gymreklab.github.io/teaching/personal_genomics/ps1_resources.html

Before you go:
• Sign up for an XSEDE account
• Get started on PS1
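As background for PS1, the intensity-comparison idea from the SNP array slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not how 23andMe or any array vendor actually calls genotypes: the function name `call_genotype`, the fixed thresholds (0.25, 0.75) and the minimum-signal cutoff are all hypothetical, and production callers typically cluster intensities across many samples instead of using fixed cutoffs.

```python
def call_genotype(intensity_a, intensity_b, low=0.25, high=0.75, min_signal=0.2):
    """Infer AA / AB / BB from the two probe intensities at one variant.

    All thresholds are illustrative assumptions, not vendor values.
    """
    total = intensity_a + intensity_b
    if total < min_signal:
        return "NC"  # no call: both probes too weak to trust
    b_fraction = intensity_b / total  # share of the signal coming from allele B
    if b_fraction < low:
        return "AA"
    if b_fraction > high:
        return "BB"
    return "AB"

# Made-up intensity pairs (A probe, B probe) for four samples at one variant.
for a, b in [(1.0, 0.05), (0.5, 0.5), (0.02, 0.9), (0.05, 0.05)]:
    print(a, b, call_genotype(a, b))
```

With these made-up values the four samples come out as AA, AB, BB and a no-call, which mirrors the three genotype clusters shown on the slide.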