Human Genome Structure and Organization
Bert Gold, Ph.D., F.A.C.M.G.

Genetic Variation
Phenotype: Expression of the genotype (modified by the environment). The structural or functional nature of an individual. Includes appearance, physical features, organ structure, and biochemical and physiologic nature.
Genotype: Genetic status; the alleles an individual carries.

Learning Objectives
• Recap and update public and private Human Genome Project status
• Provide reminders of necessary background for genetic disease association and linkage studies

Definitions
• Penetrance - The probability that an individual who is ‘at risk’ for the disorder (i.e., carries the gene) develops (expresses) the condition. May be age dependent.
• Expression - The characteristics of a trait or disease that are outwardly expressed. E.g., myotonic dystrophy: myotonia, cataracts, narcolepsy, frontal balding, infertility.
• Ascertainment - The method used in gathering genetic data. Study conclusions differ depending on how affected individuals entered the study.
• Phenocopy - Individuals whose phenotype, under the influence of non-genetic agents, has become like the one normally caused by a specific genotype in the absence of non-genetic agents.
• Pleiotropy - The quality of an allele to produce more than one effect; i.e., to manifest its expression in the structure and/or function of more than one organ system or tissue.
• Recurrence Risk - Likelihood that a relative of a proband for a rare disease will have the same disease.
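
The definition of penetrance above, a probability that may be age dependent, can be illustrated with a small calculation. This is a minimal sketch rather than a method from the slides: the function name `penetrance` and all carrier counts are hypothetical, invented for illustration only.

```python
from fractions import Fraction


def penetrance(affected_carriers: int, total_carriers: int) -> Fraction:
    """Proportion of at-risk individuals (carriers) who express the condition."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must lie between 0 and total_carriers")
    return Fraction(affected_carriers, total_carriers)


def is_complete(p: Fraction) -> bool:
    """Complete penetrance means P = 1.0; anything lower is incomplete (reduced)."""
    return p == 1


# Hypothetical (made-up) counts: (affected carriers, all carriers) per age band.
carriers_by_age = {"<30": (2, 20), "30-50": (9, 18), ">50": (15, 16)}

# Age-specific penetrance: the same genotype can show a different P at each age.
age_specific = {
    band: penetrance(affected, total)
    for band, (affected, total) in carriers_by_age.items()
}

for band, p in age_specific.items():
    status = "complete" if is_complete(p) else "incomplete"
    print(f"{band}: P = {float(p):.2f} ({status})")
```

Note that the calculation only counts whether carriers are affected at all. How severely they are affected is expressivity, which this sketch deliberately does not model.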

Penetrance and Expressivity
• Penetrance: Proportion that expresses a trait
  – Complete: P = 1.0 or 100%
  – Incomplete (“reduced”): P < 1.0 or < 100%
• Expressivity: Severity of the phenotype
  – Expressivity may vary
    • Between families (interfamilial) or
    • Within families (intrafamilial)
• TRY NOT TO CONFUSE “VARIABLE EXPRESSIVITY” WITH “INCOMPLETE PENETRANCE”

Chromosomes, Genes and Proteins

Genes are on Chromosomes

Genes may encode proteins or RNA

Non-coding RNA ‘genes’
• tRNAs (497 were counted; 821 when counting genes and pseudogenes)
  – tRNAs found are consistent with wobble
  – Codon bias only roughly correlated with tRNA distribution
• rRNAs
• small nucleolar RNAs (snoRNAs)
• snRNAs (spliceosome constituents)
• 7SL RNA
• telomerase RNA
• Xist transcript
• Vault RNA

tRNAs

Some chromosomes are richer in genes than others
[Figure: number of nucleotides in exons (y-axis, 0 to 3,500,000) for each chromosome (x-axis, autosomes and X)]

HOXA, HOXB, HOXC and HOXD are in regions with a particularly low density of repeats. This is believed to result from the presence of cis-acting elements in this vicinity.

Proteins demonstrate patterns and similarity of function
Functionally and structurally similar proteins are organized into families, e.g., E.C., SWISS-PROT, TrEMBL.
In silico approaches to characterize genes include:
• PFAM, searchable via HMMER
• Other in silico collections include:
  – PRINTS
  – PROSITE
  – SMART
  – BLOCKS
• Creation of an Integrated Protein Index (IPI)

How many genes are there?
Estimates from the Public Program
  – RefSeq
  – Exons
  – Introns
  – Average Sizes
  – Coding Sequences (CDS)
  – Alternative splice products (about 3%)
  – Creation of an Integrated Gene Index (IGI)
  – Genscan to Ensembl to Pfam via GeneWise (31,778)
  – Could be as low as 24,500 using overprediction corrections.

Estimates from Celera
• 25,086 in Assembly 3

Pre-existing estimates
• W. Gilbert’s back-of-the-envelope calculation
• Reassociation Kinetics
• Estimates from Double Twist using Promoter Inspector plus
• Unpublished estimates from Human Genome Sciences

Size of Genes:
• Largest: Dystrophin 2.7 Mb
• Titin
• 80,780 bp coding
• 178 exons
• largest single exon 17,106 bp

GENE HOMOLOGS, ORTHOLOGS, PARALOGS
• Vacuolar sorting machinery in yeast
• ABC gene superfamily
• Ig gene superfamily
• FGF superfamily
• Intermediate filament superfamily
• PROTEIN FAMILY EXPANSION APPEARS TO BE A PRIMARY EVOLUTIONARY MECHANISM

The proteome
• Functional categories
• PRINTS
• Prosite
• Pfam
• Interpro (http://www.ebi.ac.uk/interpro/)

GENE ONTOLOGY
• Standard Vocabulary
• Hierarchy of terms (Directed ACYCLIC Graph)
• Ashburner, Nature Genetics 25:25-29 (2000)
• ‘Bushy’ model

Horizontal Transfer controversy
• One of the major conclusions of the Public Genome effort, published in the Feb. 15, 2001 issue of Nature, was: “Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage. Dozens of genes appear to have been derived from transposable elements”
• This has now been widely disputed and is believed to result from:
  – Microbial contaminants in the sequence.
  – Bacterial gene integration into pre-vertebrates
  – And
• “The more probable explanation for the existence of genes shared by humans and prokaryotes, but missing in nonvertebrates, is a combination of evolutionary rate variation, the small sample of nonvertebrate genomes, and gene loss in the nonvertebrate lineages.” - Salzberg et al., Science

Splice Pattern, 98% GT-AG

Chromatin Structure
• Euchromatin
• Heterochromatin
• Nucleosomes

Chromosome Facts
• Chromosomes replicate during S phase
• Chromosomes recombine during Pachytene
• Recombination is an obligate activity
• Sex chromosomes recombine with each other

Cytogenetics is done by Karyotyping
• Chromosomes are chemically frozen in metaphase
• Must be carried out on dividing cells
• Microfilament inhibitors
• Microtubule inhibitors
• Membrane lysis
• Pronase, trypsin digest
• Giemsa stain
• G-bands correspond to regions of relatively low GC content
http://genome.ucsc.edu/goldenPath/mapPlots/
http://genome.ucsc.edu/goldenPath/hgTracks.html

Cell Division: Meiosis
– Segregation
  • Defined: Alleles are paired; gametes receive one of each.
  • Exceptions: trisomy and uniparental disomy
– Independent Assortment
  • Gene pairs segregate independently
  • Exception: linkage

Meiosis Creates Gametes
And provides a basis for genetic recombination!

Genetic Recombination
• Crossing Over
• Resolution
• Recombinant Chromosomes
  – OBLIGATE ACTIVITY
  – FEMALE RECOMB. RATES HIGHER THAN MALE
  – INCREASED RATES AT TELOMERES
  – PARADOX: SHORT ARMS SHOW MORE THAN LONG ARMS
  – 1 cM is 1 Mb on long arms, but short arms are 2 cM per Mb and the Yp-Xp pseudoautosomal region is 20 cM per Mb.

Genes
• Units of heredity
• Encode proteins (and some RNAs)
• Human genetics is the study of gene variation in humans
• ‘Gene’ as a term is used ambiguously to refer both to the ‘locus’ and the ‘allele’; i.e., there is only one locus but two alleles in a given individual.
• Sequencing in both genome projects used multiple alleles; this has led to some assembly confusion.
• The ultimate goal is a haploid genome map.

The Human Genome Project
• International public effort commencing in 1990 to sequence the entire human genome by 2005.
• STS approach chosen in 1991
• Private effort launched in 1998 by Celera using ‘shotgun’ cloning

BAC clones, sequenced into BAC end reads, and assembled into ‘contigs’
Markerless ‘contigs’ in the Celera assembly are called ‘Scaffolds’
Markers are BAC ends in the ‘shotgun’
Mate pair reads provided the core of Celera sequence

Draft human genome sequences complete by February 2001.
• Published simultaneously in Feb. 2001
  – Public Sequence in NATURE (409: 745-964)
  – Celera Sequence in SCIENCE (291: 1145-1434)

Greater than 50% of sequence is repetitive
45% of the human genome is derived from transposable elements
• Long Interspersed Elements: LINEs (21% of genome)
  – LINE1 – Some still active, autonomous, consist of two ORFs (one is a pol).
  – LINE2
  – LINE3
• Short Interspersed Elements: SINEs (13% of genome)
  – ALU – Some still active, use L1 enzymes to replicate
  – MIR
  – Ther2/MIR3
• LTR Retroposons
  – Consist of gag and pol
  – Protease, RT, RNase H, integrase all encoded
  – Reverse transcription occurs cytoplasmically, using a tRNA to prime replication
• DNA Transposons

98.5% of sequence is non-coding.
Approximately 1/3 of the human genome is transcribed (public guess).

Allelism
• Alternate forms of a gene
• Recessive disease, e.g., Sickle Cell, CFTR
• Dominant disease, e.g., Achondroplasia, Tuberous Sclerosis
• Heterozygote or Homozygote
  – 1,2 or 1,1
  – homogeneity of alleles at a locus

Genetic Markers
• RFLPs
• VNTRs (STRs)
• Microsatellites
• STSs
• SNPs
• “Tools” used to find disease genes
• “Flags” with locations throughout the genome

Polymorphism Information Content versus Heterozygosity (PIC vs. het)
• Determining heterozygosity from SNP rare allele frequency
• Information Content in SNPs versus STRs

Typology of SNPs
• Type I - Coding, non-synonymous, non-conservative
• Type II - Coding, non-synonymous, conservative
• Type III - Coding, synonymous
• Type IV - Non-coding, 5’-UTR
• Type V - Non-coding, 3’-UTR
• Type VI - Other non-coding
• Type I and Type II SNPs have lower heterozygosity than other SNPs, presumably as a result of selective pressure.
  – About 25% of type I and type II SNPs have minor allele frequencies > 15%
  – About 60% have minor allele frequencies < 5%

Mutation
• Occurs more often during male meiosis
• Occurs more often in ‘long genes’
• More easily detected in Dominant Diseases
  – Achondroplasia
  – Duchenne Muscular Dystrophy
• May often involve CpG mutating to TpG

Autosomal Recessive Inheritance
• Two copies of a gene required to be affected
• Carriers have one copy of the mutation and are unaffected
• 25% of offspring of two carriers will be affected
• Males and females affected in equal number
• E.g., Sickle Cell, beta-thal., CF

X Linked Recessive (Sex Linked)
• Females rarely affected
• No male-to-male transmission
• Affected males transmit gene to all daughters
• E.g., Duchenne Muscular Dystrophy, Hemophilia A

Autosomal Dominant Inheritance
• Each child at 50% risk
• Does not skip generations
• Often lethal in double dose
• Large genetic load

X-linked Dominant Pedigree
• Example is Hypophosphatemic, Vitamin D Resistant Rickets
• Distinguished from Autosomal Dominant by:
  – No male-to-male transmission
  – All daughters of affected fathers are affected

IMPORTANT NOTE: Dominant and Recessive refer to the phenotypic expression of alleles, NOT to intrinsic characteristics of gene loci.
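
The comparison of PIC and heterozygosity raised under the genetic markers above can be made concrete with a short calculation. The slides give no formulas, so this sketch assumes the standard definitions: expected heterozygosity H = 1 - Σ p_i², and PIC as defined by Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The function names and the allele frequencies used are illustrative assumptions.

```python
from itertools import combinations
from typing import Sequence


def _check(freqs: Sequence[float]) -> None:
    """Allele frequencies must be non-negative and sum to 1."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be >= 0 and sum to 1")


def heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity: H = 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content (assumed Botstein et al. 1980 form)."""
    _check(freqs)
    homozygous = sum(p * p for p in freqs)
    cross = sum(2.0 * pi**2 * pj**2 for pi, pj in combinations(freqs, 2))
    return 1.0 - homozygous - cross


# A biallelic SNP described only by its rare (minor) allele frequency q.
q = 0.05
snp = [1.0 - q, q]

# A hypothetical STR with five equally frequent alleles.
str_marker = [0.2] * 5

print(f"SNP (q={q}): het={heterozygosity(snp):.3f}, PIC={pic(snp):.3f}")
print(f"STR (5 alleles): het={heterozygosity(str_marker):.3f}, PIC={pic(str_marker):.3f}")
```

Under these definitions a biallelic SNP reaches its ceiling at q = 0.5, where het is 0.5 and PIC is 0.375, while a multi-allelic STR can exceed both. That is one way to read the "Information Content in SNPs versus STRs" bullet.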

Inheritance Pattern Complexities
• Pseudodominant Transmission of a Recessive
• Pseudorecessive Transmission of a Dominant
  – Misassigned paternity, causal heterogeneity, incomplete penetrance, germline mosaicism
• Mosaicism
• Mitochondrial Inheritance
• Penetrance and Expressivity
  – Semi-dominant, gender-influenced, age-related, transmission-related, imprinting
• Uniparental Disomy (UPD)
• Environmental effects, phenocopies

Preview of linkage analysis
• Characterizing Human Genetics:
  – Long generation time
  – Inability to control matings
  – Inability to control study population
  – Inability to control exposures to environmental conditions
  – It is possible to define phenotypes well!
  – Can study genetic structures through family history
  – Link phenotypes and genetic structures through statistical methods
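
The slides do not say which statistical method links phenotypes to genetic structures. One common choice in linkage analysis is the LOD score, so the sketch below assumes it. It handles only the simplest case of fully informative, phase-known meioses, where k recombinants are seen among n meioses and LOD(θ) = log10[ θ^k (1-θ)^(n-k) / 0.5^n ]. The counts are hypothetical.

```python
import math


def lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for a recombination fraction theta versus no linkage (theta = 0.5).

    Assumes phase-known, fully informative meioses (a simplifying assumption).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical family data: 2 recombinants observed in 20 meioses.
k, n = 2, 20

# Scan a grid of theta values and keep the one with the highest LOD.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod(k, n, t))

print(f"best theta = {best_theta:.2f}, LOD = {lod(k, n, best_theta):.2f}")
```

By convention a LOD score of 3 or more is usually taken as evidence for linkage. Real pedigrees with unknown phase, missing genotypes, or incomplete penetrance need a full likelihood calculation rather than this closed form.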