The Human Genome Project
• Main reference: Nature (2001) 409, 860-921
• http://www.abdn.ac.uk/~gen155/lectures/hgpcore.ppt
• http://www.nature.com/ng/web_specials/
• Whole issue also available from Nature Genome Gateway: www.nature.com/genomics/human/
• Describes the publicly funded project; Celera’s private HGP was published in Science

Main points
• Basic genome statistics
• Genome browsers e.g. UCSC, Ensembl
• Genomic “landscape”
• Repeated DNA as a “fossil record”
• Number of genes
• Polymorphism
• Applications

The Strategy
• The genome sequence was a multinational collaboration involving hundreds of scientists, millions of dollars and many countries
• The strategy was “top-down”, using methods developed on small genomes (e.g. yeast)
• Figure 2 in the Nature paper

Genome statistics
• Total size = 3290 Mb
• 212 Mb of heterochromatin
• Chromosomes range from 279 Mb (#1) to 45 Mb (#21) (fig 9, table 8 in paper)
• Total “raw” sequence 23,000 Mb
• Number of genes = about 31,000
• About 30% of the genome is transcribed
• About 1.5% of the genome is protein coding

Repeat DNA “fossils”
• Genomes are full of repeated DNA sequences of various kinds (table 11/12)
• Each type of repeat has a single origin and has replicated many times within the genome, transposing to new sites and accumulating mutations
• By comparing copies of a repeat to see how much they have diverged, we can get an idea of how old the repeat is (fig 18)

Humans versus worms and flies
• Humans have only about twice as many genes as worms or flies (table 23)
• But human genes are subject to more alternative splicing (60% vs 22%; average 3 different transcripts per gene)
• So humans probably have about 5 times as many proteins as worms or flies
• Complexity is not proportional to numbers of genes or proteins, but to the number of interactions they can have

Index of human genes and proteins
• 3 basic methods to predict genes from the genomic DNA:
  - Comparison with ESTs, mRNAs
  - Homology with other known genes/proteins
  - Purely computational methods based on Hidden Markov Models (HMMs)
• Started with predictions by Ensembl, combined with other information…..

The Human Proteome
• Key database is InterPro, which combines information on all known protein domains
• Only 94 of the 1262 InterPro types (7%) are vertebrate-specific, so most domains are older than the common ancestor of all animals; new ones are not “invented” very often
• Many of the vertebrate-specific domains are concerned with defence/immunity and the nervous system
• Most novelty is generated by new protein “architectures”, combining old domains in new ways (fig 42/45)

Genome History
• Mouse and human diverged about 100 Mya, so there is 200 My of evolution between them
• Chromosome translocations are involved in the formation of new species
• By comparing the locations of homologous genes in the two genomes, regions of synteny can be defined (fig 46)
• Breakage seems to occur randomly, but tends to be in gene-poor regions
• No convincing evidence for whole-genome duplications

Polymorphism
• More than a million SNPs (single nucleotide polymorphisms) were found
• Average 1 SNP per 1.9 kb, or 15 SNPs per gene
• Combinations of closely linked SNP alleles form haplotypes
• Not all possible haplotypes are found in the population: e.g. about 4-5 per gene (theoretically there could be 2^15 = about 32,000)
• HapMap: the haplotype mapping project
• A paper (Trends in Genetics) on the subject of haplotype blocks

Applications in medicine
• Having the genome sequence, and databases of genes, makes it much easier to find disease genes by positional cloning (e.g. BRCA2 for breast cancer)
• Sequence reveals new drug targets: e.g. a new type of serotonin receptor, predicted from sequence, shown to be a candidate for treating mood disorders and schizophrenia

Latest - the Y chromosome
• Nature paper
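
As a rough check on the numbers quoted in the Genome statistics and Polymorphism slides, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the original lecture: the function names are made up for this example, the input figures are taken as quoted on the slides, and the haplotype count assumes every SNP has exactly two alleles, so that 15 SNPs give 2 to the power 15 combinations.

```python
# Figures as quoted on the slides (taken at face value, not re-derived).
GENOME_SIZE_MB = 3290            # total genome size in Mb
PROTEIN_CODING_FRACTION = 0.015  # "about 1.5%" of the genome
KB_PER_SNP = 1.9                 # average spacing: 1 SNP per 1.9 kb
SNPS_PER_GENE = 15               # average SNPs per gene


def theoretical_haplotypes(n_snps, alleles_per_snp=2):
    """Upper bound on haplotypes if every SNP varied independently.

    Assumption: each SNP is biallelic (alleles_per_snp=2).
    """
    return alleles_per_snp ** n_snps


def implied_gene_span_kb(snps_per_gene, kb_per_snp):
    """Average gene span implied by the two SNP averages on the slide."""
    return snps_per_gene * kb_per_snp


def protein_coding_mb(genome_size_mb, coding_fraction):
    """Amount of protein-coding sequence implied by the quoted fraction."""
    return genome_size_mb * coding_fraction


if __name__ == "__main__":
    print("Theoretical haplotypes per gene:",
          theoretical_haplotypes(SNPS_PER_GENE))
    print("Implied average gene span (kb):",
          implied_gene_span_kb(SNPS_PER_GENE, KB_PER_SNP))
    print("Implied protein-coding sequence (Mb):",
          protein_coding_mb(GENOME_SIZE_MB, PROTEIN_CODING_FRACTION))
```

The first value is 32768, which is the "about 32,000" on the slide. Set against the 4-5 haplotypes per gene actually observed, it shows how strongly closely linked SNP alleles are correlated.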