ENSEMBL ANNOTATION OF HUMAN GRCh37. Better algorithms for a better human.
J. Fernandez-Banet1, B. Aken1, S. Fairley1, M. Ruffier1, M. Schuster2, S. Searle1, A. Tang1, J. Vogel1, S. White1,
A. Zadissa1, T. Hubbard1.
Wellcome Trust Sanger Institute1 and European Bioinformatics Institute2, Wellcome Trust
Genome Campus, Cambridge, CB10 1SA, UK.
In February 2009 the Genome Reference Consortium released a new human genome
assembly, GRCh37. The new assembly improves the overall quality of the genome
sequence and includes alternative assemblies for a number of haplotypic
regions.
Ensembl aims to produce an annotation set rapidly, while at the same time introducing
new algorithms that improve its quality.
Traditionally, Ensembl has used only protein-to-genome alignments to build CDS structures,
with UTRs added from cDNA alignments. Here we present how combining the models
obtained from protein alignments with those obtained from cDNAs using exonerate's
cdna2genome model has helped us produce a more refined gene set, one that exactly matches
a higher percentage of the protein sets distributed by the RefSeq and SwissProt databases.
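As an illustration of the idea of combining the two kinds of model, the following is a minimal sketch, not the Ensembl pipeline itself. It assumes forward-strand models represented as lists of (start, end) exon coordinates, and the function name `add_utr` is hypothetical. A protein-derived CDS model is accepted into a cDNA-derived model only when both agree on every splice site inside the coding region; the cDNA then supplies the UTR.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive, forward strand (assumption)


def add_utr(cds: List[Exon], cdna: List[Exon]) -> Optional[Tuple[List[Exon], int, int]]:
    """Combine a protein-derived CDS model with a cDNA-derived model.

    Returns (exons, cds_start, cds_end), where the exons come from the cDNA
    model and therefore include UTR, or None when the two models disagree
    on a splice site inside the coding region.
    """
    cds_start, cds_end = cds[0][0], cds[-1][1]
    # cDNA exons that overlap the coding span; pure-UTR exons are skipped here.
    overlapping = [e for e in cdna if e[1] >= cds_start and e[0] <= cds_end]
    if len(overlapping) != len(cds):
        return None
    last_index = len(cds) - 1
    for i, ((c_start, c_end), (d_start, d_end)) in enumerate(zip(cds, overlapping)):
        # Outer CDS boundaries only need to lie inside the cDNA exon;
        # internal boundaries are splice sites and must match exactly.
        start_ok = d_start <= c_start if i == 0 else d_start == c_start
        end_ok = d_end >= c_end if i == last_index else d_end == c_end
        if not (start_ok and end_ok):
            return None
    return list(cdna), cds_start, cds_end


cds_model = [(150, 200), (300, 400), (500, 550)]
cdna_model = [(100, 200), (300, 400), (500, 620)]
merged = add_utr(cds_model, cdna_model)
conflict = add_utr(cds_model, [(100, 200), (310, 400), (500, 620)])
```

In this example `merged` keeps the cDNA exon boundaries and records the coding span 150 to 550, while `conflict` is `None` because the second exon starts at a different splice site in the two models.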
The introduction of a larger number of haplotypes in GRCh37 added an extra level of
complexity to the gene annotation process, because only the best alignment for each individual
sequence is selected for gene annotation. We have developed a method to project the
annotation between the reference chromosome and the haplotypes described for the same
region, increasing the number of models built on non-reference sequences and making it
possible to link alternative alleles of genes.
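The projection step can be pictured with a small coordinate-mapping sketch. This is an assumption-laden illustration rather than the method used for GRCh37: it supposes that the reference-to-haplotype alignment is available as ungapped blocks of the form (ref_start, hap_start, length), and the names `project_exon` and `project_model` are hypothetical. A model is projected only if every exon falls entirely inside one aligned block.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int, int]  # (ref_start, hap_start, length), ungapped (assumption)
Exon = Tuple[int, int]        # (start, end), 1-based inclusive


def project_exon(exon: Exon, blocks: List[Block]) -> Optional[Exon]:
    """Map one exon from reference to haplotype coordinates, or return None."""
    start, end = exon
    for ref_start, hap_start, length in blocks:
        ref_end = ref_start + length - 1
        if ref_start <= start and end <= ref_end:
            offset = hap_start - ref_start
            return (start + offset, end + offset)
    return None  # exon crosses a gap or lies outside the aligned region


def project_model(exons: List[Exon], blocks: List[Block]) -> Optional[List[Exon]]:
    """Project a whole gene model; fail if any exon cannot be projected."""
    projected = [project_exon(e, blocks) for e in exons]
    if any(p is None for p in projected):
        return None
    return projected


alignment = [(1000, 1, 500), (1600, 521, 400)]
on_haplotype = project_model([(1100, 1200), (1700, 1800)], alignment)
unprojectable = project_model([(1450, 1550)], alignment)
```

Keeping the reference model and its projected copy paired in this way is also what would make it possible to record the two as alternative alleles of the same gene.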
Custom methods have been used to annotate immunoglobulin genes and
selenocysteine-containing genes.
The CCDS gene set and manual annotation from Havana have also been incorporated in the
annotation set.
The quality checks performed on the Ensembl human annotation for GRCh37 indicate that it is our
highest-quality annotation of a human genome to date.