Praktikum Information Integration - HU
Information Integration
Assignment 1: Database Set-Up
Ulf Leser

Background
• We start with genes, their location, and their function
• Types of information
  – Genes: Have a taxon ID (organism), have an ID, have a preferred name, have multiple other names, have multiple functional annotations, have a connected protein (with a protein_id and a protein_version_id), have a status, are on a chromosome, have a start and end position, and a chromosomal location
  – Gene function: Is described by a taxonomy of terms which forms a DAG; each term has an ID, a name, a description, and can be annotated to multiple genes
  – Gene – Function relationship: Has an evidence code

Ulf Leser: Information Integration, Praktikum, WS 2008/2009

Task 1: Create the Schema
• Create a relational model for the information described on the previous page
• Implement this model in the Oracle database
  – One account per group; access information will be sent out as soon as possible
• Datatypes: Look at the data
  – Be conservative
  – Use VARCHAR2
  – Length: Guess a reasonable number which leaves some buffer

Task 2: Fill the Schema (partly)
• From the web page, download the following files
  – Gene2refseq [Tax_Id, Gene_Id, status, protein_accession, protein_accession_version, Start, End]
  – gene2go [tax_id, geneID, GOID, Evidence]
  – gene_info [tax_id, geneID, symbol, Synonyms, chromosome, map location]
• Import the given attributes into your schema
  – This requires some processing
  – Gene2refseq and gene_info should be merged
  – gene_info.synonyms should be normalized
• The structure of the GO will be imported later

Task 3: Some Queries
• Formulate and answer the following queries
  – How many genes does your gene table have?
  – How many relationships between genes and a GO term are there?
  – How many distinct GO terms are annotated to at least one gene?
  – How many gene names are present (either preferred name or synonym)?
  – How many synonyms are assigned to more than one gene?

Deliverables
• By Monday 3.11. or Wednesday 5.11., 23:59
• Send by mail as ASCII
  – Create table statements
  – Graphical image of your model with tables as boxes and “foreign key – primary key” links as arcs
    • Any standard format (for a Windows user)
  – Queries and answers of task 3
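
One possible reading of Tasks 1 to 3 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the expected solution: it uses Python's built-in sqlite3 module in place of the Oracle database (so TEXT appears where the assignment asks for VARCHAR2), all table and column names are hypothetical, the Synonyms field is assumed to be pipe-separated with "-" marking an empty value, Gene2refseq is assumed to hold at most one row per gene, and the sample rows are invented toy data. The fourth query counts distinct names, which is only one interpretation of "how many gene names are present".

```python
import sqlite3

# Hypothetical schema. SQLite stands in for Oracle, so TEXT replaces VARCHAR2(n).
SCHEMA = """
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY, tax_id INTEGER NOT NULL,
    symbol TEXT NOT NULL,               -- preferred name
    status TEXT, protein_id TEXT, protein_version_id TEXT,
    chromosome TEXT, map_location TEXT, start_pos INTEGER, end_pos INTEGER
);
CREATE TABLE gene_synonym (             -- normalized gene_info.Synonyms
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    synonym TEXT NOT NULL,
    PRIMARY KEY (gene_id, synonym)
);
CREATE TABLE gene_go (                  -- GO terms arrive later: no FK on go_id yet
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    go_id TEXT NOT NULL, evidence TEXT NOT NULL,
    PRIMARY KEY (gene_id, go_id, evidence)
);
"""

def split_synonyms(field):
    """Split an (assumed) pipe-separated Synonyms field; '-' means no synonyms."""
    if field is None or field.strip() in ("", "-"):
        return []
    return sorted({s.strip() for s in field.split("|") if s.strip()})

def load(conn, gene_info_rows, refseq_rows, go_rows):
    """Merge Gene2refseq into gene_info by gene ID and fill all three tables."""
    refseq = {row[1]: row for row in refseq_rows}   # assumes one row per gene
    for tax_id, gene_id, symbol, synonyms, chrom, map_loc in gene_info_rows:
        _, _, status, prot, prot_ver, start, end = refseq.get(gene_id, (None,) * 7)
        conn.execute(
            "INSERT INTO gene VALUES (?,?,?,?,?,?,?,?,?,?)",
            (gene_id, tax_id, symbol, status, prot, prot_ver,
             chrom, map_loc, start, end),
        )
        conn.executemany(
            "INSERT INTO gene_synonym VALUES (?,?)",
            [(gene_id, s) for s in split_synonyms(synonyms)],
        )
    conn.executemany(
        "INSERT INTO gene_go VALUES (?,?,?)",
        [(gene_id, go_id, ev) for _, gene_id, go_id, ev in go_rows],
    )

# One query per question of Task 3, in the order given there.
QUERIES = {
    "genes": "SELECT COUNT(*) FROM gene",
    "gene_go_links": "SELECT COUNT(*) FROM gene_go",
    "distinct_go_terms": "SELECT COUNT(DISTINCT go_id) FROM gene_go",
    "distinct_names": """SELECT COUNT(*) FROM (
        SELECT symbol AS name FROM gene UNION SELECT synonym FROM gene_synonym)""",
    "shared_synonyms": """SELECT COUNT(*) FROM (
        SELECT synonym FROM gene_synonym
        GROUP BY synonym HAVING COUNT(DISTINCT gene_id) > 1)""",
}

def build_demo():
    """Load invented toy rows into an in-memory database and run all queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    gene_info = [(1, 10, "geneA", "a1|shared", "1", "1p1"),
                 (1, 20, "geneB", "b1|shared", "2", "2q2"),
                 (1, 30, "geneC", "-", "X", "Xp3")]
    refseq = [(1, 10, "REVIEWED", "P_10", "1", 100, 200),
              (1, 20, "PROVISIONAL", "P_20", "2", 300, 450)]
    gene2go = [(1, 10, "GO:1", "IEA"), (1, 10, "GO:2", "TAS"), (1, 20, "GO:1", "IEA")]
    load(conn, gene_info, refseq, gene2go)
    return {name: conn.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}

if __name__ == "__main__":
    print(build_demo())
```

Keeping synonyms in their own table is what makes the last two queries simple set operations; with the raw pipe-separated field left in the gene table, both would require string splitting inside SQL. For the real Oracle schema, the TEXT columns would become VARCHAR2 with lengths chosen after inspecting the downloaded files.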