Praktikum Information Integration - HU

Information Integration
Assignment 1: Database Set-Up
Ulf Leser

Background
• We start with genes, their location, and their function
• Types of information
  – Genes: Have a taxon ID (organism), have an ID, have a preferred name, have multiple other names, have multiple functional annotations, have a connected protein (with a protein_id and a protein_version_id), have a status, are on a chromosome, have a start and end position, and a chromosomal location
  – Gene function: Is described by a taxonomy of terms which forms a DAG; each term has an ID, a name, a description, and can be annotated to multiple genes
  – Gene-function relationship: Has an evidence code

Ulf Leser: Information Integration, Praktikum, WS 2008/2009

Task 1: Create the Schema
• Create a relational model for the information described on the previous page
• Implement this model in the Oracle database
  – One account per group; access information will be sent out asap
• Datatypes: Look at the data
  – Be conservative
  – Use VARCHAR2
  – Length: Guess a reasonable number which leaves some buffer

Task 2: Fill the Schema (partly)
• From the web page, download the following files
  – Gene2refseq [Tax_Id, Gene_Id, status, protein_accession, protein_accession_version, Start, End]
  – gene2go [tax_id, geneID, GOID, Evidence]
  – gene_info [tax_id, geneID, symbol, Synonyms, chromosome, map location]
• Import the given attributes into your schema
  – This requires some processing
  – Gene2refseq and gene_info should be merged
  – gene_info.synonyms should be normalized
• The structure of the GO will be imported later

Task 3: Some Queries
• Formulate and answer the following queries
  – How many genes does your gene table have?
  – How many relationships between genes and a GO term are there?
  – How many distinct GO terms are annotated to at least one gene?
  – How many gene names are present (either preferred name or synonym)?
  – How many synonyms are assigned to more than one gene?

Deliverables
• By Monday 3.11., or Wednesday 5.11., 23:59
• Send by mail as ASCII
  – Create table statements
  – Graphical image of your model with tables as boxes and “foreign key – primary key” links as arcs
    • Any standard format (for a Windows user)
  – Queries and answers of task 3
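
As an illustration of the "gene_info.synonyms should be normalized" step in Task 2 and the last query of Task 3, here is a minimal sketch, not a reference solution. It uses Python's built-in sqlite3 module as a stand-in for the Oracle account, so the datatypes are TEXT rather than VARCHAR2. The table names gene and gene_synonym, the helper functions, and the sample rows are hypothetical. It also assumes that the Synonyms column separates names with "|" and uses "-" for "no value"; check this against the downloaded file.

```python
import sqlite3

# Hypothetical sample rows in the column order listed for gene_info:
# (tax_id, geneID, symbol, Synonyms, chromosome, map location).
# Assumption: synonyms are '|'-separated and '-' marks "no value".
SAMPLE_GENE_INFO = [
    ("1", "101", "GENE_A", "SYN1|SHARED", "1", "1p1"),
    ("1", "102", "GENE_B", "SYN2|SHARED", "2", "2q2"),
    ("1", "103", "GENE_C", "-", "2", "2q3"),
]


def split_synonyms(field):
    """Turn one Synonyms field into a list of individual names."""
    if field.strip() in ("", "-"):
        return []
    return [name.strip() for name in field.split("|") if name.strip()]


def load(conn, rows):
    """Create a gene table plus a normalized synonym table and fill both."""
    conn.executescript(
        """
        CREATE TABLE gene (
            gene_id      TEXT PRIMARY KEY,
            tax_id       TEXT NOT NULL,
            symbol       TEXT NOT NULL,
            chromosome   TEXT,
            map_location TEXT
        );
        CREATE TABLE gene_synonym (
            gene_id TEXT NOT NULL REFERENCES gene(gene_id),
            synonym TEXT NOT NULL,
            PRIMARY KEY (gene_id, synonym)
        );
        """
    )
    for tax_id, gene_id, symbol, synonyms, chromosome, map_location in rows:
        conn.execute(
            "INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
            (gene_id, tax_id, symbol, chromosome, map_location),
        )
        # One row per (gene, synonym) pair; set() drops repeats within a field.
        for synonym in set(split_synonyms(synonyms)):
            conn.execute(
                "INSERT INTO gene_synonym VALUES (?, ?)", (gene_id, synonym)
            )


def shared_synonym_count(conn):
    """Count synonyms that are attached to more than one gene."""
    query = """
        SELECT COUNT(*) FROM (
            SELECT synonym
            FROM gene_synonym
            GROUP BY synonym
            HAVING COUNT(DISTINCT gene_id) > 1
        )
    """
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
load(conn, SAMPLE_GENE_INFO)
print(shared_synonym_count(conn))
```

With the three sample rows, only the name SHARED belongs to two genes, so the script reports one shared synonym. Keeping synonyms in their own table with one row per gene and name is what makes this query a plain GROUP BY; the same statement should carry over to Oracle once the tables use VARCHAR2 columns.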
