Information Integration
Assignment 1: Database Set-Up
Ulf Leser
Background
• We start with genes, their location, and their function
• Types of information
– Genes: Have a taxon ID (organism), have an ID, have a preferred
name, have multiple other names, have multiple functional
annotations, have a connected protein (with a protein_id and a
protein_version_id), have a status, are on a chromosome, have a
start and end position, and a chromosomal location
– Gene function: Described by a taxonomy of terms which forms
a DAG; each term has an ID, a name, and a description, and can be
annotated to multiple genes
– Gene – Function relationship: Has an evidence code
Ulf Leser: Information Integration, Praktikum, WS 2008/2009
Task 1: Create the Schema
• Create a relational model for the information described on
the previous page
• Implement this model in the Oracle database
– One account per group; access information will be sent out asap
• Datatypes: Look at the data
– Be conservative
– Use VARCHAR2
– Length: Guess a reasonable number which leaves some buffer
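One possible relational model for the entities on the Background slide can be sketched as follows. This is a minimal sketch, not the reference solution: it uses Python's built-in sqlite3 module as a stand-in for the Oracle account, and every table and column name (gene, gene_synonym, go_term, gene_go, start_pos, ...) is an assumption made for illustration. In Oracle, the TEXT columns would become VARCHAR2(n) with a generously guessed length. The term-to-term links of the DAG are left out here because the structure of the GO is imported later.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE gene (
    gene_id            INTEGER PRIMARY KEY,
    tax_id             INTEGER NOT NULL,   -- organism
    symbol             TEXT NOT NULL,      -- preferred name
    status             TEXT,
    protein_id         TEXT,
    protein_version_id TEXT,
    chromosome         TEXT,
    map_location       TEXT,               -- chromosomal location
    start_pos          INTEGER,
    end_pos            INTEGER
);
CREATE TABLE gene_synonym (                -- "multiple other names"
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    synonym TEXT NOT NULL,
    PRIMARY KEY (gene_id, synonym)
);
CREATE TABLE go_term (                     -- functional annotation terms
    go_id       TEXT PRIMARY KEY,
    name        TEXT,
    description TEXT
);
CREATE TABLE gene_go (                     -- gene-function relationship
    gene_id  INTEGER NOT NULL REFERENCES gene(gene_id),
    go_id    TEXT NOT NULL REFERENCES go_term(go_id),
    evidence TEXT NOT NULL,                -- evidence code
    PRIMARY KEY (gene_id, go_id, evidence)
);
"""


def create_schema(conn):
    """Create all tables of the sketch in the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Splitting synonyms and annotations into their own tables keeps the gene table at one row per gene; whether a gene should really carry only one protein is a modelling decision to check against the data.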
Task 2: Fill the Schema (partly)
• From the web page, download the following files
– gene2refseq
[Tax_Id, Gene_Id, status, protein_accession,
protein_accession_version, Start, End]
– gene2go
[tax_id, geneID, GOID, Evidence]
– gene_info
[tax_id, geneID, symbol, Synonyms, chromosome, map location]
• Import the given attributes into your schema
– This requires some processing
– gene2refseq and gene_info should be merged
– gene_info.Synonyms should be normalized
• The structure of the GO will be imported later
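The two processing steps (merging and normalizing) can be sketched as below. This is a sketch under several assumptions: the input rows have already been reduced to the attributes listed above, the Synonyms field is a "|"-separated string with "-" meaning "no value" (check this against the downloaded file), and only the first gene2refseq row per gene is kept. The function names split_synonyms and merge_genes and the toy rows are hypothetical, not real data.

```python
def split_synonyms(field):
    """Turn 'A|B|C' into ['A', 'B', 'C']; '-' or '' means no synonyms."""
    if field in ("", "-"):
        return []
    return [name for name in field.split("|") if name]


def merge_genes(refseq_rows, info_rows):
    """Join both inputs on (tax_id, gene_id).

    refseq_rows: (tax_id, gene_id, status, protein_accession,
                  protein_accession_version, start, end)
    info_rows:   (tax_id, gene_id, symbol, synonyms, chromosome, map_location)
    Returns one dict per gene_info row plus a list of (gene_id, synonym) pairs.
    """
    first_refseq = {}
    for row in refseq_rows:
        # Simplification: keep only the first gene2refseq row per gene.
        first_refseq.setdefault((row[0], row[1]), row)

    genes, synonyms = [], []
    for tax_id, gene_id, symbol, syn_field, chromosome, map_location in info_rows:
        ref = first_refseq.get((tax_id, gene_id))  # None if no match
        genes.append({
            "tax_id": tax_id,
            "gene_id": gene_id,
            "symbol": symbol,
            "chromosome": chromosome,
            "map_location": map_location,
            "status": ref[2] if ref else None,
            "protein_id": ref[3] if ref else None,
            "protein_version_id": ref[4] if ref else None,
            "start_pos": ref[5] if ref else None,
            "end_pos": ref[6] if ref else None,
        })
        # Normalization: one (gene_id, synonym) pair per synonym.
        synonyms.extend((gene_id, name) for name in split_synonyms(syn_field))
    return genes, synonyms


# Made-up toy rows, for illustration only.
refseq_rows = [("9606", "1", "REVIEWED", "NP_000001", "NP_000001.1", "100", "200")]
info_rows = [
    ("9606", "1", "GENE1", "G1|GA", "19", "19q13"),
    ("9606", "2", "GENE2", "-", "12", "12p13"),
]
genes, synonyms = merge_genes(refseq_rows, info_rows)
print(synonyms)
```

Genes that appear in gene_info but not in gene2refseq keep empty protein and position fields here, which corresponds to an outer join; an inner join would drop them instead.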
Task 3: Some Queries
• Formulate and answer the following queries
– How many genes does your gene table have?
– How many relationships between genes and a GO term are there?
– How many distinct GO terms are annotated to at least one gene?
– How many gene names are present (either preferred name or
synonym)?
– How many synonyms are assigned to more than one gene?
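The five questions might translate into SQL roughly as follows. This is a sketch against a reduced, hypothetical schema (gene, gene_synonym, gene_go) with made-up toy rows, run through Python's sqlite3 module instead of Oracle. It also fixes one reading of each question: "relationships" counts rows of the gene-GO table, and "gene names" counts distinct strings across preferred names and synonyms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and toy data, for illustration only.
conn.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
CREATE TABLE gene_synonym (gene_id INTEGER, synonym TEXT);
CREATE TABLE gene_go (gene_id INTEGER, go_id TEXT, evidence TEXT);
INSERT INTO gene VALUES (1, 'GENE1'), (2, 'GENE2'), (3, 'GENE3');
INSERT INTO gene_synonym VALUES
    (1, 'ALPHA'), (2, 'ALPHA'), (2, 'BETA'), (3, 'GENE1');
INSERT INTO gene_go VALUES
    (1, 'GO:X', 'IEA'), (2, 'GO:X', 'TAS'), (2, 'GO:Y', 'IEA');
""")

QUERIES = {
    # Number of genes in the gene table
    "genes": "SELECT COUNT(*) FROM gene",
    # Number of gene-GO term relationships
    "gene_go_links": "SELECT COUNT(*) FROM gene_go",
    # Distinct GO terms annotated to at least one gene
    "annotated_terms": "SELECT COUNT(DISTINCT go_id) FROM gene_go",
    # Distinct names, preferred or synonym (UNION removes duplicates)
    "distinct_names": """
        SELECT COUNT(*) FROM (
            SELECT symbol AS name FROM gene
            UNION
            SELECT synonym FROM gene_synonym
        )""",
    # Synonyms assigned to more than one gene
    "shared_synonyms": """
        SELECT COUNT(*) FROM (
            SELECT synonym FROM gene_synonym
            GROUP BY synonym
            HAVING COUNT(DISTINCT gene_id) > 1
        )""",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}
print(results)
```

The multi-row INSERT statements are only test scaffolding and may need rewriting for Oracle; the SELECT statements use plain aggregate and set operations that should carry over with little or no change.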
Deliverables
• By Monday 3.11., or Wednesday 5.11., 23:59
• Send by mail as ASCII
– Create table statements
– Graphical image of your model with tables as boxes and “foreign
key – primary key” links as arcs
  • Any standard format (for a Windows user)
– Queries and answers of task 3