Midterm Project

Database Schema
GeneIDTable

Information about “gene” and corresponding “protein”

gene_id, gene_name, gene_seq, protein_id, protein_name, protein_seq, gene_type

gene_id – primary key (type varchar(255))

gene_type – type varchar(255)

All other entries are of type longtext

Database Schema
GeneFuncTable

Information about “gene functions”

gene_id, gene_fun, comment

gene_id – foreign key

All entries are of type longtext

Database Schema
ProteinFuncTable

Information about “protein functions”

protein_id, protein_fun, comment

All entries are of type longtext

Database Schema
PathwayFuncTable

Information about “pathway functions”

pathway_id, pathway_name, pathway_fun, pathway_loc, comment

All entries are of type longtext

Database Schema
PathwayTable

Information about “gene pathway association”

gene_id, pathway_id

gene_id – type varchar(255)

pathway_id – type longtext

Database Schema
BiologicalProcessTable

Gene Ontology related table

Information about “biological processes” of a particular gene

gene_id, GO_num, biological_process

gene_id – foreign key (type varchar(255))

All other entries are of type longtext

Database Schema
CellularComponentTable

Gene Ontology related table

Information about “cellular component”

gene_id, GO_num, cellular_component

gene_id – foreign key (type varchar(255))

All other entries are of type longtext

Database Schema
MolecularFunctionTable

Gene Ontology related table

Information about “molecular functions”

gene_id, GO_num, molecular_function

gene_id – foreign key (type varchar(255))

All other entries are of type longtext
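
The table descriptions above can be turned into CREATE TABLE statements. The following is a minimal sketch, not the course's official schema script: it assumes a MySQL-style dialect (suggested by the longtext type) and assumes that each gene_id foreign key refers to GeneIDTable. It covers only the tables whose key types are stated explicitly; SCHEMA and create_table_sql are illustrative names.

```python
# Column types follow the slides. The MySQL dialect and the FOREIGN KEY
# target (GeneIDTable.gene_id) are assumptions made for this sketch.
KEY = "varchar(255)"
TEXT = "longtext"

SCHEMA = {
    "GeneIDTable": {
        "columns": [
            ("gene_id", KEY), ("gene_name", TEXT), ("gene_seq", TEXT),
            ("protein_id", TEXT), ("protein_name", TEXT),
            ("protein_seq", TEXT), ("gene_type", KEY),
        ],
        "primary_key": "gene_id",
    },
    "BiologicalProcessTable": {
        "columns": [("gene_id", KEY), ("GO_num", TEXT),
                    ("biological_process", TEXT)],
        "foreign_key": "gene_id",
    },
    "CellularComponentTable": {
        "columns": [("gene_id", KEY), ("GO_num", TEXT),
                    ("cellular_component", TEXT)],
        "foreign_key": "gene_id",
    },
    "MolecularFunctionTable": {
        "columns": [("gene_id", KEY), ("GO_num", TEXT),
                    ("molecular_function", TEXT)],
        "foreign_key": "gene_id",
    },
}


def create_table_sql(name, spec):
    """Build one CREATE TABLE statement from a table description."""
    lines = [f"  {col} {ctype}" for col, ctype in spec["columns"]]
    if "primary_key" in spec:
        lines.append(f"  PRIMARY KEY ({spec['primary_key']})")
    if "foreign_key" in spec:
        fk = spec["foreign_key"]
        lines.append(f"  FOREIGN KEY ({fk}) REFERENCES GeneIDTable ({fk})")
    return f"CREATE TABLE {name} (\n" + ",\n".join(lines) + "\n);"


for table_name, table_spec in SCHEMA.items():
    print(create_table_sql(table_name, table_spec))
```

The remaining tables could be added to SCHEMA in the same way once their key types are settled.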

Steps to Follow – Step 1

Get the RefSeq Accession Number of your species from the NCBI Genome database

e.g. NC_000913 for Escherichia Coli K12

Steps to Follow – Step 2

Download the files needed from the NCBI ftp site (ftp://ftp.ncbi.nlm.nih.gov)

genomes/Bacteria/[species name]/[RefSeq #].gbk (main information for genes and proteins and GO functions)
e.g. genomes/Bacteria/Escherichia_coli_k12/NC_000913.gbk

genomes/Bacteria/[species name]/[RefSeq #].ffn (gene sequence)
e.g. genomes/Bacteria/Escherichia_coli_k12/NC_000913.ffn
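
The two file locations follow one pattern, so they can be assembled from the species directory and the RefSeq accession number. The sketch below only builds the address strings and does not download anything. The directory layout is taken from this slide and may not match the current NCBI server; genome_file_urls is an illustrative helper name.

```python
FTP_HOST = "ftp://ftp.ncbi.nlm.nih.gov"


def genome_file_urls(species_dir, refseq_accession):
    """Return the .gbk and .ffn addresses for one genome, keyed by extension."""
    base = f"{FTP_HOST}/genomes/Bacteria/{species_dir}/{refseq_accession}"
    return {ext: f"{base}.{ext}" for ext in ("gbk", "ffn")}


urls = genome_file_urls("Escherichia_coli_k12", "NC_000913")
print(urls["gbk"])
print(urls["ffn"])
```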

Steps to Follow – Step 3

Go to KEGG selected organisms (http://www.genome.jp/kegg/catalog/org_list.html)

Find your species and click the entry in the second column of its row (e.g. eco for E Coli)

Go to “pathway maps” to get pathway information to put into the PathwayFunc table

Steps to Follow – Step 4

Use the eutils functions of NCBI Entrez to get the file that contains the gene pathway association (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/)

Use esearch to search your species in the gene database
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=database&term=query&usehistory=y

Use efetch to fetch the result file
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=database&WebEnv=WebEnvString&query_key=key
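
The two request addresses above can be filled in programmatically. This is a minimal sketch that only builds the URLs: it assumes db=gene (the gene database named in this step) and uses the species search term from the gene-count slide. The WebEnv and query_key values passed to efetch_url are placeholders; the real ones have to be read from the esearch response. The function names are illustrative.

```python
from urllib.parse import urlencode

EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="gene"):
    """Build the esearch request; usehistory=y keeps the result on the server."""
    params = {"db": db, "term": term, "usehistory": "y"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(web_env, query_key, db="gene"):
    """Build the efetch request from the WebEnv and query key returned by esearch."""
    params = {"db": db, "WebEnv": web_env, "query_key": query_key}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


print(esearch_url("Escherichia Coli K12 [orgn]"))
# Placeholder values: substitute the ones from the esearch response.
print(efetch_url("WebEnvString", "1"))
```

urlencode takes care of the spaces and square brackets in the search term, which would otherwise have to be escaped by hand.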

Steps to Follow – Step 5

Edit the .gbk file to remove the beginning and the end part

Parse the .gbk and the .ffn files to fill all the tables except the PathwayFunc table and Pathway table

Link to the sample parser file: Parse.java
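
For orientation, the .ffn file is in FASTA format: each gene starts with a header line beginning with ">" followed by one or more lines of sequence. The sketch below shows one way to read such a file into a header-to-sequence mapping. It is not the sample Parse.java, it does not handle the .gbk file, and the sample headers are made up for illustration.

```python
def parse_fasta(lines):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical records; real .ffn headers look different.
sample = """>gene_1 hypothetical header
ATGAAACGC
ATTAGC
>gene_2 hypothetical header
ATGGTT
""".splitlines()

genes = parse_fasta(sample)
for name, seq in genes.items():
    print(name, len(seq))
```

With a real file, the same function can be called on an open file object, for example parse_fasta(open("NC_000913.ffn")).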

Steps to Follow – Step 6

Parse the eutils resulting file to get the gene pathway association

Link to the sample parsePath file: ParsePath.java

Database Name Format

Example species: Escherichia Coli K12

Species name: Escherichia_Coli_K12

Database name: escherichia_coli_k12
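
The naming rule can be read off the example: spaces become underscores for the species name, and the database name is the same string in lower case. The sketch below assumes that this single example fully describes the rule; both function names are illustrative.

```python
def species_name(species):
    """Join the words of the species with underscores, keeping the case."""
    return "_".join(species.split())


def database_name(species):
    """Database name: the species name in lower case."""
    return species_name(species).lower()


print(species_name("Escherichia Coli K12"))
print(database_name("Escherichia Coli K12"))
```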

Sample Output File

outputFile.txt (output file after parsing .gbk and .ffn files)

outputPath.txt (output file after parsing gene pathway association file)

PathwayFunc.txt (output file after analyzing KEGG pathways)

To Find the Number of Genes

Search your species in the NCBI gene database

e.g. Escherichia Coli K12 [orgn]

Check the number of genes in your result against this number

Submit your project (the 3 output files, and the parsers if you changed them) to:
[email protected]

Any questions:
[email protected]
[email protected]