Midterm Project

Database Schema – GeneIDTable
  Information about “gene” and corresponding “protein”
  gene_id, gene_name, gene_seq, protein_id, protein_name, protein_seq, gene_type
  gene_id – primary key (type varchar(255))
  gene_type – type varchar(255)
  All other entries are of type longtext

Database Schema – GeneFuncTable
  Information about “gene functions”
  gene_id, gene_fun, comment
  gene_id – foreign key
  All entries are of type longtext

Database Schema – ProteinFuncTable
  Information about “protein functions”
  protein_id, protein_fun, comment
  All entries are of type longtext

Database Schema – PathwayFuncTable
  Information about “pathway functions”
  pathway_id, pathway_name, pathway_fun, pathway_loc, comment
  All entries are of type longtext

Database Schema – PathwayTable
  Information about “gene pathway association”
  gene_id, pathway_id
  gene_id – type varchar(255)
  pathway_id – type longtext

Database Schema – BiologicalProcessTable
  Gene Ontology related table
  Information about “biological processes” of a particular gene
  gene_id, GO_num, biological_process
  gene_id – foreign key (type varchar(255))
  All other entries are of type longtext

Database Schema – CellularComponentTable
  Gene Ontology related table
  Information about “cellular component”
  gene_id, GO_num, cellular_component
  gene_id – foreign key (type varchar(255))
  All other entries are of type longtext

Database Schema – MolecularFunctionTable
  Gene Ontology related table
  Information about “molecular functions”
  gene_id, GO_num, molecular_function
  gene_id – foreign key (type varchar(255))
  All other entries are of type longtext

Steps to Follow – Step 1
  Get the RefSeq accession number of your species from the NCBI Genome database
  e.g. NC_000913 for Escherichia coli K12

Steps to Follow – Step 2
  Download the files you need from the NCBI ftp site (ftp://ftp.ncbi.nlm.nih.gov)
  genomes/Bacteria/[species name]/[RefSeq #].gbk (main information for genes, proteins and GO functions)
    e.g. genomes/Bacteria/Escherichia_coli_k12/NC_000913.gbk
  genomes/Bacteria/[species name]/[RefSeq #].ffn (gene sequence)
    e.g. genomes/Bacteria/Escherichia_coli_k12/NC_000913.ffn

Steps to Follow – Step 3
  Go to KEGG selected organisms (http://www.genome.jp/kegg/catalog/org_list.html)
  Find your species and click the entry in the second column of its row (e.g. eco for E. coli)
  Go to “pathway maps” to get the pathway information to put into the PathwayFunc table

Steps to Follow – Step 4
  Use the eutils functions of NCBI Entrez to get the file that contains the gene pathway association (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/)
  Use esearch to search for your species in the gene database
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=database&term=query&usehistory=y
  Use efetch to fetch the result file
    http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=database&WebEnv=WebEnvString&query_key=key

Steps to Follow – Step 5
  Edit the .gbk file to remove its beginning and end parts
  Parse the .gbk and the .ffn files to fill all the tables except the PathwayFunc table and the Pathway table
  Sample parser file: Parse.java

Steps to Follow – Step 6
  Parse the eutils result file to get the gene pathway association
  Sample parsePath file: ParsePath.java

Database Name Format
  Example species: Escherichia Coli K12
  Species name: Escherichia_Coli_K12
  Database name: escherichia_coli_k12

Sample Output File
  outputFile.txt (output file after parsing the .gbk and .ffn files)
  outputPath.txt (output file after parsing the gene pathway association file)
  PathwayFunc.txt (output file after analyzing the KEGG pathways)

To Find the Number of Genes
  Search for your species in the NCBI gene database
    e.g. Escherichia Coli K12 [orgn]
  Check the number of genes in your output against the number this search returns

Submit your project (the 3 output files, plus the parsers if you changed them) to: [email protected]
Any questions: [email protected] [email protected]
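
The two eutils calls in Step 4 are linked: when esearch is called with usehistory=y, its XML reply carries a WebEnv string and a QueryKey, and those two values are what efetch expects in its WebEnv and query_key parameters. The following is a minimal sketch of that hand-off in Java, not the course's sample parser. The class name EutilsHistoryUrls, its method names, and the sample XML in main are hypothetical, and the regular-expression tag lookup is a simplification (a real XML parser would be more robust). No network request is made; the sketch only builds the URLs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for Step 4; not part of Parse.java or ParsePath.java.
final class EutilsHistoryUrls {
    // Base URL as given in Step 4.
    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    // Builds the esearch URL. usehistory=y asks the server to keep the
    // result set so that efetch can retrieve it later.
    static String esearchUrl(String db, String term) {
        return BASE + "esearch.fcgi?db=" + encode(db)
                + "&term=" + encode(term) + "&usehistory=y";
    }

    // Builds the efetch URL from the WebEnv and QueryKey values that
    // esearch returned.
    static String efetchUrl(String db, String webEnv, String queryKey) {
        return BASE + "efetch.fcgi?db=" + encode(db)
                + "&WebEnv=" + encode(webEnv)
                + "&query_key=" + encode(queryKey);
    }

    // Returns the text of the first <tag>...</tag> element, or null if the
    // tag is absent. Simplification: assumes the element has no attributes
    // and no nested elements.
    static String firstTagText(String xml, String tag) {
        String quoted = Pattern.quote(tag);
        Matcher m = Pattern
                .compile("<" + quoted + ">([^<]*)</" + quoted + ">")
                .matcher(xml);
        return m.find() ? m.group(1).trim() : null;
    }

    // Percent-encodes a query value (spaces become '+').
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Search term in the form the slides use for the gene database.
        System.out.println(esearchUrl("gene", "Escherichia coli K12[orgn]"));

        // Made-up placeholder reply; a real WebEnv value is much longer.
        String reply = "<eSearchResult><Count>0</Count>"
                + "<QueryKey>1</QueryKey><WebEnv>EXAMPLE_WEBENV</WebEnv>"
                + "</eSearchResult>";
        String webEnv = firstTagText(reply, "WebEnv");
        String queryKey = firstTagText(reply, "QueryKey");
        System.out.println(efetchUrl("gene", webEnv, queryKey));
    }
}
```

In practice the reply string would come from downloading the esearch URL, and the efetch URL would then be downloaded to obtain the file parsed in Step 6.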