Mauli Prasad
Primary Advisor: Dr. Qunfeng Dong
Secondary Advisor: Dr. Haixu Tang
School of Informatics, Indiana University, Bloomington.

Figure credits: © 2002 by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. http://stemcells.nih.gov/StaticResources/info/scireport/images/figurea6.jpg

[Figure: GENE 1 and GENE 2 drawn on the 5’-3’ and 3’-5’ strands in the Head to Head, Head to Tail, and Tail to Tail orientations]

For a promoter to be called BIDIRECTIONAL it should satisfy two conditions [1]:
1. The adjacent genes should be in the Head to Head orientation.
2. Their transcription start sites should be no more than 1000 bp apart.

• Promoters of many co-expressed bidirectional gene pairs are capable of initiating transcription in both directions. [Human genome] Trinklein et al. (2004) [2]
• Compared to Tail to Tail, the Head to Head gene arrangement is more conserved. [Vertebrates] Yang et al. (2008) [3]
• Co-expression of adjacent gene pairs. [Yeast] S. Kruglyak and H. Tang (2000) [4]
• Orientation affects co-expression of neighboring genes. [Arabidopsis thaliana] Williams et al. (2004) [5]

Search and analysis of possible bidirectional promoters in Arabidopsis thaliana

Arabidopsis thaliana (Wall cress / Mouse-ear cress)
• Model organism for plants
• Herbaceous dicot (Brassicaceae family)
• Related plants of economic importance: cabbage, broccoli, turnips, mustard, rapeseed

Mining adjacent gene pairs
Microarray data could suggest co-regulation. If the gene pairs are co-expressed:
1. What is the most prevalent intergenic distance?
2. Are there any common motifs?
3. Identification of the transcription factor.
4. Head to Head vs. the rest.
5. Distance conservation and orientation patterns in Brassica rapa.

Dataset - Gene annotation in GFF format from The Arabidopsis Information Resource
ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR8_genome_release

• Classify adjacent gene pairs as Head to Head, Head to Tail, or Tail to Tail
• Keep all pairs within 500 bp
• Remove pseudogenes, transposons, and RNAs
• Remove duplicates (bl2seq, E-value cutoff 1e-5)

Resulting gene pairs:
Head to Head    Head to Tail    Tail to Tail
1369            3807            2674

Dataset - Pre-processed expression data for 22810 probe sets on the Affymetrix Arabidopsis ATH1 (25K) array across 1436 hybridization experiments.
ftp://ftp.arabidopsis.org/home/tair/Microarrays/analyzed_data/affy_data_1436_10132005.zip

• Start with 1436 Affymetrix Arabidopsis 25K arrays obtained from NASCArrays and AtGenExpress.
• Normalize the data using the robust multi-array average (RMA) method.
• Match probes to the gene pairs obtained.
• For each pair, calculate the correlation coefficient.
• Plot the percentage of gene pairs against the correlation coefficient.
• Based on an appropriate cut-off for the correlation coefficient, select the highly co-expressed gene pairs.

                               H_H     H_T     T_T     Non-Adjacent
No. of pairs matching probes   842     2367    1642    624
Cor. pairs                     55      80      43      45
% Cor. pairs                   6.5     3.3     2.6     0.9

          H_H     [H_T]+[T_T]
>=60%     55      122
<=60%     787     3887

          H_T     [H_H]+[T_T]
>=60%     79      98
<=60%     2288    2386

          T_T     [H_H]+[H_T]
>=60%     43      134
<=60%     1599    3075

We want to test whether highly co-expressed gene pairs are significantly associated with the H_H orientation (potentially containing a bidirectional promoter). Fisher's exact test is used to examine the significance of the association between two variables in a 2 x 2 contingency table. Here the sample is divided into H_H and non-H_H pairs (the first variable) vs. highly co-expressed gene pairs and the remaining gene pairs (the second variable).
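The 2 x 2 test described above can be sketched as follows. This is a minimal sketch, assuming a two-sided Fisher's exact test computed directly from the hypergeometric distribution. The function name fisher_exact_two_sided is illustrative and not from the slides; the counts are the H_H and H_T tables shown above. Only the Python standard library is used, so small numerical differences from P-values produced by a statistics package are possible.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Rows: >=60% and <=60% co-expression; columns: one orientation vs. the rest.
p_hh = fisher_exact_two_sided(55, 122, 787, 3887)   # H_H vs [H_T]+[T_T]
p_ht = fisher_exact_two_sided(79, 98, 2288, 2386)   # H_T vs [H_H]+[T_T]
print(f"H_H: {p_hh:.3g}  H_T: {p_ht:.3g}")
```

A small P-value for the H_H table and a large one for the H_T table would indicate that high co-expression is associated with the Head to Head orientation specifically, rather than with adjacency in general.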
Fisher's exact P-values
H_H vs [H_T]+[T_T]    6.20E-06
H_T vs [H_H]+[T_T]    0.2836
T_T vs [H_H]+[H_T]    0.005855

Intergenic distance categories (bp)
1   0-50
2   50-100
3   100-150
4   150-200
5   200-250
6   250-300
7   300-350
8   350-400
9   400-450
10  450-500

• Does the intergenic distance distribution in highly co-expressed gene pairs differ significantly from that in gene pairs with low co-expression?
• A leave-one-out technique was used to see which of the distance categories contributed most.

[Figures: three panels over distance categories 1-10, each comparing the >=60 and <=60 groups]

P-values with each distance category left out in turn:

Category left out    H_H          H_T       T_T
All                  0.0009866    0.5259    0.3219
0-50                 0.0006511    0.5844    0.2452
50-100               0.0006366    0.4407    0.2894
100-150              0.3647       0.6722    0.491
150-200              0.00038      0.5111    0.2689
200-250              0.001641     0.4542    0.2722
250-300              0.002395     0.4269    0.623
300-350              0.0004404    0.4951    0.2666
350-400              0.0011       0.4406    0.3198
400-450              0.0007382    0.5026    0.2819
450-500              0.002538     0.6561    0.22958

[Pie charts: molecular function categories (unknown molecular function, enzyme activity, binding, transporter, transcription factor, structural molecule, other molecular function) for three gene sets; totals = 126, 174, and 118]

Intergenic regions of highly co-expressed pairs in Head to Head were provided to MEME with the following parameters:
• Any number of repetitions of the motif was allowed
• E-value cutoff 0.1

Motif E-values: E = 8.6e-037, E = 3.6e-007, E = 1.9e-002

• Regulates gene expression during initiation of axillary bud outgrowth in Arabidopsis.
• Ascorbate oxidase gene (AO) promoter; found in silencer region; AOBP (AGTA repeat binding protein) has a DOF domain required for repression of expression of the AO gene.
• Light responsive element (LRE) found in the parsley (P.c.) CHS-1 (chalcone synthase-1) gene promoter.
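The 50 bp distance categories and the leave-one-out step listed earlier in this part can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not name the test used to compare the distance distributions, so a Pearson chi-square statistic stands in for it here; the per-category counts are hypothetical, invented for illustration and chosen only so that they sum to the 55 and 787 H_H pairs; and distance_category, chi_square_stat, and leave_one_out are illustrative names. The handling of bin edges (for example, which category a distance of exactly 50 bp falls into) is also an assumption.

```python
LABELS = [f"{lo}-{lo + 50}" for lo in range(0, 500, 50)]  # "0-50" ... "450-500"


def distance_category(distance_bp):
    """Map an intergenic distance (0-500 bp) to category 1-10 (assumed bin edges)."""
    return min(distance_bp // 50, 9) + 1


def chi_square_stat(high, low):
    """Pearson chi-square statistic for a 2 x k table given as two count lists."""
    total_high, total_low = sum(high), sum(low)
    n = total_high + total_low
    stat = 0.0
    for h, l in zip(high, low):
        col = h + l
        for observed, row_total in ((h, total_high), (l, total_low)):
            expected = row_total * col / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def leave_one_out(high, low, labels):
    """Recompute the statistic with each distance category removed in turn."""
    return {
        label: chi_square_stat(high[:i] + high[i + 1:], low[:i] + low[i + 1:])
        for i, label in enumerate(labels)
    }


# HYPOTHETICAL counts per category (not from the slides), for H_H pairs.
high_coexpr = [4, 5, 20, 5, 4, 4, 4, 3, 3, 3]            # sums to 55
low_coexpr = [80, 80, 75, 80, 80, 80, 78, 78, 78, 78]    # sums to 787

results = leave_one_out(high_coexpr, low_coexpr, LABELS)
most_influential = min(results, key=results.get)
print(most_influential)
```

The category whose removal lowers the statistic the most (equivalently, raises the P-value the most) is the one contributing most to the difference between the two distributions.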
                No. of pairs with Motif 1    Pairs without Motif 1    Total number
Head to Head    20                           35                       55
Head to Tail    12                           67                       79
Tail to Tail    3                            40                       43

                 H_H    [H_T]+[T_T]
#enriched        20     15
#not enriched    35     107
Fisher's exact P-value 0.0004077

                 H_T    [H_H]+[T_T]
#enriched        12     23
#not enriched    67     75
Fisher's exact P-value 0.1883

                 T_T    [H_T]+[H_H]
#enriched        3      32
#not enriched    40     102
Fisher's exact P-value 0.0277

The position-specific probability matrix from MEME was provided to TESS along with the intergenic regions of highly correlating gene pairs in all orientations.

AT1G09760-AT1G09770: [protein binding, response to cold]-[DNA binding, transcription factor activity, regulation of transcription, defense response signaling pathway]
AT5G23080-AT5G23090: [RNA binding, RNA processing]-[intracellular, transcription factor activity, regulation of transcription]
AT5G64670-AT5G64680: [ribosome, structural constituent of ribosome, translation, ribosome biogenesis and assembly]-[ribosome, structural constituent of ribosome, translation, ribosome biogenesis and assembly]
AT2G40650-AT2G40660: [binding, RNA processing]-[tRNA binding, tRNA aminoacylation for protein translation]
AT3G46030-AT3G46040: [nucleus, DNA binding, nucleosome assembly, nucleosome]-[structural constituent of ribosome, translation, cytosolic small ribosomal subunit]
AT1G23280-AT1G23290: [MAK16 protein-related]-[Encodes a ribosomal protein L27A, a constituent of the large subunit of the ribosomal complex]
AT1G76400-AT1G76405: [endoplasmic reticulum, oligosaccharyl transferase activity, protein amino acid glycosylation]-[similar to chloroplast channel forming outer membrane protein [Pisum sativum] (GB:CAB58442.1)]
AT5G05670-AT5G05680: [endoplasmic reticulum, signal recognition particle binding]-[nuclear pore complex protein-related]
AT2G20480-AT2G20490: [similar to Os09g0446000 [Oryza sativa (japonica cultivar-group)] (GB:NP_001063306.1)]-[Cajal body, nucleolus, RNA binding, polar nucleus fusion]
AT3G56990-AT3G57000: [EDA7 (embryo sac development arrest 7)]-[nucleolar essential protein-related]
AT4G18360-AT4G18370: [glycolate oxidase activity, electron transport, metabolic process]-[chloroplast thylakoid lumen, serine-type peptidase activity, trypsin activity, proteolysis, photosystem II repair]
AT4G33500-AT4G33510: [protein serine/threonine phosphatase activity]-[chloroplast, 3-deoxy-7-phosphoheptulonate synthase activity, aromatic amino acid family biosynthetic process, chorismate biosynthetic process]
AT4G35440-AT4G35450: [membrane, voltage-gated chloride channel activity, chloride transport]-[protein folding, defense response to bacterium, incompatible interaction, protein targeting to chloroplast, integral to chloroplast outer membrane]
AT2G35490-AT2G35500: [chloroplast thylakoid membrane, structural molecule activity]-[shikimate kinase-related]
AT2G37310-AT2G37320: [pentatricopeptide (PPR) repeat-containing protein]-[pentatricopeptide (PPR) repeat-containing protein]
AT1G13030-AT1G13040: [unknown, sphere organelles protein-related; similar to hypothetical protein [Brassica rapa] (GB:ABQ50545.1); contains domain PTHR15197 (PTHR15197)]-[pentatricopeptide (PPR) repeat-containing protein]
AT1G14270-AT1G14280: [prenyl-dependent CAAX protease activity]-[Encodes phytochrome kinase substrate 2. PKS proteins are critical for hypocotyl phototropism.]
AT3G16990-AT3G17000: [TENA/THI-4 family protein; identical to Seed maturation protein]-[ubiquitin-protein ligase activity]
AT1G04070-AT1G04080: [P-P-bond-hydrolysis-driven protein transmembrane transporter activity, protein targeting to mitochondrion]-[regulation of timing of transition from vegetative to reproductive phase]
AT1G27390-AT1G27400: [P-P-bond-hydrolysis-driven protein transmembrane transporter activity, protein targeting to mitochondrion]-[ribosome, structural constituent of ribosome, translation]

Dataset
• Finished BACs of Brassica rapa in FASTA format
  ftp://149.155.100.41/pub/brassica/KBr_finished.fasta
• Protein sequences of the highly correlating Arabidopsis genes
  ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR8_blastsets/TAIR8_pep_20080412

Blastall
[Program: tblastn] [Database: Brassica BACs] [Query: Arabidopsis protein sequences] [E-value cutoff: 1e-20]

                Head to Head    Head to Tail    Tail to Tail    Different BACs    Total
Head to Head    16              1               0               13                30
Head to Tail    0               19              0               23                42
Tail to Tail    0               0               6               19                25

• The percentage of highly correlating pairs is higher in Head to Head.
• Highly correlating pairs in Head to Head fall within 100-150 bp.
• Head to Head pairs are mostly RNA/DNA/protein binding.
• The UP1ATMSD motif is enriched in Head to Head.
• Orientation seems to be conserved in B. rapa, but intergenic distance seems to be less conserved.

1. Adachi N, Lieber MR: Bidirectional gene organization: a common architectural feature of the human genome. Cell 2002, 109(7):807-809.
2. Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP, Myers RM: An abundance of bidirectional promoters in the human genome. Genome Res. 2004, 14:62-66.
3. Yang MQ, Taylor J, Elnitski L: Comparative analyses of bidirectional promoters in vertebrates. BMC Bioinformatics 2008, 9 Suppl 6:S9.
4. Kruglyak S, Tang H: Regulation of adjacent yeast genes. Trends in Genetics 2000, 16(3):109-111.
5. Williams EJB, Bowles DJ: Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 2004, 14:1060-1067.

Acknowledgements
Dr. Qunfeng Dong
Dr. Haixu Tang
Ashwini Oke
Linda Hostetter