
Computing Co-Expression Relationships
Wen-Dar Lin

Contents
• Motivation
• Basic Idea
• Case Studies
  – An Example of a Single Experiment
  – An Example of a Time-Course Experiment
• Potential Applications
• Availability
• Future Works

Motivation
• Given a set of differentially displayed genes reported by an array experiment:
  – We would like to know the relationships among these genes.
  – These relationships may reveal important modules or motifs with respect to the experiment.
• Co-expression relationships are among the most biologically meaningful and most easily computable relationships.
  – Co-expression relationships form modules from which important biological information may be inferred.
  – They can be computed from the large amount of publicly available array data.

Basic Idea
• Array data can be retrieved from publicly available data repositories
  – such as NASCarrays, NCBI GEO, and EMBL-EBI ArrayExpress.
• The data should be normalized before computing co-expression relationships
  – e.g., by the RMA method.
• Defining co-expression relationships
  – A co-expression relationship between two genes exists if the Pearson correlation coefficient between their normalized expression levels is greater than or equal to a certain threshold.

    slide #   1   2   3    4   …
    gene X    1   2   10   3   …
    gene Y    5   2   12   4   …

  [Figure: plot with axes X and Y]
• Properties of the Pearson correlation coefficient
  – Let Correl(A, B) be the Pearson correlation coefficient between the normalized expression levels of gene A and gene B.
  – -1 ≤ Correl(A, B) ≤ 1
  [Figure: example of negative correlation, from http://www.gseis.ucla.edu/courses/ed230bc1/notes1/var1.html]
• The computational assistance
  – Given a set of genes of interest
  – Compute the co-expression relationships among them
  – Identify co-expression clusters

Case Studies
• We have implemented the ideas above in a tool kit and applied it to two case studies.
  – A single experiment
  – A time-course experiment

A Single Experiment
• In this example, an array experiment was performed.
  – 178 differentially displayed genes were identified.
  – Co-expression relationships were based on RMA array data of 300 ATH1 slides downloaded from NASCarrays.
    • The sample of each slide was derived nonexclusively from roots.
    • Threshold for the Pearson correlation coefficient = 0.7
  [Figure: co-expression graph based on root-array data; labels: one minor subcluster, two larger clusters]
• Co-expression relationships can be computed from all kinds of array experiment data.
  – Co-expression relationships were also identified from RMA array data of 1436 ATH1 slides downloaded from TAIR.
    • Threshold for the Pearson correlation coefficient = 0.7
  [Figure: co-expression graph based on all-array data; label: two larger clusters]
• Is there any difference between the graph based on root-array data and the graph based on all-array data?
  – We compare them by marking the clusters of one graph distinctly on the other graph.
  [Figure: labels: one cluster that should be root-specific; two clusters mapped by the other graph]
  [Figure: labels: cluster size: 9; cluster sizes: 47 & 14]
• Some remarks
  – The number of differentially displayed genes reported by the experiment is 178.
  – The number of clustered genes is 47 + 14 + 9 = 70.
    • Reduced by more than 50%
  – The co-expression relationships are recovered.
    • Each cluster may be a module whose genes usually work together.
  – Finding tissue-specific co-expression relationships
    • can be done by mapping the graph based on all-array data onto the graph based on tissue-related array data.
• In addition to clustering genes according to co-expression relationships, we also fished for genes that may potentially be co-expressed.
  – These genes may not have been identified as differentially displayed in the experiment.
• A GO enrichment analysis was also carried out
  – using the GOBU software (gobu.iis.sinica.edu.tw),
  – which should give a conceptual view of the clustered genes.
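
The definition from Basic Idea (an edge between two genes when their Pearson correlation coefficient is at least the threshold, 0.7 in this case study) can be sketched as follows. This is a minimal illustration, not MACCU's implementation: the function names are hypothetical, gene Z is invented toy data, and treating a "cluster" as a connected component of the co-expression graph is an assumption, since the slides do not say how clusters are delimited.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expr, threshold=0.7):
    """Return the set of gene pairs whose correlation is >= threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            edges.add((g1, g2))
    return edges


def clusters(edges):
    """Group genes into connected components (assumed cluster definition)."""
    neighbors = {}
    for g1, g2 in edges:
        neighbors.setdefault(g1, set()).add(g2)
        neighbors.setdefault(g2, set()).add(g1)
    seen, result = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbors[gene] - component)
        seen |= component
        result.append(component)
    return result


# Genes X and Y use the first four values of the table in Basic Idea;
# gene Z is invented for contrast.
expr = {
    "X": [1, 2, 10, 3],
    "Y": [5, 2, 12, 4],
    "Z": [9, 8, 1, 7],
}
edges = coexpression_edges(expr, threshold=0.7)
print(edges)
print(clusters(edges))
```

With these toy values only X and Y are linked, so Z stays outside every cluster, in the same way that only 70 of the 178 genes were clustered above.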
A Time-Course Experiment
• In this example, a time-course array experiment was performed.
  – Three time points
  – About 800 genes were differentially displayed at one or more time points.
  – Co-expression relationships were based on array data of 300 ATH1 slides extracted from RMA array data of about 2600 ATH1 slides downloaded from NASCarrays.
    • Threshold for the Pearson correlation coefficient = 0.8
  [Figures: co-expression graphs at time points 1, 2, and 3; labels on each: about 100 genes, about 100 genes]
• Though this clustering and the time-course expression data show some biological meaning,
  – this number of clustered genes (more than 200)
    • makes the graph too complex and
    • is too large to be interpreted in a short time.
• Reducing the number of clustered genes may help in
  – reducing the complexity of the graph and
  – interpreting the revealed co-expression modules.
• We reduced the graph by removing co-expression relationships that generally exist in the entire plant,
  – based on RMA array data of about 2600 ATH1 slides downloaded from NASCarrays.
  – Threshold for the Pearson correlation coefficient = 0.7
  [Figure: edges (relationships) to be removed; axes X and Y; legend: root-related, others]
  [Figure: edges (relationships) to be retained; axes X and Y; legend: root-related, others]
  [Figures: reduced co-expression graphs at time points 1, 2, and 3; labels on each: about 20 genes, about 50 genes, about 60 genes]
• Some remarks
  – The number of genes differentially displayed at one or more time points is about 800.
  – The number of clustered genes is about 60 + 50 + 20 = 130.
    • Reduced by more than 80%
  – The retained graph contains edges, i.e., gene pairs, that are co-expressed in root but not in the entire plant.
    • The recovered clusters should be root-specific.
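
The graph reduction described above (retain gene pairs that are co-expressed in root-related data but not in whole-plant data) amounts to a set difference between two edge sets. The sketch below illustrates the idea only. The function names, gene names, and edges are all hypothetical, and it does not reproduce how MACCU removes or retains edges.

```python
def normalize(edges):
    """Store each undirected edge as a sorted tuple so (a, b) equals (b, a)."""
    return {tuple(sorted(edge)) for edge in edges}


def retain_specific(tissue_edges, global_edges):
    """Keep edges present in the tissue graph but absent from the global graph."""
    return normalize(tissue_edges) - normalize(global_edges)


def common_edges(tissue_edges, global_edges):
    """Keep edges present in both graphs."""
    return normalize(tissue_edges) & normalize(global_edges)


# Hypothetical gene names and edges, for illustration only.
root_edges = {("g1", "g2"), ("g2", "g3"), ("g4", "g5")}
all_plant_edges = {("g2", "g1"), ("g6", "g7")}

root_specific = retain_specific(root_edges, all_plant_edges)
shared = common_edges(root_edges, all_plant_edges)
print(sorted(root_specific))
print(sorted(shared))
```

Here the pair g1 and g2 is dropped because it also appears in the whole-plant graph, while the other two root edges are retained as candidates for root-specific co-expression.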
Potential Applications
• We have created a tool kit that
  – computes co-expression relationships based on array data,
    • where probe names can be replaced by aliases produced by, for example, orthologous mapping;
    • this can be used for studying a non-model organism using array data of a model organism.
  – fills colors in graphs according to
    • intensity fold-changes, or
    • clusters in another graph
  – removes/retains co-expression relationships found in another graph
  – finds specific or common co-expression relationships
    [Figure: labels: 200 genes, 120 genes]
  – fishes for genes that are potentially co-expressed with an assigned bait

Future Works
• Incorporate a pathway database
  – such as AraCyc
  – for finding relationships between co-expression clusters and known pathways
• A user-friendly interface that would
  – facilitate the use of this tool kit and
  – help manage output data

Availability
• The tool kit is now an open-source project.
  – http://maccu.sourceforge.net
  – Project name: MACCU
    • Multi-Array Correlation Computation Utility
  – A detailed description of each program module has been created.
  – A running script with an example is provided.

Special Thanks
• I would like to thank
  – Drs. Chang (Bill), Schmidt & Wu
    • for raising this idea,
    • the initial implementation, and
    • valuable comments.

Thank you!
