Comparative transcriptomic analysis of fungi
Group Nicotiana
Daan van Vliet, Dou Hu, Joost de Jong, Krista Kokki

Research objective
To study differences in gene expression in related fungus species

Study species: selection criteria
- Reference genome
- RNA reads > 100 bp
- Preferably: paired-end
- Related species
- Similar conditions

Comparison
Comparison between different species
- Saccharomyces cerevisiae (yeast)
- Komagataella pastoris (Pichia, yeast)
- Aspergillus oryzae (fungus)

Methods – Data [Daan]
RNA-seq: SRA
Genome and annotation: Ensembl Fungi
Read quality analysis performed with FastQC

Methods – Data processing
Cleaning reads: SolexaQA
Mapping reads: TopHat
Assembly/Quantification: Cufflinks
Optional replicate assembly: Cuffmerge
Extracting transcript seqs: gffread
Selection of top 100 genes: Linux

Methods – Gene properties
Property       Explanation                                          Tool (input datafile)
Expression     Count of mapped reads                                Perl script (fasta)
Length         Count of base pairs of whole gene                    Perl script (fasta)
Intron length  Count of base pairs within introns                   Perl script (gtf)
GC content     GC count / Length                                    Perl script (fasta)
Nc             Effective number of codons; range 20-61              CodonW (fasta)
               (20 = one codon per amino acid, 61 = random codon use)
GC3s           GC content of the third synonymous codon position    CodonW (fasta)

Methods – Interaction
The top 100 genes were mapped to the interactome file and visualised in Cytoscape.

Hypothesis for yeast – Validation
• GC content correlates positively with gene length.
• Gene length correlates negatively with the degree of codon bias.
• Codon bias is more extreme in highly expressed genes.
• Genes with longer introns show higher bias in codon usage.
• The overall codon usage matches the known bias.
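The sequence-based gene properties from the Methods table (length and GC content) could be computed along the following lines. This is a minimal Python sketch, not the Perl scripts used in the project: the function names `read_fasta` and `gene_properties` and the demo sequences are invented for illustration. The `gc3` value here counts every third base of an in-frame sequence, which is a simplification and not identical to the GC3s statistic reported by CodonW, which considers synonymous codons only.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def gene_properties(seq: str) -> Dict[str, float]:
    """Return length (bp), GC fraction, and GC fraction at third positions."""
    length = len(seq)
    gc = sum(base in "GC" for base in seq)
    # Every third base; assumes the sequence starts at codon position 1.
    third = seq[2::3]
    gc3 = sum(base in "GC" for base in third)
    return {
        "length": length,
        "gc_content": gc / length if length else 0.0,
        "gc3": gc3 / len(third) if third else 0.0,
    }


if __name__ == "__main__":
    # Invented demo records, not project data.
    demo = [">geneA demo", "ATGGCC", "GGTTAA", ">geneB", "ATGAAATAA"]
    for gene_id, seq in read_fasta(demo):
        print(gene_id, gene_properties(seq))
```

In practice the input would be the transcript FASTA file produced by gffread, opened with `open(path)` and passed to `read_fasta` line by line.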
GO terms and gene locations
   GOBPID      Pvalue    OddsRatio  ExpCount  Count  Size  Term
1  GO:0002181  3.58E-97  54.125     6.305508  95     171   cytoplasmic translation
2  GO:0044238  1.80E-14  3.670035   51.04701  96     1319  primary metabolic process
3  GO:0071843  2.06E-12  3.421344   21.49811  57     587   cellular component biogenesis at cellular level
4  GO:0006407  7.92E-12  37.62835   0.700612  11     19    rRNA export from nucleus
5  GO:0070925  4.44E-11  8.76313    2.949945  19     80    organelle assembly
The top 5 most over-represented GO terms for all the found genes

Chromosome    I     II    III   IV    V     VI    VII   VIII  IX    X     XI    XII   XIII  XIV   XV    XVI
No. of genes  3     15    5     46    5     16    14    5     0     11    12    37    17    11    1     2
The chromosomes the genes are found in.

Results – Correlations
[Figure panels for each comparison below, one each for Saccharomyces cerevisiae, Aspergillus oryzae and Komagataella pastoris]
- Gene expression vs. gene length
- Gene expression vs. intron length
- Gene expression vs. effective number of codons
- Effective number of codons vs. GC content at 3rd position
- Gene length vs. effective number of codons
- Gene length vs. GC content
- Gene length vs. intron length
- Intron length vs. Nc

Results – Correlations
Overall:
- Within species: few correlations between gene properties
- Between species: different patterns(?)

Cytoscape
• GO terms
The top 100 genes show different interaction networks in GO terms.

Results – First choice
Yeast Interactome Project for S. cerevisiae
• High-throughput yeast two-hybrid (Y2H) provides high-quality binary interaction information.
• The high-throughput Y2H dataset covers ~20% of all yeast binary interactions.
• This binary map is enriched for transient signalling interactions and inter-complex connections, with a highly significant clustering between essential proteins.

Database choosing
• Interactions from CCSB-YI1: 1,809 interactions among 1,278 proteins

Second choice
YeastNet v. 2
• A probabilistic functional gene network of yeast genes, constructed from ~1.8 million experimental observations from DNA microarrays, physical protein interactions, genetic interactions, literature, and comparative genomics methods.
• In total, YeastNet v. 2 covers 102,803 linkages among 5,483 yeast proteins.
• A modified Bayesian integration of diverse data types, with each data type weighted according to how well it links genes that are known to share functions (log likelihood score, LLS).

Database choosing
• All of the top 100 genes had interactors in YeastNet v. 2.
• We found 9,896 possibilities among the 102,803 linkages.

The end
Questions?

Results – Correlations
[Figure panels: Gene expression vs. GC content, one each for Saccharomyces cerevisiae, Aspergillus oryzae and Komagataella pastoris]
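The pairwise comparisons in the Results – Correlations slides could be quantified with a rank correlation. The slides do not state which statistic, if any, was used, so the sketch below assumes Spearman's rho, computed as the Pearson correlation of average ranks. The function names and the demo values are invented for illustration and are not project data.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented demo values: expression falls as gene length rises.
    expression = [950.0, 400.0, 120.0, 30.0]
    gene_length = [300.0, 800.0, 1500.0, 2400.0]
    print(spearman(expression, gene_length))
```

A rank-based statistic is used here because read counts are typically skewed, so a linear correlation on raw values would be dominated by the few most highly expressed genes.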