RBP1 Splicing Regulation in Drosophila melanogaster
03-711 - Fall 2005
Jacob Joseph, Ahmet Bakan, Amina Abdulla
This presentation is available at http://www.jjoseph.org/biology/

Alternative Splicing in Drosophila

RBP1 Regulation
  - Involved in dsx splicing and Rbp1 auto-regulation
  - Suspected in many other related pathways

Genome Data
  - Sequence of all introns of known splice variants
  - Two annotated genomes available:
      - D. melanogaster
      - D. pseudoobscura
  - As the gene names for D. mel. and D. pseu. differ, a list of gene orthologs was also obtained

Computational Approach
  - Create a profile HMM for each motif (B-B, B-A)
  - Select the end of every intron (~50 bases)
  - Perform an HMM search for each intron segment, in both D. mel. and D. pseu.
  - Keep matches found in both species
  - Keep matches at the end of introns (~15 bases)
  - Return the alignment of both species
  - Examine the biological similarity of matches

Data Summary

Profile Hidden Markov Models (HMM) and HMMer
  - We needed an HMM profiler and search program
  - HMMer uses a revised version of the Krogh/Haussler model, called Plan 7
  - Not limited to global alignment

HMMer Advantages

Possible Alignments
  - Classic global alignment
  - Classic local alignment
  - Global profile, local sequence alignment
  - Fully local “multihit” alignment
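The filtering steps of the computational approach (take the last ~50 bases of each intron, search both species, keep hits that lie near the 3' end in both) can be sketched as follows. This is a minimal sketch under stated assumptions, not the presenters' code: `toy_search` is a hypothetical ungapped matcher standing in for the profile HMM search, the "within ~15 bases" rule is read as "the hit ends at most 15 bases before the end of the segment", and every intron of a gene is compared with every intron of its ortholog. The example sequences are the first B-A result pair from the slides.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hit = Tuple[int, int]  # 1-based, inclusive (start, end) within a tail segment

TAIL_LEN = 50  # bases kept from the 3' end of each intron (~50 in the slides)
WINDOW = 15    # assumed rule: a hit must end within this many bases of the 3' end


def intron_tail(intron: str, tail_len: int = TAIL_LEN) -> str:
    """Return the last tail_len bases of an intron (all of it if shorter)."""
    return intron[-tail_len:]


def toy_search(segment: str, motif: str, max_mismatches: int = 2) -> Optional[Hit]:
    """Hypothetical stand-in for the HMM search: best ungapped match to motif."""
    seg, mot = segment.upper(), motif.upper()
    best: Optional[Tuple[int, int]] = None  # (mismatches, 0-based start)
    for start in range(len(seg) - len(mot) + 1):
        mism = sum(a != b for a, b in zip(seg[start:start + len(mot)], mot))
        if mism <= max_mismatches and (best is None or mism < best[0]):
            best = (mism, start)
    if best is None:
        return None
    return (best[1] + 1, best[1] + len(mot))


def near_splice_site(hit: Hit, segment_len: int, window: int = WINDOW) -> bool:
    """True if the hit ends within `window` bases of the segment's 3' end."""
    return segment_len - hit[1] <= window


def conserved_hits(
    mel: Dict[str, Dict[str, str]],   # gene -> {intron id -> intron sequence}
    pse: Dict[str, Dict[str, str]],
    orthologs: Dict[str, str],        # D. mel. gene -> D. pseu. gene
    search: Callable[[str], Optional[Hit]],
) -> List[Tuple[str, Hit, str, Hit]]:
    """Keep intron pairs from orthologous genes whose tails both carry a hit."""
    results = []
    for mel_gene, pse_gene in orthologs.items():
        for mel_id, mel_seq in mel.get(mel_gene, {}).items():
            mel_tail = intron_tail(mel_seq)
            mel_hit = search(mel_tail)
            if mel_hit is None or not near_splice_site(mel_hit, len(mel_tail)):
                continue
            for pse_id, pse_seq in pse.get(pse_gene, {}).items():
                pse_tail = intron_tail(pse_seq)
                pse_hit = search(pse_tail)
                if pse_hit is not None and near_splice_site(pse_hit, len(pse_tail)):
                    results.append((mel_id, mel_hit, pse_id, pse_hit))
    return results


# Example data: the first B-A result pair shown in the slides.
mel = {"CG30271": {"CG30271-RC-in_5":
                   "ctgttgaatcacttggaaagcaatcaGTCGACAATTGTTtacttttacag"}}
pse = {"GA15740": {"GA15740-in_5":
                   "cctttgaatcactcggaaagcaatcaGTCGACAATTGTTtacttttacag"}}
orthologs = {"CG30271": "GA15740"}

hits = conserved_hits(mel, pse, orthologs,
                      lambda seg: toy_search(seg, "GTCGACAATTGTT"))
print(hits)
```

Passing the search function in as a parameter keeps the species and window filters independent of the scoring model, so the toy matcher could be swapped for real profile HMM results without changing the filtering logic.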
Scoring
  - Raw alignment score
  - E-value, showing the significance of the alignment

HMMer
  - Create an HMM from the multiple alignment of each B-B and B-A motif
  - The genome is scanned for high-scoring matches
  - Only hits within a distance of 15 base pairs of the 3’ splice site are considered

Results: B-A Motif

CG30271-RC-in_5 (27 - 39), GA15740-in_5 (27 - 39)   score: -6
    ctgttgaatcacttggaaagcaatcaGTCGACAATTGTTtacttttacag
    cctttgaatcactcggaaagcaatcaGTCGACAATTGTTtacttttacag

CG30020-RA-in_3 (25 - 37), GA15581-in_9 (24 - 36)   score: -8
    ccgtcccagtgacttacaatacgaTTCTACTATTTTTtgtacgcttacag
    taaggctcttcatactttatcaaATCTACAATTTCTcaatgtaattgcag

Klp3A-RA-in_3 (31 - 43), GA21186-in_3 (26 - 38)   score: -9
    ttgaagttcgaaaactcctgaaactaattgTTCCACAATTTTTttttatt
    tgttcaattcttaaataaaaccaatTTCGACTCTTTTTctcttctttcag

na-RB-in_0 (33 - 45), GA13546-in_2 (25 - 37)   score: -9
    tctggtgcactgagagaaatgccatctacttcATCGATACTCTTTtgcag
    tgtaaacactcgttgcaaacacaaATTTACAATCAATttccatgttttat

CG30428-RA-in_2 (33 - 45), GA15840-in_1 (25 - 37)   score: -9
    ggtaaggaagcgtaaaaataaattctttttttATCACCAATATTTttcag
    aaaatatcaagccgaaacaaatttATGTACAATTTTTtttttatggaaag

CG2199-RB-in_0 (36 - 48), GA15296-in_0 (33 - 45)   score: -10
    ttgctactgccattataggtagtttaaaaactgttTTCTACACTCTTTct
    aacaaaaacaaaaatatggccctctgataattGGGGACACTTTATttcag

Results: B-B Motif

ps-RD-in_4 (31 - 42), GA20847-in_4 (31 - 42)   score: -11
    catttaatatcttgaaaatatttaacataaATCTGATGCAAAtattccag
    attactattcttaaaatatatttaacataaATCTGATGCAAAtattccag

fru-RE-in_6 (26 - 37), GA12896-in_5 (24 - 35)   score: -13
    cccacccccacagtgatgacgcctaATATGAACCAAGcaaatgtttgcag
    tgctaaataaaccaaattccaaaCTCTGATCAAAAaataccgataaaaag

Ptp52F-RA-in_0 (38 - 49), GA14851-in_14 (34 - 45)   score: -13
    tactctttgaaaaataagcatatggatgtcactgataATATGATATTAAt
    tctaaatcgtattcaaatcgaattgaaacataaATCGAATCCAAAaacag

CG9455-RA-in_0 (32 - 43), GA21800-in_0 (27 - 38)   score: -13
    aatagtggctttgttttaataacaatgtaatATCTGATATTTAttctcag
    cagagcgtgccccgtctgatgatccgAACTGATCTGATgtttttcggtag

CG8709-RA-in_2 (34 - 45), GA21271-in_9 (34 - 45)   score: -13
    acaaatcttaggaaataccaaagttgttctacgATCTTATCTATGgagtc
    gccccatcagtgtcagtggcagctgaccccaccATTTGATCTATTtgcag

CG7966-RA-in_0 (37 - 48), GA20727-in_4 (26 - 37)   score: -13
    tatatgtacacattgtactgcaaacacatgccctgaATCTTTGATAAAga
    gtgttgaatgaaagaatacacttgaATCGGTTCTAAAttgcatcgcacag

Biomolecular Activity: B-A

Biomolecular Activity: B-B

Biomolecular Activity Analysis
  - The fru gene, which is regulated by the tra and tra2 genes and is expressed at the same time as dsx, helps validate our results
  - Expected presence of sxl and tra genes
  - Functional similarity:
      - B-A motif: SNF4Agamma, rdgc, qtc
      - B-B motif: ps, ptp, CG9455

Difficulties & Future Directions
  - Support Vector Machines were applied
  - Lack of significant training data
  - Lack of direct experimental data for cross-validation
  - Since the current D. pseu. genome has far fewer intron sequences, reliance upon orthologs introduces many false negatives
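Each result above pairs a D. mel. intron segment with its D. pseu. counterpart, and the row of `|` marks in such a display simply flags positions where the two segments carry the same base. A minimal sketch of how that row could be regenerated, assuming the two segments are compared position by position without gaps and without regard to case (the capitalised motif region is compared like any other base). The function names are illustrative.

```python
def match_row(top: str, bottom: str) -> str:
    """Return '|' where the two segments agree (ignoring case), ' ' elsewhere."""
    return "".join("|" if a.upper() == b.upper() else " "
                   for a, b in zip(top, bottom))


def identity(top: str, bottom: str) -> float:
    """Fraction of compared positions that are identical."""
    row = match_row(top, bottom)
    return row.count("|") / len(row) if row else 0.0


# First B-A result pair from the slides.
mel_seg = "ctgttgaatcacttggaaagcaatcaGTCGACAATTGTTtacttttacag"
pse_seg = "cctttgaatcactcggaaagcaatcaGTCGACAATTGTTtacttttacag"

print(mel_seg)
print(match_row(mel_seg, pse_seg))
print(pse_seg)
print(f"identity: {identity(mel_seg, pse_seg):.2f}")
```

Printing the three lines in a fixed-width font reproduces the stacked layout of the results slides.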
Alternate Approach: Support Vector Machines (SVM)
  - Used for data classification
  - Creates a hyperplane that separates data into two classes with maximum margin
  - Appropriate for multidimensional classification problems
  - Examples:
      - Article classification
      - Protein classification
  - Critical points:
      - Feature selection
      - Training

HMM and SVM
  - HMMer is used to generate features
  - The whole genome is searched for the A and B consensus sequences
  - Search results for each intron are combined to create features

Features
  - Scores of the two motifs in the upstream region (2)
  - Distance of the motifs to the splice site (1)
  - Length of consensus sequence overlap (1)
  - Length of motif (1)
  - Whether consensus sequence B precedes A (1)
  - Number of features = 6

Summary
  - Profile HMM used for modeling
  - Comparative analysis with the D. pseu. genome
  - High-scoring alignments for both motifs were further analyzed for biomolecular activity
  - The presence of fru and other close matches helps validate our results
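The six features listed on the Features slide can be assembled into one vector per intron once the A and B hits are known. The sketch below is illustrative only: the slides do not define the features precisely, so the readings used here are assumptions (distance is measured from the hit nearest the 3' end to the end of the segment, and "length of motif" is taken as the combined span of both hits). The `MotifHit` type and the example coordinates and scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotifHit:
    start: int    # 1-based, inclusive position within the intron segment
    end: int
    score: float  # score reported by the motif search


def feature_vector(a: MotifHit, b: MotifHit, segment_len: int) -> List[float]:
    """Build the 6-feature vector for one intron from its A and B hits."""
    # Number of positions shared by the two hits (0 if they do not overlap).
    overlap = max(0, min(a.end, b.end) - max(a.start, b.start) + 1)
    # Assumed reading of "length of motif": span covered by both hits together.
    span = max(a.end, b.end) - min(a.start, b.start) + 1
    # Assumed reading of "distance to the splice site": bases between the
    # hit closest to the 3' end and the end of the segment.
    distance = segment_len - max(a.end, b.end)
    b_first = 1.0 if b.start < a.start else 0.0
    return [a.score, b.score, float(distance), float(overlap), float(span), b_first]


# Hypothetical hits in a 50-base intron segment.
hit_a = MotifHit(start=27, end=39, score=-6.0)
hit_b = MotifHit(start=20, end=30, score=-11.0)

features = feature_vector(hit_a, hit_b, segment_len=50)
print(features)
```

Vectors of this shape, one per intron and each with a class label, are the kind of input a maximum-margin classifier would be trained on.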