Weixi Zhong
Mentor: Dr. Andrew Cameron
Center for Computational Regulatory Genomics
California Institute of Technology

Set up an accessible database for the E. tribuloides transcriptome
  Compare the quality of Eucidaris tribuloides RNA sequence assemblies
  Choose the best assembly
  Create a sequence database
  Create a web interface to access the database

Facilitate future E. tribuloides gene studies
Share findings on the E. tribuloides transcriptome
Extensions after further research (e.g. more search options, feedback)

Strongylocentrotus purpuratus
  Only echinoderm with a fully sequenced genome
  Evolutionarily closer to humans than many other model organisms used in developmental biology
Eucidaris tribuloides
  Distant relative of S. purpuratus (~275 million years)
  Useful in comparative studies
[Image 1] [Image 2]
1. Image courtesy of http://www.peteducation.com/
2. Image courtesy of SpBase (http://www.spbase.org/)

Gene regulatory differences?
[Image 3; microscope image panels: E. tribuloides, S. purpuratus]
* Red arrows point to mesenchyme cells, which develop later in E. tribuloides than in other sea urchins; circles indicate the location of the blastopore.
3. Image courtesy of http://www.palaeos.com/
Microscope images of sea urchin gastrula courtesy of Dr. Andrew Cameron

No available E. tribuloides genome
Assemble transcriptome:
  Early Et gastrula -> RNA -> cDNA -> Solexa reads -> Velvet assembly -> Quality comparison! -> Database -> Expression studies

Solexa reads: high-throughput short read sequencing technology
[Figure: cDNA -> sequenced reads, e.g. AGGTCTTAC]
Velvet: de novo genome assembly software developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI) in the UK
[Figure: Solexa reads -> contigs]
  Contig: sequence of a set of contiguous overlapping reads
  Contigs from a single Velvet run are assumed to be unique and non-overlapping
Information from http://www.ebi.ac.uk/~zerbino/velvet/

Assess quality of assembly using length distribution: n50 and 90% complexity calculations
  N50: length of the shortest contig such that the summed length of equal or longer contigs constitutes at least 50% of the total length of all contigs*
  90% complexity: similar, assuming unique contigs
[Figure: Weighted length frequency of contigs. x-axis: Length (nucleotides), 0 to 1000, with n50 marked; y-axis: Weighted frequency (total # of nucleotides in contigs of given length), 0 to 2,500,000]
*n50 definition based on definition by Jeremy Leipzig (http://jermdemo.blogspot.com/2008/11/calculating-n50-from-velvet-output.html)

Use S. purpuratus proteome as reference
Map contigs to proteome
Using the proteome “removes” silent mutation differences between genes
  S. purpuratus:  CTC-ATG-TAC-TTC-GAG-GGA-TGC-TTG-AAG
  GLEAN3_00299:   LEU-MET-TYR-PHE-GLU-GLY-CYS-LEU-LYS
  E. tribuloides: TTG-ATG-TAT-TTT-GAA-GGA-TGC-CTG-AAA
Record metadata: count of matches, annotated matches, unique matches

Create database using PostgreSQL
  User information table
  Contig information table
  Gene information table
  Contig-gene match information table
  Sequences
Write webpage to access database
  Ability to search using both species
  Display in text and graphical formats

[Screenshots: Search; Sp genome search results (change display order); Et contigs search results (change display order); Gene match information, graphical display and tabular display; Contig information popup; Search history; Match details popup]

Eucidaris tribuloides RNA Sequence Database
  Conduct research using database information
  Share data with researchers through website
  Add functionality to website as research findings evolve

Special thanks to:
  The SoCalBSI faculty and staff: Dr. Jamil Momand, Dr. Sandy Sharp, Dr. Nancy Warter-Perez, Dr. Wendie Johnston, Dr. Beverly Krilowicz, and Ronnie Cheng
  My mentor: Dr. Andrew Cameron
  The CCRG staff: Autumn Yuan, Dong He, Dave Felt
  All the SoCalBSI interns
Funded by:

Screenshot annotations
Search page
  Choose search criterion
  Search using either species
  Enter search terms
  Examples
  Narrow down search results
Sp genome search results
  Sp gene identifier
  Official name
  Link to result page for this sequence in the Sp genome
  Link to match page with all Et contigs that match to this gene
  Link to SpBase page for this gene
Et contigs search results
  Contig name and length, with link to popup with contig information for this contig
  Top Blastx matches, with score and e-value
  Corresponding SPU genes if existent, with link to display page
Contig information popup
  Contig name
  Contig length
  Contig coverage
  Contig sequence
Gene match information
  Gene name
  Basic gene information and links to more comprehensive webpages
  Change display format
  Alignment: link to popup with detailed alignment
  Tabulated alignment summary: link to popup with detailed alignment
Match details popup
  Alignment details
  Contig information
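
The N50 definition given earlier (length of the shortest contig such that equal or longer contigs sum to at least 50% of the total contig length) can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the project: the function name `n50`, the `fraction` parameter, and the example lengths are all made up for the sketch.

```python
def n50(contig_lengths, fraction=0.5):
    """Length of the shortest contig in the smallest set of longest contigs
    whose summed length reaches `fraction` of the total assembly length.

    `fraction` is a hypothetical convenience parameter; 0.5 gives N50.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Made-up contig lengths (nucleotides), for illustration only.
lengths = [100, 200, 300, 400, 500]
print(n50(lengths))  # 400: 500 + 400 = 900 >= 750 (half of 1500)
```

Sorting once and accumulating from the longest contig keeps the calculation at O(n log n) and mirrors the wording of the definition directly.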
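
The point that mapping to the proteome “removes” silent mutation differences can be checked on the example codons from the slide. The sketch below translates both nucleotide sequences with a small lookup table and compares the results. The table covers only the codons that appear in the example (a real pipeline would use the full standard genetic code, e.g. via a Blastx search), and the names `CODON_TO_AA` and `translate` are illustrative.

```python
# Subset of the standard genetic code: only the codons used in the example.
CODON_TO_AA = {
    "CTC": "LEU", "TTG": "LEU", "CTG": "LEU",
    "ATG": "MET",
    "TAC": "TYR", "TAT": "TYR",
    "TTC": "PHE", "TTT": "PHE",
    "GAG": "GLU", "GAA": "GLU",
    "GGA": "GLY",
    "TGC": "CYS",
    "AAG": "LYS", "AAA": "LYS",
}


def translate(codons):
    """Translate a dash-separated codon string into a list of amino acids."""
    return [CODON_TO_AA[codon] for codon in codons.split("-")]


sp = "CTC-ATG-TAC-TTC-GAG-GGA-TGC-TTG-AAG"  # S. purpuratus
et = "TTG-ATG-TAT-TTT-GAA-GGA-TGC-CTG-AAA"  # E. tribuloides

# Count codon positions where the nucleotide sequences differ.
differing = sum(a != b for a, b in zip(sp.split("-"), et.split("-")))

print(differing)                        # 6 of the 9 codons differ
print(translate(sp) == translate(et))   # True: identical peptide
print("-".join(translate(sp)))          # LEU-MET-TYR-PHE-GLU-GLY-CYS-LEU-LYS
```

Six of the nine codons differ at the nucleotide level, yet both sequences encode the same peptide, which is why a protein-level reference makes cross-species matches easier to detect.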