A Bioinformatics Tool for Analyzing G-quadruplexes in the mRNA Untranslated Regions
ザカレ ザッパァ
Zachary Zappala
But why?
• To map the theoretical existence of quadruplex-forming G-rich sequences (QGRS) in cytoplasmic mRNAs
Utility Belt
• Access to NCBI Entrez Gene
• PHP/MySQL/C++/JavaScript
• Laptop (Dell Latitude D505, 1.6 GHz, 512 MB DDR)
• Internet server
• Perseverance
• Starbucks Frappuccinos™
Bioinformatics?
• A hybrid of information sciences and biology
• Similar to, but not the same as, computational biology
• Enlists the help of databases and tools to analyze large masses of data and find patterns that are not easily discernible by the human eye
• Tools like NCBI BLAST are especially well known
…Biology? In computer science?
Well, it’s not that complicated
• The biology involved in this project is transcription/translation
• Genetics!
• Quick overview:
  • DNA (double helix; 2 nucleotide strands)
  • RNA (single nucleotide strand)
  • DNA is transcribed into RNA, which travels to ribosomes, which translate the RNA data into chains of amino acids, the building blocks of proteins
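The quick overview above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the codon table is deliberately partial (the full genetic code has 64 codons).

```python
# Illustrative sketch only: names are hypothetical, codon table is partial.
CODON_TABLE = {
    "AUG": "M",  # methionine, the usual start codon
    "GGC": "G",  # glycine
    "UUU": "F",  # phenylalanine
    "GCU": "A",  # alanine
    "UAA": "*", "UAG": "*", "UGA": "*",  # the three stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA that matches a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # "?" marks a codon that is missing from this partial table
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGGCTTTTAA")
print(mrna, translate(mrna))
```

Here the DNA string stands in for the coding strand, so transcription reduces to swapping T for U.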
Eukaryotic mRNA!
There are 3 sections in cytoplasmic eukaryotic mRNAs:
• The 5’ UTR
• The coding sequence (CDS)
• The 3’ UTR
[Diagram: DNA in the nucleus → transcription → RNA in the nucleus (pre-mRNA) → RNA processing (SPLICING!) → RNA in the cytoplasm (mRNA): 5’ UTR | CDS | 3’ UTR. Gene expression regulation factors?]
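As a rough sketch of how the three sections relate to sequence coordinates, the Python below splits an mRNA string at its CDS boundaries and converts an index into a CDS-relative position. The function names and the numbering convention (first CDS base is +1, the base just upstream is -1, no position 0) are assumptions made for this example, not something the slides specify.

```python
from typing import NamedTuple


class Regions(NamedTuple):
    utr5: str
    cds: str
    utr3: str


def split_mrna(mrna: str, cds_start: int, cds_end: int) -> Regions:
    """Split an mRNA using 0-based, end-exclusive CDS coordinates."""
    if not 0 <= cds_start <= cds_end <= len(mrna):
        raise ValueError("CDS coordinates fall outside the sequence")
    return Regions(mrna[:cds_start], mrna[cds_start:cds_end], mrna[cds_end:])


def cds_relative(position: int, cds_start: int) -> int:
    """Convert a 0-based mRNA index to a CDS-relative position.

    Assumed convention: first CDS base is +1, 5' UTR bases are negative.
    """
    offset = position - cds_start
    return offset + 1 if offset >= 0 else offset


regions = split_mrna("GGCCAUGAAAUAACC", cds_start=4, cds_end=13)
print(regions)
print(cds_relative(0, cds_start=4))  # a base inside the 5' UTR
```

With this convention, any negative position lies in the 5’ UTR.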
G-quadrawhats?
• That’s right: G-quadruplexes
• A type of secondary structure that forms in single-stranded nucleotide sequences (such as mRNA)
• …GG~GG~GG~GG…
• Plates (tetrads) form between 4 guanine molecules that line up
Why is this important?
• Since all this work is theoretical, it’s important to know that there could be an application
• QGRS in pre-mRNA have already been shown to play an important role in pre-mRNA splicing (Kikin, D’Antonio, Bagga 2006)
• So, what about cytoplasmic mRNA?
  • Gene expression control (since not all mRNAs are translated into proteins)
  • Internal Ribosome Entry Sites (IRES)
    • Allow ribosomes to start translation somewhere other than the beginning of the 5’ UTR
But how to predict?
• Prior research has given several clues to what constitutes a strong QGRS
  • For instance, it is known that only one loop can have a length of zero
  • Also, the more tetrads that can form, the more likely it is that the QGRS will exist
• QGRS motif: G_x N_y1 G_x N_y2 G_x N_y3 G_x (x guanines per G-group; loops of y1, y2 and y3 nucleotides)
• A G-score is assigned to each candidate using a straightforward function
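A minimal sketch of how the motif and the zero-length-loop rule could be turned into a search follows. It is not the Mapper's actual C++ code. The minimum G-group size of 2 and the maximum QGRS length of 30 are assumed defaults, `find_qgrs` is a name invented here, and no G-score is computed because the slides do not define the scoring function.

```python
from itertools import product


def find_qgrs(seq, min_g=2, max_len=30):
    """Return (start, x, (y1, y2, y3), text) for every motif instance.

    Brute-force enumeration; overlapping candidates are all reported.
    min_g and max_len are assumed defaults, not values from the slides.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for x in range(min_g, max_len // 4 + 1):
            g_run = "G" * x
            if not seq.startswith(g_run, start):
                break  # a longer G-group cannot start here either
            max_loop_total = max_len - 4 * x
            for y1, y2, y3 in product(range(max_loop_total + 1), repeat=3):
                if y1 + y2 + y3 > max_loop_total:
                    continue  # whole QGRS would exceed max_len
                if (y1, y2, y3).count(0) > 1:
                    continue  # only one loop may have length zero
                p2 = start + x + y1   # start of second G-group
                p3 = p2 + x + y2      # start of third G-group
                p4 = p3 + x + y3      # start of fourth G-group
                if (seq.startswith(g_run, p2)
                        and seq.startswith(g_run, p3)
                        and seq.startswith(g_run, p4)):
                    hits.append((start, x, (y1, y2, y3), seq[start:p4 + x]))
    return hits


for hit in find_qgrs("AAGGAGGAGGAGGCC"):
    print(hit)
```

The enumeration is slow on long sequences but keeps the two rules from the slide visible as separate checks. A real tool would also rank the overlapping candidates by score.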
Divining the QGRS in an mRNA sequence
• When the user requests a gene, its data is parsed from NCBI and passed as parameters to a C++ program
• The C++ program executes and saves the data from the sequence, which is then picked up again by the PHP program and displayed to the user
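The first step of that flow, getting a sequence from NCBI, might look roughly like the sketch below. The original tool does this in PHP, and the slides do not say which NCBI interface it uses, so the E-utilities `efetch` endpoint with `db=nuccore` is an assumption, and both function names are hypothetical. The block only builds the request URL and parses FASTA text; it makes no network call.

```python
from typing import Tuple
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (assumed here; the slides do not name it)
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build an efetch URL that requests one nucleotide record as FASTA."""
    query = urlencode({
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def parse_fasta(text: str) -> Tuple[str, str]:
    """Return (header, sequence) for the first record in FASTA text."""
    header, chunks, seen_header = "", [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record begins; stop at the first one
            header, seen_header = line[1:], True
        else:
            chunks.append(line.upper())
    return header, "".join(chunks)


print(parse_fasta(">example record\nGGAGG\naggagg\n"))
```

The response body could be downloaded with `urllib.request.urlopen(url).read().decode()` and passed to `parse_fasta`. The resulting sequence is what would be handed to the analysis program.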
Program Flow!
[Flow diagram: “Behind the scenes” | PHP Session Variable | “Interface”]
Screenshots!
…more screenshots!
…and more screenshots!
But that’s not all!
• In order not to overload you with pretty pictures, let’s just say that you can also view direct data tables and a sequence view that blocks out the QGRS locations
• The program executes in a short time, and due to the nature of mRNA there are not many abnormal situations
  • Poor internet connections do tend to slow the display… but that’s your ISP
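The sequence view mentioned above is not shown in this transcript. One simple way to block out QGRS locations in plain text is to upper-case the bases inside each predicted region and lower-case the rest. The sketch below assumes 0-based, end-exclusive spans, and `mark_regions` is a hypothetical name. The real interface may present this differently.

```python
def mark_regions(seq, spans, width=60):
    """Upper-case bases inside any span, lower-case the rest, wrap lines.

    spans: iterable of (start, end) pairs, 0-based and end-exclusive.
    Returns the marked sequence as a list of lines of at most `width` bases.
    """
    inside = [False] * len(seq)
    for start, end in spans:
        # clamp each span to the sequence so bad coordinates cannot crash
        for i in range(max(start, 0), min(end, len(seq))):
            inside[i] = True
    marked = "".join(
        base.upper() if flag else base.lower()
        for base, flag in zip(seq, inside)
    )
    return [marked[i:i + width] for i in range(0, len(marked), width)]


for line in mark_regions("AAGGAGGAGGAGGCC", [(2, 13)], width=10):
    print(line)
```

Overlapping spans simply merge, since each base is only flagged as inside or outside.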
Successes & Failures
• Research on the NRAS oncogene has shown, using crystallography, that a QGRS exists at bp position -222 relative to the CDS (within the 5’ UTR)
  • When QGRS Mapper 2 analyzed the same gene, it predicted a QGRS at the same position
• Incomplete NCBI entries have prevented full verification of the reported data
• Unfortunately, not enough data is available for research to be done on IRES sites
  • All IRES sites must be determined empirically, as no strict pattern has been shown to exist yet
Think of tomorrow…
• Currently analyzing various oncogenes, especially NRAS, to find out whether the Mapper successfully maps the conserved QGRS
• GRS UTRdb is currently being built as well, making it possible for large calculations to be applied to mapped data
• The design of this database is…
Entity Relationship Diagrams!
• ORACLE Certified!
• Shows the relationships between different tables in the database
• Is currently being populated, and is not yet public
Want to try mapping yourself?
• Go to http://bioinformatics.ramapu.edu/QGRS2/index.php
• While the Mapper program is publicly available, the database is still not ready for public access
Related References