SERIALIZED DATA STORAGE
Within a Database
James Devens (devensj)
THE IDEA

Serialized data can be used to store the current state of objects
in a database.

A good alternative to deprecated object-based databases.

Stores separate data values in a single byte array.
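
As a rough illustration of packing separate values into one byte array, the sketch below uses only the Java standard library (DataOutputStream) rather than Protocol Buffers, which is what the project itself used. The record class, its field names (name, rank, count), and the sample values are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CensusRecordCodec {
    // Hypothetical record; these fields are assumptions, not the project's schema.
    public static final class CensusRecord {
        public final String name;
        public final int rank;
        public final int count;

        public CensusRecord(String name, int rank, int count) {
            this.name = name;
            this.rank = rank;
            this.count = count;
        }
    }

    // Pack all fields of one record into a single byte array.
    public static byte[] toBytes(CensusRecord r) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(r.name);   // 2-byte length prefix + encoded characters
            out.writeInt(r.rank);   // 4 bytes
            out.writeInt(r.count);  // 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the record by reading the fields back in the same order.
    public static CensusRecord fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String name = in.readUTF();
            int rank = in.readInt();
            int count = in.readInt();
            return new CensusRecord(name, rank, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] blob = toBytes(new CensusRecord("DOE", 1, 1000)); // illustrative values
        CensusRecord back = fromBytes(blob);
        System.out.println(blob.length + " bytes, name=" + back.name);
    }
}
```

The resulting byte array is what would be stored in a single binary column, in place of one column per field.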
TOOLS USED

MySQL Workbench

DigitalOcean Server Hosting

PuTTY

WinSCP

Microsoft Excel & PowerPoint

Vim (Java Source)

Protocol Buffers (Google)

JDBC (Java Database Connectivity)

United States 2000 Census
PREDICTIONS

Data will usually take less storage as byte arrays.

Basic queries on serialized data will take less time (non-indexed database).

Serialized data will be harder to access in a relational database.

It can defeat the purpose of relational databases.
DATABASE STRUCTURE

Census Table

Census_pb Table
INSERTING DATA


Data was inserted into both tables using JDBC prepared statements

Prevents SQL injection

Allows similar queries to execute FASTER
Data was serialized using Protocol Buffers

Developed by Google

More secure and portable than Java serialization
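
The two insert paths might look roughly like the sketch below. It is not the project's code: the table names, column names, and the idea that the serialized table keeps a plain name column next to the byte array are all assumptions made for illustration. Each method prepares one statement and reuses it for every row, which is where prepared statements can save work on repeated, similar queries, and values are bound as parameters rather than concatenated into the SQL text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class CensusInsertSketch {
    // Hypothetical table and column names (assumptions, not the project's schema).
    public static final String INSERT_PLAIN =
            "INSERT INTO census (name, rank_no, total) VALUES (?, ?, ?)";
    public static final String INSERT_SERIALIZED =
            "INSERT INTO census_pb (name, payload) VALUES (?, ?)";

    // Non-serialized path: one column per value.
    public static void insertPlain(Connection conn, List<String> names,
                                   List<Integer> ranks, List<Integer> totals)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_PLAIN)) {
            for (int i = 0; i < names.size(); i++) {
                ps.setString(1, names.get(i));
                ps.setInt(2, ranks.get(i));
                ps.setInt(3, totals.get(i));
                ps.executeUpdate();
            }
        }
    }

    // Serialized path: the whole object state travels as one byte array.
    public static void insertSerialized(Connection conn, List<String> names,
                                        List<byte[]> payloads) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SERIALIZED)) {
            for (int i = 0; i < names.size(); i++) {
                ps.setString(1, names.get(i));
                ps.setBytes(2, payloads.get(i));
                ps.executeUpdate();
            }
        }
    }

    // Small helper used only to sanity-check the SQL templates.
    public static int placeholderCount(String sql) {
        int n = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') n++;
        }
        return n;
    }
}
```

Running the insert methods requires a live connection and a JDBC driver on the classpath, neither of which is shown here.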
INSERTING DATA (NON-SERIALIZED)
INSERTING DATA (SERIALIZED)
QUERYING DATA

Use an array of names

Each of these names will be queried

This process repeats the specified number of times (default 1000)

Number of Queries = NumLoops * Names.length * 2
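
The query-count formula above can be sketched as a loop. This assumes the factor of 2 comes from running each name once against each of the two tables; the slides do not state this explicitly. The two Consumer arguments are hypothetical stand-ins for the real JDBC queries.

```java
import java.util.function.Consumer;

public class QueryLoopSketch {
    // Runs every name against both query paths numLoops times and
    // returns how many queries were issued in total.
    public static long run(String[] names, int numLoops,
                           Consumer<String> plainQuery,
                           Consumer<String> serializedQuery) {
        long executed = 0;
        for (int loop = 0; loop < numLoops; loop++) {
            for (String name : names) {
                plainQuery.accept(name);       // query against the non-serialized table
                executed++;
                serializedQuery.accept(name);  // query against the serialized table
                executed++;
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        String[] names = {"A", "B", "C"}; // placeholder names
        long total = run(names, 1000, n -> {}, n -> {});
        System.out.println(total); // NumLoops * Names.length * 2 = 1000 * 3 * 2
    }
}
```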
DATA COLLECTION

Modified the simple query class to record data

Exported to .csv for Microsoft Excel

Each data sample consisted of 5 names being queried 10000 times

5000 data samples were taken

Number of Queries = (5 * 10000) * 5000 * 2 = 500,000,000 queries
RESULTS (INSERTS)


Results:
Non-Serialized

INSERT Dump Success!

Took: 204651 ms to complete.

Serialized

INSERT Dump Success!

Took: 190233 ms to complete.
RESULTS (DATA COLLECTION)

Results:

Took 27623887 ms to complete (7.67 hours).

5000 loops, and 500000000 queries executed.
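
To put the reported totals in perspective, the short sketch below converts the elapsed time to hours and derives a mean time per query. The mean is only a rough figure: it divides the whole run time by the query count, so it also includes loop and data-recording overhead, and it does not separate the two tables.

```java
public class TimingSummary {
    // Convert milliseconds to hours.
    public static double msToHours(long ms) {
        return ms / 3_600_000.0;
    }

    // Mean wall-clock time per query, in milliseconds.
    public static double msPerQuery(long totalMs, long queries) {
        return (double) totalMs / queries;
    }

    public static void main(String[] args) {
        long totalMs = 27_623_887L;   // reported run time
        long queries = 500_000_000L;  // reported query count
        System.out.println("hours: " + msToHours(totalMs));
        System.out.println("ms per query: " + msPerQuery(totalMs, queries));
    }
}
```

With the reported figures this comes to about 7.67 hours overall and roughly 0.055 ms per query.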
[Chart: data collection results, plotted every 50,000 queries]
RESULTS (STORAGE)

Non-Serialized Data Space

Serialized Data Space
4,194,303-byte (4.19 MB) difference
CONCLUSION

Data storage was reduced by 4,194,303 bytes (4.19 MB), so serialized data is more
space-efficient to store

The query speeds were roughly the same

Serialization is a good way to store object states

Serialization is NOT a good way to store frequently changing objects


If an object class is modified incompatibly, existing serialized data may no longer be readable
It is NOT relational-friendly (for the most part)

You cannot access the original data values inside the byte array without
the help of another program
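
The risk of changing an object's layout can be illustrated with a deliberately simple positional format. This sketch uses Java's DataOutputStream, not Protocol Buffers, and the field names are hypothetical. Protocol Buffers identifies fields by number and tolerates some changes, such as adding new fields, so the failure shown here is the worst case rather than what the project's data would necessarily do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class LayoutChangeDemo {
    // Old layout: name, count.
    public static byte[] writeOld(String name, int count) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(name);
            out.writeInt(count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // New layout expects: name, rank, count. Reading old bytes with it
    // misreads the old count as the rank and then runs out of data.
    public static boolean newReaderCanRead(byte[] storedBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(storedBytes))) {
            in.readUTF();  // name
            in.readInt();  // rank (actually consumes the old count bytes)
            in.readInt();  // count (nothing left, so an EOFException is thrown)
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] oldRow = writeOld("DOE", 1000); // illustrative values
        System.out.println("readable with new layout: " + newReaderCanRead(oldRow));
    }
}
```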
FUTURE WORK

Write a program to return the byte array back to the original
object (easy)

Use a different .proto file with many more data values (e.g. 2000
doubles)

Find more test statistics and collect more data

Index the data to see how it affects query speeds of both
methods