CSE 662 – Languages and Databases
Project Overview Part 1
Oliver Kennedy & Lukasz Ziarek
Announcements
• Scientista Brunch
• Today: 12 – 1 113A Davis Hall
• The Scientista Foundation is a national
organization that empowers pre-professional
women in STEM.
Types of Projects
• Benchmarking and Evaluation
• Replication of Research Results
• Building of Specialized Databases
• Research Oriented
Checkpoint Expectations
• 10+ page report
• Design / algorithms
• Limitations
• Work plan
• Related work
• Brief on the current state of the implementation
• Code / scripts written thus far
• Preliminary numbers
Final Project Expectations
• 15+ page report
• Implementation and Documentation
• Unit tests and benchmarks
• Build scripts
• Automated evaluation scripts
• Automated plotting scripts
Embedded Database Benchmark
• Lightweight, embedded databases are increasingly
being used to store structured application
state. Commonly used embedded databases include
SQLite, BerkeleyDB, Derby, HSQLDB, and H2. Each
database engine targets different workload/usage
patterns. In this project, you will design a microbenchmark
workload, compare several embedded database engines on it,
and analyze the results in a detailed report.
Embedded Database Benchmark
• Languages Required: Java, some C/C++, and a language like
Python or R
• First Steps: Install all 5 embedded databases and evaluate
them using YCSB's 6 default workloads (workloads A-F)
• Expected Outcomes: (1) A report detailing the strengths and
weaknesses of each system evaluated. (2) The software used
to perform the benchmarks.
• Preliminary Questions: What types of workloads do you
expect embedded databases to be used on? What kinds of
access patterns do these workloads create? How would you
evaluate each database's performance on these access
patterns?
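As a starting point for thinking about the microbenchmark, here is a minimal sketch of a mixed point-read / point-update workload run against SQLite through Python's standard sqlite3 module. The table name kv, the function name run_microbenchmark, and all parameters are assumptions made for illustration; a real comparison would drive each of the engines through its own API and would use a much larger, more carefully designed workload.

```python
import random
import sqlite3
import time


def run_microbenchmark(num_rows=1000, num_ops=2000, read_fraction=0.5, seed=0):
    """Time a mixed point-read / point-update workload on an in-memory SQLite database.

    All names and defaults here are illustrative assumptions, not part of the project spec.
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany(
        "INSERT INTO kv VALUES (?, ?)",
        ((i, "value-" + str(i)) for i in range(num_rows)),
    )
    conn.commit()

    reads = 0
    updates = 0
    start = time.perf_counter()
    for _ in range(num_ops):
        key = rng.randrange(num_rows)
        if rng.random() < read_fraction:
            # Point lookup by primary key.
            conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
            reads += 1
        else:
            # Point update by primary key.
            conn.execute("UPDATE kv SET v = ? WHERE k = ?", ("updated", key))
            updates += 1
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()

    return {
        "reads": reads,
        "updates": updates,
        "seconds": elapsed,
        "ops_per_sec": num_ops / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    print(run_microbenchmark())
```

Varying read_fraction and the key distribution (uniform here) is one simple way to probe how an engine responds to different access patterns.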
PocketData Benchmark
• Lightweight, embedded databases are increasingly
being used to store structured application state,
especially on mobile devices. We have permission
to analyze traces of (anonymized) query logs from
PhoneLab phones. In this project, you will use
these traces as the basis for a macro-benchmark
workload- and data-generator that simulates query
patterns on a mobile phone.
PocketData Benchmark
• Languages Required: Probably a scripting language like
Ruby/Python
• First Steps: Pick 3-4 high-volume applications in the 11-phone
PhoneLab trace, label each of the queries issued by each
application by the query's intent, and define a query "pattern" for
each intent.
• Expected Outcomes: (1) A workload- and data-generator
simulating the behavior of a mobile app's query workload. (2) A
report outlining the design of the data and workload generators,
and a lightweight evaluation of one or more database engines on
this workload.
• Preliminary Questions: What are features of the query workload
that need to be emulated? What constitutes a "realistic"
simulation (i.e., how do you judge whether your generators are
successful or not)?
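One possible way to begin defining query "patterns" is to strip the literals out of each logged query so that queries differing only in their constants collapse into the same template. The sketch below assumes the trace is available as plain SQL strings; the function names query_pattern and pattern_counts are hypothetical, and the regular expressions handle only simple string and numeric literals, not the full SQL grammar.

```python
import re
from collections import Counter

# Single-quoted SQL string literal, allowing '' as an escaped quote.
_STRING = re.compile(r"'(?:[^']|'')*'")
# Stand-alone integer or decimal literal (digits inside identifiers are not matched).
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def query_pattern(sql):
    """Replace literals with '?' and normalize whitespace to get a query template."""
    s = _STRING.sub("?", sql)
    s = _NUMBER.sub("?", s)
    return _SPACE.sub(" ", s).strip()


def pattern_counts(queries):
    """Count how often each template occurs in a list of SQL strings."""
    return Counter(query_pattern(q) for q in queries)


if __name__ == "__main__":
    log = [
        "SELECT * FROM msgs WHERE id = 42",
        "SELECT * FROM msgs  WHERE id = 7",
        "INSERT INTO msgs VALUES (3, 'hi there')",
    ]
    for pattern, count in pattern_counts(log).most_common():
        print(count, pattern)
```

Templates obtained this way still need a human-assigned intent label; the counts only suggest which templates are worth labeling first.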
Replicate the Dittrich Paper
• "The Uncracked Pieces in Database Cracking" by
Schuhknecht et al. (
http://dl.acm.org/citation.cfm?id=2732229 ) describes a
thorough evaluation of several different index types,
including, among others, Cracker Indexes. In this project,
you will implement a variety of simple in-memory index
structures such as Cracker Indexes, Hybrid Cracker
Indexes, BTree Indexes, and possibly others. Using these
index structures, you will attempt to replicate
Schuhknecht et al.'s results and generalize them
further.
Replicate the Dittrich Paper
• Languages Required: A compiled, non-GC language like C, C++
or Rust
• First Steps: Read "The Uncracked Pieces in Database
Cracking". Implement a trivial in-memory BTree, Hash Table,
and Cracker Index.
• Expected Outcomes: (1) A report outlining how you validated
Schuhknecht et al.'s results, and detailing any additional
findings you discovered in the process.
• Preliminary Questions: How did Schuhknecht et al. evaluate
cracker indexes? What are their main takeaways? Do these
findings create any new questions to be answered?
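To make the idea of a trivial Cracker Index concrete, here is a minimal sketch of the basic approach: each range query partitions ("cracks") the piece of the column that contains its bounds, so the column becomes incrementally more ordered as queries arrive. The class and method names are assumptions for illustration, Python is used only for readability (the project itself calls for a compiled language), and none of the refinements evaluated in the paper are included.

```python
import bisect


class CrackerIndex:
    """Illustrative cracker index over a list of comparable values."""

    def __init__(self, values):
        self.data = list(values)  # cracker column: a copy that gets reordered
        self.pivots = []          # sorted pivot values seen so far
        self.positions = []       # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition the piece containing pivot; return the first index with value >= pivot."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        # In-place two-sided partition of data[lo:hi] around pivot.
        left, right = lo, hi - 1
        while left <= right:
            if self.data[left] < pivot:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, left)
        return left

    def range_query(self, low, high):
        """Return all values v with low <= v < high (in no particular order)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


if __name__ == "__main__":
    index = CrackerIndex([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
    print(sorted(index.range_query(5, 13)))
    print(index.pivots, index.positions)
```

Each query touches only the pieces that contain its two bounds, which is why later queries over already-cracked ranges get cheaper.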
LLVM Query Runtime
• Several recent academic database systems such as
HyPer ( http://hyper-db.com ) and DBToaster (
http://www.dbtoaster.org ) use compilers to
accelerate query processing by translating query
plans into machine code before executing them. In this
project, you will do the same, creating a simple
compiler (along the lines of the CSE 562 query
processor) that generates machine code using LLVM
(a highly extensible compiler toolchain).
LLVM Query Runtime
• Languages Required: Java, Scala, or C/C++
• First Steps: Write a program that constructs a toy
program in LLVM Bytecode (e.g., print out a
hardcoded tuple)
• Expected Outcomes: (1) A JIT-compiled query
engine. (2) A report evaluating the performance of
your engine.
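For the first step, one low-overhead way to get a feel for LLVM is to emit a small program as textual LLVM IR and hand it to the LLVM tools. The sketch below builds such a program as a string; the function name tuple_printer_ir is hypothetical, Python is used only to generate the text (the project itself would use the LLVM APIs from Java, Scala, or C/C++), and the IR assumes a recent LLVM release that uses the opaque `ptr` type.

```python
def tuple_printer_ir(values):
    """Return textual LLVM IR for a program that prints one hardcoded tuple of integers.

    Assumes each value fits in a 32-bit signed integer and that the LLVM
    version in use accepts the opaque 'ptr' type.
    """
    fmt = "(" + ", ".join("%d" for _ in values) + ")"
    length = len(fmt) + 2  # format text plus newline and NUL terminator
    args = "".join(", i32 " + str(int(v)) for v in values)
    lines = [
        "@fmt = private unnamed_addr constant ["
        + str(length) + ' x i8] c"' + fmt + '\\0A\\00"',
        "",
        "declare i32 @printf(ptr, ...)",
        "",
        "define i32 @main() {",
        "entry:",
        "  %r = call i32 (ptr, ...) @printf(ptr @fmt" + args + ")",
        "  ret i32 0",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the module to a file; it could then be tried with tools such as lli or clang.
    with open("tuple.ll", "w") as out:
        out.write(tuple_printer_ir([1, 2, 3]))
```

A JIT-compiled query engine would generate code like this from a query plan rather than from a fixed tuple, but the module structure (globals, declarations, a function body) stays the same.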
Lightweight Runtimes
• Small, lightweight database engines are becoming
increasingly important as data migrates to low-power
devices like smartphones. In this project,
you will implement a simple query evaluation
engine on an Intel Galileo, a low-powered
computing platform.
Lightweight Runtimes
• Languages Required: Java (?), C++ or C
• First Steps: Get to the point where you can SSH into the
Galileo board and run code.
• Expected Outcomes: (1) A query processor that runs on
the Galileo board. (2) Documentation of what changes
need to be made.
• Preliminary Questions: How will the low-CPU and low-memory
capabilities of the Galileo board affect the
query processor's design? The board has limited internal
memory; where should the data be stored?
Policy Exploration for JITDs
• Just-in-Time Data Structures assemble simple
building-blocks into more complex data structures
using policies (typically defined as a set of rewrite
rules). A key feature is that the policy can change
in response to changing workloads. In this project,
you will design and evaluate one or more JITD
policies. This project will be available to two
groups: One group will work with the Java
implementation, and the other group will work with
the C implementation.
Policy Exploration for JITDs
• Languages Required: Java (group A) and C (group B)
• First Steps: Familiarize yourself with the implementation
by coming up with a simple policy and implementing it
(e.g., an LSM tree, a prefix trie, or a hash table)
• Expected Outcomes: (1) One or more policy
implementations in the JITD framework. (2) A report
detailing the design of your policy, and discussing your
evaluation of the policy/policies.
• Preliminary Questions: What types of workloads will
your policy (or policies) target? How does your policy
change in response to changes in the workload?
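The notion of a policy as a set of rewrite rules over simple building blocks can be illustrated with a toy model. Everything below is a hypothetical sketch: the tuple-based node types, the rule functions, and choose_policy are invented for illustration and are not the API of either JITD implementation.

```python
import bisect
import heapq

# Hypothetical building blocks, represented as tuples:
#   ("array", [values])      unsorted leaf
#   ("sorted", [values])     sorted leaf
#   ("concat", left, right)  union of two subtrees


def insert(root, value):
    """Writes stay cheap: wrap the new value in an unsorted leaf."""
    return ("concat", root, ("array", [value]))


def sort_rule(node):
    """Rewrite rule: an unsorted leaf becomes a sorted leaf."""
    if node[0] == "array":
        return ("sorted", sorted(node[1]))
    return node


def merge_rule(node):
    """Rewrite rule: a concat of two sorted leaves becomes one sorted leaf."""
    if node[0] == "concat" and node[1][0] == "sorted" and node[2][0] == "sorted":
        return ("sorted", list(heapq.merge(node[1][1], node[2][1])))
    return node


def rewrite(node, rules):
    """Apply the rules bottom-up in a single pass over the structure."""
    if node[0] == "concat":
        node = ("concat", rewrite(node[1], rules), rewrite(node[2], rules))
    for rule in rules:
        node = rule(node)
    return node


def contains(node, value):
    """Membership test that exploits sorted leaves where they exist."""
    if node[0] == "array":
        return value in node[1]
    if node[0] == "sorted":
        i = bisect.bisect_left(node[1], value)
        return i < len(node[1]) and node[1][i] == value
    return contains(node[1], value) or contains(node[2], value)


def choose_policy(reads, writes):
    """Workload-dependent policy: reorganize only when reads dominate."""
    return [sort_rule, merge_rule] if reads > writes else []


if __name__ == "__main__":
    root = insert(("array", [5, 3]), 1)
    print(rewrite(root, choose_policy(reads=10, writes=1)))
```

In this toy model, a write-heavy workload leaves the structure as a chain of cheap unsorted leaves, while a read-heavy workload triggers rules that pay a reorganization cost once to make later lookups faster.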
JITDs on Disk
• Just-in-Time Data Structures use a DAG-based data
representation that creates many random read accesses
(assuming sequential writes). When used as an in-memory
data structure, this is fine. However, when
written directly to a sequential medium like hard disks
(or even SSDs), this can significantly hurt
performance. In this project, you will modify either JITD
implementation to persist the index to disk in a way that
makes it possible to efficiently reconstruct the index as
needed.
JITDs on Disk
• Languages Required: Java, C++ or C
• First Steps: Using raw IO, or a storage layer like
BerkeleyDB or similar, add support for paging out Cogs to
one of the
• Expected Outcomes: (1) An implementation of a
persistent JITD. (2) A report evaluating the persistent
data structure against existing data structures.
• Preliminary Questions: How can a JITD's random
accesses be converted into sequential accesses? What
are efficient ways of compacting the log? What happens
when the data structure gets bigger than memory?