Adaptive Schema Databases
William Spoth (b), Bahareh Sadat Arab (i), Eric S. Chan (o), Dieter Gawlick (o),
Adel Ghoneimy (o), Boris Glavic (i), Beda Hammerschmidt (o), Oliver Kennedy (b),
Seokki Lee (i), Zhen Hua Liu (o), Xing Niu (i), Ying Yang (b)
b: University at Buffalo   i: Illinois Inst. Tech.   o: Oracle

Classic relational database
• Navigational and organizational purpose: retain discovery, good performance and space use, reusable.
• But... high upfront cost and inflexible.

Big Data / NoSQL
• Data can be used immediately.
• But... sacrifices the navigational and performance benefits, and may end up with duplicated work.

Adaptive Schema Databases
• Bridge the gap between relational databases and NoSQL.
[Figure labels: Queries and feedback ... eventually]
Input:
Queries:
SELECT name FROM Undergrad UNION
SELECT name FROM Grad
SELECT deg FROM Grad
SELECT name FROM Student
…

Outline
• Extraction and discovery
• Adaptive, personalized schemas from queries
• Explanations and feedback
• Adaptive organization
• Conclusions and future work
[Figure: system overview. Labels: Queries + Feedback; Schema Workspace (x4); Schema Matching; Extraction Schema Candidates; Extraction workflow (x3); Unstructured Data; Semi-structured Data (e.g., JSON)]

Extraction
• Given input, ASD extracts a schema candidate set.

Discovery
• ASD extracts a schema candidate set C_ext = {S_ext, P_ext},
  where S_ext is a set of candidate schemas and
  P_ext is a probability distribution over these schemas.
• S_max: the best-guess schema.
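
As a rough illustration of the notation above, the sketch below models a candidate set as candidate schemas paired with probabilities and takes S_max to be the most probable one. Reading "best guess" as the highest-probability schema is an assumption here, and the schema names, probabilities, and the function name best_guess are made up for the example.

```python
# Hypothetical candidate set: each candidate schema is a frozenset of
# relation signatures, mapped to its probability under P_ext.
# All names and numbers below are illustrative, not taken from the slides.
c_ext = {
    frozenset({"Person(name)"}): 0.2,
    frozenset({"Undergrad(name)", "Grad(name, deg)"}): 0.7,
    frozenset({"Student(name, deg)"}): 0.1,
}


def best_guess(candidates):
    """Return the candidate schema with the highest probability.

    Assumption: S_max is the argmax of P_ext.
    """
    total = sum(candidates.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("P_ext should sum to 1")
    return max(candidates, key=candidates.get)


s_max = best_guess(c_ext)
print(sorted(s_max))
```

Keeping the full distribution around, rather than only S_max, is what would let later steps report how uncertain the best guess is.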

Adaptive, personalized schemas from queries

Adaptive, personalized schemas
• ASD maintains a set of schema workspaces W = {W_1, ..., W_n}.
  Initially, W = {}.

Finding Schemas from Queries
• ASD maintains a set of schema workspaces W = {W_1, ..., W_n}.
Query 1: SELECT name FROM Undergrad UNION
         SELECT name FROM Grad
Query 2: SELECT deg FROM Grad

Synthesizing Tables
• ASD maintains a set of schema workspaces W = {W_1, ..., W_n}.
Query 3: SELECT name FROM Student
W_1 = (S_1 = {Undergrad(name)}, P_1 = 0.27),
      (S_1 = {Grad(name)}, P_1 = 0.23),
      (S_1 = {Undergrad(name), Grad(name)}, P_1 = 0.5)
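
One way to read the workspace above: each candidate says which extracted relations might stand behind the unknown table Student. The sketch below picks the most probable candidate and builds a UNION query over its relations. The list-of-tuples layout, the function name synthesize_query, and the choice to take the highest-probability candidate are assumptions for illustration; only the relation names and probabilities come from the slide.

```python
# Candidates for the unknown table Student, with the probabilities
# shown on the slide. The data layout itself is an assumption.
w1 = [
    (("Undergrad",), 0.27),
    (("Grad",), 0.23),
    (("Undergrad", "Grad"), 0.5),
]


def synthesize_query(candidates, attribute):
    """Build a UNION over the relations of the most probable candidate."""
    relations, _prob = max(candidates, key=lambda c: c[1])
    parts = [f"SELECT {attribute} FROM {rel}" for rel in relations]
    return " UNION ".join(parts)


query = synthesize_query(w1, "name")
print(query)
```

Under these assumptions the synthesized query has the same shape as Query 1, which fits the idea of reusing what earlier queries revealed.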

Explanations and feedback

What might go wrong
Extraction errors appear in three forms:
(1) A query incompatible with S_max
(2) An update with data that violates S_max
(3) An extraction error presented to the user
We provide:
(1) Explanation of results
(2) Provenance
(3) Warn the analyst about ambiguity
(4) Explain the ambiguity
(5) Evaluate the magnitude of the ambiguity
(6) Assist the analyst in resolving the ambiguity

Types of errors
ASD interacts with the outside world in three ways: schema, data, and update.
Schema interactions: when a query is incompatible with S_max and the workspace.
Data interactions: provenance for attribute- and row-level ambiguity.
Update interactions:
• Represent schema mismatches as missing values.
• Resolve data errors with a probabilistic repair.
• Upgrade the analyst's schema to match the changes.
• Checkpoint the analyst's workspace and ignore new updates.
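
The first update strategy, representing schema mismatches as missing values, can be sketched as follows. This is a minimal illustration, assuming records arrive as dictionaries and the workspace schema is a list of attribute names; the function conform and the sample record are hypothetical.

```python
def conform(record, schema_attrs):
    """Project a record onto the schema.

    Attributes the record lacks become None (a missing value).
    Attributes the schema does not know about are returned separately
    so they can be reported rather than silently dropped.
    """
    row = {attr: record.get(attr) for attr in schema_attrs}
    extra = {k: v for k, v in record.items() if k not in schema_attrs}
    return row, extra


schema = ["name", "deg"]
update = {"name": "Alice", "advisor": "Bob"}  # no 'deg', unexpected 'advisor'
row, extra = conform(update, schema)
print(row)
print(extra)
```

Returning the unmatched attributes alongside the row leaves room for the other strategies on the slide, such as upgrading the schema once enough updates carry the new attribute.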

Explanations and feedback
Condition 2: query over unknown schema elements:
SELECT name FROM Student
W_1 = (S_1 = {Undergrad(name)}, P_1 = 0.27),
      (S_1 = {Grad(name)}, P_1 = 0.23),
      (S_1 = {Undergrad(name), Grad(name)}, P_1 = 0.5)
Explanation:
We match Student with both Grad and Undergrad.
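
An explanation like the one above, together with a rough measure of how ambiguous the match is, might be produced along these lines. Using one minus the top probability as the ambiguity measure is an assumption of this sketch, as are the function and variable names; the probabilities are the ones on the slide.

```python
# Candidate matches for Student with the slide's probabilities.
candidates = {
    ("Undergrad",): 0.27,
    ("Grad",): 0.23,
    ("Grad", "Undergrad"): 0.5,
}


def explain_match(table, candidates):
    """Describe the chosen match and how uncertain it is."""
    best = max(candidates, key=candidates.get)
    # Assumed measure: probability mass left on the alternatives.
    ambiguity = 1.0 - candidates[best]
    matched = " and ".join(best)
    text = f"We match {table} with {matched} (ambiguity {ambiguity:.2f})"
    return text, ambiguity


text, ambiguity = explain_match("Student", candidates)
print(text)
```

Other measures, such as the entropy of the distribution, would serve the same purpose; the point is only that the analyst sees both the chosen match and how contested it was.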

Adaptive organization
Trade-off between storing data in its native format and storing it according to a specific schema.
What is the challenge? Many workspaces, add table to the schema, …
Challenges and possible solutions:
• We want multiple personalized schemas
  1. A relational workspace schema is essentially a view over raw data, so view materialization can be used.
  2. Use existing adaptive physical design and caching techniques.
• Shared materializations
  1. Incremental materialized view maintenance. Leverage techniques from revision control systems.
  2. View selection problem.
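
To make the "workspace schema as a view over raw data" idea concrete, here is a small sketch using Python's built-in sqlite3 module. SQLite has no built-in materialized views, so the materialized copy is emulated with CREATE TABLE ... AS SELECT and recomputed from scratch. The table contents, the name Student_mat, and the function materialize are assumptions for illustration, and this does not attempt the incremental maintenance mentioned on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Raw extracted relations (sample rows are made up).
cur.execute("CREATE TABLE Undergrad (name TEXT)")
cur.execute("CREATE TABLE Grad (name TEXT, deg TEXT)")
cur.executemany("INSERT INTO Undergrad VALUES (?)", [("Ann",), ("Bo",)])
cur.executemany("INSERT INTO Grad VALUES (?, ?)", [("Cy", "PhD")])

# The workspace table Student defined as a view over the raw data.
cur.execute(
    "CREATE VIEW Student AS "
    "SELECT name FROM Undergrad UNION SELECT name FROM Grad"
)


def materialize(cursor):
    """Recompute the materialized copy from scratch (no incremental upkeep)."""
    cursor.execute("DROP TABLE IF EXISTS Student_mat")
    cursor.execute("CREATE TABLE Student_mat AS SELECT name FROM Student")


materialize(cur)
names = [r[0] for r in cur.execute("SELECT name FROM Student_mat ORDER BY name")]
print(names)
```

Queries against Student_mat avoid recomputing the union, at the cost of storage and of refreshing the copy after updates, which is the trade-off the slide describes.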

Conclusions and future work
ASD bridges the gap between relational databases and NoSQL.
• Discovery: Help users explore and understand new data by providing an outline of the available information. (Done)
• Materialization: Adopt work on adaptive data structures. (Partially done)
• Data synthesis: Synthesize new tables and attributes from existing data. (Done)
• Conflict response:
  – Versioning or branching the schema.
  – Log analysis to help users assess the impact of schema revisions.