Adaptive Schema Databases
William Spoth (b), Bahareh Sadat Arab (i), Eric S. Chan (o), Dieter Gawlick (o),
Adel Ghoneimy (o), Boris Glavic (i), Beda Hammerschmidt (o), Oliver Kennedy (b),
Seokki Lee (i), Zhen Hua Liu (o), Xing Niu (i), Ying Yang (b)
b: University at Buffalo   i: Illinois Inst. Tech.   o: Oracle
Classic relational database
• Navigational and organizational purpose: aids discovery, gives good performance and space efficiency, and is reusable.
• But... high up-front cost and inflexibility.
Big Data / NoSQL
• Data can be used immediately.
• But... sacrifices navigational and performance benefits, and may end up duplicating work.
Adaptive Schema Databases
• Bridge the gap between relational databases and NoSQL.
[Figure: Queries and feedback... eventually]
Adaptive Schema Databases
Input:
Queries:
  SELECT name FROM Undergrad UNION SELECT name FROM Grad
  SELECT deg FROM Grad
  SELECT name FROM Student
  …
Outline
• Extraction and discovery
• Adaptive, personalized schemas from queries
• Explanations and feedback
• Adaptive organization
• Conclusions and future work
[Figure: ASD architecture. Unstructured data and semi-structured data (e.g., JSON) pass through extraction workflows to produce extraction schema candidates; schema matching maps these into multiple schema workspaces, driven by queries + feedback.]
Extraction
• Given input data, ASD extracts a schema candidate set.
12
Extraction
•
ASDextractsschemacandidateset
Giveninput:
13
Extraction
•
ASDextractsschemacandidateset
Giveninput:
14
Extraction
•
ASDextractsschemacandidateset
Giveninput:
15
Discovery
• ASD extracts a schema candidate set C_ext = {S_ext, P_ext}, where S_ext is a set of candidate schemas and P_ext is a probability distribution over these schemas.
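A minimal Python sketch of the candidate set C_ext = {S_ext, P_ext}, assuming each schema is represented as a frozenset of relation signatures such as "Grad(name)". The class name `SchemaCandidateSet` and its checks are hypothetical, not ASD's API; they only encode what the definition requires: P_ext assigns a non-negative probability to every schema in S_ext, and those probabilities sum to 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaCandidateSet:
    """C_ext = {S_ext, P_ext}; a hypothetical container, not ASD's API."""

    schemas: tuple[frozenset[str], ...]   # S_ext: candidate schemas
    probs: dict[frozenset[str], float]    # P_ext: schema -> probability

    def __post_init__(self) -> None:
        # P_ext must be defined on exactly the schemas in S_ext ...
        if set(self.probs) != set(self.schemas):
            raise ValueError("P_ext must cover exactly the schemas in S_ext")
        # ... assign non-negative probabilities ...
        if any(p < 0 for p in self.probs.values()):
            raise ValueError("probabilities must be non-negative")
        # ... and sum to 1, up to floating-point error.
        if not math.isclose(sum(self.probs.values()), 1.0, abs_tol=1e-9):
            raise ValueError("probabilities must sum to 1")


# Illustrative schemas and probabilities, not extracted from real data.
undergrad = frozenset({"Undergrad(name)"})
grad = frozenset({"Grad(name)"})
c_ext = SchemaCandidateSet(schemas=(undergrad, grad), probs={undergrad: 0.6, grad: 0.4})
```

Validating in `__post_init__` means an ill-formed candidate set is rejected at construction time rather than when it is first queried.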
• S_max: the best-guess schema
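If S_max is taken to be the candidate with the highest probability under P_ext (an assumption; the slide only calls it the best-guess schema), selecting it is a simple argmax. The function name `best_guess_schema` is hypothetical; the probabilities are the ones this deck gives for workspace W1.

```python
def best_guess_schema(probs: dict[frozenset[str], float]) -> frozenset[str]:
    """Return the most probable candidate; ties go to the first one listed."""
    if not probs:
        raise ValueError("the candidate set is empty")
    # max() keeps the first maximal key it sees, which fixes the tie-breaking rule
    return max(probs, key=probs.__getitem__)


# Candidates and probabilities as given for workspace W1 in this deck.
candidates = {
    frozenset({"Undergrad(name)"}): 0.27,
    frozenset({"Grad(name)"}): 0.23,
    frozenset({"Undergrad(name)", "Grad(name)"}): 0.5,
}
s_max = best_guess_schema(candidates)  # {Undergrad(name), Grad(name)}
```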
Adaptive, personalized schemas from queries
Adaptive, personalized schemas
• ASD maintains a set of schema workspaces W = {W1, ..., Wn}.
  Initially, W = {}.
Finding Schemas from Queries
Query 1: SELECT name FROM Undergrad UNION SELECT name FROM Grad
Query 2: SELECT deg FROM Grad
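The walkthrough above (start from an empty workspace, then let Query 1 and Query 2 reveal relations and attributes) can be sketched as follows. This is a hypothetical illustration, not ASD's algorithm: it models one workspace's schema as a relation-to-attributes dict, ignores probabilities, and only understands flat SELECT ... FROM blocks, optionally chained with UNION. The names `referenced_schema` and `grow_workspace` are invented for the sketch.

```python
import re
from collections import defaultdict

# Matches only flat "SELECT a, b FROM R" blocks (optionally chained with UNION).
# Joins, WHERE clauses, aliases and subqueries are out of scope for this sketch.
_SELECT_FROM = re.compile(r"SELECT\s+(.+?)\s+FROM\s+(\w+)", re.IGNORECASE)


def referenced_schema(query: str) -> dict[str, set[str]]:
    """Relation -> attributes that a simple query expects to exist."""
    refs: dict[str, set[str]] = defaultdict(set)
    for match in _SELECT_FROM.finditer(query):
        columns, relation = match.groups()
        refs[relation].update(c.strip() for c in columns.split(","))
    return dict(refs)


def grow_workspace(workspace: dict[str, set[str]], query: str) -> dict[str, set[str]]:
    """Return a new workspace schema extended with what `query` references."""
    grown = {rel: set(attrs) for rel, attrs in workspace.items()}
    for relation, attrs in referenced_schema(query).items():
        grown.setdefault(relation, set()).update(attrs)
    return grown


workspace: dict[str, set[str]] = {}  # mirrors W = {} before any query
workspace = grow_workspace(workspace, "SELECT name FROM Undergrad UNION SELECT name FROM Grad")
workspace = grow_workspace(workspace, "SELECT deg FROM Grad")
# workspace is now {"Undergrad": {"name"}, "Grad": {"name", "deg"}}
```

`grow_workspace` copies instead of mutating, so earlier workspace states stay available if an analyst wants to compare them.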
Synthesizing Tables
Query 3: SELECT name FROM Student
W1 = (S1, P1), where P1 assigns:
  P1({Undergrad(name)}) = 0.27
  P1({Grad(name)}) = 0.23
  P1({Undergrad(name), Grad(name)}) = 0.5
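One way to read W1 is under possible-worlds semantics: in each candidate, the synthesized Student table is the union of the names in its source tables, and a name's probability of appearing in the answer to Query 3 is the total probability of the candidates that produce it. The sketch below assumes that reading; the data in `TABLES` is invented for illustration, the function name is hypothetical, and W1 is keyed by table names rather than full signatures such as Undergrad(name).

```python
# Hypothetical source tables (illustrative data, not from the deck).
TABLES = {
    "Undergrad": {"Alice", "Bob"},
    "Grad": {"Bob", "Carol"},
}

# W1 from the slide: which tables a synthesized Student table may stand for.
W1 = {
    frozenset({"Undergrad"}): 0.27,
    frozenset({"Grad"}): 0.23,
    frozenset({"Undergrad", "Grad"}): 0.5,
}


def student_answer_probabilities(candidates, tables):
    """P(name appears in SELECT name FROM Student), summed over candidate schemas."""
    probs = {}
    for sources, p in candidates.items():
        # In this candidate, Student is the UNION of the names in its source tables.
        names = set().union(*(tables[t] for t in sources))
        for name in names:
            probs[name] = probs.get(name, 0.0) + p
    return probs


answer = student_answer_probabilities(W1, TABLES)
```

Here Alice appears with probability of about 0.77 (0.27 + 0.5), Carol about 0.73 (0.23 + 0.5), and Bob with probability 1, since every candidate produces him.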
Explanations and feedback
What might go wrong
Extraction errors appear in three forms:
(1) A query incompatible with S_max
(2) An update with data that violates S_max
(3) An extraction error presented to the user
We provide:
(1) Explanations of results
(2) Provenance
(3) Warnings to the analyst about ambiguity
(4) Explanations of the ambiguity
(5) An evaluation of the magnitude of the ambiguity
(6) Assistance for the analyst in resolving the ambiguity
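Two of the items above can be sketched in a few lines, under stated assumptions. `incompatibilities` detects error form (1) by comparing the relations and attributes a query references (given as a relation-to-attributes dict) against S_max. `ambiguity_bits` scores item (5) with Shannon entropy; entropy is an assumed measure here, since the slide does not say how ASD quantifies ambiguity. Both function names are hypothetical.

```python
import math


def incompatibilities(referenced: dict[str, set[str]], s_max: dict[str, set[str]]) -> list[str]:
    """List schema elements a query references that S_max lacks (error form 1)."""
    problems = []
    for relation, attrs in sorted(referenced.items()):
        if relation not in s_max:
            problems.append(f"unknown relation {relation}")
            continue
        for attr in sorted(attrs - s_max[relation]):
            problems.append(f"unknown attribute {relation}.{attr}")
    return problems


def ambiguity_bits(probs: dict) -> float:
    """Shannon entropy of a candidate distribution, in bits (0 means no ambiguity)."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Illustrative S_max, consistent with the earlier queries over Undergrad and Grad.
s_max = {"Undergrad": {"name"}, "Grad": {"name", "deg"}}
warnings = incompatibilities({"Student": {"name"}}, s_max)  # ["unknown relation Student"]
```

A non-empty result from `incompatibilities` is the point at which a warning (item 3) would be raised; entropy then gives one number for how uncertain the competing interpretations are.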
Types of errors
ASD interacts with the outside world through schema, data, and update interactions.
Schema interactions: arise when a query is incompatible with S_max and the workspace.
Data interactions: provenance for attribute-level and row-level ambiguity.
Update interactions: when new data conflicts with the schema, the analyst can
• represent schema mismatches as missing values,
• resolve data errors with a probabilistic repair,
• upgrade her schema to match the changes, or
• checkpoint her workspace and ignore new updates.
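The first update option, representing schema mismatches as missing values, amounts to projecting each incoming record onto the columns S_max defines. In this sketch, records are Python dicts and `conform_record` is a hypothetical helper: absent fields become None, and fields with no matching column are reported so the analyst could instead choose to upgrade her schema.

```python
def conform_record(columns: tuple[str, ...], record: dict[str, object]) -> tuple[tuple, list[str]]:
    """Project a record onto S_max's columns; missing fields become None."""
    row = tuple(record.get(col) for col in columns)
    # Fields S_max has no column for; they are left out of the row.
    extra = sorted(set(record) - set(columns))
    return row, extra


# Hypothetical Grad columns under S_max and an incoming record that mismatches them.
row, extra = conform_record(("name", "deg"), {"name": "Dana", "advisor": "Smith"})
# row == ("Dana", None), extra == ["advisor"]
```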
Explanations and feedback
Condition 2: Query over unknown schema elements:
  SELECT name FROM Student
W1 = (S1, P1), where P1 assigns:
  P1({Undergrad(name)}) = 0.27
  P1({Grad(name)}) = 0.23
  P1({Undergrad(name), Grad(name)}) = 0.5
Explanation: We match Student with both Grad and Undergrad.
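An explanation like the one above can be generated from the workspace distribution by summing, for each source table, the probability of every candidate that includes it. This is a hypothetical sketch: `match_explanation` is not an ASD function, and the W1 candidates are keyed by table name rather than full signatures.

```python
def match_explanation(target: str, candidates: dict[frozenset[str], float]) -> str:
    """Explain which sources `target` was matched with, and how likely each is."""
    marginals: dict[str, float] = {}
    for sources, p in candidates.items():
        for source in sources:
            # A source's marginal is the total mass of candidates that include it.
            marginals[source] = marginals.get(source, 0.0) + p
    parts = [f"{src} (p={marginals[src]:.2f})" for src in sorted(marginals)]
    return f"We match {target} with " + " and ".join(parts)


W1 = {
    frozenset({"Undergrad"}): 0.27,
    frozenset({"Grad"}): 0.23,
    frozenset({"Undergrad", "Grad"}): 0.5,
}
print(match_explanation("Student", W1))
# prints: We match Student with Grad (p=0.73) and Undergrad (p=0.77)
```

The marginals need not sum to 1, because the union candidate counts toward both sources.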
Adaptive organization
Trade-off between storing data in its native format and storing it according to a specific schema.
What is the challenge? Many workspaces, tables added to the schema, ….
Challenges and possible solutions:
• We want multiple personalized schemas
  1. A relational workspace schema is essentially a view over the raw data, so materialized views can be used.
  2. Use existing adaptive physical design and caching techniques.
• Shared materializations
  1. Incremental materialized view maintenance; leverage techniques from revision control systems.
  2. The view selection problem.
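Incremental maintenance of a shared materialization can be illustrated with the classic counting algorithm for a union view such as Student = Undergrad UNION Grad. The class `UnionView` is hypothetical, and counting is a standard textbook technique chosen for illustration; the slide does not say which maintenance algorithm ASD would use.

```python
from collections import Counter


class UnionView:
    """Materialized Student = Undergrad UNION Grad, maintained incrementally.

    Each name carries a count of the base tuples that derive it, so a
    deletion removes the name only when its last derivation disappears.
    """

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def insert(self, name: str) -> None:
        # A new base tuple in either source adds one derivation.
        self._counts[name] += 1

    def delete(self, name: str) -> None:
        # Counter returns 0 for unknown keys without inserting them.
        if self._counts[name] <= 0:
            raise KeyError(name)
        self._counts[name] -= 1
        if self._counts[name] == 0:
            del self._counts[name]

    def rows(self) -> set[str]:
        # UNION has set semantics: expose each derivable name once.
        return set(self._counts)


view = UnionView()
view.insert("Bob")    # Bob inserted into Undergrad
view.insert("Bob")    # Bob inserted into Grad
view.insert("Carol")  # Carol inserted into Grad
view.delete("Bob")    # Bob deleted from Grad; still derived from Undergrad
```

Each update touches only the affected name, so the view never has to be recomputed from the base tables.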
Conclusions and future work
ASD bridges the gap between relational databases and NoSQL.
• Discovery: Help users explore and understand new data by providing an outline of the available information. (Done)
• Materialization: Adopt work on adaptive data structures. (Partially done)
• Data Synthesis: Synthesize new tables and attributes from existing data. (Done)
• Conflict Response:
  – Versioning or branching the schema.
  – Log analysis to help users assess the impact of schema revisions.