BYU Data Extraction Research Group

Enabling the Distributed Family Tree
Thesis Proposal
November 10, 2006

There’s a lot of Genealogy on the Web

Databases (records, images, results)
GEDCOM files
Family Websites
Genealogy Wikis
Cyndi’s List: 262,200+ links
WeRelate.org: 1.3 million sources

Nevertheless…

Genealogical data is isolated, causing:
Duplication of prior work
Unnecessary stalls at dead ends

Distributed Family Tree (DFT)

Network of genealogical data and metadata:
Machine-understandable
Open
Standards-based
Extensible
Scalable

Obstacles

Inadequate Search Interfaces
Isolated Pedigrees
Chicken-and-Egg Dilemma

Plan of Attack

Inadequate Search Interfaces → Natural Language Search Interface
Isolated Pedigrees → Semi-automatic Lineage Linkage
Chicken-and-Egg Dilemma → Real-time Data Extraction

Thesis Statement

Enable the Distributed Family Tree:
1. Graph-based data model
2. Communications protocol
3. Server software
4. Extensible client software

Genealogy Core Data

[Figure: graph of person and event nodes joined by labeled edges]
Nodes: #sarah, #mark, #samuel, #marriage, #birthOfSamuel
Edge labels: name, sex, married, gaveBirth, fathered, born, place, date
Literal values: “Sarah /Baker/”, “Mark /Baker/”, “Samuel /Baker/”, “Male”, “Boston”, “December 22, 1868”, “Chicago”, “April 17, 1873”

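The graph on the Genealogy Core Data slide can be read as a set of subject-predicate-object triples. The following is a minimal Python sketch of that reading, not the proposal's actual data model. The transcript preserves only the node names, edge labels, and literal values, so the specific edges below are an assumption, and the helpers `objects`, `subjects`, and `parents_of` are hypothetical names introduced for illustration.

```python
# One plausible reading of the slide's graph as (subject, predicate, object) triples.
# ASSUMPTION: the edge assignments are reconstructed; the figure layout was lost.
TRIPLES = [
    ("#sarah", "name", "Sarah /Baker/"),
    ("#mark", "name", "Mark /Baker/"),
    ("#sarah", "married", "#marriage"),
    ("#mark", "married", "#marriage"),
    ("#marriage", "place", "Boston"),
    ("#marriage", "date", "December 22, 1868"),
    ("#sarah", "gaveBirth", "#birthOfSamuel"),
    ("#mark", "fathered", "#birthOfSamuel"),
    ("#samuel", "born", "#birthOfSamuel"),
    ("#birthOfSamuel", "place", "Chicago"),
    ("#birthOfSamuel", "date", "April 17, 1873"),
    ("#samuel", "sex", "Male"),
    ("#samuel", "name", "Samuel /Baker/"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def parents_of(person, triples=TRIPLES):
    """Find parents by walking person -> birth event -> gaveBirth/fathered edges."""
    parents = []
    for birth in objects(person, "born", triples):
        parents += subjects("gaveBirth", birth, triples)
        parents += subjects("fathered", birth, triples)
    return parents
```

Under this reading, events such as #marriage and #birthOfSamuel are nodes in their own right, so a parent-child relationship is found by passing through the shared birth event rather than through a direct edge between people.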
Genealogy Provenance Metadata

[Figure: graph attaching source information to statements about a person]
Nodes: #william, #birthCertificate, #gedcom, #fromBirthCertificate, #fromGedcom
Edge labels: name, sex, source, imported, author, publication, image
Literal values: “William /Roberts/”, “Male”, “City of Detroit”, “Public Records”, “http://www.example.com/genealogy/cert.png”, “October 5, 2006”

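One way to picture the provenance slide is to give each statement a fourth element that names the graph it came from, and then describe that graph with ordinary triples. The sketch below is an illustration under that assumption. The grouping of statements into #fromBirthCertificate and #fromGedcom, the attachment of each literal to a particular node, and the helper `provenance` are all guesses made for this example, since the transcript does not preserve the figure's edges.

```python
# Each statement carries a fourth element naming the graph it belongs to.
# ASSUMPTION: which statement sits in which graph is not stated in the transcript.
QUADS = [
    ("#william", "name", "William /Roberts/", "#fromBirthCertificate"),
    ("#william", "sex", "Male", "#fromGedcom"),
]

# Metadata about each graph and its source, stored as plain triples.
# ASSUMPTION: the pairing of edge labels with nodes is reconstructed.
METADATA = [
    ("#fromBirthCertificate", "source", "#birthCertificate"),
    ("#birthCertificate", "author", "City of Detroit"),
    ("#birthCertificate", "publication", "Public Records"),
    ("#birthCertificate", "image", "http://www.example.com/genealogy/cert.png"),
    ("#fromGedcom", "source", "#gedcom"),
    ("#fromGedcom", "imported", "October 5, 2006"),
]


def provenance(subject, predicate):
    """Map each recorded value of (subject, predicate) to details about its source."""
    result = {}
    for s, p, value, graph in QUADS:
        if s == subject and p == predicate:
            sources = [o for gs, gp, o in METADATA if gs == graph and gp == "source"]
            details = {gp: o for gs, gp, o in METADATA if gs in sources}
            result[value] = {"graph": graph, "sources": sources, "details": details}
    return result
```

Keeping the provenance outside the core triples means the genealogy itself stays unchanged no matter how much is later recorded about where it came from.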
Genealogy Trust Metadata

[Figure: graph with two conflicting birth events for one person, each assigned a trust value]
Nodes: #john, #birthOfJohn, #trueBirthOfJohn, #falseData, #trueData, #trustDecisions
Edge labels: name, occupation, born, place, date, trust, comment
Literal values: “John /Morris/”, “Lawyer”, “Houston” (twice), “April 13, 1923”, “June 5, 1903”, 0.1, 0.9
Comment: “I’m sure that this information is correct because I have in my possession John’s original birth certificate.”

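The trust slide suggests that conflicting claims can coexist and that a user's trust values decide which one is preferred. The following is a small sketch of that idea, not the proposal's mechanism. It assumes that #falseData and #trueData are graphs holding the two birth claims and that they carry trust 0.1 and 0.9 respectively. The pairing of each date with a graph is arbitrary because the transcript does not preserve it, and `Claim`, `most_trusted`, and `trusted_only` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    graph: str  # graph holding the claim
    node: str   # birth-event node asserted in that graph
    place: str
    date: str


# ASSUMPTION: the date-to-graph pairing is arbitrary; the slide's edges were lost.
CLAIMS = [
    Claim("#falseData", "#birthOfJohn", "Houston", "April 13, 1923"),
    Claim("#trueData", "#trueBirthOfJohn", "Houston", "June 5, 1903"),
]

# One user's trust decisions (the #trustDecisions node on the slide).
TRUST = {"#falseData": 0.1, "#trueData": 0.9}


def most_trusted(claims, trust, default=0.5):
    """Return the claim whose graph has the highest trust score."""
    return max(claims, key=lambda c: trust.get(c.graph, default))


def trusted_only(claims, trust, threshold=0.5, default=0.5):
    """Keep claims whose graph meets the trust threshold."""
    return [c for c in claims if trust.get(c.graph, default) >= threshold]
```

Because nothing is deleted, a different user with different trust values could reach the opposite conclusion from the same data.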
Communications Protocol

Query
Synchronize
Pingback

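The slides name the three protocol operations without defining them. As a rough illustration only, the toy in-memory sketch below assumes that Query returns statements matching a pattern, Synchronize copies statements one server lacks from another, and Pingback notifies a server that a remote server links to one of its records. The `Server` class, its method signatures, and the example URLs are all hypothetical and do not describe the actual protocol.

```python
class Server:
    """Toy in-memory stand-in for a DFT server; not the actual design."""

    def __init__(self, url):
        self.url = url
        self.triples = set()
        self.pingbacks = []  # (remote_url, local_node) pairs reported by other servers

    def query(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if subject in (None, t[0])
            and predicate in (None, t[1])
            and obj in (None, t[2])
        )

    def synchronize(self, other):
        """Copy triples from `other` that are missing locally; return how many were added."""
        missing = other.triples - self.triples
        self.triples |= missing
        return len(missing)

    def pingback(self, remote_url, local_node):
        """Record that a remote server now links to one of this server's nodes."""
        self.pingbacks.append((remote_url, local_node))


# Usage: server B pulls A's data, then tells A that it links to #sarah.
a = Server("http://a.example")
b = Server("http://b.example")
a.triples.add(("#sarah", "name", "Sarah /Baker/"))
added = b.synchronize(a)
a.pingback(b.url, "#sarah")
```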
Server Software

Code-named Valhalla
Simple data store
Partitioned user accounts
Restricted access to living records

Client Software

Code-named Genesis
Three primary functions:
Data Entry
Search
Inference

Data Entry in Genesis

Minimal record manager functionality
Web page data extraction (using extraction ontologies from the Data Extraction Group)

Search in Genesis

Natural Language Queries (using ontology-based query processing from the Data Extraction Group)
Anticipatory Search

Inference in Genesis

Manual non-destructive merges
Semi-automatic lineage linkage (using record linkage from the Data Mining Lab)

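A non-destructive merge can be pictured as recording a separate "same person" assertion between two records instead of rewriting either one, so the merge can be undone later. The sketch below illustrates that general idea. The `MergeLog` class and the record identifiers are hypothetical, and nothing here is taken from the Genesis design.

```python
class MergeLog:
    """Tracks merges as separate 'same person' links; source records stay untouched."""

    def __init__(self):
        self.links = set()  # frozenset pairs of node ids judged to be the same person

    def merge(self, a, b):
        """Assert that records `a` and `b` describe the same person."""
        self.links.add(frozenset((a, b)))

    def unmerge(self, a, b):
        """Withdraw an earlier assertion; no record data needs restoring."""
        self.links.discard(frozenset((a, b)))

    def cluster(self, node):
        """Return all nodes reachable from `node` through merge links."""
        seen, stack = {node}, [node]
        while stack:
            current = stack.pop()
            for link in self.links:
                if current in link:
                    for other in link - {current}:
                        if other not in seen:
                            seen.add(other)
                            stack.append(other)
        return seen


# Usage: three records of the same person held on different servers.
log = MergeLog()
log.merge("a#john", "b#john")
log.merge("b#john", "c#john")
merged = log.cluster("a#john")
```

Undoing a merge only removes a link, which is what makes the operation safe to apply to data owned by other people.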
Validation

Test installation of Valhalla on five servers:
Distribute genealogy, provenance, and trust information
Establish links
Seamlessly browse
Demonstrate functioning plug-ins

Imagine the possibilities…

Instant gratification
Single search
Accidental collaboration
Deliberate collaboration
Accessible anywhere, anytime

Questions?

Progress updates and pre-release software available at http://blog.nucleartoiletpaper.com/dft