Enabling the Distributed Family Tree
  Thesis Proposal
  November 10, 2006

There’s a lot of Genealogy on the Web
  Databases (records, images, results)
  GEDCOM files
  Family Websites
  Genealogy Wikis
  Cyndi’s List: 262,200+ links
  WeRelate.org: 1.3 million sources

Nevertheless…
  Genealogical data is isolated, causing:
    Duplication of prior work
    Unnecessary stalls at dead ends

Distributed Family Tree (DFT)
  Network of genealogical data and metadata:
    Machine-understandable
    Open
    Standards-based
    Extensible
    Scalable

Obstacles
  Inadequate Search Interfaces
  Isolated Pedigrees
  Chicken-and-Egg Dilemma

Plan of Attack
  Inadequate Search Interfaces
    Natural Language Search Interface
  Isolated Pedigrees
    Semi-automatic Lineage Linkage
  Chicken-and-Egg Dilemma
    Real-time Data Extraction

Thesis Statement
  Enable the Distributed Family Tree:
    1. Graph-based data model
    2. Communications protocol
    3. Server software
    4. Extensible client software

Genealogy Core Data
  [Graph diagram]
  Nodes: #sarah, #mark, #marriage, #birthOfSamuel, #samuel
  Edge labels: name, married, place, date, gaveBirth, fathered, born, sex
  Literals: “Sarah /Baker/”, “Mark /Baker/”, “Samuel /Baker/”, “Boston”, “December 22, 1868”, “Chicago”, “April 17, 1873”, “Male”

Genealogy Provenance Metadata
  [Graph diagram]
  Nodes: #william, #gedcom, #birthCertificate, #fromGedcom, #fromBirthCertificate
  Edge labels: name, sex, source, imported, author, publication, image
  Literals: “William /Roberts/”, “Male”, “City of Detroit”, “Public Records”, “October 5, 2006”, “http://www.example.com/genealogy/cert.png”

Genealogy Trust Metadata
  [Graph diagram]
  Nodes: #john, #birthOfJohn, #trueBirthOfJohn, #falseData, #trueData, #trustDecisions
  Edge labels: name, occupation, born, place, date, trust, comment
  Literals: “John /Morris/”, “Lawyer”, “Houston”, “Houston”, “April 13, 1923”, “June 5, 1903”, 0.1, 0.9
  Comment: “I’m sure that this information is correct because I have in my possession John’s original birth certificate.”

Communications Protocol
  Query
  Synchronize
  Pingback

Server Software
  Code-named Valhalla
  Simple data store
  Partitioned user accounts
  Restricted access to living records

Client Software
  Code-named Genesis
  Three primary functions:
    Data Entry
    Search
    Inference

Data Entry in Genesis
  Minimal record manager functionality
  Web page data extraction (using extraction ontologies from the Data Extraction Group)

Search in Genesis
  Natural Language Queries (using ontology-based query processing from the Data Extraction Group)
  Anticipatory Search

Inference in Genesis
  Manual non-destructive merges
  Semi-automatic lineage linkage (using record linkage from the Data Mining Lab)

Validation
  Test installation of Valhalla on five servers:
    Distribute genealogy, provenance, and trust information
    Establish links
    Seamlessly browse
  Demonstrate functioning plug-ins

Imagine the possibilities…
  Instant gratification
  Single search
  Accidental collaboration
  Deliberate collaboration
  Accessible anywhere, anytime

Questions?
  Progress updates and pre-release software available at http://blog.nucleartoiletpaper.com/dft
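
Illustrative sketch (not part of the original slides)
  The graph-based data model named in the Thesis Statement, and the node and edge labels on the Genealogy Core Data and Genealogy Trust Metadata slides, could be pictured as a set of subject/predicate/object triples. The Python below is a minimal sketch under several assumptions: the wiring between nodes is one plausible reading of the diagram labels, not a confirmed transcription; the pairing of the 0.1 and 0.9 trust values with #falseData and #trueData is likewise assumed; and the names CORE_DATA, match, parents_of, and most_trusted are hypothetical helpers, not part of Valhalla or Genesis.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# ASSUMPTION: this edge wiring is one plausible reading of the slide's
# node and edge labels; the original diagram's arrows are not available.
CORE_DATA: List[Triple] = [
    ("#sarah", "name", "Sarah /Baker/"),
    ("#mark", "name", "Mark /Baker/"),
    ("#samuel", "name", "Samuel /Baker/"),
    ("#samuel", "sex", "Male"),
    ("#sarah", "married", "#marriage"),
    ("#mark", "married", "#marriage"),
    ("#marriage", "place", "Boston"),
    ("#marriage", "date", "December 22, 1868"),
    ("#sarah", "gaveBirth", "#birthOfSamuel"),
    ("#mark", "fathered", "#birthOfSamuel"),
    ("#samuel", "born", "#birthOfSamuel"),
    ("#birthOfSamuel", "place", "Chicago"),
    ("#birthOfSamuel", "date", "April 17, 1873"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the given fields (None = wildcard)."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def parents_of(triples: List[Triple], child: str) -> List[str]:
    """Follow child -born-> event, then collect who gaveBirth/fathered that event."""
    parents: List[str] = []
    for _, _, event in match(triples, s=child, p="born"):
        for predicate in ("gaveBirth", "fathered"):
            parents.extend(subj for subj, _, _ in match(triples, p=predicate, o=event))
    return parents


def most_trusted(trust: Dict[str, float]) -> str:
    """Return the data set with the highest trust value."""
    return max(trust, key=lambda name: trust[name])


if __name__ == "__main__":
    print(parents_of(CORE_DATA, "#samuel"))
    print(list(match(CORE_DATA, s="#marriage", p="place")))
    # ASSUMPTION: 0.1 belongs to #falseData and 0.9 to #trueData.
    print(most_trusted({"#falseData": 0.1, "#trueData": 0.9}))
```

  Modeling births and marriages as their own nodes, as the slide labels suggest, lets place, date, provenance, and trust statements attach to the event rather than to a person, so two conflicting birth records can coexist and be ranked instead of overwriting each other.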