thai-language.com
Glenn Slayden
October 14, 2009

Agenda
• Background and history
• Site surface demonstration
• Database ontology
• Database technology
• Data Entry demonstration
• Future directions
• Q&A: throughout, please

Overarching Motivation
• Long-term objectives:
  – Increase linguistic rigor
  – Publish any new work
  – Maintain popular accessibility
  – Build community

Historical Parchment - 1997

More Parchment - 2001

Site Demonstration

Database? What Database
• How big is a monolingual dictionary?
• 100,000 words x 300 bytes/entry = 30 MB
• How much memory in a modern server? 32 GB.
• That's about 1/10th of 1% (.00094)
• SQL? MySQL? Postgres? Not indicated.

Case Study
October 13, 2009
  – 64-bit web server
  – 32 GB RAM

Server Memory Utilization
N.B. this entire pie chart represents 10% of total memory

In-memory is the way to go
• For performance
• For ease and speed of development
• Easy refactoring
• LINQ – C# "language-integrated query"
• Have a flexible and powerful object model without worrying about relational mapping
• Completely avoid OR/M (object-relational mapping) "impedance mismatch" issues

thai-language.com Ontology
• Disclaimer and warning
  – Internal names of programming objects are not (any longer) intended to have any relationship to corresponding linguistic terms. On the following slides, please consider these names to be opaque monikers.

thai-language.com Ontology
Entry, Definition, Phrase, Category
These colors correspond (roughly) to data-entry screen colors in DBEdit

The most basic Lucky Decision
• ...that turned out to be incredibly valuable:
  – Heterogeneous objects are assigned ID numbers within mutually exclusive ranges

Scary Picture with Clouds In It

Data Entry Demonstration

Future directions
• Track provenance of entries and changes
• Separate out meta-information in English senses
• Move towards community curatorship while maintaining asset value
  – Requires reputation-granting authority
• Refine and formalize dictionary statement of purpose (i.e. to prevent hijacking)

Technology Changes
• In 2009, optimizing a language dictionary database for size is not necessary
• Detailed fields should be generously deployed
• Exception to the in-memory model:
  – Comprehensive change version tracking may warrant database storage
  – This is necessary for community curatorship

An integrated DELPH-IN style computational-analytical grammar
• Associate a rigorous HPSG feature structure with each sense
• Display MRS and tree on dictionary page for compounds and sentences
• Ability to designate gold standard parse trees and attestation provenance
• Live interface for LKB/PET-style parser to provide arbitrary parsing

Thanks for Coming!
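
A note on the "Lucky Decision" slide: the idea of giving heterogeneous objects ID numbers within mutually exclusive ranges can be sketched roughly as follows. This is only an illustration in Python, not the site's actual (C#) implementation. The kind names are taken from the ontology slide, while the numeric bounds, `kind_of`, and the `Store` class are hypothetical and invented for the example.

```python
from bisect import bisect_right

# Hypothetical, non-overlapping ID ranges (start inclusive, end exclusive).
# The kind names come from the ontology slide; the bounds are invented.
ID_RANGES = [
    (1, 200_000, "Entry"),
    (200_000, 400_000, "Definition"),
    (400_000, 600_000, "Phrase"),
    (600_000, 700_000, "Category"),
]
_STARTS = [start for start, _, _ in ID_RANGES]


def kind_of(obj_id: int) -> str:
    """Recover an object's kind from its ID alone."""
    i = bisect_right(_STARTS, obj_id) - 1  # last range starting at or below obj_id
    if i >= 0:
        start, end, kind = ID_RANGES[i]
        if start <= obj_id < end:
            return kind
    raise KeyError(f"ID {obj_id} is outside every known range")


class Store:
    """A single in-memory table holding objects of every kind (hypothetical)."""

    def __init__(self) -> None:
        self._objects = {}
        self._next_id = {kind: start for start, _, kind in ID_RANGES}
        self._end = {kind: end for _, end, kind in ID_RANGES}

    def add(self, kind: str, payload: object) -> int:
        """Allocate the next free ID in the range reserved for this kind."""
        obj_id = self._next_id[kind]
        if obj_id >= self._end[kind]:
            raise OverflowError(f"ID range for {kind} is exhausted")
        self._next_id[kind] = obj_id + 1
        self._objects[obj_id] = payload
        return obj_id

    def get(self, obj_id: int):
        """Return (kind, payload) for any ID, whatever kind it refers to."""
        return kind_of(obj_id), self._objects[obj_id]


store = Store()
entry_id = store.add("Entry", "บ้าน")
phrase_id = store.add("Phrase", "example phrase")
print(entry_id, kind_of(entry_id))
print(phrase_id, store.get(phrase_id))
```

One plausible benefit of such a scheme is that a bare integer reference carries its own type, so a single lookup table can serve every kind of object without ID collisions.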