Individualized Knowledge Access
David Karger, Lynn Andrea Stein

Web Search Tools
A lot like libraries...
* Indices (search by keyword) are like library catalogues
* Taxonomies are like the Dewey Decimal system: both classify by subject
* “Cool site of the day” is like the new-book shelf or suggested reading
Is a universal library enough?

Library/Web Limitations
* Huge: too many answers, mostly irrelevant
* Only published material: misses information known to few, and leading-edge content
* Rigid: everyone gets the same search results, even when they come back and try again

The library is the last place we look

Bookshelves First
* My data: information I gathered personally
  - high quality, easy for me to understand
  - not limited to publicly available content
  - annotations
* My organization: I choose my own subject arrangement
  - optimized for my kind of searching
* Adapts to my needs

Then a Friend
* Leverage: they organize information for their own access, so they quickly find things for me
* Personal expertise: they know things that are in no library
* Trust: their recommendations are good
* Shared vocabulary: they know me and what I want

Last the Library
* The answer is usually there, but hard to find
* It would be nice to rearrange it to my needs
* For the hardest problems, we need a librarian
  - broad knowledge of the library, but not as deep as an expert on the question

Lessons
* Individualized access: the best tools adapt to individual ways of organizing and seeking data.
* Individualized knowledge: people know much more than they publish, and that knowledge is useful.

Haystack: a Tool for Oxygen
* Independent but interacting repositories that adapt to their individual users
* Individualize access
  - my data collection and organization
  - my search tools, with answers for me
* Leverage individual knowledge
  - collaborative retrieval with others
* Motivate people to organize their data for their own benefit, and thus for others’

Example
Have probabilistic models been used in data mining?
* My haystack doesn’t know, but “probability” appears in a lot of mail I got from Tommi Jaakkola
* Tommi told his haystack that “Bayesian” refers to “probability models”
* Tommi has read several papers on Bayesian methods in data mining
* His haystack suggests them to mine

Research Threads
* Heterogeneous data and metadata: archive whatever the user wants
* Human-computer interaction
  - let the user express and use their own organizational rules
  - observe the user to detect unexpressed knowledge
* Machine learning: use gathered data to improve performance
* Collaborative filtering: use others’ decisions to help me

My Data
* Haystack archives anything: web pages browsed, email sent and received, documents written, scanned images, home directory, people known, projects worked on
* ...and any properties or relationships: the text of an object (if it knows how to extract it), author, title, color, citations, quotations, annotations, quality, last usage
* Users freely add types and relationships

Gathering My Data
* Active user input: interfaces let the user add data and note relationships
* Mining data from the haystack: plug-in services opportunistically extract data
  - e.g., find the author, title, and text in an MS Word document
  - or detect that one document quotes another
* Observing the user: plug-ins to other interfaces report user actions
  - web pages browsed, mail sent, queries made

Adaptation
* Remember the user’s attempts to tune a query
  - instead of the first query attempt, use the last one
* Record the items the user picked as good matches
  - future similar queries do better right away
* Stored content shows what the user knows and likes
  - modify queries sent to big search engines
  - filter the results coming back
  - personalized “cool site of the day”

Collaborative Access
* Leverage others’ work organizing data
  - no need to “publish”: expertise is exposed automatically
  - self-interest helps others
* Privacy/permission concerns
  - allowing exposure is easier than publishing
  - much information is already public: mailing lists, papers read
* Whose opinions matter?
  - people I mail, people with shared data, referrals
  - collaborative filtering techniques

Conclusion
* Libraries are not enough
* Haystack teases out individual knowledge
  - individualizes information access for the user
  - exposes individual knowledge to benefit the community
* Current status: an individual-user prototype, with some data extraction, observation, and adaptation; a collaborative version is planned for the future.
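The data-mining example and the adaptation ideas in these slides can be illustrated with a minimal Python sketch. It is not Haystack’s actual design. Several pieces are hypothetical and introduced only for illustration: the `Haystack` class and its methods, `collaborative_search`, the comma-separated query format, and naive substring matching. The code models three behaviours. A user declares that one term refers to another (“Bayesian” refers to “probability models”). A tuned query replaces the first attempt. Items the user picked get boosted on later runs of the same query. Choosing which friends to ask (people I mail, shared data, referrals) is left to the caller.

```python
from collections import defaultdict


class Haystack:
    """Hypothetical per-user repository; an illustration, not Haystack's real design."""

    def __init__(self, owner):
        self.owner = owner
        self.docs = {}                   # doc id -> lower-cased text
        self.aliases = defaultdict(set)  # concept -> terms the user says refer to it
        self.refined = {}                # first query attempt -> last (tuned) attempt
        self.picked = defaultdict(set)   # query -> doc ids the user chose as good matches

    def add(self, doc_id, text):
        self.docs[doc_id] = text.lower()

    def tell(self, term, refers_to):
        # e.g. tell("Bayesian", refers_to="probability models")
        self.aliases[refers_to.lower()].add(term.lower())

    def record_refinement(self, first_attempt, last_attempt):
        self.refined[first_attempt] = last_attempt

    def record_pick(self, query, doc_id):
        self.picked[query].add(doc_id)

    def search(self, query):
        """Rank documents for a comma-separated list of phrases (assumed query format)."""
        # Adaptation 1: if the user tuned this query before, run the tuned version.
        query = self.refined.get(query, query)
        phrases = [p.strip().lower() for p in query.split(",") if p.strip()]
        scores = {}
        for doc_id, text in self.docs.items():
            # A phrase matches if it, or any term this user declared for it, occurs in the text.
            hits = sum(
                any(form in text for form in {p} | self.aliases[p])
                for p in phrases
            )
            # Adaptation 2: documents picked earlier for this query get a boost.
            if doc_id in self.picked[query]:
                hits += len(phrases)
            if hits:
                scores[doc_id] = hits
        # Highest score first; ties broken by doc id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))


def collaborative_search(me, friends, query):
    """Try my own haystack first; if it has no answer, ask friends' haystacks."""
    mine = me.search(query)
    if mine:
        return [(me.owner, d) for d in mine]
    return [(f.owner, d) for f in friends for d in f.search(query)]


me = Haystack("me")
me.add("mail-1", "Tommi: notes on probability for the seminar")

tommi = Haystack("tommi")
tommi.tell("Bayesian", refers_to="probability models")
tommi.add("paper-7", "Bayesian methods for data mining")
tommi.add("paper-9", "A survey of database indexing")

print(collaborative_search(me, [tommi], "probability models, data mining"))
# → [('tommi', 'paper-7')]
```

Each haystack applies only its owner’s declared vocabulary. Tommi’s knowledge that “Bayesian” means “probability models” therefore answers my query without my having to learn his terminology or his having to publish it.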