Knowledge Discovery and Dissemination (KDD) Program
IARPA-BAA-09-10
Question Period: 22 Dec 09 – 2 Feb 10
Proposal Due Date: 16 Feb 10
SYNOPSIS
• Problems to solve
• System requirements
• Team requirements
Intelligence analysts must gather and analyze information from a wide variety of data sets that include: general references, news,
technical journals and reports, geospatial data, entity databases, internal reports and more. The different terminologies, formats,
data models, and contexts make it difficult to perform advanced analytic tasks across different data sets.
If there are only a small, fixed number of data sets involved in an intelligence problem, then it may be practical to map all of the data
sets to a common data model and to develop specialized analytic tools tailored to the problem. However, if the problem changes
over time, the data sets are large or numerous, or there are new data sets that need to be integrated with those already in use, then
a new approach is required. The focus of the KDD program is to develop novel approaches that will enable the intelligence analyst to
effectively derive actionable intelligence from multiple, large, disparate sources of information, to include newly available data sets
previously unknown to the analyst.
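The fixed-set case described above, where every data set is mapped to one common data model, can be illustrated with a minimal sketch. The field names, mappings, and sample records are invented for illustration and do not come from the BAA; hand-written mappings like these are exactly what stops scaling when data sets are numerous or newly arriving.

```python
# Hypothetical field mappings: source field name -> common-model field name.
NEWS_TO_COMMON = {"headline": "title", "pub_date": "date", "place": "location"}
REPORT_TO_COMMON = {"subject": "title", "filed_on": "date", "site": "location"}


def to_common(record, mapping):
    """Rename the fields of one source record into the common data model.

    Source fields without a mapping are dropped.
    """
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Invented sample records from two differently structured sources.
news_item = {"headline": "Port reopens", "pub_date": "2009-12-22",
             "place": "Harbor A", "byline": "Staff"}
report = {"subject": "Port inspection", "filed_on": "2010-02-02", "site": "Harbor A"}

common_records = [
    to_common(news_item, NEWS_TO_COMMON),
    to_common(report, REPORT_TO_COMMON),
]

# Once both sources share one model, a single query works across them.
at_harbor = [r["title"] for r in common_records if r["location"] == "Harbor A"]
print(at_harbor)
```

Each new source needs its own mapping table in this approach, which is why it is only practical for a small, fixed number of data sets.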
The ability to quickly produce actionable intelligence from unanticipated, multiple, varied data sets requires research advances in two
key areas: (1) alignment of data models; and (2) advanced analytic algorithms. Making advances in these two research areas, and
fully characterizing the performance of the research results using real Intelligence problems, is the focus of the IARPA Knowledge
Discovery and Dissemination (KDD) Program. Performers shall perform research in both areas and develop prototype systems that
implement their techniques and research results. The KDD Program will provide data sets to support research and development in
addition to extensive test and evaluation.
KDD test and evaluation will take place on an annual cycle, with each performer applying their prototype systems to challenge
problems defined by the KDD Program. KDD evaluation of prototype systems will take place at government facilities and will use
realistic Intelligence problems and real Intelligence data. The research supported by KDD will generally be unclassified, but the
annual KDD evaluations will involve data sets classified no higher than SECRET//NOFORN.
The KDD Program requires a combination of innovative research and the capability to develop robust prototypes. Research goals
should be set, and research plans should be made, to take full advantage of the length of the KDD Program. The KDD Program
expects a staged approach to prototype development; each successive prototype will leverage research progress made since the
previous prototype. IARPA encourages teaming between academic and commercial entities to leverage the strengths of both types
of organization. KDD requires the prime contractor for each performer team to have personnel and a facility cleared at the
SECRET//NOFORN level at the time their proposal is submitted.
KDD is planned as a 51-month program and anticipates making multiple awards.
All correspondence must be submitted to the BAA e-mail address at [email protected]. All proposals should be
submitted in accordance with the instructions in Section 4.C. of the BAA.
Our Answers to Problems to Solve
• Wide variety of data sets
– General references – the Web? CIA World Factbook? ... (OntoES, FOCIH, TISP)
– Free-running text – news, technical journals (WePS, [Embley09], Ancestry.com)
– Geospatial data ([Embley89b])
– Entity databases (RelDB[Embley97], XML[Al-Kamha07,Al-Kamha08],
IMS=hierarchical[Mok06,Mok10], Network=graph=OSM, OWL/RDF[Ding-converter])
– Reports (Filled-in forms and semi-structured data [Tao09,Liddle99],TANGO)
– And more (Attensity?)
• Large and numerous data sets (extension to large and additional types; performance)
• Alignment of data models (TANGO)
– Schema mapping ([Xu03,Xu06], …)
– Data integration ([Biskup03])
– Semantic enrichment (MOGO)
• Advanced analytic algorithms (Giraud-Carrier: knowledge-based semantic distance,
record linkage, and hybrid social networks; best-effort, quick answers [Zitzelberger thesis])
• Performance ([Al-Muhammed07b], IS/Liddle, Attensity?)
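As a rough illustration of the record linkage mentioned among the analytic algorithms above, the sketch below pairs names from two sources by plain string similarity. This is a stand-in technique using Python's standard library, not the knowledge-based semantic distance cited in the notes; the function names, the 0.8 threshold, and the sample names are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.8):
    """Pair each left name with its best-scoring right name.

    A pair is kept only if its score reaches the (assumed) threshold.
    """
    links = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            links.append((a, best))
    return links


# Invented sample names from two hypothetical entity sources.
source_a = ["Jon Smith", "Maria Garcia"]
source_b = ["John Smith", "M. Garcia", "Wei Chen"]

links = link_records(source_a, source_b)
print(links)
```

Character-level similarity catches spelling variants but misses abbreviations and aliases, which is one reason a knowledge-based distance may be preferable for real entity data.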
Our Answers to System Requirements
• Prototype development (IS/Liddle/EnticeLabs liaison with
Attensity where actual prototype would be developed)
• Data set storage and usage (Dithers currently 190 GB, 23 TB easily obtained, 1 PB for $120,000; Attensity?)
• Ability to develop innovative research and a
robust prototype (BYU primarily innovative research; Attensity
primarily robust prototype; but synergistic R&D)
• Staged prototype development (Heart of research plan)
– Leverage previous prototype
– Show improvement at each stage
Our Answers to Team Requirements
• Academic: BYU
– Computer Science (Embley)
– Information Systems Management (Liddle)
– Computational Linguistics (Lonsdale)
– Data Mining (Giraud-Carrier)
• Commercial: Attensity
Attensity's technology is the culmination of decades of research in the
areas of search, natural language processing (NLP), machine learning,
artificial intelligence, and semantics. This research led to breakthrough
software that allows computers to understand, process and analyze
free-form text, offering government and commercial organizations the
opportunity to leverage the vast amounts of information contained in
non-structured formats.
SECRET//NOFORN Clearance
• BYU? BYU Personnel?
– Embley, Liddle (US)
– Lonsdale, Giraud-Carrier (NOFORN Clearance?)
• Attensity? Attensity Personnel?