Applying Existing Technology to Exploitation of Multiple Sources of Information
Mike Brenton
Sterling Software
Memex Technology Limited

Problem Statement
• First there was data overload.
• Now there is an overabundance of tool power.

Information Types

Sterling Software Announces 2-For-1 Stock Split
DALLAS, Texas (March 11, 1998) - Sterling Software, Inc. (SSW-NYSE) today announced that its Board of Directors has approved a 2-for-1 split of the company’s common stock. Stockholders will receive one additional common share for every share held on the record date of March 20, 1998. The additional shares will be issued on April 3, 1998. Sterling Software currently has approximately 38.8 million shares of common stock outstanding. This number will double to approximately 77.6 million shares by reason of the stock split.
Sterling L. Williams, president and chief executive officer of Sterling Software commented, "Sterling Software’s stock price increased 30% during 1997 and 28% so far this year, based on consistently excellent performance by the company. We decided to split our stock to improve its trading liquidity and to help ensure that it trades in a price range that is accessible to a broad base of investors."
Sterling Software is a leading provider of software and services for the applications management, systems management and federal systems markets. Sterling Software, with its headquarters in Dallas, has a worldwide installed base of more than 20,000 customer sites and has 3,100 employees in 85 offices worldwide. For more information on Sterling Software, visit the company’s Web site at http://www.sterling.com.
Contact: Julie Kupp, Vice President, Investor Relations, Sterling Software, Inc. (214) 981-1000 [email protected]
©Copyright Sterling Software, 1998 All rights reserved

First Name   Last Name   Project   Phone #   Office #
Michael      Brenton     MDITDS    x7060     249
Harold       Kornegay    MDITDS    x7049     221

The Migration Defense Intelligence Threat Data System (MDITDS) is a Department of Defense Intelligence Information Systems (DODIIS) designated migration system tasked to provide the automated production system for the DODIIS Indications and Warnings (I&W), Counterintelligence (CI), Anti-terrorism (AT), Counterterrorism (CT), Information Warfare (IW), Arms Proliferation (AP), and Defense Industry (DI) communities.

Open Source Materials
• Electronic Information
  – Library Services
  – On-line Newspapers
  – On-line Reports
  – Information Brokers
  – CD-ROM Products
  – Wire Services
• Agents
  – Services - People
  – Services - Push and Punch
  – Spiders, Crawlers, and Profilers

Tools
• Data Warehouses
• Concept Analysis and Summarization
• Vectors, Clustering, Histograms
• Data Mining
• OLAP
• Statistical Analysis
• Visualization
• Information Extraction
• Temporal Analysis
• Link Analysis

Data Warehouses
• Data warehousing is an emerging technology that supports non-operational application areas like management information systems, decision support, and data mining.
• A data warehouse is a database that provides efficient and integrated access to relevant analytical data.
Department of Information Science - The Aarhus School of Business

Memex Information Engine and Client Applications
[Figure: Memex Network Query Tool screens (menus: File, Edit, View, Insert)]
Text Search:
  1 -- 90% -- Air Power Over Bosnia
  2 -- 85% -- UK Air Power and NATO
  3 -- 70% -- Air Power Assessment
  4 -- 65% -- Munitions on Tactical Fighters
  5 -- 50% -- Tactical Fighters and LSB
  6 -- 50% -- Smart Munitions
  7 -- 45% -- Air Dropped Land Mines
Field Search:
  Name: __________
  Incident Type: __________
  Organization Type: __________
  Equipment: __________
  Start Date: __________  Stop Date: __________
Products
• Country Profiles
• Group (Unit) Profiles
• Individual Profiles
• Incidents (Events)
• Misc. Assessments
• All
Domains
• Counter Intelligence
• Counter Terrorism
• Force Protection
• Arms Proliferation
• Defense Industries
• Indications & Warning
• All
Network
DIA, EUCOM, JICPAC, SOUTHCOM, CENTCOM, STRATCOM, SPACECOM, TRANSCOM

Concept Analysis and Summarization
• Concept analysis is the process of matching keywords in the text to hierarchical topic trees in order to determine the major theme(s) in the document, paragraph, or sentence.
• Some systems use this information and predetermined “templates” to build summaries of a document.
• The concepts and summaries are then used to route documents to analysts.

Vectors, Clustering, and Histograms
• Document clustering is a technique for automatically discovering the subtopics in a set of documents and grouping the documents by those subtopics.
• Organizing documents by subtopic can help you get a sense of the major subject areas covered in the document set…
Verity, Inc.

Data Mining
• Data mining is the analysis of data for relationships that have not previously been discovered.
• For example, the sales records for a particular brand of tennis racket might, if sufficiently analyzed and related to other market data, reveal a seasonal correlation with the purchase by the same parties of golf equipment.
whatis.com Inc.

OLAP
• OLAP (online analytical processing) enables a user to easily and selectively extract and view data from different points of view.
• For example, display a spreadsheet showing all of a company's beach ball products sold in Florida in July 1997, then compare revenue figures with those for the same products in July 1996, and so on.
whatis.com Inc.

Statistical Analysis
• The collection, classification, and interpretation of numerical data.
• Elements of statistics are present in most OLAP tool sets.
• Functions include: Frequency Distribution, Average, Mean, Standard Deviation, etc.
• Functions found in most spreadsheet applications.

Visualization
• Visualization is the process of representing abstract business or scientific data as images that can aid in understanding the meaning of the data.
• Visual computing is computing that lets you interact with and control work through visualization.
whatis.com Inc.

Information Extraction
• Automated information extraction involves the identification and extraction of information about specified classes of events and the filling of templates for each instance of such an event.
• Operates against pure text.
• Also known as NLU or NLP.
Naval Research and Development group (NRaD) of NOSC

Temporal Analysis
• Temporal analysis is the process of evaluating information, events, and activities in light of models that encompass the concept of time, or of sequence and time.
• Model sequences incorporate a timeframe constraint on the identified events.

Link Analysis
• Link analysis provides the ability to investigate relationships between people, places, events, and things.
• Ideally, it is a mechanism to “walk through” a data warehouse following those links which have meaning relevant to the immediate problem.

Tools are nice but...
• There has to be a reason:
  – Analysis of operational data
  – Analysis of associated data
  – Discovering new relationships
  – Discovering new trends
  – Gaining new insights into your business
  – Competitive Edge

Different Tools for Different Kinds of Discovery

Information Extraction
• Translating text reports (prose) into “tagged data”
• Evaluating the tagged data to extract information
• Commonly referred to as Natural Language Understanding or Processing

A Focus on the Analysis of Textual Information
• Typical process flow
  – Receipt
  – Auto-analysis
    • Classification
    • Extraction
  – Archive
  – Visualization
[Figure: process-flow diagram. Labels: Ten-Plus Year Repository, Wire Service, Analyze, Review, Process, Traffic Receipt, Analyst Queues, Government Traffic, Ignore, Update Assessment, Think, Update Queue Profiles]

Making the Information Usable

Sterling Software Announces 2-For-1 Stock Split
DALLAS, Texas (March 11, 1998) - Mr. Sterling Williams of Sterling Software, Inc. (SSW-NYSE) today announced that the company’s Board of Directors has approved a 2-for-1 split of the company’s common stock. Stockholders will receive one additional common share for every share held on the record date of March 20, 1998. The additional shares will be issued on April 3, 1998. Sterling Software currently has approximately 38.8 million shares of common stock outstanding. This number will double to approximately 77.6 million shares by reason of the stock split.
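
As a rough illustration of turning prose like the release quoted above into “tagged data”, the sketch below uses a few hand-written patterns and a small lookup list. It is not the method of any tool named in this talk. The names `GAZETTEER` and `extract_tags` and the `Amount` tag are hypothetical, introduced only for this example; the other tag names loosely follow the slide.

```python
import re

# Excerpt of the press release quoted on the slide.
TEXT = (
    "DALLAS, Texas (March 11, 1998) - Sterling Software, Inc. (SSW-NYSE) today "
    "announced that its Board of Directors has approved a 2-for-1 split of the "
    "company's common stock. Stockholders will receive one additional common "
    "share for every share held on the record date of March 20, 1998. The "
    "additional shares will be issued on April 3, 1998. Sterling Software "
    "currently has approximately 38.8 million shares of common stock "
    "outstanding. This number will double to approximately 77.6 million shares."
)

# Dates written as "Month D, YYYY".
DATE_RE = re.compile(
    r"\b(January|February|March|April|May|June|July|August|September|"
    r"October|November|December) (\d{1,2}), (\d{4})\b"
)

# Quantities written as "N million" or "N.N million".
AMOUNT_RE = re.compile(r"\b\d+(?:\.\d+)? million\b")

# Hypothetical lookup list of known names and their tags (an assumption;
# a real system would need a far larger list or a trained recognizer).
GAZETTEER = {
    "Sterling Software": "Org",
    "Board of Directors": "Group",
    "Stockholders": "Group",
}


def extract_tags(text):
    """Return a list of (tag, value) pairs found in the text."""
    tags = []
    # Known names: simple substring lookup.
    for name, tag in GAZETTEER.items():
        if name in text:
            tags.append((tag, name))
    # Dates: normalize to the D-Mon-YY style shown on the slide.
    for month, day, year in DATE_RE.findall(text):
        tags.append(("Date", f"{int(day)}-{month[:3]}-{year[2:]}"))
    # Amounts: keep the matched phrase as-is.
    for amount in AMOUNT_RE.findall(text):
        tags.append(("Amount", amount))
    return tags


if __name__ == "__main__":
    for tag, value in extract_tags(TEXT):
        print(f"{tag:8} {value}")
```

Patterns like these only catch facts phrased exactly as expected; rewording the sentence or spreading a fact across sentences breaks them.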
Tag       Value                Attribute
Org       Sterling Software    US Corporation
Group     Board of Directors   Sterling Software
Group     Stockholders         Sterling Software
Location  Dallas, Texas
Object    stock shares         38.8 million
Object    stock shares         77.6 million
Date      11-Mar-98
Date      3-Apr-98
Date      20-Mar-98
Event     meeting              Board of Directors
Event     stock split          20-Mar-98

Information Extraction is not Information Retrieval
• Information retrieval gets sets of relevant documents -- you analyze the documents
• Information extraction gets facts out of documents -- you analyze the facts
Natural Language Processing Group, The University of Sheffield

Why is Information Extraction Difficult?
• There are many ways of expressing the same fact:
  – BNC Holdings Inc named Ms G Torretta as its new chairman.
  – Nicholas Andrews was succeeded by Gina Torretta as chairman of BNC Holdings Inc.
  – Ms. Gina Torretta took the helm at BNC Holdings Inc.
• Information may need to be combined across several sentences:
  – After a long boardroom struggle, Mr Andrews stepped down as chairman of BNC Holdings Inc. He was succeeded by Ms Torretta.
Natural Language Processing Group, The University of Sheffield

Information Extraction (Document)
• Natural Language Understanding
[Figure: document processing diagram. Labels: Article, Lexical Analysis, Reduction, Simple Relations, Common Events, Coreference, Records, Domain Events]

Correlation of Extracted Information (Other Documents)
• The events in a single document are relevant to routing the document,
• But a single meeting (event) put in context of other meetings (events) becomes much more useful.
• Manual vs. Automated Process
• User interest profiles, e.g.,
  – Membership
  – Meeting (Communication) Events
  – Relocation (Movement) Events

Using Correlated Data (Mining Text (or other) Databases)
• What would the user do if they knew how to use the visualization tools?
• Monitor the success of the process and feed back the results into the system.
• Automate the process:
  – Use names of people and organizations for data mining.
  – Use temporal analysis to align (chronologically) the events.
  – Use link analysis to establish networks of people and things, e.g., vehicles.
• Present the user with organized information.

Summary
• Still faced with a tremendous amount of data.
• Tools are available for acquiring information relevant to your business.
• Tools to perform data mining over a substantial data warehouse require a commitment to:
  – Money
  – Time
  – Training
  – Personnel
• The results are:

Thank you

Mike Brenton
Sterling Software
www.sterling.com
[email protected]
------------
Memex Technology Limited
www.memex.co.uk
------------
Jim Basara
Memex, Inc.
[email protected]