WP6: Software Platform and Tools

Lead: UDE
Partners: UMA, CICE, FrontiersIn
Month 1 - Month 30

Overview
- Bundles all activities related to the provision of a software platform hosting tools and services for data mining, crawling and social network analysis (SNA)
- Relies on existing tools, either free and open software or tools owned by the partners
- First part: definition of crawling, data mining and storage strategy
- Second part: data transformation for SNA, definition of network-based role models and evaluation of these models

Specific objectives
- Selection and evaluation of mining strategies
- Specification of crawling approach and integration of crawlers
- Specification and configuration of a software platform
- Preparation / transformation of data for SNA
- Specification and modelling of roles and constellations (SNA)
- Data analyses and evaluation
- Model revision and software adaptation

T6.1 Crawler and mining strategy
- Specify requirements for crawling and data mining based on the focused data sources and social models
  - flexible with respect to crawling strategies, so that it can also be adapted to the needs of other work packages (esp. the case studies)
  - integrated and controlled by a framework which handles the storage of retrieved web objects and the notification of newly found relevant data and changes in the data sources
Responsible: UMA (2PM)
Contributors: UDE (1PM), CICE (1PM)

T6.2 Semantic evaluation and filtering
- Categorize and filter data retrieved from the various data sources
  - relies on techniques adopted from the field of knowledge discovery in databases (KDD)
  - encompasses the pre-processing of given data in terms of statistical sampling, cleaning and transformation of the data into adequate representations for the subsequent algorithms
Responsible: UDE (2PM)
Contributors: UMA (1PM)

T6.3 Framework for storage, notification and triggering
- Re-trigger the crawler in response to changes in the data corpus over time
- Re-triggering based on a "when appropriate" strategy
  - recognition of specific events such as new conference announcements or availability of proceedings
- Notify its users about new and relevant findings
Responsible: UDE (2PM)
Contributors: UMA (2PM), CICE (2PM)

T6.4 Data transformation and structural modeling for SNA
- Define a common data format for sharing within the consortium, based on the identification of relevant communities and their "traces" (communication, copublications etc.) and on the general conceptual model (WP 2)
- Define and specify typical roles and constellations (e.g. broker) based on SNA techniques (e.g. blockmodeling)
- Continuous verification of social indicators
Responsible: UDE (2PM)
Contributors: UMA (2PM)

T6.5 Software platform
- Configure an integrated software platform for crawling/data mining and SNA based on the initial specifications
  - input relates to the transformation from relevant data sources (specified in T6.4)
  - output is concerned with visualisation and reporting
- Revise and adapt the platform according to emerging issues and needs (esp. considering the case studies)
- Uses freely available (open) software and software owned by the partners (mainly UDE)
Responsible: UDE (7PM)
Contributors: UMA (4PM), CICE (1PM)

T6.6 Data analysis and evaluation
- Test the platform with standard cases based on specifications of WP 4 (Measurements and Social Indicators)
  - early phase: test the functioning of the platform and its components (from T6.5) and the adequacy of the semantic filters (T6.2) and structural definitions (T6.4)
  - later stage: evaluate actual performance and community developments in association with the case studies and with WP 4
Responsible: UDE (3PM)
Contributors: FrontiersIn (2PM), UMA (1PM), CICE (1PM)

Deliverables and Milestones

Deliverables
- 6.1 Mining strategy and requirements specification for the software platform (RP: UDE, RV: UMA, C: all in / M5)
- 6.2 First version of structural definitions (RP: UDE, RV: UMA, C: all in / M10)
- 6.3 Configuration, test of the platform and first evaluation report (RP: UDE, RV: CICE, C: all in / M22)
- 6.4 Final report and system (RP: UDE, RV: CICE, C: all in / M30)

Milestones
- MS2, SISOB System first prototype, month 15
- MS3, SISOB Final System, month 30

Tools
- Open Source Crawler
- DMD (Data-Multiplexer-Demultiplexer)
- WOS2Pajek, Pajek, and UCINET
- CFinder

Challenges
- Data model adequate to different data sources
- Data model supporting multilevel analysis according to multivocality in the project
- Merging different types of data
- Cleaning data
  - e.g. researchers having different email addresses
  - e.g. researchers writing their names in different ways
- How to get data from Web 2.0 platforms like Mendeley
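
Illustration of the data-cleaning challenge (researchers appearing under different name spellings and email addresses): a minimal Python sketch, not part of the SISOB platform. The function names `name_key` and `merge_records` and the matching rule (surname plus first initial, accents removed) are assumptions made for this example only. The rule is deliberately crude and will merge distinct people who share a surname and initial, so its output would need manual review.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce a name to a crude matching key: 'surname initial' (hypothetical rule)."""
    # Drop accents so that 'Müller' and 'Muller' compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    ascii_name = ascii_name.replace(".", " ").strip()
    if "," in ascii_name:
        # 'Surname, Given' form
        surname, given = (part.strip() for part in ascii_name.split(",", 1))
    else:
        # 'Given Surname' form: treat the last token as the surname
        parts = ascii_name.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname} {given[:1]}".lower().strip()


def merge_records(records):
    """Group (name, email) pairs by name key, collecting all observed variants."""
    merged = defaultdict(lambda: {"names": set(), "emails": set()})
    for name, email in records:
        entry = merged[name_key(name)]
        entry["names"].add(name)
        if email:
            entry["emails"].add(email.lower())
    return dict(merged)


if __name__ == "__main__":
    # Made-up example records
    records = [
        ("Müller, Hans", "h.mueller@uni.example"),
        ("Hans Müller", "hans@lab.example"),
        ("H. Muller", None),
        ("Anna Schmidt", "a.schmidt@uni.example"),
    ]
    for key, entry in merge_records(records).items():
        print(key, sorted(entry["names"]), sorted(entry["emails"]))
```

Keeping every observed name and email variant per key, rather than choosing one canonical form, leaves the decision about which records truly belong together to a later review step.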