WP6: Software Platform and Tools
Lead: UDE
Partners: UMA, CICE, FrontiersIn
Month 1 - Month 30
Overview
 Bundles all activities related to the provision of a
software platform hosting tools and services for
data mining, crawling and social network analysis
 Relies on existing tools, either free and open
software or tools owned by the partners
 First part: definition of crawling, data mining,
storage strategy
 Second part: data transformation for SNA, definition
of network-based role models and evaluation of
these models
Specific objectives
 Selection and evaluation of mining strategies
 Specification of crawling approach and integration
of crawlers
 Specification and configuration of a software
platform
 Preparation / transformation of data for SNA
 Specification and modelling of roles and
constellations (SNA)
 Data analyses and evaluation
 Model revision and software adaptation
T6.1 Crawler and mining strategy
 Specify requirements for crawling and data mining
based on the focused data sources and social
models
 flexible with respect to crawling strategies, so that it can also be
adapted to the needs of other work packages (esp. the case studies)
 integrated and controlled by a framework which handles the
storage of retrieved web objects and the notification of newly
found relevant data and changes in the data sources.
Responsible: UMA (2PM)
Contributors: UDE (1PM), CICE (1PM)
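The storage and notification idea in T6.1 can be illustrated with a minimal sketch. It assumes that a change in a data source is detected by comparing content hashes of stored web objects; the names `ObjectStore`, `crawl_once` and the example URL are hypothetical and not part of the project specification.

```python
import hashlib


class ObjectStore:
    """In-memory stand-in (hypothetical) for the storage of retrieved web objects."""

    def __init__(self):
        self._digests = {}  # URL -> SHA-256 digest of the last stored content

    def put(self, url, content):
        """Store content and report 'new', 'changed' or 'unchanged'."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._digests.get(url)
        self._digests[url] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"


def crawl_once(store, fetched, notify):
    """Store every fetched page and notify about new or changed ones.

    `fetched` maps URLs to page text; a real crawler would supply it.
    """
    for url, content in fetched.items():
        status = store.put(url, content)
        if status != "unchanged":
            notify(url, status)


events = []
store = ObjectStore()


def record(url, status):
    events.append((url, status))


# Three simulated crawl runs over the same (assumed) source.
crawl_once(store, {"http://example.org/cfp": "Call for papers 2011"}, record)
crawl_once(store, {"http://example.org/cfp": "Call for papers 2011"}, record)
crawl_once(store, {"http://example.org/cfp": "Call for papers 2012"}, record)
print(events)
```

Only the first and third runs produce a notification, because the second run retrieves identical content.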
T6.2 Semantic evaluation and filtering
 Categorize and filter data retrieved from the various data
sources
 relies on techniques adopted from the field of knowledge
discovery in databases (KDD)
 encompasses the pre-processing of given data in terms of
statistical sampling, cleaning and transformation of the
data into adequate representations for the subsequent
algorithms
Responsible: UDE (2PM)
Contributors: UMA (1PM)
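As a rough illustration of the pre-processing steps named in T6.2 (sampling, cleaning, transformation), the sketch below uses only the Python standard library. The bag-of-words representation and the keyword filter are assumptions chosen for brevity, not the techniques selected by the project.

```python
import random
import re
from collections import Counter


def clean(text):
    """Strip markup-like tags, collapse whitespace and lower-case the text."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def sample(records, k, seed=0):
    """Draw a reproducible simple random sample of at most k records."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def to_bag_of_words(text):
    """Transform cleaned text into a term-frequency representation."""
    return Counter(re.findall(r"[a-z]+", text))


def is_relevant(bag, keywords, threshold=1):
    """Keyword filter (assumed): keep documents mentioning enough keywords."""
    return sum(bag[k] for k in keywords) >= threshold


docs = ["<p>Social  Network Analysis workshop</p>", "<p>Cooking recipes</p>"]
bags = [to_bag_of_words(clean(d)) for d in docs]
kept = [d for d, b in zip(docs, bags) if is_relevant(b, {"network", "analysis"})]
print(kept)
```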
T6.3 Framework for storage, notification and triggering
 Re-trigger the crawler in response to changes in the
data corpus over time
 Re-triggering based on a "when appropriate"
strategy
 recognition of specific events such as new conference
announcements or availability of proceedings.
 Notify its users about new and relevant findings
Responsible: UDE (2PM)
Contributors: UMA (2PM), CICE (2PM)
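The slides do not define the "when appropriate" strategy. One possible reading, sketched here purely as an assumption, is an adaptive revisit interval: a source that changed is revisited sooner, an unchanged source later. The function name and the bounds are hypothetical.

```python
def next_interval(current_hours, changed, min_hours=1, max_hours=24 * 7):
    """Halve the revisit interval after a change, double it otherwise.

    The bounds (one hour to one week) are illustrative assumptions.
    """
    if changed:
        return max(min_hours, current_hours / 2)
    return min(max_hours, current_hours * 2)


interval = 24
for changed in [False, False, True]:
    interval = next_interval(interval, changed)
print(interval)
```

Event recognition (for example a new conference announcement) could override the interval and trigger a crawl immediately; that part is not sketched here.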
T6.4 Data transformation and structural modeling for SNA
 Define a common data format for sharing within the
consortium, based on the identification of relevant
communities and their "traces" (communication, co-publications etc.)
and on the general conceptual model (WP 2)
 Define and specify typical roles and constellations (e.g.
broker) based on SNA techniques (e.g. blockmodeling)
 Continuous verification of social indicators
Responsible: UDE (2PM)
Contributors: UMA (2PM)
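To make the notion of a broker role concrete, the following sketch builds a co-authorship network from co-publication traces and counts, for each author, the pairs of co-authors who are not linked directly. This neighbour-pair count is a deliberately simple stand-in and is not blockmodeling; the author names and publication lists are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(publications):
    """Build an undirected graph: authors are linked if they share a paper."""
    graph = defaultdict(set)
    for authors in publications:
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def brokerage(graph, node):
    """Count pairs of neighbours of `node` that are not linked directly."""
    neighbours = sorted(graph[node])
    return sum(1 for a, b in combinations(neighbours, 2) if b not in graph[a])


# Invented example data: each inner list is the author list of one paper.
pubs = [["Ana", "Ben"], ["Ben", "Cem"], ["Cem", "Dee", "Eli"]]
g = coauthor_graph(pubs)
scores = {n: brokerage(g, n) for n in sorted(g)}
print(scores)
```

In this toy network Cem has the highest score, since Cem connects Ben to Dee and Eli, who would otherwise be unconnected.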
T6.5 Software platform
 Configure an integrated software platform for
crawling/data mining and SNA based on the initial
specifications
 input relates to the transformation from relevant data sources (specified in T6.4)
 output is concerned with visualisation and reporting
 Revise and adapt the platform according to emerging
issues and needs (esp. considering the case
studies)
Uses freely available (open) software and software owned by
the partners (mainly UDE)
Responsible: UDE (7PM)
Contributors: UMA (4PM), CICE (1PM)
T6.6 Data analysis and evaluation
 Test platform with standard cases based on
specifications of WP 4 (Measurements and Social
Indicators)
 early phase: test functioning of the platform and its
components (from T6.5) and adequacy of the semantic filters
(T6.2) and structural definitions (T6.4).
 later stage: evaluate actual performance and community
developments in association with the case studies and with
WP 4.
Responsible: UDE (3PM)
Contributors: FrontiersIn (2PM), UMA (1PM), CICE (1PM)
Deliverables and Milestones
Deliverables
 6.1 Mining strategy and requirements specification for
the software platform (RP: UDE, RV: UMA, C: all in /M5)
 6.2 First version of structural definitions (RP: UDE, RV:
UMA, C: all in /M10)
 6.3 Configuration, test of the platform and first
evaluation report (RP: UDE, RV: CICE, C: all in /M22)
 6.4 Final report and system (RP: UDE, RV: CICE, C: all in
/M30)
Milestones
 MS2, SISOB System first prototype, month 15
 MS3, SISOB Final System, month 30
Tools
 Open Source Crawler
 DMD – Data-Multiplexer-Demultiplexer
 WOS2Pajek, Pajek, and UCINET
 CFinder
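Since Pajek is among the listed tools, a small sketch of exporting a weighted network to Pajek's plain-text `.net` format may help. It covers only the basic `*Vertices` and `*Edges` sections; the function name `to_pajek` and the sample edges are assumptions for illustration.

```python
def to_pajek(edges):
    """Serialise weighted undirected edges to a basic Pajek .net string.

    `edges` maps (name_a, name_b) pairs to a numeric weight,
    e.g. the number of joint publications.
    """
    names = sorted({name for pair in edges for name in pair})
    index = {name: i for i, name in enumerate(names, start=1)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[name]} "{name}"' for name in names]
    lines.append("*Edges")
    for (a, b), weight in sorted(edges.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


net = to_pajek({("Ana", "Ben"): 1, ("Ben", "Cem"): 2})
print(net)
```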
Challenges
 Data model adequate to different data sources
 Data model supporting multilevel analysis according
to multivocality in the project
 Merging different types of data
 Cleaning data
 e.g. researchers having different email addresses
 e.g. researchers writing their names in different ways
 How to get data from Web 2.0 Platforms like
Mendeley
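The cleaning challenge of researchers writing their names in different ways can be approached, as a first approximation, by reducing each name to a matching key. The sketch below assumes a "surname plus first initial" key with accents removed; this is one simple heuristic, not the project's method, and it can merge different people who share surname and initial.

```python
import unicodedata


def name_key(name):
    """Reduce a researcher name to 'surname initial' for approximate matching."""
    # Decompose accented characters and drop the non-ASCII combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        # "Surname, Given" form
        surname, given = [part.strip() for part in ascii_name.split(",", 1)]
    else:
        # "Given Surname" form; the last token is taken as the surname
        parts = ascii_name.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname.lower()} {given[:1].lower()}".strip()


variants = ["Müller, Jürgen", "J. Müller", "Jürgen Müller"]
keys = {name_key(v) for v in variants}
print(keys)
```

All three variants map to the same key, so records carrying them could be proposed as merge candidates for manual or rule-based confirmation.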