Improving transnational access to microdata
Proof of Concept for a European Network of Secure Remote Access Systems
Presentation and demonstration

Roxane Silberman (CNRS), Jara Kampmann (Gesis), Maurice Brandt (Destatis), Eric Debonnel (Genes), Katharina Kinder-Kurlanda (Gesis), Mathias Zenke (Destatis), Philippe Donnay (Genes), Kamel Gadouche (Genes)

Luxembourg, Data without Boundaries, European Data Access Forum 2015, March 25, 2015

Proof of concept: Main goals
• Building a trans-border network of secure remote access systems will:
  - Allow access, in the same environment, to several data sources from different data providers
  - Improve collaboration between researchers in Europe
  - Improve collaboration between data providers in Europe
  - Provide a secure infrastructure for hosting confidential microdata at a European level
• The main goal of the POC is to show how such a secure network could be defined and deployed

Proof of concept: Main security features
• High security level:
  - Closed environment
  - Input and output are controlled
  - Connections are permitted only from accredited sites (IP address)
  - Confidential data must remain inside the secure environment to prevent datafile extraction
  - Strong authentication is mandatory
• The same security level everywhere across the network:
  - Every node (site) has the same security level
  - Every access point has the same security level
  - Every user (researchers and staff) authenticates with the same procedure
  - All communications have to be encrypted
• By design, the network should be compliant with ISO 27001 (IT security certification)

Proof of concept: Main features
• Flexibility of the network is a key point:
  - The network should be able to manage different legal frameworks (and different interpretations…)
  - The network should be able to accept new members (easy to join)
  - The network has to be independent from existing RDC infrastructure (completely separated and isolated): it does not aim at replacing the local RDC
• For flexibility,
3 different levels of trust between partners of the network have been defined:
  - Full trust inside the secured environment (free exchange of microdata)
  - Medium trust (exchange of lightly controlled output data)
  - Minimum trust (exchange of fully controlled output data)

Proof of concept: Main features
• Usability: The virtual research environment (VRE), a Windows desktop, should provide every standard tool for processing microdata (SAS, STATA, R, SPSS, etc.)
• Collaboration: Users should be able to share data and results according to certain rules (legal or organisational)
• Management tools (not implemented for the POC): RDC staff should be able to manage the network: project creation, output checking, etc.

Proof of concept: Context
• 3 institutions involved:
  - DESTATIS – GERMANY – WIESBADEN
  - GESIS – GERMANY – COLOGNE
  - GENES – FRANCE – PARIS
• A previous DwB study has shown that existing RDC IT infrastructures are very different and therefore cannot be securely connected.
• The POC was also designed and implemented without using any existing RDC's IT infrastructure (built from scratch)
• IT infrastructure implemented: GESIS node located in Cologne, DESTATIS node located in Wiesbaden, CASD node and the central node located in Paris

Proof of concept: Organisation
• September 2014: The decision to implement this POC was taken
• October 2014 – January 2015: Installation of the 4 infrastructures at CASD
• February 2015: The servers (nodes) were sent to GESIS and DESTATIS
• March 2015:
  - Physical installation of the servers
  - Connections to the central node
  - First tests and improvements

[Figure: DwB Proof of concept diagram]

DEMONSTRATION SCENARIO

Scenario overview
• The project is a study on female employment and opinions on gender roles, comparing France and Germany.
• The project involves 2 researchers in this fictional scenario:
  - Jara from Gesis, as a researcher from the Humboldt University in Berlin.
  - Kamel from CASD, as a researcher from Toulouse School of Economics.
• It requires access to confidential microdata from INSEE provided at CASD in Paris and from DESTATIS in Germany, as well as microdata provided at GESIS in Cologne.
• Jara does not speak French and is not very familiar with the French context of childcare for young children.
• Jara works with SPSS and Kamel uses SAS.

Scenario context
• Rules for the demonstration scenario:
  - Rule 1: Official German microdata must not be transferred to another country (legal constraint)
  - Rule 2: French microdata can be transferred to Germany in a secure environment
  - Rule 3: Gesis research microdata can be transferred to France or Destatis in a secure environment
• Access point rules:
  - Jara and Kamel use the same thin client (DwB SDBox)
  - DESTATIS allows access for residents as well as non-residents, but only from accredited points of access in Germany. Humboldt University is accredited.
  - France allows access from researchers' own offices in France as well as in other European countries

Scenario context
• Different phases for access:
  - Accreditation level (each institution/country has its own rules): Jara and Kamel have both obtained accreditation in Germany and in France
  - Service providing level (after getting accreditation): researchers working on a network of RDCs, which is the topic of the demonstration

Microdata needed
• For the demonstration, only public use files will be used (for legal reasons):
  - The Microcensus for employment information (DESTATIS), available at DESTATIS
  - The French LFS for employment information (INSEE), available at CASD
  - The EVS for attitudes information, available at GESIS
• For that reason, data will be merged at region level in the demonstration.
• DESTATIS and INSEE provide only factual microdata, which will be merged with the GESIS EVS microdata in this project.
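A region-level merge of the kind described above can be sketched as follows. This is only an illustration under stated assumptions: the field names (`region`, `employed`, `gender_role_score`), the region codes, and the toy records are all hypothetical and do not come from the Microcensus, the LFS, or the EVS. In the demonstration the researchers work in SPSS and SAS; Python is used here purely for readability.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_region(records, value_field):
    """Average `value_field` over all records sharing a region code."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["region"]].append(rec[value_field])
    return {region: mean(values) for region, values in groups.items()}


def merge_on_region(attitudes, employment):
    """Inner join of two region-level tables on the region code."""
    common = sorted(attitudes.keys() & employment.keys())
    return {
        region: {
            "gender_role_score": attitudes[region],
            "female_employment_rate": employment[region],
        }
        for region in common
    }


# Hypothetical toy records standing in for the three public use files.
evs = [
    {"region": "DE3", "gender_role_score": 2.0},
    {"region": "DE3", "gender_role_score": 4.0},
    {"region": "FR1", "gender_role_score": 3.0},
]
microcensus = [
    {"region": "DE3", "employed": 1},
    {"region": "DE3", "employed": 0},
]
lfs = [
    {"region": "FR1", "employed": 1},
    {"region": "FR1", "employed": 1},
    {"region": "FR2", "employed": 0},  # no EVS match, dropped by the join
]

attitudes = aggregate_by_region(evs, "gender_role_score")
employment = aggregate_by_region(microcensus + lfs, "employed")
merged = merge_on_region(attitudes, employment)
```

Because each source is first reduced to one row per region, no individual-level linkage between the files is needed; regions present in only one source are simply left out of the merged table.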
DEMONSTRATION LIVE (20 MINUTES)

DwB Proof of concept
Workflows animation

Demonstration scenario
1 Connection and EVS transfer
2 Microcensus copy to common folder
3 EVS and Microcensus preparation
4 LFS copy to local folder
5 Preparation, translation and conversion
6 Blind transfer of LFS to common folder
7 EVS, LFS, Microcensus file merge > output

Demonstration: Story board
• 1 - Jara connects to VRE-D1 from Berlin and transfers EVS from Gesis to the common folder
• 2 - Jara copies Microcensus data to the common folder
• 3 - Jara prepares data from EVS and Microcensus with SPSS and waits for the French microdata
• 4 - Kamel connects to VRE-C2 in France from Toulouse, and copies the French LFS (metadata is only in French) to the local folder
• 5 - Kamel, using SAS, prepares, translates and converts microdata from the French LFS
• 6 - Kamel transfers the prepared data to the "write-only" common folder (Kamel will show that he cannot see datafiles inside the common folder)
• 7 - Jara merges the 3 files, does the analysis and produces the results

The final output
• The final output has to be checked by DESTATIS with the support of GESIS and CASD
• Jara gets the output outside the secure environment (by encrypted email, for example)

Conclusion of the POC
• The POC was designed in a collaborative way with several NSIs and data archives involved.
• Currently the POC does not include management tools for RDC staff and data producers, but it is possible to implement them (a question of time).
• The consortium contract for managing the network with different stakeholders was not defined.
• There were some issues to solve:
  - Installing servers inside different institutions.
  - Network and firewall configuration adjustments.
• There was strong involvement of all partners to make the POC successful.

Conclusion: the POC and afterwards
• The system has to be flexible enough to manage very different legal and organisational contexts.
• The system has to provide a real collaborative environment for researchers across Europe (because data are not harmonised and not always documented in English).
• The system has to be highly secure to gain the trust of many data producers. Such a system may encourage some countries to get more involved in cross-border collaboration.
• Such a system, once really implemented, would allow:
  - Better collaboration between researchers and data providers.
  - New possibilities to merge data from different institutions/countries and different domains.
  - A better way to secure microdata, which could lead to more data being made available.
• There is still a lot to do to implement a real production network.

Thanks for listening
Questions & comments
Contact: [email protected]
Website: http://www.dwbproject.org/