Name and organization
Have you worked with DDI before? (2 or 3)
If not, are you familiar with XML?
What kind of CAI systems do you use?

Goals for today
Introduction
• DDI 3 Background
• XML Background
• How DDI 3 documents survey instruments
Creating DDI3 and Documentation
• Manual Markup
• Using functionality from CAI systems
• Custom development
• Colectica
Discussion
• Questions and discussion
• Additional documentation activities

Data Documentation Initiative
DDI3 Background

Background
• Concept of DDI and definition of needs grew out of the data archival community
• Established in 1995 as a grant funded project initiated and organized by ICPSR
• Members:
  – Social Science Data Archives (US, Canada, Europe)
  – Statistical data producers (including US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada)
• February 2003 – Formation of DDI Alliance
  – Membership based alliance
  – Formalized development procedures
Copyright © 2008 GESIS

Origins of the DDI Alliance
• Versions 1.* and 2.* were developed by an informal network of individuals from the social science community and official statistics
  – Funding was through grants
• It was decided that a more formal organization would help to drive the development of the standard forward
  – Many new features were requested
  – The DDI Alliance was born to facilitate the development in a consistent and on-going fashion

Requirements for 3.0
• Improve and expand the machine-actionable aspects of the DDI to support programming and software systems
• Support CAI instruments through expanded description of the questionnaire (content and question flow)
• Support the description of data series (longitudinal surveys, panel studies, recurring waves, etc.)
• Support comparison, in particular comparison by design but also comparison after the fact (harmonization)
• Improve support for describing complex data files (record and file linkages)
• Provide improved support for geographic content to facilitate linking to geographic files (shape files, boundary files, etc.)

DDI 3.0 and the Data Life Cycle
• A survey is not a static process: it evolves dynamically over time and involves many agencies/individuals
• DDI 2.x is about archiving; DDI 3.0 spans the entire “life cycle”
• 3.0 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison)
• Also supports multilingual content, grouping, geography, and others
• 3.0 is extensible

Development of DDI 3.0
• 2004 – Acceptance of a new DDI paradigm
  – Lifecycle model
  – Shift from the codebook centric / variable centric model to capturing the lifecycle of data
  – Agreement on expanded areas of coverage
• 2005 – Presentation of schema structure
  – Focus on points of metadata creation and reuse
• 2006 – Presentation of first complete 3.0 model
  – Internal and public review
• 2007 – Vote to move to Candidate Version
  – Establishment of a set of use cases to test application and implementation
• 2008 – April: DDI 3.0 published

XML: Extensible Markup Language
Designed to transport and store data

XML Schemas, DDI Modules, and DDI Schemes
[Diagram: DDI modules, with the schemes shown under each module]
• Instance
• Study Unit
• Physical Instance
• DDI Profile
• Comparative
• Data Collection: Question Scheme, Control Construct Scheme, Interviewer Instruction Scheme
• Logical Product: Category Scheme, Code Scheme, Variable Scheme, NCube Scheme
• Physical Data Structure: Physical Structure Scheme, Record Layout Scheme
• Archive: Organization Scheme
• Conceptual Component: Concept Scheme, Universe Scheme, Geographic Structure Scheme, Geographic Location Scheme
• Reusable
• Ncube
• Inline ncube
• Tabular ncube
• Proprietary
• Dataset

Maintainable Schemes
Packages of reusable metadata maintained by a single agency
• Category Scheme
• Code Scheme
• Concept Scheme
• Control Construct Scheme
• Geographic Structure Scheme
• Geographic Location Scheme
• Interviewer Instruction Scheme
• Question Scheme
• NCube Scheme
• Organization Scheme
• Physical Structure Scheme
• Record Layout Scheme
• Universe Scheme
• Variable Scheme

Designed to Support Registries
• A “Registry” is a catalog of metadata resources
• Resource package
  – Structure to publish non-study-specific materials for reuse
• Extracting specified types of information into schemes
  – Universe, Concept, Category, Code, Question, Instrument, Variable, etc.
• Allowing for either internal or external references
  – Can include other schemes by reference and select only desired items
• Providing Comparison Mapping
  – Target can be external harmonized structure

Data Collection
• Methodology
• Question Scheme
• Question and Response Domain designed to support question banks
  – Question
  – Response domain
  – Question Scheme is a maintainable object
• Instrument – using Control Construct Scheme
• Coding Instructions
  – question to raw data
  – raw data to public file
• Interviewer Instructions
• Organization and flow of questions into Instrument
  – Used to drive systems like CASES and Blaise
• Coding Instructions
  – Reuse by Questions, Variables, and comparison

QuestionItem in DDI
QuestionItem: Opening tag & identification
QuestionText
NumericDomain
In a QuestionScheme
ControlConstructScheme with QuestionConstructs
An Instrument
Those all go in a DataCollection element
The DataCollection element goes in a StudyUnit, which goes in a DDIInstance or ResourcePackage

Create QuestionScheme and QuestionItems
Create ControlConstructScheme
Add QuestionReferences
Add control flow items to ControlConstructScheme
Include a main Sequence element
Create the Instrument Element
Add the main ControlConstructReference
Create the DDIInstance element
Create the StudyUnit element
Create the DataCollection element
Add the QuestionScheme, ControlConstructScheme, and Instrument to the DataCollection element
Check the XML document against the DDI schemas to see if we got it right.

We have DDI, now we need documentation

Custom Development
• MQDS
• Colectica

Michigan Questionnaire Documentation System (MQDS)
Sue Ellen Hansen
Nicole Kirgis

What Does MQDS Do?
• Facilitates automated documentation and harmonization of Blaise survey instruments and datasets
  – Extracts survey question metadata
  – Standardized format

Survey Question Metadata
• Question universe
• Variable name and label
• Question text
• Question variable text (fills)
• Data type
• Code values and code text
• Skip instructions
• etc.

MQDS Version 1
• Extracted metadata from Blaise data model as XML tagged data
• Provided user interface for selection of
  – Blaise files
  – Instrument questions and sections
  – Types of metadata to extract
  – Languages to display
  – Style sheet for generation of instrument documentation or codebook

Using MQDS V1 XML: Codebook in Five Languages
National Latino and Asian American Study
www.icpsr.umich.edu/CPES

MQDS Version 1
• Limitations
  – XML not DDI-compliant
    • DDI Version 2 did not have XML tags for all metadata provided by Blaise
    • Did not provide easy means of adding XML tags without becoming noncompliant
  – XML files for complex surveys can be very large (text files)
    • Entire files had to be processed in computer memory
    • Limited ability to fully automate documentation

DDI Version 3
• Released April 2008
• Focus on complete data lifecycle – going beyond the codebook
• Included extensions proposed by DDI working group on instrument design

Persistent Content of Question
Use of Question in Instrument
Question text
• Static
• Dynamic or variable
Order and routing
• Sequence / skip patterns
• Loops
Multiple-part question
Universe
Response domain
• Open
• Set categories
• Special types (date, time, etc.)
Analysis unit
Definitional text
Instructions

MQDS Version 3
• Joint SRC and ICPSR venture
• Goals:
  – Address version 2 limitations
    • Process Blaise instrument of any size
  – Exploit new elements and validate to the recently released DDI version 3 standard
  – Move from processing XML metadata in memory to streaming metadata to a relational database

MQDS Version 3 Relational Database: Import, Export, Transform
[Diagram: relational database (SQL Server / SQL Server Express) with three numbered steps]
1. Import: User specifies input files (location, file type, etc.)
2. Export: User specifies output files (location, language/locale, XML output options, etc.)
3. Transform: User specifies stylesheet selection criteria, type of output desired (html, rtf, pdf), etc.
Other diagram labels: XML (DDI 3), Relational Db, Blaise Datamodel (BMI), Blaise Database (BDB), Other File Types (e.g. SAS, SPSS, etc.), Database connection settings, DDI 3 elements not in *.bmi, Questionnaire, Codebook

MQDS Version 3
• Relational database
  – DDI compliant standardized tables
  – Flexibility for SRC and ICPSR to add extensions that meet their specific organizational needs
  – Allows
    • Automated documentation of any Blaise survey instrument
    • Importing and documenting data produced by other software
    • Lower cost development of other tools that facilitate editing and disseminating data

MQDS V3 Prototype: Exporting Language XML

MQDS Development
• Expect to release Summer 2009
• Working out a distribution plan for Blaise users
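
The MQDS Version 3 goal of moving from in-memory XML processing to streaming metadata into a relational database can be sketched roughly as follows. This is not MQDS code. MQDS targets SQL Server, while this sketch uses Python's built-in sqlite3 module and xml.etree.ElementTree.iterparse. The table layout, the sample XML, and the function name stream_questions_to_db are hypothetical. QuestionScheme, QuestionItem and QuestionText are element names from the slides, but DDI namespaces and the full nesting of a real DDI 3 document are left out for brevity.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample; real DDI 3 uses namespaces and deeper nesting.
SAMPLE_XML = """<QuestionScheme id="qs1">
  <QuestionItem id="q1"><QuestionText>How old are you?</QuestionText></QuestionItem>
  <QuestionItem id="q2"><QuestionText>How many people live here?</QuestionText></QuestionItem>
</QuestionScheme>"""


def stream_questions_to_db(xml_source, conn):
    """Insert one row per QuestionItem without first loading the whole file."""
    # Illustrative table layout, not the MQDS schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS question_item (id TEXT PRIMARY KEY, text TEXT)"
    )
    count = 0
    # iterparse yields each element once its closing tag has been read.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "QuestionItem":
            conn.execute(
                "INSERT INTO question_item (id, text) VALUES (?, ?)",
                (elem.get("id"), elem.findtext("QuestionText")),
            )
            count += 1
            # Discard the processed element's children and text to limit memory use.
            elem.clear()
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
inserted = stream_questions_to_db(io.BytesIO(SAMPLE_XML.encode("utf-8")), conn)
rows = conn.execute("SELECT id, text FROM question_item ORDER BY id").fetchall()
print(inserted, rows)
```

Clearing each element after its row is written is what keeps the working set small for large instruments. A production loader would also need to handle namespaces, batch its inserts, and cover the other metadata types listed on the slides.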