The Data Documentation Initiative (DDI)
What Is the DDI?
The Data Documentation Initiative (DDI) is an international effort to develop a specification for the content and structure of the metadata that describe the empirical data used in research in the social and behavioral sciences.
Empirical Data Include
- Surveys
- Censuses
- Administrative records
- Direct observations
- What else?
Standards already exist for:
- Images
- Diaries and letters
- Recordings (sound, video)
Why Was the DDI Developed?
- To replace existing documentation specifications:
- Insufficient structure and
- Loose semantic content (meaning)
- Many idiosyncratic and obsolete formats (e.g., OSIRIS)
- Standardization needed to enhance interchangeability
- Format that is resistant to corruption and misuse
- Persistent format
- Users have only one format to learn to use
- Take advantage of new technologies to enhance the usefulness of documentation
DDI and Traditional Codebooks
- DDI provides a way for codebooks
- to be created in a uniform, highly structured format
- that is easily and precisely searchable on the web,
- that lends itself well to simultaneous use of multiple datasets, and
- that will significantly improve the content and usability of metadata
Uses of DDI
- Transform an XML marked-up document to create:
- HTML for web pages
- Formatted text for printing or other display purposes
- Syntax files for statistical packages
- Importable text files for loading databases or library catalogs
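As a rough illustration of the transformations listed above, the following Python sketch reads a small marked-up codebook fragment and produces both an HTML definition list and statistical-package syntax (SPSS-style label commands) from the same source. The element names (var, labl, catgry, catValu) are modeled on the DDI, but the sample content and the function names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical codebook fragment; the variable and its categories are invented.
SAMPLE = """
<codeBook>
  <dataDscr>
    <var name="SEX">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""

def read_variables(xml_text):
    """Return a list of (name, label, [(value, value_label), ...]) tuples."""
    root = ET.fromstring(xml_text)
    variables = []
    for var in root.iter("var"):
        cats = [(c.findtext("catValu", ""), c.findtext("labl", ""))
                for c in var.findall("catgry")]
        variables.append((var.get("name"), var.findtext("labl", ""), cats))
    return variables

def to_html(variables):
    """Render variable names and labels as an HTML definition list."""
    parts = ["<dl>"]
    for name, label, _ in variables:
        parts.append(f"  <dt>{escape(name)}</dt><dd>{escape(label)}</dd>")
    parts.append("</dl>")
    return "\n".join(parts)

def to_spss_syntax(variables):
    """Render SPSS-style VARIABLE LABELS and VALUE LABELS commands."""
    lines = []
    for name, label, cats in variables:
        lines.append(f'VARIABLE LABELS {name} "{label}".')
        if cats:
            pairs = " ".join(f'{value} "{text}"' for value, text in cats)
            lines.append(f"VALUE LABELS {name} {pairs}.")
    return "\n".join(lines)

if __name__ == "__main__":
    variables = read_variables(SAMPLE)
    print(to_html(variables))
    print(to_spss_syntax(variables))
```

The point of the sketch is that one source document feeds several outputs; a production workflow would more likely use XSLT stylesheets or a dedicated DDI tool.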
Other Uses of DDI
- Enhance searches using tags to identify relevant information; search by topic across sources
- Create “families” of comparable collections across data sources related by geography, time, or subject
- Integrate heterogeneous, distributed data sources under a single user interface
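A tag-aware search across sources might look something like the sketch below. It assumes two tiny, invented codebooks held in a dictionary and searches only the text inside variable labels, which is the kind of precision that markup makes possible. The source names, variables, and the search_variable_labels function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two invented codebooks standing in for separate data sources.
CODEBOOKS = {
    "survey_a": (
        "<codeBook><dataDscr>"
        "<var name='V1'><labl>Household income</labl></var>"
        "<var name='V2'><labl>Age of respondent</labl></var>"
        "</dataDscr></codeBook>"
    ),
    "survey_b": (
        "<codeBook><dataDscr>"
        "<var name='INC'><labl>Personal income last year</labl></var>"
        "<var name='REG'><labl>Region of residence</labl></var>"
        "</dataDscr></codeBook>"
    ),
}

def search_variable_labels(codebooks, term):
    """Find variables whose label contains `term`, across all codebooks."""
    term = term.lower()
    hits = []
    for source, xml_text in codebooks.items():
        root = ET.fromstring(xml_text)
        for var in root.iter("var"):
            label = var.findtext("labl", "")
            if term in label.lower():
                hits.append((source, var.get("name"), label))
    return hits

if __name__ == "__main__":
    for hit in search_variable_labels(CODEBOOKS, "income"):
        print(hit)
```

Because the search is restricted to the label element, a match on "income" in, say, a funding acknowledgment elsewhere in the documentation would not be returned.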
HTML Basics
- HyperText Markup Language consists of a series of text tags used for presentation purposes and linking
- Most HTML tags travel in pairs and take the form <table>…</table>
- Some tags are English words; others are abbreviations (e.g., <b>…</b>)
- Browsers interpret the tags and present the page according to HTML standards
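To make the idea of paired tags concrete, the sketch below uses Python's standard html.parser module to log each opening and closing tag in a short HTML fragment. The TagLogger class and tag_events function are names invented for this illustration; a browser does far more than this when it presents a page.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record every opening and closing tag in the order it is seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

def tag_events(html_text):
    """Return the list of (kind, tag) events found in html_text."""
    logger = TagLogger()
    logger.feed(html_text)
    logger.close()
    return logger.events

if __name__ == "__main__":
    for event in tag_events("<p>Some <b>bold</b> text</p>"):
        print(event)
```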
XML Concepts
- XML stands for eXtensible Markup Language
- XML is a markup language much like HTML
- XML was designed to describe data of all forms
- XML tags are not predefined in XML as in HTML
- You must define your own tags
- XML may use a DTD (Document Type Definition) to formally describe the data
- Alternative modes of expressing XML DDI in future
- Resource Description Framework
- XML Schema
- ?
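The following minimal sketch shows what "defining your own tags" means in practice. It builds a small XML document with Python's standard xml.etree.ElementTree module; the tag names (study, title, sampleSize) and the values are invented for the example and are not DDI elements.

```python
import xml.etree.ElementTree as ET

# Tags are chosen by the author to describe the data, not its appearance.
study = ET.Element("study")
ET.SubElement(study, "title").text = "Example Household Survey"
ET.SubElement(study, "sampleSize", units="households").text = "1200"

# Serialize to text, then parse it back to show the round trip.
xml_text = ET.tostring(study, encoding="unicode")
parsed = ET.fromstring(xml_text)

if __name__ == "__main__":
    print(xml_text)
    print(parsed.findtext("title"))
```

Nothing in XML itself says what sampleSize means; a shared specification such as the DDI is what gives a set of tags an agreed meaning.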
Why XML?
- Designed for use on WWW
- Key to discovery and dissemination
- Hardware & software independent
- Non-proprietary
- Interoperable across a wide range of platforms and computing sites
- Separates content from format
- Simplifies multiple uses of same source document
- Markup is plain text
- Human readable
- Easier to preserve than non-text formats
- Migrates across hardware and software platforms
- Markup makes it easier for software to locate information, creating tremendous possibilities
- XML documents are flexible and modular
- Could support possibilities we have not even anticipated
Document Type Definitions (DTDs)
- A DTD is currently the mode of expressing the DDI's XML specification
- DTDs provide a set of rules for determining if a particular document is valid
- Describe the structure and syntax of an XML document
- Allow validation and limit the content: what elements are allowed, required, disallowed, defaulted.
- DTD makes a specialized vocabulary such as DDI sharable by publishing and enforcing a set of rules
- A document can be parsed against the DTD to ascertain compliance with the DDI
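Python's standard library parses XML but does not validate it against a DTD, so the sketch below hand-codes a tiny rule table to imitate the kind of check a DTD makes possible: which child elements are required and which are allowed. The RULES table is a simplified assumption loosely modeled on the DDI, not the actual DTD, and a real workflow would use a validating parser with the published DTD instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified rules: parent tag -> (required children, optional children).
RULES = {
    "codeBook": ({"stdyDscr"}, {"docDscr", "fileDscr", "dataDscr", "otherMat"}),
    "stdyDscr": ({"citation"}, {"stdyInfo", "method", "dataAccs"}),
}

def check(xml_text):
    """Return a list of rule violations; an empty list means no problems found."""
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag not in RULES:
            continue  # no rules recorded for this element
        required, optional = RULES[elem.tag]
        children = [child.tag for child in elem]
        for tag in sorted(required - set(children)):
            problems.append(f"<{elem.tag}> is missing required <{tag}>")
        for tag in children:
            if tag not in required | optional:
                problems.append(f"<{elem.tag}> may not contain <{tag}>")
    return problems

if __name__ == "__main__":
    print(check("<codeBook><stdyDscr><citation/></stdyDscr></codeBook>"))
    print(check("<codeBook><docDscr/></codeBook>"))
```

This sketch ignores element order, repetition, and attributes, all of which a DTD can also constrain.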
The Dublin Core
- Very general metadata element set intended to facilitate the discovery of all electronic resources
- Fifteen elements only
- Roughly the complexity of a library catalog card
- MARC record in Library of Congress
- DDI is intended to comply with the Dublin Core
The Dublin Core Elements
- Title
- Creator
- Subject
- Description
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
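A record using these fifteen elements could be assembled along the lines of the sketch below, which writes each supplied value into the standard Dublin Core element namespace. The wrapping record element, the to_dublin_core function, and the sample metadata are assumptions for illustration; how a DDI document actually maps to Dublin Core is not shown here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen Dublin Core elements, in the order listed above.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

def to_dublin_core(metadata):
    """Build a record holding only the keys that are Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        if name in metadata:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = metadata[name]
    return record

if __name__ == "__main__":
    sample = {"title": "Example Study", "creator": "Example Archive"}
    print(ET.tostring(to_dublin_core(sample), encoding="unicode"))
```

Keys that are not among the fifteen elements are silently skipped in this sketch; a stricter version might raise an error instead.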
Major Components of DDI
- Document Description
- Study Description
- File Description
- Data Description
- Related Material
[Figure: DDI Structural Diagram]
Section 1: Document Description
Citation – Bibliographic information describing the marked-up document
Guide – Terms and definitions used in the documentation
Status – Indicator of whether the documentation is a pre-release or final version
Source – Citation for the source of the marked-up documentation
Section 2: Study Description
Citation – Bibliographic information for the data collection (not the documentation)
Scope – Information about the study’s subject, geographic & temporal coverage
Methodology & Process – Information about how the data were collected (e.g., sample design)
Data Access – Access conditions & terms of use for the data collection
Other Study Description Materials
Section 3: File Description
Name, contents, structure, and dimensions (e.g., number of cases and record lengths) of each file in the collection
Section 4: Data Description
Variable Group – Combines variables that share a common subject, are coded from a single question, or are linked in some other way
Variable – Name, weighting, valid/invalid ranges, etc.
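The relationship between variable groups and variables can be sketched as follows. The fragment assumes that a group lists its members by ID in a var attribute, in the manner of the DDI's varGrp element; the IDs, names, and labels are invented, and the group_members function is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented Data Description fragment: one group pointing at two of three variables.
SAMPLE = """
<dataDscr>
  <varGrp ID="G1" var="V1 V2"><labl>Demographics</labl></varGrp>
  <var ID="V1" name="AGE"><labl>Age in years</labl></var>
  <var ID="V2" name="SEX"><labl>Sex of respondent</labl></var>
  <var ID="V3" name="WT"><labl>Sampling weight</labl></var>
</dataDscr>
"""

def group_members(xml_text):
    """Map each group label to the names of the variables it references."""
    root = ET.fromstring(xml_text)
    names_by_id = {v.get("ID"): v.get("name") for v in root.iter("var")}
    groups = {}
    for grp in root.iter("varGrp"):
        ids = grp.get("var", "").split()
        groups[grp.findtext("labl", "")] = [
            names_by_id[i] for i in ids if i in names_by_id
        ]
    return groups

if __name__ == "__main__":
    print(group_members(SAMPLE))
```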
Section 5: Other Material
Allows for the inclusion of other materials that are related to the study as identified and labeled by the person doing the mark-up (e.g., bibliography, reports, methodological documents, etc.)
Provides a "container" for other machine-readable materials such as data definition statements, PDF, or scanned facsimiles of the codebook, etc.
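Taken together, the five sections form the top-level skeleton of a marked-up codebook. The sketch below builds that empty skeleton in Python, using element names modeled on the DDI (codeBook, docDscr, stdyDscr, fileDscr, dataDscr, otherMat); the helper functions are assumptions for illustration, and a real document would fill each section with the content described above.

```python
import xml.etree.ElementTree as ET

# (element name, section title) pairs, in document order.
SECTIONS = [
    ("docDscr", "Document Description"),
    ("stdyDscr", "Study Description"),
    ("fileDscr", "File Description"),
    ("dataDscr", "Data Description"),
    ("otherMat", "Other Material"),
]

def build_skeleton():
    """Create an empty codebook with one child element per section."""
    code_book = ET.Element("codeBook")
    for tag, _title in SECTIONS:
        ET.SubElement(code_book, tag)
    return code_book

def describe(code_book):
    """Return a numbered, human-readable outline of the sections present."""
    titles = dict(SECTIONS)
    return [
        f"{number}. {titles[child.tag]} <{child.tag}>"
        for number, child in enumerate(code_book, start=1)
    ]

if __name__ == "__main__":
    for line in describe(build_skeleton()):
        print(line)
```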
What Does the DDI Not Cover?
- Display criteria
- Authoring software
- Parsing software
- Editing software
- Statistical software
- Data extraction software
- In other words, the DDI is a metadata standard. Period.
Who is Using DDI?
- Networked Social Science Tools and Resources, European Union (NESSTAR)
- Survey Documentation and Analysis, Computer Survey Methods Program, UC Berkeley (SDA)
- Federal Electronic Research and Review Extraction Tool (FERRET)
Who Else is Using DDI?
- Virtual Data Center Project, Harvard-MIT Data Center (VDC)
- Counting California
- Statistical Data Collection, University of Virginia Library
and ICPSR and the Roper Center
- Standardize Guide or Catalog elements
- Integrate Guide or Catalog with variable-level databases (e.g., iPOLL, the Roper Center question database), mapping each to the same standard
- Collaborate with other data archives
- One step closer to offering on-line data extraction and analysis for lots of data
What Are the Next Steps for Archives?
- Become familiar with DDI elements on the study level
- Select subset of DDI elements at other levels that are most relevant to the collections
- Map the finding aid(s) to the DDI elements
- Use XML wrappers to create XML pages dynamically from database information
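The last step, generating XML dynamically from database information, might be sketched as below. The example assumes a hypothetical studies table in an in-memory SQLite database and wraps each row in study-level elements modeled on the DDI (stdyDscr, citation, titlStmt, titl, prodStmt, producer). The table layout, the catalog wrapper element, and the studies_to_xml function are all assumptions; an archive would substitute its own finding-aid fields and its own mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

def studies_to_xml(conn):
    """Wrap each row of the (hypothetical) studies table in study-level markup."""
    root = ET.Element("catalog")  # hypothetical wrapper for several studies
    rows = conn.execute(
        "SELECT study_id, title, producer FROM studies ORDER BY study_id"
    )
    for study_id, title, producer in rows:
        stdy = ET.SubElement(root, "stdyDscr", ID=f"S{study_id}")
        citation = ET.SubElement(stdy, "citation")
        titl_stmt = ET.SubElement(citation, "titlStmt")
        ET.SubElement(titl_stmt, "titl").text = title
        prod_stmt = ET.SubElement(citation, "prodStmt")
        ET.SubElement(prod_stmt, "producer").text = producer
    return root

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE studies (study_id INTEGER, title TEXT, producer TEXT)")
    conn.execute("INSERT INTO studies VALUES (1, 'Example Election Study', 'Example Archive')")
    print(ET.tostring(studies_to_xml(conn), encoding="unicode"))
```

Generating the markup on request means the database remains the single source of record, and the XML never drifts out of date.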
Next Steps for DDI
- More flexible mode of implementation
- Do not relax vigilance for full compliance
- Provide for user extensibility
- Provide better metadata for
- Complex file structures
- Data for comparative and longitudinal research
- Training materials!!!!!
Major Funding Organizations
Thanks to Our Funders!
- U.S. National Science Foundation
- Health Canada
- Inter-university Consortium for Political and Social Research
Significant In-kind Support
- CESSDA and IFDO member archives
- Individual members of IASSIST
- Colleges and universities
- Agencies of U.S. Federal Government
- NESSTAR
Alliance for the Data Documentation Initiative
- Formed to provide long-term infrastructure for the DDI
- Created, funded, and guided by stakeholders in the DDI
- Data producers
- Data archives and libraries
- Data users
- Secretariat at ICPSR
Organizing Institutions of the Alliance
- Inter-university Consortium for Political and Social Research (ICPSR), host institution
- Roper Center for Public Opinion Research, host institution
- Council of European Social Science Data Archives (CESSDA)
- International Association of Social Science Information Service and Technology (IASSIST)
- International Federation of Data Organizations (IFDO)