PapersFlow Research Brief

Data Quality and Management
Research Guide

What is Data Quality and Management?

Data Quality and Management is the cluster of techniques for assessing, improving, and maintaining data quality. It covers record linkage, data cleaning, entity resolution, information quality benchmarks, privacy-preserving record linkage, name disambiguation, data integration, and the challenges that big data poses for each of these.

This field encompasses 61,971 works focused on data quality assessment and improvement methods such as duplicate detection and string similarity measures. Key contributions include frameworks like FAIR principles for data stewardship and models for privacy protection such as k-anonymity. Data consumers define quality beyond accuracy to include broader dimensions like accessibility and timeliness.

Topic Hierarchy

Social Sciences > Decision Sciences > Management Science and Operations Research > Data Quality and Management
Papers: 62.0K · 5yr Growth: N/A · Total Citations: 400.5K

Why It Matters

Data Quality and Management underpins reliable data sharing in healthcare. The REDCap consortium built an international community of software platform partners around secure data collection for clinical research, and the paper describing it has been cited 21,869 times (Harris et al., 2019). In scientific research, the FAIR Guiding Principles aim to make data findable, accessible, interoperable, and reusable, supporting reproducibility across disciplines; the paper introducing them has 16,387 citations (Wilkinson et al., 2016). Privacy models such as k-anonymity allow hospitals and banks to release person-specific data to researchers with scientific guarantees against re-identification (Sweeney, 2002; cited 8,343 times). Poor data quality also affects business decisions: Wang and Strong (1996), cited 4,344 times, identify multiple dimensions beyond accuracy that shape economic outcomes.

Reading Guide

Where to Start

Start with 'Beyond Accuracy: What Data Quality Means to Data Consumers' by Wang and Strong (1996). It provides a foundational, consumer-focused framework of data quality dimensions, has 4,344 citations, and is accessible before moving on to technical methods.

Key Papers Explained

Wang and Strong (1996), in 'Beyond Accuracy: What Data Quality Means to Data Consumers', establish multiple quality dimensions. Wilkinson et al. (2016), in 'The FAIR Guiding Principles for scientific data management and stewardship', operationalize these through findability and reusability standards. Sweeney (2002), in 'k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY', builds privacy protections compatible with such data sharing. Bizer, Heath, and Berners-Lee (2009), in 'Linked Data - The Story So Far', extend integration practices. Chen (2002), in 'The Entity Relationship Model — Toward a Unified View of Data', supplies the structural foundation.

Paper Timeline

1995 · Bayesian Data Analysis · 13.7K cites
2002 · k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY · 8.3K cites
2002 · The Entity Relationship Model — Toward a Unified View of Data · 5.8K cites
2009 · Linked Data - The Story So Far · 4.5K cites
2012 · Business Intelligence and Analytics: From Big Data to Big Impact · 5.8K cites
2016 · The FAIR Guiding Principles for scientific data management and stewardship · 16.4K cites
2019 · The REDCap consortium: Building ... · 21.9K cites

Papers are ordered chronologically. The most-cited paper is The REDCap consortium (2019).

Advanced Directions

Current work emphasizes scalability in big data cleaning, duplicate detection, and string similarity, themes that run through the field's 61,971 papers. No recent preprints or news items are available for this topic, so the frontiers identified here are privacy-preserving entity resolution and information quality benchmarks for social science applications.

Papers at a Glance

# | Paper | Year | Venue | Citations
1 | The REDCap consortium: Building an international community of ... | 2019 | Journal of Biomedical ... | 21.9K
2 | The FAIR Guiding Principles for scientific data management and... | 2016 | Scientific Data | 16.4K
3 | Bayesian Data Analysis | 1995 | | 13.7K
4 | k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY | 2002 | International Journal ... | 8.3K
5 | Business Intelligence and Analytics: From Big Data to Big Impact | 2012 | MIS Quarterly | 5.8K
6 | The Entity Relationship Model — Toward a Unified View of Data | 2002 | | 5.8K
7 | Linked Data - The Story So Far | 2009 | International Journal ... | 4.5K
8 | Beyond Accuracy: What Data Quality Means to Data Consumers | 1996 | Journal of Management ... | 4.3K
9 | Software Framework for Topic Modelling with Large Corpora | 2010 | | 3.8K
10 | Journal of Statistical Software | 2009 | Wiley Interdisciplinar... | 3.6K

Frequently Asked Questions

What are the FAIR Guiding Principles?

The FAIR Guiding Principles are guidelines for scientific data management and stewardship that make data findable, accessible, interoperable, and reusable. Wilkinson et al. (2016) introduced them in 'The FAIR Guiding Principles for scientific data management and stewardship,' which has 16,387 citations. These principles support global data integration in research.

How does k-anonymity protect privacy in data sharing?

k-anonymity is a model that protects privacy by ensuring each record in released data is indistinguishable from at least k-1 other records. Sweeney (2002) defined it in 'k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY,' allowing data holders like hospitals to share field-structured data with researchers while providing guarantees against re-identification. The paper has 8,343 citations.
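
The definition above can be checked mechanically. The following is a minimal sketch, not Sweeney's own procedure: it assumes the data holder has already chosen which columns act as quasi-identifiers, and the field names (`zip`, `age_band`, `diagnosis`), the function names, and the toy records are all hypothetical.

```python
from collections import Counter


def k_anonymity_level(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values on every quasi-identifier column."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True when every record matches at least k-1 others on the quasi-identifiers."""
    return k_anonymity_level(records, quasi_identifiers) >= k


# Hypothetical release: ZIP codes and ages were generalized beforehand.
released = [
    {"zip": "021**", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age_band": "40-49", "diagnosis": "flu"},
    {"zip": "021**", "age_band": "40-49", "diagnosis": "diabetes"},
]
QI = ["zip", "age_band"]  # assumed quasi-identifier columns

print(k_anonymity_level(released, QI))  # 2
print(is_k_anonymous(released, QI, 3))  # False
```

The check only measures group sizes on the chosen columns; deciding which columns are quasi-identifiers and how to generalize them is a separate step that this sketch does not cover.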

What data quality dimensions matter to consumers?

Data consumers judge quality on more than accuracy, including dimensions such as completeness, timeliness, and accessibility. Wang and Strong (1996) showed in 'Beyond Accuracy: What Data Quality Means to Data Consumers' (4,344 citations) that a narrow focus on accuracy misses these broader dimensions. This wider view guides improvement efforts in organizations.
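
Two of these dimensions are easy to turn into numbers. The sketch below is one possible scoring scheme, not the measures used by Wang and Strong: it assumes completeness means the share of required fields that are filled in and timeliness means the share of records updated within a chosen window. The field names, the 90-day window, and the sample records are hypothetical.

```python
from datetime import date


def completeness(records, fields):
    """Fraction of required field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total


def timeliness(records, as_of, max_age_days, date_field="updated"):
    """Fraction of records last updated within max_age_days of as_of."""
    if not records:
        return 1.0
    fresh = sum(1 for r in records if (as_of - r[date_field]).days <= max_age_days)
    return fresh / len(records)


# Hypothetical customer records.
customers = [
    {"name": "Ada", "email": "ada@example.org", "updated": date(2024, 1, 10)},
    {"name": "Bo", "email": "", "updated": date(2023, 1, 5)},
]

print(completeness(customers, ["name", "email"]))   # 0.75
print(timeliness(customers, date(2024, 2, 1), 90))  # 0.5
```

A dataset can score well on one dimension and poorly on another, which is why a single accuracy figure gives an incomplete picture.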

What is record linkage in data management?

Record linkage identifies and links records in different databases that refer to the same entity, often using techniques such as string similarity measures and duplicate detection. The field includes privacy-preserving methods and entity resolution, and it is central to the 61,971 works in data quality and management. Work on name disambiguation addresses related challenges in big data integration.
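
A small example of linkage by string similarity follows. It is a sketch under stated assumptions rather than a production method: it uses the ratio from Python's standard `difflib.SequenceMatcher` as the similarity measure, compares every pair of records without any blocking, and applies an arbitrary threshold of 0.85. The record layout, identifiers, and names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop periods and commas, and collapse repeated whitespace."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record,
    keeping the pair only when the score reaches the threshold."""
    links = []
    for rec in left:
        best = max(right, key=lambda cand: similarity(rec["name"], cand["name"]),
                   default=None)
        if best is not None and similarity(rec["name"], best["name"]) >= threshold:
            links.append((rec["id"], best["id"]))
    return links


# Hypothetical records from two sources.
registry = [{"id": "A1", "name": "Jonathan Smith"},
            {"id": "A2", "name": "Maria Garcia"}]
claims = [{"id": "B7", "name": "jonathan  smith."},
          {"id": "B9", "name": "Wei Chen"}]

print(link_records(registry, claims))  # [('A1', 'B7')]
```

Comparing every pair costs time proportional to the product of the two table sizes, which is one reason scalability appears among the field's open problems.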

How do Linked Data principles support data quality?

Linked Data provides best practices for publishing and connecting structured data on the Web, creating a global data space with billions of assertions. Bizer, Heath, and Berners-Lee (2009) described this in 'Linked Data - The Story So Far' (4,533 citations). Interoperability of this kind supports data integration and, through it, data quality.

What role does the Entity Relationship Model play?

The Entity Relationship Model offers a unified view of data for design and integration. Chen (2002) presented it in 'The Entity Relationship Model — Toward a Unified View of Data,' cited 5,769 times; the 2002 entry is a reprint of a paper first published in 1976. The model supports data quality by standardizing structures in management systems.
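
To illustrate how explicit entities and relationships help with quality control, the sketch below models two entity types and one relationship type as Python dataclasses and reports relationships that point at missing entities. The entity names (`Author`, `Paper`, `Authorship`), the function, and the sample data are hypothetical and chosen only for illustration; they are not taken from Chen's paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:  # entity
    author_id: str
    name: str


@dataclass(frozen=True)
class Paper:  # entity
    paper_id: str
    title: str


@dataclass(frozen=True)
class Authorship:  # relationship between an Author and a Paper
    author_id: str
    paper_id: str


def dangling_relationships(authors, papers, authorships):
    """Return relationships that reference an author or paper that does not exist."""
    author_ids = {a.author_id for a in authors}
    paper_ids = {p.paper_id for p in papers}
    return [rel for rel in authorships
            if rel.author_id not in author_ids or rel.paper_id not in paper_ids]


# Hypothetical data with one broken reference ("a2" is not a known author).
authors = [Author("a1", "Author One")]
papers = [Paper("p1", "Example Paper")]
authorships = [Authorship("a1", "p1"), Authorship("a2", "p1")]

print(dangling_relationships(authors, papers, authorships))
```

Once the structure is declared this way, a referential integrity check becomes a simple set lookup instead of an ad hoc search through free-form records.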

Open Research Questions

  • How can privacy-preserving record linkage scale to big data volumes while maintaining linkage accuracy?
  • What metrics best capture data quality dimensions beyond accuracy for diverse consumer needs?
  • How do entity resolution techniques handle name disambiguation in multilingual datasets?
  • What integration methods resolve conflicts in linked data from heterogeneous sources?
  • How can FAIR principles be automated in data management pipelines for real-time stewardship?
