PapersFlow Research Brief
Data Quality and Management
Research Guide
What is Data Quality and Management?
Data Quality and Management is the cluster of techniques for assessing, improving, and maintaining data quality. It spans record linkage, data cleaning, entity resolution, information quality benchmarks, privacy-preserving record linkage, name disambiguation, data integration, and the quality challenges posed by big data.
This field encompasses 61,971 works focused on data quality assessment and improvement methods such as duplicate detection and string similarity measures. Key contributions include frameworks like FAIR principles for data stewardship and models for privacy protection such as k-anonymity. Data consumers define quality beyond accuracy to include broader dimensions like accessibility and timeliness.
Topic Hierarchy
Research Sub-Topics
Record Linkage
Covers probabilistic, deterministic, and machine learning approaches to linking records across databases. Researchers evaluate accuracy in large-scale datasets.
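The two classic linkage strategies can be sketched in a few lines. This is a minimal illustration, not a production linker: the field names, weights, and records are invented for the example, and the weighted-agreement score only gestures at Fellegi-Sunter-style probabilistic linkage.

```python
# Deterministic linkage: link only when an exact key (e.g. a national ID) agrees.
# Probabilistic-style linkage: sum agreement weights over compared fields.
# Field names and weights below are illustrative, not from any real schema.

WEIGHTS = {"name": 2.0, "dob": 3.0, "zip": 1.0}  # hypothetical agreement weights

def deterministic_match(a, b):
    """Link only if both records share the same non-missing exact key."""
    return a.get("id") is not None and a.get("id") == b.get("id")

def probabilistic_score(a, b, weights=WEIGHTS):
    """Higher total agreement weight = more likely the records link."""
    return sum(w for f, w in weights.items() if a.get(f) == b.get(f))

rec1 = {"id": None, "name": "Ann Lee", "dob": "1980-01-02", "zip": "12345"}
rec2 = {"id": None, "name": "Ann Lee", "dob": "1980-01-02", "zip": "99999"}

print(deterministic_match(rec1, rec2))   # False: no shared exact key
print(probabilistic_score(rec1, rec2))   # 5.0: name and dob agree, zip does not
```

In practice the score is compared against learned upper and lower thresholds to classify pairs as links, non-links, or candidates for clerical review.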
Entity Resolution
Focuses on resolving duplicates and merging entities in structured and unstructured data. Studies blocking, matching, and clustering techniques for scalability.
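Blocking is the main lever for scalability: a cheap key partitions records so that expensive pairwise matching runs only within partitions. A minimal sketch, with an invented blocking key (surname initial plus ZIP code):

```python
# Blocking for entity resolution: group records by a cheap key so pairwise
# comparison is limited to within-block pairs. The key choice is illustrative.
from collections import defaultdict
from itertools import combinations

def block_key(rec):
    return (rec["surname"][0].upper(), rec["zip"])

def candidate_pairs(records):
    blocks = defaultdict(list)
    for r in records:
        blocks[block_key(r)].append(r)
    for group in blocks.values():
        yield from combinations(group, 2)  # compare only within a block

records = [
    {"surname": "Smith", "zip": "12345"},
    {"surname": "Smyth", "zip": "12345"},
    {"surname": "Jones", "zip": "54321"},
]
print(len(list(candidate_pairs(records))))  # 1 candidate pair instead of 3
```

The trade-off is recall: a blocking key that is too strict can separate true duplicates, which is why real systems often union candidates from several keys.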
Data Cleaning
Examines automated detection and correction of errors, outliers, and inconsistencies in datasets. Researchers develop tools for error localization and repair.
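One common error-localization rule is the interquartile-range (IQR) test for numeric outliers, with repair by clipping to the fences. A minimal sketch using the standard Tukey 1.5 x IQR convention; the data is invented:

```python
# IQR-based outlier detection and repair: flag values outside
# [Q1 - 1.5*IQR, Q3 + 1.5*IQR] and clip them to those fences.
import statistics

def iqr_fences(values, k=1.5):
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clean(values):
    lo, hi = iqr_fences(values)
    return [min(max(v, lo), hi) for v in values]

data = [10, 12, 11, 13, 12, 11, 250]  # 250 is a likely data-entry error
lo, hi = iqr_fences(data)
print([v for v in data if v < lo or v > hi])  # flags the outlier
```

Clipping is only one repair policy; depending on the domain, flagged values may instead be imputed, sent for manual review, or dropped.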
Name Disambiguation
Addresses author and entity name variants using similarity metrics and supervised learning. Applied in bibliometrics and citation analysis.
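A basic building block is a normalized string-similarity score between name variants. The sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in for the metrics (Jaro-Winkler, edit distance) more common in the bibliometrics literature; the names are invented:

```python
# Name-variant similarity: normalize (lowercase, strip punctuation, collapse
# whitespace), then score with difflib's ratio in [0, 1].
from difflib import SequenceMatcher

def name_similarity(a, b):
    def norm(s):
        return " ".join(s.lower().replace(".", "").split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

print(name_similarity("J. Smith", "John Smith"))  # high: likely the same author
print(name_similarity("J. Smith", "A. Jones"))    # low: likely different authors
```

Supervised disambiguators typically combine such scores with non-name features (coauthors, venues, affiliations) before clustering name mentions into author identities.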
Privacy-Preserving Record Linkage
Develops cryptographic and secure multi-party computation methods for linking without revealing sensitive data. Balances utility with privacy guarantees.
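The core idea can be shown with keyed hashing: both parties encode a normalized identifier with a shared secret key and compare only the digests, so raw values never leave either site. This is a simplified sketch; real protocols add Bloom-filter encodings or secure multi-party computation, and the salt and names here are invented:

```python
# Privacy-preserving comparison via HMAC: parties exchange digests, not names.
import hmac
import hashlib

SHARED_SALT = b"agreed-out-of-band"  # illustrative; exchanged securely in practice

def encode(identifier):
    norm = identifier.strip().lower().encode()
    return hmac.new(SHARED_SALT, norm, hashlib.sha256).hexdigest()

hospital_a = encode("Ann Lee ")
hospital_b = encode("ann lee")
print(hospital_a == hospital_b)  # True: records link, name stays private
```

Exact-hash matching is brittle against typos, which is why the literature moves to error-tolerant encodings; it is also why the normalization step matters so much in these protocols.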
Why It Matters
Data Quality and Management enables reliable data sharing in healthcare, as shown by the REDCap consortium, which built an international community of software platform partners for secure clinical data collection; the consortium paper has been cited 21,869 times (Harris et al., 2019). In scientific research, the FAIR Guiding Principles make data findable, accessible, interoperable, and reusable, improving reproducibility across disciplines; the paper has accrued 16,387 citations (Wilkinson et al., 2016). Privacy models like k-anonymity allow hospitals and banks to release person-specific data to researchers with provable guarantees against re-identification (Sweeney, 2002; 8,343 citations). Poor data quality also distorts business decisions: Wang and Strong (1996), cited 4,344 times, identified multiple quality dimensions beyond accuracy that affect economic outcomes.
Reading Guide
Where to Start
'Beyond Accuracy: What Data Quality Means to Data Consumers' by Wang and Strong (1996): it provides a foundational, consumer-focused framework of data quality dimensions (4,344 citations) and is accessible before diving into the technical methods.
Key Papers Explained
Wang and Strong (1996), in 'Beyond Accuracy: What Data Quality Means to Data Consumers', establish multiple quality dimensions. Wilkinson et al. (2016), in 'The FAIR Guiding Principles for scientific data management and stewardship', operationalize these through findability and reusability standards. Sweeney (2002), in 'k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY', builds privacy protections compatible with them. Bizer, Heath, and Berners-Lee (2009), in 'Linked Data - The Story So Far', extend integration practices to the Web, while Chen (2002), in 'The Entity Relationship Model — Toward a Unified View of Data', supplies the structural foundation.
Advanced Directions
Current work emphasizes scalability in big-data cleaning, duplicate detection, and string similarity across the field's 61,971 papers. With no recent preprints or news surfaced, open frontiers remain in privacy-preserving entity resolution and in information quality benchmarks for social-science applications.
Papers at a Glance
| # | Paper | Year | Venue | Citations | Open Access |
|---|---|---|---|---|---|
| 1 | The REDCap consortium: Building an international community of ... | 2019 | Journal of Biomedical ... | 21.9K | ✓ |
| 2 | The FAIR Guiding Principles for scientific data management and... | 2016 | Scientific Data | 16.4K | ✓ |
| 3 | Bayesian Data Analysis | 1995 | — | 13.7K | ✕ |
| 4 | k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY | 2002 | International Journal ... | 8.3K | ✕ |
| 5 | Business Intelligence and Analytics: From Big Data to Big Impact | 2012 | MIS Quarterly | 5.8K | ✕ |
| 6 | The Entity Relationship Model — Toward a Unified View of Data | 2002 | — | 5.8K | ✕ |
| 7 | Linked Data - The Story So Far | 2009 | International Journal ... | 4.5K | ✕ |
| 8 | Beyond Accuracy: What Data Quality Means to Data Consumers | 1996 | Journal of Management ... | 4.3K | ✕ |
| 9 | Software Framework for Topic Modelling with Large Corpora | 2010 | — | 3.8K | ✓ |
| 10 | Journal of Statistical Software | 2009 | Wiley Interdisciplinar... | 3.6K | ✓ |
Frequently Asked Questions
What are the FAIR Guiding Principles?
The FAIR Guiding Principles are guidelines for scientific data management and stewardship that make data findable, accessible, interoperable, and reusable. Wilkinson et al. (2016) introduced them in 'The FAIR Guiding Principles for scientific data management and stewardship,' which has 16,387 citations. These principles support global data integration in research.
How does k-anonymity protect privacy in data sharing?
k-anonymity is a model that protects privacy by ensuring each record in released data is indistinguishable from at least k-1 other records on its quasi-identifying attributes. Sweeney (2002) defined it in 'k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY', allowing data holders like hospitals to share field-structured data with researchers while providing guarantees against re-identification. The paper has 8,343 citations.
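Checking the property is straightforward: count how often each combination of quasi-identifier values occurs and require every group to have at least k records. A minimal sketch in the sense of Sweeney (2002), with invented column names and generalized values:

```python
# k-anonymity check: every quasi-identifier combination must occur >= k times.
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return all(c >= k for c in counts.values())

table = [
    {"zip": "123**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "123**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "456**", "age": "40-49", "diagnosis": "flu"},
]
print(is_k_anonymous(table, ["zip", "age"], 2))  # False: one group has 1 record
```

When the check fails, data holders typically generalize values further (e.g. widen age bands) or suppress the offending records until every group reaches size k.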
What data quality dimensions matter to consumers?
Data consumers consider quality beyond accuracy, including dimensions like completeness, timeliness, and accessibility. Wang and Strong (1996) showed in 'Beyond Accuracy: What Data Quality Means to Data Consumers' (4,344 citations) that a narrow focus on accuracy misses these broader impacts. This wider view guides improvement efforts in organizations.
What is record linkage in data management?
Record linkage identifies and links records from different databases that refer to the same entities, typically using techniques like string similarity and duplicate detection. The field also encompasses privacy-preserving methods and entity resolution, and is central to the 61,971 works in data quality management; work on name disambiguation extends these techniques to big-data integration.
How do Linked Data principles support data quality?
Linked Data provides best practices for publishing and connecting structured data on the Web, creating a global space with billions of assertions. Bizer, Heath, and Berners-Lee (2009) described this in 'Linked Data - The Story So Far,' with 4,533 citations. It enhances data integration and quality through interoperability.
What role does the Entity Relationship Model play?
The Entity Relationship Model offers a unified view of data for design and integration. Chen (2002) presented it in 'The Entity Relationship Model — Toward a Unified View of Data,' cited 5,769 times. It supports data quality by standardizing structures in management systems.
Open Research Questions
- How can privacy-preserving record linkage scale to big data volumes while maintaining linkage accuracy?
- What metrics best capture data quality dimensions beyond accuracy for diverse consumer needs?
- How do entity resolution techniques handle name disambiguation in multilingual datasets?
- What integration methods resolve conflicts in linked data from heterogeneous sources?
- How can FAIR principles be automated in data management pipelines for real-time stewardship?
Recent Trends
The field holds steady at 61,971 works, with no 5-year growth rate specified. Highly cited papers such as Harris et al.'s 'The REDCap consortium: Building an international community of software platform partners' (21,869 citations) indicate a sustained focus on collaborative platforms, while the FAIR principles of Wilkinson et al. (2016) (16,387 citations) drive ongoing adoption of data stewardship. No recent preprints or news appeared in the last 12 months.
Research Data Quality and Management with AI
PapersFlow provides specialized AI tools for Decision Sciences researchers. Here are the most relevant for this topic:
Systematic Review
AI-powered evidence synthesis with documented search strategies
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
See how researchers in Economics & Business use PapersFlow
Field-specific workflows, example queries, and use cases.
Start Researching Data Quality and Management with AI
Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.
See how PapersFlow works for Decision Sciences researchers