Subtopic Deep Dive
Name Disambiguation
Research Guide
What is Name Disambiguation?
Name disambiguation resolves ambiguities in author or entity names across publications using similarity metrics, clustering, and supervised learning techniques.
Name disambiguation tackles abbreviations, misspellings, and identical names shared by multiple authors (Han et al., 2004, 365 citations). Supervised approaches, which train machine learning models on labeled citation records, have achieved high accuracy on citation databases. More than ten papers published between 2003 and 2023, each with 300+ citations, trace the field's progression from unsupervised clustering to deep neural networks.
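The name variants described above can be illustrated with a minimal normalization sketch. Assumptions: the helper `normalize_name` is hypothetical, and the "last name + first initial" key is a common heuristic, not a method taken from Han et al. (2004). The key collapses abbreviation and name-order variants, but it also merges different people who share it and does nothing about misspellings. For that reason, such keys usually only propose candidates for a learned model to confirm.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Map an author-name string to a coarse 'lastname_firstinitial' key.

    Hypothetical helper for illustration: real systems need richer rules
    (particles such as 'van der', suffixes such as 'Jr.', non-Western name order).
    """
    # Strip accents, e.g. 'Müller' -> 'Muller'
    decomposed = unicodedata.normalize("NFKD", raw)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if "," in plain:
        # Inverted bibliography form: 'Last, First Middle'
        last, first = (part.strip() for part in plain.split(",", 1))
    else:
        # Natural order: 'First Middle Last'
        tokens = plain.replace(".", " ").split()
        last, first = tokens[-1], " ".join(tokens[:-1])
    first_initial = first.replace(".", " ").strip()[:1]
    return f"{last.lower()}_{first_initial.lower()}"


variants = ["C. Lee Giles", "C. L. Giles", "Giles, C. Lee", "GILES, C."]
print({normalize_name(v) for v in variants})  # {'giles_c'}
# A different (hypothetical) person collides with the same key:
print(normalize_name("Carla Giles"))  # giles_c
```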
Why It Matters
Name disambiguation enables accurate bibliometric analysis by correctly attributing publications to authors, improving h-index calculations and collaboration network mapping. In knowledge graphs, it supports precise entity linking for better semantic search (Peng et al., 2023, 482 citations; Noy et al., 2019, 331 citations). Applications include cleaning Linked Data for quality assessment (Zaveri et al., 2015, 573 citations) and populating knowledge bases from extracted facts (Dredze et al., 2010, 343 citations).
Key Research Challenges
Handling Name Variants
Abbreviations, misspellings, and pseudonyms create multiple representations of a single author (Han et al., 2004). Supervised models need labeled data to tell such variants apart from distinct authors who happen to share a name. Unsupervised methods such as clustering struggle when bibliographic features are sparse (Mann and Yarowsky, 2003).
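To see why sparse features hurt unsupervised methods, consider a generic single-link clustering sketch that links two citations only if they share a coauthor. This is not the method of Mann and Yarowsky (2003). The function name, the `min_shared` threshold, and the citation data are hypothetical. A solo-authored citation carries no coauthor evidence, so it remains a singleton cluster regardless of who actually wrote it.

```python
from itertools import combinations


def cluster_by_coauthors(records: dict[str, set[str]], min_shared: int = 1) -> list[set[str]]:
    """Single-link clustering: link two citations sharing >= min_shared coauthors."""
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if len(records[a] & records[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for rid in records:
        clusters.setdefault(find(rid), set()).add(rid)
    return list(clusters.values())


# Hypothetical citations that all carry the ambiguous byline "J. Smith";
# each value is the set of coauthor keys on that citation.
citations = {
    "c1": {"lee_k", "wang_y"},
    "c2": {"wang_y"},
    "c3": {"garcia_m"},
    "c4": {"garcia_m", "chen_l"},
    "c5": set(),  # solo-authored: no coauthor evidence at all
}
print(sorted(sorted(c) for c in cluster_by_coauthors(citations)))
# [['c1', 'c2'], ['c3', 'c4'], ['c5']]
```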
Scalability to Large Datasets
Processing millions of citations demands efficient algorithms, especially for real-time disambiguation. Deep learning models increase accuracy but also raise computational costs (Ganea and Hofmann, 2017, 332 citations). Zero-shot approaches aim to link mentions to entities never seen during training, using only short textual descriptions of those entities and without retraining (Wu et al., 2020, 324 citations).
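One standard way to keep comparison costs manageable is blocking. It is not specific to the cited papers: only records that share a cheap key are compared. With n records, all-pairs matching needs n(n-1)/2 comparisons, whereas blocking needs only the sum of k(k-1)/2 over the block sizes k. The sketch assumes a hypothetical `blocking_key` (last token plus first initial) and a toy byline list.

```python
from collections import defaultdict


def blocking_key(name: str) -> str:
    """Hypothetical key: lowercased last token plus first initial ('Hui Han' -> 'han_h')."""
    tokens = name.replace(".", " ").split()
    return f"{tokens[-1].lower()}_{tokens[0][0].lower()}"


def count_comparisons(names: list[str]) -> tuple[int, int]:
    """Return (all-pairs comparisons, comparisons needed within blocks)."""
    n = len(names)
    all_pairs = n * (n - 1) // 2
    block_sizes: dict[str, int] = defaultdict(int)
    for name in names:
        block_sizes[blocking_key(name)] += 1
    within_blocks = sum(k * (k - 1) // 2 for k in block_sizes.values())
    return all_pairs, within_blocks


# Toy byline list (illustrative only)
names = ["H. Han", "Hui Han", "C. L. Giles", "C. Lee Giles", "H. Zha", "Hongyuan Zha", "M. Dredze"]
print(count_comparisons(names))  # (21, 3)
```

The trade-off is recall. Variants that fall into different blocks, such as a misspelled last name, are never compared.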
Contextual Entity Linking
Ambiguous names in tweets and other short texts often come with too little context for reliable resolution (Derczynski et al., 2014, 340 citations). Joint entity-relation extraction can supply extra signal. Zheng et al. (2017, 730 citations) make it tractable by converting the task into sequence tagging with a novel tagging scheme. At the document level, Ganea and Hofmann (2017) use local neural attention to focus on the most informative context words around each mention.
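The local attention idea can be sketched in simplified form, loosely following Ganea and Hofmann (2017) as we understand it. Each context word is weighted by how strongly it matches any candidate entity. The weights are softmax-normalized, and each candidate is then scored by its attention-weighted similarity to the context. The following simplifications are assumptions, not the paper's model: identity matrices replace learned bilinear weights, context words are not pruned to a top-K set, and there is no mention-entity prior and no global document-level inference. The embeddings and entity names are made up for illustration.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def local_attention_scores(context: dict[str, list[float]],
                           candidates: dict[str, list[float]]) -> dict[str, float]:
    """Score candidate entities against attention-weighted context words (simplified)."""
    # 1. Word relevance = best match between that word and any candidate entity
    relevance = {w: max(dot(e, x) for e in candidates.values()) for w, x in context.items()}
    # 2. Softmax over context words turns relevance into attention weights
    z = sum(math.exp(r) for r in relevance.values())
    attention = {w: math.exp(r) / z for w, r in relevance.items()}
    # 3. Candidate score = attention-weighted similarity to the context words
    return {name: sum(attention[w] * dot(e, x) for w, x in context.items())
            for name, e in candidates.items()}


# Toy 3-d embeddings for the mention "Paris" in "the capital of France"
candidates = {"Paris_(France)": [1.0, 0.0, 0.0], "Paris_Hilton": [0.0, 1.0, 0.0]}
context = {"capital": [0.9, 0.1, 0.0], "France": [1.0, 0.0, 0.0], "the": [0.1, 0.1, 0.1]}
scores = local_attention_scores(context, candidates)
print(max(scores, key=scores.get))  # Paris_(France)
```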
Essential Papers
Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks
Yubo Chen, Liheng Xu, Kang Liu et al. · 2015 · 914 citations
Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, Jun Zhao. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural...
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Suncong Zheng, Feng Wang, Hongyun Bao et al. · 2017 · 730 citations
Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we firstly propose a novel tagging scheme that can convert the joint extraction ta...
Quality assessment for Linked Data: A Survey
Amrapali Zaveri, Anisa Rula, Andrea Maurino et al. · 2015 · Semantic Web · 573 citations
The development and standardization of Semantic Web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying d...
Knowledge Graphs: Opportunities and Challenges
Ciyuan Peng, Feng Xia, Mehdi Naseriparsa et al. · 2023 · Artificial Intelligence Review · 482 citations
With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph d...
Two supervised learning approaches for name disambiguation in author citations
Hui Han, C. Lee Giles, Hongyuan Zha et al. · 2004 · 365 citations
Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same...
Entity Disambiguation for Knowledge Base Population
Mark Dredze, Paul McNamee, Delip Rao et al. · 2010 · 343 citations
The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues s...
Analysis of named entity recognition and linking for tweets
Leon Derczynski, Diana Maynard, Giuseppe Rizzo et al. · 2014 · Information Processing & Management · 340 citations
Reading Guide
Foundational Papers
Start with Han et al. (2004) for supervised approaches on citation data, then Mann and Yarowsky (2003) for unsupervised clustering, and Dredze et al. (2010) for knowledge base applications.
Recent Advances
Study Ganea and Hofmann (2017) for deep joint disambiguation, Wu et al. (2020) for scalable zero-shot linking, and Peng et al. (2023) for knowledge graph challenges.
Core Methods
Core techniques include supervised classification with bibliographic features (Han et al., 2004), neural entity embeddings with attention (Ganea and Hofmann, 2017), and dense retrieval for zero-shot linking (Wu et al., 2020).
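Han et al. (2004) present two supervised approaches, one of which, to our understanding, is based on naive Bayes. The sketch below is a simplified naive Bayes classifier, not a reimplementation. It assigns a citation with an ambiguous byline to one of several known authors using only coauthor names with Laplace smoothing. The function names, author IDs such as `smith_j#1`, and the training pairs are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(labeled: list[tuple[str, set[str]]]):
    """Fit per-author priors and coauthor counts from (author_id, coauthors) pairs."""
    priors = Counter(author for author, _ in labeled)
    coauthor_counts = {a: Counter() for a in priors}
    for author, coauthors in labeled:
        coauthor_counts[author].update(coauthors)
    vocab = {c for _, cos in labeled for c in cos}
    return priors, coauthor_counts, vocab


def classify(citation_coauthors: set[str], model) -> str:
    """Pick the author maximizing log P(author) + sum of log P(coauthor | author)."""
    priors, coauthor_counts, vocab = model
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for author, n_docs in priors.items():
        counts = coauthor_counts[author]
        # Laplace smoothing; the extra +1 reserves mass for unseen coauthors
        denom = sum(counts.values()) + len(vocab) + 1
        score = math.log(n_docs / total)
        for c in citation_coauthors:
            score += math.log((counts[c] + 1) / denom)
        if score > best_score:
            best, best_score = author, score
    return best


training = [
    ("smith_j#1", {"lee_k", "wang_y"}),
    ("smith_j#1", {"wang_y", "park_s"}),
    ("smith_j#2", {"garcia_m"}),
    ("smith_j#2", {"garcia_m", "chen_l"}),
]
model = train_naive_bayes(training)
print(classify({"wang_y"}, model))             # smith_j#1
print(classify({"chen_l", "novak_p"}, model))  # smith_j#2
```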
How PapersFlow Helps You Research Name Disambiguation
Discover & Search
Research Agent uses searchPapers and citationGraph to map the evolution from Han et al. (2004) to recent deep learning advances such as Ganea and Hofmann (2017). findSimilarPapers expands on 'Two supervised learning approaches for name disambiguation' to uncover related methods and variants. exaSearch runs queries such as 'author name disambiguation supervised clustering' across 250M+ OpenAlex papers.
Analyze & Verify
Analysis Agent applies readPaperContent to extract the features used by Han et al. (2004), then verifyResponse with CoVe checks claims against citations. runPythonAnalysis computes similarity metrics in a sandbox (e.g., Levenshtein distance between author names) using pandas/NumPy, and GRADE grading rates the strength of the evidence behind reported clustering accuracy.
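PapersFlow's sandbox code is not shown here. As a rough stand-in, the following plain-Python sketch (no pandas/NumPy) computes the Levenshtein distance mentioned above and normalizes it to a similarity score. The function names and the normalization by the longer string's length are assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using one rolling DP row."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution, free if equal
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("Zaveri", "Zavery"))                        # 1
print(round(name_similarity("Derczynski", "Derczinski"), 2))  # 0.9
```

A single misspelled letter yields a distance of 1, so a similarity threshold can flag likely variants. Common short names will also score high against each other, however.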
Synthesize & Write
Synthesis Agent identifies scalability gaps between foundational methods (Han et al., 2004) and modern zero-shot methods (Wu et al., 2020). Writing Agent drafts bibliometric reports with latexEditText and latexSyncCitations, and produces publication-ready documents with latexCompile. exportMermaid visualizes disambiguation pipelines as flowcharts.
Use Cases
"Reimplement supervised name disambiguation from Han 2004 with Python code."
Research Agent → searchPapers('Han Giles name disambiguation') → Code Discovery (paperExtractUrls → paperFindGithubRepo → githubRepoInspect) → runPythonAnalysis sandbox evaluates clustering accuracy on sample citations.
"Write LaTeX review of name disambiguation challenges post-2015."
Research Agent → citationGraph(Han 2004) → Synthesis Agent gap detection → Writing Agent → latexEditText(draft) → latexSyncCitations(20 papers) → latexCompile(PDF output with resolved author graphs).
"Find code for entity linking in knowledge graphs."
Research Agent → exaSearch('scalable zero-shot entity linking code') → findSimilarPapers(Wu 2020) → Code Discovery → githubRepoInspect → runPythonAnalysis tests BERT-based linking on custom dataset.
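The use cases above mention evaluating clustering accuracy. One common metric for author disambiguation is pairwise precision, recall, and F1, computed over pairs of citations placed in the same cluster. This sketch is not PapersFlow's implementation. The function names, the convention of returning 1.0 when there are no pairs, and the toy labels are assumptions.

```python
from itertools import combinations


def same_cluster_pairs(assignment: dict[str, str]) -> set[frozenset[str]]:
    """All unordered citation pairs that share a cluster label."""
    return {frozenset((a, b)) for a, b in combinations(sorted(assignment), 2)
            if assignment[a] == assignment[b]}


def pairwise_prf(predicted: dict[str, str], gold: dict[str, str]) -> tuple[float, float, float]:
    """Pairwise precision, recall, and F1 of a predicted clustering against gold labels."""
    pred_pairs, gold_pairs = same_cluster_pairs(predicted), same_cluster_pairs(gold)
    tp = len(pred_pairs & gold_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 1.0
    recall = tp / len(gold_pairs) if gold_pairs else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels: gold says c1-c3 are one author; the prediction splits them
gold = {"c1": "A", "c2": "A", "c3": "A", "c4": "B"}
predicted = {"c1": "x", "c2": "x", "c3": "y", "c4": "y"}
print([round(v, 3) for v in pairwise_prf(predicted, gold)])  # [0.5, 0.333, 0.4]
```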
Automated Workflows
The Deep Research workflow conducts a systematic review: searchPapers(50+ name disambiguation papers) → citationGraph → DeepScan(7-step verification with CoVe checkpoints) → a structured report on trends from Han (2004) to Peng (2023). Theorizer generates hypotheses about hybrid supervised-unsupervised models from clustered abstracts. Chain-of-Verification reduces errors when comparing methods across Dredze (2010) and Ganea (2017).
Frequently Asked Questions
What is name disambiguation?
Name disambiguation identifies unique authors or entities from ambiguous name variants in publications using similarity and learning methods (Han et al., 2004).
What are common methods?
Supervised learning with bibliographic features such as coauthor names, paper titles, and venues (Han et al., 2004), unsupervised clustering (Mann and Yarowsky, 2003), and deep neural attention (Ganea and Hofmann, 2017).
What are key papers?
Foundational: Han et al. (2004, 365 citations), Dredze et al. (2010, 343 citations); Recent: Wu et al. (2020, 324 citations), Peng et al. (2023, 482 citations).
What are open problems?
Scalable zero-shot disambiguation for streaming data and contextual linking in short texts remain open challenges (Wu et al., 2020; Derczynski et al., 2014).
Research Data Quality and Management with AI
PapersFlow provides specialized AI tools for Decision Sciences researchers. Here are the most relevant for this topic:
Systematic Review
AI-powered evidence synthesis with documented search strategies
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
Part of the Data Quality and Management Research Guide