Subtopic Deep Dive

Name Disambiguation
Research Guide

What is Name Disambiguation?

Name disambiguation resolves ambiguities in author or entity names across publications using similarity metrics, clustering, and supervised learning techniques.

Name disambiguation addresses two related problems: a single author appearing under several names (abbreviations, misspellings, pseudonyms) and several authors sharing an identical name (Han et al., 2004, 365 citations). Supervised machine learning models are reported to achieve high accuracy on citation databases. More than 10 papers published between 2003 and 2023, each with 300+ citations, trace the progression of methods from unsupervised clustering to deep neural networks.

Why It Matters

Name disambiguation enables accurate bibliometric analysis by correctly attributing publications to authors, improving h-index calculations and collaboration network mapping. In knowledge graphs, it supports precise entity linking for better semantic search (Peng et al., 2023, 482 citations; Noy et al., 2019, 331 citations). Applications include cleaning Linked Data for quality assessment (Zaveri et al., 2015, 573 citations) and populating knowledge bases from extracted facts (Dredze et al., 2010, 343 citations).

Key Research Challenges

Handling Name Variants

Abbreviations, misspellings, and pseudonyms create multiple representations for one author (Han et al., 2004). Supervised models require labeled data to distinguish these from shared names. Unsupervised methods like clustering struggle with sparse bibliographic features (Mann and Yarowsky, 2003).
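To make the sparsity problem concrete, here is a minimal sketch of unsupervised clustering over coauthor sets. It is an illustration only, not the method of any cited paper: the Jaccard measure, the 0.3 threshold, the single-link merge rule, and the sample records are all assumptions chosen for brevity.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_records(records, threshold=0.3):
    """Single-link clustering: merge records whose coauthor sets overlap enough."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical coauthor sets for four papers signed "J. Smith".
papers = [
    {"a. kumar", "l. chen"},
    {"a. kumar", "l. chen", "m. rossi"},
    {"p. novak"},
    set(),  # single-author paper: no coauthor evidence at all
]
clusters = cluster_records(papers)
print(clusters)
```

The first two papers merge because they share coauthors. The single-author paper has no features to compare, so it stays in its own cluster whether or not it belongs with the others, which is the failure mode sparse records cause.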

Scalability to Large Datasets

Processing millions of citations demands efficient algorithms for real-time disambiguation. Deep learning models increase accuracy but raise computational costs (Ganea and Hofmann, 2017, 332 citations). Zero-shot approaches aim to link entities without retraining (Wu et al., 2020, 324 citations).
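One common way to keep pairwise comparison tractable is blocking: only records that share a cheap key are compared. The sketch below is an assumption-laden illustration, not a technique taken from the cited papers. The key (surname plus first initial), the helper names `block_key` and `candidate_pairs`, and the "First Last" name order are all hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations


def block_key(name):
    """Surname plus first initial, lowercased. Assumes non-empty 'First Last' order."""
    parts = name.replace(".", " ").lower().split()
    return f"{parts[-1]}_{parts[0][0]}"


def candidate_pairs(names):
    """Return index pairs that share a block key; only these get compared."""
    blocks = defaultdict(list)
    for idx, name in enumerate(names):
        blocks[block_key(name)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


names = ["Hui Han", "H. Han", "C. Lee Giles", "C. L. Giles", "Hongyuan Zha", "H. Zha"]
pairs = candidate_pairs(names)
all_pairs = len(names) * (len(names) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

With six names, blocking leaves 3 candidate pairs out of 15 possible ones. The trade-off is recall: a misspelled surname lands in a different block and is never compared.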

Contextual Entity Linking

Ambiguous names in tweets or short texts lack sufficient context for resolution (Derczynski et al., 2014, 340 citations). Joint entity-relation extraction helps but requires novel tagging schemes (Zheng et al., 2017, 730 citations). Local neural attention improves document-level disambiguation (Ganea and Hofmann, 2017).

Essential Papers

Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks

Yubo Chen, Liheng Xu, Kang Liu et al. · 2015 · 914 citations

Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, Jun Zhao. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural...

Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

Suncong Zheng, Feng Wang, Hongyun Bao et al. · 2017 · 730 citations

Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we firstly propose a novel tagging scheme that can convert the joint extraction ta...

Quality assessment for Linked Data: A Survey

Amrapali Zaveri, Anisa Rula, Andrea Maurino et al. · 2015 · Semantic Web · 573 citations

The development and standardization of Semantic Web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying d...

Knowledge Graphs: Opportunities and Challenges

Ciyuan Peng, Feng Xia, Mehdi Naseriparsa et al. · 2023 · Artificial Intelligence Review · 482 citations

With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph d...

Two supervised learning approaches for name disambiguation in author citations

Hui Han, C. Lee Giles, Hongyuan Zha et al. · 2004 · 365 citations

Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same...

Entity Disambiguation for Knowledge Base Population

Mark Dredze, Paul McNamee, Delip Rao et al. · 2010 · 343 citations

The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues s...

Analysis of named entity recognition and linking for tweets

Leon Derczynski, Diana Maynard, Giuseppe Rizzo et al. · 2014 · Information Processing & Management · 340 citations

Reading Guide

Foundational Papers

Start with Han et al. (2004) for supervised approaches on citation data, then Mann and Yarowsky (2003) for unsupervised clustering, and Dredze et al. (2010) for knowledge base applications.

Recent Advances

Study Ganea and Hofmann (2017) for deep joint disambiguation, Wu et al. (2020) for scalable zero-shot linking, and Peng et al. (2023) for knowledge graph challenges.

Core Methods

Core techniques include supervised classification with bibliographic features (Han et al., 2004), neural entity embeddings with attention (Ganea and Hofmann, 2017), and dense retrieval for zero-shot linking (Wu et al., 2020).
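As a rough illustration of supervised classification with bibliographic features, the sketch below trains a small multinomial naive Bayes model that assigns a citation to one of several known authors. Naive Bayes is used here only because it is compact; this is not presented as the exact model or feature set of Han et al. (2004). The class name, the feature prefixes (`coauthor:`, `venue:`, `title:`), and the training records are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesAuthorModel:
    """Multinomial naive Bayes over bibliographic tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.token_counts = defaultdict(Counter)  # author id -> token counts
        self.doc_counts = Counter()               # author id -> number of records
        self.vocab = set()

    def fit(self, records):
        for author_id, tokens in records:
            self.doc_counts[author_id] += 1
            self.token_counts[author_id].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_score(self, author_id, tokens):
        # log P(author) + sum of log P(token | author)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[author_id] / total_docs)
        counts = self.token_counts[author_id]
        # The extra +1 reserves probability mass for tokens never seen in training.
        denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)
        for tok in tokens:
            score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.doc_counts, key=lambda a: self.log_score(a, tokens))


# Hypothetical labeled citations for two different authors named "J. Smith".
training = [
    ("smith_ml", ["coauthor:kumar", "venue:icml", "title:kernel"]),
    ("smith_ml", ["coauthor:kumar", "venue:nips", "title:learning"]),
    ("smith_bio", ["coauthor:novak", "venue:cell", "title:protein"]),
    ("smith_bio", ["coauthor:novak", "venue:nature", "title:genome"]),
]
model = NaiveBayesAuthorModel().fit(training)
prediction = model.predict(["coauthor:kumar", "title:kernel"])
print(prediction)
```

The model can only choose among authors it has seen labeled examples for, which is why supervised approaches depend on labeled data.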

How PapersFlow Helps You Research Name Disambiguation

Discover & Search

Research Agent uses searchPapers and citationGraph to map the evolution from Han et al. (2004) to recent deep learning advances like Ganea and Hofmann (2017). findSimilarPapers expands on 'Two supervised learning approaches for name disambiguation' to uncover related approaches. exaSearch runs queries such as 'author name disambiguation supervised clustering' across 250M+ OpenAlex papers.

Analyze & Verify

Analysis Agent applies readPaperContent to extract features from Han et al. (2004), then uses verifyResponse with Chain-of-Verification (CoVe) to check claims against citations. runPythonAnalysis computes similarity metrics (e.g., Levenshtein distance on author names) in a sandbox using pandas/NumPy, with GRADE grading for evidence strength in clustering accuracy.
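For reference, Levenshtein distance on author names can be sketched in a few lines. This version uses only the Python standard library rather than pandas/NumPy, and the `name_similarity` normalization (one minus distance over the longer length) is one convention among several, chosen here as an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("Giles", "Gilles"))
print(name_similarity("Han", "han"))
```

Edit distance catches misspellings such as "Giles" versus "Gilles", but it cannot separate two different people whose names are spelled identically, so it is usually combined with other bibliographic evidence.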

Synthesize & Write

Synthesis Agent detects gaps in scalability between foundational methods (Han et al., 2004) and modern zero-shot methods (Wu et al., 2020). Writing Agent uses latexEditText and latexSyncCitations for bibliometric reports, and latexCompile for publication-ready documents. exportMermaid visualizes disambiguation pipelines as flowcharts.

Use Cases

"Reimplement supervised name disambiguation from Han 2004 with Python code."

Research Agent → searchPapers('Han Giles name disambiguation') → Code Discovery (paperExtractUrls → paperFindGithubRepo → githubRepoInspect) → runPythonAnalysis sandbox evaluates clustering accuracy on sample citations.

"Write LaTeX review of name disambiguation challenges post-2015."

Research Agent → citationGraph(Han 2004) → Synthesis Agent gap detection → Writing Agent → latexEditText(draft) → latexSyncCitations(20 papers) → latexCompile(PDF output with resolved author graphs).

"Find code for entity linking in knowledge graphs."

Research Agent → exaSearch('scalable zero-shot entity linking code') → findSimilarPapers(Wu 2020) → Code Discovery → githubRepoInspect → runPythonAnalysis tests BERT-based linking on custom dataset.

Automated Workflows

Deep Research workflow conducts systematic review: searchPapers(50+ name disambiguation papers) → citationGraph → DeepScan(7-step verification with CoVe checkpoints) → structured report on trends from Han (2004) to Peng (2023). Theorizer generates hypotheses on hybrid supervised-unsupervised models from clustered abstracts. Chain-of-Verification reduces errors in evaluating method comparisons across Dredze (2010) and Ganea (2017).

Frequently Asked Questions

What is name disambiguation?

Name disambiguation identifies unique authors or entities from ambiguous name variants in publications using similarity and learning methods (Han et al., 2004).

What are common methods?

Common methods include supervised learning with bibliographic features (Han et al., 2004), unsupervised clustering (Mann and Yarowsky, 2003), and deep neural attention (Ganea and Hofmann, 2017).

What are key papers?

Foundational: Han et al. (2004, 365 citations), Dredze et al. (2010, 343 citations); Recent: Wu et al. (2020, 324 citations), Peng et al. (2023, 482 citations).

What are open problems?

Scalable zero-shot disambiguation for streaming data and contextual linking in short texts remain unsolved (Wu et al., 2020; Derczynski et al., 2014).