Subtopic Deep Dive
Positive and Unlabeled Learning for Classification
Research Guide
What is Positive and Unlabeled Learning for Classification?
Positive and Unlabeled (PU) Learning for Classification develops algorithms to train classifiers using only positive labeled examples and unlabeled data, without negative labels.
PU learning addresses scenarios where negative examples are unavailable or unreliable, as is common in text classification and web mining. Methods include two-stage approaches that select reliable negatives from the unlabeled data, and risk-estimation techniques that correct for the missing negative labels. Surveys such as Chapelle et al. (2006, 4273 citations) cover the closely related field of semi-supervised learning.
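The two-stage idea can be illustrated with a minimal sketch. The toy 1-D data, hand-rolled logistic regression, and 40% quantile threshold below are illustrative assumptions, not taken from any of the papers above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: positives cluster near +2, negatives near -2 (1-D feature).
X_pos = rng.normal(+2.0, 1.0, size=(50, 1))               # labeled positives
X_unl = np.vstack([rng.normal(+2.0, 1.0, size=(50, 1)),   # hidden positives
                   rng.normal(-2.0, 1.0, size=(100, 1))]) # hidden negatives

def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain logistic regression via gradient descent (bias term included)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# Stage 1: treat every unlabeled point as negative, then score the unlabeled set.
X1 = np.vstack([X_pos, X_unl])
y1 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
scores = predict_proba(fit_logreg(X1, y1), X_unl)

# Keep the lowest-scoring unlabeled points as "reliable negatives"
# (the 0.4 quantile cutoff is an arbitrary illustrative choice).
reliable_neg = X_unl[scores < np.quantile(scores, 0.4)]

# Stage 2: retrain on positives vs. reliable negatives only.
X2 = np.vstack([X_pos, reliable_neg])
y2 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(reliable_neg))])
w2 = fit_logreg(X2, y2)

print(predict_proba(w2, np.array([[2.0], [-2.0]])).round(2))
```

Because the hidden positives in the unlabeled set score high in stage 1, they are excluded from the reliable-negative pool, so the stage-2 classifier trains on much cleaner labels than naively treating all unlabeled data as negative.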
Why It Matters
PU learning enables classification in domains such as text mining, where labeling negatives is costly or impossible, as in Nigam et al. (2000), which applies EM to labeled and unlabeled documents (2732 citations). Blum and Mitchell's (1998) co-training method leverages unlabeled data to improve web page classification (5545 citations). Applications extend to transfer learning (Weiss et al., 2016, 5880 citations) and feature selection in high-dimensional data (Li et al., 2017, 2172 citations).
Key Research Challenges
Risk Estimation Bias
Without negative labels, classification risk estimated from PU data is biased unless explicitly corrected. Zhou et al. (2003) study related consistency questions for graph-based semi-supervised methods (3732 citations). Reliable negative selection remains critical.
Unlabeled Data Noise
Unlabeled data mixes hidden positives with negatives, so treating it as negative contaminates training with mislabeled examples. Blum and Mitchell's (1998) co-training mitigates this by assuming two independent views of the data (5545 citations). Noise robustness varies across domains such as text.
Domain Shift Handling
PU methods struggle when the distributions of the positive and unlabeled data shift. The theory of Ben-David et al. (2009) provides domain-adaptation bounds (3322 citations). Transferring to new domains requires explicit adaptation.
Essential Papers
A survey of transfer learning
Karl R. Weiss, Taghi M. Khoshgoftaar, Dingding Wang · 2016 · Journal of Big Data · 5.9K citations
Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are...
Combining labeled and unlabeled data with co-training
Avrim Blum, Tom M. Mitchell · 1998 · 5.5K citations
Introduces co-training: two classifiers trained on conditionally independent feature views bootstrap each other by labeling confident unlabeled examples, starting from a small labeled set.
Semi-Supervised Learning
Olivier Chapelle, Bernhard Schölkopf, Alexander Zien · 2006 · The MIT Press eBooks · 4.3K citations
A comprehensive review of an area of machine learning that deals with the use of unlabeled data in classification problems: state-of-the-art algorithms, a taxonomy of the field, applications, bench...
Learning with Local and Global Consistency
Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal et al. · 2003 · MPG.PuRe (Max Planck Society) · 3.7K citations
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learn...
A theory of learning from different domains
Shai Ben-David, John Blitzer, Koby Crammer et al. · 2009 · Machine Learning · 3.3K citations
Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a sour...
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Pedro Domingos, Michael J. Pazzani · 1997 · Machine Learning · 3.0K citations
Text Classification from Labeled and Unlabeled Documents using EM
Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun et al. · 2000 · Machine Learning · 2.7K citations
Reading Guide
Foundational Papers
Start with Blum and Mitchell (1998) on co-training with unlabeled data (5545 citations), then the Chapelle et al. (2006) survey (4273 citations) for a taxonomy that includes PU-related methods.
Recent Advances
The Weiss et al. (2016) transfer learning survey (5880 citations) links PU learning to domain adaptation; Li et al. (2017) on feature selection (2172 citations) addresses the high-dimensional challenges that arise in PU settings.
Core Methods
Core techniques: co-training (Blum and Mitchell, 1998), graph regularization (Zhou et al., 2003), EM for text classification (Nigam et al., 2000), and domain-adaptation theory (Ben-David et al., 2009).
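As a toy illustration of the co-training loop: the two synthetic "views", the nearest-centroid learners, and the round counts below are illustrative assumptions, not Blum and Mitchell's actual setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two conditionally independent "views" per example -- co-training's key
# assumption: each view is separately predictive of the label.
n = 200
y = np.tile([0, 1], n // 2)
view_a = (2 * y[:, None] - 1) + rng.normal(0, 1.0, (n, 1))
view_b = (2 * y[:, None] - 1) + rng.normal(0, 1.0, (n, 1))

labeled = np.zeros(n, dtype=bool)
labeled[:10] = True                  # a small labeled seed set
pseudo = np.zeros(n, dtype=int)
pseudo[:10] = y[:10]

def fit_centroid(X, t):
    """Trivial per-view learner: one centroid per class."""
    return X[t == 0].mean(), X[t == 1].mean()

def score(model, X):
    c0, c1 = model
    # Positive score = closer to the class-1 centroid.
    return np.abs(X[:, 0] - c0) - np.abs(X[:, 0] - c1)

for _ in range(5):                   # co-training rounds
    for view in (view_a, view_b):
        m = fit_centroid(view[labeled], pseudo[labeled])
        s = score(m, view)
        unl = np.where(~labeled)[0]
        if len(unl) == 0:
            break
        # Each view labels its 10 most confident points for the shared pool.
        top = unl[np.argsort(np.abs(s[unl]))[-10:]]
        pseudo[top] = (s[top] > 0).astype(int)
        labeled[top] = True

acc = (pseudo[labeled] == y[labeled]).mean()
print(f"pseudo-label accuracy: {acc:.2f}")
```

The key mechanism is that each view's confident predictions expand the training pool of the other, so errors made by one view can be overridden by the second, independent view.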
How PapersFlow Helps You Research Positive and Unlabeled Learning for Classification
Discover & Search
Research Agent uses citationGraph on Blum and Mitchell (1998) to map co-training's influence on PU learning across its 5545 citing papers, including Nigam et al. (2000). exaSearch queries 'positive unlabeled learning classification algorithms' to find two-stage methods; findSimilarPapers extends to the Chapelle et al. (2006) semi-supervised survey.
Analyze & Verify
Analysis Agent applies readPaperContent to extract the graph-regularization formulation from Zhou et al. (2003), then verifyResponse with CoVe checks claims against abstracts. runPythonAnalysis reproduces the EM algorithm from Nigam et al. (2000) using pandas for text-data simulation; GRADE scores evidence strength for assumptions about the unlabeled data.
Synthesize & Write
Synthesis Agent detects gaps in PU risk estimation via contradiction flagging across Blum (1998) and Ben-David (2009). Writing Agent uses latexEditText for two-stage method equations, latexSyncCitations for 10+ papers, and latexCompile for arXiv-ready review; exportMermaid diagrams co-training flow.
Use Cases
"Reproduce EM text classification from labeled/unlabeled docs in Python"
Research Agent → searchPapers 'Nigam EM unlabeled' → Analysis Agent → readPaperContent + runPythonAnalysis (pandas/NumPy EM impl.) → matplotlib accuracy plot.
"Write LaTeX review of PU learning two-stage methods"
Research Agent → citationGraph 'Blum co-training' → Synthesis → gap detection → Writing Agent → latexEditText equations + latexSyncCitations (Chapelle 2006) + latexCompile PDF.
"Find GitHub repos implementing PU risk estimation"
Research Agent → searchPapers 'PU learning risk' → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect (Zhou 2003 graph methods).
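The EM reproduction use case above can be sketched in miniature. Here, 1-D Gaussian class models with equal priors and unit variance stand in for Nigam et al.'s multinomial naive Bayes over words; this is an illustrative simplification, not their algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for documents: 1-D features, two classes.
X_lab = np.concatenate([rng.normal(-2, 1, 20), rng.normal(2, 1, 20)])
y_lab = np.array([0] * 20 + [1] * 20)
X_unl = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

# Initialize class means from labeled data only.
mu = np.array([X_lab[y_lab == 0].mean(), X_lab[y_lab == 1].mean()])

for _ in range(20):
    # E-step: soft class posteriors for unlabeled points
    # (equal priors and unit variance assumed for brevity).
    logp = -0.5 * (X_unl[:, None] - mu[None, :]) ** 2
    resp = np.exp(logp - logp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate means from labeled (hard) + unlabeled (soft) counts.
    for k in (0, 1):
        hard = (y_lab == k)
        num = X_lab[hard].sum() + (resp[:, k] * X_unl).sum()
        den = hard.sum() + resp[:, k].sum()
        mu[k] = num / den

print("estimated class means:", mu.round(2))
```

As in Nigam et al., the labeled examples anchor the model while the E-step's soft assignments let the much larger unlabeled set sharpen the parameter estimates.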
Automated Workflows
Deep Research workflow scans 50+ PU papers via searchPapers on 'positive unlabeled classification', chains citationGraph to Blum and Mitchell (1998), and outputs a structured report with GRADE scores. DeepScan applies 7-step CoVe to verify the links Weiss et al. (2016) draw between transfer learning and PU. Theorizer generates theory on unlabeled-noise bounds from Ben-David et al. (2009) and Nigam et al. (2000).
Frequently Asked Questions
What defines Positive and Unlabeled Learning?
PU learning trains classifiers from positive labels and unlabeled data, where the unlabeled set is assumed to contain both positives and negatives. It differs from standard semi-supervised learning in that no confirmed negatives are available.
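A tiny sketch of what the learner actually observes. The "selected completely at random" labeling mechanism used here is a common modeling assumption in the PU literature, not something stated above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground-truth labels, hidden from the PU learner.
y_true = rng.integers(0, 2, size=12)

# Censoring: each true positive is labeled with probability c (the
# "selected completely at random" assumption); negatives are never labeled.
c = 0.5
s = ((y_true == 1) & (rng.random(12) < c)).astype(int)

print("true labels:", y_true)
print("PU view    :", s)   # 1 = known positive, 0 = unlabeled
```

Every observed label s=1 is guaranteed to be a true positive, but an unlabeled point s=0 can be either class, which is exactly what distinguishes PU data from fully labeled or conventional semi-supervised data.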
What are main PU methods?
Two-stage methods select reliable negatives from the unlabeled data; risk-estimation methods directly correct the classification-error estimate for the missing negative labels. Co-training (Blum and Mitchell, 1998) and EM (Nigam et al., 2000) are foundational.
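The risk-estimation idea can be written down directly. One standard formulation from the broader PU literature (not one of the papers listed here) rewrites the classification risk using only positive and unlabeled samples plus a known class prior pi:

```python
import numpy as np

def pu_risk(scores_pos, scores_unl, pi):
    """Unbiased PU estimate of the classification risk.

    Uses the identity (with class prior pi = P(y=+1)):
        R(f) = pi*E_P[l(f,+1)] + E_U[l(f,-1)] - pi*E_P[l(f,-1)],
    which needs only positive and unlabeled samples. The loss here is a
    sigmoid surrogate; pi is assumed known or estimated separately.
    """
    loss = lambda z: 1.0 / (1.0 + np.exp(z))   # l(f, +1) = loss(f)
    r_p_pos = loss(scores_pos).mean()          # positives scored as +1
    r_u_neg = loss(-scores_unl).mean()         # unlabeled scored as -1
    r_p_neg = loss(-scores_pos).mean()         # positives scored as -1
    return pi * r_p_pos + r_u_neg - pi * r_p_neg

# Sanity check: a well-separated scorer should have near-zero risk.
pos = np.full(100, 8.0)
unl = np.concatenate([np.full(30, 8.0), np.full(70, -8.0)])  # a pi = 0.3 mix
print(round(pu_risk(pos, unl, pi=0.3), 4))
```

The third term subtracts the contribution of the hidden positives inside the unlabeled set; with flexible models this correction can push the empirical estimate below zero, which is why non-negative variants clip it before minimizing.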
What are key papers?
Blum and Mitchell (1998, 5545 citations) co-training; Chapelle et al. (2006, 4273 citations) semi-supervised survey; Nigam et al. (2000, 2732 citations) EM for text.
What open problems exist?
Open problems include robust risk estimation under label noise, scaling to high dimensions, and theoretical bounds for domain shift in PU settings, as in Ben-David et al. (2009).
Research Machine Learning and Data Classification with AI
PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Code & Data Discovery
Find datasets, code repositories, and computational tools
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
AI Academic Writing
Write research papers with AI assistance and LaTeX support
See how researchers in Computer Science & AI use PapersFlow
Field-specific workflows, example queries, and use cases.
Start Researching Positive and Unlabeled Learning for Classification with AI
Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.
See how PapersFlow works for Computer Science researchers