Class Imbalance in Streaming Data
Research Guide

What is Class Imbalance in Streaming Data?

Class imbalance in streaming data is the problem of learning from data streams whose class distributions are skewed, often while concept drift is also occurring. Common remedies are resampling, cost-sensitive learning, and adaptive thresholding, usually with the goal of detecting rare events.

This subtopic combines online learning challenges with class imbalance in evolving streams. Key approaches include resampling ensembles (Wang et al., 2014, 384 citations) and handling drift with imbalance (Hoens et al., 2012, 294 citations). Over 10 major papers since 2010 focus on metrics and strategies for dynamic environments.

15 Curated Papers · 3 Key Challenges

Why It Matters

Effective methods improve minority class detection in fraud monitoring, as shown in credit card applications (Dal Pozzolo et al., 2017, 577 citations), and support classification in evolving streams (Gomes et al., 2017, 717 citations). They also matter for safety-critical predictions such as network intrusion detection. Wang et al. (2018, 321 citations) report that resampling improves performance under drift in real-time systems.

Key Research Challenges

Handling Concept Drift

Data streams evolve, and the resulting distribution shifts degrade models built for imbalanced data. Wang et al. (2018) show that traditional resampling fails after drift. Adaptive ensembles such as the Kappa Updated Ensemble (Cano and Krawczyk, 2019, 175 citations) address this partially.
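One simple way to notice that the class ratio itself has drifted is to keep an exponentially decayed estimate of each class's share of the stream. The following is a minimal sketch under stated assumptions, not the method of any cited paper: the class name `DecayedClassPrior`, the decay factor `theta`, and the synthetic label stream are all hypothetical and chosen only for illustration.

```python
class DecayedClassPrior:
    """Exponentially decayed estimate of each class's share of the stream."""

    def __init__(self, classes, theta=0.9):
        self.theta = theta  # assumed decay factor; closer to 1 means longer memory
        # Start from a uniform estimate so the shares always sum to 1.
        self.share = {c: 1.0 / len(classes) for c in classes}

    def update(self, label):
        # Shrink every share, then credit the class that just arrived.
        for c in self.share:
            hit = 1.0 if c == label else 0.0
            self.share[c] = self.theta * self.share[c] + (1.0 - self.theta) * hit
        return self.share

    def minority(self):
        # The class with the smallest current share.
        return min(self.share, key=self.share.get)


prior = DecayedClassPrior(classes=[0, 1], theta=0.9)

# Synthetic labels: class 1 is rare for 200 steps, then the ratio flips.
phase1 = [1 if t % 20 == 19 else 0 for t in range(200)]
phase2 = [0 if t % 20 == 19 else 1 for t in range(200)]

minority_after_phase1 = None
for t, y in enumerate(phase1 + phase2, start=1):
    prior.update(y)
    if t == 200:
        minority_after_phase1 = prior.minority()
minority_at_end = prior.minority()
```

Because old observations fade, the estimate follows the flip in class ratio without any explicit reset. A resampling rate or decision threshold derived from these shares would adapt along with it.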

Rare Minority Detection

Minority classes appear infrequently, which leads to poor recall. Hoens et al. (2012) survey metrics for evaluation. Online ensembles also struggle when the classes themselves evolve (Sun et al., 2016, 171 citations).
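Plain accuracy can hide poor minority recall, so stream evaluations often report per-class recall and its geometric mean (G-mean), updated prequentially as each labelled example arrives. The sketch below illustrates the idea under assumptions of my own: the class name `PrequentialGMean` and the fading factor `fade` are hypothetical, and the toy stream uses a classifier that always predicts the majority class.

```python
import math


class PrequentialGMean:
    """Per-class recall with fading counters, combined into a G-mean."""

    def __init__(self, classes, fade=0.99):
        self.fade = fade  # assumed fading factor; 1.0 would mean no forgetting
        self.hits = {c: 0.0 for c in classes}
        self.seen = {c: 0.0 for c in classes}

    def update(self, y_true, y_pred):
        # Only the counters of the true class are faded and updated, so a
        # rare class is not forgotten merely because it seldom appears.
        self.seen[y_true] = self.fade * self.seen[y_true] + 1.0
        correct = 1.0 if y_pred == y_true else 0.0
        self.hits[y_true] = self.fade * self.hits[y_true] + correct

    def recall(self, c):
        return self.hits[c] / self.seen[c] if self.seen[c] > 0 else 0.0

    def gmean(self):
        recalls = [self.recall(c) for c in self.hits]
        return math.prod(recalls) ** (1.0 / len(recalls))


metric = PrequentialGMean(classes=[0, 1])
labels = [0] * 99 + [1]            # 1% minority
correct_count = 0
for y in labels:
    y_pred = 0                     # always predict the majority class
    metric.update(y, y_pred)
    correct_count += int(y_pred == y)

accuracy = correct_count / len(labels)
g_mean = metric.gmean()
```

On this toy stream the accuracy is 0.99 while the G-mean is 0, which is the failure mode the metric is meant to expose.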

Scalable Online Resampling

Real-time processing limits how complex resampling can be. Wang et al. (2014) propose resampling ensembles, but scalability drops as stream volume grows. These issues are amplified in production deployments (Paleyes et al., 2022, 487 citations).
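A cheap form of online resampling is to give each incoming example to every ensemble member a Poisson-distributed number of times, raising the Poisson rate for the class that is currently rare. The sketch below is written in the spirit of oversampling-based online bagging but is not the exact algorithm of Wang et al. (2014): the names `OversamplingOnlineBagging` and `CentroidLearner`, the decay factor `theta`, the cap `max_rate`, and the synthetic data are all assumptions made for illustration, and the nearest-centroid base learner is only a placeholder.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class CentroidLearner:
    """Placeholder incremental learner: nearest running class mean."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def learn(self, x, y):
        sums = self.sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            sums[i] += value
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return None

        def sq_dist(y):
            n = self.counts[y]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[y]))

        return min(self.counts, key=sq_dist)


class OversamplingOnlineBagging:
    def __init__(self, n_learners=10, theta=0.99, max_rate=50.0, seed=0):
        self.learners = [CentroidLearner() for _ in range(n_learners)]
        self.theta = theta        # assumed decay for the class-share estimate
        self.max_rate = max_rate  # assumed cap that bounds per-example work
        self.share = {}           # decayed share of each class seen so far
        self.updates = {}         # training updates per class, for inspection
        self.rng = random.Random(seed)

    def learn(self, x, y):
        self.share.setdefault(y, 0.0)
        for c in self.share:
            hit = 1.0 if c == y else 0.0
            self.share[c] = self.theta * self.share[c] + (1.0 - self.theta) * hit
        # Rate is 1 for the largest class and grows as the class gets rarer.
        lam = min(max(self.share.values()) / self.share[y], self.max_rate)
        for learner in self.learners:
            for _ in range(poisson(lam, self.rng)):
                learner.learn(x, y)
                self.updates[y] = self.updates.get(y, 0) + 1

    def predict(self, x):
        votes = {}
        for learner in self.learners:
            p = learner.predict(x)
            if p is not None:
                votes[p] = votes.get(p, 0) + 1
        return max(votes, key=votes.get) if votes else None


model = OversamplingOnlineBagging()
data_rng = random.Random(1)
for t in range(2000):
    y = 1 if t % 20 == 19 else 0                      # 5% minority
    x = [data_rng.gauss(2.0 if y == 1 else 0.0, 1.0)]
    model.learn(x, y)
```

The work per example is proportional to the Poisson rate, so the cap `max_rate` is what keeps the cost bounded when a class is extremely rare. Inspecting `model.updates` shows how far the effective training ratio has moved toward balance.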

Essential Papers

1. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects

Ibomoiye Domor Mienye, Yanxia Sun · 2022 · IEEE Access · 975 citations

Ensemble learning techniques have achieved state-of-the-art performance in diverse machine learning applications by combining the predictions from two or more base models. This paper presents a con...

2. Adaptive random forests for evolving data stream classification

Heitor Murilo Gomes, Albert Bifet, Jesse Read et al. · 2017 · Machine Learning · 717 citations

3. Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy

Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen et al. · 2017 · IEEE Transactions on Neural Networks and Learning Systems · 577 citations

Detecting frauds in credit card transactions is perhaps one of the best testbeds for computational intelligence algorithms. In fact, this problem involves a number of relevant challenges, namely: c...

4. Challenges in Deploying Machine Learning: A Survey of Case Studies

Andrei Paleyes, Raoul-Gabriel Urma, Neil D. Lawrence · 2022 · ACM Computing Surveys · 487 citations

In recent years, machine learning has transitioned from a field of academic research interest to a field capable of solving real-world business problems. However, the deployment of machine learning...

5. Artificial intelligence in recommender systems

Qian Zhang, Jie Lü, Yaochu Jin · 2020 · Complex & Intelligent Systems · 397 citations

Recommender systems provide personalized service support to users by learning their previous behaviors and predicting their current preferences for particular products. Artificial intellig...

6. Resampling-Based Ensemble Methods for Online Class Imbalance Learning

Shuo Wang, Leandro L. Minku, Xin Yao · 2014 · IEEE Transactions on Knowledge and Data Engineering · 384 citations

Online class imbalance learning is a new learning problem that combines the challenges of both online learning and class imbalance learning. It deals with data streams having very skewed class dist...

7. A Systematic Study of Online Class Imbalance Learning With Concept Drift

Shuo Wang, Leandro L. Minku, Xin Yao · 2018 · IEEE Transactions on Neural Networks and Learning Systems · 321 citations

As an emerging research topic, online class imbalance learning often combines the challenges of both class imbalance and concept drift. It deals with data streams having very skewed class distribut...

Reading Guide

Foundational Papers

Start with Wang et al. (2014) for resampling ensembles and Hoens et al. (2012) for drift-imbalance overview, as they define core problems and metrics.

Recent Advances

Study Wang et al. (2018) for a systematic analysis of drift, and Cano and Krawczyk (2019) for the Kappa Updated Ensemble and its adaptive performance.

Core Methods

Core techniques: resampling-based online ensembles (Wang et al., 2014), adaptive random forests (Gomes et al., 2017), and ensembles for evolving classes (Sun et al., 2016).

How PapersFlow Helps You Research Class Imbalance in Streaming Data

Discover & Search

Research Agent uses searchPapers('class imbalance streaming data drift') to find Wang et al. (2014), then citationGraph reveals 384 citing works including Wang et al. (2018), and findSimilarPapers on Hoens et al. (2012) uncovers related drift handling.

Analyze & Verify

Analysis Agent applies readPaperContent on Gomes et al. (2017) for adaptive forest details, verifyResponse with CoVe checks drift-imbalance claims against abstracts, and runPythonAnalysis simulates imbalance metrics (e.g., G-mean) on stream datasets with GRADE scoring for evidence strength.

Synthesize & Write

Synthesis Agent detects gaps in resampling for evolved classes (Sun et al., 2016) and flags contradictions between Hoens et al. (2012) and recent ensembles. Writing Agent uses latexEditText for methods sections, latexSyncCitations for 10+ papers, latexCompile for full reports, and exportMermaid for drift-oversampling flowcharts.

Use Cases

"Simulate resampling ensemble on imbalanced stream with drift using Python."

Research Agent → searchPapers → Analysis Agent → runPythonAnalysis (reimplements the Wang et al. 2014 resampling ensemble on a synthetic stream, outputs G-mean plot and AUC).

"Draft LaTeX review of online imbalance methods with citations."

Research Agent → citationGraph (Wang et al. 2018 cluster) → Synthesis Agent → gap detection → Writing Agent → latexEditText + latexSyncCitations + latexCompile (exports formatted PDF with sections on drift adaptation).

"Find GitHub code for Kappa Updated Ensemble."

Research Agent → searchPapers('Kappa Updated Ensemble') → Code Discovery → paperExtractUrls (Cano and Krawczyk, 2019) → paperFindGithubRepo → githubRepoInspect (gets MOA implementation, drift tests, usage examples).

Automated Workflows

Deep Research workflow scans 50+ papers via searchPapers on 'streaming imbalance drift', structures report with Wang et al. (2014-2018) timeline using citationGraph. DeepScan applies 7-step CoVe to verify Hoens et al. (2012) claims against Dal Pozzolo et al. (2017) fraud benchmarks. Theorizer generates hypotheses on cost-sensitive thresholds from ensemble gaps (Gomes et al., 2017).

Frequently Asked Questions

What defines class imbalance in streaming data?

It involves skewed class ratios in continuous data streams with potential concept drift, requiring online adaptation (Wang et al., 2014).

What are main methods?

Resampling ensembles (Wang et al., 2014), cost-sensitive learning, and drift-adaptive ensembles such as the Kappa Updated Ensemble (Cano and Krawczyk, 2019).
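Cost-sensitive learning can be as simple as moving the decision threshold. If a false positive costs `cost_fp` and a false negative costs `cost_fn`, expected cost is minimized by predicting the minority class whenever its estimated probability is at least cost_fp / (cost_fp + cost_fn). The sketch below is illustrative only: the function names and cost values are hypothetical, and it assumes the classifier's probability estimates are reasonably calibrated.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost."""
    return cost_fp / (cost_fp + cost_fn)


def decide(p_minority, cost_fp=1.0, cost_fn=9.0):
    """Return 1 (minority) when the expected cost of missing it is higher."""
    return 1 if p_minority >= cost_threshold(cost_fp, cost_fn) else 0


# With equal costs the threshold is the usual 0.5, so a 0.2 score is rejected.
equal_cost_decision = decide(0.2, cost_fp=1.0, cost_fn=1.0)

# When a missed minority event costs nine times as much, the threshold
# drops to 0.1 and the same score is accepted.
skewed_cost_decision = decide(0.2, cost_fp=1.0, cost_fn=9.0)
```

In a stream the costs need not stay fixed. One option is to recompute them as the class ratio changes, which turns this rule into a form of adaptive thresholding.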

What are key papers?

Foundational: Wang et al. (2014, 384 citations), Hoens et al. (2012, 294 citations); Recent: Wang et al. (2018, 321 citations), Sun et al. (2016, 171 citations).

What open problems exist?

Scalable handling of class evolution and deployment under high-velocity streams (Paleyes et al., 2022; Sun et al., 2016).
