PapersFlow Research Brief

Spam and Phishing Detection
Research Guide

What is Spam and Phishing Detection?

Spam and Phishing Detection is the application of computational techniques, including machine learning and behavioral analysis, to identify and prevent phishing attacks, spam messages, bots, review spam, URL-based threats, and Sybil attacks in social networks.

The field encompasses 36,857 works focused on detection methods such as spam filtering, machine learning classifiers, and social network defenses. Techniques include support vector machines for classification and multi-label learning for handling multiple threat types simultaneously. Research emphasizes behavioral analysis and security education to counter evolving phishing tactics.

Topic Hierarchy

Physical Sciences > Computer Science > Information Systems > Spam and Phishing Detection
Papers: 36.9K
5yr Growth: N/A
Total Citations: 374.9K

Why It Matters

Spam and phishing detection helps limit misinformation cascades on platforms like Twitter, where false news spreads about six times faster than the truth, as shown in 'The spread of true and false news online' (2018), which analyzed 126,000 rumor cascades from 2006 to 2017. Related work on fake news detection targets low-quality content carrying intentional misinformation, per 'Fake News Detection on Social Media' (2017). Operators of online social networks can draw on measurements of user interactions to filter spam and Sybil attacks, as in 'Measurement and analysis of online social networks' (2007), which studied sites like Orkut and Flickr.

Reading Guide

Where to Start

'Support vector machines' (1998) by Hearst et al., as it introduces foundational classification techniques widely applied in spam and phishing detection tasks.

Key Papers Explained

'Support vector machines' (1998) by Hearst et al. establishes SVMs for text-based classification, and 'A systematic analysis of performance measures for classification tasks' (2009) by Sokolova and Lapalme supplies the metrics for evaluating such classifiers. 'ML-KNN: A lazy learning approach to multi-label learning' (2007) and 'A Review on Multi-Label Learning Algorithms' (2013), both by Zhang and Zhou, extend classification to the multi-label setting, which suits detection of several threat types at once. 'The spread of true and false news online' (2018) by Vosoughi et al. turns to how content propagates through social networks.

Paper Timeline

  • 1998 · Support vector machines · 6.6K cites
  • 2007 · ML-KNN: A lazy learning approach to multi-label learning · 3.5K cites
  • 2007 · Measurement and analysis of online social networks · 3.0K cites
  • 2009 · A systematic analysis of performance measures for classification tasks · 5.9K cites
  • 2013 · A Review on Multi-Label Learning Algorithms · 3.2K cites
  • 2014 · VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text · 5.4K cites
  • 2018 · The spread of true and false news online · 7.8K cites

Papers ordered chronologically. The most-cited paper is 'The spread of true and false news online' (2018).

Advanced Directions

Current work targets fake news and the spread of misinformation, as in 'Fake News Detection on Social Media' (2017) by Shu et al. and 'The spread of true and false news online' (2018) by Vosoughi et al. No recent preprints are listed for this topic. Adapting classifiers to concept drift remains a focus, per 'A survey on concept drift adaptation' (2014).

Papers at a Glance

# | Paper | Year | Venue | Citations
1 | The spread of true and false news online | 2018 | Science | 7.8K
2 | Support vector machines | 1998 | IEEE Intelligent Syste... | 6.6K
3 | A systematic analysis of performance measures for classification tasks | 2009 | Information Processing... | 5.9K
4 | VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text | 2014 | Proceedings of the Int... | 5.4K
5 | ML-KNN: A lazy learning approach to multi-label learning | 2007 | Pattern Recognition | 3.5K
6 | A Review on Multi-Label Learning Algorithms | 2013 | IEEE Transactions on K... | 3.2K
7 | Measurement and analysis of online social networks | 2007 | | 3.0K
8 | A survey on concept drift adaptation | 2014 | ACM Computing Surveys | 3.0K
9 | Fake News Detection on Social Media | 2017 | ACM SIGKDD Exploration... | 3.0K
10 | Social information filtering | 1995 | | 2.8K

Frequently Asked Questions

What role does machine learning play in Spam and Phishing Detection?

Support vector machines provide robust classification for spam and phishing tasks, as detailed in 'Support vector machines' (1998) with applications in text categorization. Multi-label learning algorithms like ML-KNN handle instances associated with multiple labels, such as a message with both spam and phishing traits, as described in 'ML-KNN: A lazy learning approach to multi-label learning' (2007). Surveys like 'A Review on Multi-Label Learning Algorithms' (2013) catalog a decade of progress in these methods.
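As a rough illustration of the SVM idea applied to text, the sketch below trains a linear classifier by subgradient descent on the regularized hinge loss, using bag-of-words counts. It is a minimal sketch under stated assumptions, not the method of any cited paper: the vocabulary, the toy messages, the hyperparameters, and the function names (`featurize`, `train_linear_svm`, `predict`) are all hypothetical, and the bias term is omitted for brevity.

```python
# Toy vocabulary and messages are hypothetical, chosen only for illustration.
VOCAB = ["free", "winner", "click", "meeting", "report", "lunch"]


def featurize(text, vocab=VOCAB):
    """Bag-of-words counts over a fixed vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 (spam) or -1 (legitimate). No bias term is learned.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Regularization step: shrink every weight toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only examples inside the margin push the weights.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Return +1 (spam) if the linear score is positive, else -1."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score > 0 else -1


TRAIN = [
    ("free winner click", 1),
    ("click free winner free", 1),
    ("meeting report", -1),
    ("lunch meeting report", -1),
]
X = [featurize(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
w = train_linear_svm(X, y)

print(predict(w, featurize("click to claim your free prize")))
print(predict(w, featurize("report for the meeting")))
```

Words outside the vocabulary are ignored, so the two test messages are scored only on the terms the toy model has seen. A production filter would normally rely on an established library and far richer features.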

How does sentiment analysis contribute to phishing detection?

VADER, a rule-based model, analyzes sentiment in social media text to detect manipulative language in phishing and spam, outperforming benchmarks like LIWC, as in 'VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text' (2014). It addresses challenges in short, informal content common in phishing attempts. The model supports real-time filtering in social networks.

What performance measures are used in spam classification?

Classification tasks in spam detection rely on measures like accuracy, precision, recall, and F-measure, systematically analyzed in 'A systematic analysis of performance measures for classification tasks' (2009). These metrics evaluate detectors under imbalanced datasets typical of phishing scenarios. The paper provides frameworks for comparing machine learning models.
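To make the measures concrete, here is a small sketch that computes accuracy, precision, recall, and F-measure from true and predicted labels. The function name `classification_measures` and the example label counts are assumptions made for illustration; they are not taken from the cited paper.

```python
def classification_measures(y_true, y_pred, positive=1, beta=1.0):
    """Accuracy, precision, recall, and F-beta for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_measure = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical imbalanced set: 5 phishing messages (1) among 100 total.
y_true = [1] * 5 + [0] * 95

# A detector that labels everything legitimate still scores 95% accuracy.
always_legit = classification_measures(y_true, [0] * 100)

# A detector that catches 3 of 5 phishing messages with 2 false alarms.
y_pred = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93
detector = classification_measures(y_true, y_pred)

print(always_legit)
print(detector)
```

The first case shows why accuracy alone can mislead under class imbalance: the detector never flags a phishing message, yet its accuracy is high while its recall is zero.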

Why is concept drift relevant to phishing detection?

Concept drift occurs when phishing patterns change over time, so that a model trained on older data loses accuracy. 'A survey on concept drift adaptation' (2014) outlines adaptive learning strategies for this problem and categorizes methods for online supervised learning in dynamic environments like social media. Adaptation maintains detection efficacy against evolving threats.
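One simple way to picture drift monitoring is to track a classifier's recent error rate and compare it with a baseline. The sketch below is a deliberately simplified illustration, not a specific method from the survey: the class name `WindowDriftMonitor`, the window size, and the tolerance threshold are all hypothetical choices.

```python
from collections import deque


class WindowDriftMonitor:
    """Flags possible drift when the recent error rate rises above a baseline.

    The baseline is the error rate of the first full window of predictions.
    """

    def __init__(self, window=50, tolerance=0.15):
        self.window = window
        self.tolerance = tolerance
        self.reference = None  # baseline error rate, set after the first full window
        self.recent = deque(maxlen=window)

    def update(self, was_error):
        """Record one prediction outcome; return True if drift is suspected."""
        self.recent.append(1 if was_error else 0)
        if len(self.recent) < self.window:
            return False
        rate = sum(self.recent) / self.window
        if self.reference is None:
            self.reference = rate
            return False
        return rate - self.reference > self.tolerance


monitor = WindowDriftMonitor(window=10, tolerance=0.2)

# Stable phase: one error in every ten predictions.
stable_flags = [monitor.update(i % 10 == 9) for i in range(30)]

# Drift phase: the detector starts misclassifying every message.
drift_flags = [monitor.update(True) for _ in range(10)]

print(any(stable_flags), any(drift_flags))
```

When the monitor raises a flag, a typical response would be to retrain or update the classifier on recent labeled data. How to do that well is the subject of the adaptation strategies the survey describes.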

How do social networks factor into spam detection?

Online social networks enable spam and Sybil attacks, analyzed through user measurements in 'Measurement and analysis of online social networks' (2007) covering Orkut, YouTube, and Flickr. Fake news spreads rapidly, as in 'Fake News Detection on Social Media' (2017) targeting low-quality intentional misinformation. Detection leverages network structures for filtering.

Open Research Questions

  • How can multi-label learning be optimized for real-time detection of overlapping spam, phishing, and bot activities in evolving social networks?
  • What adaptive strategies best counter concept drift in phishing attacks across diverse platforms like Twitter and review sites?
  • How do behavioral signals integrate with machine learning to improve Sybil attack detection without relying solely on URL filtering?
  • Which performance measures most accurately evaluate spam detectors under severe class imbalance in large-scale rumor cascades?
