Subtopic Deep Dive

Phishing Detection Techniques
Research Guide

What Are Phishing Detection Techniques?

Phishing Detection Techniques encompass machine learning classifiers, feature extraction from URLs and emails, and hybrid models using lexical, structural, and behavioral signals for real-time phishing identification.

Researchers apply classifiers like Random Forest and SVM to URL features such as length, suspicious keywords, and domain age (Şahingöz et al., 2018; Abu-Nimeh et al., 2007). Surveys highlight AI-enabled methods including deep learning for email analysis (Basit et al., 2020). Over 20 papers from 2006-2022 compare techniques, with foundational work achieving 90%+ accuracy on benchmark datasets.
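As a rough illustration of the URL features mentioned above, the following Python sketch computes a few lexical signals (length, host structure, suspicious keywords) from a single URL. The keyword list and the feature names are assumptions made for this example and are not taken from the cited papers. Domain age is left out because it requires a WHOIS or registry lookup.

```python
from urllib.parse import urlparse

# Hypothetical keyword list for illustration; not taken from the cited papers.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def extract_url_features(url: str) -> dict:
    """Return a small dictionary of lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Number of distinct suspicious keywords that appear anywhere in the URL.
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


features = extract_url_features("http://secure-login.example.com/verify?id=1")
```

A dictionary like this would typically be turned into a numeric vector and passed to a classifier such as Random Forest or SVM. Which features actually help depends on the dataset.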

15 Curated Papers
3 Key Challenges

Why It Matters

Phishing attacks cause billions in annual financial losses and enable data breaches affecting millions of users (Akinyelu and Adewumi, 2014). Techniques like URL-based ML detection block 95% of phishing sites in email filters (Şahingöz et al., 2018). Hybrid models improve real-time browser protections, reducing click fraud in banking apps (Abu-Nimeh et al., 2007; Basit et al., 2020). Deployments in Gmail and enterprise systems prevent identity theft at scale.

Key Research Challenges

Evolving Attack Obfuscation

Phishers use URL encoding, typosquatting, and fast-flux domains to evade lexical features (Şahingöz et al., 2018). Static classifiers drop 20-30% accuracy on zero-day variants. Behavioral signals like click patterns help but require real-time adaptation (Basit et al., 2020).
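One simple countermeasure to the obfuscation described above is to decode percent-encoded hostnames and compare them with a list of protected domains by edit distance. The sketch below assumes a hypothetical two-entry brand list and an arbitrary distance threshold of 2. It is an illustration of the idea, not the method of any cited paper, and it does not address fast-flux hosting.

```python
from urllib.parse import unquote, urlparse

# Hypothetical brand list for illustration only.
PROTECTED_DOMAINS = ("paypal.com", "google.com")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def looks_like_typosquat(url: str, max_distance: int = 2) -> bool:
    """Flag hosts that are close to, but not equal to, a protected domain."""
    # Parse first, then decode percent-escapes in the hostname only.
    host = unquote(urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return any(0 < edit_distance(host, d) <= max_distance for d in PROTECTED_DOMAINS)
```

For example, `looks_like_typosquat("http://paypa1.com/login")` flags the host because it differs from `paypal.com` by one character, while the genuine `https://www.paypal.com/` is not flagged.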

Feature Extraction Scalability

Extracting structural features from millions of daily emails demands low-latency processing (Abu-Nimeh et al., 2007). Hybrid lexical-behavioral models increase compute by 5x without accuracy gains. Balancing precision and speed remains unsolved for mobile filters.

Imbalanced Dataset Bias

Phishing datasets can have phish-to-legitimate ratios as skewed as 1:1000, biasing classifiers toward false negatives (Akinyelu and Adewumi, 2014). Random Forest mitigates this via undersampling but loses generalization. Surveys note persistent 15% false positive rates in production (Basit et al., 2020).
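The effect of heavy class imbalance, and the undersampling remedy mentioned above, can be sketched in a few lines of Python. The data here is synthetic and the helper name `undersample` is made up for this example. It shows why raw accuracy is misleading when phishing rows are rare, and how random undersampling balances the classes at the cost of discarding most legitimate rows.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Synthetic toy data: 5 phishing rows (label 1) among 5,000 legitimate rows (label 0).
labels = [1] * 5 + [0] * 5000
samples = [f"row-{i}" for i in range(len(labels))]

# A classifier that always predicts "legitimate" still scores very high accuracy
# while catching no phishing at all.
always_legit_accuracy = labels.count(0) / len(labels)

balanced_samples, balanced_labels = undersample(samples, labels)
```

After undersampling, the training set holds only ten rows, which illustrates the generalization concern: most of the information about legitimate traffic is thrown away.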

Essential Papers

1. A survey on sentiment analysis methods, applications, and challenges

Mayur Wankhade, Annavarapu Chandra Sekhara Rao, Chaitanya Kulkarni · 2022 · Artificial Intelligence Review · 1.3K citations

2. The spread of low-credibility content by social bots

Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol et al. · 2018 · Nature Communications · 952 citations

3. Machine learning based phishing detection from URLs

Özgür Koray Şahingöz, Ebubekir Buber, Önder Demir et al. · 2018 · Expert Systems with Applications · 712 citations

4. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior

Antigoni Founta, Constantinos Djouvas, Despoina Chatzakou et al. · 2018 · Proceedings of the International AAAI Conference on Web and Social Media · 560 citations

In recent years online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifesting itself through offensive, abusive, or...

5. Deception detection for news: Three types of fakes

Victoria L. Rubin, Yimin Chen, Nadia Conroy · 2015 · Proceedings of the Association for Information Science and Technology · 529 citations

A fake news detection system aims to assist users in detecting and filtering out varieties of potentially deceptive news. The prediction of the chances that a particular news item is inten...

6. A comparison of machine learning techniques for phishing detection

Saeed Abu‐Nimeh, Dario Nappa, Xinlei Wang et al. · 2007 · 479 citations

There are many applications available for phishing detection. However, unlike predicting spam, there are only few studies that compare machine learning techniques in predicting phishing. The presen...

7. Disinformation and social bot operations in the run up to the 2017 French presidential election

Emilio Ferrara · 2017 · First Monday · 462 citations

Recent accounts from researchers, journalists, as well as federal investigators, reached a unanimous conclusion: social media are systematically exploited to manipulate and alter public opinion. So...

Reading Guide

Foundational Papers

Start with Abu-Nimeh et al. (2007, 479 citations) for ML technique benchmarks establishing 90%+ accuracy baselines, then Akinyelu and Adewumi (2014) for Random Forest on emails achieving 98% precision.

Recent Advances

Study Şahingöz et al. (2018, 712 citations) for URL-specific features and Basit et al. (2020, 365 citations) for a comprehensive survey of AI methods, including deep learning advances.

Core Methods

Core techniques: lexical (keyword tf-idf), structural (URL length, IP presence), behavioral (redirect chains); classifiers include Random Forest, SVM, hybrids with CNNs for images in phishing PDFs (Smutz and Stavrou, 2012).
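To make the lexical tf-idf idea above concrete, here is a minimal pure-Python sketch that weights URL tokens by tf * log(N / df). This is the plain, unsmoothed textbook variant. Library implementations often apply smoothing or normalization, so their numbers will differ. The example URLs and the tokenizer are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(url: str) -> list:
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def tfidf(urls: list) -> list:
    """Return one {token: weight} dict per URL using tf * log(N / df)."""
    docs = [tokenize(u) for u in urls]
    # Document frequency: in how many URLs each token appears at least once.
    df = Counter(tok for doc in docs for tok in set(doc))
    n = len(docs)
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {tok: (c / len(doc)) * math.log(n / df[tok]) for tok, c in counts.items()}
        )
    return weights


urls = [
    "http://example.com/login",
    "http://example.com/about",
    "http://example.org/login/verify",
]
weights = tfidf(urls)
```

Tokens that occur in every URL, such as `http`, receive a weight of zero, while rarer tokens such as `verify` are weighted higher than more common ones such as `login`.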

How PapersFlow Helps You Research Phishing Detection Techniques

Discover & Search

Research Agent uses searchPapers('phishing URL features machine learning') to find Şahingöz et al. (2018, 712 citations), then citationGraph reveals 150+ downstream works and findSimilarPapers uncovers hybrid models. exaSearch scans 250M+ OpenAlex papers for 'real-time phishing classifiers' yielding 50 recent preprints.

Analyze & Verify

Analysis Agent runs readPaperContent on Şahingöz et al. (2018) to extract URL feature lists, then runPythonAnalysis recreates their Random Forest model on the provided datasets with NumPy/pandas to check the reported 94% accuracy. verifyResponse (CoVe) with GRADE grading flags contradictions between claims and results; statistical tests confirm p<0.01 significance.

Synthesize & Write

Synthesis Agent detects gaps like 'mobile phishing behavioral signals' across Basit et al. (2020) and Abu-Nimeh et al. (2007), then Writing Agent uses latexEditText for model comparisons, latexSyncCitations for 20 references, and latexCompile to generate a 5-page review. exportMermaid diagrams classifier pipelines from feature extraction to prediction.

Use Cases

"Reproduce Random Forest phishing accuracy from Abu-Nimeh 2007 on modern dataset"

Research Agent → searchPapers → Analysis Agent → readPaperContent + runPythonAnalysis (pandas crosstabs, matplotlib ROC curves) → researcher gets tuned model with 92% F1-score and feature importance plot.

"Write LaTeX section comparing URL vs email phishing detectors"

Synthesis Agent → gap detection → Writing Agent → latexEditText (insert tables) → latexSyncCitations (Şahingöz, Abu-Nimeh) → latexCompile → researcher gets PDF-ready section with compiled equations.

"Find GitHub code for phishing URL classifiers from recent papers"

Research Agent → citationGraph (Basit 2020) → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect → researcher gets 3 runnable repos with feature extractors and training scripts.
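The F1-score referred to in the first use case above is derived from confusion-matrix counts. The sketch below shows that arithmetic in plain Python on a small made-up label set. It is independent of any PapersFlow tool, and the function names are chosen for this example only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as phishing."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1, returning 0.0 when a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 3 phishing and 5 legitimate examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because F1 ignores true negatives, it is less flattering than accuracy on imbalanced phishing data, which is one reason it is commonly reported alongside ROC curves.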

Automated Workflows

The Deep Research workflow conducts a systematic review: searchPapers(50 phishing papers) → DeepScan(7-step: extract methods, GRADE evidence, runPythonAnalysis on 5 classifiers) → structured report ranking Random Forest vs. hybrids. Theorizer generates a theory, for example 'Behavioral signals outperform lexical by 12% on obfuscated URLs', from Basit et al. (2020) and Şahingöz et al. (2018), validated via CoVe. DeepScan verifies Abu-Nimeh et al. (2007) claims against modern benchmarks.

Frequently Asked Questions

What defines Phishing Detection Techniques?

Machine learning classifiers using URL/email features like domain age, suspicious keywords, and structural patterns for real-time identification (Şahingöz et al., 2018).

What are key methods in phishing detection?

Random Forest on URL features achieves 94% accuracy (Şahingöz et al., 2018); SVM and logistic regression compare favorably on email datasets (Abu-Nimeh et al., 2007); AI hybrids add behavioral analysis (Basit et al., 2020).

What are influential papers?

Abu-Nimeh et al. (2007, 479 citations) benchmarks ML techniques; Şahingöz et al. (2018, 712 citations) specializes in URLs; Basit et al. (2020, 365 citations) surveys AI methods.

What open problems exist?

Zero-day obfuscation evades static features; imbalanced data biases models; scalable real-time behavioral analysis lacks deployment benchmarks (Basit et al., 2020).
