Subtopic Deep Dive

Adversarial Examples in Deep Learning
Research Guide

What Are Adversarial Examples in Deep Learning?

Adversarial examples are inputs to deep neural networks modified by small, often imperceptible perturbations that cause misclassification while appearing natural to humans.

This subtopic examines attack methods such as FGSM (introduced by Goodfellow et al., 2014) and PGD for generating such examples. Key papers include 'Practical Black-Box Attacks against Machine Learning' by Papernot et al. (2017, 3366 citations) and 'Boosting Adversarial Attacks with Momentum' by Dong et al. (2018, 2784 citations). The curated papers, spanning 2012-2019, cover white-box, black-box, and physical attacks and total more than 25,000 citations.

15 Curated Papers · 3 Key Challenges

Why It Matters

Adversarial examples expose vulnerabilities in deep models used in autonomous vehicles and medical imaging, as shown in 'Robust Physical-World Attacks on Deep Learning Visual Classification' by Eykholt et al. (2018, 2006 citations), where stop signs were physically perturbed to fool classifiers. Papernot et al. (2017) demonstrated black-box attacks that transfer across models, threatening deployed systems. Yuan et al. (2019) surveyed attacks and defenses, highlighting the need for secure AI in safety-critical domains.

Key Research Challenges

Black-Box Attack Transferability

Generating effective adversarial examples without model access relies on transferable perturbations, as in Papernot et al. (2017), who train a local substitute model via queries to the target (3366 citations). Challenges include low success rates across diverse architectures. ZOO by Chen et al. (2017) estimates gradients with zeroth-order optimization instead, but scales poorly with input dimension (1677 citations).
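The zeroth-order idea behind ZOO can be sketched with symmetric finite differences: query the model's loss on either side of each coordinate and divide by the step. A minimal illustration on a toy quadratic "black-box" loss (not the actual ZOO implementation, which adds coordinate sampling heuristics and attack-specific loss terms):

```python
import numpy as np

def zoo_gradient_estimate(loss_fn, x, h=1e-4, n_coords=None, rng=None):
    """Estimate grad loss_fn(x) via symmetric finite differences over a
    random subset of coordinates (ZOO-style zeroth-order optimization)."""
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    idx = (np.arange(x.size) if n_coords is None
           else rng.choice(x.size, n_coords, replace=False))
    for i in idx:
        e = np.zeros_like(x, dtype=float)
        e.flat[i] = h
        # Two queries per coordinate: this is why ZOO scales poorly.
        grad.flat[i] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * h)
    return grad

# Toy black-box loss with known gradient 2 * (x - target).
target = np.array([1.0, -2.0, 0.5])
loss = lambda x: float(np.sum((x - target) ** 2))
x0 = np.zeros(3)
g = zoo_gradient_estimate(loss, x0)
print(g)  # close to 2 * (x0 - target) = [-2., 4., -1.]
```

The two-queries-per-coordinate cost is exactly the scaling problem noted above: for an image with d pixels, one full gradient estimate needs 2d model queries.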

Physical-World Robustness

Perturbations must survive real-world conditions such as lighting and viewing angle, as Kurakin et al. (2018) demonstrated with printed examples (1842 citations). Eykholt et al. (2018) showed physical attacks on traffic signs (2006 citations). Environmental variability reduces attack reliability.

Imperceptibility vs. Effectiveness

Balancing minimal L_p-norm perturbations against high misclassification rates constrains attacks such as the momentum-boosted method of Dong et al. (2018, 2784 citations). Detection defenses such as feature squeezing (Xu et al., 2018, 1778 citations) exploit larger distortions, so the trade-off limits how stealthy an effective attack can be.
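The momentum-boosted update of Dong et al. (2018) accumulates L1-normalized gradients in a velocity term and steps by its sign, keeping the perturbation inside an L_inf ball. A minimal sketch, assuming a toy linear logit whose gradient we can query directly (the paper attacks deep networks, not this stand-in):

```python
import numpy as np

def mi_fgsm(x, grad_fn, eps=0.3, n_iter=10, mu=1.0):
    """Momentum Iterative FGSM (after Dong et al., 2018): accumulate
    L1-normalized gradients with decay mu, step by sign(velocity),
    and clip into the L_inf ball of radius eps around x."""
    alpha = eps / n_iter
    g = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(n_iter):
        grad = grad_fn(x_adv)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # momentum term
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)          # L_inf projection
    return x_adv

# Toy objective: maximize the linear logit w @ x.
w = np.array([0.5, -1.0, 2.0])
grad_fn = lambda x: w          # gradient of w @ x with respect to x
x = np.zeros(3)
x_adv = mi_fgsm(x, grad_fn, eps=0.3)
print(x_adv)  # ~ [0.3, -0.3, 0.3], a corner of the eps-ball along sign(w)
```

The explicit eps budget in the projection step is the imperceptibility constraint: momentum improves transferability without being allowed to enlarge the perturbation.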

Essential Papers

1.

mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, Moustapha Cissé, Yann Dauphin et al. · 2017 · arXiv (Cornell University) · 4.7K citations

Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle ...

2.

Deep Reinforcement Learning with Double Q-Learning

Hado van Hasselt, Arthur Guez, David Silver · 2016 · Proceedings of the AAAI Conference on Artificial Intelligence · 3.5K citations

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they har...

3.

Practical Black-Box Attacks against Machine Learning

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow et al. · 2017 · 3.4K citations

Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to hu...

4.

Boosting Adversarial Attacks with Momentum

Yinpeng Dong, Fangzhou Liao, Tianyu Pang et al. · 2018 · 2.8K citations

Deep neural networks are vulnerable to adversarial examples, which poses security concerns on these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important ...

5.

Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures

Matt Fredrikson, Somesh Jha, Thomas Ristenpart · 2015 · 2.6K citations

Machine-learning (ML) algorithms are increasingly utilized in privacy-sensitive applications such as predicting lifestyle choices, making medical diagnoses, and facial recognition. In a model inver...

6.

Robust Physical-World Attacks on Deep Learning Visual Classification

Kevin Eykholt, Ivan Evtimov, Earlence Fernandes et al. · 2018 · 2.0K citations

Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that em...

7.

Adversarial Examples in the Physical World

Alexey Kurakin, Ian Goodfellow, Samy Bengio · 2018 · 1.8K citations

Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is int...

Reading Guide

Foundational Papers

Start with Biggio et al. (2012, 734 citations) on poisoning attacks as a precursor, then Gu and Rigazio (2014, 632 citations) for early evidence of DNN vulnerability.

Recent Advances

Study Dong et al. (2018, 2784 citations) for advanced iterative attacks and Eykholt et al. (2018, 2006 citations) for physical implications; see Yuan et al. (2019, 1676 citations) for a comprehensive review.

Core Methods

Core techniques: white-box gradient methods (FGSM, PGD), black-box zeroth-order optimization (ZOO), momentum acceleration, and physical simulations with lighting/viewpoint variations.
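The white-box methods above can be sketched with PGD on a toy logistic model, whose cross-entropy gradient with respect to the input is available in closed form (a hypothetical stand-in for a deep network; random start, sign steps, and L_inf projection follow the standard PGD recipe):

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.3, alpha=0.05, n_iter=20):
    """Projected Gradient Descent under an L_inf constraint: ascend the
    logistic loss from a random start, projecting back into the eps-ball."""
    rng = np.random.default_rng(0)
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random start
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))    # sigmoid probability
        grad = (p - y) * w                            # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)         # gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)      # L_inf projection
    return x_adv

w, b = np.array([2.0, -3.0, 1.0]), 0.5
x = np.array([0.4, -0.2, 0.1])      # logit w @ x + b = 2.0 -> class 1
x_adv = pgd_attack(x, y=1, w=w, b=b)
print(w @ x + b, w @ x_adv + b)     # the logit drops after the attack
```

FGSM is the single-step special case of this loop; the iterative projection is what makes PGD the stronger first-order baseline.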

How PapersFlow Helps You Research Adversarial Examples in Deep Learning

Discover & Search

Research Agent uses searchPapers('adversarial examples FGSM PGD') to find Papernot et al. (2017), then citationGraph to map its 3366 citations, including Dong et al. (2018), and findSimilarPapers for physical attacks such as Eykholt et al. (2018). exaSearch uncovers related black-box methods from 250M+ OpenAlex papers.

Analyze & Verify

Analysis Agent applies readPaperContent on Kurakin et al. (2018) to extract physical attack protocols, verifyResponse with CoVe to check perturbation norms against claims, and runPythonAnalysis to recreate FGSM in a NumPy sandbox with GRADE scoring for reproducibility. Statistical verification confirms the transferability rates reported by Papernot et al. (2017).
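A NumPy recreation of FGSM along those lines can be very small. The sketch below uses a toy logistic model as the victim (a hypothetical setup for illustration, not the PapersFlow sandbox or the models from the cited papers): one gradient, one sign, one step.

```python
import numpy as np

def fgsm(x, y, w, b, eps=0.25):
    """Fast Gradient Sign Method on a logistic model:
    x_adv = x + eps * sign(grad_x cross_entropy)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # sigmoid probability
    grad = (p - y) * w                        # gradient of the loss w.r.t. x
    return x + eps * np.sign(grad)

w, b = np.array([1.5, -2.0]), 0.0
x = np.array([0.3, -0.2])     # logit = 0.85 -> classified as class 1
x_adv = fgsm(x, y=1, w=w, b=b)
print(w @ x_adv + b)          # logit falls below 0: the prediction flips
```

Being single-shot and deterministic, this is the kind of attack whose reported success rates are easiest to re-check numerically in a sandbox.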

Synthesize & Write

Synthesis Agent detects gaps in black-box defenses via contradiction flagging across Xu et al. (2018) and Chen et al. (2017), while Writing Agent uses latexEditText for attack comparisons, latexSyncCitations for 10+ papers, latexCompile for reports, and exportMermaid for attack-defense flowcharts.

Use Cases

"Reproduce PGD attack success rates on CIFAR-10 from recent papers"

Research Agent → searchPapers('PGD adversarial examples CIFAR') → Analysis Agent → runPythonAnalysis(NumPy recreation of Dong et al. 2018 momentum + PGD) → matplotlib plots of robustness curves.

"Draft LaTeX survey on physical adversarial attacks"

Research Agent → citationGraph(Eykholt 2018 + Kurakin 2018) → Synthesis → gap detection → Writing Agent → latexEditText(intro), latexSyncCitations(5 papers), latexCompile(PDF output with diagrams).

"Find GitHub repos for ZOO black-box attack code"

Research Agent → searchPapers('ZOO Chen 2017') → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect(attack scripts, dependencies) → exportCsv(results).

Automated Workflows

Deep Research workflow scans 50+ papers on adversarial examples via searchPapers → citationGraph → structured report with GRADE-verified metrics from Papernot et al. (2017). DeepScan's 7-step chain analyzes Eykholt et al. (2018) with readPaperContent → runPythonAnalysis(perturbation stats) → CoVe verification. Theorizer generates hypotheses on attack universality from foundational Biggio et al. (2012) poisoning to modern physical attacks.

Frequently Asked Questions

What defines an adversarial example?

Adversarial examples are inputs with small perturbations causing deep network misclassification, remaining imperceptible to humans (Kurakin et al., 2018).

What are key attack methods?

Methods include FGSM, PGD, momentum-boosted iterations (Dong et al., 2018), and the black-box ZOO attack (Chen et al., 2017).

What are top papers?

Papernot et al. (2017, 3366 citations) on black-box attacks; Dong et al. (2018, 2784 citations) on momentum; Eykholt et al. (2018, 2006 citations) on physical attacks.

What open problems exist?

Improving physical robustness (Kurakin et al., 2018), scaling black-box attacks (Chen et al., 2017), and finding universal perturbations that transfer across architectures.

Research Adversarial Robustness in Machine Learning with AI

PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:

See how researchers in Computer Science & AI use PapersFlow

Field-specific workflows, example queries, and use cases.

Computer Science & AI Guide

Start Researching Adversarial Examples in Deep Learning with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.

See how PapersFlow works for Computer Science researchers