Subtopic Deep Dive

Black-Box Adversarial Attacks
Research Guide

What Are Black-Box Adversarial Attacks?

Black-box adversarial attacks generate adversarial examples by querying target machine learning models without access to their internal parameters or gradients.

These attacks simulate realistic threat models in which adversaries lack white-box access, relying on query efficiency and on transferability from surrogate models. Papernot et al. (2017) introduced practical black-box attacks based on transferability, achieving high success rates with limited queries (3366 citations). Research spans vision, NLP, and autonomous systems; this guide curates 10 key papers.
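The transfer-based recipe can be sketched in a few lines: query the target for labels, fit a local surrogate, then craft a gradient-based perturbation on the surrogate and rely on transfer. The following is a minimal NumPy sketch with a hypothetical linear target and logistic-regression surrogate, not the DNN setup of Papernot et al.:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "target" model: the attacker can only query hard labels (black-box).
W_target = rng.normal(size=(2,))
def query_target(x):
    return int(x @ W_target > 0)  # returns a label, never gradients

# Step 1: label a small dataset by querying the target.
X = rng.normal(size=(200, 2))
y = np.array([query_target(x) for x in X])

# Step 2: fit a surrogate (logistic regression) to the queried labels.
w = np.zeros(2)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(X)

# Step 3: craft an FGSM-style perturbation with the *surrogate's* gradient
# and rely on transferability to fool the target.
x0 = X[y == 1][0]              # a point the target labels as class 1
eps = 3.0                      # hypothetical perturbation budget
x_adv = x0 - eps * np.sign(w)  # for a linear surrogate, the logit gradient is w

print("target logit, clean vs adversarial:", x0 @ W_target, x_adv @ W_target)
```

The key point is that Step 3 never touches `W_target`: the perturbation direction comes entirely from the surrogate, which is what makes the attack black-box.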

10 Curated Papers · 3 Key Challenges

Why It Matters

Black-box attacks expose vulnerabilities in deployed ML systems such as autonomous vehicles and facial recognition, as shown by Sharif et al. (2016), in which adversaries bypassed face-recognition authentication using printed eyeglass frames (1531 citations). They also guide defense strategies for production models, with Papernot et al. (2017) demonstrating attacks on real-world APIs. Tian et al. (2018) applied them via DeepTest to DNN-driven cars, revealing safety risks in sensor processing (1187 citations).

Key Research Challenges

Query Efficiency Limits

Black-box attacks require many model queries, making them impractical against rate-limited APIs. Papernot et al. (2017) reduced query counts via transferability, but success rates drop on robust models. Optimizing query budgets remains an open problem (3366 citations).
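When no surrogate is available, attackers fall back on direct query-based search, and the query budget becomes the binding constraint. Below is a toy illustration of my own construction, in the spirit of score-based random search (e.g., SimBA), where every probe of the model is counted against a budget:

```python
import numpy as np

rng = np.random.default_rng(1)

# Score-based black-box target: returns a confidence score, never gradients.
W = rng.normal(size=(5,))
def target_score(x):
    return 1.0 / (1.0 + np.exp(-(x @ W)))  # P(class 1)

x = rng.normal(size=(5,))
if target_score(x) < 0.5:
    x = -x                         # start from an input classified as class 1

# Random coordinate search: try +/- eps on a random coordinate and keep the
# step only if the target's confidence drops; every probe costs one query.
eps, budget = 0.5, 500
score, queries = target_score(x), 1
while score > 0.5 and queries < budget:
    i = rng.integers(5)
    for sign in (eps, -eps):
        cand = x.copy()
        cand[i] += sign
        s = target_score(cand)
        queries += 1
        if s < score:              # accept the step and pick a new coordinate
            x, score = cand, s
            break

print(f"final score {score:.3f} after {queries} queries")
```

Tracking `queries` explicitly is the point: against a rate-limited API, an attack that needs thousands of probes per input is detectable and expensive, which is why query-budget optimization remains open.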

Transferability Variability

Adversarial examples transfer inconsistently from surrogate to target models. Jia and Liang (2017) showed variable transfer in NLP reading-comprehension attacks (1271 citations). Achieving reliable transfer across architectures remains an open challenge.

Real-World Realism

Lab attacks often fail under physical constraints such as lighting or viewing angle. Sharif et al. (2016) succeeded with printed eyeglass frames, but scaling to diverse environments remains difficult (1531 citations). Tian et al. (2018) highlighted the impact of sensor noise in driving scenarios (1187 citations).

Essential Papers

1.

Practical Black-Box Attacks against Machine Learning

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow et al. · 2017 · 3.4K citations

Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to hu...

2.

Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures

Matt Fredrikson, Somesh Jha, Thomas Ristenpart · 2015 · 2.6K citations

Machine-learning (ML) algorithms are increasingly utilized in privacy-sensitive applications such as predicting lifestyle choices, making medical diagnoses, and facial recognition. In a model inver...

3.

Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition

Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer et al. · 2016 · 1.5K citations

Machine learning is enabling a myriad innovations, including new algorithms for cancer diagnosis and self-driving cars. The broad use of machine learning makes it important to understand the extent...

5.

Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods

Eyke Hüllermeier, Willem Waegeman · 2021 · Machine Learning · 1.3K citations

6.

Adversarial Examples for Evaluating Reading Comprehension Systems

Robin Jia, Percy Liang · 2017 · 1.3K citations

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems w...

7.

DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars

Yuchi Tian, Kexin Pei, Suman Jana et al. · 2018 · 1.2K citations

Recent advances in Deep Neural Networks (DNNs) have led to the development of DNN-driven autonomous cars that, using sensors like camera, LiDAR, etc., can drive without any human intervention. Most...

Reading Guide

Foundational Papers

Start with Papernot et al. (2017) for the core transferability concept (3366 citations), then Fredrikson et al. (2015) for model-inversion basics (2650 citations); this guide includes no pre-2015 papers.

Recent Advances

Tian et al. (2018) on DeepTest for autonomous driving (1187 citations); Jin et al. (2020) on BERT text attacks (815 citations); Nasr et al. (2019) on privacy inference (1457 citations).

Core Methods

Transfer attacks (Papernot et al., 2017), physical perturbations (Sharif et al., 2016), NLP adversaries (Jia and Liang, 2017), testing suites (Tian et al., 2018).

How PapersFlow Helps You Research Black-Box Adversarial Attacks

Discover & Search

Research Agent uses searchPapers and citationGraph to map Papernot et al. (2017) as the central hub, revealing 3366 citations and downstream works like Tian et al. (2018); exaSearch uncovers query-efficient variants, while findSimilarPapers links to Sharif et al. (2016) for physical attacks.

Analyze & Verify

Analysis Agent applies readPaperContent to extract query counts from Papernot et al. (2017), verifies transferability claims via verifyResponse (CoVe), and runs PythonAnalysis to replicate attack success rates on MNIST with NumPy; GRADE grading scores the strength of evidence for efficiency metrics.

Synthesize & Write

Synthesis Agent detects gaps in query-efficient physical attacks after Sharif et al. (2016) and flags contradictions in transferability claims; Writing Agent uses latexEditText and latexSyncCitations for Papernot et al. (2017), and latexCompile to produce reports with exportMermaid diagrams of attack pipelines.

Use Cases

"Reproduce query-efficient black-box attack from Papernot 2017 on surrogate models"

Research Agent → searchPapers('Papernot black-box') → Analysis Agent → readPaperContent + runPythonAnalysis (NumPy replication of transfer attack) → matplotlib plot of success rates vs. queries.

"Write LaTeX section comparing transferability in Jia 2017 and Papernot 2017"

Synthesis Agent → gap detection → Writing Agent → latexEditText (draft comparison) → latexSyncCitations (add Jia Liang 2017) → latexCompile → PDF with formatted tables.

"Find GitHub repos implementing DeepTest black-box attacks from Tian 2018"

Research Agent → searchPapers('DeepTest Tian') → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect → list of verified attack codebases with README summaries.

Automated Workflows

Deep Research workflow conducts systematic review: searchPapers on 'black-box attacks' → citationGraph on Papernot et al. (2017) → structured report with 50+ papers ranked by citations. DeepScan applies 7-step analysis with CoVe checkpoints to verify Sharif et al. (2016) physical attack claims. Theorizer generates hypotheses on query reduction from Tian et al. (2018) and Jia and Liang (2017).

Frequently Asked Questions

What defines black-box adversarial attacks?

Attacks that query target models without gradient or parameter access, using transferability or direct optimization, as in Papernot et al. (2017).

What are key methods in black-box attacks?

Transfer-based attacks from surrogate models (Papernot et al., 2017) and query-efficient optimization; physical variants use printed perturbations (Sharif et al., 2016).

What are the most cited papers?

Papernot et al. (2017, 3366 citations) on practical attacks; Fredrikson et al. (2015, 2650 citations) on model inversion; Sharif et al. (2016, 1531 citations) on physical attacks.

What open problems exist?

Improving query efficiency under rate limits, reliable transferability across defenses, and physical robustness beyond lab settings (Tian et al., 2018).

Research Adversarial Robustness in Machine Learning with AI

PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:

See how researchers in Computer Science & AI use PapersFlow

Field-specific workflows, example queries, and use cases.

Computer Science & AI Guide

Start Researching Black-Box Adversarial Attacks with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.

See how PapersFlow works for Computer Science researchers