Subtopic Deep Dive
Federated Learning
Research Guide
What is Federated Learning?
Federated learning is a distributed machine learning approach that trains models across multiple decentralized clients or devices while keeping raw data localized to preserve privacy.
Introduced in McMahan et al. (2016) with communication-efficient algorithms for deep networks, federated learning addresses data silos in mobile and edge settings. Key surveys such as Kairouz et al. (2021, 4038 citations) and Zhang et al. (2021, 1539 citations) cover over 50 seminal works on aggregation, non-IID data, and privacy. Applications span healthcare informatics (Xu et al., 2020) and digital health (Rieke et al., 2020, 2068 citations).
Why It Matters
Federated learning enables AI model training on edge devices under privacy regulations such as GDPR, avoiding centralization of raw data (McMahan et al., 2016). In healthcare, it supports collaborative informatics across hospitals without data sharing (Xu et al., 2020; Rieke et al., 2020). Industrial applications leverage it for siloed data in manufacturing (Li et al., 2020), while edge intelligence integrates it with the resource constraints of edge computing (Zhou et al., 2019). Wei et al. (2020, 1991 citations) show how it can be combined with differential privacy to defend against inference attacks (Nasr et al., 2019).
Key Research Challenges
Non-IID Data Distribution
Client datasets often exhibit statistical heterogeneity, degrading global model convergence. Zhao et al. (2018, 1895 citations) quantify performance drops in federated settings with non-IID splits. Aggregation methods like FedAvg struggle without personalization.
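A common way to reproduce this heterogeneity in experiments is a Dirichlet partition over class proportions, where a small concentration parameter yields strongly skewed clients. A minimal sketch, assuming toy labels; the client count and alpha are illustrative, not the exact protocol of Zhao et al. (2018):

```python
import numpy as np

def dirichlet_partition(labels, n_clients=5, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet-distributed
    class proportions; smaller alpha -> more skewed (non-IID) clients."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return client_idx

# Toy dataset: 1000 samples over 10 classes.
labels = np.random.default_rng(1).integers(0, 10, size=1000)
parts = dirichlet_partition(labels, n_clients=5, alpha=0.1)
print([len(p) for p in parts])  # uneven client sizes reflect the skew
```

Lowering alpha toward zero pushes each class almost entirely onto a single client, which is the regime where FedAvg degrades most.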
Communication Efficiency
Frequent model updates between clients and the server incur high bandwidth costs for deep networks. McMahan et al. (2016, 5171 citations) cut the number of communication rounds by averaging models after multiple local SGD steps (FedAvg); follow-up work adds sparsification and quantization to shrink individual payloads. Scaling to thousands of devices remains a bottleneck.
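Payload compression can be layered on top of whatever aggregation rule is used. A minimal sketch of top-k magnitude sparsification of a client update, with the update size and k chosen purely for illustration:

```python
import numpy as np

def topk_sparsify(update, k):
    """Keep only the k largest-magnitude entries of a model update,
    zeroing the rest -- the client then transmits (indices, values) only."""
    flat = update.ravel()
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[keep] = flat[keep]
    return sparse.reshape(update.shape), keep

rng = np.random.default_rng(0)
update = rng.normal(size=(100,))          # toy flattened model delta
sparse, kept = topk_sparsify(update, k=10)
print(np.count_nonzero(sparse))           # 10 of 100 entries transmitted
```

In practice clients often accumulate the zeroed residual locally and add it to the next round's update, so the dropped mass is not lost, only delayed.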
Privacy Leakage Risks
Gradient updates can leak training data: inference attacks reconstruct private inputs from shared updates. Nasr et al. (2019, 1457 citations) analyze white-box attacks on both centralized and federated learning. Wei et al. (2020) integrate differential privacy as a defense, at a cost in model utility.
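A minimal sketch of the clip-then-noise step common to DP federated schemes in the spirit of Wei et al. (2020); the clipping norm and noise multiplier here are illustrative defaults, not the paper's calibration:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update to L2 norm <= clip_norm, then add Gaussian noise
    scaled to the clipping bound, bounding each client's influence."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

update = np.random.default_rng(1).normal(size=50)  # toy client update
private = privatize_update(update)
print(np.linalg.norm(update), np.linalg.norm(private))
```

Clipping bounds the sensitivity of the aggregate to any single client, which is what makes the Gaussian noise scale meaningful; the utility loss grows with the noise multiplier.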
Essential Papers
Communication-Efficient Learning of Deep Networks from Decentralized Data
H. Brendan McMahan, Eider Moore, Daniel Ramage et al. · 2016 · arXiv (Cornell University) · 5.2K citations
Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve spe...
Advances and Open Problems in Federated Learning
Peter Kairouz, H. Brendan McMahan, Brendan Avent et al. · 2021 · Foundations and Trends® in Machine Learning · 4.0K citations
Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g...
The future of digital health with federated learning
Nicola Rieke, Jonny Hancox, Wenqi Li et al. · 2020 · npj Digital Medicine · 2.1K citations
Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing
Zhi Zhou, Xu Chen, En Li et al. · 2019 · Proceedings of the IEEE · 2.0K citations
With the breakthroughs in deep learning, the recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistant to recommendation syst...
Federated Learning With Differential Privacy: Algorithms and Performance Analysis
Kang Wei, Jun Li, Ming Ding et al. · 2020 · IEEE Transactions on Information Forensics and Security · 2.0K citations
Federated learning (FL), as a type of distributed machine learning, is capable of significantly preserving clients’ private data from being exposed to adversaries. Nevertheless, private ...
Federated Learning with Non-IID Data
Yue Zhao, Meng Li, Liangzhen Lai et al. · 2018 · arXiv (Cornell University) · 1.9K citations
Federated learning enables resource-constrained edge compute devices, such as mobile phones and IoT devices, to learn a shared model for prediction, while keeping the training data local. This dece...
A survey on federated learning
Chen Zhang, Yu Xie, Hang Bai et al. · 2021 · Knowledge-Based Systems · 1.5K citations
Reading Guide
Foundational Papers
Start with McMahan et al. (2016) for the invention of FedAvg and the core communication-efficient framework, then Zhao et al. (2018) for the realities of non-IID data.
Recent Advances
Study Kairouz et al. (2021) for comprehensive open problems, Wei et al. (2020) for DP integration, and Rieke et al. (2020) for healthcare advances.
Core Methods
Core techniques: FedAvg aggregation (McMahan et al., 2016), differentially private perturbation of client updates (Wei et al., 2020), personalization for non-IID data (Zhao et al., 2018), and update sparsification for communication efficiency.
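The FedAvg aggregation listed above reduces to a data-size-weighted average of client parameters (McMahan et al., 2016). A minimal sketch with toy two-dimensional parameter vectors:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average client parameter vectors weighted
    by each client's local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()          # per-client mixing weights
    stacked = np.stack(client_weights)    # shape: (n_clients, n_params)
    return np.tensordot(coeffs, stacked, axes=1)

# Two clients holding 100 and 300 samples respectively.
w = fedavg([np.array([1.0, 0.0]), np.array([0.0, 1.0])], [100, 300])
print(w)  # [0.25 0.75]
```

The same weighted average applies per layer in a real deep network; the rest of the algorithm is each client running several local SGD epochs between rounds.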
How PapersFlow Helps You Research Federated Learning
Discover & Search
Research Agent uses searchPapers and citationGraph to map 5000+ citations from McMahan et al. (2016), revealing clusters on non-IID handling like Zhao et al. (2018). exaSearch uncovers niche applications in healthcare (Rieke et al., 2020), while findSimilarPapers expands from Kairouz et al. (2021) surveys.
Analyze & Verify
Analysis Agent employs readPaperContent on Wei et al. (2020) to extract DP-FL algorithms, then runPythonAnalysis simulates utility-privacy tradeoffs with NumPy/pandas on reproduced gradients. verifyResponse (CoVe) cross-checks claims against Nasr et al. (2019) attacks, with GRADE scoring evidence strength for non-IID claims in Zhao et al. (2018).
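For intuition on what such a utility-privacy simulation computes, here is a hedged sketch of the classical Gaussian-mechanism noise scale (valid for epsilon < 1): required noise grows as the privacy budget tightens, which is the source of the utility trade-off. The delta and sensitivity values are illustrative:

```python
import numpy as np

def gaussian_sigma(epsilon, delta=1e-5, sensitivity=1.0):
    """Noise std for the classical Gaussian mechanism (epsilon < 1):
    sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

for eps in (0.1, 0.5, 0.9):
    print(f"epsilon={eps}: sigma={gaussian_sigma(eps):.2f}")
```

Plotting model accuracy against these sigma values is one simple way to visualize the trade-off that DP-FL papers report.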
Synthesize & Write
Synthesis Agent detects gaps in communication efficiency post-McMahan et al. (2016) via contradiction flagging across Zhou et al. (2019) and Kairouz et al. (2021). Writing Agent applies latexEditText and latexSyncCitations for FedAvg proofs, latexCompile for arXiv-ready reports, and exportMermaid for aggregation flow diagrams.
Use Cases
"Reproduce non-IID performance drops from Zhao et al. (2018) on CIFAR-10."
Research Agent → searchPapers(Zhao 2018) → Analysis Agent → readPaperContent → runPythonAnalysis(FedAvg on non-IID splits with matplotlib plots) → researcher gets accuracy curves and statistical p-values.
"Write a survey section on FL privacy attacks citing Nasr et al. (2019)."
Research Agent → citationGraph(Nasr 2019) → Synthesis Agent → gap detection → Writing Agent → latexEditText(draft) → latexSyncCitations(10 papers) → latexCompile → researcher gets PDF with equations and figures.
"Find GitHub repos implementing differential privacy in FL from Wei et al. (2020)."
Research Agent → findSimilarPapers(Wei 2020) → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect → researcher gets top 5 repos with code quality scores and FL-DP benchmarks.
Automated Workflows
Deep Research workflow conducts systematic reviews: searchPapers(250+ FL papers) → citationGraph → DeepScan(7-step verification on Kairouz et al., 2021) → structured report with gaps. Theorizer generates hypotheses on edge-FL integration from Zhou et al. (2019) + McMahan et al. (2016). DeepScan applies CoVe checkpoints to validate non-IID claims across Zhao et al. (2018) and surveys.
Frequently Asked Questions
What defines federated learning?
Federated learning trains a shared model by aggregating local updates from decentralized clients without exchanging raw data, as defined in McMahan et al. (2016).
What are core methods in federated learning?
FedAvg (McMahan et al., 2016) averages client models; extensions add differential privacy (Wei et al., 2020) and handle non-IID data (Zhao et al., 2018).
What are key papers?
Foundational: McMahan et al. (2016, 5171 citations). Surveys: Kairouz et al. (2021, 4038 citations), Zhang et al. (2021). Privacy: Wei et al. (2020), Nasr et al. (2019).
What are open problems?
Kairouz et al. (2021) highlight scalability to heterogeneous devices, robustness to poisoned clients, and balancing privacy-utility in non-IID settings.
Research Federated Learning with AI
PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Code & Data Discovery
Find datasets, code repositories, and computational tools
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
AI Academic Writing
Write research papers with AI assistance and LaTeX support
See how researchers in Computer Science & AI use PapersFlow
Field-specific workflows, example queries, and use cases.
Start Researching Federated Learning with AI
Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.
See how PapersFlow works for Computer Science researchers