RAKSUL TechBlog

A blog where engineers across the RAKSUL Group share technical topics.

Federated Learning: Training Models Where the Data Lives

Hello, I'm Minh, Tech Lead at RAKSUL Vietnam.

This article is the Day 5 entry of the ノバセル テクノ場 出張版2025 Advent Calendar 2025.

Since I'm not confident in my Japanese, please allow me to write this post in English.

1. Introduction: AI Wants More Data, but the World Says “No”

Figure 1. High-level federated learning architecture. Adapted from Nasim et al. [1].

Modern deep learning systems—especially foundation models and large language models—improve predictably with scale. Empirical scaling laws show that model performance increases as a function of data, parameters, and compute, with insufficient data becoming the dominant bottleneck as models grow larger [2][3]. Yet many of the most valuable datasets are increasingly:

  • Highly sensitive — private messages, medical images, transaction histories
  • Fragmented — distributed across devices, clinics, banks, countries
  • Heavily regulated — GDPR, HIPAA, PSD2, sector-specific banking rules, and data-localization laws

The traditional approach—centralizing everything into a data lake—is increasingly:

  • Legally constrained or prohibited
  • Politically complex, especially across business units or independent institutions
  • Operationally risky, widening the blast radius of any breach

As a result, organizations face a paradox: AI needs more data, but the world offers less access to it. Recent surveys describe Federated Learning (FL) as a promising response to this tension: a decentralized training paradigm in which model updates, not raw data, are aggregated [4].

2. What Is Federated Learning?

At its core, Federated Learning trains a shared model by sending the model to the data rather than sending the data to the model. Crucially, FL is not merely a distributed optimization algorithm. Modern surveys emphasize that it is a systems framework that encompasses:

  • Communication protocols
  • Client selection and orchestration
  • Robustness to failures and adversarial updates
  • Model and resource heterogeneity
  • Deployment and MLOps considerations

2.1 Two Common Federated Learning Settings

Cross-Device FL

  • Scale: Millions of edge clients (phones, browsers, wearables)
  • Challenges: tiny per-client datasets, intermittent connectivity, non-IID data distribution, resource constraints
  • Use cases: mobile keyboards, on-device personalization, recommendation and speech models

Cross-Silo FL

  • Scale: Tens to hundreds of reliable, institution-level clients (e.g., hospitals, banks)
  • Characteristics: larger datasets, stable networking, contractual governance
  • Use cases: multi-hospital diagnostics, cross-bank fraud detection, credit scoring

FL’s versatility across these settings makes it attractive as an alternative to centralized AI pipelines.
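One practical detail behind the non-IID challenge above: FL research commonly simulates label-skewed clients by splitting a centralized benchmark dataset with a Dirichlet distribution, where a small concentration parameter produces cross-device-like skew and a large one approaches IID. A minimal sketch (the function name and parameters are my own, for illustration only):

```python
import random

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign dataset indices to clients with label-skewed (non-IID) shares.
    Smaller alpha -> heavier skew (cross-device-like); larger -> closer to IID."""
    rng = random.Random(seed)
    by_label = {}
    for idx, y in enumerate(labels):
        by_label.setdefault(y, []).append(idx)
    clients = [[] for _ in range(num_clients)]
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # Per-label client proportions ~ Dirichlet(alpha), built from Gamma samples
        raw = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(raw)
        props = [r / total for r in raw]
        start = 0
        for c in range(num_clients):
            take = int(round(props[c] * len(idxs)))
            clients[c].extend(idxs[start:start + take])
            start += take
        clients[-1].extend(idxs[start:])  # leftovers from rounding
    return clients
```

With `alpha=0.5` most clients end up dominated by one or two labels, which is exactly the distribution shift that makes naive averaging struggle in the cross-device setting.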

3. What Federated Learning Actually Solves

3.1 Privacy, Regulation, and Data Locality

In many industries, data simply cannot leave its source due to legal, ethical, or operational constraints:

  • Healthcare: Patient data is tightly regulated and fragmented across hospitals and imaging centers [5][6].
  • Finance: Transactions and identity data are siloed under strict confidentiality rules and regulatory oversight [7][8].
  • Consumer devices: Typed text, speech snippets, and behavioral signals are highly sensitive and difficult to centralize responsibly [9].

FL provides a practical mechanism to:

  • Train on local data while keeping it under local control
  • Comply with privacy and data-localization rules
  • Leverage diverse datasets across many participants
  • Reduce organizational risk by minimizing data movement

This is particularly attractive for enterprises seeking to de-risk AI adoption.

3.2 Personalization Without Over-Collection

User-facing products increasingly require personalization, but traditional approaches rely on intensive data collection. FL offers a cleaner alternative.

  • Gboard trains next-word prediction models using FL + secure aggregation + device-level differential privacy, allowing personalization without exposing raw keystrokes [10][11].
  • Apple’s Private Federated Learning (PFL) and the pfl-research framework generalize this for large-scale on-device learning with formal privacy protections [12].

Business value: FL enables tailored experiences without the operational or reputational risk of sending sensitive behavioral data to the cloud.
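To make the secure aggregation idea above concrete: each pair of clients can agree on a shared random mask that one adds and the other subtracts, so the server sees only garbled individual updates, yet the masks cancel exactly in the sum. A toy sketch of this pairwise-masking idea (the seed-derivation scheme here is purely illustrative; real protocols such as the one behind Gboard also handle key agreement and client dropouts):

```python
import random

def masked_update(update, my_id, all_ids, session_seed):
    """Add one pairwise mask per peer: the lower-id party adds the mask,
    the higher-id party subtracts it, so all masks cancel in the server's sum."""
    masked = list(update)
    for other in all_ids:
        if other == my_id:
            continue
        lo, hi = min(my_id, other), max(my_id, other)
        # Both peers derive the same mask from a shared per-pair seed
        rng = random.Random(session_seed * 1_000_003 + lo * 1_000 + hi)
        mask = [rng.uniform(-1.0, 1.0) for _ in update]
        sign = 1.0 if my_id == lo else -1.0
        masked = [m + sign * v for m, v in zip(masked, mask)]
    return masked

# Three clients mask their updates; the server only sums what it receives.
updates = {1: [0.5, -0.2], 2: [0.1, 0.3], 3: [-0.4, 0.6]}
ids = list(updates)
masked = [masked_update(u, cid, ids, session_seed=7) for cid, u in updates.items()]
aggregate = [sum(vals) for vals in zip(*masked)]
# aggregate matches the elementwise sum of the raw updates; no masked[i] does
```

In a real deployment the per-pair seeds come from a key agreement (e.g. Diffie–Hellman) rather than public ids, and a secret-sharing layer lets the server reconstruct the masks of clients that drop out mid-round.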

3.3 Cross-Organization Collaboration Without Data Sharing

FL enables institutions to jointly train models without exposing their data, creating new opportunities for collaboration.

  • Healthcare: Multi-hospital FL improves medical imaging performance—especially for rare diseases—while keeping data on-prem [5][6].
  • Finance: Multi-bank fraud detection and credit scoring systems use FL to combine intelligence without sharing transactions [7][8].

Strategic impact: FL transforms previously impossible partnerships into viable collaborations by aligning incentives, governance, and privacy requirements.

4. Architecture of a Federated Learning System

4.1 High-Level Architecture

Figure 2. Federated learning architecture for bird species classification. Adapted from Mulero-Pérez et al. [13].

  1. Initialization

    A central server initializes a global model (e.g., CNN, transformer) and selects clients for participation.

  2. Local Training & Model Update Transmission

    Each client trains the model on its private dataset. Data never leaves the device, satisfying privacy and data-locality constraints. Then, clients send updates (weights, gradients, or compressed representations) to the server.

  3. Aggregation

The server aggregates updates via Federated Averaging (FedAvg) or more advanced algorithms (FedProx, FedNova, SCAFFOLD).

  4. Iteration

    The updated global model is redistributed, and training proceeds iteratively until convergence.
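Putting the four steps together, a single FedAvg round can be sketched in plain Python. This is a toy 1-D linear model, not a production implementation; the helper names are mine, and real deployments would use a framework such as TensorFlow Federated or Flower:

```python
import random

def local_sgd(weights, data, lr=0.1, epochs=1):
    """Step 2: client-side SGD on a 1-D linear model y ~ w*x + b.
    The raw (x, y) pairs stay with the client."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x  # gradient of squared error w.r.t. w (up to a factor of 2)
            b -= lr * err      # gradient w.r.t. b
    return (w, b)

def fedavg_round(global_weights, client_datasets):
    """Steps 2-3: clients train locally, then the server takes a
    data-size-weighted average of the returned weights (FedAvg)."""
    updates = [local_sgd(global_weights, d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

# Steps 1 and 4: initialize a global model and iterate rounds.
# Each simulated client holds noisy samples of y = 2x + 1.
rng = random.Random(42)
clients = [[(x, 2 * x + 1 + rng.gauss(0, 0.01))
            for x in (rng.uniform(-1, 1) for _ in range(20))]
           for _ in range(5)]
weights = (0.0, 0.0)
for _ in range(50):
    weights = fedavg_round(weights, clients)
# weights ends up close to the true (2, 1)
```

The data-size weighting is what distinguishes FedAvg from naive averaging: clients with more data pull the global model proportionally harder, which matters under the non-IID distributions discussed earlier.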

4.2 Key Components

  • Federation Controller / Central Server

    Coordinates training rounds, maintains global state, enforces privacy mechanisms, and handles orchestration.

  • Clients (Edge Devices or Institutional Nodes)

    Provide local data and compute for training; may participate intermittently (cross-device) or reliably (cross-silo).

  • Local Private Data

    Remains on-device or on-prem, reducing governance overhead, breach risk, and compliance burden.

  • Global Model

    A shared model co-trained across participants, capturing collective knowledge without collecting sensitive records.

4.3 Architectural Challenges

  • System heterogeneity: varying hardware, availability, data distributions
  • Communication bottlenecks: limited bandwidth in edge environments
  • Stragglers: slow or unavailable clients
  • Privacy and security risks: model updates can leak information about the underlying data unless defenses such as differential privacy (DP) or secure aggregation are applied
  • Robustness: Byzantine clients or poisoned updates
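Of the mitigations just listed, the differential-privacy side can be sketched compactly: clip each client's model delta to a fixed L2 norm, then add Gaussian noise calibrated to that clip bound, in the spirit of DP-FedAvg. The function below illustrates only the mechanism (the name and parameters are mine); a real system must also track the cumulative (ε, δ) privacy budget across rounds:

```python
import math
import random

def clip_and_noise(update, clip_norm, noise_multiplier, rng):
    """Clip a client's model delta to L2 norm <= clip_norm, then add
    Gaussian noise with std = noise_multiplier * clip_norm per coordinate."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / (norm + 1e-12))  # clip, never amplify
    clipped = [v * scale for v in update]
    sigma = noise_multiplier * clip_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]
```

Clipping bounds any single client's influence on the aggregate; the added noise then masks whatever influence remains, which is what makes the aggregated model update a differentially private quantity.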

5. Conclusion

Federated Learning has progressed from a conceptual idea to a mature architectural pattern for privacy‑preserving machine learning. Production deployments—such as Google’s Gboard and Apple’s PFL systems—show that FL is reliable and scalable in real‑world environments. As regulations tighten and edge computing advances, Federated Learning is positioned to become a foundational approach for building AI systems that uphold privacy, governance, and data sovereignty.

References

[1] Nasim, M.D.A. et al. (2025). Principles and Components of Federated Learning Architectures. arXiv:2502.05273.

[2] Kaplan, J. et al. (2020). Scaling Laws for Neural Language Models. OpenAI.

[3] Hoffmann, J. et al. (2022). Training Compute-Optimal Large Language Models. DeepMind.

[4] Kairouz, P. et al. (2021). Advances and Open Problems in Federated Learning. Proceedings of the IEEE. arXiv:1912.04977.

[5] Zhang, F. et al. (2023). Recent Methodological Advances in Federated Learning for Healthcare. arXiv:2310.02874.

[6] Rehman, M. H. U. et al. (2023). Federated Learning for Medical Imaging Radiology. British Journal of Radiology.

[7] Kennedy, C. H. et al. (2025). The Role of Federated Learning in Improving Financial Security: A Survey. IEEE GCAIoT / arXiv:2510.14991.

[8] Brundyn, A. et al. (2022). Using Federated Learning to Bridge Data Silos in Financial Services. NVIDIA Technical Blog.

[9] Google Research (2021). Predicting Text Selections with Federated Learning. Google AI Blog.

[10] Hard, A. et al. (2018). Federated Learning for Mobile Keyboard Prediction. Google Research.

[11] Xu, Z. et al. (2023). Federated Learning of Gboard Language Models with Differential Privacy. arXiv:2305.18465.

[12] Apple Machine Learning Research (2023). Private Federated Learning and pfl‑research Framework.

[13] Mulero-Pérez, D. et al. (2025). A Federated Learning Architecture for Bird Species Classification in Wetlands. Journal of Sensor and Actuator Networks, 14(4), 71. https://doi.org/10.3390/jsan14040071