Federated Learning

Horizontal Federated Learning (HFL) is a privacy-preserving machine learning technique and one of the most mature forms of federated learning. Originally proposed by Google, HFL is widely used in Google's Gboard, where it improves predictive text by learning from users' typing history without that history leaving their devices.

How Horizontal Federated Learning Works

HFL distributes model training across numerous clients (such as mobile phones or personal computers) that hold data of the same structure, i.e., the same feature space, but belonging to different users. Instead of pooling data on a central server, each client trains the model locally on its own data. The updates from these local models (typically gradients of the model's parameters) are then sent to a central server, which aggregates them to improve a global model.
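To make this concrete, below is a minimal sketch of the server-side aggregation step under simple assumptions: each client has already trained locally and returned a parameter-update vector plus its local dataset size, and the server combines them with the size-weighted average used in Federated Averaging (FedAvg). The function name `fedavg_aggregate` and the toy numbers are illustrative, not part of any specific framework.

```python
import numpy as np

def fedavg_aggregate(client_updates, client_sizes):
    """Combine client updates by a weighted average (Federated Averaging).

    client_updates: list of 1-D parameter-update arrays, one per client.
    client_sizes:   number of local training examples per client, used as weights.
    """
    total = sum(client_sizes)
    # Weighted sum: clients with more local data contribute proportionally more.
    return sum((n / total) * u for n, u in zip(client_sizes, client_updates))

# Toy example: three clients, a model with four parameters.
updates = [np.array([0.1, 0.2, 0.0, 0.3]),
           np.array([0.0, 0.1, 0.1, 0.2]),
           np.array([0.2, 0.0, 0.1, 0.1])]
sizes = [100, 50, 150]
print(fedavg_aggregate(updates, sizes))  # the server applies this update to the global model
```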

Key Components

  1. Server:

    • Role: Maintains and updates the global machine learning model and coordinates the training process across multiple clients.

    • Functionality: The server initiates training rounds, selects a sufficient number of clients for each round, and aggregates the gradients they return in order to update the global model.

  2. Clients:

    • Role: Each client holds its private data locally and participates in the model training by using this data.

    • Functionality: Clients train a local copy of the model on their own data and send the resulting model updates back to the server (see the sketch of the two roles below).
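As a rough illustration of how these two roles interact, here is a hypothetical sketch in which a linear model trained by gradient descent stands in for the real model; the `Client` and `Server` classes and their methods are assumptions made for illustration, not an actual federated learning API.

```python
import numpy as np

class Client:
    """Keeps its training data private and computes local model updates."""
    def __init__(self, features, labels):
        self.features = features          # raw data never leaves the client
        self.labels = labels

    def local_update(self, global_params, lr=0.1):
        # One gradient-descent step on a linear model with squared error,
        # standing in for whatever local training the real system performs.
        preds = self.features @ global_params
        grad = self.features.T @ (preds - self.labels) / len(self.labels)
        return -lr * grad                  # only this update is sent to the server

class Server:
    """Maintains the global model and coordinates training rounds."""
    def __init__(self, n_params):
        self.params = np.zeros(n_params)

    def run_round(self, clients):
        updates = [c.local_update(self.params) for c in clients]
        self.params += np.mean(updates, axis=0)   # unweighted average, for simplicity
        return self.params
```

A round then consists of the server calling `run_round` on the currently connected clients; the phases below spell out this interaction in more detail.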

Phases of Horizontal Federated Learning

  1. Selection Phase:

    • Clients connect to the server when they have sufficient data to participate in training.

    • The server either accepts or rejects these connections based on predefined criteria, such as the number of participants needed for a training round.

  2. Training Phase:

    • Once selected, clients receive the current model parameters and training configurations from the server.

    • Each client trains the model locally using its data and computes the gradients.

  3. Update Phase:

    • Clients send their computed gradients to the server.

    • The server aggregates these gradients, typically by (weighted) averaging, and updates the global model parameters accordingly; a sketch of a complete round covering all three phases follows this list.
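The sketch below strings the three phases together into one training round. It is self-contained and assumes, purely for illustration, that each client's data is a (features, labels) pair fitting a linear model, that the server samples clients uniformly at random, and that gradients are combined by a size-weighted average.

```python
import random
import numpy as np

def training_round(global_params, available_clients, clients_per_round=2, lr=0.1):
    # Selection phase: the server picks a subset of the connected clients.
    selected = random.sample(available_clients, clients_per_round)

    # Training phase: each selected client receives the current parameters
    # and computes a gradient on its own local data.
    gradients, sizes = [], []
    for features, labels in selected:
        preds = features @ global_params
        gradients.append(features.T @ (preds - labels) / len(labels))
        sizes.append(len(labels))

    # Update phase: the server averages the gradients, weighted by local
    # dataset size, and applies the step to the global model.
    weights = np.array(sizes) / sum(sizes)
    avg_grad = sum(w * g for w, g in zip(weights, gradients))
    return global_params - lr * avg_grad

# Toy setup: five clients, each holding a small private dataset.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]
params = np.zeros(3)
for _ in range(10):                       # ten federated rounds
    params = training_round(params, clients)
print(params)
```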

Security Measures in Federated Learning

Ensuring the privacy and security of training data during the aggregation process is crucial. This is addressed by:

  • Secure Aggregation: Clients mask or encrypt their updates before sending them to the server, so the server can recover only the aggregate and never sees any individual client's update (a simplified sketch of the masking idea follows this list).

  • Privacy Enhancements: Techniques such as differential privacy may be applied, adding calibrated noise to the data or gradients so that little can be inferred about any individual's contribution.
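The snippet below is a deliberately simplified illustration of the masking idea behind secure aggregation: every pair of clients shares a random mask that one adds and the other subtracts, so the server can recover the sum of the updates but not any individual update. Real protocols derive these masks from key agreement, work over finite fields, and handle client dropouts; the differential-privacy comment at the end is likewise only a pointer, not a calibrated mechanism.

```python
import numpy as np

rng = np.random.default_rng(42)

# Each client's true update stays private.
true_updates = [rng.normal(size=4) for _ in range(3)]
n = len(true_updates)

# Every pair of clients (i, j) with i < j agrees on a shared random mask.
pair_masks = {(i, j): rng.normal(size=4) for i in range(n) for j in range(i + 1, n)}

masked_updates = []
for i, update in enumerate(true_updates):
    masked = update.copy()
    for j in range(n):
        if i < j:
            masked += pair_masks[(i, j)]   # the lower-indexed client adds the mask
        elif j < i:
            masked -= pair_masks[(j, i)]   # the higher-indexed client subtracts it
    masked_updates.append(masked)          # this is all the server ever receives

# The pairwise masks cancel in the sum, so the server learns only the aggregate.
aggregate = np.sum(masked_updates, axis=0)
assert np.allclose(aggregate, np.sum(true_updates, axis=0))
print(aggregate)

# For differential privacy, each client would additionally clip its update and
# add calibrated noise *before* masking, e.g. update + rng.normal(scale=sigma, size=4).
```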
