For example, healthcare researchers need as much data as possible to discover life-saving treatments while keeping sensitive patient information private. Federated Learning architectures have been developed to resolve precisely this tension in machine learning deployments.
What is Federated Learning?
As mentioned, many organizations have enough data to leverage machine learning, but traditional approaches require consolidating that data in one place for analysis, which privacy and compliance rules often prohibit. Federated Learning (FL) solves this problem by changing where data is processed and where machine learning models are trained. Instead of relying on one central server for all model training, remote servers at the locations where the information is collected handle the workload. This approach enables single organizations with multiple sites, as well as groups of independent organizations, to collaborate on model training while remaining compliant with government regulations.
As a result, they can generate better-trained ML models from larger datasets in the same amount of time as traditional methods, or less.
Traditional ML vs. Federated Learning Models
In the traditional machine learning model, data is collected at remote locations and sent to a central hub that hosts a server to process and train the model. In this structure, data must travel from the data owner to the centralized server.
With Federated Learning, each location trains a locally hosted model, an approach that allows the data to remain local. As each site collects new data, it updates its model and sends only the updated model (not the data) to a centralized aggregation server. That server folds the updates into a centralized model and propagates the result back to each location.
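This exchange is commonly implemented with federated averaging. The following is a minimal sketch in Python/NumPy, not any particular framework's API; the linear model, site count, and function names are illustrative assumptions. Each site computes a local update on its own data, and the aggregation server averages the updates, weighted by dataset size, into a new global model.

```python
import numpy as np

# Minimal federated-averaging sketch (illustrative only).
# Each site keeps its data; only model parameters travel to the server.

def local_update(global_weights, X, y, lr=0.1):
    """One local gradient-descent step on a simple linear model."""
    grad = X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def aggregate(site_weights, site_sizes):
    """Server-side step: average site models, weighted by dataset size."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

# One training round across three hypothetical sites.
rng = np.random.default_rng(0)
global_model = np.zeros(5)
sites = [(rng.normal(size=(100, 5)), rng.normal(size=100)) for _ in range(3)]

local_models = [local_update(global_model, X, y) for X, y in sites]
global_model = aggregate(local_models, [len(y) for _, y in sites])
# The server now propagates global_model back to every site.
```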
Key Considerations of the Federated Learning Aggregation Server
As the primary engine of the Federated Learning model, the aggregation server must be designed with specific considerations in mind. The most important of these include the following:
1. Systems Governance
The Federated Learning model requires that users have the power to define individual workloads and collaborate with others on them. Then, they must be able to enforce model training based on agreement with collaborators. At the same time, the aggregation server needs to be able to prioritize nodes so that it can rely more heavily on the most productive ones.
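As a rough illustration, these governance rules might be captured in a federation plan that the aggregation server enforces. The structure below is hypothetical (it is not OpenFL's plan format or any other framework's API) and only sketches how a workload, collaborator agreements, and node priorities could be declared.

```python
from dataclasses import dataclass, field

# Hypothetical governance plan; field names are illustrative only.

@dataclass
class Collaborator:
    name: str
    agreed_to_plan: bool = False   # training proceeds only with agreement
    priority: float = 1.0          # weight more productive nodes more heavily

@dataclass
class FederationPlan:
    workload: str                  # e.g., "tumor-segmentation-v2"
    rounds: int = 10
    collaborators: list = field(default_factory=list)

    def ready_to_train(self) -> bool:
        """The server enforces training only when every collaborator agrees."""
        return bool(self.collaborators) and all(
            c.agreed_to_plan for c in self.collaborators
        )

plan = FederationPlan(
    workload="tumor-segmentation-v2",
    collaborators=[
        Collaborator("hospital-a", agreed_to_plan=True, priority=2.0),
        Collaborator("hospital-b", agreed_to_plan=True, priority=1.0),
    ],
)
print(plan.ready_to_train())  # True
```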
2. Algorithmic Decisions
Although the aggregation server relies on the nodes' input, it cannot allow individual contributors to influence or tamper with its algorithm. Robust aggregation is the family of methods by which the server handles outlier updates from individual nodes. Weighting prevents nodes that only contribute small amounts of data from having more than their share of influence over collaborators with more data, and it also helps guard against poisoned updates from collaborators with ill intent.
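The sketch below illustrates both ideas under simplifying assumptions: each node submits a parameter vector plus a sample count, weighting is a size-weighted average, and the robust rule is a coordinate-wise median (one common choice among many).

```python
import numpy as np

def weighted_average(updates, sample_counts):
    """Weight each node's update by how much data it trained on, so
    small contributors don't gain outsized influence."""
    weights = np.asarray(sample_counts, dtype=float)
    return np.average(np.stack(updates), axis=0, weights=weights)

def robust_aggregate(updates):
    """Coordinate-wise median: a simple robust-aggregation rule that
    limits the effect of outlier or poisoned updates."""
    return np.median(np.stack(updates), axis=0)

# Hypothetical round: two honest nodes and one extreme (possibly poisoned) update.
honest_a = np.array([0.9, 1.1, 1.0])
honest_b = np.array([1.1, 0.9, 1.0])
outlier  = np.array([50.0, -50.0, 50.0])

print(weighted_average([honest_a, honest_b, outlier], [500, 480, 5]))
print(robust_aggregate([honest_a, honest_b, outlier]))  # outlier has little effect
```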
3. Privacy and Data Accuracy
By leveraging differential privacy, the aggregation server can quantify the level of data anonymization and prevent ML models from memorizing personal user data. At the same time, it must also ensure that the data it uses is accurate.
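A common way to apply differential privacy here is to clip each node's update and add calibrated Gaussian noise before it reaches the aggregation step. The sketch below is illustrative only: the clipping norm and noise multiplier are placeholder values, and a real deployment would also track a formal privacy budget (epsilon and delta).

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update to a maximum L2 norm, then add Gaussian noise.
    Bounding each node's contribution is what makes the noise meaningful
    as a differential-privacy mechanism."""
    rng = rng or np.random.default_rng()
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
    return update * scale + noise

# Hypothetical node update, privatized before it leaves the silo.
print(privatize_update(np.array([0.4, -1.3, 2.2])))
```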
4. Data Security
Other important tasks of the aggregation server include malware prevention and verifying that computing occurs within each node's silo.
5. Performance
Again, Federated Learning architectures eliminate the need for large data transfers. Therefore, despite the above concerns, Federated Learning models have delivered training results comparable to those of traditional, centralized methods.
Real-World Studies - Federated Learning Data Models in Action
Thanks to the ability of Federated Learning models to deliver practical ML training while preserving data privacy and compliance, larger-than-ever medical studies have been conducted. The largest such study to date aggregated learning from 71 healthcare institutions spanning six continents.
As a result, researchers combined 21x more data to produce a deep neural network that's 33% better at finding a rare brain tumor and 23% better at measuring the tumor's extent. At the same time, every institution could abide by its local laws governing the use of patient data.
For its part, Intel has collaborated with the University of Pennsylvania to create the OpenFL project, which provides a Python 3 framework for federated learning that lets data scientists collaborate on model training without sharing the underlying data. And although the medical community has been the first to embrace Federated Learning, it was designed to be use-case neutral. Therefore, virtually any industry that needs to aggregate anonymized data at scale can benefit.
Design & Build Your Federated Learning Architecture with UNICOM Engineering
As an Intel Technology Provider and Dell Technologies Titanium OEM partner, UNICOM Engineering stands ready to design, build, and deploy the right hardware solution for your next ML, AI, or HPC initiative. Our deep technical expertise can drive your transitions to next-gen platforms and provide the flexibility and agility required to bring your solutions to market.
And our global footprint allows your solutions to be built and supported worldwide by a single company. It's easy to see why leading technology providers trust UNICOM Engineering as their application deployment partner. Schedule a consultation today to learn how UNICOM Engineering can assist your business.