FedPeWS: Personalized Warmup via Subnetworks for Enhanced Heterogeneous Federated Learning

Nurbek Tastan¹   Samuel Horváth¹   Karthik Nandakumar¹,²

1Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
2Michigan State University (MSU)

Abstract

Statistical data heterogeneity is a significant barrier to convergence in federated learning (FL). While prior work has advanced heterogeneous FL through better optimization objectives, these methods fall short when there is extreme data heterogeneity among collaborating participants. We hypothesize that convergence under extreme data heterogeneity is primarily hindered due to the aggregation of conflicting updates from the participants in the initial collaboration rounds. To overcome this problem, we propose a warmup phase where each participant learns a personalized mask and updates only a subnetwork of the full model. This personalized warmup allows the participants to focus initially on learning specific subnetworks tailored to the heterogeneity of their data. After the warmup phase, the participants revert to standard federated optimization, where all parameters are communicated. We empirically demonstrate that the proposed personalized warmup via subnetworks (FedPeWS) approach improves accuracy and convergence speed over standard federated optimization methods.
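The two-phase schedule described above can be sketched in a few lines of code. The snippet below is a minimal, hypothetical NumPy illustration rather than the authors' implementation: local_train stands in for a participant's local optimization, the masks are assumed to be fixed 0/1 vectors over the model parameters, and the masked aggregation rule is one plausible choice.

import numpy as np

def local_train(x, data, lr=0.1):
    # Stand-in for several local SGD steps on the participant's own data.
    return x - lr * (x - data.mean(axis=0))

def fedpews(x0, datasets, masks, rounds, warmup_rounds):
    # Personalized warmup via subnetworks, followed by standard FedAvg (sketch).
    x_g = x0.copy()
    for t in range(rounds):
        local_models, weights = [], []
        for data, m in zip(datasets, masks):
            x_i = local_train(x_g.copy(), data)
            if t < warmup_rounds:
                # Warmup: only the participant's subnetwork moves; all other
                # coordinates keep their current global values.
                x_i = m * x_i + (1.0 - m) * x_g
                weights.append(m)
            else:
                # After warmup: every participant contributes all parameters.
                weights.append(np.ones_like(x_g))
            local_models.append(x_i)
        # Coordinate-wise weighted average of the (possibly masked) local models;
        # coordinates that no participant updated keep the previous global value.
        num = sum(w * x for w, x in zip(weights, local_models))
        den = sum(weights)
        x_g = np.where(den > 0, num / np.maximum(den, 1e-12), x_g)
    return x_g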


Conceptual illustration of training personalized subnetworks in federated learning.

FedPeWS - Main Algorithm


How does FedPeWS work?

Illustration of the proposed FedPeWS algorithm for two participants, which aggregates partial subnetworks ($x_i^t \odot m_i^t$) during the warmup phase to obtain a shared global model $x_g^t$. Here, $x_i^t$ and $m_i^t$ denote the local model and personalized mask of the $i^{\text{th}}$ participant in the $t^{\text{th}}$ round.
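In equation form, one plausible reading of the warmup aggregation depicted above (all operations element-wise; the exact normalization used in the paper may differ) is $x_g^t = \left(\sum_{i=1}^{N} m_i^t \odot x_i^t\right) \oslash \left(\sum_{i=1}^{N} m_i^t\right)$, with any coordinate selected by no participant carried over from the previous global model.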
FedPeWS-Fixed: Fixed Mask Generation
Illustration of manual mask setting in the FedPeWS-Fixed method. The left figure shows the complete network, with all neurons active and fully connected. The middle figure shows subnetwork 1, which uses only the left portion of the full network and corresponds to mask $m_1$. The right figure shows the portion of the network used for subnetwork 2. This setting is employed in all experiments with $N=2$ participants.
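Below is a minimal sketch of how such fixed, disjoint neuron masks could be generated for the $N=2$ setting. The helper is hypothetical (not the released code); in a full implementation, masks over the weight matrices would be derived from these neuron-level masks.

import numpy as np

def fixed_masks_two_participants(hidden_sizes):
    # For every hidden layer, assign the left half of the neurons to
    # participant 1 (mask m1) and the right half to participant 2 (mask m2),
    # so the two subnetworks do not overlap.
    m1, m2 = [], []
    for n in hidden_sizes:
        left = np.zeros(n)
        left[: n // 2] = 1.0    # neurons owned by participant 1
        right = 1.0 - left      # remaining neurons owned by participant 2
        m1.append(left)
        m2.append(right)
    return m1, m2

# Example: a network with two hidden layers of 200 and 100 neurons.
m1, m2 = fixed_masks_two_participants([200, 100])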

Results: Improved communication efficiency and accuracy

The required number of collaboration rounds to reach a target accuracy υ (%) and the final accuracy after T rounds. Results are averaged over 3 seeds. × indicates that the algorithm does not reach the target accuracy υ within T rounds, and NA means that it reaches υ in only one random seed.
Dataset / Batch size                | Synthetic-32K, 32 | Synthetic-32K, 32 | Synthetic-32K, 32 | Synthetic-3.2K, 8
Parameters {ηg / λ / τ}             | {1.0/5.0/0.125}   | {0.5/2.0/0.2}     | {0.25/1.0/0.1875} | {0.1/2.0/0.1}
Target accuracy υ (%)               | 99                | 90                | 75                | 99
Rounds to reach υ: FedAvg           | 148 ± 3.79        | 199 ± NA          | ×                 | 371 ± NA
Rounds to reach υ: FedAvg+PeWS      | 115 ± 7.21        | 182 ± 6.81        | 286 ± 7.93        | 301 ± 10.59
Final accuracy after T: FedAvg      | 99.94 ± 0.05      | 91.40 ± 7.25      | 67.64 ± 0.90      | 97.33 ± 3.89
Final accuracy after T: FedAvg+PeWS | 99.96 ± 0.01      | 99.49 ± 0.60      | 83.50 ± 3.52      | 99.66 ± 0.19


Results on Synthetic-{32, 3.2}K datasets with batch sizes {32, 8}, global learning rates ηg ∈ {1.0, 0.5, 0.25, 0.1}, and communication rounds T ∈ {200, 250, 400, 500}. FedPeWS consistently converges faster and outperforms FedAvg.
Visualization of validation accuracy and loss on the Synthetic-32K dataset with N=4.

Results: Sensitivity analysis

(a) CIFAR-MNIST dataset
(b) {Path-OCT-Tissue}MNIST dataset
Results for experiments with $T=300$ on (a) CIFAR-MNIST and (b) {Path-OCT-Tissue}MNIST datasets. (a) Participant 1 uses MNIST; Participant 2 uses CIFAR-10; ablation study for λ and τ. (b) N=3 participants use {PathMNIST, OCTMNIST, TissueMNIST}; ablation study for λ and τ. FedPeWS-Fixed results appear in the last row; τ=0.0 denotes FedAvg.
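As the caption notes, τ = 0.0 corresponds to plain FedAvg. We read τ as the fraction of the communication budget spent in the personalized warmup phase (λ being the other warmup hyperparameter swept here); under that reading, the schedule can be sketched as:

# Hedged sketch: mapping the warmup fraction tau to a round schedule.
# The tau and T values are illustrative, taken from the ablation grid above.
T = 300
tau = 0.1875
warmup_rounds = round(tau * T)     # rounds that use personalized subnetworks

for t in range(T):
    in_warmup = t < warmup_rounds  # tau = 0.0 -> no warmup, i.e. plain FedAvg
    # run a FedPeWS warmup round or a standard FedAvg round accordingly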

Results: Comparison to SOTA algorithms

Comparison to the SOTA algorithms
Method       | CIFAR-MNIST  | {P-O-T}MNIST
FedAvg       | 71.78 ± 0.66 | 52.83 ± 1.26
FedProx      | 72.27 ± 0.88 | 51.28 ± 1.03
SCAFFOLD     | 71.83 ± 0.24 | 53.05 ± 0.60
FedNova      | 71.63 ± 0.98 | 53.05 ± 0.83
MOON         | 71.84 ± 1.09 | 52.10 ± 0.19
FedAvg+PeWS  | 75.83 ± 0.88 | 55.12 ± 0.56
FedProx+PeWS | 75.04 ± 0.85 | 54.67 ± 0.43

Conclusion

In this work, we introduced personalized warmup via subnetworks for heterogeneous FL, a strategy that improves convergence speed and integrates seamlessly with existing federated optimization techniques. Our results demonstrate that the proposed FedPeWS approach achieves higher accuracy than the relevant baselines, especially under extreme statistical heterogeneity.

Contact

Contact me at nurbek [dot] tastan [at] mbzuai [dot] ac [dot] ae.

Citation

@InProceedings{tastan2025fedpews,
    title={Fed{PeWS}: Personalized Warmup via Subnetworks for Enhanced Heterogeneous Federated Learning},
    author={Nurbek Tastan and Samuel Horv{\'a}th and Martin Tak{\'a}{\v{c}} and Karthik Nandakumar},
    booktitle={The Second Conference on Parsimony and Learning (Proceedings Track)},
    year={2025},
    url={https://openreview.net/forum?id=iYwiyS1YdQ} 
}