Route-and-Aggregate Decentralized Federated Learning under Communication Errors

Weicai Li, Tiejun Lv, Wei Ni, Jingbo Zhao, Ekram Hossain, H. Vincent Poor

Research output: Contribution to journal › Article › peer-review

Abstract

Decentralized federated learning (D-FL) allows clients to aggregate learning models locally, offering flexibility and scalability. Existing D-FL methods use gossip protocols, which are inefficient when not all nodes in the network are D-FL clients. This article puts forth a new D-FL strategy, termed route-and-aggregate (R&A) D-FL, where participating clients exchange models with their peers through established routes (as opposed to flooding) and adaptively normalize their aggregation coefficients to compensate for communication errors. The impact of routing and imperfect links on the convergence of D-FL is analyzed, revealing that the convergence bound is minimized when routes with the minimum end-to-end (E2E) packet error rates (PERs) are employed to deliver models. Our analysis is experimentally validated through three image classification tasks and two next-word prediction tasks, utilizing widely recognized datasets and models. R&A D-FL outperforms the flooding-based D-FL method in terms of training accuracy by 35% in our tested ten-client network, and shows strong synergy between D-FL and networking. In another test with ten D-FL clients, the training accuracy of R&A D-FL with communication errors approaches that of the ideal centralized federated learning (C-FL) without communication errors, as the number of routing nodes (i.e., nodes that do not participate in the training of D-FL) rises to 28.
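The two ideas named in the abstract, delivering models over the route with the minimum E2E PER and renormalizing aggregation coefficients when some models are lost, can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `min_per_route` and `aggregate`, the assumption that link errors are independent (so E2E success is the product of per-link success probabilities), and the simple "renormalize over the models that arrived" rule are all assumptions made for illustration.

```python
import heapq
import math


def min_per_route(links, src, dst):
    """Return (path, e2e_per) for the route with the lowest end-to-end PER.

    links: dict mapping node -> {neighbor: per-link PER in [0, 1)}.
    Assumption: link errors are independent, so E2E success is the product
    of per-link success probabilities. Minimizing the sum of -log(1 - PER)
    with Dijkstra's algorithm therefore maximizes E2E success.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, per in links.get(u, {}).items():
            if per >= 1.0:
                continue  # a link that always fails is unusable
            nd = d - math.log(1.0 - per)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, 1.0  # no route: delivery always fails
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, 1.0 - math.exp(-dist[dst])


def aggregate(received, weights):
    """Weighted average over the models that actually arrived.

    received: dict client -> list of floats (model parameters).
    weights:  dict client -> nominal aggregation coefficient.
    Hypothetical rule: coefficients of the received models are rescaled
    to sum to one, so lost models do not shrink the aggregate.
    """
    total = sum(weights[c] for c in received)
    if total <= 0:
        raise ValueError("no usable models received")
    dim = len(next(iter(received.values())))
    out = [0.0] * dim
    for c, model in received.items():
        coef = weights[c] / total
        for i, x in enumerate(model):
            out[i] += coef * x
    return out


# Toy network: A reaches B directly (PER 0.3) or via routing node R.
links = {"A": {"R": 0.1, "B": 0.3}, "R": {"B": 0.1}}
path, e2e_per = min_per_route(links, "A", "B")

# Client C's model was lost, so only A and B are averaged.
merged = aggregate(
    {"A": [1.0, 2.0], "B": [3.0, 4.0]},
    {"A": 0.5, "B": 0.25, "C": 0.25},
)
```

In the toy network the two-hop route through the routing node R has an E2E PER of about 0.19, lower than the 0.3 of the direct link, which mirrors the abstract's observation that non-participating routing nodes can help D-FL clients.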

Original language: English (US)
Journal: IEEE Transactions on Neural Networks and Learning Systems
DOIs
State: Accepted/In press - 2025
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

Keywords

  • Convergence
  • decentralized federated learning (D-FL)
  • imperfect channel
  • peer-to-peer network
  • routing
