Parallel Successive Learning for Dynamic Distributed Model Training Over Heterogeneous Wireless Networks

Seyyedali Hosseinalipour, Su Wang, Nicolo Michelusi, Vaneet Aggarwal, Christopher G. Brinton, David J. Love, Mung Chiang

Research output: Contribution to journal › Article › peer-review

Abstract

Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications; (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers heterogeneous numbers of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift; (ii-c) Device: PSL considers devices with different computation and communication capabilities; and (iii) Proximity, where devices are at different distances from one another and from the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in between them to improve resource efficiency, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notions of cold vs. warmed-up models and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the tradeoff between model learning and resource efficiency, which we show is an NP-hard signomial programming problem, and we solve it by proposing a general optimization solver. Our numerical results reveal new findings on the interdependencies among the idle times between global aggregations, model/concept drift, and the D2D cooperation configuration.
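
For intuition, the sketch below illustrates in Python/NumPy the generic FedL loop that the abstract builds on: each device runs a heterogeneous number of mini-batch SGD iterations on its local data, models are exchanged with a neighbor over a toy D2D ring topology, and the server then performs a dataset-size-weighted global aggregation. This is a minimal illustration under assumed names and constants (the local_sgd helper, the ring topology, the learning rate, and the device configurations are all assumptions), not the paper's PSL algorithm or its treatment of idle times, dispersion, or model/concept drift.

```python
# Illustrative sketch (not the paper's PSL algorithm): federated averaging on a
# toy linear-regression task, with device heterogeneity in the number of local
# SGD iterations and mini-batch sizes, plus a simple gossip-style D2D averaging
# round before each global aggregation. All names and constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 5                                   # model dimension
w_true = rng.normal(size=d)             # ground-truth weights

# Heterogeneous devices: (num_samples, local_sgd_steps, mini_batch_size)
devices = [(200, 10, 8), (80, 4, 16), (150, 7, 4)]
data = []
for n, _, _ in devices:
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    data.append((X, y))

def local_sgd(w, X, y, steps, batch, lr=0.05):
    """Run `steps` mini-batch SGD iterations on squared loss, starting from w."""
    w = w.copy()
    for _ in range(steps):
        idx = rng.choice(len(y), size=min(batch, len(y)), replace=False)
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad
    return w

w_global = np.zeros(d)
for agg_round in range(20):             # global aggregation rounds
    # Heterogeneous local updates at each device.
    local = [local_sgd(w_global, X, y, steps, batch)
             for (X, y), (_, steps, batch) in zip(data, devices)]

    # Toy D2D cooperation: one gossip round over a ring topology (assumption).
    local = [(local[i] + local[(i + 1) % len(local)]) / 2
             for i in range(len(local))]

    # Global aggregation: average weighted by local dataset size.
    sizes = np.array([n for n, _, _ in devices], dtype=float)
    w_global = sum(s * w for s, w in zip(sizes / sizes.sum(), local))

print("distance to w_true:", np.linalg.norm(w_global - w_true))
```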

Original language: English (US)
Pages (from-to): 1-16
Number of pages: 16
Journal: IEEE/ACM Transactions on Networking
DOIs
State: Accepted/In press - 2023
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Keywords

  • Computational modeling
  • Cooperative federated learning
  • Data models
  • Device-to-device communications
  • Dispersion
  • Distributed databases
  • dynamic machine learning
  • network optimization
  • Training
  • Wireless sensor networks
