Greedy Centroid Initialization for Federated K-means

Kun Yang, Mohammad Mohammadi Amiri, Sanjeev R. Kulkarni

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Scopus citation

Abstract

K-means is a widely used data clustering algorithm that aims to partition a set of data points into $K$ clusters by finding the best $K$ centroids to represent the data points. Initialization plays a vital role in the traditional centralized K-means clustering algorithm, where the clustering is carried out at a central node with access to the entire dataset. In this paper, we focus on K-means in a federated setting, where the clients store data locally and the raw data never leaves the devices. Given the importance of initialization to the federated K-means algorithm, we aim to find better initial centroids by leveraging the local data on each client. To this end, we start the centroid initialization at the clients rather than at the server, which initially has no information about the clients' data. The clients first select their local initial clusters and then share their clustering information (cluster centroids and sizes) with the server. The server then uses a greedy algorithm to choose the global initial centroids based on the information received from the clients. Numerical results on synthetic and public datasets show that our proposed method can achieve better and more stable performance than three federated K-means variants, and similar performance to the centralized K-means algorithm.
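
The two-stage flow described in the abstract (clients cluster locally and report centroids and sizes, then the server greedily picks $K$ global initial centroids) might be sketched roughly as follows. The abstract does not state the greedy rule, so the scoring used here (start from the largest reported cluster, then repeatedly pick the candidate that maximizes cluster size times squared distance to the nearest centroid already chosen) is an assumption made for illustration and should not be read as the authors' method. The function names `local_kmeans` and `greedy_server_init` and all parameters are likewise hypothetical.

```python
import numpy as np


def local_kmeans(points, k, iters=20, seed=0):
    """Client side: plain Lloyd iterations on local data only.

    Returns (centroids, sizes); only these summaries would be sent
    to the server, never the raw points.
    """
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Recompute assignments so the sizes match the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    sizes = np.bincount(dists.argmin(axis=1), minlength=k)
    return centroids, sizes


def greedy_server_init(centroids, sizes, k):
    """Server side: greedily choose k of the reported centroids.

    ASSUMED rule (not taken from the paper): seed with the largest
    cluster, then repeatedly take the candidate with the highest
    size * squared distance to its nearest already-chosen centroid.
    """
    centroids = np.asarray(centroids, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    keep = sizes > 0  # ignore empty local clusters
    centroids, sizes = centroids[keep], sizes[keep]
    assert len(centroids) >= k, "need at least k non-empty local clusters"

    chosen = [int(sizes.argmax())]
    min_sq = np.sum((centroids - centroids[chosen[0]]) ** 2, axis=1)
    while len(chosen) < k:
        nxt = int((sizes * min_sq).argmax())
        chosen.append(nxt)
        new_sq = np.sum((centroids - centroids[nxt]) ** 2, axis=1)
        min_sq = np.minimum(min_sq, new_sq)
    return centroids[chosen]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Two clients, each holding two of three synthetic blobs.
    blob = lambda center: rng.normal(loc=center, scale=0.5, size=(100, 2))
    client_data = [
        np.vstack([blob([0, 0]), blob([10, 0])]),
        np.vstack([blob([10, 0]), blob([0, 10])]),
    ]
    reports = [local_kmeans(x, k=2, seed=i) for i, x in enumerate(client_data)]
    all_centroids = np.vstack([c for c, _ in reports])
    all_sizes = np.concatenate([s for _, s in reports])
    print(greedy_server_init(all_centroids, all_sizes, k=3))
```

Weighting the distance by cluster size is one plausible way to keep a small outlying local cluster from being chosen ahead of a large, well-populated one. The selected centroids would then seed whichever federated K-means iterations follow.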

Original language: English (US)
Title of host publication: 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665451819
State: Published - 2023
Event: 57th Annual Conference on Information Sciences and Systems, CISS 2023 - Baltimore, United States
Duration: Mar 22 2023 to Mar 24 2023

Publication series

Name: 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023

Conference

Conference: 57th Annual Conference on Information Sciences and Systems, CISS 2023
Country/Territory: United States
City: Baltimore
Period: 3/22/23 to 3/24/23

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems
  • Artificial Intelligence
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality

Keywords

  • Clustering
  • Federated Learning
  • K-means
  • Machine Learning
