TY - GEN
T1 - Greedy Centroid Initialization for Federated K-means
AU - Yang, Kun
AU - Amiri, Mohammad Mohammadi
AU - Kulkarni, Sanjeev R.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - K-means is a widely used data clustering algorithm which aims to partition a set of data points into $K$ clusters through finding the best $K$ centroids representing the data points. Initialization plays a vital role in the traditional centralized K-means clustering algorithm where the clustering is carried out at a central node accessing the entire data points. In this paper, we focus on K-means in a federated setting, where the clients store data locally, and the raw data never leaves the devices. Given the importance of initialization on the federated K-means algorithm, we aim to find better initial centroids by leveraging the local data on each client. To this end, we start the centroid initialization at the clients rather than at the server, which has no information about the clients' data initially. The clients first select their local initial clusters, and they share their clustering information (cluster centroids and sizes) with the server. The server then uses a greedy algorithm to choose the global initial centroids based on the information received from the clients. Numerical results on synthetic and public datasets show that our proposed method can achieve better and more stable performance than three federated K-means variants, and similar performance to the centralized K-means algorithm.
AB - K-means is a widely used data clustering algorithm which aims to partition a set of data points into $K$ clusters through finding the best $K$ centroids representing the data points. Initialization plays a vital role in the traditional centralized K-means clustering algorithm where the clustering is carried out at a central node accessing the entire data points. In this paper, we focus on K-means in a federated setting, where the clients store data locally, and the raw data never leaves the devices. Given the importance of initialization on the federated K-means algorithm, we aim to find better initial centroids by leveraging the local data on each client. To this end, we start the centroid initialization at the clients rather than at the server, which has no information about the clients' data initially. The clients first select their local initial clusters, and they share their clustering information (cluster centroids and sizes) with the server. The server then uses a greedy algorithm to choose the global initial centroids based on the information received from the clients. Numerical results on synthetic and public datasets show that our proposed method can achieve better and more stable performance than three federated K-means variants, and similar performance to the centralized K-means algorithm.
KW - Clustering
KW - Federated Learning
KW - K-means
KW - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=85154069044&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85154069044&partnerID=8YFLogxK
U2 - 10.1109/CISS56502.2023.10089666
DO - 10.1109/CISS56502.2023.10089666
M3 - Conference contribution
AN - SCOPUS:85154069044
T3 - 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023
BT - 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 57th Annual Conference on Information Sciences and Systems, CISS 2023
Y2 - 22 March 2023 through 24 March 2023
ER -