Differentially-Private “Draw and Discard” Machine Learning: Training Distributed Model from Enormous Crowds

Vasyl Pihur, Aleksandra Korolova, Frederick Liu, Subhash Sankuratripati, Moti Yung, Dachuan Huang, Ruogu Zeng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



The setting of our problem is a distributed architecture facing an enormous user base, where events repeat and evolve over time and must be absorbed into a model: first into a local model on each client, and then into the global one, while also protecting user privacy. The phenomena we learn occur in many places at once (such as malware spreading across smartphones, or user responses to the operation and UX of an app). To this end, we consider a configuration in which the learning server handles a potentially high-frequency, high-volume environment in a naturally distributed fashion, while also attending to the statistical convergence and privacy properties of the setting. We propose a novel framework for privacy-preserving, client-distributed machine learning. It is designed to provide differential privacy guarantees in the local model of privacy while satisfying systems constraints: it relies on a large number of asynchronous client-server communications and requires little coordination among separate clients (a communication model that is simple to implement and, in some settings such as user-facing apps, already exists), and it offers attractive model-learning properties. We develop a generic randomized learning algorithm, "Draw and Discard," so named because it relies on randomly sampling and discarding model instances for load distribution and scalability; this randomization also provides additional server-side privacy protection and improves model quality through averaging. The framework is general, and we show its applicability to Generalized Linear Models. We analyze the statistical stability and privacy guarantees provided by our approach against faults and against several types of adversaries, and then present experimental results. Our framework (first reported in [28]) has been deployed experimentally in a real industrial setting.
We view this result as an initial combination of machine learning and distributed systems, and we believe it opens numerous directions for further development.
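As a rough illustration of the "Draw and Discard" idea described above, the sketch below maintains k model instances on the server; each asynchronous round draws one instance at random for a client to update, then overwrites a uniformly random instance with the returned model. The local update shown (a logistic-regression gradient step with Laplace noise) and all function names and noise calibrations are illustrative assumptions, not taken from the paper:

```python
import random
import numpy as np

def client_update(model, x, y, learning_rate=0.1, epsilon=1.0, sensitivity=1.0):
    """Hypothetical local update for a generalized linear model (here logistic
    regression), with Laplace noise added for local differential privacy.
    The noise scale sensitivity/epsilon is an illustrative calibration."""
    pred = 1.0 / (1.0 + np.exp(-model @ x))   # sigmoid prediction
    grad = (pred - y) * x                     # gradient of the logistic loss
    updated = model - learning_rate * grad
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=model.shape)
    return updated + noise

def draw_and_discard_round(instances, x, y):
    """One asynchronous round: draw a random instance, let a client update it,
    then discard a uniformly random instance by overwriting it with the
    updated model. The number of instances k stays constant."""
    drawn = random.randrange(len(instances))
    updated = client_update(instances[drawn], x, y)
    discarded = random.randrange(len(instances))  # may coincide with `drawn`
    instances[discarded] = updated
    return instances
```

Because the drawn and discarded indices are chosen independently, clients never need to coordinate with one another, and averaging over the k instances smooths the injected noise, matching the scalability and model-quality properties the abstract describes.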

Original language: English (US)
Title of host publication: Cyber Security, Cryptology, and Machine Learning - 6th International Symposium, CSCML 2022, Proceedings
Editors: Shlomi Dolev, Amnon Meisels, Jonathan Katz
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 19
ISBN (Print): 9783031076886
State: Published - 2022
Externally published: Yes
Event: 6th International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2022 - Beer Sheva, Israel
Duration: Jun 30 2022 - Jul 1 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13301 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 6th International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2022
City: Beer Sheva

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science


Keywords

  • Differential privacy
  • Distributed machine learning
  • High-volume distributed computing
  • Local privacy model


