Learning in Gated Neural Networks

Ashok Vardhan Makkuva, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

Research output: Contribution to journal › Conference article › peer-review

Abstract

Gating is a key feature in modern neural networks, including LSTMs, GRUs, and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts (MoE) layer, where several experts make regression decisions and gating controls how to weigh those decisions in an input-dependent manner. Despite its prominent role in both modern and classical machine learning, very little is understood about parameter recovery in mixture-of-experts models, since gradient descent and EM algorithms are known to get stuck in local optima. In this paper, we perform a careful analysis of the optimization landscape and show that, with appropriately designed loss functions, gradient descent can indeed learn the parameters of an MoE accurately. A key idea underpinning our results is the design of two distinct loss functions, one for recovering the expert parameters and another for recovering the gating parameters. We establish the first sample complexity results for parameter recovery in this model for any algorithm, and demonstrate significant performance gains over standard loss functions in numerical experiments.
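
For concreteness, the mixture-of-experts layer described in the abstract can be sketched as a gated combination of regression experts. Below is a minimal NumPy sketch: the linear gating and expert parameterizations follow the abstract's description, while the tanh expert nonlinearity and all names (moe_forward, W_gate, A_experts) are illustrative assumptions, not the exact model analyzed in the paper.

```python
import numpy as np

def moe_forward(x, W_gate, A_experts, g=np.tanh):
    """Output of one mixture-of-experts (MoE) regression layer.

    x         : (d,) input vector
    W_gate    : (k, d) gating parameters, one row per expert
    A_experts : (k, d) expert regression parameters
    g         : expert nonlinearity (tanh is an assumption here)
    """
    logits = W_gate @ x                     # input-dependent gating scores
    gates = np.exp(logits - logits.max())   # numerically stable softmax...
    gates /= gates.sum()                    # ...weights over experts sum to 1
    expert_out = g(A_experts @ x)           # each expert's regression decision
    return gates @ expert_out               # gated, input-dependent combination

# Example: 4 experts on a 10-dimensional input
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
W = rng.standard_normal((4, 10))
A = rng.standard_normal((4, 10))
print(moe_forward(x, W, A))
```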

Original language: English (US)
Pages (from-to): 3338-3348
Number of pages: 11
Journal: Proceedings of Machine Learning Research
Volume: 108
State: Published - 2020
Externally published: Yes
Event: 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020 - Virtual, Online
Duration: Aug 26, 2020 - Aug 28, 2020

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
