Nonparametric Detection of Anomalous Data Streams

Shaofeng Zou, Yingbin Liang, H. Vincent Poor, Xinghua Shi

Research output: Contribution to journal › Article › peer-review

19 Scopus citations


A nonparametric anomalous hypothesis testing problem is investigated, in which a total of n sequences are observed, out of which s anomalous sequences are to be detected. Each typical sequence consists of m independent and identically distributed (i.i.d.) samples drawn from a distribution p, whereas each anomalous sequence consists of m i.i.d. samples drawn from a distribution q that is distinct from p. The distributions p and q are assumed to be unknown in advance. Distribution-free tests are constructed by using the maximum mean discrepancy (MMD) as the metric, which is based on mean embeddings of distributions into a reproducing kernel Hilbert space. The probability of error is bounded as a function of the sample size m, the number s of anomalous sequences, and the number n of sequences. It is shown that with s known, the constructed test is exponentially consistent if m is greater than a constant factor of log n, for any p and q, whereas with s unknown, m should have an order strictly greater than log n. Furthermore, it is shown that no test can be consistent for arbitrary p and q if m is less than a constant factor of log n. Thus, the order-level optimality of the proposed test is established. Numerical results are provided to demonstrate that the proposed tests outperform (or perform as well as) tests based on other competitive approaches under various cases.
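The idea of an MMD-based test with s known can be illustrated with a minimal Python sketch. It is not the authors' exact test statistic: the Gaussian kernel, the `bandwidth` value, the biased MMD estimator, the leave-one-out pooling of the remaining sequences, and all function names are assumptions made for illustration. Each sequence is scored by its estimated squared MMD to the pooled samples of the other sequences, and the s highest-scoring sequences are declared anomalous.

```python
import numpy as np

def gaussian_gram(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between 1-D sample arrays x and y (assumed kernel)."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    kxx = gaussian_gram(x, x, bandwidth).mean()
    kyy = gaussian_gram(y, y, bandwidth).mean()
    kxy = gaussian_gram(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

def detect_anomalous(sequences, s, bandwidth=1.0):
    """Return the sorted indices of the s sequences farthest (in MMD) from the rest.

    sequences: array of shape (n, m), one row per observed sequence.
    """
    n = sequences.shape[0]
    scores = np.empty(n)
    for k in range(n):
        # Pool all samples from the other n - 1 sequences as the reference set.
        rest = np.delete(sequences, k, axis=0).ravel()
        scores[k] = mmd2_biased(sequences[k], rest, bandwidth)
    top = np.argsort(scores)[-s:]
    return sorted(int(i) for i in top)

# Synthetic example: p = N(0, 1), q = N(3, 1); both are unknown to the detector.
rng = np.random.default_rng(0)
n, m, s = 20, 200, 2
data = rng.normal(0.0, 1.0, size=(n, m))
anomalous = [3, 11]
data[anomalous] = rng.normal(3.0, 1.0, size=(s, m))

detected = detect_anomalous(data, s)
print("detected anomalous sequences:", detected)
```

The detector never uses p or q directly, only kernel averages over the observed samples, which is what makes the approach distribution-free. When s is unknown, a threshold on the scores would be needed in place of the top-s rule; that variant is not sketched here.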

Original language: English (US)
Article number: 7997789
Pages (from-to): 5785-5797
Number of pages: 13
Journal: IEEE Transactions on Signal Processing
Issue number: 21
State: Published - Nov 1 2017

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering


Keywords

  • Anomalous hypothesis testing
  • consistency
  • distribution-free tests
  • maximum mean discrepancy (MMD)

