TY - JOUR

T1 - Nonparametric Detection of Anomalous Data Streams

AU - Zou, Shaofeng

AU - Liang, Yingbin

AU - Poor, H. Vincent

AU - Shi, Xinghua

N1 - Funding Information:
Manuscript received August 20, 2016; revised February 17, 2017 and May 26, 2017; accepted July 13, 2017. Date of publication July 31, 2017; date of current version September 5, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. D. Robert Iskander. The work of S. Zou was supported by a National Science Foundation CAREER Award under Grant CCF-10-26565, when S. Zou was a PhD student at Syracuse University. The work of Y. Liang was supported in part by a National Science Foundation CAREER Award under Grant CCF-10-26565 and in part by the National Science Foundation under Grant CCF-16-17789. The work of H. V. Poor was supported by the National Science Foundation under Grants CMMI-1435778 and ECCS-1343210. The work of X. Shi was supported by the National Science Foundation under Grants DGE-1523154 and IIS-1502172. This paper was presented in part at the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2014. (Corresponding author: Shaofeng Zou.) S. Zou is with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: szou3@illinois.edu).
Publisher Copyright:
© 2017 IEEE.

PY - 2017/11/1

Y1 - 2017/11/1

N2 - A nonparametric anomalous hypothesis testing problem is investigated, in which there are totally n observed sequences out of which s anomalous sequences are to be detected. Each typical sequence consists of m independent and identically distributed (i.i.d.) samples drawn from a distribution p, whereas each anomalous sequence consists of m i.i.d. samples drawn from a distribution q that is distinct from p. The distributions p and q are assumed to be unknown in advance. Distribution-free tests are constructed by using the maximum mean discrepancy as the metric, which is based on mean embeddings of distributions into a reproducing kernel Hilbert space. The probability of error is bounded as a function of the sample size m, the number s of anomalous sequences, and the number n of sequences. It is shown that with s known, the constructed test is exponentially consistent if m is greater than a constant factor of n, for any p and q, whereas with s unknown, m should have an order strictly greater than n. Furthermore, it is shown that no test can be consistent for arbitrary p and q if m is less than a constant factor of n. Thus, the order-level optimality of the proposed test is established. Numerical results are provided to demonstrate that the proposed tests outperform (or perform as well as) tests based on other competitive approaches under various cases.

AB - A nonparametric anomalous hypothesis testing problem is investigated, in which there are totally n observed sequences out of which s anomalous sequences are to be detected. Each typical sequence consists of m independent and identically distributed (i.i.d.) samples drawn from a distribution p, whereas each anomalous sequence consists of m i.i.d. samples drawn from a distribution q that is distinct from p. The distributions p and q are assumed to be unknown in advance. Distribution-free tests are constructed by using the maximum mean discrepancy as the metric, which is based on mean embeddings of distributions into a reproducing kernel Hilbert space. The probability of error is bounded as a function of the sample size m, the number s of anomalous sequences, and the number n of sequences. It is shown that with s known, the constructed test is exponentially consistent if m is greater than a constant factor of n, for any p and q, whereas with s unknown, m should have an order strictly greater than n. Furthermore, it is shown that no test can be consistent for arbitrary p and q if m is less than a constant factor of n. Thus, the order-level optimality of the proposed test is established. Numerical results are provided to demonstrate that the proposed tests outperform (or perform as well as) tests based on other competitive approaches under various cases.

KW - Anomalous hypothesis testing

KW - consistency

KW - distribution-free tests

KW - maximum mean discrepancy (MMD)

UR - http://www.scopus.com/inward/record.url?scp=85028814239&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028814239&partnerID=8YFLogxK

U2 - 10.1109/TSP.2017.2733472

DO - 10.1109/TSP.2017.2733472

M3 - Article

AN - SCOPUS:85028814239

SN - 1053-587X

VL - 65

SP - 5785

EP - 5797

JO - IEEE Transactions on Signal Processing

JF - IEEE Transactions on Signal Processing

IS - 21

M1 - 7997789

ER -