TY - JOUR
T1 - Nonparametric Detection of Anomalous Data Streams
AU - Zou, Shaofeng
AU - Liang, Yingbin
AU - Poor, H. Vincent
AU - Shi, Xinghua
N1 - Funding Information:
Manuscript received August 20, 2016; revised February 17, 2017 and May 26, 2017; accepted July 13, 2017. Date of publication July 31, 2017; date of current version September 5, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. D. Robert Iskander. The work of S. Zou was supported by a National Science Foundation CAREER Award under Grant CCF-10-26565, when S. Zou was a PhD student at Syracuse University. The work of Y. Liang was supported in part by a National Science Foundation CAREER Award under Grant CCF-10-26565 and in part by the National Science Foundation under Grant CCF-16-17789. The work of H. V. Poor was supported by the National Science Foundation under Grants CMMI-1435778 and ECCS-1343210. The work of X. Shi was supported by the National Science Foundation under Grants DGE-1523154 and IIS-1502172. This paper was presented in part at the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2014. (Corresponding author: Shaofeng Zou.) S. Zou is with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: szou3@illinois.edu).
Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/1
Y1 - 2017/11/1
N2 - A nonparametric anomalous hypothesis testing problem is investigated, in which there are totally n observed sequences out of which s anomalous sequences are to be detected. Each typical sequence consists of m independent and identically distributed (i.i.d.) samples drawn from a distribution p, whereas each anomalous sequence consists of m i.i.d. samples drawn from a distribution q that is distinct from p. The distributions p and q are assumed to be unknown in advance. Distribution-free tests are constructed by using the maximum mean discrepancy as the metric, which is based on mean embeddings of distributions into a reproducing kernel Hilbert space. The probability of error is bounded as a function of the sample size m, the number s of anomalous sequences, and the number n of sequences. It is shown that with s known, the constructed test is exponentially consistent if m is greater than a constant factor of n, for any p and q, whereas with s unknown, m should have an order strictly greater than n. Furthermore, it is shown that no test can be consistent for arbitrary p and q if m is less than a constant factor of n. Thus, the order-level optimality of the proposed test is established. Numerical results are provided to demonstrate that the proposed tests outperform (or perform as well as) tests based on other competitive approaches under various cases.
AB - A nonparametric anomalous hypothesis testing problem is investigated, in which there are totally n observed sequences out of which s anomalous sequences are to be detected. Each typical sequence consists of m independent and identically distributed (i.i.d.) samples drawn from a distribution p, whereas each anomalous sequence consists of m i.i.d. samples drawn from a distribution q that is distinct from p. The distributions p and q are assumed to be unknown in advance. Distribution-free tests are constructed by using the maximum mean discrepancy as the metric, which is based on mean embeddings of distributions into a reproducing kernel Hilbert space. The probability of error is bounded as a function of the sample size m, the number s of anomalous sequences, and the number n of sequences. It is shown that with s known, the constructed test is exponentially consistent if m is greater than a constant factor of n, for any p and q, whereas with s unknown, m should have an order strictly greater than n. Furthermore, it is shown that no test can be consistent for arbitrary p and q if m is less than a constant factor of n. Thus, the order-level optimality of the proposed test is established. Numerical results are provided to demonstrate that the proposed tests outperform (or perform as well as) tests based on other competitive approaches under various cases.
KW - Anomalous hypothesis testing
KW - consistency
KW - distribution-free tests
KW - maximum mean discrepancy (MMD)
UR - http://www.scopus.com/inward/record.url?scp=85028814239&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028814239&partnerID=8YFLogxK
U2 - 10.1109/TSP.2017.2733472
DO - 10.1109/TSP.2017.2733472
M3 - Article
AN - SCOPUS:85028814239
SN - 1053-587X
VL - 65
SP - 5785
EP - 5797
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
IS - 21
M1 - 7997789
ER -