TY - JOUR

T1 - Linear boolean classification, coding and the critical problem

AU - Abbe, Emmanuel Auguste

AU - Alon, Noga

AU - Bandeira, Afonso S.

AU - Sandon, Colin

N1 - Funding Information:
E. Abbe was supported in part by the Army Research Office under Grant W911NF-16-1-0051 and in part by NSF within the Directorate for Biological Sciences under Grant CIF-1706648. N. Alon was supported in part by the Israel Science Foundation, in part by a USAIsraeli BSF Grant, in part by the Oswald Veblen Fund for Mathematics, and in part by the Israeli I-Core Program. A. S. Bandeira was supported by the Air Force Office of Scientific Research under Grant FA9550-12-1-0317. This paper was presented at the ISIT 2014 [1].
Publisher Copyright:
© 1963-2012 IEEE.

PY - 2016/4

Y1 - 2016/4

N2 - This paper considers the problem of linear Boolean classification, where the goal is to determine in which set, among two given sets of Boolean vectors, an unknown vector belongs to by making linear queries. Finding the least number of queries is equivalent to determining the minimal rank of a matrix over GF(2) , whose kernel does not intersect a given set S. In the case where $S$ is a Hamming ball, this reduces to finding linear codes of largest dimension. For a general set $S$ , this is an instance of the critical problem posed by Crapo and Rota in 1970, open in general. This paper focuses on the case where $S$ is an annulus. As opposed to balls, it is shown that an optimal kernel is composed not only of dense but also of sparse vectors, and the optimal mixture is identified in various cases. These findings corroborate a proposed conjecture that for an annulus of inner and outer radius $nq$ and $np$ respectively, the optimal relative rank is given by the normalized entropy $(1-q)H(p/(1-q))$ , an extension of the Gilbert-Varshamov bound.

AB - This paper considers the problem of linear Boolean classification, where the goal is to determine in which set, among two given sets of Boolean vectors, an unknown vector belongs to by making linear queries. Finding the least number of queries is equivalent to determining the minimal rank of a matrix over GF(2) , whose kernel does not intersect a given set S. In the case where $S$ is a Hamming ball, this reduces to finding linear codes of largest dimension. For a general set $S$ , this is an instance of the critical problem posed by Crapo and Rota in 1970, open in general. This paper focuses on the case where $S$ is an annulus. As opposed to balls, it is shown that an optimal kernel is composed not only of dense but also of sparse vectors, and the optimal mixture is identified in various cases. These findings corroborate a proposed conjecture that for an annulus of inner and outer radius $nq$ and $np$ respectively, the optimal relative rank is given by the normalized entropy $(1-q)H(p/(1-q))$ , an extension of the Gilbert-Varshamov bound.

KW - Linear codes

KW - classification

KW - combinatorics

KW - learning theory

UR - http://www.scopus.com/inward/record.url?scp=84963808129&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963808129&partnerID=8YFLogxK

U2 - 10.1109/TIT.2016.2531085

DO - 10.1109/TIT.2016.2531085

M3 - Article

AN - SCOPUS:84963808129

VL - 62

SP - 1667

EP - 1673

JO - IRE Professional Group on Information Theory

JF - IRE Professional Group on Information Theory

SN - 0018-9448

IS - 4

M1 - 7410095

ER -