Probabilistic clustering of high dimensional norms

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Separating decompositions of metric spaces are an important randomized clustering paradigm that was formulated by Bartal in [Bar96] and is defined as follows. Given a metric space $(X,d_X)$, its modulus of separated decomposability, denoted $\mathrm{SEP}(X,d_X)$, is the infimum over those $\sigma \in (0,\infty]$ such that for every finite subset $S \subseteq X$ and every $\Delta > 0$ there exists a distribution over random partitions $\mathcal{P}$ of $S$ into sets of diameter at most $\Delta$ such that for every $x,y \in S$ the probability that $x$ and $y$ do not fall into the same cluster of the random partition $\mathcal{P}$ is at most $\sigma d_X(x,y)/\Delta$. Here we obtain new bounds on $\mathrm{SEP}(X,\|\cdot\|_X)$ when $(X,\|\cdot\|_X)$ is a finite dimensional normed space, yielding, as a special case, that $\mathrm{SEP}(\ell_p^n) \lesssim \sqrt{n}\,\min\{p,\log n\}$ for every $p \in [2,\infty]$. This improves over the work [CCG+98] of Charikar, Chekuri, Goel, Guha, and Plotkin, who obtained this bound when $p = 2$, yet for $p \in (2,\infty]$ they obtained the asymptotically weaker estimate $\mathrm{SEP}(\ell_p^n) \lesssim n^{1-1/p}$. One should note that it was claimed in [CCG+98] that the bound $\mathrm{SEP}(\ell_p^n) \lesssim n^{1-1/p}$ is sharp for every $p \in [2,\infty]$, and in particular it was claimed in [CCG+98] that $\mathrm{SEP}(\ell_\infty^n) \asymp n$. However, the above results show that this claim of [CCG+98] is incorrect for every $p \in (2,\infty]$. Our new bounds on the modulus of separated decomposability rely on extremal results for orthogonal hyperplane projections of convex bodies, specifically using the work [BN02] of Barthe and the author. This yields additional refined estimates, an example of which is a bound on $\mathrm{SEP}(\mathbb{R}^n_{\le k},\|\cdot\|_2)$ for every $n \in \mathbb{N}$ and $k \in \{1,\dots,n\}$, where $\mathbb{R}^n_{\le k}$ denotes the subset of $\mathbb{R}^n$ consisting of all those vectors that have at most $k$ nonzero entries, equipped with the Euclidean metric.

The above statements have implications for the Lipschitz extension problem through its connection to random partitions that was developed by Lee and the author in [LN04, LN05]. Given a metric space $(X,d_X)$, let $e(X)$ denote the infimum over those $K \in (0,\infty]$ such that for every Banach space $Y$ and every subset $S \subseteq X$, every 1-Lipschitz function $f: S \to Y$ has a $K$-Lipschitz extension to all of $X$. Johnson, Lindenstrauss and Schechtman proved in [JLS86] that $e(X) \lesssim \dim(X)$ for every finite dimensional normed space $(X,\|\cdot\|_X)$. It is a longstanding open problem to determine the correct asymptotic dependence on $\dim(X)$ in this context, with the best known lower bound, due to Johnson and Lindenstrauss [JL84], being that the quantity $e(X)$ must sometimes be at least a constant multiple of $\sqrt{\dim(X)}$. In particular, the previously best known upper bound on $e(\ell_\infty^n)$ was the $O(n)$ estimate of [JLS86]. It is shown here that for every $n \in \mathbb{N}$ we have $e(\ell_\infty^n) \lesssim \sqrt{n}\log n$, thus answering (up to logarithmic factors) a question that was posed by Brudnyi and Brudnyi in [BB05, Problem 2]. More generally, $e(\ell_p^n) \lesssim \sqrt{n}\,\min\{p,\log n\}$ for every $p \in [2,\infty]$, thus resolving (negatively) a conjecture of Brudnyi and Brudnyi in [BB05, Conjecture 5].
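As a concrete illustration of the definition only (this is not the partitioning scheme of the paper, and the function names below are hypothetical), the following Python sketch samples the simplest separating decomposition of $\ell_\infty^n$: a uniformly shifted grid of axis-aligned cubes of side $\Delta$, whose cells have $\ell_\infty$-diameter at most $\Delta$. For two points $x,y$ its separation probability equals $1-\prod_i(1-|x_i-y_i|/\Delta)$, which is at most $\|x-y\|_1/\Delta \le n\|x-y\|_\infty/\Delta$, so this toy scheme only certifies the trivial bound $\mathrm{SEP}(\ell_\infty^n) \le n$, not the improved bounds obtained in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def shifted_grid_labels(points, delta, rng):
    """Cell indices of each point in a uniformly shifted grid of axis-aligned
    cubes with side length delta; every cell has l_inf-diameter at most delta."""
    dim = points.shape[1]
    shift = rng.uniform(0.0, delta, size=dim)  # random translation of the grid
    return np.floor((points - shift) / delta).astype(int)

def estimate_separation_probability(x, y, delta, trials=20_000, rng=rng):
    """Monte Carlo estimate of Pr[x and y land in different clusters]."""
    pts = np.stack([x, y])
    separated = sum(
        not np.array_equal(*shifted_grid_labels(pts, delta, rng))
        for _ in range(trials)
    )
    return separated / trials

# Toy example in R^4 with diameter threshold delta = 1.
x = np.array([0.10, 0.20, 0.30, 0.40])
y = np.array([0.20, 0.10, 0.40, 0.30])
delta = 1.0

estimate = estimate_separation_probability(x, y, delta)
# For the shifted grid the separation probability is exactly
# 1 - prod_i (1 - |x_i - y_i| / delta) <= ||x - y||_1 / delta,
# i.e. at most n * ||x - y||_inf / delta (the trivial sigma = n scheme).
exact = 1.0 - np.prod(1.0 - np.abs(x - y) / delta)
print(f"estimated separation probability: {estimate:.4f} (exact: {exact:.4f})")
```

The point of the paper is precisely that, for $p \in (2,\infty]$, one can do asymptotically better than such naive $\sigma \asymp n$ schemes.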

Original language: English (US)
Title of host publication: 28th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017
Editors: Philip N. Klein
Publisher: Association for Computing Machinery
Pages: 690-709
Number of pages: 20
ISBN (Electronic): 9781611974782
DOIs
State: Published - 2017
Event: 28th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017 - Barcelona, Spain
Duration: Jan 16 2017 - Jan 19 2017

Publication series

Name: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
Volume: 0

Other

Other: 28th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017
Country/Territory: Spain
City: Barcelona
Period: 1/16/17 - 1/19/17

All Science Journal Classification (ASJC) codes

  • Software
  • General Mathematics
