Finding low error clusterings

Maria Florina Balcan, Mark Braverman

Research output: Contribution to conferencePaperpeer-review

19 Scopus citations


A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pair-wise distance or similarity information. However, in many learning motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairwise information is merely based on heuristics and the real goal is to achieve low error on the data. In these settings, an arbitrary c-approximation algorithm for some objective would work well only if any c-approximation to that objective is close to the target clustering. In recent work, Balcan et. al [7] have shown how both for the k-means and k-median objectives this property allows one to produce clusterings of low error, even for values c such that getting a c-approximation to these objective functions is provably NP-hard. In this paper we analyze the min-sum objective from this perspective. While [7] also considered the min-sum problem, the results they derived for this objective were substantially weaker. In this work we derive new and more subtle structural properties for min-sum in this context and use these to design efficient algorithms for producing accurate clusterings, both in the transductive and in the inductive case. We also analyze the correlation clustering problem from this perspective, and point out interesting differences between this objective and k-median, k-means, or min-sum objectives.

Original languageEnglish (US)
StatePublished - 2009
Externally publishedYes
Event22nd Conference on Learning Theory, COLT 2009 - Montreal, QC, Canada
Duration: Jun 18 2009Jun 21 2009


Other22nd Conference on Learning Theory, COLT 2009
CityMontreal, QC

All Science Journal Classification (ASJC) codes

  • Education


Dive into the research topics of 'Finding low error clusterings'. Together they form a unique fingerprint.

Cite this