A common approach to solving clustering problems is to design algorithms that approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, many learning-motivated clustering applications (such as clustering proteins by function) involve some unknown target clustering; in such cases the pairwise information is merely based on heuristics, and the real goal is to achieve low error on the data. In these settings, an arbitrary c-approximation algorithm for some objective would work well only if any c-approximation to that objective is close to the target clustering. In recent work, Balcan et al. have shown that, for both the k-means and k-median objectives, this property allows one to produce clusterings of low error, even for values of c such that obtaining a c-approximation to these objective functions is provably NP-hard. In this paper we analyze the min-sum objective from this perspective. While that work also considered the min-sum problem, the results derived for this objective were substantially weaker. Here we derive new and more subtle structural properties for min-sum in this context and use them to design efficient algorithms for producing accurate clusterings, both in the transductive and in the inductive case. We also analyze the correlation clustering problem from this perspective, and point out interesting differences between this objective and the k-median, k-means, and min-sum objectives.
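As a point of reference for the objective studied here, the following is a minimal illustrative sketch (not from the paper; the function name and toy data are hypothetical) of the standard min-sum cost: the sum, over every pair of points assigned to the same cluster, of their pairwise distance.

```python
from itertools import combinations
import math

def min_sum_cost(points, labels):
    """Sum of pairwise distances between points sharing a cluster label.

    Illustrative only: assumes Euclidean points; the paper works with
    general pairwise distance/similarity information.
    """
    cost = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        if lp == lq:  # only within-cluster pairs contribute
            cost += math.dist(p, q)
    return cost

# Two tight, well-separated groups: clustering them together is cheap.
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(min_sum_cost(pts, [0, 0, 1, 1]))  # 2.0: two within-cluster pairs, distance 1 each
```

An approximation algorithm for this objective returns a clustering whose cost is within a factor c of the optimum; the property discussed above asks that every such clustering also be close to the target.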
Original language: English (US)
State: Published - Dec 1 2009
Event: 22nd Conference on Learning Theory, COLT 2009 - Montreal, QC, Canada
Duration: Jun 18 2009 → Jun 21 2009