## Abstract

We study the complexity of approximate representation and learning of submodular functions over the uniform distribution on the Boolean hypercube {0,1}^{n}. Our main result is the following structural theorem: any submodular function is ε-close in l_{2} to a real-valued decision tree (DT) of depth O (1/ε^{2}). This immediately implies that any submodular function is e-close to a function of at most 2^{O(1/ε2)} variables and has a spectral l_{1} norm of 2^{O(1/ε2)}. It also implies the closest previous result that states that submodular functions can be approximated by polynomials of degree O(1/ε^{2}) (Cheraghchi et al., 2012). Our result is proved by constructing an approximation of a submodular function by a DT of rank 4/ε^{2} and a proof that any rank-r DT can be ε-approximated by a DT of depth 5/2(r + log(1/ε)). We show that these structural results can be exploited to give an attribute-efficient PAC learning algorithm for submodular functions running in time Õ(n^{2}) · 2^{O(1/ε4)}. The best previous algorithm for the problem requires n^{o(1//ε2)} time and examples (Cheraghchi et al., 2012) but works also in the agnostic setting. In addition, we give improved learning algorithms for a number of related settings. We also prove that our PAC and agnostic learning algorithms are essentially optimal via two lower bounds: (1) an information-theoretic lower bound of 2 ^{Ω(1/ε2/3)} on the complexity of learning monotone submodular functions in any reasonable model (including learning with value queries); (2) computational lower bound of n^{Ω(1/ε2/3)} based on a reduction to learning of sparse parities with noise, widely-believed to be intractable. These are the first lower bounds for learning of submodular functions over the uniform distribution.

Original language | English (US) |
---|---|

Pages (from-to) | 711-740 |

Number of pages | 30 |

Journal | Journal of Machine Learning Research |

Volume | 30 |

State | Published - 2013 |

Externally published | Yes |

Event | 26th Conference on Learning Theory, COLT 2013 - Princeton, NJ, United States Duration: Jun 12 2013 → Jun 14 2013 |

## All Science Journal Classification (ASJC) codes

- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence

## Keywords

- Decision tree
- Learning
- Submodular function
- Uniform distribution