In recent years, many shape representations and geometric algorithms have been proposed for matching 3D shapes. Usually, each algorithm is tested on a different (small) database of 3D models, and thus no direct comparison is available for competing methods. In this paper, we describe the Princeton Shape Benchmark (PSB), a publicly available database of polygonal models collected from the World Wide Web and a suite of tools for comparing shape matching and classification algorithms. One feature of the benchmark is that it provides multiple semantic labels for each 3D model. For instance, it includes one classification of the 3D models based on function, another that considers function and form, and others based on how the object was constructed (e.g., man-made versus natural objects). We find that experiments with these classifications can expose different properties of shape-based retrieval algorithms. For example, out of 12 shape descriptors tested, Extended Gaussian Images  performed best for distinguishing man-made from natural objects, while they performed among the worst for distinguishing specific object types. Based on experiments with several different shape descriptors, we conclude that no single descriptor is best for all classifications, and thus the main contribution of this paper is to provide a framework to determine the conditions under which each descriptor performs best.