Model evaluation plays a special role in interactive machine learning (IML) systems, in which users rely on their assessment of a model's performance to determine how to improve it. A better understanding of which model criteria matter to users can therefore inform the design of user interfaces for model evaluation as well as the choice and design of learning algorithms. We present a study of the evaluation practices of end users interactively building supervised learning systems for real-world gesture analysis problems. We examine users' model evaluation criteria, which span conventionally relevant criteria such as accuracy and cost, as well as novel criteria such as unexpectedness. We observed that users employed evaluation techniques, including cross-validation and direct, real-time evaluation, not only to make relevant judgments of algorithms' performance and interactively improve the trained models, but also to learn to provide more effective training data. Furthermore, we observed that evaluation taught users what types of models were easy or possible to build, and users sometimes used this information to modify the learning problem definition or their plans for using the trained models in practice. We discuss the implications of these findings with regard to the role of generalization accuracy in IML, the design of new algorithms and interfaces, and the scope of potential benefits of incorporating human interaction in the design of supervised learning systems.