TY - GEN
T1 - Human model evaluation in interactive supervised learning
AU - Fiebrink, Rebecca
AU - Cook, Perry R.
AU - Trueman, Daniel
PY - 2011
Y1 - 2011
N2 - Model evaluation plays a special role in interactive machine learning (IML) systems in which users rely on their assessment of a model's performance in order to determine how to improve it. A better understanding of what model criteria are important to users can therefore inform the design of user interfaces for model evaluation as well as the choice and design of learning algorithms. We present work studying the evaluation practices of end users interactively building supervised learning systems for real-world gesture analysis problems. We examine users' model evaluation criteria, which span conventionally relevant criteria such as accuracy and cost, as well as novel criteria such as unexpectedness. We observed that users employed evaluation techniques - including cross-validation and direct, real-time evaluation - not only to make relevant judgments of algorithms' performance and interactively improve the trained models, but also to learn to provide more effective training data. Furthermore, we observed that evaluation taught users about what types of models were easy or possible to build, and users sometimes used this information to modify the learning problem definition or their plans for using the trained models in practice. We discuss the implications of these findings with regard to the role of generalization accuracy in IML, the design of new algorithms and interfaces, and the scope of potential benefits of incorporating human interaction in the design of supervised learning systems.
KW - Evaluation
KW - Gesture
KW - Interactive machine learning
KW - Music
UR - http://www.scopus.com/inward/record.url?scp=79958111278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79958111278&partnerID=8YFLogxK
DO - 10.1145/1978942.1978965
M3 - Conference contribution
AN - SCOPUS:79958111278
SN - 9781450302289
T3 - Conference on Human Factors in Computing Systems - Proceedings
SP - 147
EP - 156
BT - CHI 2011 - 29th Annual CHI Conference on Human Factors in Computing Systems, Conference Proceedings and Extended Abstracts
PB - Association for Computing Machinery
ER -