TY - GEN
T1 - Text Characterization Toolkit (TCT)
AU - Simig, Daniel
AU - Wang, Tianlu
AU - Dankers, Verna
AU - Henderson, Peter
AU - Batsuren, Khuyagbaatar
AU - Hupkes, Dieuwke
AU - Diab, Mona
N1 - Publisher Copyright: ©2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - We present a tool, Text Characterization Toolkit (TCT), that researchers can use to study characteristics of large datasets. Studying these characteristics can, in turn, help explain how such attributes influence models' behaviour. In most NLP research, models are traditionally evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis. Here, we argue that, especially given the well-known fact that benchmarks often contain biases, artefacts, and spurious correlations, deeper results analysis should become the de facto standard when presenting new models or benchmarks. TCT aims to fill this gap by facilitating such deeper analysis for datasets at scale, whether they are used for training, development, or evaluation. TCT includes both an easy-to-use tool and off-the-shelf scripts that can be used for specific analyses. We also present use cases from several different domains: TCT is used to predict difficult examples for given well-known trained models, and to identify (potentially harmful) biases present in a dataset.
AB - We present a tool, Text Characterization Toolkit (TCT), that researchers can use to study characteristics of large datasets. Studying these characteristics can, in turn, help explain how such attributes influence models' behaviour. In most NLP research, models are traditionally evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis. Here, we argue that, especially given the well-known fact that benchmarks often contain biases, artefacts, and spurious correlations, deeper results analysis should become the de facto standard when presenting new models or benchmarks. TCT aims to fill this gap by facilitating such deeper analysis for datasets at scale, whether they are used for training, development, or evaluation. TCT includes both an easy-to-use tool and off-the-shelf scripts that can be used for specific analyses. We also present use cases from several different domains: TCT is used to predict difficult examples for given well-known trained models, and to identify (potentially harmful) biases present in a dataset.
UR - https://www.scopus.com/pages/publications/105027202301
U2 - 10.18653/v1/2022.aacl-demo.9
DO - 10.18653/v1/2022.aacl-demo.9
M3 - Conference contribution
AN - SCOPUS:105027202301
T3 - Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Long Paper, AACL-IJCNLP 2022
SP - 72
EP - 87
BT - System Demonstrations
A2 - Buntine, Wray
A2 - Liakata, Maria
PB - Association for Computational Linguistics (ACL)
T2 - 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, AACL-IJCNLP 2022
Y2 - 20 November 2022 through 23 November 2022
ER -