Breaking Down Bias: On the Limits of Generalizable Pruning Strategies

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We employ model pruning to examine how LLMs conceptualize racial biases, and whether a generalizable mitigation strategy for such biases appears feasible. Our analysis yields several novel insights. We find that pruning can be an effective method for reducing bias without significantly increasing anomalous model behavior. Neuron-based pruning strategies generally yield better results than approaches that prune entire attention heads. However, our results also show that the effectiveness of either approach deteriorates quickly as pruning strategies become more generalized. For instance, a model pruned to remove racial biases in the context of financial decision-making generalizes poorly to biases in commercial transactions. Overall, our analysis suggests that racial biases are only partially represented as a general concept within language models; the remainder is highly context-specific, suggesting that generalizable mitigation strategies may be of limited effectiveness. Our findings have important implications for legal frameworks surrounding AI. In particular, they suggest that an effective mitigation strategy should include allocating legal responsibility to those who deploy models in a specific use case.
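The contrast the abstract draws between neuron-based and head-based pruning can be sketched as follows. This is a minimal illustration under assumed shapes and a hypothetical per-neuron bias-attribution score, not the paper's actual method:

```python
import numpy as np

# Illustrative sketch of the two pruning granularities the abstract
# compares: zeroing individual neurons vs. zeroing whole attention heads.
# The shapes and the per-neuron "bias_score" are assumptions for
# illustration, not the paper's attribution method.

rng = np.random.default_rng(0)

n_heads, head_dim = 4, 8
hidden = n_heads * head_dim              # 32 neurons in one projection

weights = rng.normal(size=(hidden, hidden))
bias_score = rng.random(hidden)          # hypothetical bias attribution

# Neuron-based pruning: silence only the k most bias-attributed neurons.
k = 6
neuron_mask = np.ones(hidden, dtype=bool)
neuron_mask[np.argsort(bias_score)[-k:]] = False
neuron_pruned = weights * neuron_mask[:, None]

# Head-based pruning: silence every neuron in the most biased head,
# a much coarser intervention.
head_score = bias_score.reshape(n_heads, head_dim).mean(axis=1)
worst = int(np.argmax(head_score))
head_mask = np.ones(hidden, dtype=bool)
head_mask[worst * head_dim:(worst + 1) * head_dim] = False
head_pruned = weights * head_mask[:, None]

print(int(hidden - neuron_mask.sum()))   # 6 neurons zeroed
print(int(hidden - head_mask.sum()))     # 8 neurons (one full head) zeroed
```

The head-level variant always removes `head_dim` neurons at once, which is one intuition for why the finer-grained neuron-level strategy tends to trade off bias against capability more precisely.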

Original language: English (US)
Title of host publication: ACM FAccT 2025 - Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency
Publisher: Association for Computing Machinery, Inc
Pages: 2437-2450
Number of pages: 14
ISBN (Electronic): 9798400714825
DOIs
State: Published - Jun 23 2025
Event: 8th Annual ACM Conference on Fairness, Accountability, and Transparency, FAccT 2025 - Athens, Greece
Duration: Jun 23 2025 - Jun 26 2025

Publication series

Name: ACM FAccT 2025 - Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency

Conference

Conference: 8th Annual ACM Conference on Fairness, Accountability, and Transparency, FAccT 2025
Country/Territory: Greece
City: Athens
Period: 6/23/25 - 6/26/25

All Science Journal Classification (ASJC) codes

  • General Business, Management and Accounting

Keywords

  • AI Governance
  • Algorithmic Fairness
  • Bias Mitigation
  • Ethical AI
  • Large Language Models
  • Model Pruning
  • Natural Language Processing
  • Neural Networks
