Value-based clock gating and operation packing: Dynamic strategies for improving processor power and performance

Research output: Contribution to journal › Article › peer-review

55 Scopus citations


The large address space needs of many current applications have pushed processor designs toward 64-bit word widths. Although full 64-bit addresses and operations are indeed sometimes needed, arithmetic operations on much smaller quantities are still more common. In fact, another instruction set trend has been the introduction of instructions geared toward subword operations on 16-bit quantities. For example, most major processors now include instruction set support for multimedia operations, allowing parallel execution of several subword operations in the same ALU. This article presents our observations demonstrating that operations on "narrow-width" quantities are common not only in multimedia codes, but also in more general workloads. In fact, across the SPECint95 benchmarks, over half the integer operation executions require 16 bits or fewer. Based on this data, we propose two hardware mechanisms that dynamically recognize and capitalize on these narrow-width operations. The first, a power-oriented optimization, reduces processor power consumption by using operand-value-based clock gating to turn off portions of arithmetic units that will be unused by narrow-width operations. This optimization results in a 45%-60% reduction in the integer unit's power consumption for the SPECint95 and MediaBench benchmark suites. Applying this optimization to SPECfp95 benchmarks results in slightly smaller power reductions, but still seems warranted. These reductions in integer unit power consumption equate to a 5%-10% full-chip power savings. The second, a performance-oriented optimization, improves processor performance by packing together narrow-width operations so that they share a single arithmetic unit. Conceptually similar to a dynamic form of MMX, this optimization offers speedups of 4.3%-6.2% for SPECint95 and 8.0%-10.4% for MediaBench. Overall, these optimizations highlight an increasing opportunity for value-based optimizations to improve both power and performance in current microprocessors.
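
The two ideas in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the article: the names `is_narrow`, `narrow_fraction`, and `packed_add` are hypothetical, narrowness is detected with a simple zero check on the upper 48 bits of a 64-bit operand (so negative values count as wide), and packing places two additions in separate 32-bit lanes of one 64-bit add.

```python
MASK64 = (1 << 64) - 1
NARROW_BITS = 16  # assumption: "narrow" means the value fits in 16 bits
LANE_BITS = 32    # assumption: each packed operation gets a 32-bit lane


def is_narrow(value, bits=NARROW_BITS):
    """Zero-detect: True if every bit above the low `bits` bits is 0."""
    return ((value & MASK64) >> bits) == 0


def narrow_fraction(operand_pairs):
    """Fraction of two-operand operations whose operands are both narrow."""
    pairs = list(operand_pairs)
    if not pairs:
        return 0.0
    narrow = sum(1 for a, b in pairs if is_narrow(a) and is_narrow(b))
    return narrow / len(pairs)


def packed_add(pair0, pair1):
    """Perform two narrow additions with a single 64-bit addition.

    Each 16-bit sum needs at most 17 bits, so it cannot carry out of
    its 32-bit lane into the neighbouring one.
    """
    (a0, b0), (a1, b1) = pair0, pair1
    if not all(is_narrow(v) for v in (a0, b0, a1, b1)):
        raise ValueError("packing requires narrow operands")
    left = (a1 << LANE_BITS) | a0
    right = (b1 << LANE_BITS) | b0
    total = (left + right) & MASK64  # the one shared "ALU" operation
    lane_mask = (1 << LANE_BITS) - 1
    return total & lane_mask, (total >> LANE_BITS) & lane_mask
```

In this model, an operation for which `is_narrow` holds on both operands is a candidate for gating off the upper part of the arithmetic unit, and two such operations are candidates for sharing one unit via `packed_add`.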

Original language: English (US)
Pages (from-to): 89-126
Number of pages: 38
Journal: ACM Transactions on Computer Systems
Issue number: 2
State: Published - May 2000

All Science Journal Classification (ASJC) codes

  • General Computer Science

Keywords

  • Arithmetic and Logic Structures
  • B.2 [Hardware]
  • C.1.1 [Processor Architectures]
  • Design
  • Experimentation
  • Performance
  • Single Data Stream Architectures - RISC / CISC, VLIW architectures

