TY - JOUR
T1 - Evaluation of the raw microprocessor
T2 - Proceedings -31st Annual International Symposium on Computer Architecture
AU - Taylor, Michael Bedford
AU - Lee, Walter
AU - Miller, Jason
AU - Wentzlaff, David
AU - Bratt, Ian
AU - Greenwald, Ben
AU - Hoffmann, Henry
AU - Johnson, Paul
AU - Kim, Jason
AU - Psota, James
AU - Saraf, Arvind
AU - Shnidman, Nathan
AU - Strumpen, Volker
AU - Frank, Matt
AU - Amarasinghe, Saman
AU - Agarwal, Anant
PY - 2004
Y1 - 2004
N2 - This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally- exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport. We have implemented a prototype Raw microprocessor in IBM's 180 nm, 6-layer copper, CMOS 7SF standard-cell ASIC process. We have also implemented ILP and stream compilers. Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor. Central to achieving this goal is Raw's ability to exploit all forms of parallelism, including ILP, DIP, TLP, and Stream parallelism. Specifically, we evaluate the performance of Raw on a diverse set of codes including traditional sequential programs, streaming applications, server workloads and bit-level embedded computation. Our experimental methodology makes use of a cycle-accurate simulator validated against our real hardware. Compared to a 180 nm Pentium-III, using commodity PC memory system components. Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x to 9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand. The paper also proposes a new versatility metric and uses it to discuss the generality of Raw.
AB - This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally- exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport. We have implemented a prototype Raw microprocessor in IBM's 180 nm, 6-layer copper, CMOS 7SF standard-cell ASIC process. We have also implemented ILP and stream compilers. Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor. Central to achieving this goal is Raw's ability to exploit all forms of parallelism, including ILP, DIP, TLP, and Stream parallelism. Specifically, we evaluate the performance of Raw on a diverse set of codes including traditional sequential programs, streaming applications, server workloads and bit-level embedded computation. Our experimental methodology makes use of a cycle-accurate simulator validated against our real hardware. Compared to a 180 nm Pentium-III, using commodity PC memory system components. Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x to 9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand. The paper also proposes a new versatility metric and uses it to discuss the generality of Raw.
UR - http://www.scopus.com/inward/record.url?scp=4644353790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4644353790&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:4644353790
SN - 1063-6897
VL - 31
SP - 2
EP - 13
JO - Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
JF - Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
Y2 - 19 June 2004 through 23 June 2004
ER -