TY - JOUR
T1 - A Systolic Design Methodology with Application to Full-Search Block-Matching Architectures
AU - Chen, Yen Kuang
AU - Kung, S. Y.
N1 - Funding Information:
This work was supported in part by Sarnoff Research Center, Mitsubishi Electric, and the George Van Ness Lothrop Honorific Fellowship.
PY - 1998
Y1 - 1998
N2 - We present a systematic methodology to support the design tradeoffs of array processors in several emerging issues, such as (1) high performance and high flexibility, (2) low cost, low power, (3) efficient memory usage, and (4) system-on-a-chip or the ease of system integration. This methodology is algebraic based, so it can cope with high-dimensional data dependence. The methodology consists of some transformation rules of data dependency graphs for facilitating flexible array designs. For example, two common partitioning approaches, LPGS and LSGP, could be unified under the methodology. It supports the design of high-speed and massively parallel processor arrays with efficient memory usage. More specifically, it leads to a novel systolic cache architecture comprising of shift registers only (cache without tags). To demonstrate how the methodology works, we have presented several systolic design examples based on the block-matching motion estimation algorithm (BMA). By multiprojecting a 4D DG of the BMA to 2D mesh, we can reconstruct several existing array processors. By multiprojecting a 6D DG of the BMA, a novel 2D systolic array can be derived that features significantly improved rates in data reusability (96%) and processor utilization (99%).
AB - We present a systematic methodology to support the design tradeoffs of array processors in several emerging issues, such as (1) high performance and high flexibility, (2) low cost, low power, (3) efficient memory usage, and (4) system-on-a-chip or the ease of system integration. This methodology is algebraic based, so it can cope with high-dimensional data dependence. The methodology consists of some transformation rules of data dependency graphs for facilitating flexible array designs. For example, two common partitioning approaches, LPGS and LSGP, could be unified under the methodology. It supports the design of high-speed and massively parallel processor arrays with efficient memory usage. More specifically, it leads to a novel systolic cache architecture comprising of shift registers only (cache without tags). To demonstrate how the methodology works, we have presented several systolic design examples based on the block-matching motion estimation algorithm (BMA). By multiprojecting a 4D DG of the BMA to 2D mesh, we can reconstruct several existing array processors. By multiprojecting a 6D DG of the BMA, a novel 2D systolic array can be derived that features significantly improved rates in data reusability (96%) and processor utilization (99%).
UR - http://www.scopus.com/inward/record.url?scp=0032064852&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0032064852&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:0032064852
SN - 1387-5485
VL - 19
SP - 51
EP - 77
JO - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
JF - Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
IS - 1
ER -