We present a technique for synthesizing power- As well as area-optimized circuits from hierarchical data flow graphs under throughput constraints. We allow for the use of complex RTL modules, such as FFTs and filters, as building blocks for the RTL circuit, in addition to simple RTL modules such as adders and multipliers. Unlike past techniques in the area, we also customize the complex RTL modules to match the environment in which they find themselves. We present a fast and efficient algorithm for mapping multiple behaviors onto the same RTL module during the course of synthesis, thus allowing our synthesis system to explore previously unexplored regions of the design space. These techniques are at the core of an iterative improvement based approach which can accept temporary degradation in solution quality in its quest for a globally optimal solution. The moves in our iterative improvement procedure explore optimizations along different dimensions such as functional unit selection, resource allocation, resource sharing, resource splitting, and selection and resynthesis of complex RTL modules. These inter-related optimizations are dynamically traded off with each other during the course of synthesis, thus exploiting the benefits that arise from their interaction. The synthesis framework also tackles other related high-level synthesis tasks such as scheduling, clock selection, and Vdd selection. Experimental results demonstrate that our algorithm produces circuits whose area and power consumption are comparable to or better than those produced using flattened synthesis, within much shorter CPU times. The efficacy of our algorithm in the power-optimization mode is illustrated by the fact that it produces circuits that consume up to 6.7 times less power than area-optimized circuits working at 5 Volts at area overheads not exceeding 50%.