Visual patterns in natural images typically appear at multiple scales. Multi-scale feature representation is therefore of great importance for single-image super-resolution (SISR), which must reconstruct image objects at different scales. However, this characteristic has rarely been considered by CNN-based SISR methods. In this work, we propose a novel building block, the hierarchically aggregated residual transformation (HART), to achieve multi-scale feature representation in each layer of the network. Within each HART block, we connect multiple convolutions in a hierarchical residual-like manner, which efficiently provides a wide range of effective receptive fields at a granular level to detect both local and global image features. To understand the proposed HART block theoretically, we recast SISR as an optimal control problem and show that HART effectively approximates the classical fourth-order Runge-Kutta (RK4) method, which enjoys a small local truncation error when numerically solving ordinary differential equations. By cascading the proposed HART blocks, we construct our high-performing HARTnet. Through extensive experiments on various benchmark datasets under different degradation models, we demonstrate that HARTnet compares favourably against state-of-the-art methods (including those on the NTIRE 2019 SR Challenge leaderboard) in terms of both quantitative metrics and visual quality. Moreover, the same HARTnet architecture achieves promising performance on other image restoration tasks such as image denoising and low-light image enhancement.