As companies move towards many-core chips, an efficient on-chip communication fabric to connect these cores assumes critical importance. To address limits on wire-delay scalability and increasing bandwidth demands, state-of-the-art on-chip networks use a modular packet-switched design with routers at every hop, which allows network channels to be shared across multiple packet flows. This, however, forces packets through a complex router pipeline at every hop, so that overall communication energy and delay are dominated by router overhead rather than by wire energy and delay alone. In this work, we propose token flow control (TFC), a flow control mechanism in which nodes in the network send out tokens in their local neighborhood to communicate information about their available resources. These tokens are then used in both routing and flow control: to choose less congested paths in the network and to bypass the router pipeline along those paths. These bypass paths are formed dynamically, can be arbitrarily long, and are flexible enough to match a packet's exact route. This allows packets to potentially skip all routers along their path from source to destination, approaching the communication energy-delay-throughput of dedicated wires. Our detailed implementation analysis shows TFC to be highly scalable and realizable at an aggressive target clock cycle delay of 21 FO4 for large networks while requiring low hardware complexity. Evaluations of TFC using both synthetic traffic and traces from the SPLASH-2 benchmark suite show a reduction in packet latency of up to 77.1% and a reduction in average router energy consumption of up to 39.6% compared with a state-of-the-art baseline packet-switched design. For the same saturation throughput as the baseline network, TFC reduces the amount of buffering by 65%, leading to a 48.8% reduction in leakage energy and 55.4% lower total router energy.
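The idea of using tokens both to pick a route and to skip router pipelines can be illustrated with a toy model. The sketch below is not the TFC implementation: the `Hop` type, the function names, and the per-hop cycle counts (3 cycles for a full router pipeline, 1 cycle for a bypassed hop) are all hypothetical values chosen for illustration. It assumes only that a hop whose node has advertised a token can be bypassed, and that a packet prefers the candidate route with the lowest resulting latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    node: int
    has_token: bool  # assumption: True if this node advertised free resources


def path_latency(path, router_cycles=3, bypass_cycles=1):
    """Toy latency model: a tokened hop is bypassed, any other hop pays
    the full router pipeline. Cycle counts are illustrative assumptions."""
    return sum(bypass_cycles if hop.has_token else router_cycles for hop in path)


def choose_path(candidates):
    """Pick the candidate route with the lowest modeled latency.
    On a tie, the first candidate in the list is kept."""
    return min(candidates, key=path_latency)


# Two hypothetical candidate routes to the same destination.
path_a = [Hop(1, True), Hop(2, False), Hop(3, True)]  # one congested hop
path_b = [Hop(4, True), Hop(5, True), Hop(6, True)]   # tokens along the whole route
chosen = choose_path([path_a, path_b])
```

In this model, a route with tokens at every hop costs only the bypass delay per hop, which is the sense in which a fully bypassed packet approaches the behavior of a dedicated wire.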