Conventional buffer sizing techniques consider an output port with multiple queues in isolation and provide guidelines for the size of the queue. In practice, however, switches consist of several ports that share a buffering chip. Hence, chip manufacturers, such as Broadcom, are left to devise a set of proprietary resource sharing algorithms to allocate buffers across ports. This algorithm dynamically adjusts the buffer size for output queues and directly impacts the packet loss and latency of individual queues. We show that the problem of allocating buffers across ports, although less known, is indeed responsible for fundamental inefficiencies in today's devices. In particular, the per-port buffer allocation is an ad-hoc decision that (at best) depends on the remaining buffer cells on the chip instead of the type of traffic. In this work, we advocate for a flow-aware and device-wide buffer sharing scheme (FAB), which is practical today in programmable devices. We tested FAB on two specific workloads and showed that it can improve the tail flow completion time by an order of magnitude compared to conventional buffer management techniques.