This paper presents a novel multi-user precoding strategy for frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) downlink systems with rate-limited feedback. Inspired by the multi-armed bandit framework, our approach adaptively learns the precoding action that yields the highest sum-throughput without explicit channel state information (CSI) feedback. In particular, we present an online learning algorithm, called fast upper confidence bound (Fast-UCB) precoding, that identifies the optimal precoding action in a timely manner. The key idea is to combine fast exploration and exploitation with pruning strategies, which speeds up learning of the optimal precoding action. Simulations show that the proposed algorithm significantly outperforms existing online learning algorithms, including the conventional UCB method, in terms of cumulative regret. In addition, we demonstrate that Fast-UCB achieves a higher net sum-throughput than greedy action selection with full exploration when the channel coherence time is short, despite using much less feedback.
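To make the bandit view concrete, the sketch below treats each candidate precoding action as an arm. It shows the general pattern of UCB-based selection combined with pruning. It is *not* the Fast-UCB algorithm of this paper. It is a generic UCB1 index rule plus confidence-interval elimination, under several assumptions: rewards are Bernoulli in [0, 1] (standing in for normalized sum-throughput), the true per-action means are hypothetical, and the function name `ucb_with_pruning` is illustrative only. An action is pruned once its upper confidence bound falls below the best lower confidence bound among the remaining actions, so exploration concentrates on the plausible candidates.

```python
import math
import random


def ucb_with_pruning(means, horizon, seed=0):
    """Generic UCB1 with confidence-interval pruning (illustrative sketch only).

    means: hypothetical true mean rewards of each precoding action (in [0, 1]).
    Returns (per-action play counts, sorted surviving actions, cumulative regret).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))
    best_mean = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        # Initial exploration: play each still-active action once.
        untried = [a for a in sorted(active) if counts[a] == 0]
        if untried:
            a = untried[0]
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            a = max(
                active,
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )

        # Assumed Bernoulli reward (e.g. 1 if the transmission succeeds).
        reward = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += reward
        regret += best_mean - means[a]

        # Pruning: drop actions whose UCB is below the best LCB.
        if all(counts[i] > 0 for i in active):
            stats = {
                i: (sums[i] / counts[i], math.sqrt(2 * math.log(t) / counts[i]))
                for i in active
            }
            best_lcb = max(m - r for m, r in stats.values())
            # The action attaining best_lcb always survives, so active is never empty.
            active = {i for i in active if stats[i][0] + stats[i][1] >= best_lcb}

    return counts, sorted(active), regret


if __name__ == "__main__":
    counts, active, regret = ucb_with_pruning([0.2, 0.5, 0.9], horizon=5000, seed=1)
    print("plays per action:", counts)
    print("surviving actions:", active)
    print("cumulative regret:", round(regret, 1))
```

With three hypothetical actions, most plays should go to the best action (index 2), and the clearly worse actions are typically pruned early. This early pruning is the effect that limits cumulative regret compared with exploring every action uniformly.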