This paper presents a data-aided channel estimator that reduces the channel estimation error of the conventional linear minimum-mean-squared-error (LMMSE) method for multiple-input multiple-output (MIMO) communication systems. The basic idea is to selectively exploit detected symbol vectors obtained from data detection as additional pilot signals. To optimize this selection, a Markov decision process (MDP) is formulated whose solution gives the selection of detected symbol vectors that minimizes the mean-squared error (MSE) of the channel estimate. A reinforcement learning algorithm is then developed to solve this MDP in a computationally efficient manner. Simulation results demonstrate that the presented channel estimator significantly reduces the MSE of the channel estimate and therefore improves the block error rate of the system, compared to the conventional LMMSE method.
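
To make the basic idea concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes a flat-fading model Y = HX + N with i.i.d. CN(0, 1) channel entries and i.i.d. complex Gaussian noise of variance `noise_var`. Under these assumptions the LMMSE estimate is Y X^H (X X^H + noise_var I)^{-1}, and the data-aided variant simply appends the selected detected symbol vectors to the pilot matrix. The function names, the `selected` mask, and the demo parameters are hypothetical; in particular, `selected` is supplied by hand here, whereas the paper obtains the selection by solving an MDP with reinforcement learning.

```python
import numpy as np


def lmmse_estimate(Y, X, noise_var):
    """LMMSE estimate of H from Y = H X + N.

    Assumes (for illustration) i.i.d. CN(0, 1) channel entries and
    i.i.d. CN(0, noise_var) noise, which gives
    H_hat = Y X^H (X X^H + noise_var I)^{-1}.
    """
    n_tx = X.shape[0]
    gram = X @ X.conj().T + noise_var * np.eye(n_tx)
    cross = Y @ X.conj().T
    # Solve H_hat @ gram = cross without forming an explicit inverse.
    return np.linalg.solve(gram.conj().T, cross.conj().T).conj().T


def data_aided_estimate(Y_pilot, X_pilot, Y_data, X_detected, selected, noise_var):
    """Re-estimate H using the selected detected symbol vectors as extra pilots.

    `selected` is a boolean mask (or index array) over the data columns.
    How to choose it is the subject of the paper; here it is just an input.
    """
    X_aug = np.hstack([X_pilot, X_detected[:, selected]])
    Y_aug = np.hstack([Y_pilot, Y_data[:, selected]])
    return lmmse_estimate(Y_aug, X_aug, noise_var)


def demo(seed=0, n_rx=8, n_tx=4, n_data=32, noise_var=0.1):
    """Compare pilot-only and data-aided squared errors on one random channel."""
    rng = np.random.default_rng(seed)

    def cgauss(shape, var):
        # Circularly symmetric complex Gaussian samples with the given variance.
        return np.sqrt(var / 2) * (rng.standard_normal(shape)
                                   + 1j * rng.standard_normal(shape))

    H = cgauss((n_rx, n_tx), 1.0)
    X_pilot = np.fft.fft(np.eye(n_tx))  # orthogonal pilot block (an assumption)
    X_data = (rng.choice([-1, 1], (n_tx, n_data))
              + 1j * rng.choice([-1, 1], (n_tx, n_data))) / np.sqrt(2)  # QPSK
    Y_pilot = H @ X_pilot + cgauss((n_rx, n_tx), noise_var)
    Y_data = H @ X_data + cgauss((n_rx, n_data), noise_var)

    # Idealization: every detected vector is correct and every one is selected.
    selected = np.ones(n_data, dtype=bool)
    H_pilot = lmmse_estimate(Y_pilot, X_pilot, noise_var)
    H_aided = data_aided_estimate(Y_pilot, X_pilot, Y_data, X_data,
                                  selected, noise_var)
    err_pilot = np.mean(np.abs(H - H_pilot) ** 2)
    err_aided = np.mean(np.abs(H - H_aided) ** 2)
    return err_pilot, err_aided


if __name__ == "__main__":
    print(demo())
```

The demo idealizes detection by treating every detected symbol vector as correct. In practice, erroneously detected vectors act as corrupted pilots and can degrade the estimate, which is why the selection step matters.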