In this paper, we propose a learning-based detection method for multiple-input multiple-output (MIMO) communications with hardware impairments. The proposed method approximates the conditional distribution of a received signal distorted by hardware impairments by generalizing a conventional additive distortion model. We then present a low-overhead strategy for generating the training data used to learn this approximate conditional distribution. The strategy requires only the traditional pilot signals used for channel estimation, but it yields noisy training data that contain incorrect labels. To learn the approximate distribution accurately from such data, we develop an expectation-maximization (EM) algorithm that estimates not only the parameters of the distribution but also the transition probabilities from noisy labels to true labels. Maximum likelihood detection is then performed using the learned distribution. Using simulations, we demonstrate that the proposed detection method outperforms existing detection methods under both additive and realistic distortion models.
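
The idea of learning from noisy labels with expectation maximization can be illustrated with a minimal sketch. It is not the algorithm developed in this paper: it assumes real scalar observations and a Gaussian density per class, a deliberate simplification of the MIMO setting. The names `em_noisy_labels` and `ml_detect`, the initial transition value of 0.8, and the variance floor are hypothetical choices made only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a real scalar Gaussian (assumed class-conditional model)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_noisy_labels(xs, noisy_labels, num_classes, num_iters=50):
    """EM for per-class Gaussians when some training labels are wrong.

    Returns (means, variances, trans), where trans[j][k] estimates
    P(true label = k | noisy label = j). Assumes every class appears
    at least once among the noisy labels.
    """
    K = num_classes
    # Initialise each class from the samples that carry its noisy label.
    means, variances = [], []
    for k in range(K):
        members = [x for x, y in zip(xs, noisy_labels) if y == k]
        m = sum(members) / len(members)
        means.append(m)
        variances.append(max(sum((x - m) ** 2 for x in members) / len(members), 1e-6))
    # Start by mostly trusting the labels (illustrative initial value).
    trans = [[0.8 if j == k else 0.2 / (K - 1) for k in range(K)] for j in range(K)]

    for _ in range(num_iters):
        # E-step: posterior of the true label given the sample and its noisy label.
        resp = []
        for x, y in zip(xs, noisy_labels):
            w = [trans[y][k] * gaussian_pdf(x, means[k], variances[k]) for k in range(K)]
            total = sum(w) or 1e-300  # guard against numerical underflow
            resp.append([v / total for v in w])
        # M-step: responsibility-weighted updates of the Gaussian parameters.
        for k in range(K):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, 1e-6)
        # M-step: transition probabilities, averaged per noisy label.
        for j in range(K):
            rows = [r for r, y in zip(resp, noisy_labels) if y == j]
            trans[j] = [sum(r[k] for r in rows) / len(rows) for k in range(K)]
    return means, variances, trans


def ml_detect(x, means, variances):
    """Maximum likelihood decision using the learned per-class densities."""
    return max(range(len(means)), key=lambda k: gaussian_pdf(x, means[k], variances[k]))
```

In this sketch the E-step weights each candidate true label by the current transition estimate for the observed noisy label, so mislabeled samples can still contribute to the correct class. Detection then uses only the learned densities, not the labels.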