Generalization is defined training of generative adversarial network (GAN), and it's shown that generalization is not guaranteed for the popular distances between distributions such as Jensen-Shannon or Wasserstein. In particular, training may appear to be successful and yet the trained distribution may be arbitrarily far from the target distribution in standard metrics. It is shown that generalization does occur for a much weaker metric we call neural net distance. It is also shown that an approximate pure equilibrium exists in the discriminator/generator game for a natural training objective (Wasserstein) when generator capacity and training set sizes are moderate. Finally, the above theoretical ideas suggest a new training protocol, mix+GAN, which can be combined with any existing method, and empirically is found to improves some existing GAN protocols out of the box.