IP multicast would significantly reduce both network and server overhead for communication in many datacenter applications. Unfortunately, traditional protocols for managing IP multicast were designed for arbitrary network topologies, and the number of multicast groups they support does not scale with aggregate hardware resources. Prior attempts to scale multicast in general settings are all bottlenecked by the forwarding table capacity of a single switch. This paper shows how to leverage the unique topological structure of modern datacenter networks to build the first scale-out multicast architecture. In our architecture, a network controller carefully partitions the multicast address space and distributes the partitions across the switches of a datacenter's multi-rooted tree network. Our approach further improves scalability by locally aggregating multicast addresses at bottleneck switches that are running out of forwarding table space, at the cost of slightly inflating downstream traffic. We evaluate the system's scalability, traffic overhead, and fault tolerance through a mix of simulation and analysis. For example, experiments show that a datacenter with 27,648 servers and commodity switches with 1,000-entry multicast tables can support up to 100,000 multicast groups, allowing each server to subscribe to nearly 200 multicast groups concurrently.
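
To make the partitioning idea concrete, the following is a minimal sketch and not the algorithm evaluated in this paper. It assumes a standard three-tier fat-tree built from k-port switches; the abstract does not state the topology parameters, but 27,648 servers matches k = 48. The function names and the modulo assignment rule are hypothetical, chosen only to show how spreading the multicast address space over many switches keeps each switch's share far below a 1,000-entry table. The sketch does not reproduce the 100,000-group result, which depends on group sizes, member placement, and the tables of switches at other tiers.

```python
def fat_tree_sizes(k: int) -> dict:
    """Component counts of a standard k-ary fat-tree (assumed topology)."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even port count")
    half = k // 2
    return {
        "pods": k,
        "core_switches": half * half,   # (k/2)^2
        "servers": k * half * half,     # k^3 / 4
    }


def partition_of(group_id: int, num_partitions: int) -> int:
    """Hypothetical rule: assign a multicast group to a partition by modulo."""
    return group_id % num_partitions


def entries_per_partition(num_groups: int, num_partitions: int) -> list[int]:
    """Count how many group entries each partition would have to hold."""
    counts = [0] * num_partitions
    for group_id in range(num_groups):
        counts[partition_of(group_id, num_partitions)] += 1
    return counts


if __name__ == "__main__":
    sizes = fat_tree_sizes(48)
    print(sizes["servers"])          # 27648
    print(sizes["core_switches"])    # 576

    # Assumption: one partition per core switch.
    counts = entries_per_partition(100_000, sizes["core_switches"])
    print(max(counts))               # 174, well under a 1,000-entry table
```

Under these assumptions, no single core switch needs entries for more than 174 of the 100,000 groups, whereas an unpartitioned design would require every core switch to hold state for any group whose traffic it might carry.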