With more applications moving to the cloud, cloud providers need to diagnose performance problems in a timely manner. Offline processing of logs is slow and inefficient, and instrumenting the end-host network stack would violate the tenants' rights to manage their own virtual machines (VMs). Instead, our system Dapper analyzes TCP performance in real time near the end-hosts (e.g., at the hypervisor, NIC, or top-of-rack switch). Dapper determines whether a connection is limited by the sender (e.g., a slow server competing for shared resources), the network (e.g., congestion), or the receiver (e.g., small receive buffer). Emerging edge devices now offer flexible packet processing at high speed on commodity hardware, making it possible to monitor TCP performance in the data plane, at line rate. We use P4 to prototype Dapper and evaluate our design on real and synthetic traffic. To reduce the data-plane state requirements, we perform lightweight detection for all connections, followed by heavier-weight diagnosis just for the troubled connections.