Content caching at the small-cell base stations (sBSs) in a heterogeneous wireless network is considered. A cost function called the offloading loss is proposed to capture the backhaul link load; it measures the fraction of the requested files that are not available in the sBS caches. Previous approaches minimize this offloading loss assuming that the popularity profile of the content is time-invariant and perfectly known. However, in many practical applications, the popularity profile is unknown and time-varying. Therefore, caching with non-stationary and statistically dependent popularity profiles (assumed unknown and hence estimated) is studied in this paper from a learning-theoretic perspective. A probably approximately correct (PAC) result is derived in the form of a high-probability bound on the offloading loss difference, i.e., the error between the offloading loss under the estimated (outdated) popularity profile and the optimal offloading loss. The bound is a function of the Rademacher complexity of the set of all probability measures on the set of cached content items, the β-mixing coefficient, 1/√t (where t is the number of time slots), and a measure of discrepancy between the estimated and true popularity profiles.
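
As a rough illustration of the offloading loss and of the loss difference discussed above, the following minimal sketch computes the fraction of requests missed by a single cache and compares a cache built from an outdated request trace against one built from the current trace. It is a simplified, single-cache, empirical stand-in and not the formulation analyzed in the paper; the function names (`offloading_loss`, `top_k_cache`), the top-k placement rule, and the request traces are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def offloading_loss(requests, cache):
    """Fraction of requested files that are not available in the cache."""
    if not requests:
        return 0.0
    misses = sum(1 for f in requests if f not in cache)
    return misses / len(requests)


def top_k_cache(requests, k):
    """Hypothetical placement rule: cache the k most requested files."""
    return {f for f, _ in Counter(requests).most_common(k)}


# Hypothetical request traces; the popularity shifts between the windows.
past = ["a", "a", "a", "b", "b", "c"]
current = ["c", "c", "c", "d", "d", "a"]

# Cache chosen from the outdated estimate vs. from the current popularity.
stale_cache = top_k_cache(past, 2)
best_cache = top_k_cache(current, 2)

stale_loss = offloading_loss(current, stale_cache)
best_loss = offloading_loss(current, best_cache)

# Empirical analogue of the offloading loss difference.
gap = stale_loss - best_loss
print(f"stale={stale_loss:.3f} best={best_loss:.3f} gap={gap:.3f}")
```

In this toy setting the gap grows as the request distribution drifts away from the one used to fill the cache, which is the qualitative role played by the discrepancy term in the bound.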