TY  - CONF
T1 - Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
AU - Rothberg, Edward
AU - Singh, Jaswinder Pal
AU - Gupta, Anoop
PY - 1993
Y1 - 1993
AB - The distribution of resources among processors, memory and caches is a crucial question faced by designers of large-scale parallel machines. If a machine is to solve problems with a certain data set size, should it be built with a large number of processors each with a small amount of memory, or a smaller number of processors each with a large amount of memory? How much cache memory should be provided per processor for cost-effectiveness? And how do these decisions change as larger problems are run on larger machines? In this paper, we explore the above questions based on the characteristics of five important classes of large-scale parallel scientific applications. We first show that all the applications have a hierarchy of well-defined per-processor working sets, whose size, performance impact and scaling characteristics can help determine how large different levels of a multiprocessor's cache hierarchy should be. Then, we use these working sets together with certain other important characteristics of the applications - such as communication to computation ratios, concurrency, and load balancing behavior - to reflect upon the broader question of the granularity of processing nodes in high-performance multiprocessors. We find that very small caches whose sizes do not increase with the problem or machine size are adequate for all but two of the application classes. Even in the two exceptions, the working sets scale quite slowly with problem size, and the cache sizes needed for problems that will be run in the foreseeable future are small. We also find that relatively fine-grained machines, with large numbers of processors and quite small amounts of memory per processor, are appropriate for all the applications.
UR - http://www.scopus.com/inward/record.url?scp=0027229779&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0027229779&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0027229779
SN - 0818638109
T3 - Conference Proceedings - Annual Symposium on Computer Architecture
SP - 14
EP - 25
BT - Conference Proceedings - Annual Symposium on Computer Architecture
PB  - IEEE
T2 - Proceedings of the 20th Annual International Symposium on Computer Architecture
Y2 - 16 May 1993 through 19 May 1993
ER -