Abstract
This paper investigates the performance of five parallel tree building methods in the context of a complete galaxy simulation, on four very different platforms that support the coherent shared address space programming model. A proposed algorithm that uses a separate spatial partitioning of the domain for the tree building phase, and that eliminates locking at a significant cost in locality and load balance, is found to be the best by far. Changing only the tree building algorithm improves overall application performance on commodity-based systems by factors of 4-40 or more, even on only 16 processors. This allows commodity shared memory platforms to perform well for hierarchical N-body applications for the first time.
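The lock-free idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a 2D quadtree over the unit square, distinct body positions, and a fixed split of the domain into the root's four quadrants, and every name in it (`Node`, `insert`, `build_tree`, and so on) is hypothetical. Each worker builds the subtree for its own spatial region, so no two workers ever write to the same cell and no locks are needed. Uneven buckets show where the load-balance cost mentioned above comes from. CPython threads are used only to show the structure; they do not give real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    cx: float                     # cell center x
    cy: float                     # cell center y
    half: float                   # half of the cell's side length
    body: Optional[Point] = None  # set only while the cell is a one-body leaf
    children: Optional[List[Optional["Node"]]] = None
    count: int = 0                # number of bodies in this subtree


def quadrant(node: Node, p: Point) -> int:
    """Index (0..3) of the child quadrant of `node` that contains p."""
    return (1 if p[0] >= node.cx else 0) + (2 if p[1] >= node.cy else 0)


def child_for(node: Node, q: int) -> Node:
    """Create the empty child cell of `node` for quadrant q."""
    h = node.half / 2
    dx = h if q & 1 else -h
    dy = h if q & 2 else -h
    return Node(node.cx + dx, node.cy + dy, h)


def _push(node: Node, p: Point) -> None:
    """Send p down into the correct child of an internal cell."""
    q = quadrant(node, p)
    if node.children[q] is None:
        node.children[q] = child_for(node, q)
    insert(node.children[q], p)


def insert(node: Node, p: Point) -> None:
    """Sequential insertion; assumes all points are distinct."""
    node.count += 1
    if node.children is None:
        if node.body is None:      # empty leaf: store the body here
            node.body = p
            return
        old, node.body = node.body, None
        node.children = [None] * 4  # split the leaf and re-insert its body
        _push(node, old)
    _push(node, p)


def build_subtree(task: Tuple[Node, List[Point]]) -> Node:
    """Work done by one worker: it is the only writer to `cell`."""
    cell, pts = task
    for p in pts:
        insert(cell, p)
    return cell


def build_tree(points: List[Point], workers: int = 4) -> Node:
    root = Node(0.5, 0.5, 0.5)
    # Spatial partitioning used only for tree building (assumed: 4 quadrants).
    buckets: List[List[Point]] = [[] for _ in range(4)]
    for p in points:
        buckets[quadrant(root, p)].append(p)
    cells = [child_for(root, q) for q in range(4)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        subtrees = list(pool.map(build_subtree, zip(cells, buckets)))
    # Attach the independently built subtrees; done by one thread, no locks.
    root.children = [t if t.count else None for t in subtrees]
    root.count = len(points)
    return root
```

Calling `build_tree` on a list of `(x, y)` tuples in the unit square returns the root cell. A region-to-worker assignment that differs from the one used in the force computation phase is what trades locality for lock-free construction in this sketch.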
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the International Parallel Processing Symposium, IPPS |
Publisher | IEEE Comp Soc |
Pages | 475-484 |
Number of pages | 10 |
ISBN (Print) | 0818684046 |
State | Published - 1998 |
Event | Proceedings of the 1998 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing - Orlando, FL, USA; Duration: Mar 30 1998 → Apr 3 1998 |
Conference
Conference | Proceedings of the 1998 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing |
---|---|
City | Orlando, FL, USA |
Period | 3/30/98 → 4/3/98 |
All Science Journal Classification (ASJC) codes
- Hardware and Architecture