An Algorithm for the Automated Generation of MPI Communication Patterns
- Publisher: European Association of Geoscientists & Engineers
- Source: Conference Proceedings, Second EAGE Workshop on High Performance Computing for Upstream, Sep 2015, Volume 2015, p.1 - 4
Abstract
Large-scale partial differential equation (PDE) solvers use some form of message passing to handle communication between compute nodes (Gropp, Hoefler, Thakur, & Lusk, 2014). Message passing can be handled explicitly by the application developer or implicitly by the programming language. The prime example of explicit message passing is the Message Passing Interface (MPI), while Chapel (Chamberlain, 2007) and UPC (Draper, 1999) are examples of the partitioned global address space (PGAS) programming model, which makes communication implicit.
It could be argued that the best performing parallel applications are the ones using carefully crafted explicit message passing. The main reason is that the message passing implementation can be made as efficient as possible for a very specific problem. Flexibility is lost, however, if the fundamentals of the algorithm have to change. The converse holds for implicit message passing: flexibility is higher, but the best achievable performance stays out of reach (Cristian Coarfa, 2005).
For a developer, the two approaches solve different problems. In applications that use MPI, usually only a handful of calls to the message passing API are present in the whole application. Most of the grunt work lies in working out how messages are exchanged between processes, setting up the buffers, and finding ways to fill or empty them.
In this work, I propose a library that helps the PDE application developer perform those low-level tasks in an automated fashion. The library is also useful for refactoring existing PDE codes for use on supercomputers. By pairing a spatial hashing function with minimal geometrical knowledge extracted from the application, the communication pattern is discovered in P log(N/P) operations, where N is the global number of hashes and P the number of processes. This pattern is then used to create the actual buffers the developer needs and to handle all blocking, non-blocking and one-sided communications.
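To make the idea of discovering a communication pattern from spatial hashes more concrete, here is a minimal sketch in Python. It is not the algorithm or the API of the proposed library: the row-major hash, the contiguous block partition, the four-point stencil, and every function name (`cell_hash`, `partition`, `owner`, `discover_pattern`) are assumptions made for illustration. The sketch also runs on a single process and scans every cell, so it does not reproduce the P log(N/P) cost stated above. It only shows how a hash plus a small amount of geometric knowledge (which cells are neighbours) is enough to derive the send and receive lists that communication buffers would be built from.

```python
from bisect import bisect_right


def cell_hash(i, j, nx):
    """Hypothetical spatial hash: row-major index of cell (i, j) on a grid nx cells wide."""
    return j * nx + i


def partition(n_hashes, n_procs):
    """Split hashes 0..n_hashes-1 into contiguous ranges, one per rank.

    Returns the start offset of each rank's range followed by a final
    sentinel equal to n_hashes.
    """
    base, extra = divmod(n_hashes, n_procs)
    starts = [0]
    for rank in range(n_procs):
        starts.append(starts[-1] + base + (1 if rank < extra else 0))
    return starts


def owner(h, starts):
    """Rank owning hash h, found by binary search over the range starts."""
    return bisect_right(starts, h) - 1


def discover_pattern(nx, ny, n_procs):
    """Return (send, recv): per-rank dicts mapping a peer rank to a sorted hash list.

    recv[r][p] lists the hashes rank r needs from rank p.
    send[p][r] lists the hashes rank p must send to rank r.
    """
    starts = partition(nx * ny, n_procs)
    needed = [dict() for _ in range(n_procs)]
    for rank in range(n_procs):
        for h in range(starts[rank], starts[rank + 1]):
            i, j = h % nx, h // nx
            # Assumed geometric knowledge: a four-point nearest-neighbour stencil.
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < nx and 0 <= nj < ny:
                    nh = cell_hash(ni, nj, nx)
                    src = owner(nh, starts)
                    if src != rank:
                        needed[rank].setdefault(src, set()).add(nh)

    recv = [{p: sorted(hs) for p, hs in d.items()} for d in needed]
    # Send lists are the transpose of the receive lists.
    send = [dict() for _ in range(n_procs)]
    for dst in range(n_procs):
        for src, hashes in recv[dst].items():
            send[src][dst] = list(hashes)
    return send, recv
```

In a real MPI code, each list in `send` and `recv` would determine the size and layout of one communication buffer, and the keys of each dictionary would be the peer ranks to post sends and receives to. For example, `discover_pattern(4, 4, 2)` splits a 4 by 4 grid between two ranks and reports which row of cells each rank has to exchange with the other.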