We present a systematic methodology for exploring the security processing software architecture for a commercial heterogeneous multiprocessor system-on-chip (SoC) for mobile devices. The SoC contains multiple host processors executing applications and a dedicated programmable security processing engine. We developed an exploration methodology to map the code and data of security software libraries onto the platform, with the objective of maximizing the overall application-visible performance. The salient features of the methodology include (i) the use of real performance measurements from a prototyping board that contains the target platform to drive the exploration, (ii) a new data structure access profiling framework that allows us to accurately model the communication overheads involved in offloading a given set of functions to the security processor, and (iii) an exact branch-and-bound based design space exploration algorithm that determines the best mapping of security library functions and data structures to the host and security processors.We used the proposed framework to map a commercial security library to the target mobile application SoC. The resulting optimized software architecture outperformed several manually-designed software architectures, resulting in upto 12.5X speedup for individual cryptographic operations (encryption, hashing) and 2.2X-6.2X speedup for applications such as a Digital Rights Management (DRM) agent and Secure Sockets Layer (SSL) client. We also demonstrate the applicability of our framework to software architecture exploration in other multiprocessor scenarios.