Getting Rid of Coherency Overhead for Memory-Hungry Applications
Research Area: Distributed Systems
Year: 2010
Type of Publication: In Proceedings
Keywords: 16-node prototype; coherence protocol overhead; coherent domain; memory decoupling; memory-hungry application; parallelization level; processing resource; shared-memory cluster architecture; cache storage; memory architecture; pattern clustering; program processors
Book title: Cluster Computing (CLUSTER), 2010 IEEE International Conference on
Current commercial solutions intended to provide additional resources to an application executing in a cluster usually aggregate processors and memory from different nodes. In this paper we present a 16-node prototype of a shared-memory cluster architecture that follows a different approach: it decouples the amount of memory available to an application from the processing resources assigned to it. In this way, we provide a new degree of freedom so that the memory granted to a process can be expanded with memory from other nodes in the cluster without increasing the number of processors used by the program. This feature is especially suitable for memory-hungry applications that demand large amounts of memory but whose parallelization level prevents them from using more cores than are available in a single node. The main advantage of this approach is that an application can use memory from other nodes without involving the processors and caches of those nodes. As a result, using more memory no longer implies increasing the coherence protocol overhead, because the number of caches involved in the coherent domain becomes independent of the amount of available memory. The prototype we present in this paper leverages this idea by sharing 128GB of memory across the cluster. Real executions show the feasibility of our prototype and its scalability.
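The core claim can be illustrated with a toy model (not the authors' implementation): under conventional aggregation, every node that contributes memory also contributes its caches to the coherent domain, whereas under the decoupled approach only the node running the application caches data. The 8GB-per-node figure below is an assumption chosen so that 16 nodes yield the prototype's 128GB total.

```python
# Toy model of coherent-domain size versus memory demand.
# Assumption: each node holds 8GB (so 16 nodes give the 128GB of the prototype).

NODE_MEMORY_GB = 8   # hypothetical per-node memory
CLUSTER_NODES = 16   # the 16-node prototype from the paper


def caches_in_coherent_domain(memory_needed_gb, decoupled):
    """Return how many nodes' caches must participate in coherence
    to serve an application needing `memory_needed_gb` of memory."""
    nodes_for_memory = -(-memory_needed_gb // NODE_MEMORY_GB)  # ceiling division
    if nodes_for_memory > CLUSTER_NODES:
        raise ValueError("demand exceeds total cluster memory")
    if decoupled:
        # Decoupled approach: the application runs on a single node;
        # remote nodes lend memory only, so their caches stay
        # outside the coherent domain.
        return 1
    # Conventional aggregation: every node contributing memory also
    # contributes processors and caches to the coherent domain.
    return nodes_for_memory


for gb in (8, 64, 128):
    conv = caches_in_coherent_domain(gb, decoupled=False)
    dec = caches_in_coherent_domain(gb, decoupled=True)
    print(f"{gb:3d} GB -> conventional: {conv:2d} caching nodes, decoupled: {dec}")
```

Under this model, growing the footprint from 8GB to 128GB takes the conventional coherent domain from 1 to 16 caching nodes, while the decoupled domain stays at 1, which is the independence the abstract describes.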