We give a detailed description of the test platforms used in this thesis.
The first platform, denoted as Intel, is a four-socket system with each socket equipped with an Intel Xeon CPU (E7-4850@2.00GHz). Each CPU has 10 cores with two-way hyper-threading, resulting in 20 hyper-threads per CPU. The cache hierarchy is as follows:
Cache level | Size | Sharing information |
L1 | 32 kB | exclusive |
L2 | 256 kB | exclusive |
L3 | 24 MB | shared |
Each CPU has its own memory controller with 64 GB of memory attached. With 4 CPUs, this leads to 256 GB of main memory.
We refer to the second platform as AMD and would like to thank the Institute for Multiscale Simulation, Friedrich-Alexander Universität Erlangen-Nürnberg, for giving us access to their AMD cluster. This platform is based on 4 AMD Opteron(TM) 6276 processors, each with 16 cores. On these CPUs, 2 cores share one FPU. The 16 cores are further divided into 2 modules, each with its own last-level cache and NUMA domain. The cache hierarchy is as follows:
Cache level | Size | Sharing information |
L1 | 16 kB | exclusive |
L2 | 2 MB | shared by 2 cores |
L3 | 6 MB | shared by 8 cores |
Each NUMA domain has 16 GB of memory attached to its memory controller. With 8 NUMA domains (2 per CPU), this leads to 128 GB of available main memory.
This cluster consists of 28 nodes, each a dual-socket system with Intel SandyBridge-EP Xeon E5-2670 CPUs (8 cores per socket) and 128 GB of RAM. Hence, up to 448 cores are available.
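
The totals quoted for the three platforms follow from simple products of the per-socket figures. The following minimal sketch recomputes them as a sanity check. The `Platform` class and its field names are hypothetical and introduced only for illustration. Two modelling assumptions are made: the text does not state whether hyper-threading is used on the AMD platform or the cluster nodes, so one hardware thread per core is assumed there, and each cluster node is treated as a single 128 GB memory domain because the per-socket split is not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical record of the figures quoted in the text."""
    name: str
    nodes: int
    sockets_per_node: int
    cores_per_socket: int
    threads_per_core: int          # assumed 1 where the text is silent
    memory_domains_per_node: int   # memory controllers / NUMA domains
    gb_per_memory_domain: int

    @property
    def cores(self) -> int:
        # Physical cores across all nodes.
        return self.nodes * self.sockets_per_node * self.cores_per_socket

    @property
    def hardware_threads(self) -> int:
        # Hardware threads across all nodes.
        return self.cores * self.threads_per_core

    @property
    def memory_gb_per_node(self) -> int:
        # Main memory of a single node.
        return self.memory_domains_per_node * self.gb_per_memory_domain


INTEL = Platform(name="Intel", nodes=1, sockets_per_node=4,
                 cores_per_socket=10, threads_per_core=2,
                 memory_domains_per_node=4, gb_per_memory_domain=64)

AMD = Platform(name="AMD", nodes=1, sockets_per_node=4,
               cores_per_socket=16, threads_per_core=1,
               memory_domains_per_node=8, gb_per_memory_domain=16)

# Assumption: one 128 GB memory domain per node (per-socket split not stated).
CLUSTER = Platform(name="cluster", nodes=28, sockets_per_node=2,
                   cores_per_socket=8, threads_per_core=1,
                   memory_domains_per_node=1, gb_per_memory_domain=128)

for p in (INTEL, AMD, CLUSTER):
    print(f"{p.name}: {p.cores} cores, {p.hardware_threads} hardware threads, "
          f"{p.memory_gb_per_node} GB per node")
```

The products reproduce the 256 GB and 128 GB of main memory stated for the Intel and AMD platforms and the 448 cores stated for the cluster.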