Invasion with OpenMP and TBB

These HTML pages are based on the PhD thesis "Cluster-Based Parallelization of Simulations on Dynamically Adaptive Grids and Dynamic Resource Management" by Martin Schreiber.
There is also more information and a PDF version available.


8.1 Invasion with OpenMP and TBB

Our Invasive Computing extensions are built on existing functionality of OpenMP¹ and Intel TBB². OpenMP offers parallelization via pragma-based language extensions, whereas TBB embeds parallelization into C++ as a library. Parallelization on shared-memory systems is subject to restrictions similar to those on distributed-memory systems, and standard HPC threading libraries have so far not considered them:

On shared-memory systems, an application can always be started using all available resources. However, an application should not be started on computing resources that are already in use by other applications. Otherwise, this leads to preemption and to caches being shared between both running applications [KCS04], resulting in a severe loss of performance. This is particularly important for urgent computing (see e.g. [BNTB07]), which requires starting an application even though other applications already use the required resources.
An a-priori thread allocation cannot account for the changing scalability of algorithms. The workload of the DAMR simulations introduced in Part III changes over the course of the simulation, which leads to strongly varying scalability over runtime. For significantly smaller workloads, e.g. in tsunami parameter studies, this also leads to an underutilization of resources unless they are shared dynamically and efficiently with other concurrently running applications.

The applications considered in this work are based on time-stepping schemes. We assume an outer loop that iterates over the time steps required for the simulation, with parallelization only inside the loop body. Since neither OpenMP nor TBB supports changing the number of threads inside a parallel region (see e.g. [Ope08] for OpenMP), we only allow the number of threads to change at the very beginning of each loop iteration, i.e. only between two simulation time steps. To support the invasion of cores, we then have to (a) change the number of threads capable of work stealing and (b) pin the work-stealing threads to physical compute cores.
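The following is a minimal sketch of this loop structure. It assumes OpenMP and uses a hypothetical callback `query_cores` in place of the actual resource-manager interface, which this section does not specify. The thread count is adjusted only at the top of each iteration, outside any parallel region. The loop body is a placeholder cell update. When the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially.

```cpp
#include <functional>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the resource manager (RM): returns the number
// of cores granted for the given time step.
using CoreQuery = std::function<int(int)>;

// Runs num_steps time steps and returns the thread count used in each step.
std::vector<int> run_time_steps(int num_steps, const CoreQuery& query_cores,
                                std::vector<double>& cells) {
    std::vector<int> threads_per_step;
    const long n = static_cast<long>(cells.size());
    for (int step = 0; step < num_steps; ++step) {
        // (1) Adapt resources only here: between two time steps and
        //     outside of any parallel region.
        int cores = query_cores(step);
        if (cores < 1) cores = 1;  // always keep at least one thread
#ifdef _OPENMP
        omp_set_num_threads(cores);
#endif
        threads_per_step.push_back(cores);

        // (2) Parallel work of this time step (placeholder cell update).
#pragma omp parallel for
        for (long i = 0; i < n; ++i) {
            cells[i] += 1.0;
        }
    }
    return threads_per_step;
}
```

For example, `run_time_steps(3, [](int s) { return s + 1; }, cells)` runs the three time steps with one, two and three threads, respectively.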

(a): For OpenMP, we set the number of threads with omp_set_num_threads(#cores); for TBB, the number of worker threads is set with tbb::task_scheduler_init(#cores).
(b): We accomplish the pinning by executing a single task on each thread, e.g. using a parallel for loop over the number of available cores with a chunk size of 1. For TBB, we first set the affinity of each task to the thread that is used to invade the corresponding core. Mutexes are then used in each thread to prevent work stealing. Otherwise, work stealing could leave threads unpinned or even pin a thread to the wrong core. Inside the task, the affinity of the executing thread is then set to the invaded core, based on information provided by the resource manager (RM).
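Steps (a) and (b) can be sketched for OpenMP on Linux as shown below. The sketch assumes that the core IDs come from the RM. For testing, they are taken from the process's own affinity mask via the hypothetical helper `allowed_cores`. A static schedule with chunk size 1 gives each of the n threads exactly one iteration. Because OpenMP does not redistribute iterations under a static schedule, the mutex-based protection against work stealing that is described for TBB is not needed here. The TBB variant is not shown.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // CPU_SET and sched_setaffinity on glibc
#endif
#include <sched.h>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Pins the calling thread to one core (Linux-specific; pid 0 = caller).
bool pin_calling_thread(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// (a) sets the thread count to cores.size(); (b) runs one iteration per
// thread (static schedule, chunk size 1), so iteration i executes on
// thread i and pins it to cores[i]. Returns the number of successful pins.
int pin_team_to_cores(const std::vector<int>& cores) {
    const int n = static_cast<int>(cores.size());
    if (n == 0) return 0;
#ifdef _OPENMP
    omp_set_dynamic(0);  // do not let the runtime shrink the team
    omp_set_num_threads(n);
#endif
    int pinned = 0;
#pragma omp parallel for schedule(static, 1) reduction(+ : pinned)
    for (int i = 0; i < n; ++i) {
        if (pin_calling_thread(cores[i])) ++pinned;
    }
    return pinned;
}

// Hypothetical helper: cores the calling thread may currently run on.
std::vector<int> allowed_cores() {
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cores;
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        for (int c = 0; c < CPU_SETSIZE; ++c)
            if (CPU_ISSET(c, &set)) cores.push_back(c);
    }
    return cores;
}
```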

The number of active threads of each application and their pinning to cores are only updated when the resources change, either in the number of threads or in their pinning.
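The following sketch shows one way to implement this change check. `ResourceAssignment` and `update_if_changed` are hypothetical names. An assignment is represented as the list of cores, one entry per thread. Comparing two such lists therefore detects both a changed thread count and a changed pinning. The `apply` callback stands for the costly reconfiguration, such as the two steps above.

```cpp
#include <vector>

// Hypothetical resources of one application: cores[i] is the core that
// thread i is pinned to, so cores.size() is the number of active threads.
struct ResourceAssignment {
    std::vector<int> cores;
    bool operator==(const ResourceAssignment& o) const { return cores == o.cores; }
};

// Reconfigures only if the thread count or the pinning changed.
// Returns true if `apply` was invoked.
template <typename ApplyFn>
bool update_if_changed(ResourceAssignment& current,
                       const ResourceAssignment& next, ApplyFn apply) {
    if (current == next) return false;  // unchanged: skip reconfiguration
    apply(next);                        // e.g. set thread count, re-pin
    current = next;
    return true;
}
```

Calling this function at the beginning of every time step is cheap, because an unchanged assignment costs only a vector comparison.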


Figure 8.1: Overview of the Invasive Computing layers on a shared-memory system. Each application is extended by a client layer that adapts the application to its resources and communicates with the resource manager.

The requirements mentioned above lead to the software design presented in Fig. 8.1. It extends each application with an invasive client layer that offers the invasive commands discussed in the next section. This client-side extension supports both OpenMP and TBB. The resource manager then orchestrates the resources of all registered invasive applications.
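The following toy, in-process model illustrates the layering of Fig. 8.1. It is not the thesis's resource manager. The classes `ToyResourceManager` and `InvasiveClient` are hypothetical, and the RM lives in the same process rather than communicating with separate applications. The allocation policy is also an assumption chosen only for illustration: a water-filling split of the cores that respects each client's maximum.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy RM: knows all registered clients and their maximum core demand.
class ToyResourceManager {
public:
    explicit ToyResourceManager(int total_cores) : total_(total_cores) {}
    void register_client(const std::string& name, int max_cores) { max_[name] = max_cores; }
    void unregister_client(const std::string& name) { max_.erase(name); }

    // Water-filling: visit clients by increasing maximum demand; each gets
    // min(its maximum, an equal share of the still unassigned cores).
    std::map<std::string, int> allocate() const {
        std::vector<std::pair<int, std::string>> by_demand;
        for (const auto& kv : max_) by_demand.emplace_back(kv.second, kv.first);
        std::sort(by_demand.begin(), by_demand.end());

        std::map<std::string, int> grant;
        int remaining = total_;
        std::size_t left = by_demand.size();
        for (const auto& d : by_demand) {
            const int share = remaining / static_cast<int>(left);
            const int g = std::min(d.first, share);
            grant[d.second] = g;
            remaining -= g;
            --left;
        }
        return grant;
    }

private:
    int total_;
    std::map<std::string, int> max_;
};

// Client layer: registers on construction, deregisters on destruction and
// asks the RM for its cores before each time step.
class InvasiveClient {
public:
    InvasiveClient(ToyResourceManager& rm, std::string name, int max_cores)
        : rm_(rm), name_(std::move(name)) {
        rm_.register_client(name_, max_cores);
    }
    ~InvasiveClient() { rm_.unregister_client(name_); }
    InvasiveClient(const InvasiveClient&) = delete;
    InvasiveClient& operator=(const InvasiveClient&) = delete;

    int cores_for_next_step() const { return rm_.allocate().at(name_); }

private:
    ToyResourceManager& rm_;
    std::string name_;
};
```

With 8 cores and three clients whose maxima are 2, 8 and 8, the clients receive 2, 3 and 3 cores. When the last two clients finish, the first one keeps its maximum of 2.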
