These html pages are based on the
PhD thesis "Cluster-Based Parallelization of Simulations on Dynamically Adaptive Grids and Dynamic Resource Management" by Martin Schreiber.
8.2 Invasive client layer
The Invasive paradigm was originally developed for specialized Invasive hardware, whereas we apply this paradigm to HPC shared-memory systems. This leads to differences in the programming interfaces, which are discussed next.
- Joined invade and infect call:
We can merge the invade and infect calls into a single invade. On shared-memory systems, infecting a compute resource does not require replicating the kernel code, since the code is already accessible in each thread's memory. As soon as the Invasive Computing paradigm is extended to HPC with distributed-memory systems, the infect call becomes necessary again, e.g. to synchronize simulation data such as the number of time steps.
- No claims:
For our target application, the main simulation loop leads to invasions that only require a single claim, which represents the currently invaded resources. We therefore assume that exactly one claim exists per application and that invasions only modify this claim.
This results in a simplified command space without infect calls and without explicit claims for Invasive Computing with our iteration-based shared-memory application. We next discuss the constraint system required for our simulations on dynamically changing grids (Sec. 8.2.1), the communication with the resource manager (Sec. 8.2.2) and the Invasive Computing API for shared-memory systems (Sec. 8.2.3).
8.2.1 Constraints
To schedule the compute resources for which the applications compete, the applications have to provide information on their current state. Such a constraint specification can lead to very complex structures if general applicability on heterogeneous systems is required. E.g. an application can either require three cores and one accelerator or, in case the accelerator is not invadable, more than three cores. This can be expressed with a tree structure, resulting in a constraint hierarchy (see [BRS+13]). Different constraint constellations can then be combined with AND or OR relations.
On the one hand, such a constraint hierarchy yields high flexibility; on the other hand, forwarding this information to the resource manager can lead to expensive communication overheads to and from the resource manager: the constraint tree has to be serialized, the serialized version sent to the RM, received there and unpacked by deserialization. Additional complexity arises inside the RM, which has to evaluate the constraint trees of all applications to optimize the resource assignment of the concurrently scheduled applications.
Since the HPC shared-memory systems considered in this work are based on homogeneous cores only, we decided to avoid such a constraint hierarchy and instead use a fixed number of non-hierarchical constraint properties in a list. Our constraint system is then based on the following properties:
- min/max cores: This constraint limits the number of claimed resources to the specified range.
- scalability graph: The scalability graph is specified as a one-dimensional array, with each entry representing the scalability for the corresponding number of cores.
- workload: This hint can be used to assign more cores to applications with a larger workload and vice versa.
Once specified by the application, these constraints have to be communicated to the resource manager; the details of resource scheduling based on scalability graphs and distribution hints are further described in Section 8.4.
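The three constraint properties above can be gathered in one flat record. The following C sketch is illustrative only; the type and field names (invasive_constraints_t, MAX_CORES, etc.) are assumptions, not the thesis' actual implementation:

```c
#define MAX_CORES 64

/* Flat, non-hierarchical constraint list as described above:
 * a core range, a scalability graph and a workload hint. */
typedef struct {
    int    min_cores;               /* lower bound on claimed cores */
    int    max_cores;               /* upper bound on claimed cores */
    float  scalability[MAX_CORES];  /* scalability[i]: scalability with i+1 cores */
    double workload;                /* hint: relative amount of work */
} invasive_constraints_t;
```

Since the number of properties is fixed, such a record can be sent to the resource manager as-is, avoiding the serialization and deserialization of a constraint tree.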
8.2.2 Communication to resource manager
A centralized resource manager is used, which is responsible for optimizing the resources based on the constraints received from the applications. Our client-server design (see Fig. 8.1) suggests message-based communication with the clients transferring constraints, typically of small message size, to the server running as a dedicated process. The server (resource manager) then has to receive the message, process it and send one or more messages back to the client (application). To overcome message-processing latencies, the ability to communicate asynchronously is also required.
All these requirements are fulfilled by System V IPC message queues (MQs), which support small message sizes that can be efficiently exchanged with the server. MQs also support testing whether a message can be dequeued, enabling non-blocking communication.
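The dequeue test can be sketched with msgrcv and the IPC_NOWAIT flag, which fails with ENOMSG when no message is pending; the message layout and helper name below are illustrative assumptions:

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct mq_msg {
    long mtype;      /* receiver token */
    char text[64];   /* payload */
};

/* Test whether a message addressed to 'token' can be dequeued.
 * Returns 1 if a message was dequeued into *out, 0 if none is
 * pending, -1 on error. */
static int try_dequeue(int qid, long token, struct mq_msg *out)
{
    if (msgrcv(qid, out, sizeof out->text, token, IPC_NOWAIT) >= 0)
        return 1;
    return (errno == ENOMSG) ? 0 : -1;
}
```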
During its setup, the centralized resource manager creates a message queue with a unique identifier. This makes the message queue singleton-like for all invasive applications, which use the same identifier to communicate with the RM. Both the client and the server applications can then enqueue and dequeue messages using this message-queue identifier. Each message further has to be marked with a token specifying the receiver. We mark the RM with identifier 0 and each invasive application with the application's system-wide unique process id.
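The receiver token maps naturally to the System V message type, by which msgrcv selects messages. One caveat: System V requires mtype > 0, so a real implementation has to offset the RM's identifier 0; the helper names below are assumptions for illustration:

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct rm_msg {
    long mtype;      /* receiver token: RM or an application's pid */
    char text[64];   /* payload, e.g. the constraint record */
};

/* Enqueue 'text' for the receiver identified by 'token' on queue 'qid'. */
static int rm_send(int qid, long token, const char *text)
{
    struct rm_msg m = { .mtype = token };
    strncpy(m.text, text, sizeof m.text - 1);
    return msgsnd(qid, &m, sizeof m.text, 0);
}

/* Dequeue the next message addressed to 'token' into 'buf' (64 bytes). */
static int rm_recv(int qid, long token, char *buf)
{
    struct rm_msg m;
    if (msgrcv(qid, &m, sizeof m.text, token, 0) < 0)
        return -1;
    memcpy(buf, m.text, sizeof m.text);
    return 0;
}
```

A real resource manager would create the queue with a well-known key (e.g. via ftok) so that all applications obtain the same queue identifier.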
8.2.3 Invasive Computing API
The following invasive interfaces are then offered to the application developer:
- setup()
This has to be executed directly after the program starts. It registers the application with the resource manager, which only sends back a message once at least a single core has been allocated to the application. Here, we aim at avoiding initial resource conflicts.
- shutdown()
This sends a signal to the RM to release all resources associated with the application. The application is then expected to exit immediately.
- invade_blocking(constraints)
We refer to the initially suggested invade call as a blocking invade, since we also introduce non-blocking invasions. To highlight the differences, we first describe the message processing of the blocking invade call:
(a) A message including the constraints is sent to the RM and the program loses the control flow.
(b) The RM then optimizes the resource utilization based on all available constraints of each invasive application. This is further discussed in Sections 8.3 and 8.4.
(c) After evaluations and optimizations inside the RM, possible changes of resources are sent to the application and the corresponding resources are marked as used by the application. Such a resource-update message then includes the core ids, which are infected by setting appropriate affinities (see Sec. 8.1).
(d) Finally, the control flow is given back to the program.
- invade_nonblocking(constraints)
As previously described, a blocking invasion involves latencies while waiting for feedback from the RM; all threads of the application idle until the RM's response arrives. With non-blocking invasion, a message including the constraints which describe the changing requirements is sent to the RM, and the control flow is given back to the application directly after the message has been enqueued, without waiting for a reply with the invaded resources. The RM then sends resource-update messages to the application without a preceding invade, and the application processes these messages similarly to the blocking invade by changing the number of cores and setting the pinning appropriately. Thus, we avoid idling cores.
- reinvade_blocking() and reinvade_nonblocking()
With the standard invade calls, constraints are always specified and forwarded to the RM. This leads to overheads of, first, serializing the constraint data and, second, evaluating and possibly optimizing the resource distribution on the RM side. If the resource requirements do not change, we avoid this serialization and optimization procedure with the reinvade interfaces: the RM assumes that the previously forwarded constraints should be used for optimization. Note that even if the resource requirements of application A did not change, a change in the resource requirements of application B can result in a change of the resources assigned to application A.
- retreat()
To hand back the currently used resources, a retreat can be executed by the application. All resources except a single one are then released, with the remaining thread continuing execution as the application's master thread.
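Putting the interfaces together, a typical iteration-based invasive application might follow the call sequence sketched below. The API bodies here are stand-in stubs that merely simulate the RM's replies so the sketch is self-contained; only the call sequence reflects the interfaces described above (shutdown is renamed shutdown_app to avoid clashing with the POSIX socket call):

```c
typedef struct { int min_cores, max_cores; double workload; } constraints_t;

static int num_cores = 1;  /* cores currently claimed (stub state) */

/* Stubs standing in for the invasive interfaces described above. */
static void setup(void)                      { num_cores = 1; }
static void invade_blocking(constraints_t c) { num_cores = c.max_cores; } /* stub: RM grants max */
static void reinvade_nonblocking(void)       { /* constraints unchanged */ }
static void retreat(void)                    { num_cores = 1; }
static void shutdown_app(void)               { /* release all resources */ }

/* Typical call sequence of an iteration-based invasive application. */
static int run_simulation(int timesteps)
{
    setup();                          /* register with the RM */
    constraints_t c = { 1, 4, 100.0 };
    invade_blocking(c);               /* initial, blocking invasion */
    for (int t = 0; t < timesteps; t++) {
        /* ...compute one time step on num_cores cores... */
        reinvade_nonblocking();       /* cheap per-iteration renegotiation */
    }
    retreat();                        /* keep only the master thread */
    shutdown_app();
    return num_cores;
}
```

The non-blocking reinvade inside the loop reflects the design goal above: per-iteration renegotiation without idling cores while waiting for the RM.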