These html pages are based on the PhD thesis "Cluster-Based Parallelization of Simulations on Dynamically Adaptive Grids and Dynamic Resource Management" by Martin Schreiber.
There is also more information and a PDF version available.

5.11 Hybrid parallelization

The number of cores in cache-coherent memory domains has increased considerably over the last decade. Shared-memory systems with several threads per CPU are now ubiquitous, and with Intel’s Xeon Phi, more than 100 threads have to be programmed efficiently in a shared-memory environment. Such a hybrid parallelization yields several advantages; some of them are:

We discuss two alternative approaches to the inter-cluster communication presented in Section 5.10.1.

Since our results already show that the hybrid parallelization is sufficiently efficient for simulating tsunamis on distributed-memory systems, we did not implement these alternatives.