
3.4 Related work

Given the wide range of existing research on dynamic grid creation and management, we give a brief overview of related work. We describe this work top-down: we start with pure mesh-generation and decomposition approaches, continue with frameworks for running simulations on dynamically changing meshes, and conclude with work that investigated the Sierpiński SFC for running simulations.

3.4.1 Dynamic mesh partitioning

We first introduce different approaches to dynamic mesh partitioning and dynamic mesh rebalancing, assuming the mesh itself to be already given. We do not aim at a complete literature survey, but only give an overview of the most closely related work.

Graph partitioner

One way to deal with mesh generation and partitioning is to interpret the mesh as a graph that connects mesh elements via their hyperfaces and to apply graph-partitioning algorithms to this graph. Such HPC partitioning algorithms aim at creating mesh partitions with properties that allow efficient parallelization. One typical optimization objective is to reduce the number of interfaces shared with other partitions, and hence the amount of data to be communicated, by minimizing the number of edge cuts for two-dimensional meshes and face cuts for three-dimensional meshes.

For dynamically adaptive meshes, additional optimization objectives arise for remeshing: minimizing the communicated data, accounting for the costs of the meshes after load balancing [KSK03], and maximizing the quality of the load balancing. Examples of such graph-based meshing and partitioning tools are ParMETIS [KK98] and PT-Scotch [CP08].
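To make this concrete, the following minimal sketch shows how such a graph partitioner is typically invoked. It uses the serial C API of METIS 5.x (ParMETIS provides the analogous distributed interface); the four-element chain used as the dual graph is our own made-up toy example, not taken from the thesis.

// Sketch: partition the dual graph of a small mesh with METIS.
// The graph is given in CSR format; each mesh element is a graph
// vertex, each shared face an edge. Toy example: a chain 0-1-2-3.
#include <metis.h>
#include <cstdio>

int main() {
    idx_t nvtxs = 4;                       // mesh elements = graph vertices
    idx_t ncon  = 1;                       // one balancing constraint
    idx_t xadj[]   = {0, 1, 3, 5, 6};      // CSR row offsets
    idx_t adjncy[] = {1, 0, 2, 1, 3, 2};   // CSR neighbor lists
    idx_t nparts = 2;                      // requested number of partitions
    idx_t edgecut;                         // minimized objective: edge cut
    idx_t part[4];                         // resulting partition per element

    // nullptr arguments select uniform weights and default options.
    if (METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                            nullptr, nullptr, nullptr, &nparts,
                            nullptr, nullptr, nullptr,
                            &edgecut, part) == METIS_OK) {
        std::printf("edge cut: %d\n", (int)edgecut);
        for (idx_t i = 0; i < nvtxs; i++)
            std::printf("element %d -> partition %d\n", (int)i, (int)part[i]);
    }
    return 0;
}

For this chain graph, a balanced bisection cuts exactly one edge, which is the minimum a partitioner can achieve here.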

Geometric partitioner using SFCs

Instead of using graph theory for optimized mesh partitioning, the underlying geometry and the spatial placement of the mesh elements can be used. Such geometric partitioners can be based, e.g., on SFCs. SFCs provide one property of particular importance for partitioning: a surjective mapping of each point in a d-dimensional space to a one-dimensional interval that preserves the locality of the d-dimensional space in the one-dimensional representation [MJFS01]. Thus, load balancing can be reduced to a one-dimensional problem: each partition is assigned an interval of the one-dimensional representation with similar workload [BZ00]. Libraries supporting SFC-based partitioning are, e.g., amatos [BRH+05] and Zoltan [BCCD12].
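As an illustration of this reduction to one dimension, the following sketch sorts elements along an SFC and splits the resulting sequence into intervals of approximately equal workload. It is our own illustrative example: for brevity it uses a Morton (Z-order) index rather than the Sierpiński index used in this thesis, and all names (Element, morton2d, sfcPartition) are hypothetical.

// Sketch: SFC-based load balancing via 1D interval splitting.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleave the bits of a 2D cell coordinate into a 1D Morton index.
uint64_t morton2d(uint32_t x, uint32_t y) {
    uint64_t code = 0;
    for (int b = 0; b < 32; b++) {
        code |= (uint64_t)((x >> b) & 1) << (2 * b);
        code |= (uint64_t)((y >> b) & 1) << (2 * b + 1);
    }
    return code;
}

struct Element { uint32_t x, y; double workload; };

// Assign each element a partition id such that every partition receives
// a contiguous SFC interval with similar accumulated workload
// (assumes strictly positive workloads).
std::vector<int> sfcPartition(std::vector<Element>& elems, int nparts) {
    std::sort(elems.begin(), elems.end(),
              [](const Element& a, const Element& b) {
                  return morton2d(a.x, a.y) < morton2d(b.x, b.y);
              });
    double total = 0;
    for (const auto& e : elems) total += e.workload;

    std::vector<int> part(elems.size());
    double prefix = 0;
    for (std::size_t i = 0; i < elems.size(); i++) {
        // The workload prefix of element i selects its partition.
        part[i] = std::min(nparts - 1, (int)(prefix / total * nparts));
        prefix += elems[i].workload;
    }
    return part;
}

Replacing morton2d by a Sierpiński index generator changes nothing in the interval-splitting step; this independence of the concrete curve is what keeps SFC-based load balancing simple.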

Graph- vs. SFC-based partitioning

Handling dynamic partitioning with graph partitioners involves complex graph operations [BZ00], and the underlying optimization problem is NP-hard [Zum00]. Even when graph partitioners employ multilevel diffusive schemes to cope with this complexity, SFC-based partitioning still leads to faster load balancing while yielding edge cuts close to those of graph-based partitioning algorithms [Mit07].

3.4.2 Simulation software including grid generation

We give an overview of well-known frameworks with built-in grid generation. Again, we do not aim at a complete survey, but only consider the work of most relevance for the simulations considered here. We compare the capabilities of each framework against the requirements described in the previous sections.

With memory efficiency for dynamically adaptive simulations on triangular grids being the main driving requirement of this work, none of the frameworks mentioned above, and to the best of our knowledge no other framework, fulfills our demands.

3.4.3 Related software development

This section outlines our contributions compared to the preceding work on serial and parallel simulations with the Sierpiński stack- and stream-based approach. Relative to the dissertation of Csaba Vigh [Vig12], our major algorithmic differences and achievements are the following:

3.4.4 Impact on and from related work

The serial traversal with the stack- and stream-based approach used in this thesis is based on preceding research on serial traversals, e.g. [GMPZ06, BSVB08]. In our work, we developed a software design and algorithms for an efficient parallelization on shared- as well as distributed-memory systems.

There are several groups working on SFC-based traversal and organization schemes for PDE solvers; to our knowledge, only the work of Prof. Michael Bader and his co-workers, called sam(oa)², is also Sierpiński-based. Their work is most closely related to this thesis, which is why we discuss it here. Their focus was initially on a software design and a parallelization approach supporting only MPI parallelization; see e.g. [Vig12], which was developed in cooperation with Kaveh Rahnema.

[ERRATA START]
Researchers are typically obliged to cite the ideas and related work of others. However, it is sometimes also necessary to publish information on ongoing redevelopments and on ideas taken from others. This is why I am now publishing this information as part of an errata.

The idea of a cluster-based parallelization concept (e.g. independent storage for partitions, a run-length encoded communication system which is updated implicitly by adaptivity markers, etc.) was developed at the end of 2010, and the first parallel (shared-memory) version was finished in early 2011; see also https://bitbucket.org/schreiberx/sierpi/. This was explained to Prof. Michael Bader at the beginning of 2011. In particular, based on my development, I provided Prof. Michael Bader with plenty of new ideas and concepts on algorithmic approaches to simulations with dynamic adaptive mesh refinement in 2010 and 2011. These ideas were orthogonal to their development, which was described in a preliminary version of the PhD thesis of Csaba Vigh that Csaba gave to me at the end of 2010 or the beginning of 2011.

Most of the underlying ideas and concepts were explained to Prof. Michael Bader in 2011, and in particular to Oliver Meister in 2011/2012; see e.g. the presentation at the chair meeting. This also includes, in particular, the suggested extension of the clustering approach to SFC cuts (an extension from tree splits to intervals of the SFC). The limitations of their original approach (e.g. MPI-only, no cluster-based optimizations, no optimizations of residual-aware iterative solvers via cluster skipping, etc.) and how the cluster-related algorithms work were explained to Oliver Meister when trying to convince him that we should join forces. The task of merging the original parallelization approach (developed by Kaveh Rahnema and Csaba Vigh) was assigned to Oliver Meister in mid 2012. We will briefly outline how Oliver Meister, one of the main sam(oa)² developers and the one assigned to merge the original parallelization into sam(oa)², decided to drop their previous approach in mid 2012. This account is based on memory logs, hence the statements are not accurate word for word.


It should not be the duty of researchers to document what others (Oliver Meister, Prof. Michael Bader and Kaveh Rahnema) are successively reprogramming, nor that they intended to rename it: "We don't call them cluster, we call them segments" (statement of Oliver Meister, 2013). As a matter of honesty, it is the task of researchers to cite the related work which they reprogrammed and to offer co-authorship to the person who significantly contributed by providing ideas, concepts and strategies. It is also important that researchers are honest about their decision to drop their former approach (otherwise Csaba Vigh would still be on the author list of their source code) and about reprogramming the approach of others.

It therefore seems appropriate to publish this and further comments as part of an errata to my PhD thesis. To make a long story short, Oliver Meister steadily reprogrammed implementations and ideas. There was never any willingness to cooperate on Oliver Meister's side, and he reprogrammed ideas and concepts from the very beginning. Only the concept which I personally developed as part of my PhD allows efficient MPI/OpenMP/TBB parallelization, cluster-based optimizations, node-based communication, efficient updating of RLE data, etc.

It remains to be seen whether Prof. Michael Bader, Oliver Meister and Kaveh Rahnema cite the sources correctly (citations can also be given by referencing presentations, communications, etc.).

Finally, to bridge the text to the next paragraph: the cluster-based parallelization concept, as well as many other algorithms and potential features which arise from this novel parallelization concept, was explained to Prof. Michael Bader in 2011. [ERRATA END]

Several discussions with him and his group led to a certain convergence of our approaches. However, the major difference still lies in their focus on SFC cuts for partitioning. These SFC cuts lead to further research challenges such as the construction of consistent meta information.