These HTML pages are based on the
PhD thesis "Cluster-Based Parallelization of Simulations on Dynamically Adaptive Grids and Dynamic Resource Management" by Martin Schreiber.
There is also
more information and a PDF version available.
3.4 Related work
Given the wide range of existing research on dynamic grid creation and management, we give a
brief overview of the related work. We proceed top-down, starting with pure
mesh-generation and decomposition approaches, followed by frameworks for running simulations on
dynamically changing meshes, and finally related work that investigated the Sierpiński SFC for running
simulations.
3.4.1 Dynamic mesh partitioning
We first introduce different approaches for dynamic mesh partitioning as well as dynamic mesh
rebalancing, assuming the mesh to be already given. We do not aim for a complete literature survey,
but only give an overview of the most closely related work.
Graph partitioner
One way to deal with mesh generation and its partitioning is to consider the mesh as a graph
connecting mesh elements via their hyperfaces and apply graph partitioning algorithms on this graph.
Such HPC partitioning algorithms aim at creating mesh partitions with properties
that enable efficient parallelization. One optimization objective is, e.g., to reduce the number of
interfaces shared with other partitions, and thereby the data to be communicated, by reducing
the number of edge cuts for two-dimensional meshes and face cuts for three-dimensional
meshes.
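To make the edge-cut objective concrete, the following minimal sketch counts cut edges for a toy mesh graph; the mesh, partition assignment, and function name are illustrative, not taken from any of the cited tools.

```python
# Count the edge cut of a partitioned mesh graph.
# The toy 2x2 quad mesh and partition below are illustrative only.

def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

# Cells 0..3 of a 2x2 mesh; shared faces become graph edges.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

# Top/bottom split: cells {0, 1} vs. {2, 3}.
partition = {0: 0, 1: 0, 2: 1, 3: 1}

print(edge_cut(edges, partition))  # 2 cut faces -> data to be communicated
```

A graph partitioner searches over such assignments for one that keeps this count small while balancing the number of cells per partition.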
For dynamically adaptive meshes, additional optimization objectives for optimized remeshing are given
by minimizing communication data, considering the costs for meshes after load balancing [KSK03] and
maximizing the quality of load balancing. Examples of such graph-based meshing and partitioning tools
are ParMETIS [KK98]
and PT-Scotch [CP08].
Geometric partitioner using SFCs
Instead of using graph theory for optimized mesh partitioning, the underlying geometry and spatial
placement of the mesh elements can be used. Such geometric partitioners can be based, e.g., on SFCs.
SFCs provide one property which is of particular importance for partitioning: the surjective mapping
of each point in a d-dimensional space to a one-dimensional interval, preserving the locality of the
d-dimensional space in the one-dimensional representation [MJFS01]. Thus, load balancing can be
reduced to a one-dimensional problem: each partition is assigned an interval of the one-dimensional
representation with similar workload [BZ00]. Libraries supporting SFC-based partitioning are e.g.
amatos [BRH+05] and
Zoltan [BCCD12].
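A minimal sketch of such an SFC-based geometric partitioner is given below; for brevity it uses a Morton (Z-order) curve instead of the Sierpiński curve, and all function names are illustrative rather than taken from amatos or Zoltan.

```python
# Geometric partitioning along an SFC: map cells to 1D, sort, cut into intervals.
# A Morton (Z-order) curve is used for brevity; the principle is the same for
# the Sierpinski curve. All names are illustrative.

def morton2d(x, y, bits=16):
    """Interleave the bits of integer coordinates (x, y) into a 1D index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits at even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits at odd positions
    return code

def sfc_partition(cells, num_parts):
    """Sort cells along the curve, then cut the 1D order into equal intervals."""
    order = sorted(cells, key=lambda c: morton2d(*c))
    n = len(order)
    return [order[p * n // num_parts:(p + 1) * n // num_parts]
            for p in range(num_parts)]

cells = [(x, y) for x in range(4) for y in range(4)]  # 4x4 grid of cells
parts = sfc_partition(cells, 4)
print([len(p) for p in parts])  # [4, 4, 4, 4] -> balanced workload
print(parts[0])                 # a spatially contiguous 2x2 block
```

Because the SFC preserves locality, each one-dimensional interval corresponds to a spatially compact region, which keeps partition boundaries, and hence the communication volume, small.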
Graph- vs. SFC-based partitioning
Dynamic partitioning based on graph partitioners involves complex graph
operations [BZ00] and results in an NP-hard problem [Zum00]. Even when multilevel diffusive
scheduling is used to mitigate the NP-hardness, SFC-based partitioning still leads to faster load
balancing while yielding edge cuts close to those of graph-based partitioning
algorithms [Mit07].
3.4.2 Simulation software including grid generation
We give an overview of well-known frameworks with built-in grid generation. Again, we do not
aim for a complete survey, but only consider the work of most relevance for our target
simulations. We compare the capabilities of each framework against our requirements described in
the previous sections.
- OpenFOAM: Features of OpenFOAM include a PDE toolbox implemented with an
embedded language, capabilities for simulations of CFD with different models, and
also dynamic mesh generation. For adaptive mesh refinement, which is required for
efficient simulation of wave propagation, OpenFOAM requires additional memory to store,
e.g., connectivity information on cells.
- Dune: The Dune Project
(Distributed and Unified Numerics Environment) also offers dynamic adaptive grids with
the ALUGrid module with ParMETIS [BBD+08]. Dune stores connectivity information
on each cell and therefore also requires additional memory similar to OpenFOAM.
- p4est: The p4est [BWG11] software library offers mesh management for parallel dynamic
AMR on forests of octrees and is only partly a simulation framework by itself. It
has already proved its high scalability [BGG+08], also in the context of DG simulations. Since
it is based on octrees, hanging nodes arise. As we require triangular
grids, triangles could be reconstructed by inserting edges into each cell; however, this
would require rearranging mesh triangles whenever hanging nodes are
created or removed in a cell.
- Peano: This framework is a mesh-based PDE solver on Cartesian grids with the
mesh generated by the Peano SFC. Outstanding features are e.g. support for arbitrary
dimensions and dynamically adaptive mesh refinement with a recursive trisection in each
dimension [Nec09,Wei09]. Since it is based on Cartesian grids, the drawbacks are similar
to those of p4est with our requirements of triangle cells.
With memory efficiency for dynamically adaptive simulations on triangular grids
being the main driving requirement of this work, none of the frameworks mentioned above, and to
the best of our knowledge no other framework, fulfills our demands.
3.4.3 Related software development
This section outlines our contributions compared to the preceding work on serial and
parallel simulation with the Sierpiński stack- and stream-based approach. Compared to the
dissertation of Csaba Vigh [Vig12], our major algorithmic differences and achievements are the
following:
- Our adaptivity automaton and the cluster-based parallelization allow skipping
adaptivity traversals for clusters that are already consistent.
- The parallelization in [Vig12] focuses only on MPI, whereas our run-length encoded
communication scheme and the way the meta information is updated provide an
efficient parallelization for shared- and distributed-memory systems.
- We introduced an efficient node-based communication scheme for the parallelization.
- The Riemann solvers in our work are more sophisticated and require an improved
communication scheme.
- In [Vig12], data migration is presented with an enumeration of cells following the SFC.
These numbers are propagated via edges after refinement and coarsening operations. In
contrast, our approach does not require such global information, resulting, e.g., in a
considerable improvement for data migration.
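The run-length encoded communication meta information mentioned above can be sketched as follows; this is a generic RLE over per-boundary-edge neighbor ranks in SFC order, not the thesis's exact data layout, and all names are illustrative.

```python
# Run-length encode the communication meta information of one cluster:
# the rank owning the neighbor on the other side of each boundary edge,
# listed in SFC order, is compressed into (rank, count) runs.
# This is a generic sketch, not the exact layout used in the thesis.

def rle_encode(neighbor_ranks):
    """Compress a list of neighbor ranks into (rank, run_length) pairs."""
    runs = []
    for r in neighbor_ranks:
        if runs and runs[-1][0] == r:
            runs[-1][1] += 1
        else:
            runs.append([r, 1])
    return [tuple(run) for run in runs]

# Six boundary edges: three shared with rank 1, two with rank 2, one with rank 3.
boundary = [1, 1, 1, 2, 2, 3]
print(rle_encode(boundary))  # [(1, 3), (2, 2), (3, 1)]
```

Since the SFC visits the edges shared with a neighboring cluster contiguously, such runs stay short, and refinement or coarsening only changes run lengths locally, which is the basis of the implicit update of the meta information described above.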
3.4.4 Impact on and from related work
The serial traversal with the stack- and stream-based approach used in this thesis is based on the
preceding research on serial traversals, e.g. [GMPZ06,BSVB08]. In our work, we developed a
software design and algorithms for an efficient parallelization on shared- as well as
distributed-memory systems.
There are several groups working on SFC-based traversal and organization schemes for PDE
solvers; to our knowledge, only the work of Prof. Michael Bader and his co-workers, called
sam(oa)2, is also Sierpiński-based. Their work has the closest relation to this thesis, which is why
we mention it here. Their focus was initially on a software design and a parallelization approach
that only supports MPI parallelization, see e.g. [Vig12], which was developed in cooperation with
Kaveh Rahnema.
[ERRATA START]
Researchers are typically obliged to cite the ideas and related work of others.
However, it is sometimes also necessary to publish information on repeated redevelopments and on ideas being taken from others.
This is the reason why I'm now publishing this information as part of an errata.
The idea of a cluster-based parallelization concept (e.g. independent storage for partitions, a run-length encoded communication scheme which is updated implicitly by adaptivity markers, etc.) was developed at the end of 2010, and the first parallel (shared-memory) version was finished in early 2011, see also
https://bitbucket.org/schreiberx/sierpi/.
This was explained to Prof. Michael Bader at the beginning of 2011.
In particular, based on my development, I provided Prof. Michael Bader with plenty of new ideas and concepts on algorithmic approaches to simulations with dynamic adaptive mesh refinement in 2010 and 2011.
These ideas were orthogonal to their development, which was described in a preliminary version of the PhD thesis of Csaba Vigh that Csaba gave to me at the end of 2010 or the beginning of 2011.
Most of the underlying ideas and concepts were explained to Prof. Michael Bader in 2011 and in particular to Oliver Meister in 2011/2012, see e.g. the
presentation at the chair meeting. This also includes, in particular, the suggested extensions of the clustering approach to SFC cuts (an extension from tree splits to intervals of the SFC).
The limitations of their original approach (e.g. MPI-only, no cluster-based optimizations, no optimizations of residual-aware iterative solvers via cluster skipping, etc.) and how the cluster-related algorithms work were explained to Oliver Meister when trying to convince him that we should join forces.
The task of merging the original parallelization approach (developed by Kaveh Rahnema and Csaba Vigh) into sam(oa)2 was assigned to Oliver Meister in mid 2012.
We will briefly outline how Oliver Meister, one of the main sam(oa)2 developers, decided instead to drop this previous approach in mid 2012.
This account is based on memory and notes; hence the statements are not verbatim.
- Oliver Meister (beg. 2011): Among other things, during a visit in Stuttgart around the end of 2010/beginning of 2011, I explained to Oliver Meister my software concepts of using C++ and a code generator.
He reprogrammed this as part of a student project: Miriam Greis, Jessica Hacklaender, Pascal Hirmer, "Systemanalyse und Workflowverwaltung eines Frameworks fuer Cache-effiziente adaptive Simulation" (System Analysis and Workflow Management of a Framework for Cache-Efficient Adaptive Simulation), http://elib.uni-stuttgart.de/opus/volltexte/2012/7375/ .
Oliver Meister never showed any intention of collaborating on or sharing this work.
(Google made me aware of this work in 2013.)
- Oliver Meister (mid 2012): ``Michael (Prof. Michael Bader) wants me to merge my development with the parallelization approach of Kaveh, however this doesn't make any sense.
I'm not doing so, but I'm now also implementing something which supports cluster-based local time stepping".
Oliver Meister thus decided once more to follow the ideas of others and to drop the previous parallelization approach into the trash bin.
Here, Oliver Meister also broke the agreement made between myself and Kaveh Rahnema (Oliver Meister attended this meeting), who was originally responsible for the parallelization of the sam(oa)2 framework.
This agreement was to stick to the old parallelization approach and not to reprogram the clustering approach.
- Prof. Michael Bader (mid 2012): Presentation at the Isaac Newton Institute,
"Parallelization and Software Concepts for Tsunami Simulation on Dynamically Adaptive Triangular Grids", http://rss.sms.cam.ac.uk/media/1292670,
http://www.newton.ac.uk/programmes/AMM/seminars/2012082414301.pdf,
with performance results indicating that their work is based on
the PhD thesis of Csaba Vigh (in fact, this was developed together with Kaveh Rahnema).
I should mention here that at this point in time Prof. Michael Bader was probably not aware of Oliver Meister's decision not to follow his order but to "also implement[...] something which supports cluster-based local time stepping".
(See e.g. also the video recording of this event.)
- Oliver Meister (end 2012): ``We don't call them clusters, we call them segments".
- Oliver Meister (end 2012): ``I'm now also first implementing OpenMP. Then it's easy to extend it to MPI". Here I would like to mention that a clustering approach, as it was first presented to them, results in an OpenMP implementation (multiple partitions per program context) which can then be easily extended to MPI, as was explained to him.
- Kaveh Rahnema (end 2012): ``Is it OK that we're using your run-length encoding?" My answer: ``Yes, of course". Here I should mention that with their original approach from 2010, using the RLE allows neither OpenMP parallelization, nor optimal data migration, nor cluster-based optimizations such as work stealing on shared- and distributed-memory systems.
- Oliver Meister (beg. 2013): After offering him co-authorship on a joint paper on the cluster-based parallelization concept: ``Michael [Bader, Prof.] first wants me to finish the parallelization before I can write on a paper"... and I waited...
- Oliver Meister (mid 2013): Presented the new parallelization in sam(oa)2, including OpenMP, as an extension of the original parallelization concept.
- Oliver Meister (mid 2013): Claimed ``These are all my ideas".
It should be stated here that an algorithm to reconstruct the RLE-based connectivity information with a spanning tree was in fact developed by Oliver Meister himself.
- Oliver Meister (mid 2013): After asking him about planned publications: ``Yes, we [not including the originator of the ideas] are going to publish it".
- Oliver Meister (mid 2013): After asking him a follow-up question about how they can publish this work while my related work (e.g. the data migration, 0-length RLE for nodes) was not yet published:
``We're simply not citing it".
- Prof. Michael Bader (mid 2013):
A roughly three-hour conversation with Prof. Michael Bader, without mentioning the previous statement of Oliver Meister, since this would probably have cost Oliver Meister his PhD position.
The work on cluster-based optimizations had already been accepted at EuroPar 2013, and there was no reason to discuss this.
No concrete agreement was made.
- As part of the GeoScience conference 2013, they were still honest in the abstract:
"... discuss a run-length encoding to deal with the data exchange on the shared inter-process ...",
showing that they reprogrammed the run-length encoding; but this was the last time that this was stated.
- Oliver Meister (end 2013): He asked whether I had published the data migration, since they also wanted to publish something.
I decided to write the final paper in 2014 in a way which also covers the extension to SFC cuts, as it was explained to Oliver Meister before he started ``implementing something which also supports cluster-based local time stepping".
- Oliver Meister (beg. 2014): Again, he asked whether I had published the data migration.
A list of related work publishing the ideas which Oliver Meister reprogrammed was given to him in 2014 (RLE edge communication: HiPC; RLE node communication: ParCo; cluster-based optimization: EuroPar; data migration: ICCS).
There is no excuse for claiming that he was not aware of a way to properly cite the related work.
- (2015-...) See the publications and presentations of Prof. Michael Bader and Oliver Meister, which simply claim that they can do MPI+OpenMP parallelization without providing details on the underlying algorithms.
Apparently it was not seen as the duty of researchers to cite the sources of what they (Oliver Meister, Prof. Michael Bader and Kaveh Rahnema) were successively reprogramming, and they even intended to rename it: "We don't call them clusters, we call them segments" (statement of Oliver Meister, 2013).
As a matter of honesty, it is the task of researchers to cite the related work which they reprogrammed and to offer co-authorship to the person who significantly contributed by providing ideas, concepts and strategies.
It is also important that honest researchers are open about their decision to drop their former approach (otherwise Csaba Vigh would still be on the author list of their source code) and to reprogram the approach of others.
It seems to be appropriate to publish this and further comments as part of an errata of my PhD thesis.
To make a long story short, Oliver Meister steadily reprogrammed implementations and ideas.
There was never any will to cooperate on Oliver Meister's side, and he reprogrammed ideas and concepts from the very beginning.
Only the concept which I personally developed as part of my PhD allows efficient MPI/OpenMP/TBB parallelization, cluster-based optimizations, node-based communication, efficient updating of RLE data, etc.
We will see whether Prof. Michael Bader, Oliver Meister and Kaveh Rahnema cite the sources correctly (citations can also be given by referencing presentations, communications, etc.)...
Finally, to bridge the text correctly to the next paragraph: the cluster-based parallelization concept, and many other algorithms and potential features which arise with this novel parallelization concept, were explained to Prof. Michael Bader in 2011.
[ERRATA END]
Several discussions with him and his group led to certain assimilations of our approaches.
However, the major difference still lies in their focus on SFC cuts for partitioning. These SFC
cuts lead to further research challenges, such as the construction of consistent meta
information.