PFLOTRAN is built on top of the PETSc framework and uses numerous PETSc features, including nonlinear solvers, linear solvers, sparse matrix data structures (both blocked and nonblocked matrices), vectors, constructs for the parallel solution of PDEs on structured grids, the options database (runtime control of solver options), and binary I/O. Many of these have been enhanced to meet PFLOTRAN's needs. In addition, since PFLOTRAN is coded in Fortran 90, PETSc's Fortran 90 support has been extended in numerous ways. Further improvements were made to enable high performance of PFLOTRAN on DOE's performance-class machines, the Cray XT4/5 and IBM BG/P. Specific major additions to PETSc are discussed below.
PFLOTRAN is written using a modern Fortran 90/95 coding paradigm and makes extensive use of Fortran 90 pointers. Because PETSc is written in C, some mechanism is required to allow access to PETSc's C data structures through Fortran 90 pointers. PETSc formerly handled Fortran 90 pointers with a compiler-specific mechanism: for each Fortran 90 compiler there was dedicated code that depended on that particular compiler's data structures. This proved problematic for PFLOTRAN because the Cray XT family of machines uses the Portland Group compilers, which do not support a compiler-specific way of handling Fortran 90 pointers in C.
This led PFLOTRAN and PETSc developers to develop a clever, portable, compiler-independent way of managing Fortran 90 pointers from C. PETSc no longer needs to support eight different Fortran compilers with special code, and PFLOTRAN now runs portably on the Cray XT series, as well as on any other platform on which PETSc can be built.
For large data sets, efficient ASCII-formatted file input/output is not possible, so binary formats must be used. This poses several problems in the HPC environment.
First, the data files must be portable, so that files may be freely copied between different HPC platforms as well as desktop systems. Second, reading and writing the files must be efficient at every scale, from single-processor systems to runs with tens of thousands of processors. Several important additions to PETSc were made to make this possible. We first added Fortran 90 support for the PETSc binary I/O operations, then added an MPI-IO back-end and an HDF5 back-end to those operations, allowing scalable, portable performance from the laptop to the petascale.
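The two requirements above, portability and efficient access at any process count, can be illustrated with a minimal sketch. It is not PETSc's file format or API: the file tag, the header layout, and the function names are assumptions made for this example. The sketch fixes the byte order (big-endian) so the file reads the same on any host, and uses a fixed layout so that each process can compute the byte offset of its own slice and write it independently, which is the property a parallel back-end relies on.

```python
import os
import struct
import tempfile

MAGIC = 0x50464C54  # hypothetical file tag for this sketch, not a PETSc constant
HEADER = struct.Struct(">ii")  # big-endian: int32 tag, int32 global length


def create_file(path, n):
    """Write the header and reserve space for n float64 entries."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, n))
        f.truncate(HEADER.size + 8 * n)


def write_slice(path, start, values):
    """Write one process's contiguous slice at its own byte offset."""
    with open(path, "r+b") as f:
        f.seek(HEADER.size + 8 * start)
        f.write(struct.pack(">%dd" % len(values), *values))


def read_vector(path):
    """Read the whole vector back, independent of the host byte order."""
    with open(path, "rb") as f:
        tag, n = HEADER.unpack(f.read(HEADER.size))
        if tag != MAGIC:
            raise ValueError("unexpected file tag")
        return list(struct.unpack(">%dd" % n, f.read(8 * n)))


# Two simulated "ranks" write their slices in arbitrary order.
path = os.path.join(tempfile.mkdtemp(), "vec.bin")
create_file(path, 5)
write_slice(path, 3, [3.0, 4.0])
write_slice(path, 0, [0.0, 1.0, 2.0])
print(read_vector(path))
```

Because the offsets depend only on the global index, the order in which the slices arrive does not matter; a real parallel back-end would issue these writes concurrently rather than one after another.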
Selection of the actual I/O back-end can be made at runtime depending on the problem size and machine configuration, thus allowing tuning without rebuilding the application. In addition to saving extremely large data sets, there is also a need to systematically save and load small amounts of information related to various parameters of the simulation; this is handled with the PetscBag construct. Full support has been added for handling these data structures using Fortran 90 derived types (in addition to the C structs that were previously supported).
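As a rough illustration of both ideas, the sketch below pairs a toy options parser with a small container of named parameters that is saved and loaded as a unit. Everything here is hypothetical: the option name -io_backend, the two back-ends (simple stand-ins, not MPI-IO or HDF5), and the Bag class are assumptions for this example and do not reproduce the PETSc options database or the PetscBag interface.

```python
import json
import struct


# Hypothetical back-ends; simple stand-ins for the real choices.
def save_binary(values):
    return struct.pack(">%dd" % len(values), *values)


def load_binary(blob):
    return list(struct.unpack(">%dd" % (len(blob) // 8), blob))


def save_text(values):
    return json.dumps(values).encode("ascii")


def load_text(blob):
    return json.loads(blob.decode("ascii"))


BACKENDS = {"binary": (save_binary, load_binary), "text": (save_text, load_text)}


def parse_options(argv):
    """Collect '-name value' pairs into a dict (toy options database)."""
    return {argv[i][1:]: argv[i + 1]
            for i in range(0, len(argv) - 1, 2) if argv[i].startswith("-")}


class Bag:
    """Ordered set of named float parameters saved and loaded as a unit."""

    def __init__(self):
        self.names, self.values = [], {}

    def register(self, name, default):
        self.names.append(name)
        self.values[name] = float(default)

    def save(self, backend):
        return BACKENDS[backend][0]([self.values[n] for n in self.names])

    def load(self, backend, blob):
        for name, v in zip(self.names, BACKENDS[backend][1](blob)):
            self.values[name] = float(v)


# The back-end is picked from the "command line", not at build time.
opts = parse_options(["-io_backend", "text"])
backend = opts.get("io_backend", "binary")
bag = Bag()
bag.register("porosity", 0.25)
bag.register("final_time", 10.0)
blob = bag.save(backend)
```

The application code that registers and saves parameters never names a back-end; switching from one to the other only changes the option value passed at launch.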
The bulk of the time spent in a PFLOTRAN simulation occurs in the large sparse linear system solves. Experimentation with the currently available solvers from the TOPS software suite (PETSc, hypre and Trilinos) has demonstrated that the stabilized bi-conjugate gradient (Bi-CG-stab) method is usually the best Krylov accelerator for PFLOTRAN. A drawback of the Bi-CG-stab method is that its standard implementation requires four global reduction operations (MPI_Allreduce() calls) per iteration, and PFLOTRAN requires hundreds of iterations per time-step.
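To make the count of four reductions concrete, here is a serial sketch of a textbook, unpreconditioned Bi-CG-stab iteration in which every point that would require a global reduction in a parallel run goes through a counter. It is an illustration only, not the PETSc implementation: the Reductions class, the test matrix, and the choice to fuse the two inner products for omega into one reduction are assumptions of this example, and there is no breakdown handling.

```python
import math


class Reductions:
    """Counts global reduction points; serially a reduction is a local sum."""

    def __init__(self):
        self.count = 0

    def dots(self, *pairs):
        # One call models one MPI_Allreduce; several dot products may share it.
        self.count += 1
        return [sum(a * b for a, b in zip(u, v)) for u, v in pairs]


def tridiag_matvec(lower, diag, upper):
    """Matrix-free product with a constant-coefficient tridiagonal matrix."""
    def matvec(u):
        n = len(u)
        return [diag * u[i]
                + (lower * u[i - 1] if i > 0 else 0.0)
                + (upper * u[i + 1] if i < n - 1 else 0.0)
                for i in range(n)]
    return matvec


def bicgstab(matvec, b, rtol=1e-10, max_it=200):
    red = Reductions()
    n = len(b)
    x = [0.0] * n
    r = list(b)        # residual for the zero initial guess
    rhat = list(r)     # fixed shadow residual
    p = [0.0] * n
    v = [0.0] * n
    rho = alpha = omega = 1.0
    (bb,) = red.dots((b, b))          # setup cost, excluded from the count
    if bb == 0.0:
        return x, 0, 0
    target = rtol * math.sqrt(bb)
    setup = red.count
    for it in range(1, max_it + 1):
        (rho_new,) = red.dots((rhat, r))               # reduction 1
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(p)
        (rhat_v,) = red.dots((rhat, v))                # reduction 2
        alpha = rho_new / rhat_v
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        t = matvec(s)
        ts, tt = red.dots((t, s), (t, t))              # reduction 3
        omega = ts / tt if tt > 0.0 else 0.0
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        (rr,) = red.dots((r, r))                       # reduction 4
        rho = rho_new
        if math.sqrt(rr) <= target:
            break
    return x, it, red.count - setup


A = tridiag_matvec(-1.0, 4.0, -1.5)
x, its, reductions = bicgstab(A, [1.0] * 30)
print(its, reductions)  # reductions equals 4 * its in this formulation
```

Each of the four reductions depends on a vector produced after the previous one, so in this formulation they cannot simply be merged; every one is a synchronization point across all processes.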
With the standard implementation, we have found that upwards of 30 percent of the PFLOTRAN runtime for runs with over 8,000 processors on the Cray XT4 was devoted to these global reductions. PETSc and PFLOTRAN developers have therefore implemented a new variant of Bi-CG-stab that needs only a single global reduction per iteration, at the cost of a much more complicated algorithm and more local computation (additional vector operations). This new solver reduces the runtime of the entire simulation by roughly 15 percent. Future improvements (optimizing the extra computations) are expected to reduce the runtime somewhat further.
Another solver improvement implemented by the PETSc developers is an improved coarse grid problem for the PDE with highly varying coefficients solved by PFLOTRAN, for which standard multigrid coarse grids do not work well. This new solver, based on the wirebasket coarse grid spaces introduced by Bramble, Pasciak, and Schatz (1989) and expanded by Dryja, Smith, and Widlund (1994), has theoretical convergence results that are independent of the PDE coefficients.