Abstract

Development and application of the open-source GPU-based fluid-thermal simulation code, NekRS, are described. Time advancement is based on an efficient kth-order accurate timesplit formulation coupled with scalable iterative solvers. Spatial discretization is based on the high-order spectral element method (SEM), which affords the use of fast, low-memory, matrix-free operator evaluation. Recent developments include support for nonconforming meshes using overset grids and for GPU-based Lagrangian particle tracking. Results of large-eddy simulations of atmospheric boundary layers for wind-energy applications as well as extensive nuclear energy applications are presented.

1 Introduction

NekRS [1] is a new exascale-ready thermal-fluids solver for incompressible and low-Mach number flows. Like its predecessor, Nek5000 [2], it is based on the spectral element method (SEM, Ref. [3]) and is scalable to 10^5 processors and beyond [4–6]. The principal differences between Nek5000 and NekRS are that the former is based on F77/C, coupled with MPI and fast matrix–matrix product routines for performance, while the latter is based on C++ coupled with MPI and open concurrent compute abstraction (OCCA) kernels for GPU support. OCCA [7,8] provides backends for CUDA, HIP, OpenCL, and DPC++, for performance portability across all the major GPU vendors. As we detail later, the key NekRS kernels realize anywhere from 2 to 12 TFLOPS per accelerator, with sustained performance >0.9 TFLOPS (FP64) on the NVIDIA A100-based Polaris platform at the Argonne Leadership Computing Facility (ALCF). For the pressure Poisson solver, which is critical to the performance of incompressible flow simulations, NekRS supports a wide variety of preconditioners tailored to the computational characteristics of GPU-based supercomputers [9]. While the feature list of NekRS is not as long as that of the 35+ year-old Nek5000 code, the number of GPU-supported features is rapidly increasing, including: moving meshes through an arbitrary Lagrangian–Eulerian formulation and an overset-grid capability; large eddy simulation (LES) subgrid-scale models [10]; unsteady Reynolds-averaged Navier–Stokes (RANS) based on a k–τ formulation [11]; and Lagrangian/Stokes particle tracking [12]. We note that, at 80% parallel efficiency, which corresponds to 2–3 × 10^6 points per MPI rank on NVIDIA V100- and AMD MI250X-based hardware, NekRS typically realizes 0.1–0.3 s of wall-clock time per time-step. These runtimes are nearly a 3× improvement over production runs of Nek5000 at a similar 80% strong-scale limit on the IBM BG/Q, Mira, which was the state of the art for scalability at the start of the NekRS development [5,13,14].

In the following, we review some performance-critical features of NekRS and describe recent large-scale applications. The paper is organized as follows. Section 2 describes the governing equations and discretization formulations in space and time. Section 3 discusses the simulation capabilities. Section 4 demonstrates nuclear energy applications. Section 5 discusses wind energy applications. We then provide some conclusions in Sec. 6.

2 Formulation

2.1 Temporal Discretization.

For computational efficiency, NekRS uses a kth-order (k = 2 or 3) time-split formulation of the incompressible Navier–Stokes (NS) equations in which each term in the governing physics is addressed in distinct substeps [15–17]. In nondimensional form, we start with the full equations governing u(x,t), the velocity, and p(x,t), the pressure divided by the (constant) density, for x ∈ Ω
\frac{\partial \mathbf{u}}{\partial t} - \frac{1}{Re}\nabla^{2}\mathbf{u} + \nabla p = -\mathbf{u}\cdot\nabla\mathbf{u} + \mathbf{f}    (1)
\nabla\cdot\mathbf{u} = 0    (2)
with initial condition u(x, t = 0) = u^0, and boundary condition u(x,t) = u_b on the part of the domain boundary where Dirichlet conditions are applied, ∂Ω_D ⊂ ∂Ω. For purposes of this exposition, we assume homogeneous Neumann conditions on ∂Ω_N := ∂Ω \ ∂Ω_D. The constant density does not appear in Eq. (1) because its nondimensional value is 1. The terms on the left of Eqs. (1) and (2) are treated implicitly, with the temporal derivative computed at time t^m using a kth-order backward difference formula (BDFk)
\left.\frac{\partial \mathbf{u}}{\partial t}\right|_{t^{m}} = \frac{1}{\Delta t}\sum_{j=0}^{k}\beta_{j}\,\mathbf{u}^{m-j} + O(\Delta t^{k})    (3)
where u^{m−j} denotes the velocity field at time t^{m−j}. For the case of uniform time-step size Δt, t^m = mΔt, and the BDFk coefficients are [β_0, …, β_k] = [1, −1] for k = 1, [3/2, −4/2, 1/2] for k = 2, and [11/6, −18/6, 9/6, −2/6] for k = 3. The terms on the right of Eq. (1), which include nonlinear advection and any forcing (e.g., Coriolis or Boussinesq terms), are treated with kth-order extrapolation. If g := −u·∇u + f, then the right-hand side of Eq. (1) is
\mathbf{g}\big|_{t^{m}} = \sum_{j=1}^{k}\alpha_{j}\,\mathbf{g}^{m-j} + O(\Delta t^{k})    (4)

For uniform Δt, we have [α_1, …, α_k] = [1] for k = 1, [2, −1] for k = 2, and [3, −3, 1] for k = 3. For the semidiscretization in time, we drop the O(Δt^k) residual terms and use Eqs. (3) and (4) in Eq. (1). All other terms are evaluated implicitly at time t^m.
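To make the bookkeeping concrete, the following sketch (illustrative only and not taken from the NekRS source; array names and layouts are assumptions) evaluates the BDFk/EXTk coefficients listed above for uniform Δt and assembles the two explicit sums, (1/Δt)Σ_{j≥1} β_j u^{m−j} and Σ_j α_j g^{m−j}, that enter the right-hand side of the time-discretized momentum equation.

    // Illustrative sketch (not NekRS source): BDFk/EXTk coefficients for uniform dt
    // and assembly of the explicit terms sum_{j>=1} beta_j u^{m-j} and sum_j alpha_j g^{m-j}.
    #include <array>
    #include <vector>

    struct TimeCoeffs {
      std::array<double, 4> beta;   // beta_0 ... beta_3 (BDF)
      std::array<double, 3> alpha;  // alpha_1 ... alpha_3 (extrapolation)
    };

    TimeCoeffs bdf_ext_coeffs(int k) {
      if (k == 1) return {{1.0, -1.0, 0.0, 0.0}, {1.0, 0.0, 0.0}};
      if (k == 2) return {{1.5, -2.0, 0.5, 0.0}, {2.0, -1.0, 0.0}};
      return {{11.0/6.0, -3.0, 1.5, -1.0/3.0}, {3.0, -3.0, 1.0}};   // k = 3
    }

    // u_hist[j-1] holds u^{m-j}; g_hist[j-1] holds g^{m-j} = -u.grad(u) + f at t^{m-j}.
    void explicit_terms(int k, double dt,
                        const std::vector<std::vector<double>>& u_hist,
                        const std::vector<std::vector<double>>& g_hist,
                        std::vector<double>& bdf_sum,   // (1/dt) sum_{j=1..k} beta_j u^{m-j}
                        std::vector<double>& ext_rhs) { // sum_{j=1..k} alpha_j g^{m-j}
      const TimeCoeffs c = bdf_ext_coeffs(k);
      const size_t n = u_hist[0].size();
      bdf_sum.assign(n, 0.0);
      ext_rhs.assign(n, 0.0);
      for (int j = 1; j <= k; ++j)
        for (size_t i = 0; i < n; ++i) {
          bdf_sum[i] += c.beta[j] * u_hist[j-1][i] / dt;
          ext_rhs[i] += c.alpha[j-1] * g_hist[j-1][i];
        }
    }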

The first step in the splitting scheme is simply an explicit update using known quantities from previous timesteps, which amounts to collecting terms from Eqs. (3) and (4) to construct a tentative velocity field defined as
\hat{\mathbf{u}} := \sum_{j=1}^{k}\left(-\frac{\beta_{j}}{\beta_{0}}\,\mathbf{u}^{m-j} + \frac{\Delta t}{\beta_{0}}\,\alpha_{j}\,\mathbf{g}^{m-j}\right)    (5)
As this step is just data collection, no boundary conditions are imposed in Eq. (5). The remaining implicit equations are linear but are still coupled because of the incompressibility constraint (2). Formally taking the divergence of the momentum Eq. (1) and requiring ∇·u^m = 0 leads to a Poisson equation for the pressure
\nabla^{2}p^{m} = \nabla\cdot\left(\frac{\beta_{0}}{\Delta t}\,\hat{\mathbf{u}}\right)    (6)
Boundary conditions for Eq. (6) are found by taking the dot product of Eq. (1) with the outward-pointing unit normal on ∂Ω, which yields
\frac{\partial p^{m}}{\partial n} = \hat{\mathbf{n}}\cdot\left[\mathbf{g}^{m} + \frac{1}{Re}\nabla^{2}\mathbf{u}^{m} - \frac{1}{\Delta t}\sum_{j=0}^{k}\beta_{j}\,\mathbf{u}^{m-j}\right] \quad \text{on } \partial\Omega    (7)
We remark that u^m is known on ∂Ω while ∇²u^m is not. As with the nonlinear term, the unknown viscous term can be treated explicitly after applying the vector identity
\nabla^{2}\mathbf{u} = \nabla(\nabla\cdot\mathbf{u}) - \nabla\times(\nabla\times\mathbf{u})    (8)
The first term on the right is set to zero because of Eq. (2), so one only needs to extrapolate the curl of the vorticity, ∇×(∇×u^m), on ∂Ω_D [17]. Once the pressure is known from Eqs. (6) and (7), we can compute each component of u^m = [u_1^m, u_2^m, u_3^m] by solving a sequence of Helmholtz problems
\left(-\frac{1}{Re}\nabla^{2} + \frac{\beta_{0}}{\Delta t}\right)u_{i}^{m} = \frac{\beta_{0}}{\Delta t}\,\hat{u}_{i} - \frac{\partial p^{m}}{\partial x_{i}}, \qquad i = 1, 2, 3    (9)

subject to appropriate boundary conditions on each component of u^m. In the case of mixed boundary conditions or variable viscosity, we replace Eq. (9) by a system in which the velocity components are coupled. In either case, the viscous system is diagonally dominant for modest-to-high Reynolds number applications, given that Δt/Re ≪ 1. For these cases, Jacobi-preconditioned conjugate gradient iteration is an efficient solution strategy for the fully discretized form of Eq. (9).
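For reference, a minimal sketch of the diagonally (Jacobi) preconditioned conjugate gradient iteration used for such viscous solves is given below. The operator apply_H stands for the discrete Helmholtz operator (1/Re)A + (β_0/Δt)B, and the vector types are placeholders of this sketch, not the NekRS interfaces.

    // Illustrative Jacobi-preconditioned CG for H u = b, where H = (1/Re)A + (beta0/dt)B.
    #include <cmath>
    #include <functional>
    #include <numeric>
    #include <vector>

    using Vec = std::vector<double>;
    using Op  = std::function<void(const Vec&, Vec&)>;   // w = H u

    double dot(const Vec& a, const Vec& b) {
      return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
    }

    void pcg_jacobi(const Op& apply_H, const Vec& diag_H, const Vec& b, Vec& u,
                    double tol = 1e-8, int maxit = 200) {
      const size_t n = b.size();
      Vec r(n), z(n), p(n), w(n);
      apply_H(u, w);
      for (size_t i = 0; i < n; ++i) r[i] = b[i] - w[i];          // initial residual
      for (size_t i = 0; i < n; ++i) z[i] = r[i] / diag_H[i];     // Jacobi preconditioner
      p = z;
      double rz = dot(r, z);
      for (int it = 0; it < maxit && std::sqrt(dot(r, r)) > tol; ++it) {
        apply_H(p, w);
        const double alpha = rz / dot(p, w);
        for (size_t i = 0; i < n; ++i) { u[i] += alpha * p[i]; r[i] -= alpha * w[i]; }
        for (size_t i = 0; i < n; ++i) z[i] = r[i] / diag_H[i];
        const double rz_new = dot(r, z);
        for (size_t i = 0; i < n; ++i) p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
      }
    }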

2.2 Spatial Discretization.

NekRS is based on the SEM [3], which is a high-order p-type finite element method (FEM) restricted to elements that are isoparametric images of the unit cube, Ω̂ := [−1,1]^3. On each element, Ω^e, e = 1, …, E, the solution and geometry are represented by tensor-product sums of Nth-order Lagrange polynomials in Ω̂
u(\mathbf{x}^{e}(r,s,t))\big|_{\Omega^{e}} = \sum_{i=0}^{N}\sum_{j=0}^{N}\sum_{k=0}^{N}u_{ijk}^{e}\,h_{i}(r)\,h_{j}(s)\,h_{k}(t)    (10)

With nodal points ξ_j taken to be the Gauss–Lobatto–Legendre (GLL) quadrature points, this basis is stable and does not exhibit the Runge phenomenon often associated with high-order polynomials (e.g., Ref. [18]). Production values of N for NekRS are typically in the range 5–10, but we have run single-element cases with Nek5000 up to N = 100. As noted in the early work of Orszag [19], restriction to tensor-product forms permits the use of fast matrix-free formulations in which operator evaluation is effected in O(EN^4) operations and O(EN^3) memory references for an E-element mesh, even in the case of deformed geometries. The number of grid points is n ≈ EN^3, so the storage is only O(n) and the work only O(Nn), which is in sharp contrast to the standard FEM, where O(nN^3) storage and work complexities typically compel a restriction to N < 4.
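The following sketch illustrates where the O(N^4) per-element cost comes from: the derivative of u with respect to the first reference coordinate, r, is obtained for all (N+1)^3 points of one element by contracting the one-dimensional differentiation matrix D̂ against the leading index. The array layout and names are assumptions of this sketch, not the NekRS kernels.

    // Illustrative r-derivative on one element: ur(i,j,k) = sum_m Dhat(i,m) u(m,j,k).
    // This single contraction costs O(N^4) operations; applying it for r, s, and t and
    // combining with the geometric factors G^e realizes the local operator of Eq. (13).
    void deriv_r(int N, const double* Dhat /* (N+1)x(N+1), row-major */,
                 const double* u /* (N+1)^3 values, u[i + (N+1)*(j + (N+1)*k)] */,
                 double* ur) {
      const int Np = N + 1;
      for (int k = 0; k < Np; ++k)
        for (int j = 0; j < Np; ++j)
          for (int i = 0; i < Np; ++i) {
            double s = 0.0;
            for (int m = 0; m < Np; ++m)
              s += Dhat[i*Np + m] * u[m + Np*(j + Np*k)];
            ur[i + Np*(j + Np*k)] = s;
          }
    }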

The elliptic subproblems (6) and (9) are cast in terms of variational projection operators. For example, for u := u_j^m, the discrete form of Eq. (9) reads: Find u ∈ X_0^N ⊂ H_0^1 such that
\frac{1}{Re}\,a(v,u) + \frac{\beta_{0}}{\Delta t}\,(v,u) = (v,f) \qquad \forall\, v \in X_{0}^{N}    (11)
Here, (p,q) := ∫_Ω p q dx is the standard L2 inner product on Ω; a(p,q) := ∫_Ω ∇p·∇q dx is the L2 inner product of the gradients; X_0^N is the trial/test space comprising the spectral element basis functions; and H_0^1 is the space of square-integrable functions on Ω whose gradients are also square-integrable and that vanish on ∂Ω_D. In the SEM, it is standard to replace integration by quadrature on the nodal points, which is effective because of the rapid convergence of GLL quadrature for smooth functions. Formally expanding u in terms of trial functions ϕ_j(x) that span X_0^N, u(x) := Σ_{j=1}^{n} u_j ϕ_j(x), and setting v = ϕ_i, i = 1, …, n, leads to the matrix form of Eq. (11)
\left(\frac{1}{Re}A + \frac{\beta_{0}}{\Delta t}B\right)\bar{u} = B\bar{f}    (12)

where ū = [u_1 … u_n]^T is the vector of unknown basis coefficients, A_{ij} = ∫_Ω ∇ϕ_i·∇ϕ_j dx is the stiffness matrix, and B_{ij} = δ_{ij} ∫_Ω ϕ_i dx is the mass matrix, which is diagonal because of the use of GLL quadrature.

As noted earlier, A is never formed. Rather, in the context of iterative solvers, one simply evaluates the action of A on a vector, w̄ = Aū, which requires two essential operators. The first is a copy operation that maps from the global representation, ū, to the local representation, ū_L := {u^e_{ijk}}, which is expressed as ū_L = Qū, where Q is a Boolean matrix. Associated with Q is its transpose, which sums local nodal values to their global counterparts (see Fig. 1). The second is the local stiffness matrix, which in IR^d (d = 2 or 3) is [20]
A^{e} = \sum_{i=1}^{d}\sum_{j=1}^{d}D_{i}^{T}\,G_{ij}^{e}\,D_{j}    (13)

where, for d = 2, D_1 = Î ⊗ D̂ and D_2 = D̂ ⊗ Î, with Î the (N+1) × (N+1) identity and D̂ the dense differentiation matrix having entries D̂_{pq} = h'_q(ξ_p). Each G^e_{ij} = G^e_{ji} is a diagonal matrix comprising a combination of quadrature weights, Jacobians, and metric terms from the map Ω̂ → Ω^e [20]. Defining A_L := block-diag(A^e), we can formally write w̄ = Aū = Q^T A_L Q ū. Note that applications of D_i and D_i^T amount to efficient tensor contractions that are simultaneously applicable to all elements. For parallel efficiency, all variables are stored in local form so that they can readily be differentiated, integrated, etc., which leads to the local-only form w̄_L = QQ^T A_L ū_L. This approach requires redundant storage of shared interface values but allows Q and Q^T to be combined into a single gather–scatter (exchange-and-sum) operation when computing the matrix–vector product Aū.
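The combined gather–scatter (exchange-and-sum) operation QQ^T can be summarized by the on-rank sketch below: every local copy that shares a global node is summed, and the sum is written back to each copy. The global-index map is an assumption of this sketch; in NekRS the operation, including the interprocessor exchange, is handled by the gather–scatter library.

    // Illustrative on-rank QQ^T: sum all local copies sharing a global id, then
    // redistribute the sum to every copy (interprocessor exchange omitted).
    #include <unordered_map>
    #include <vector>

    void gather_scatter(const std::vector<long long>& global_id,  // one id per local node
                        std::vector<double>& uL) {                // local values, updated in place
      std::unordered_map<long long, double> sum;
      for (size_t i = 0; i < uL.size(); ++i) sum[global_id[i]] += uL[i];  // Q^T: gather (sum)
      for (size_t i = 0; i < uL.size(); ++i) uL[i] = sum[global_id[i]];   // Q: scatter (copy back)
    }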

Fig. 1
Two-dimensional illustration of fundamental SEM mesh configurations for E = 2, N = 4: (left) global mesh, reflecting degrees-of-freedom ū = [u_1 … u_n]^T; (center) collection of spectral elements, Ω^e, each with local degrees-of-freedom ū^e = {u^e_{ij}}; and (right) reference domain for local coordinates (r,s) ∈ Ω̂ := [−1,1]^2. In this example, ū_L = Qū would copy the values on the shared interface in Ω to the corresponding local edges on Ω^1 and Ω^2 (e.g., u_5 would be copied to u^1_{4,0} and u^2_{0,0}). Application of Q^T would yield the sum of the values on the shared interface (e.g., w̄ = Q^T ū_L would yield w_5 = u^1_{4,0} + u^2_{0,0}).

2.3 Performance Considerations.

Each Navier–Stokes time-step requires three principal substeps: advection; pressure-correction; and viscous solve.

Advection can be addressed either through extrapolation, as in Eq. (5), or through a characteristics-based scheme that allows for a larger time-step. BDF3/EXT3 limits the Courant–Friedrichs–Lewy number, CFL := Δt max_i(u_i/δx_i), to 0.5. By using the substitution
\frac{d\mathbf{u}}{dt} := \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u}    (14)

and approximating the material derivative on the left using kth-order backward differencing along the characteristics that are incident to each nodal point, the characteristics scheme is effectively implicit. The requisite terms in the BDFk expansion can be found by tracing backward along the characteristics [21] or by solving a sequence of linear hyperbolic advection problems using (say) fourth-order Runge–Kutta, which permits CFL ≈ 2 [22,23]. For stability reasons [24], the advection terms, (v, u·∇u), are de-aliased by using M ≈ 3N/2 Gauss–Legendre quadrature points in each direction, which can make the hyperbolic subproblems relatively expensive. Thus, a 4× increase in Δt might yield only a 2× reduction in Navier–Stokes solution time, with some of the cost coming from an increase in the number of advection substeps and some coming from an increase in the number of pressure iterations on each step.
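For reference, the advective CFL number that governs the choice between the extrapolated and characteristics-based schemes can be estimated as in the sketch below, where δx_i is taken to be the local GLL grid spacing in each direction (an assumption of this sketch).

    // Illustrative CFL estimate: CFL = dt * max over points and directions of |u_i| / dx_i.
    #include <algorithm>
    #include <cmath>
    #include <vector>

    double estimate_cfl(double dt,
                        const std::vector<double>& u,  const std::vector<double>& v,
                        const std::vector<double>& w,
                        const std::vector<double>& dx, const std::vector<double>& dy,
                        const std::vector<double>& dz) {
      double cfl = 0.0;
      for (size_t i = 0; i < u.size(); ++i) {
        const double c = std::max({std::abs(u[i]) / dx[i],
                                   std::abs(v[i]) / dy[i],
                                   std::abs(w[i]) / dz[i]});
        cfl = std::max(cfl, c);
      }
      return dt * cfl;
    }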

The viscous problems (9) can be efficiently solved with diagonally preconditioned conjugate-gradient iteration and are generally solved in block form, which allows QQ^T to be applied to the vector field ū rather than to the separate scalar fields ū_i, i = 1, …, 3, thereby reducing message-latency overhead by a factor of three.

The pressure Poisson solve (6) is intrinsically the stiffest substep because it represents the action of (infinitely) fast acoustic waves. NekRS uses GMRES with a variety of pressure preconditioners. The default is p-multigrid with local Schwarz-based smoothing [9,25] coupled with fourth-kind Chebyshev acceleration [26]. It is sometimes effective to precondition the SEM operators with spectrally equivalent low-order FEM discretizations [27–29]. This idea, which dates back to the seminal work of Orszag [19], is also supported in NekRS and is useful for cases with ill-conditioned meshes [9].

All of the key kernels support multiple OCCA versions that are just-in-time compiled with fixed array dimensions, which significantly improves the quality of the compiled code. The kernels are timed during the run-time setup phase, and NekRS selects the fastest version; different versions may be used for different data sizes (e.g., the p-multigrid solver operates on elements of varying order, with a different kernel selected for each). Figure 2 shows typical entries from a NekRS logfile.
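The kernel-selection logic amounts to the simple timing loop sketched below (illustrative only; the actual NekRS/OCCA tuning infrastructure differs in its details). Each candidate kernel version is launched a few times after a warm-up, and the best-performing version is recorded and reused for the remainder of the run.

    // Illustrative autotuning sketch: time each candidate kernel version and keep the fastest.
    #include <chrono>
    #include <functional>
    #include <limits>
    #include <vector>

    int pick_fastest(const std::vector<std::function<void()>>& kernel_versions, int trials = 10) {
      int best = 0;
      double best_time = std::numeric_limits<double>::max();
      for (size_t kv = 0; kv < kernel_versions.size(); ++kv) {
        kernel_versions[kv]();                       // warm-up (also triggers JIT compilation)
        auto t0 = std::chrono::steady_clock::now();
        for (int t = 0; t < trials; ++t) kernel_versions[kv]();  // device sync would go here
        auto t1 = std::chrono::steady_clock::now();
        const double elapsed = std::chrono::duration<double>(t1 - t0).count();
        if (elapsed < best_time) { best_time = elapsed; best = static_cast<int>(kv); }
      }
      return best;   // index of the fastest version, reused for subsequent calls
    }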

Fig. 2
Logfile entries for a NekRS run on the NVIDIA A100-based Polaris at ALCF

Similarly, there are several ways to effect the nearest-neighbor exchanges for the gather–scatter operation, QQ^T. Data can be packed on the GPU and exchanged either using GPU-direct communication or via the host, or they can be packed and exchanged solely through the host. Again, NekRS times each option and selects the fastest during the gather–scatter setup for each operation. An additional option that is autoselected at setup is whether to overlap communication and computation by first applying operators to elements that are on the subdomain surface (i.e., those that require nonlocal data), initiating the exchange with neighboring processes, completing operator evaluation for interior elements, and finally completing the gather–scatter for the surface elements upon receipt of the nonlocal data. Hiding communication in this way typically results in a 10–15% savings in runtime. Another 15% gain is typically realized by using reduced (FP32) precision in the pressure-preconditioning steps, an idea that has been pursued in other high-order codes as well (e.g., Ref. [30]).
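The communication-hiding strategy can be outlined as follows (schematic only; the actual NekRS code paths use the gather–scatter library and OCCA streams, and the packing routines here are placeholders). Surface elements are processed first so that the nonblocking exchange can proceed while the interior elements are computed.

    // Schematic overlap of the gather-scatter exchange with interior work (MPI sketch).
    #include <mpi.h>
    #include <vector>

    void overlapped_update(const std::vector<int>& surface_elems,
                           const std::vector<int>& interior_elems,
                           void (*apply_local_op)(int e),          // A_L on one element
                           void (*pack_surface)(std::vector<double>&),
                           void (*unpack_and_sum)(const std::vector<double>&),
                           MPI_Comm comm, int neighbor) {
      std::vector<double> sendbuf, recvbuf(1024);                  // buffer sizes are illustrative
      for (int e : surface_elems) apply_local_op(e);               // 1. operator on surface elements
      pack_surface(sendbuf);                                       // 2. pack shared-face data
      MPI_Request reqs[2];
      MPI_Irecv(recvbuf.data(), (int)recvbuf.size(), MPI_DOUBLE, neighbor, 0, comm, &reqs[0]);
      MPI_Isend(sendbuf.data(), (int)sendbuf.size(), MPI_DOUBLE, neighbor, 0, comm, &reqs[1]);
      for (int e : interior_elems) apply_local_op(e);              // 3. interior work hides the exchange
      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
      unpack_and_sum(recvbuf);                                     // 4. complete QQ^T on surface nodes
    }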

We remark that all of these optimizations are designed to be effective in the strong-scale limit, meaning that the simulation uses as many GPUs as remain effective. Through numerous tests, we have found that NekRS requires 2–4 × 10^6 points per GPU to realize 80% parallel efficiency; reducing the number of points per rank by using more GPUs leads to efficiencies that are generally deemed unacceptable. At the 80% operating point, NekRS typically requires only 0.1–0.4 s per time-step, regardless of the overall problem size [13,14]. We demonstrate strong-scaling results to illustrate these points using the atmospheric boundary-layer test case on the V100-based Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) (see Fig. 19 in Sec. 5).

Figure 2 shows some of the logfile entries for a typical NekRS run on the NVIDIA A100-based Polaris at the ALCF. The case is a 17 × 17 rod-bundle configuration with 97 M gridpoints (E = 277,000, N = 7) running on P = 32 GPUs (3 M points per rank). The table indicates that kernel version (kv) 2 is selected for the FP64 implementation of the A_L ū_L kernel when N = 7, but kv = 5 for N = 3. For FP32, the fastest kernel versions are, respectively, kv = 4 and kv = 2, each of which sustains the expected 2× performance over its FP64 counterpart. The fdm kernels implement the fast diagonalization method [9,20], which is used to solve the local problems in the overlapping Schwarz preconditioner. The sustained FLOPS rate, reported over all 32 ranks, corresponds to 0.816 TFLOPS per rank. This value is computed by counting each FP32 floating-point operation as half a flop, so that the aggregate result provides a reasonable estimate of sustained FP64 performance. Further details regarding NekRS performance optimization can be found in Refs. [5] and [9].

In addition to these performance gains, NekRS preserves the high-order convergence properties exhibited by Nek5000, which are intrinsic to the rapidly convergent SEM discretization. Nek5000 has undergone extensive verification studies for problems with analytical solutions (e.g., Refs. [20], [31], and [32]) and validation for more challenging turbulent flows (e.g., Refs. [33] and [34]). NekRS has been thoroughly tested to match the baseline Nek5000 results for all of its features, including the RANS models.

2.4 Mesh Restrictions.

NekRS, like its predecessor Nek5000, requires unstructured, conformal, high-order hexahedral meshes. That is, the domain is discretized with high-order hexahedra whose edges can employ a variety of curvature descriptions. The most general description of the edge curvature is a HEX20 representation in which, in addition to the vertices of the hexahedron, the midpoint of each edge is provided. Only conformal meshes are supported: each hexahedron face can be shared by at most two elements, and no hanging nodes are allowed. For complex geometries where a block-structured approach is difficult to design, two strategies are typically applied:

  1. Overset meshes, as discussed in Sec. 3;

  2. A tet-to-hex approach, where a tetrahedral mesh is first generated and each tetrahedron is then divided into four hexahedra [35].

3 NekRS Simulation Capabilities

An important utility in Nek5000 that has recently been ported to NekRS is the findpts() library, which provides fast, robust, general-purpose, and scalable interpolation of data fields. The user interface for this library entails specification of one or more query points x̄* = [x_1* … x_{n*}*]^T on each MPI rank, where n* is rank dependent, followed by a call to findpts(), which returns the list of MPI ranks, element numbers, and local coordinates r_i* ∈ Ω̂ that correspond to the query point set, x̄*. Subsequent calls to findpts_eval(u) return to the querying processor(s) the values of a prescribed scalar or vector field u(x) at x̄*. Knowledge of intermediate data (e.g., the processor or element number Ω^e that contains x_i*) is not required by either the user or the developer to retrieve a set of values. findpts() uses hash tables to rapidly identify candidate elements/processors, followed by a Newton iteration to find the closest point in Ω for each x_i*. All interpolation is fully Nth-order, using the underlying SEM basis, and all communication requires only O(log P) time. findpts() is central to several developments in Nek5000/RS, including Lagrangian particle tracking and support for overset grids.
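From the user's perspective, the workflow is a two-phase query/evaluate pattern, sketched below. The types and function names (PointHandle, find_points, eval_points) are hypothetical wrappers introduced only for illustration; they are not the actual findpts() C interface, and the bodies are stubs.

    // Hypothetical wrapper illustrating the two-phase findpts workflow (not the real API).
    #include <vector>

    struct PointHandle {
      int rank = 0;            // MPI rank owning the containing element
      int element = 0;         // local element number on that rank
      double r[3] = {0, 0, 0}; // reference coordinates in [-1,1]^3
    };

    // Phase 1 (stub): locate query points; a real implementation uses hash tables plus a
    // Newton iteration, with O(log P) communication.
    std::vector<PointHandle> find_points(const std::vector<double>& /*xyz*/) { return {}; }

    // Phase 2 (stub): evaluate a field at located points using the Nth-order SEM basis;
    // values are returned to the querying rank regardless of where the donor elements live.
    std::vector<double> eval_points(const std::vector<PointHandle>& handles,
                                    const std::vector<double>& /*field*/) {
      return std::vector<double>(handles.size(), 0.0);
    }

    // Typical usage: locate once, evaluate many times (e.g., probes sampled every time-step).
    void sample_probe(const std::vector<double>& probe_xyz,
                      const std::vector<double>& field,
                      std::vector<double>& probe_values) {
      static std::vector<PointHandle> handles = find_points(probe_xyz);  // reuse located points
      probe_values = eval_points(handles, field);
    }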

Particle Tracking. NekRS currently supports one-way Lagrangian particle tracking (e.g., as in Fig. 17) to update particle positions governed by dx*/dt = f, where the user-provided f is typically a function of the fluid velocity at x* and thus requires off-grid interpolation. Updates to the position and velocity are based on third-order Adams–Bashforth, with third-order Runge–Kutta used to bootstrap the process over the first few timesteps. For large (load-balanced) particle counts, it was shown in Ref. [12] that the particle update time on 60 NVIDIA V100s of OLCF's Summit equals the Navier–Stokes solution time when the number of particles is about 1/3 of the number of grid points in the flow simulation. Realization of this result requires particle migration, in which each particle is moved to the processor that owns the element containing it, so that interprocessor communication is not needed for (many) subsequent particle updates. Without migration, the update cost is about 10× larger. Through the ppiclf() library [36], Nek5000 supports two-way particle–fluid interaction and even four-way coupling that accounts for particle–particle interactions; a NekRS port of these features is underway.
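A minimal sketch of the third-order Adams–Bashforth position update described above follows. The velocity interpolation at the particle positions is a stub standing in for the findpts-based evaluation, and particle migration as well as the Runge–Kutta startup are omitted; data layouts are assumptions of this sketch.

    // Illustrative AB3 update for one-way particle tracking: dx*/dt = f (here f = fluid velocity).
    #include <array>
    #include <vector>

    struct Particles {
      std::vector<std::array<double,3>> x;            // positions
      std::vector<std::array<double,3>> f0, f1, f2;   // f at steps n, n-1, n-2
    };

    // Stub: real code evaluates u(x*) via findpts-based off-grid interpolation.
    void interp_velocity(const Particles& p, std::vector<std::array<double,3>>& f) {
      f.assign(p.x.size(), {0.0, 0.0, 0.0});
    }

    void ab3_advance(Particles& p, double dt) {
      // Assumes f1 and f2 have already been filled (Runge-Kutta startup omitted).
      p.f2 = p.f1;  p.f1 = p.f0;
      interp_velocity(p, p.f0);                       // f^n at current particle positions
      const double a0 = 23.0/12.0, a1 = -16.0/12.0, a2 = 5.0/12.0;   // AB3 coefficients
      for (size_t i = 0; i < p.x.size(); ++i)
        for (int d = 0; d < 3; ++d)
          p.x[i][d] += dt * (a0*p.f0[i][d] + a1*p.f1[i][d] + a2*p.f2[i][d]);
      // In practice, particles are then migrated to the rank owning their new element.
    }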

Nonconforming Overset Grids. One approach to simplifying the meshing process for complex geometries is to use overset grids that divide the domain into multiple, easier-to-mesh regions [37–40]. Furthermore, dividing the domain into multiple meshes can simplify simulations with moving components by meshing individual parts that move relative to the other subdomains. Overset grids can be viewed as a Schwarz overlapping method applied to a single step of the full Navier–Stokes equations, in which the solution in each subdomain is advanced independently, with boundary conditions in the overlap regions interpolated from the solutions of the overlapping neighbors at the previous time-step. This straightforward approach provides O(Δt) accuracy that can be improved to higher order using predictor–corrector iterations [41].

Implementation of an overlapped-mesh scheme requires interpolating the fluid velocity and any scalars, such as temperature, at each time-step, and multiple times per step if correction iterations are added. Additionally, changes to the mesh during the simulation, such as for moving machinery, require recomputing the interpolation points at every time-step. Thus, using overlapped meshes requires scalable interpolation routines.
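The overlapping-Schwarz update can be summarized by the per-step loop sketched below (illustrative only; the subdomain advance and interdomain interpolation are stubs, and the data structures are assumptions of this sketch). Boundary data for each mesh are interpolated from its overlapping neighbor, and optional corrector sweeps repeat the step with updated interface data to recover higher-order accuracy.

    // Schematic overset (neknek-style) time-step with optional corrector sweeps.
    #include <vector>

    struct Subdomain {
      std::vector<double> interface_values;  // boundary data received from the donor mesh
      // ... fields, mesh, etc.
    };

    // Stubs: interpolate the donor-mesh solution onto the receiver's interdomain boundary,
    // and advance one subdomain by a single time-step using its current boundary data.
    void interpolate_from_donor(const Subdomain& /*donor*/, Subdomain& /*receiver*/) {}
    void advance_one_step(Subdomain& /*s*/) {}

    void overset_step(Subdomain& a, Subdomain& b, int ncorrector = 0) {
      const Subdomain a0 = a, b0 = b;        // state at the beginning of the step
      for (int sweep = 0; sweep <= ncorrector; ++sweep) {
        interpolate_from_donor(b, a);        // boundary data for mesh a from mesh b
        interpolate_from_donor(a, b);        // and vice versa
        Subdomain a_new = a0, b_new = b0;    // each sweep re-advances from the saved state
        a_new.interface_values = a.interface_values;
        b_new.interface_values = b.interface_values;
        advance_one_step(a_new);             // subdomains advance independently
        advance_one_step(b_new);
        a = a_new; b = b_new;                // corrector sweeps reuse the updated neighbors
      }
    }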

The overset-grid support requires resolution of several other technical issues relating to mass conservation, timestepping, and parallelism, which have been addressed in Nek5000/RS [41,42] and in earlier efforts by Merrill et al. [40,43]. For a GPU-based NS solver, however, it is also critical to have fast interpolation on the GPU, such as that provided by the routines described in Ref. [12].

3.1 Example Validation Study for Turbulent Mixing.

NekRS has been extensively validated for a variety of flows. The present section studies turbulent mixing in a large enclosure. We perform LES using NekRS of the Michigan Multi-Jet Gas-Mixture Dome (MiGaDome) [44], a 1/12th-scale high-temperature gas reactor experiment developed at the University of Michigan. Here we report a validation study for single-jet injection [45]. Once the numerical model is shown to be accurate, we briefly investigate the discharge configurations leading to jet instabilities. The numerical campaign thus provides valuable physical insight for understanding the mixing effects occurring in this type of configuration.

3.1.1 Numerical Model and Flow Conditions.

The MiGaDome facility operates with gas and aims to investigate the mixing effects of turbulent jets in an enclosed dome. The experimental data are obtained with time-averaged particle image velocimetry; to this end, the test section is manufactured as a single body of clear cast acrylic, which provides optical access to the interior volume of the fluid.

Figure 3 provides an overview of the setup. Figure 3(a) presents the geometry of the test section, which includes six inlet pipes and 12 peripheral pipes that serve as outlets. Figure 3(b) shows the computational mesh, which contains approximately 435,000 hexahedral elements developed using the open-source meshing code Gmsh [46]. Given the high resolution of the mesh, the GLL discretization is shown in detail only in the jet region to improve the clarity of the image.

Fig. 3
The geometry and the mesh considered in the LES of MiGaDome: (a) geometry of the MiGaDome experiment, the facility can operate with multiple jet configurations and flow conditions and (b) the mesh employed for simulating MiGaDome experiments

Nondimensional LES were performed on the OLCF Summit supercomputer for different jet configurations. The jets are defined by Reynolds numbers based on their bulk velocity and the inlet pipe diameter, D. The simulations ensure a fully developed turbulent incoming jet by employing a recycling boundary condition that maps the velocity field from a plane 10D downstream of the inlet pipe entrance back to its start.

The model has been validated for a single-jet case at Re = 4097 [45]. Simulations were performed for this case with the mesh from Fig. 3(b), employing polynomial orders up to N = 7 for a total of approximately 150 × 10^6 grid points. This resolution ensures at least one point near the walls at y+ < 1 and five points within y+ < 10 [47]. It also resolves the Taylor microscale everywhere in the domain. At this scale, viscosity significantly affects the dynamics of turbulent eddies in the flow; for this reason, this criterion is often recommended for designing LES meshes [48].

Further, the validated model is employed to test multiple jet configurations and scrutinize jet instabilities. While different discharge schemes have been explored, the configuration of three jets evenly distributed in the azimuthal direction is reported here as a representative case. In this setup, each jet forms an angle of 120 deg with its neighbors, at a fixed Reynolds number of Re = 4097.

3.1.2 Results.

Figure 4 presents the instantaneous velocity field as well as selected comparisons with experimental data for the single-jet injection case [44]. In Fig. 4(a), one can visualize the shear layer as the jet evolves upward. The vectors shown in this figure stem from the mean velocity field and allow flow recirculation patterns to be identified. As discussed later, identifying these flow patterns is paramount to explaining the stability of the jets when dealing with multiple discharge configurations.

Fig. 4
Simulations of single jet experiment in MiGaDome [45]: (a) instantaneous velocity magnitude and (b) averages ⟨v⟩ and root mean squares (RMS) vrms of the upward velocity component v as well as the shear component ⟨uv⟩. The maximum velocity of the first profile Vm is employed to normalize the statistical quantities.

As part of the validation study, first- and second-order statistics are reported in Fig. 4(b) for both the NekRS predictions and the experimental measurements in the single-jet case. Line profiles at different heights are compared along the x direction, with the origin at the jet centerline (i.e., x = 0). Comparisons are provided for only half of the jet because the flow is symmetric in the x direction, and spatial averages were also computed.

The reported results show that the development of the jet profile generally agrees between the experimental measurements and the LES results. Different polynomial orders were considered when performing the LES to demonstrate mesh convergence. It should be noted that Ref. [45] presents additional comparisons with experiments as part of another validation study of NekRS for turbulent mixing. Besides MiGaDome, that work also compares LES predictions against measurements from a different scaled-down high-temperature gas reactor facility at Texas A&M [49].

As a complementary study, the triple-jet injection case shows that the flow patterns previously discussed in the validation study (Fig. 4(a)) can lead to the formation of jet instabilities. Figure 5 illustrates this by presenting the velocity magnitude field at two different instants.

Fig. 5
Triple jet injection: instabilities occur due to flow pattern interactions in the enclosure: (a) instantaneous velocity magnitude at instant t0 and (b) instantaneous velocity magnitude 20 convective units after t0

One clearly observes that the turbulent jets in Fig. 5(b) are suppressed compared to their previous state in Fig. 5(a). This is largely explained by the interactions among the recirculation zones induced by each of the jets. Beyond that, the same complex flow interactions also contribute to improved mixing across the hemisphere. Such an enhancement is evident when comparing the central region of the dome between Fig. 4(a) (the single-jet case) and the x = 0 planes shown in Fig. 5.

Interestingly, the above-mentioned effects are even more intense when all six inlet pipes are employed, i.e., in a full injection scheme. Ultimately, these analyses show that LES can provide valuable insights into the physical processes in parallel with experiments. As part of a continuing validation effort, future work aims at characterizing the flow instabilities predicted by the numerical model and comparing them with experiments.

4 Nuclear Energy Applications

This section discusses collaborative efforts between ECP's Exascale Small Modular Reactor (ExaSMR) project and the Co-Design Center for Efficient Exascale Discretizations. We present simulation capabilities for ExaSMR geometries and performance analyses for neutronics and thermal hydraulics on pre-exascale and exascale systems using ExaSMR's Exascale Nuclear Reactor Investigative COde (ENRICO) platform, which consists of OpenMC [50], Shift [51], and NekRS. Notably, both OpenMC and Shift now offer support for GPU-based calculations. The coupled capabilities established in ENRICO are strategically aligned with the evolving landscape of GPU-accelerated supercomputer platforms.

4.1 Modeling Effects of Mixing Vanes Through Momentum Sources.

In light water nuclear reactors, fuel rods are generally arranged in triangular or square patterns and supported by spacer grids to prevent vibrations and maintain stability. Mixing vanes, specifically split-type vanes, are employed to enhance turbulence mixing, thereby improving convective heat transfer from the fuel rod surface to the coolant. The complex flow structures generated by these spacer grids and mixing vanes have a profound impact on the performance of the coolant flow. This research aims to develop a reduced-order modeling approach that can effectively capture the macroscopic effects of spacer grid and mixing vanes on the coolant flow. By implementing novel three-dimensional momentum sources, the approach simulates flow swirling, intersubchannel mixing, and pressure loss, while significantly reducing computational costs and enabling larger domain sizes to be studied. The accuracy of the reduced-order model is evaluated by comparing its results with those obtained from reference LESs for a specific pin bundle geometry. The outcome of this research will facilitate the utilization of advanced computational fluid dynamics (CFD) techniques and supercomputing capabilities to tackle grand challenges in nuclear engineering, such as studying the behavior of neutronics and thermal-hydraulics in large-scale systems.

The LES simulations were conducted for a 5 × 5 fuel rod bundle geometry to investigate the effect of spacer grid and mixing vanes on the coolant flow. The explicit representation of detailed geometric structures, including the spacer grid and mixing vanes, was incorporated in the LES model (as shown in Fig. 6). Four Reynolds numbers were considered: 10,000, 20,000, 40,000, and 80,000. It is noteworthy that the target Reynolds number for the small modular reactor (SMR) core is approximately 78,300, based on the NuScale SMR design specifications [52]. The obtained LES results provide valuable information on the instantaneous velocity fields, showcasing the significant flow mixing induced by the presence of the mixing vanes. Time-averaged first- and second-order flow quantities were extracted from all cases. The swirling and mixing factors, along with the pressure drop, were derived from the LES solutions to serve as reference data for calibrating the corresponding RANS momentum sources.

Fig. 6
A 5 × 5 fuel rod bundle and the LES velocity field solutions showing strong flow mixing induced by mixing vanes grid

The efficacy of the momentum sources was first evaluated by implementing and testing them in a 2 × 2 bundle configuration, where they successfully replicated the time-averaged macroscale flow characteristics, including the lateral flow mixing and streamwise pressure drop induced by the spacer grid and mixing vanes. The application of momentum sources near the inlet induced a shift in the streamline pattern from parallel to swirling, as depicted in Fig. 7. Quantitative measures such as the swirling and mixing factors, as well as the pressure drop, exhibited excellent agreement with the reference data. The momentum sources were then implemented in pin-resolved CFD simulations of a full SMR core (shown in Fig. 8). In this larger-scale simulation, the momentum sources induced significant cross flow in the core, consistent with the findings from the 2 × 2 bundle simulations. The magnitudes of the cross flows in both directions were substantial, representing a high percentage of the bulk flow and playing a crucial role in heat transfer and mixing. A detailed view of the cross-flow pattern five hydraulic diameters downstream is illustrated in Fig. 9. For more comprehensive details regarding the implementation and rigorous verification of the momentum source model, interested readers are referred to the work by Fang et al. [53], which provides in-depth insight into the effectiveness of the model and its ability to accurately capture the complex flow behavior in the presence of spacer grids and mixing vanes.

Fig. 7
Streamlines from the RANS simulations revealing the lateral flow motions induced by prescribed momentum sources
Fig. 8
Cross flows generated by the momentum sources in the full core SMR simulations
Fig. 9
Detail of cross flows in full core simulation. (Left) velocity in direction y. (Right) velocity in direction x.

4.2 Coupling With Monte Carlo Neutronics Code OpenMC Within Exascale Nuclear Reactor Investigative Code.

ENRICO is designed to drive automated multiphysics simulations of reactors. It achieves this by coupling a neutronics code with a thermal-hydraulics (TH) code [52]. The neutronics code solves the steady-state neutron transport equation and determines the energy deposition distribution, which serves as the source term for the TH code. The TH code then solves the equations for heat transfer and fluid dynamics, determining the temperature and density distributions, which are fed back into the neutronics code to update the model, since neutron cross sections depend on temperature and density. To solve for the different physics fields, ENRICO employs a nonlinear fixed-point iteration, also known as Picard iteration. This approach has been demonstrated to be effective in previous investigations [54], especially when under-relaxation is used. The individual physics solvers are run sequentially, and convergence is reached when the L1, L2, or L∞ norm of the temperature change between two consecutive iterations is less than a prescribed tolerance. ENRICO has been designed to run efficiently on distributed-memory clusters. By running each physics solver on independent nodes, ENRICO eliminates memory constraints and ensures efficient performance. The solvers that have been integrated into ENRICO include the OpenMC and Shift MC codes for neutronics and the spectral element CFD codes Nek5000 and the GPU-oriented NekRS for heat transfer and fluid dynamics. It is worth mentioning that ENRICO was developed to be agnostic to the underlying physics codes and can be expanded to support other neutronics/TH software.
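The coupling loop can be summarized as the following sketch (illustrative only; the solver calls are placeholders passed in by the caller rather than the actual OpenMC/Shift or Nek5000/NekRS drivers). An under-relaxation factor ω and an L∞ convergence check on the temperature field are shown, matching the description above.

    // Schematic Picard iteration for neutronics/TH coupling with under-relaxation.
    #include <algorithm>
    #include <cmath>
    #include <functional>
    #include <vector>

    using Field  = std::vector<double>;
    using Solver = std::function<Field(const Field&)>;

    // solve_neutronics: T -> heat source q;  solve_th: q -> updated temperature.
    bool picard_couple(const Solver& solve_neutronics, const Solver& solve_th,
                       Field& T, int max_iter, double tol, double omega = 0.5) {
      for (int it = 0; it < max_iter; ++it) {
        const Field q     = solve_neutronics(T);                 // energy deposition
        const Field T_new = solve_th(q);                         // TH update
        double linf = 0.0;
        for (size_t i = 0; i < T.size(); ++i) {
          linf = std::max(linf, std::abs(T_new[i] - T[i]));
          T[i] = (1.0 - omega) * T[i] + omega * T_new[i];        // under-relaxed update
        }
        if (linf < tol) return true;                             // L-infinity convergence test
      }
      return false;
    }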

ENRICO was originally developed using Nek5000 as the TH solver. The integration of the NekRS CFD solver into the ENRICO framework represents a significant advancement in thermal-fluid calculations, particularly in terms of accelerating simulations on GPU-based leadership-class supercomputers such as Summit and Aurora. To fully capitalize on the potential performance boost, a thorough benchmark study is essential to ensure that the new OpenMC-NekRS solver delivers results consistent with the original OpenMC-Nek5000 solver. To this end, a benchmark study was conducted using test cases comprising a single-rod model and an assembly model. The study compared key quantities of interest (QoI) from both solvers, including the coolant temperature distributions and the effective multiplication factor (keff). The long-rod case is based on a model previously reported in a milestone report [55]. To ensure a fair comparison, an identical set of input parameters was used for both OpenMC-NekRS and OpenMC-Nek5000. The benchmarking analysis focused on the ENRICO results after ten Picard iterations, enabling a comprehensive evaluation of solver performance and its consistency with respect to the key thermal-fluid parameters.

Figure 10 presents the comparison of temperature distributions obtained after ten Picard iterations between the new OpenMC-NekRS solver and the original OpenMC-Nek5000 solver. Specifically, Fig. 10(a) focuses on the axial temperature profile, where the results from both solvers exhibit an extraordinary level of agreement, with a maximum deviation of just 0.02%. To further validate the accuracy of the simulations, radial temperature profiles were extracted at a height of z =100 cm. The comparison reveals excellent conformity, with maximum deviations of 0.50% inside the fuel pin and a mere 0.01% in the coolant flow. These results serve as compelling evidence that the new ENRICO solver, based on NekRS, consistently produces simulation outcomes that are in close agreement with the original Nek5000-based version. The benchmark study successfully confirms the reliability and accuracy of the enhanced OpenMC-NekRS solver within the ENRICO framework.

Fig. 10
Comparison of temperature distributions from the long rod case with both OpenMC–NekRS and OpenMC–Nek5000 solvers: (a) axial temperature profile in the long rod and (b) radial temperature profile in the long rod at z = 100 cm

In addition to the single rod model, benchmarking tests were conducted on a more complex 17 × 17 fuel rod assembly model [56]. As with the previous cases, both the OpenMC-NekRS and OpenMC-Nek5000 simulations were provided with the same set of input parameters. The coupled simulation ran for seven Picard iterations, and the resulting solution fields were examined, including the velocity and temperature fields extracted at a height of z = 100 cm, as shown in Fig. 11. To thoroughly assess the consistency and reliability of the new OpenMC-NekRS solver, key QoIs were selected for comparison, as presented in Table 1. The results obtained from the NekRS based ENRICO closely align with those from the original Nek5000 based ENRICO, indicating an excellent agreement and further affirming the reliability of the OpenMC-NekRS solver. These benchmarking tests reinforce the confidence in utilizing the enhanced ENRICO framework with NekRS, offering a powerful and reliable solution for multiphysics simulations in complex nuclear reactor assemblies.

Fig. 11
Cross-sectional view of assembly solutions at z = 100 cm from OpenMC–NekRS simulations: (a) velocity field within the assembly and (b) temperature field within the assembly
Table 1

Comparison of QoI from benchmarking tests with the assembly model

QoI                           OpenMC–NekRS    OpenMC–Nek5000    Relative deviation
Peak fuel temperature (K)     1048.36         1050.00           0.16%
Peak coolant temperature (K)  595.98          597.30            0.22%
keff                          1.1669          1.1670            0.01%

4.3 Coupled NekRS-OpenMC Simulations of Small Modular Reactor Full Core.

After the successful verification of the NekRS-OpenMC coupling capability, a comprehensive simulation of an SMR full core was conducted on the Summit supercomputer at OLCF. Each simulation job utilized 300 Summit nodes, with a polynomial order of N =3 for NekRS. During each Picard iteration, the predominant share of simulation time was allocated to NekRS calculations. On average, the OpenMC session utilized approximately 54 min within the total 360-minute runtime, constituting only 15% of the overall computational expense. During each Picard iteration, NekRS executed 10,000 time steps with a time-step size of 0.5 ms. Overall, a total of ten Picard iterations were simulated. To assess the convergence of the Picard iterations, Fig. 12 depicts the maximum temperature change between iterations. The plot clearly indicates that convergence is achieved after four iterations, demonstrating the stability and efficiency of the iterative process.

Fig. 12
L∞ norm of the difference between temperatures at two successive Picard iterations

Figure 13 presents the radial temperature distribution across the entire SMR reactor core at the axial midplane (z =100 cm). The plot reveals distinct radial rings within each fuel pin, depicting the transitions from the fuel region to the gap and cladding. In this comprehensive simulation, a total power of 160 MW is deposited in the full core model, resulting in a temperature rise of 56 K (100.8 F) in the coolant flow, which closely aligns with the design specifications of NuScale (ΔT = 100 F). Notably, the peak fuel and coolant temperatures observed are 1212.0 K and 587.7 K, respectively. It is important to note that the turbulence modeling options employed in this study are intentionally simplistic. As part of ongoing efforts to enhance the simulation accuracy, additional simulations are planned with more sophisticated turbulence modeling and higher resolution to evaluate and quantify the uncertainty associated with the current results. To further improve the resolution of the SMR system response, a natural circulation model has been developed and integrated into the full core simulations. For a more comprehensive understanding of this aspect, interested readers are encouraged to refer to the work by Fang et al. [57], which provides in-depth details on this development and its implications. Details on the performance are provided in Ref. [58].

Fig. 13
Radial temperature distribution (z = 100 cm) from coupled OpenMC–NekRS simulations of the full-core model

4.4 Hybrid Reynolds-Averaged Navier–Stokes/Large Eddy Simulation Modeling.

The RANS/LES hybrid approach is a novel and promising strategy that has been introduced to enhance the simulation of a nuclear reactor core. In this approach, the core is divided into different regions or zones, with a portion of the core simulated using LES while the remaining regions are handled by RANS modeling. This zonal hybrid strategy aims to combine the strengths of both approaches, leveraging the accuracy and capability of LES in resolving turbulent flow features in specific regions while still benefiting from the computational efficiency of RANS for the majority of the core. The preliminary results obtained for a subset of the reactor using this approach have shown great promise, motivating further investigation and development to ultimately improve the full core simulation without significantly increasing computational overheads. The project has pursued two parallel methods: the first involves modeling the hybrid RANS-LES using a multizonal mesh, and the second couples RANS and LES through overlapping meshes, utilizing the overset mesh (also known as "neknek") feature. As an illustrative example, a relatively simple 5 × 5 rod bundle is considered, and the computational grids for the multizonal and overlapping mesh approaches are depicted in Figs. 14 and 15, respectively. The elemental mesh resolution is uniform in the z-direction and nonuniform in the x-direction, clustering elements in the middle to ensure sufficient resolution for the LES modeling area. These grid representations highlight the flexibility and versatility of the hybrid approach, showcasing the potential to effectively optimize computational resources while obtaining high-fidelity simulation results for complex reactor geometries. A demonstration of the capability is shown in Fig. 16, which presents a solution with both RANS and LES characteristics.

Fig. 14
Overset mesh cross section of a 5 × 5 bundle at polynomial order 3
Fig. 15
Cross sections of overlapping meshes for the same bundle at polynomial order 3: (a) 3 × 3 subchannel for LES (b) 5 × 5 subchannel for RANS, and (c) overlapped subchannels
Fig. 16
Demonstration of zonal hybrid approach for a rod bundle case (5 × 5). Instantaneous velocity magnitude on a streamwise cross section.

4.5 Additional Nuclear Energy Simulations.

The performance advances achieved by ExaSMR have led to the adoption of NekRS for a variety of other nuclear engineering simulations. Particularly notable have been advances in the simulation of pebble bed reactors, which have been scaled to full-core large eddy simulations comprising 350,000 pebbles, whereas previously reported simulations reached on the order of hundreds to thousands of pebbles [59,60]. We note that pebble bed reactors operate at high Reynolds numbers compared to other applications involving packed beds. As the flow is highly turbulent, the creeping-flow assumption is not valid for the conditions typically encountered in nuclear pebble beds.

The simulations have been used to elucidate the flow dynamics in cylindrical packed beds, which are characterized by a nonhomogeneous porosity distribution that peaks near the walls of the cylinder. The simulations have demonstrated that the form constant in the pressure drop depends on the porosity and that this dependence closely follows metrics based on inertial effects [61,62]. A first-of-its-kind correlation for the form loss has been generated that can accurately predict the velocity distribution in the bed. We note that the wall region is very important in pebble bed reactors for a variety of reasons, including bypass flow and localized power peaking.

5 Wind Energy Applications

Through a collaboration between the ECP's ExaWind project at the National Renewable Energy Laboratory and the Center for Efficient Exascale Discretizations, the Nek5000/RS team has recently developed wall-modeled large-eddy simulation approaches for atmospheric boundary layers [10,13]. Such a capability is critical for applications such as wind farm analysis and dispersive transport in urban canyons. Here, we discuss the governing equations for wall-modeled large-eddy simulation in Nek5000/RS along with new subgrid-scale (SGS) turbulence modeling approaches [10]. We also present convergence and performance results, demonstrating that NekRS provides high-order convergence and that it outperforms low-order codes on various advanced architectures [13].

5.1 Large Eddy Simulation for Atmospheric Boundary Layer Flows.

Atmospheric boundary layer flows are characterized by turbulent mixing, vertical diffusion, vertical and horizontal exchange of heat, etc., over a large range of scales and thus present significant challenges for simulation codes, even at exascale. To quantify the effects of numerical modeling and discretization choices, the atmospheric boundary layer community has set up a sequence of benchmark problems, including the Global Energy and Water Cycle Experiment Atmospheric Boundary Layer Study (GABLS) [63]. These benchmarks represent the atmospheric boundary layer in regional and large-scale atmospheric models and are considered important for improved modeling in the study of wind energy, climate, and weather on all scales [64].

For the atmospheric LES, we solve the incompressible NS and potential temperature equations in a spatially filtered resolved-scale formulation in nondimensional form. Appropriate source terms are added for the Coriolis acceleration and buoyancy force.

In the case of the high-pass filter model discussed below, the forcing term would also include a damping term of the form χHPF(ui), where χ is an order-unity multiplier and HPF is a spatial filter function that allows grid-scale fluctuations of the argument to pass through so that their amplitude is damped in time.

The stress tensors in the momentum and energy equations, τij and τθj, respectively, include SGS turbulent modeling terms
(15)
(16)

where Re is the Reynolds number, Pe is the Peclet number, Sij is the resolved-scale strain-rate tensor, and τijsgs and τθjsgs are the SGS stress tensors that require modeling.

5.2 Global Energy and Water Cycle Experiment Atmospheric Boundary Layer Study Benchmark.

We consider the GABLS benchmark [63], illustrated in Fig. 17, which is a well-documented stably stratified flow problem, with the ground temperature cooler than the air temperature and continued cooling over the simulation duration. The computational domain is Ω = L_x × L_y × L_z = 400 m × 400 m × 400 m, with the streamwise direction in x, the spanwise direction in y, and the vertical direction in z. We define an initial condition at t = 0 with constant velocity in the streamwise direction equal to the geostrophic wind speed of U = 8 m/s. The initial potential temperature is 265 K for 0 ≤ z ≤ 100 m and increases linearly at a rate of 0.01 K/m for 100 m ≤ z ≤ 400 m, with a reference potential temperature of 263.5 K. An initial perturbation with an amplitude of 0.1 K is added to the potential temperature field for 0 ≤ z ≤ 50 m.

Fig. 17
NekRS simulation for the atmospheric boundary layer flows demonstrating the potential temperature on the vertical planes and the streamwise velocity on the horizontal plane. The total number of grid points is n = 134 M with E = 64^3 and N = 8. The simulation is performed on OLCF's Summit using ten nodes with 60 NVIDIA V100 GPUs. (Simulation by Lindquist et al. [12]).

The Reynolds number is Re = U L_b/ν, where L_b = 100 m is the thickness of the initial thermal boundary layer and ν is the molecular viscosity. Under the stated conditions, Re ≈ 50 × 10^6, which precludes direct numerical simulation, wherein all turbulent scales are resolved.

We consider periodic boundary conditions (BCs) in the streamwise and spanwise directions. At the top boundary (z = 400 m), a stress-free, rigid lid is applied for the momentum, and the heat flux for the energy equation is set consistent with the 0.01 K/m temperature gradient initially prescribed in the upper region of the flow. At the bottom boundary, we use a wall model in which traction BCs are applied for the velocity. The specified shear stress comes from Monin–Obukhov similarity theory [65]. For the energy equation, a heat flux is applied that is derived from the same theory and a specified potential temperature difference between the flow at a height z_1 and the surface. For the traction BCs on the horizontal velocity components, we follow the approaches of Refs. [66] and [67] in the context of the log law. The normal component of the velocity is set to zero, and the traction BCs are imposed on the tangential velocity based on the horizontally averaged slip velocity that develops at the boundary. Further details regarding the wall model are provided in Ref. [10].
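To illustrate the traction condition, the sketch below computes a friction velocity from the horizontally averaged slip speed via a stable-stratification Monin–Obukhov log law and distributes the resulting stress to the two horizontal velocity components. The roughness length z0, the Obukhov length L, and the stability coefficient beta_m are assumptions of this sketch, which only indicates the general structure of such a wall model; the NekRS implementation follows Refs. [10], [66], and [67].

    // Illustrative Monin-Obukhov traction condition (stable stratification); not the NekRS code.
    #include <cmath>

    struct SurfaceStress { double tau_xz, tau_yz; };

    // Inputs: horizontally averaged horizontal velocity components at height z1, roughness
    // length z0, Obukhov length L, and stability coefficient beta_m (assumed values).
    SurfaceStress most_traction(double u_avg, double v_avg, double z1,
                                double z0, double L, double beta_m = 4.8) {
      const double kappa = 0.41;                                 // von Karman constant
      const double speed = std::sqrt(u_avg*u_avg + v_avg*v_avg);
      if (speed < 1e-12) return {0.0, 0.0};
      const double psi_m = -beta_m * z1 / L;                     // stable-case stability correction
      const double ustar = kappa * speed / (std::log(z1/z0) - psi_m);
      // Surface stress of magnitude u*^2, aligned with (and opposing) the mean horizontal velocity.
      SurfaceStress s;
      s.tau_xz = -ustar*ustar * u_avg / speed;
      s.tau_yz = -ustar*ustar * v_avg / speed;
      return s;
    }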

5.3 Subgrid-Scale Models.

We have explored several SGS modeling approaches for the GABLS problem, including a mean-field eddy viscosity (MFEV) [68] for the anisotropic part of the stress tensor. The law of the wall is effected through use of the MFEV concept, and the approach originally used by Schumann [69] is employed to convert the horizontally averaged traction to local values based on the local slip velocity in each of the horizontal directions. The isotropic part of the dissipation is provided using either a high-pass filter (HPF) [70] or a Smagorinsky (SMG) eddy viscosity [71] applied to the fluctuating strain rate. Here, we present some of the basic components of these approaches and compare their convergence behavior. Additional thermal components required for these models are presented in Ref. [10].

For the HPF approach [70], which is not eddy-viscosity based, we have
(17)
where the angle brackets denote averaging over the horizontal directions (at each time-step) and νT is an average eddy viscosity, which is expressed in terms of mean flow quantities
(18)

Here, C_K is a constant and L_m is a mixing-length scale [10,68]. In the HPF model, isotropic fluctuations are damped by the addition of the χHPF(u_i) term to f_i, as mentioned earlier.

In our second model, the SGS dissipation is still effected through an anisotropic MFEV, but an isotropic, fluctuating part is taken into account through an SMG model based on the fluctuating strain rate. Our approach is extended from the SGS model of Ref. [68] based on the following expression:
(19)
where, as before, the angle brackets denote averaging over the homogeneous directions and ⟨ν_T⟩ is an average eddy viscosity expressed in terms of mean flow quantities. In Eq. (19), γ is an "isotropy factor," which accounts for variability in the SGS constants due to anisotropy of the mean flow. In Ref. [68], the fluctuating eddy viscosity, ν_t, is obtained using an eddy viscosity model based on the SGS turbulent kinetic energy equation [69]
(20)

where S′ = [2(S_ij − ⟨S_ij⟩)(S_ij − ⟨S_ij⟩)]^{1/2} and C_s = (C_k (C_k/C_ϵ)^{1/2})^{1/2}. The isotropy factor γ is obtained from γ = ⟨S′⟩/(⟨S⟩ + ⟨S′⟩), with ⟨S⟩ = (2⟨S_ij⟩⟨S_ij⟩)^{1/2} (see Refs. [10] and [13] for further detail).
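The sketch below is an illustrative rendering of the fluctuating-strain ingredients just described; it assumes a pointwise evaluation, a Smagorinsky-type relation ν_t = (C_s Δ)² S′, and a simple six-component strain storage, none of which is taken from the NekRS implementation. In practice, horizontally averaged quantities would be used where the angle brackets appear above.

    // Illustrative computation of the isotropy factor and fluctuating-strain eddy viscosity.
    // Sij holds the six independent components (xx, yy, zz, xy, xz, yz) of the resolved strain
    // rate at one point; Sij_mean holds the horizontally averaged components (assumed given).
    #include <array>
    #include <cmath>

    double strain_magnitude(const std::array<double,6>& S) {
      // |S| = sqrt(2 S_ij S_ij), with off-diagonal components counted twice.
      const double diag = S[0]*S[0] + S[1]*S[1] + S[2]*S[2];
      const double off  = S[3]*S[3] + S[4]*S[4] + S[5]*S[5];
      return std::sqrt(2.0*(diag + 2.0*off));
    }

    void sgs_smg(const std::array<double,6>& Sij, const std::array<double,6>& Sij_mean,
                 double Cs, double delta, double& gamma, double& nu_t) {
      std::array<double,6> Sfluct;
      for (int i = 0; i < 6; ++i) Sfluct[i] = Sij[i] - Sij_mean[i];
      const double Sp = strain_magnitude(Sfluct);            // fluctuating strain magnitude S'
      const double Sm = strain_magnitude(Sij_mean);          // mean strain magnitude <S>
      gamma = Sp / (Sm + Sp + 1e-30);                        // isotropy factor
      nu_t  = (Cs*delta) * (Cs*delta) * Sp;                  // Smagorinsky form applied to S'
    }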

Figure 18 shows the horizontally averaged streamwise and spanwise velocities at t = 7 h at three different resolutions, n = 128^3, 256^3, and 512^3, which correspond to average grid spacings of Δx = 3.12 m, 1.56 m, and 0.79 m, respectively. The top left panel shows relatively rapid convergence for MFEV+HPF, with a clearly converging profile at n = 256^3 that is very close to the one at n = 512^3. The results for MFEV+SMG, while not as converged at n = 256^3 as those for MFEV+HPF, show profiles that are fairly close at all three resolutions, which implies that MFEV+SMG is perhaps more robust than MFEV+HPF. The third panel illustrates that both models converge to the same jet height, which is an important parameter for wind-farm modeling.

Fig. 18 Horizontally averaged streamwise and spanwise velocities at t = 7 h with MFEV+HPF and MFEV+SMG using traction boundary conditions, demonstrating convergence at three different resolutions

In addition to grid convergence, we also consider performance, which is a primary focus of exascale computing. Figure 19 shows NekRS strong-scaling results for the GABLS example on Summit. The simulations were run for 200 timesteps starting with the 512^3 solution at t = 6 h as the initial condition, and timings were averaged over the last 100 steps. The upper left panel shows the standard “time-per-step” versus the number of V100s, P, for resolutions of n = 512^3, 1024^3, and 2048^3. These curves nearly collapse onto a single line when the data are replotted versus the number of points per GPU, n/P, as seen in the upper right panel. From the parallel-efficiency plot in the lower panel, we see that 80% efficiency is realized at n/P =: n0.8 ≈ 2.2 M, which corresponds to a wall-clock time of 0.11 s per step. We reiterate the earlier remark that, at the strong-scale limit of n/P = 2.2 M, the time per step is essentially independent of problem size.

Fig. 19 NekRS strong-scaling performance on Summit for n = 512^3, 1024^3, and 2048^3
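The parallel efficiency reported in Fig. 19 follows directly from the measured time-per-step: E(P) = t(Pref)·Pref/(t(P)·P), with the smallest GPU count taken as the reference. The short sketch below illustrates the computation; the sample numbers are placeholders, not the measured Summit timings.

#include <cstdio>
#include <vector>

int main()
{
  struct Sample { int gpus; double secPerStep; };
  // Placeholder strong-scaling data (GPU count, seconds per step).
  const std::vector<Sample> runs = {{1008, 0.43}, {2016, 0.22}, {4032, 0.12}, {8064, 0.07}};

  // Efficiency relative to the smallest GPU count: E(P) = t(Pref)*Pref/(t(P)*P).
  const double refWork = runs.front().gpus*runs.front().secPerStep;
  for (const auto& r : runs) {
    const double eff = refWork/(r.gpus*r.secPerStep);
    std::printf("P = %5d GPUs: %.3f s/step, efficiency = %.2f\n",
                r.gpus, r.secPerStep, eff);
  }
  return 0;
}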

6 Conclusions

This paper has presented the transformative performance enhancements enabled by the development of NekRS, an open-source GPU-based fluid-thermal simulation code. NekRS has been developed with a focus on efficiency and scalability, leveraging a kth-order accurate timesplit formulation coupled with scalable iterative solvers for time advancement. The high-order SEM used for spatial discretization enables fast, low-memory, matrix-free operator evaluation, which contributes to the overall performance of NekRS. We discussed novel capabilities within NekRS and provided a validation example for a multijet experiment that also yielded valuable insights into complex flow structures. We then discussed large-scale energy applications.

In the context of nuclear energy applications, the integration of NekRS into the ENRICO framework, which couples the OpenMC and Shift Monte Carlo codes for neutronics with the spectral element CFD codes Nek5000 and NekRS for heat transfer and fluid dynamics, has significantly advanced thermal-fluid calculations for nuclear reactors, delivering simulations that were unthinkable only a few years ago. This has been made possible by leveraging the power of GPU-based leadership-class supercomputers such as Summit and Frontier. We have also developed a reduced-order modeling approach that effectively captures the macroscopic effects of spacer grids and mixing vanes on the coolant flow. This approach, which implements novel three-dimensional momentum sources, reproduces flow swirl, intersubchannel mixing, and pressure loss; it significantly reduces computational cost and enables larger domains to be studied. Furthermore, our wall-modeled large-eddy simulations of atmospheric boundary layers for wind energy have demonstrated the potential of NekRS to provide valuable insights into complex flow physics and their impact on performance.

In conclusion, the advances in the performance and capabilities of NekRS have demonstrated its potential in a range of applications, including nuclear and wind energy. Future work will continue to leverage these performance enhancements to further advance our understanding of complex flow physics in energy systems.

Acknowledgment

The research used resources at the Argonne Leadership Computing Facility and at the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Funding Data

  • This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, under Contract No. DE-AC02-06CH11357 and by the Exascale Computing Project (ECP) 17-SC-20-SC; Funder ID: 10.13039/100000015.

Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.

References

1. Fischer, P., Kerkemeier, S., Min, M., Lan, Y.-H., Phillips, M., Rathnayake, T., Merzari, E., Tomboulides, A., Karakus, A., Chalmers, N., and Warburton, T., 2022, "NekRS, a GPU-Accelerated Spectral Element Navier–Stokes Solver," Parallel Comput., 114, p. 102982. DOI: 10.1016/j.parco.2022.102982
2. Fischer, P., Lottes, J., and Kerkemeier, S., 2008, "Nek: Fast High-Order Scalable CFD," accessed Feb. 20, 2024, http://nek5000.mcs.anl.gov and https://github.com/nek5000/nek5000
3. Patera, A., 1984, "A Spectral Element Method for Fluid Dynamics: Laminar Flow in a Channel Expansion," J. Comput. Phys., 54(3), pp. 468–488. DOI: 10.1016/0021-9991(84)90128-1
4. Fischer, P., Heisey, K., and Min, M., 2015, "Scaling Limits for PDE-Based Simulation (Invited)," AIAA Paper No. 2015-3049. DOI: 10.2514/6.2015-3049
5. Min, M., Lan, Y.-H., Fischer, P., Merzari, E., Kerkemeier, S., Phillips, M., Rathnayake, T., Novak, A., Gaston, D., Chalmers, N., and Warburton, T., 2022, "Optimization of Full-Core Reactor Simulations on Summit," SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, Nov. 13–18, pp. 1–11.
6. Min, M., Lan, Y., Fischer, P., Rathnayake, T., and Holmen, J., 2022, "Nek5000/RS Performance on Advanced GPU Architectures," Argonne National Laboratory, Argonne, IL, Technical Report No. ANL-22/81.
7. Medina, D. S., St-Cyr, A., and Warburton, T., 2014, "OCCA: A Unified Approach to Multi-Threading Languages," e-print arXiv:1403.0968. DOI: 10.48550/arXiv.1403.0968
8. OCCA, 2020, "OCCA: Lightweight Performance Portability Library," accessed Feb. 20, 2024, libocca.org
9. Phillips, M., Kerkemeier, S., and Fischer, P., 2022, "Tuning Spectral Element Preconditioners for Parallel Scalability on GPUs," Proceedings of the 2022 SIAM Conference on Parallel Processing for Scientific Computing, Virtual, Feb. 23–26, pp. 37–48. DOI: 10.1137/1.9781611977141.4
10. Min, M., and Tomboulides, A., 2022, "Simulating Atmospheric Boundary Layer Turbulence With Nek5000/RS," Argonne National Laboratory, Argonne, IL, Technical Report No. ANL-22/79.
11. Shaver, D. R., Tomboulides, A., Obabko, A., Fang, J., and Saini, N., 2023, "Demonstration of RANS Models With Wall Functions in the Spectral Element Code Nek5000," Nucl. Eng. Des., 408, p. 112302. DOI: 10.1016/j.nucengdes.2023.112302
12. Lindquist, N., Fischer, P., and Min, M., 2021, "Scalable Interpolation on GPUs for Thermal Fluids Applications," Argonne National Laboratory, Argonne, IL, Technical Report No. ANL-21/55.
13. Min, M., Brazell, M., Tomboulides, A., Churchfield, M., Fischer, P., and Sprague, M., 2023, "Towards Exascale for Wind Energy Simulations," e-print arXiv:2210.00904. DOI: 10.48550/arXiv.2210.00904
14. Min, M., Lan, Y.-H., Fischer, P., Rathnayake, T., and Holmen, J., 2022, "Nek5000/RS Performance on Advanced GPU Architectures," Argonne National Laboratory, Argonne, IL, Technical Report No. ANL-22/81.
15. Orszag, S., and Kells, L., 1980, "Transition to Turbulence in Plane Poiseuille Flow and Plane Couette Flow," J. Fluid Mech., 96(1), pp. 159–205. DOI: 10.1017/S0022112080002066
16. Orszag, S., Israeli, M., and Deville, M., 1986, "Boundary Conditions for Incompressible Flows," J. Sci. Comput., 1(1), pp. 75–111. DOI: 10.1007/BF01061454
17. Tomboulides, A., Israeli, M., and Karniadakis, G., 1989, "Efficient Removal of Boundary-Divergence Errors in Time-Splitting Methods," J. Sci. Comput., 4(3), pp. 291–308. DOI: 10.1007/BF01061059
18. Canuto, C., Quarteroni, A., Hussaini, M., and Zang, T., 2007, Spectral Methods: Fundamentals in Single Domains, Academic Press, Cambridge, MA.
19. Orszag, S., 1980, "Spectral Methods for Problems in Complex Geometry," J. Comput. Phys., 37(1), pp. 70–92. DOI: 10.1016/0021-9991(80)90005-4
20. Deville, M., Fischer, P., and Mund, E., 2002, High-Order Methods for Incompressible Fluid Flow, Cambridge University Press, Cambridge, UK.
21. Pironneau, O., 1982, "On the Transport-Diffusion Algorithm and Its Applications to the Navier–Stokes Equations," Numer. Math., 38(3), pp. 309–332. DOI: 10.1007/BF01396435
22. Maday, Y., Patera, A. T., and Rønquist, E. M., 1990, "An Operator-Integration-Factor Splitting Method for Time-Dependent Problems: Application to Incompressible Fluid Flow," J. Sci. Comput., 5(4), pp. 263–292. DOI: 10.1007/BF01063118
23. Patel, S., Fischer, P., Min, M., and Tomboulides, A., 2019, "A Characteristic-Based, Spectral Element Method for Moving-Domain Problems," J. Sci. Comput., 79(1), pp. 564–592. DOI: 10.1007/s10915-018-0876-6
24. Malm, J., Schlatter, P., Fischer, P., and Henningson, D., 2013, "Stabilization of the Spectral-Element Method in Convection Dominated Flows by Recovery of Skew Symmetry," J. Sci. Comput., 57(2), pp. 254–277. DOI: 10.1007/s10915-013-9704-1
25. Lottes, J. W., and Fischer, P. F., 2005, "Hybrid Multigrid/Schwarz Algorithms for the Spectral Element Method," J. Sci. Comput., 24(1), pp. 45–78. DOI: 10.1007/s10915-004-4787-3
26. Lottes, J., 2023, "Optimal Polynomial Smoothers for Multigrid V-Cycles," Numer. Linear Algebra Appl., 30(6), p. e2518. DOI: 10.1002/nla.2518
27. Canuto, C., Gervasio, P., and Quarteroni, A., 2010, "Finite-Element Preconditioning of G-NI Spectral Methods," SIAM J. Sci. Comput., 31(6), pp. 4422–4451. DOI: 10.1137/090746367
28. Bello-Maldonado, P., and Fischer, P., 2019, "Scalable Low-Order Finite Element Preconditioners for High-Order Spectral Element Poisson Solvers," SIAM J. Sci. Comput., 41(5), pp. S2–S18. DOI: 10.1137/18M1194997
29. Pazner, W., 2020, "Efficient Low-Order Refined Preconditioners for High-Order Matrix-Free Continuous and Discontinuous Galerkin Methods," SIAM J. Sci. Comput., 42(5), pp. A3055–A3083. DOI: 10.1137/19M1282052
30. Fehn, N., Wall, W., and Kronbichler, M., 2018, "Efficiency of High-Performance Discontinuous Galerkin Spectral Element Methods for Under-Resolved Turbulent Incompressible Flows," Int. J. Numer. Methods Fluids, 88(1), pp. 32–54. DOI: 10.1002/fld.4511
31. Fischer, P., 1997, "An Overlapping Schwarz Method for Spectral Element Solution of the Incompressible Navier–Stokes Equations," J. Comput. Phys., 133(1), pp. 84–101. DOI: 10.1006/jcph.1997.5651
32. Fischer, P., Schmitt, M., and Tomboulides, A., 2017, "Recent Developments in Spectral Element Simulations of Moving-Domain Problems," Recent Progress and Modern Challenges in Applied Mathematics, Modeling and Computational Science (Fields Institute Communications, Vol. 79), Springer, New York, pp. 213–244.
33. Fischer, P., Loth, F., Lee, S., Smith, D., Tufo, H., and Bassiouny, H., 2005, "Parallel Simulation of High Reynolds Number Vascular Flows," Proceedings of Parallel Computational Fluid Dynamics 2005, College Park, MD, May 24–27, pp. 219–226. DOI: 10.1016/B978-044452206-1/50026-4
34. Obabko, A., Fischer, P., Tautges, T., Goloviznin, V. M., Zaytsev, M., Chudanov, V., Pervichko, V., Aksenova, A., and Karabasov, S., 2012, "Large Eddy Simulation of Thermo-Hydraulic Mixing in a T-Junction," Nuclear Reactor Thermal Hydraulics and Other Applications, BoD—Books on Demand, Elsevier, Amsterdam, The Netherlands.
35. Yuan, H., Yildiz, M. A., Merzari, E., Yu, Y., Obabko, A., Botha, G., Busco, G., Hassan, Y. A., and Nguyen, D. T., 2020, "Spectral Element Applications in Complex Nuclear Reactor Geometries: Tet-to-Hex Meshing," Nucl. Eng. Des., 357, p. 110422. DOI: 10.1016/j.nucengdes.2019.110422
36. Zwick, D., and Balachandar, S., 2020, "A Scalable Euler–Lagrange Approach for Multiphase Flow Simulation on Spectral Elements," Int. J. High Perform. Comput. Appl., 34(3), pp. 316–339. DOI: 10.1177/1094342019867756
37. Steger, J. L., Dougherty, F. C., and Benek, J. A., 1983, "A Chimera Grid Scheme [Multiple Overset Body-Conforming Mesh System for Finite Difference Adaptation to Complex Aircraft Configurations]," Association for Computing Machinery, New York.
38. Bassetti, F., Brown, D., Davis, K., Henshaw, W., and Quinlan, D., 1998, "OVERTURE: An Object-Oriented Framework for High-Performance Scientific Computing," SC'98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, Orlando, FL, Nov. 7–13, p. 14. DOI: 10.1109/SC.1998.10013
39. Peet, Y., and Fischer, P., 2012, "Stability Analysis of Interface Temporal Discretization in Grid Overlapping Methods," SIAM J. Numer. Anal., 50(6), pp. 3375–3401. DOI: 10.1137/110831234
40. Merrill, B., Peet, Y., Fischer, P., and Lottes, J., 2016, "A Spectrally Accurate Method for Overlapping Grid Solution of Incompressible Navier–Stokes Equations," J. Comput. Phys., 307, pp. 60–93. DOI: 10.1016/j.jcp.2015.11.057
41. Mittal, K., Dutta, S., and Fischer, P., 2019, "Nonconforming Schwarz-Spectral Element Methods for Incompressible Flow," Comput. Fluids, 191, p. 104237. DOI: 10.1016/j.compfluid.2019.104237
42. Mittal, K., Dutta, S., and Fischer, P., 2020, "Multirate Time-Stepping for the Incompressible Navier–Stokes Equations in Overlapping Grids," J. Comput. Phys., 437, p. 110335. DOI: 10.1016/j.jcp.2021.110335
43. Merrill, B., and Peet, Y., 2019, "Moving Overlapping Grid Methodology of Spectral Accuracy for Incompressible Flow Solutions Around Rigid Bodies in Motion," J. Comput. Phys., 390, pp. 121–151. DOI: 10.1016/j.jcp.2019.01.048
44. Mao, J., Che, S., Petrov, V., and Manera, A., 2023, "Benchmark Experiments for Turbulent Mixing in the Scaled-Down Upper Plenum of High-Temperature Gas-Cooled Reactors Under Accident Scenario," Nucl. Sci. Eng., pp. 1–15. DOI: 10.1080/00295639.2023.2186726
45. Leite, V. C., Merzari, E., Mao, J., Petrov, V., and Manera, A., 2023, "High-Fidelity Simulation of Mixing Phenomena in Large Enclosures," Nucl. Sci. Eng., pp. 1–18. DOI: 10.1080/00295639.2023.2186159
46. Geuzaine, C., and Remacle, J.-F., 2009, "Gmsh: A 3D Finite Element Mesh Generator With Built-in Pre- and Post-Processing Facilities," Int. J. Numer. Methods Eng., 79(11), pp. 1309–1331. DOI: 10.1002/nme.2579
47. Pope, S. B., 2000, Turbulent Flows, Cambridge University Press, Cambridge, UK.
48. Addad, Y., Gaitonde, U., Laurence, D., and Rolfo, S., 2008, "Optimal Unstructured Meshing for Large Eddy Simulations," Quality and Reliability of Large-Eddy Simulations, Springer, Dordrecht, The Netherlands, pp. 93–103.
49. Alwafi, A., Nguyen, T., Hassan, Y., and Anand, N., 2019, "Time-Resolved Particle Image Velocimetry Measurements of a Single Impinging Jet in the Upper Plenum of a Scaled Facility of High Temperature Gas-Cooled Reactors," Int. J. Heat Fluid Flow, 76, pp. 113–129. DOI: 10.1016/j.ijheatfluidflow.2019.02.003
50. Romano, P. K., Horelik, N. E., Herman, B. R., Nelson, A. G., Forget, B., and Smith, K., 2015, "OpenMC: A State-of-the-Art Monte Carlo Code for Research and Development," Ann. Nucl. Energy, 82, pp. 90–97. DOI: 10.1016/j.anucene.2014.07.048
51. Pandya, T. M., Johnson, S. R., Evans, T. M., Davidson, G. G., Hamilton, S. P., and Godfrey, A. T., 2016, "Implementation, Capabilities, and Benchmarking of Shift, a Massively Parallel Monte Carlo Radiation Transport Code," J. Comput. Phys., 308, pp. 239–272. DOI: 10.1016/j.jcp.2015.12.037
52. Romano, P. K., Hamilton, S. P., Rahaman, R. O., Novak, A., Merzari, E., Harper, S. M., Shriwise, P. C., and Evans, T. M., 2020, "A Code-Agnostic Driver Application for Coupled Neutronics and Thermal-Hydraulic Simulations," Nucl. Sci. Eng., 195(4), pp. 1–21. DOI: 10.1080/00295639.2020.1830620
53. Fang, J., Shaver, D., Tomboulides, A., Min, M., Fischer, P., Lan, Y., Rahaman, R., Romano, P., Benhamadouche, S., Hassan, Y., Kraus, A., and Merzari, E., 2021, "Feasibility of Full-Core Pin Resolved CFD Simulations of Small Modular Reactor With Momentum Sources," Nucl. Eng. Des., 378, p. 111143. DOI: 10.1016/j.nucengdes.2021.111143
54. Gill, D. F., Griesheimer, D. P., and Aumiller, D. L., 2017, "Numerical Methods in Coupled Monte Carlo and Thermal-Hydraulic Calculations," Nucl. Sci. Eng., 185(1), pp. 194–205. DOI: 10.13182/NSE16-3
55. Hamilton, S., Evans, T., Biondo, E., Merzari, E., Rahaman, R., Romano, P., and Harper, S., 2018, "Coupled Multiphysics Driver Implementation," Exascale Computing Project, Oak Ridge National Laboratory, Oak Ridge, TN, Milestone Report No. WBS 2.2.2.03 ECP-SE-08-61.
56. Hamilton, S., Biondo, E., Romano, P., Merzari, E., Rahaman, R., and Novak, A., 2019, "Coupled Assembly Analysis," Exascale Computing Project, American Nuclear Society, Chicago, IL, Milestone Report No. WBS 2.2.2.03 ECP-SE-08-66.
57. Fang, J., Shaver, D., Romano, P., and Merzari, E., 2023, "Large-Scale Multiphysics Simulations of Small Modular Reactors Operating in Natural Circulation," Proceedings of the 20th International Topical Meeting on Nuclear Reactor Thermal Hydraulics (NURETH-20), Washington, DC, Aug. 20–25, pp. 902–915.
58. Merzari, E., Hamilton, S., Evans, T., Min, M., Fischer, P., Kerkemeier, S., Fang, J., Romano, P., Lan, Y.-H., and Phillips, M., 2023, "Exascale Multiphysics Nuclear Reactor Simulations for Advanced Designs," Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, Nov. 12–17, pp. 1–11. DOI: 10.1145/3581784.3627038
59. Merzari, E., Yuan, H., Min, M., Shaver, D., Rahaman, R., Shriwise, P., Romano, P., Talamo, A., Lan, Y.-H., Gaston, D., Martineau, R., Fischer, P., and Hassan, Y., 2021, "Cardinal: A Lower-Length-Scale Multiphysics Simulator for Pebble-Bed Reactors," Nucl. Technol., 207(7), pp. 1118–1141. DOI: 10.1080/00295450.2020.1824471
60. Yildiz, M. A., Botha, G., Yuan, H., Merzari, E., Kurwitz, R. C., and Hassan, Y. A., 2020, "Direct Numerical Simulation of the Flow Through a Randomly Packed Pebble Bed," ASME J. Fluids Eng., 142(4), p. 041405. DOI: 10.1115/1.4045439
61. Reger, D., Merzari, E., Balestra, P., Schunert, S., Hassan, Y., and Yuan, H., 2023, "An Improved Pressure Drop Correlation for Modeling Localized Effects in a Pebble Bed Reactor," Nucl. Eng. Des., 403, p. 112123. DOI: 10.1016/j.nucengdes.2022.112123
62. Reger, D., Merzari, E., Balestra, P., Schunert, S., Hassan, Y., and King, S., 2023, "Direct Numerical Simulation and Large Eddy Simulation of a 67-Pebble Bed Experiment," Nucl. Technol., pp. 1–21. DOI: 10.1080/00295450.2023.2218245
63. Beare, R. J., Macvean, M. K., Holtslag, A. A. M., Cuxart, J., Esau, I., Golaz, J.-C., Jimenez, M. A., et al., 2006, "An Intercomparison of Large-Eddy Simulations of the Stable Boundary Layer," Boundary-Layer Meteorol., 118(2), pp. 247–272. DOI: 10.1007/s10546-004-2820-6
64. Rodrigo, J., Churchfield, M., and Kosovic, B., 2017, "A Methodology for the Design and Testing of Atmospheric Boundary Layer Models for Wind Energy Applications," Wind Energy Sci., 2(1), pp. 35–54. DOI: 10.5194/wes-2-35-2017
65. Monin, A. S., and Obukhov, A. M., 1954, "Basic Laws of Turbulent Mixing in the Surface Layer of the Atmosphere," Tr. Akad. Nauk SSSR Geophiz. Inst., 24(151), pp. 163–187. https://gibbs.science/efd/handouts/monin_obukhov_1954.pdf
66. Grotjans, H., and Menter, F., 1998, "Wall Functions for General Application CFD Codes," Proceedings of the 4th European Computational Fluid Dynamics Conference (ECCOMAS 98), Greece, Sept. 7–11, p. 1112. https://www.tib.eu/en/search/id/BLCP%3ACN027337170/Wall-Functions-for-General-Application-CFD-Codes/
67. Kuzmin, D., Mierka, O., and Turek, S., 2007, "On the Implementation of the k–ε Turbulence Model in Incompressible Flow Solvers Based on a Finite Element Discretisation," Int. J. Comput. Sci. Math., 1(2/3/4), pp. 193–206. DOI: 10.1504/IJCSM.2007.016531
68. Sullivan, P., McWilliams, J., and Moeng, C., 1994, "A Subgrid-Scale Model for Large-Eddy Simulation of Planetary Boundary-Layer Flows," Boundary-Layer Meteorol., 71(3), pp. 247–276. DOI: 10.1007/BF00713741
69. Schumann, U., 1975, "Subgrid Scale Model for Finite Difference Simulations of Turbulent Flows in Plane Channels and Annuli," J. Comput. Phys., 18(4), pp. 376–404. DOI: 10.1016/0021-9991(75)90093-5
70. Stolz, S., Schlatter, P., and Kleiser, L., 2005, "High-Pass Filtered Eddy-Viscosity Models for Large-Eddy Simulations of Transitional and Turbulent Flow," Phys. Fluids, 17(6), p. 065103. DOI: 10.1063/1.1923048
71. Smagorinsky, J., 1963, "General Circulation Experiments With the Primitive Equations," Mon. Weather Rev., 91(3), pp. 99–164. DOI: 10.1175/1520-0493(1963)091<0099:GCEWTP>2.3.CO;2