AN IMPLICIT BLOCK JACOBI APPROACH FOR THE HIGH-ORDER FLUX RECONSTRUCTION METHOD
Abstract: The flux reconstruction (FR) method, a high-order scheme for unstructured grids, has recently attracted growing attention for its simple construction and generality. However, applying FR to large-scale complex flow simulations remains computationally expensive and time consuming, so efficient implicit solvers and parallel computing techniques suited to FR are urgently needed. This paper proposes an implicit time-marching method, based on block Jacobi iteration, for solving the steady two-dimensional Euler equations with a high-order FR scheme on a single GPU. Directly solving the large global linear system that results from the FR spatial discretization and the implicit temporal discretization is inefficient and memory intensive. Block Jacobi iteration changes the characteristics of the left-hand-side matrix of the global linear system and removes the dependence on neighboring elements that limits parallelism, so only the diagonal blocks of the global matrix need to be stored and computed. Solving the global linear system is thereby transformed into solving a series of local, element-wise linear systems, and these small local systems are solved in parallel on the GPU by LU decomposition. Two typical cases, subsonic inviscid flow over a two-dimensional bump and around a NACA0012 airfoil, were simulated and compared with a multigrid-accelerated explicit Runge-Kutta scheme. The numerical results show that the implicit method needs far fewer iterations and much less computing time to converge, and that it is at least 10x faster (one order of magnitude) than the multigrid Runge-Kutta scheme in all cases.
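The core idea of the abstract, replacing one global solve with many independent per-element solves, can be illustrated with a minimal sketch. The code below is not the authors' solver: it applies block Jacobi iteration to a small generic linear system, factors each diagonal block once by LU decomposition with partial pivoting, and reuses the factors in every sweep. The function names (`lu_factor`, `lu_solve`, `block_jacobi`), the three-element test system, and the tolerance are all assumptions made for illustration. The off-diagonal blocks are stored explicitly here only to keep the example self-contained; the abstract states that the method itself stores just the diagonal blocks.

```python
def lu_factor(a):
    """LU factorization with partial pivoting, stored in place; returns (lu, perm)."""
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy of the block
    perm = list(range(n))               # perm[k] = original row now at position k
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("singular diagonal block")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]        # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, rhs):
    """Solve A x = rhs using the factors from lu_factor."""
    n = len(lu)
    y = [rhs[perm[i]] for i in range(n)]
    for i in range(n):                  # forward substitution (unit-diagonal L)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):        # back substitution (U)
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def matvec(block, vec):
    return [sum(a * v for a, v in zip(row, vec)) for row in block]


def block_jacobi(diag, off, rhs, tol=1e-12, max_iter=500):
    """Block Jacobi: x_i <- D_i^{-1} (b_i - sum_j A_ij x_j_old) for every element i."""
    factors = [lu_factor(d) for d in diag]      # factor each diagonal block once
    x = [[0.0] * len(b) for b in rhs]
    for it in range(1, max_iter + 1):
        x_new = []
        for i, (lu, perm) in enumerate(factors):
            local = rhs[i][:]
            for j, block in off[i]:             # coupling uses the previous iterate only
                coupling = matvec(block, x[j])
                local = [r - c for r, c in zip(local, coupling)]
            x_new.append(lu_solve(lu, perm, local))
        change = max(abs(new - old)
                     for xn, xo in zip(x_new, x) for new, old in zip(xn, xo))
        x = x_new
        if change < tol:
            return x, it
    return x, max_iter


# Hypothetical test system: 3 "elements" in a row, 2 unknowns each.
# The right-hand side is built so that the exact solution is all ones.
D = [[4.0, 1.0], [1.0, 3.0]]
C = [[-1.0, 0.0], [0.0, -1.0]]
diag = [[row[:] for row in D] for _ in range(3)]
off = [[(1, C)], [(0, C), (2, C)], [(1, C)]]
rhs = [[4.0, 3.0], [3.0, 2.0], [4.0, 3.0]]

solution, iterations = block_jacobi(diag, off, rhs)
```

Within one sweep, each local solve reads only values from the previous iterate, so the loop over elements carries no dependence between its iterations. That independence is the property the abstract exploits on the GPU; the serial Python loop here stands in for what would be a batch of concurrent small LU solves.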