Tensor-Based Operator Assembly and Evaluation


The high performance (HPC) versions of the example codes in the miniapps/performance directory use a set of templated classes to efficiently implement the inner-most portion ($B^T D B$) of the fundamental finite element (FE) operator decomposition:

We take advantage of the tensor-product structure of both the finite element basis and the quadrature rule to efficiently apply the action of $B$ without necessarily computing its entries. This is generally know as sum factorization. In the case where we pre-compute and store the $D$ matrix, we call the algorithm partial assembly.

Template implementation

Below is a short description of the header files that are part of the initial reference implementation of the tensor-based finite element assembly and evaluation algorithms.


Configuration macros including the specification of the template blocking size (currently used in the matrix-matrix multiply).


Helper templated function to allow single implementation of multiple assignment operations on tensor entries.


Classes describing fixed size tensor layouts. Implemented are standard strided layouts for 1D/2D/3D/4D tensors. Layouts support reshape and sub-tensor operations independent of the actual data storage.

This header also contains another set of "vector layout" classes for converting scalar data indices into multi-component (vector) data indices. They are used to describe the layout of vector GridFunctions on global degrees of freedom (similar to the Ordering class). In the FE operator decomposition, these classes are used by the templated *_FiniteElementSpace classes (see fem/tfespace.hpp below) to implement the actions of $G$ and $G^T$ in the case of vector (multi-component) input and/or output fields.


Small matrix operations, defined by specializations: determinant, adjugate, etc. Matrix-matrix multiply, C = A.B, simple and blocked version.


Fixed-size container classes TVector, TMatrix, TTensor3, TTensor4 for 1D/2D/3D/4D tensors stored in column-major layout. Element-wise tensor operations: A {=,+=,*=} scalar; A {=,+=,*=} B. The latter allows different input and output layouts. With suitable layouts this can be used to permute (transpose) tensors, extract sub-tensors, etc. The tensor contraction/product operations:


The Mesh object templated by the finite element space and layout of the nodes. Provides MatchesGeometry() and MatchesNodes() functions to verify if compiled and runtime mesh agree.


Quadrature rules templated by geometry (triangles, quads, etc.) and integration order.


H1 and L2 finite elements templated by geometry and polynomial order.


Template *_FiniteElementSpace classes providing the mappings between global and local (element) degrees of freedom for H1 continuous and L2 discontinuous spaces. In the FE operator decomposition, these classes provide the element local action of $G$ (Extract methods) and $G^T$ (Assemble methods).


Templated versions of classes derived from the abstract class Coefficient. It encapsulates physical quantities like material properties, sources, boundary/initial conditions, etc. Its main functionality is to evaluate the coefficient at all quadrature points in an element, which is then used in the evaluation of the $D$ matrix.


Element transformation class, templated on a mesh type and an integration rule. It is constructed from a mesh (e.g. class TMesh) and shape evaluator (e.g. class ShapeEvaluator) objects. Allows computation of physical coordinates and Jacobian matrices corresponding to the reference integration points. The desired result (a combination of coordinates and/or Jacobian matrices at quadrature points, element attribute and/or element index) is specified through the template sub-class Result and stored in an object of the same type. The idea of this approach is to eliminate unnecessary evaluations if they are not needed. The need is determined based on what the particular "users" need. The "users" are the templated Coefficient and Kernel (see fem/tbilininteg.hpp below) classes which specify what they need through static constant boolean variables, e.g. uses_coordinates, uses_Jacobians, etc.


Classes for evaluating FE basis, ShapeEvaluator, and finite element functions, FieldEvaluator, and their derivatives at quadrature points, templated by a finite element class and an integration rule class. These correspond to the $B$ and $BG$ matrices above. Quads and hexes use the tensor-product structure for fast evaluation.


Kernel classes (e.g. mass, diffusion) that represent the matrix $D$ from the above FE operator decomposition. These classes also specify the type of the local operator that needs to be applied before and after the $D$ matrix - these are the $B_{in}$ and $B^T_{out}$ matrices, respectively. The product $B^T_{out} D B_{in}$ is the local element matrix, which is the result when using the BilinearFormIntegrator classes. This specifications of the types are given by static constant boolean variables, e.g. in_values and out_values. The Kernel classes provide the following methods:


Bilinear form operator, templated on the mesh, finite element space, integration rule and bilinear form integrator. Corresponds to the $A$ matrix above. Provides various assembly and evaluation schemes:


By default make builds the example drivers with the compiler used to compile MFEM. If g++ was used, a pseudo-code dump file with the optimized code will be generated (option -fdump-tree-optimized-blocks). The g++ option --param max-completely-peel-times=3 prevents the compiler from unrolling innermost loops (of size greater than 3), allowing the compiler to vectorize them. Some options for optimization/vectorization with the clang compiler are also included.


High-performance templated versions of the corresponding examples/ex*.cpp example codes.