Reduced Order Model-en - BARAM Portal

Reduced Order Model, ROM

The Reduced Order Model is a feature that predicts results for arbitrary conditions based on results calculated via batch run.

It analyzes data using Proper Orthogonal Decomposition (POD), creates a Reduced Order Model (ROM), and then uses this to generate 3D results for input conditions.

The generated results are in the same format as the computed results and are treated identically to the results from the batch run, being added to the batch calculation conditions.

It consists of three parts: ‘Build Model’, ‘Reconstruct from Model’ and ‘Model Enhancement’.

Build Model

Clicking the ‘Build ROM’ button opens a new window where you can select the conditions from the batch run to use when creating the ROM. Use the mouse to select the desired conditions (use Ctrl-A to select all) and click the Add button.

Pressing the OK button displays a window for selecting the variables to use in ROM creation. All variables from the Excel or CSV file that was read when defining the batch run conditions appear on the left; select the parameters to use from these. The Excel or CSV file may contain many variables that are not relevant to the calculation conditions. If these are not excluded, the ROM will not predict results accurately.

Pressing the OK button creates the ROM, and the selected conditions are displayed in ‘Snapshot Cases’.

Reconstruct from Model

For each parameter selected for ROM creation, set the conditions by moving the slider bar or entering a value. Enter the desired name in ‘Case name’ and click the ‘Reconstruct case with ROM’ button to generate the prediction results.

The generated result is added as the ‘Case Name’ in the ‘Run’ – ‘Batch Cases’ window. The added result is treated identically to other calculated results.

Right-clicking the created result and selecting ‘Load’ allows post-processing. Selecting ‘Open in ParaView’ from the menu opens this result in ParaView. Right-clicking and selecting ‘Schedule Calculation’, then pressing the ‘Start Calculation’ button, starts a calculation that uses the predicted result as the initial condition.

Model Enhancement

When the accuracy of ROM results is compromised by an insufficient number of samples, additional samples can be added. Clicking the ‘Evaluate/Enhance ROM’ button displays the window shown in Figure~\ref{fig:romE1}. Here, you can enter the number of additional samples to add and set the comparison value used to cross-validate the CFD and ROM results. The comparison value can be chosen from quantities such as an aerodynamic coefficient or a point, surface, or volume value.

Calculation conditions are automatically added at the points with the largest error using the Gaussian Process Regression (GPR) method. For each calculation condition, the ROM calculates the value, performs the CFD analysis, and then displays the difference between the two values in a table.

Overview of the reduced-order model

Here, we briefly summarize the core idea and implementation procedure of the Proper Orthogonal Decomposition (POD)-based Reduced Order Model (ROM). The gist is as follows. First, a matrix is created by collecting multiple “snapshots” from 3D CFD analyses. This matrix is decomposed using the same mathematical framework as Principal Component Analysis (PCA) in statistics, yielding orthogonal basis vectors ordered by decreasing energy together with their linear combination coefficients. Then, for new operating conditions, the learned “input condition → coefficients” relationship is interpolated (by linear regression, Gaussian process regression, simple AI regression, etc.) to predict the coefficients, and the entire flow field is quickly reconstructed as a linear combination of the basis vectors weighted by the predicted coefficients.

Snapshot matrix construction and principal component analysis

The fields obtained from the analysis results, such as velocity, temperature, and pressure, are high-dimensional data defined over space and time, and each can be flattened into a one-dimensional column vector whose length $n$ equals the number of mesh cells. A snapshot matrix $\mathbf{Y} \in \mathbb{R}^{n \times N}$ is created by stacking $N$ such column vectors obtained under different analysis conditions (inlet flow rate, temperature, etc. during operation). To center the data, we compute $\mathbf{\bar{y}}$, the average of the $N$ column vectors, and subtract it from each column of the snapshot matrix. The final centered snapshot matrix $\mathbf{Y_c} \in \mathbb{R}^{n \times N}$ is as follows.

$\mathbf{Y_c} = \begin{bmatrix} u_{11} - \bar{u}_1 & \cdots & u_{1N} - \bar{u}_1 \\ \vdots & \ddots & \vdots \\ u_{n1} - \bar{u}_n & \cdots & u_{nN} - \bar{u}_n \end{bmatrix}$

Here, $u_{ij}$ is the physical quantity value at the $i$th mesh point in the $j$th analysis result, and $\bar{u}_i$ is the value of the mean column vector $\mathbf{\bar{y}}$ at the $i$th mesh point.
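The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not BARAM's implementation; the function name `build_centered_snapshots` and the tiny 4-cell data set are hypothetical.

```python
import numpy as np

def build_centered_snapshots(fields):
    """Stack flattened fields as columns and subtract the mean column.

    fields: list of N arrays, each holding one field on the same mesh.
    Returns (Y_c, y_bar) with Y_c of shape (n, N) and y_bar of shape (n,).
    """
    # Flatten each result into a length-n vector and stack them as columns.
    Y = np.column_stack([np.asarray(f, dtype=float).ravel() for f in fields])
    # Mean over the N snapshots: one value per mesh cell.
    y_bar = Y.mean(axis=1)
    # Subtract the mean column from every snapshot column.
    Y_c = Y - y_bar[:, None]
    return Y_c, y_bar

# Hypothetical data: 3 snapshots on a 4-cell mesh.
fields = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([2.0, 3.0, 4.0, 5.0]),
    np.array([3.0, 4.0, 5.0, 6.0]),
]
Y_c, y_bar = build_centered_snapshots(fields)
```

After centering, every row of `Y_c` sums to zero, which is a quick sanity check on the result.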

Performing Singular Value Decomposition (SVD) on the snapshot matrix $\mathbf{Y_c}$ yields the basis vectors that most efficiently represent the entire set of analysis results. However, for typical CFD analysis results, the number of mesh cells is vastly larger than the number of snapshots ($n \gg N$). Therefore, directly applying SVD to the matrix $\mathbf{Y_c}$ itself incurs extremely high computational costs. To circumvent this, the covariance matrix $\mathbf{C}$ is defined as follows.

$\mathbf{C} = \mathbf{Y_c}^T \mathbf{Y_c}$

This matrix is a square matrix with a significantly reduced size ($N \times N$) compared to the snapshot matrix, enabling the rapid solution of the eigenvalue problem to compute eigenvalues and eigenvectors.

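The eigenvalue problem for $\mathbf{C}$ and the singular value decomposition of $\mathbf{Y_c}$ are written as follows.

$\mathbf{C} \vec{v_i} = \lambda_i \vec{v_i}$, $\mathbf{Y_c} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$

The eigenvalues/eigenvectors of $\mathbf{C}$ and the singular values/singular vectors of $\mathbf{Y_c}$ are related as follows. Since $\mathbf{C} = \mathbf{Y_c}^T \mathbf{Y_c} = \mathbf{V} \mathbf{\Sigma}^2 \mathbf{V}^T$, the eigenvectors $\vec{v_i}$ are the right singular vectors (the columns of $\mathbf{V}$), the singular values are $\sigma_i = \sqrt{\lambda_i}$, and the left singular vectors are obtained from

$\vec{u_i} = \frac{\mathbf{Y_c} \vec{v_i}}{\sigma_i}$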

The above $\vec{u_i}$ becomes the POD basis vector.
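The procedure in this section (form the small $N \times N$ matrix, solve its eigenvalue problem, then recover $\vec{u_i}$) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function name `pod_method_of_snapshots`, the tolerance `tol` used to discard numerically zero modes, and the random test data are all hypothetical and not taken from BARAM.

```python
import numpy as np

def pod_method_of_snapshots(Y_c, tol=1e-10):
    """POD basis from the N x N matrix C = Y_c^T Y_c."""
    C = Y_c.T @ Y_c                    # (N, N): cheap when n >> N
    lam, V = np.linalg.eigh(C)         # symmetric solver, ascending eigenvalues
    order = np.argsort(lam)[::-1]      # reorder to descending energy
    lam, V = lam[order], V[:, order]
    keep = lam > tol * lam[0]          # drop numerically zero modes
    lam, V = lam[keep], V[:, keep]
    sigma = np.sqrt(lam)               # singular values of Y_c
    U = (Y_c @ V) / sigma              # u_i = Y_c v_i / sigma_i, column by column
    return U, sigma, V

# Hypothetical data: n = 200 cells, N = 6 snapshots.
rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 6))
Y_c = Y - Y.mean(axis=1, keepdims=True)
U, sigma, V = pod_method_of_snapshots(Y_c)
```

Because the mean has been subtracted, the centered matrix has rank at most $N - 1$, so one mode is discarded here by the tolerance check. The remaining columns of `U` are orthonormal and `sigma` matches the leading singular values of a direct SVD of `Y_c`.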

The entire singular value decomposition process is performed quickly with only data loading and a few matrix operations, so it is completed in a time that is dozens of times shorter than the original CFD analysis time.

POD basis vectors and expansion coefficients

The N basis vectors $\vec{u_i}$ obtained by singular value decomposition of the snapshot matrix $\mathbf{Y_c}$ are mutually orthogonal and form representative spatial patterns of the flow field. Physically, this means the entire flow field can be approximated as a linear combination of these patterns.

$\mathbf{\vec{y}} = \mathbf{\bar{y}} + \sum_{i=1}^{N} a_i \vec{u_i}$

Here, $a_i$ is the expansion coefficient under the given condition. This represents the value obtained by projecting each snapshot onto the basis vectors, indicating how strongly that snapshot incorporates each spatial pattern (mode). These spatial patterns are listed in descending order of singular value during the calculation process. Modes with higher energy (larger singular values) describe the large-scale structures of the flow field (e.g., major jets, large-scale temperature distributions), while higher-order modes with lower energy primarily account for the detailed/residual structures.
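As a hedged aside, assuming the basis vectors have unit length (which holds when they are obtained from $\vec{u_i} = \mathbf{Y_c} \vec{v_i} / \sigma_i$ with unit-norm $\vec{v_i}$), the projection described above can be written explicitly as

$a_i = \vec{u_i}^T (\mathbf{\vec{y}} - \mathbf{\bar{y}})$

For the $j$th training snapshot, this reduces to $\sigma_i$ times the $j$th component of $\vec{v_i}$, so the training coefficients are available directly from the decomposition without an extra projection step.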

Generally, when a sufficient number of snapshots N is secured, the singular value $\sigma$ tends to decrease sharply after a certain number. This indicates that the spatial patterns that can form in a flow field are compressed into a small number of dominant modes. Therefore, even by considering only the top K modes out of the entire set, the main behavior of the entire flow field can be approximated with high accuracy.

$\mathbf{\vec{y}} = \mathbf{\bar{y}} + \sum_{i=1}^{N} a_i \vec{u_i} \approx \mathbf{\bar{y}} + \sum_{i=1}^{K} a_i \vec{u_i}$

At this point, K is often determined based on the cumulative energy ratio (e.g., 99% or higher). The basis vector set selected in this manner compresses the original vast amount of analytical data into representation by a very small number of modes. This order reduction dramatically improves computational efficiency while also enabling intuitive interpretation of the physically dominant flow structure.
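A cumulative-energy cutoff of this kind might be computed as below. The sketch assumes the common convention that the energy of mode $i$ is $\sigma_i^2$; the function name `select_num_modes` and the sample singular values are hypothetical.

```python
import numpy as np

def select_num_modes(sigma, energy=0.99):
    """Smallest K whose cumulative energy ratio reaches `energy`."""
    e = np.asarray(sigma, dtype=float) ** 2   # assumed energy per mode: sigma_i^2
    ratio = np.cumsum(e) / e.sum()            # cumulative energy ratio, ends at 1
    # Small slack so floating-point round-off cannot push K up by one.
    K = int(np.searchsorted(ratio, energy - 1e-12) + 1)
    return min(K, len(e))

# Hypothetical singular values, already sorted in descending order.
sigma = np.array([10.0, 3.0, 1.0, 0.1])
K = select_num_modes(sigma, energy=0.99)
```

With these sample values the first two modes already carry more than 99% of the total energy, so two modes are retained.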

Expansion coefficient interpolation and flow field reconstruction

Through Proper Orthogonal Decomposition (POD), the entire CFD analysis dataset can be represented as a linear combination of mode vectors $\vec{u}$ and scalar expansion coefficients $a$. Even when the analysis conditions change, the mode vectors $\vec{u}$ remain fixed spatial patterns, while only the expansion coefficient vector $\vec{a}$ varies with the analysis conditions. Therefore, by estimating the functional relationship between the input parameter vector $\vec{x}$ and the expansion coefficient vector $\vec{a}$, the flow field for arbitrary input conditions can be rapidly predicted.

Various interpolation techniques can be applied to estimate these functional relationships. The simplest methods are linear interpolation or least-squares regression, which fit a local plane from adjacent samples to estimate the expansion coefficient. More sophisticated approaches, such as Gaussian Process Regression (GPR) or artificial neural networks (AI Regression), can capture nonlinear relationships and precisely estimate complex mappings of high-dimensional input spaces.

For an input parameter vector $\vec{x}$ corresponding to analysis conditions that have not been computed, once the predicted expansion coefficient vector $\vec{a}$ is obtained, it can be immediately combined with the basis vectors through a linear combination to reconstruct the entire flow field.

$\mathbf{\hat{y}} = \sum_{i=1}^{K} a_i \vec{u_i} + \mathbf{\bar{y}}$

This process, which involves only simple interpolation and matrix operations, enables rapid prediction of the entire flow field within seconds to tens of seconds, even for large-scale analysis cases. Therefore, it can be utilized as an efficient prediction tool that significantly reduces computational costs while maintaining the key flow characteristics of the original analysis results. It also demonstrates potential for use in digital twins and real-time simulations.
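The interpolation and reconstruction steps can be illustrated with the simplest option mentioned above, a least-squares fit. This sketch uses an affine map from $\vec{x}$ to $\vec{a}$ as a stand-in; it is not the regression BARAM uses, and a GPR or neural-network model would replace `fit_coefficient_map` in practice. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_coefficient_map(X, A):
    """Least-squares fit of a ~ [1, x] W.

    X: (N, d) input parameters, A: (N, K) expansion coefficients.
    """
    Phi = np.hstack([np.ones((X.shape[0], 1)), X])   # affine features
    W, *_ = np.linalg.lstsq(Phi, A, rcond=None)
    return W

def reconstruct(x_new, W, U_K, y_bar):
    """Predict coefficients for x_new and rebuild the field."""
    phi = np.concatenate([[1.0], np.atleast_1d(x_new)])
    a = phi @ W                        # predicted coefficients, shape (K,)
    return y_bar + U_K @ a             # y_hat = y_bar + sum_i a_i u_i

# Synthetic test case: the field depends linearly on one parameter x.
n = 50
base = np.linspace(0.0, 1.0, n)
pattern = np.sin(np.pi * base)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.column_stack([base + x[0] * pattern for x in X])

y_bar = Y.mean(axis=1)
Y_c = Y - y_bar[:, None]
U, s, Vt = np.linalg.svd(Y_c, full_matrices=False)
K = 1
U_K = U[:, :K]
A = (U_K.T @ Y_c).T                    # training coefficients, shape (N, K)

W = fit_coefficient_map(X, A)
y_hat = reconstruct([2.5], W, U_K, y_bar)
```

In this synthetic case a single mode and an affine fit recover the field at the unseen condition $x = 2.5$ to round-off. Real flow fields generally need more modes and a nonlinear regressor.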

Data collation problem in parallel computing case

When data is distributed across multiple nodes for parallel analysis, one option is to collate all analysis results on a single node so that the POD operation is consistent across the entire domain. However, for large analysis datasets, loading one result column vector per snapshot, each covering tens of millions of mesh cells, may be impossible due to memory capacity constraints, and transferring the data over the network can also take considerable time.

To solve this, the calculation process for the covariance matrix can be modified. Originally, the covariance matrix $\mathbf{C}$ is computed from the entire snapshot matrix $\mathbf{Y}$ as follows.

$\mathbf{C} = \mathbf{Y}^T \mathbf{Y}$

At this point, the matrix $\mathbf{Y}$ is divided along the row direction into multiple pieces $\mathbf{Y_p}$ for parallel computation, with each piece residing on its own node. The number of rows of each piece $\mathbf{Y_p}$ equals the number of mesh cells in the partition assigned to that node, while the number of columns equals the number of snapshots in the dataset.

$\mathbf{Y} = \begin{bmatrix} \mathbf{Y_1} \\ \mathbf{Y_2} \\ \vdots \\ \mathbf{Y_P} \end{bmatrix}$

In the above equation, $P$ is the total number of nodes (distinct from the number of snapshots $N$). Now, by the properties of block matrix multiplication, the covariance matrix can be calculated as follows.

$\mathbf{C} = \mathbf{Y}^T \mathbf{Y} = \sum_{p=1}^{P} \mathbf{Y_p}^T \mathbf{Y_p}$

Each MPI process calculates the partial covariance $\mathbf{C_p} = \mathbf{Y_p}^T \mathbf{Y_p}$ for its own snapshot fragment $\mathbf{Y_p}$. Finally, all processes sum these $\mathbf{C_p}$ values using an MPI reduce operation to complete the global covariance matrix $\mathbf{C} = \sum_{p=1}^{P} \mathbf{C_p}$, which can then be used in the POD operation.

This method is a distributed POD computation technique that reduces both memory usage and communication bandwidth, achieving the same results without collecting the snapshot data itself. Each node independently performs only a local matrix multiplication, so computational bottlenecks are minimal. The only data exchanged over the network is the $N \times N$ covariance matrix, making the communication cost very small compared to transferring the snapshot data, whose size grows with the number of mesh cells. Consequently, stable POD processing becomes feasible even for large-scale parallel analysis datasets.
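The identity behind this scheme is easy to check numerically. The sketch below simulates the row partitions inside a single process with NumPy; the sizes and the number of partitions are hypothetical, and the final `np.sum` stands in for the MPI sum-reduction that a real parallel run would perform.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_snap, n_parts = 1000, 8, 4      # hypothetical sizes
Y = rng.standard_normal((n_cells, n_snap))

# Split the rows into blocks, as a domain decomposition would.
blocks = np.array_split(Y, n_parts, axis=0)

# Each "process" forms its local (n_snap x n_snap) product.
C_parts = [Y_p.T @ Y_p for Y_p in blocks]

# Stand-in for the MPI sum-reduction over processes.
C = np.sum(C_parts, axis=0)

# The summed partial products equal the product of the full matrix.
assert np.allclose(C, Y.T @ Y)
```

Because the mean vector is an average over snapshots taken cell by cell, each node can also center its own rows locally before forming its partial product.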

Memory usage issues in large-scale analysis cases

When the number of snapshots is large, in the thousands or tens of thousands, even calculating the covariance matrix on a single node can become difficult due to on-device memory limitations. In this case, randomized SVD can be applied as a technique to extract approximate eigenvectors without constructing the entire snapshot matrix, taking advantage of the fact that calculating only the top few basis vectors with high energy content can accurately simulate real-world physical phenomena.

While the traditional SVD performs a complete orthogonal decomposition on the entire snapshot matrix, the randomized SVD builds a low-rank subspace approximation through random projection and then computes a reduced SVD only within that subspace.

Specifically, we use random projections to construct an orthonormal basis $Q$ that approximately spans the range of $Y$, so that $Y \approx Q (Q^T Y)$, and then perform a singular value decomposition on $Q^T Y$, which is significantly smaller than the high-dimensional matrix $Y$.

The cost of this process is low, on the order of $O(N_{cell} k)$ per snapshot, where $k$ is the number of basis vectors to be used. That is, even if the number of snapshots runs into the thousands, the required number of modes can be limited to tens to hundreds. This allows POD basis vectors of nearly equivalent quality to be obtained without constructing the entire covariance matrix. Furthermore, this process is suitable for iterative updates or streaming processing, allowing the basis to be progressively updated by reading snapshots sequentially.
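A basic randomized SVD can be sketched with NumPy as follows. This is a generic textbook-style version, not BARAM's implementation: the function name `randomized_pod` and the `oversample` and `n_iter` parameters are assumptions, and for simplicity the sketch holds $Y$ in memory rather than streaming snapshots.

```python
import numpy as np

def randomized_pod(Y, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k left singular vectors of Y by random projection."""
    rng = np.random.default_rng(seed)
    n, N = Y.shape
    ell = min(k + oversample, N)             # sketch width
    Omega = rng.standard_normal((N, ell))    # random test matrix
    Q, _ = np.linalg.qr(Y @ Omega)           # orthonormal basis of the sampled range
    for _ in range(n_iter):                  # power iterations sharpen the basis
        Q, _ = np.linalg.qr(Y.T @ Q)
        Q, _ = np.linalg.qr(Y @ Q)
    B = Q.T @ Y                              # small (ell, N) matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                               # lift back to the full space
    return U[:, :k], s[:k], Vt[:k]

# Hypothetical test matrix with exact rank 5: n = 500 cells, N = 40 snapshots.
rng = np.random.default_rng(2)
Y = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 40))
U_k, s_k, Vt_k = randomized_pod(Y, k=5)
```

For a matrix whose rank does not exceed the sketch width, as in this test, the result agrees with the exact SVD up to round-off. For real snapshot data the accuracy depends on how quickly the singular values decay.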

Therefore, for large datasets, it is practical to apply approximate POD algorithms, such as randomized SVD, instead of the traditional exact SVD. This significantly reduces memory usage and computation time, while maintaining nearly the same accuracy by correctly calculating the energetically dominant basis vectors. Consequently, it becomes possible to build reduced-order models based on proper orthogonal decomposition even for multi-terabyte analysis datasets.

Characteristics and limitations of POD-ROM

The reduced order model (ROM) based on proper orthogonal decomposition (POD) can dramatically reduce the computational cost of iterative analyses by representing complex flow fields with a small number of dominant modes. However, its accuracy and efficiency depend directly on the quality and diversity of the snapshot data used in the learning phase.

In particular, the number of snapshots must be chosen to balance model accuracy against speed. Too few snapshots fail to capture key flow patterns, while too many reduce the computational efficiency of the reduced-order model. Therefore, the appropriate number of snapshots should be determined by considering factors such as the dimensionality of the input parameters, the complexity of the flow phenomenon, and the smoothness of the POD eigenvalue distribution (energy spectrum). Typically, at least a fourfold increase in efficiency is expected compared to a full CFD analysis.

Furthermore, the snapshot data must sufficiently cover the entire range of physical phenomena being predicted. For example, when predicting physical phenomena that fluctuate rapidly across the input parameter range, such as flow separation or vortex transition, the training snapshots should be designed to be uniformly distributed across the entire input parameter space, including the relevant interval.

POD-ROM is essentially a linear algebraic technique that constructs a new solution through linear combinations of snapshots. While this technique is effective for predicting continuous gradients or gradual changes, it can be less accurate for flow fields containing discontinuous gradients (such as shock waves and free surfaces). To overcome these limitations, various complementary methods, such as discontinuity-tracking POD and convolutional neural networks, are being researched.

In summary, POD-ROM is an excellent technique for rapidly predicting continuous and smooth flows, but to handle physical phenomena involving rapidly changing or discontinuous gradients, a sufficient dataset and complementary/extended technical approaches are required.