In calibrating model parameters, it is important to include a model discrepancy term to capture the physics missing from the simulation, which can result from numerical, measurement, and modeling errors. Ignoring the discrepancy may lead to biased calibration parameters and predictions, even as the number of observations increases. In this paper, a simple yet efficient calibration method based on sensitivity information is proposed for the case where the simulation model has a model error and/or numerical error but only a small number of observations are available. The sensitivity-based calibration method captures the trend of the observation data by matching the slope of simulation predictions and observations at different designs and then uses a constant value to compensate for the model discrepancy. The sensitivity-based calibration is compared with the conventional least squares calibration method and the Bayesian calibration method in terms of parameter estimation and model prediction accuracy. A cantilever beam example and a honeycomb tube crush example are used to illustrate the calibration process of the three methods. The sensitivity-based method performs similarly to the Bayesian calibration method and much better than the conventional method in both parameter estimation and prediction accuracy.

## Introduction

In engineering simulation, parameter calibration and model validation have been extensively studied to obtain accurate parameters and predictions from computational models. In particular, when model parameters cannot be measured directly, they are calibrated by comparing simulation results with experimental observations. In the conventional calibration method using least squares, model parameters are determined by minimizing the error between simulation results and experimental observations; this approach is still widely used in engineering applications. In this regard, optimization methods such as genetic algorithms [1,2], particle swarm optimization [3], the sensitivity method [4], and the tabu search method [5] have been used for calibration, where the goal is to match model predictions with measurements.

However, when the simulation has a model error and/or numerical error, matching experimental data with simulation results may lead to biased calibration parameters and predictions. That is, when the simulation is not accurate, the matching process may yield wrong parameters. With these biased parameters, the simulation results may agree well with the experimental data at the training points but have large errors at the validation points. Loeppky et al. [6] pointed out that accurate parameter values can only be obtained when the model discrepancy is included in the calibration process. For this reason, a discrepancy term needs to be introduced to capture the missing physics in the simulation model and obtain accurate calibration parameters.

Many studies [7] focused only on improving the prediction capability, not on the accuracy of the identified model parameters. However, accurately identifying the model parameters is important for several reasons [8]: (1) the calibrated parameters may need to be applied in other simulations or higher-level (system-level) simulations; (2) the calibrated parameters can serve as an important design criterion even though they cannot be measured directly; and (3) accurate parameters help to improve the prediction capability over a broad range of designs. Therefore, in this paper, both prediction accuracy and parameter estimation accuracy are evaluated as two key performance measures for calibration methods.

The current state-of-the-art calibration and validation procedures are mostly based on statistical approaches using the Bayesian method. For example, Kennedy and O'Hagan [7] proposed a Bayesian calibration framework that includes various sources of uncertainty and demonstrated that bias and overfitting in the estimation of physical parameters can be avoided or mitigated by introducing a discrepancy function. However, in this framework, Markov chain Monte Carlo is commonly used to obtain the posterior distributions and requires a large number of iterations. Han et al. [9] introduced a statistical methodology to simultaneously determine both tuning and calibration parameters based on the Bayesian method. Higdon et al. [10] developed a full Bayesian calibration method for models with multivariate outputs. Mahadevan et al. [11,12] discussed the effect of different model discrepancies in the Bayesian method and demonstrated the effectiveness of a discrepancy function on the calibration results. However, statistical methods require an insightful understanding of statistical theory and intensive computer implementation, which is not an easy task for industrial practitioners. Besides, these methods often demand a large amount of test/simulation data to evaluate statistical distributions.

In this paper, we propose a sensitivity-based calibration method that is simple enough for industrial practitioners. Instead of relying on statistical concepts, sensitivity information is used to calibrate the parameters, and a simple form of discrepancy function compensates for measurement error, model error, and/or numerical error. When only a limited number of observations are available, a constant discrepancy function is used to represent the missing physics between the simulation predictions and the observations. Even if the assumption of a constant discrepancy is limiting, its performance is much better than that of the conventional method, and it might be the only choice when the number of test data is small. The calibrated parameters and model predictions obtained from the proposed sensitivity-based method are compared with those of the conventional least squares calibration method and the Bayesian calibration method using an analytical cantilever beam example and a honeycomb crush example.

The remainder of this paper is organized as follows: Section 2 introduces the three calibration approaches: the conventional least squares, sensitivity-based, and Bayesian calibration approaches. An analytical cantilever beam example is used in Sec. 3 to compare the performance of the three methods, along with the detailed characteristics of the proposed sensitivity-based calibration method. Section 4 applies the proposed method to an engineering application, the crush simulation of a honeycomb tube, followed by conclusions in Sec. 5.

## Calibration Methods Under Model Error

In the following discussions, two types of variables will be used, namely, design variables and calibration parameters. Design variables are user-controllable variables of the model, such as thickness, width, and length. Their values can be easily changed to optimize the performance of a structure/system; in other words, they can be “designed.” Both the model response and the experimental data depend on them. Calibration parameters are parameters that are used in the simulation but cannot be measured easily or directly in physical tests. For the honeycomb crush example discussed in Sec. 4, the cell size *c* and thickness *t* are the design variables **d** that need to be optimized to improve the crashworthiness of the honeycomb structure. When these two variables change, both the test results and the simulation results change. The yield stress is an unknown material parameter that appears only in the simulation model and needs to be calibrated; in this case, the yield stress is the calibration parameter. The user wants to determine the calibration parameters such that the model prediction is consistent with the experimental data.

Given the design variables **d** and the calibration parameters $\theta $, the numerical response is represented as *y*_{pre}(**d**, $\theta $) and the experimental observation as *y*_{test}(**d**). Only the one-design-variable case is shown in Fig. 1; thus, the vector **d** can be represented by the scalar *d*. The simulation results can vary with different calibration parameters $\theta $ (such as material properties, boundary conditions, etc.). The experimental observations are divided into two groups (see Fig. 1): training points for calibrating the parameters and validation points for checking the prediction accuracy of the calibrated models. In addition, the relationship between the simulation results and the corresponding experimental results can be established as

$$y_{\text{test}}(\mathbf{d}) = y_{\text{pre}}(\mathbf{d},\theta) + \mathbf{e}_{\text{model}} + \mathbf{e}_{\text{num}} - \mathbf{e}_{\text{test}} \tag{1}$$

where **e**_{test}, **e**_{model}, and **e**_{num} are the measurement error, modeling error, and numerical error, respectively. In this study, only the biased parts of the above-mentioned errors are considered, by introducing a discrepancy function. The proposed method ignores the random part of the measurement error, whose effect can be reduced by averaging multiple measurements.

### Conventional Calibration Method Using Least Squares.

In the conventional calibration method, the calibration parameters are determined by directly matching the simulation results with the experimental observations, ignoring the errors (**e**_{test}, **e**_{model}, and **e**_{num}) between them, which can be formulated by

$$\mathbf{e} = y_{\text{pre}}(\mathbf{d},\theta) - y_{\text{test}}(\mathbf{d}) \tag{2}$$

where **e** is the residual between the calibrated simulation results *y*_{pre} and the experimental observations *y*_{test}. The residual may include the model error, measurement bias, and numerical error. When two or more observations are used, the parameters are calibrated by minimizing the sum of squared errors [13], given as

$$\theta^{*} = \arg\min_{\theta}\ \sum_{i=1}^{N} w_{i}\left[y_{\text{pre}}(\mathbf{d}_{i},\theta) - y_{\text{test}}(\mathbf{d}_{i})\right]^{2} \tag{3}$$

where *w*_{i} denotes the weight for the *i*th experimental observation, which can be chosen based on the quality or the importance level of the experimental data, and *N* is the number of observations used as training points. In this study, we assume an equal weight for all observations.

After solving for the optimal parameter $\theta *$ from Eq. (3), it can be used to predict the results at untested designs *y*_{pre}(**d**, $\theta *$). However, the parameter $\theta *$ may cause a large error (as shown in Fig. 1) at other designs, such as at *d*_{2}. It should be noted that when the simulation has a model error, using many data may not improve the calibration accuracy, which will be shown in the numerical example section.
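As a minimal sketch of the least squares calibration in Eq. (3), assuming a hypothetical one-parameter model $y_{pre}(d,\theta)=\theta d^2$ and made-up observations (both placeholders, not data from this study):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical simulation model y_pre(d, theta); any callable works here.
def y_pre(d, theta):
    return theta * d**2              # placeholder physics, illustration only

d_train = np.array([1.0, 2.0])       # training designs (assumed)
y_test = np.array([2.3, 8.9])        # observations (assumed)
w = np.ones_like(d_train)            # equal weights, as in the paper

# Eq. (3): minimize the weighted sum of squared residuals over theta.
def sse(theta):
    return np.sum(w * (y_pre(d_train, theta) - y_test)**2)

res = minimize_scalar(sse, bounds=(0.1, 10.0), method="bounded")
theta_star = res.x                   # biased if a model discrepancy exists
```

For this quadratic-in-*d* placeholder the optimum is also available in closed form, $\theta^{*}=\sum d_i^2 y_i/\sum d_i^4$, which serves as a check on the optimizer.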

### Sensitivity-Based Calibration Method.

In the proposed sensitivity-based calibration method, a constant discrepancy $\delta $ is introduced to represent the combined effect of **e**_{model}, **e**_{num}, and −**e**_{test}. Even if the true model error is not constant, the constant-model-error assumption of the proposed method yields better calibration results than the conventional method, as illustrated by the following two examples. Therefore, Eq. (1) can be rewritten as

$$y_{\text{test}}(\mathbf{d}) = y_{\text{pre}}(\mathbf{d},\theta) + \delta \tag{4}$$

More generally, the discrepancy can be expressed as a polynomial function of the design,

$$\delta(\mathbf{d}) = \sum_{i=0}^{p} C_{i}\,\mathbf{d}^{\,i} \tag{5}$$

where *C*_{i} is the *i*th order coefficient of the discrepancy function. When a high-order polynomial is selected as the discrepancy function, more coefficients must be estimated in the calibration process, which requires more observations. This study, however, focuses on the case with limited (2–3) observations, which is much more common in real-life engineering applications, so a complex discrepancy function may lead to overfitting and is beyond the scope of this study.
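With a constant discrepancy and two training designs *d*_{1} and *d*_{2}, the calibration consists of a slope-matching step and an offset step (the equation forms below are reconstructed from the surrounding description):

$$y_{\text{pre}}(d_{2},\theta) - y_{\text{pre}}(d_{1},\theta) = y_{\text{test}}(d_{2}) - y_{\text{test}}(d_{1}) \tag{6}$$

$$\delta = y_{\text{test}}(d_{1}) - y_{\text{pre}}(d_{1},\theta^{*}) \tag{7}$$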

As shown in Eq. (6), the effect of the model error is removed in the proposed calibration process. Once the calibration parameters are found, the effect of the model error, in the form of a discrepancy, can be included in the model predictions based on Eq. (7). Instead of directly matching the experiment, the sensitivity-based calibration method matches the slope (i.e., sensitivity) between two nearby observations. As shown in Fig. 2, the parameter $\theta 1$ from the conventional method can match the experimental value at *d*_{1}, but it fails to capture the trend, yielding a biased parameter and biased predictions at other design values, such as *d*_{2}. On the other hand, the parameter $\theta *$ from the sensitivity-based calibration method does not match the experimental value exactly, but it captures the trend of the true function and is taken as the calibrated parameter.

This method requires at least two data points and assumes a constant discrepancy. When the true discrepancy is not constant, the method averages out its effect, which still mitigates the error due to the discrepancy. In the analytical example, it will be shown that the constant discrepancy assumption significantly improves the parameter calibration as well as the model prediction.

When many data are available, it is possible to use a more complicated form of discrepancy in Eq. (7). However, requiring more data is against the main purpose of this paper. The advantage of the proposed method is to improve the estimation of model parameters and predictions at untested points with a small number of test data.
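The slope-matching procedure can be sketched with a hypothetical one-parameter model and made-up observations (all values assumed for illustration, not from this study):

```python
import numpy as np
from scipy.optimize import brentq

def y_pre(d, theta):                 # placeholder simulation model (assumed)
    return theta * d**2

d1, d2 = 1.0, 2.0                    # two training designs (assumed)
y1, y2 = 2.3, 8.9                    # test observations (assumed)

# Eq. (6): match the slope of predictions to the slope of observations.
def slope_mismatch(theta):
    return (y_pre(d2, theta) - y_pre(d1, theta)) - (y2 - y1)

theta_star = brentq(slope_mismatch, 0.1, 10.0)   # root of the mismatch

# Eq. (7): constant discrepancy compensating the remaining offset
# (averaged over the training points, which helps when the true error
# is not exactly constant).
delta = np.mean([y1 - y_pre(d1, theta_star), y2 - y_pre(d2, theta_star)])

def y_cal(d):                        # calibrated prediction with discrepancy
    return y_pre(d, theta_star) + delta
```

Unlike the least squares fit, `theta_star` here reproduces the trend of the data exactly whenever the model form is correct up to a constant offset.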

### Bayesian Calibration Framework.

The most widely used Bayesian calibration framework was introduced by Kennedy and O'Hagan [7]. It predicts the true behavior based on experimental data while calibrating the parameters so that the prediction is most likely to represent the experimental results.

The simulation GP, $ypre(d,\theta )$, models the simulation behavior in terms of the design **d** and the unknown calibration parameters $\theta $; the discrepancy GP, $\delta (d)$, models the error in the calibrated simulation behavior; and $ytrue(d)$ is a GP model of the true response.

The variance of the posterior distribution is the prediction uncertainty measure of the Bayesian calibration. However, it is impractical to find the calibration parameter and the hyperparameters simultaneously, because this Bayesian calibration framework [7] calibrates the parameters by maximizing the likelihood of the simulation together with the other hyperparameters. Therefore, correlation between parameters can lead to inaccurate calibration. A detailed description of the Bayesian calibration framework is given in the Appendix.

## Cantilever Beam Example

In this section, the calibration methods are demonstrated using a cantilever beam example. It is assumed that the Timoshenko beam model represents the true model, while the Euler beam model is the simulation model. Under the assumption of zero measurement error, the experimental observation is the same as the true model.

In this example, the tip deflection of the beam is selected as the quantity of interest and the height *h* of the beam as the design variable. As shown in Fig. 3, the length *L* and width *b* of the beam are 20 mm and 2 mm, respectively. The concentrated force *F* applied at the tip is 600 N. The beam has a Poisson's ratio *υ* = 0.36 and a Young's modulus *E*_{0} = 68,000 MPa. The Young's modulus *E* is considered the calibration parameter in the simulation model; that is, the test data are generated using the true Young's modulus in the Timoshenko beam theory. The objective is to find the true Young's modulus and the tip deflection at different designs using the Euler beam model.

Based on the Euler beam theory, the tip deflection *y*_{pre}(*h*, *E*) can be given as [14]

$$y_{\text{pre}}(h,E) = \frac{4FL^{3}}{Ebh^{3}} \tag{10}$$

By comparing the two models, one can see that the second term in Eq. (11) corresponds to the discrepancy, which varies as a function of *h*. Therefore, the model error is proportional to the inverse of design, while we will use a constant model error in the proposed calibration process.

Figure 4 compares the results of the true model (black solid curve) and the simulation model (blue dashed dot curve).

### Case I: Conventional Least Squares Calibration Based on One Data Point (*h*_{1} = 8).

Assume that a single experimental observation is available at *h* = 8.0 mm. In the conventional calibration method, the parameter can be identified by matching the model prediction with the experimental data at this point, as

With a training point at *h* = 8.0 mm, the Young's modulus is calculated as *E* = 60,932 MPa. The relative error between the calibrated and true parameter is therefore 10.4%. Figure 4 shows the prediction of the model with the calibrated parameter. As expected, the prediction is accurate at the training point (*h* = 8.0 mm), but the error gradually increases with the distance from the current design. This happens because the model includes a model error in addition to the unidentified calibration parameter.
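The one-point matching above can be sketched numerically. The Timoshenko shear coefficient κ = 5/6 (rectangular section) is an assumption, since the paper does not state its exact shear model, so the calibrated value here lands near, but not exactly at, the reported 60,932 MPa:

```python
# Worked sketch of Case I: one-point least squares calibration of E.
# Assumption: shear coefficient kappa = 5/6; with the paper's (unstated)
# shear model the result would differ slightly.
F, L, b = 600.0, 20.0, 2.0          # load [N], length [mm], width [mm]
E0, nu = 68000.0, 0.36              # true Young's modulus [MPa], Poisson ratio
kappa = 5.0 / 6.0                   # assumed shear correction factor

def euler_tip(h, E):
    """Eq. (10): Euler beam tip deflection, 4FL^3/(E b h^3)."""
    return 4.0 * F * L**3 / (E * b * h**3)

def timoshenko_tip(h):
    """'True' response: Euler bending term plus Timoshenko shear term."""
    G = E0 / (2.0 * (1.0 + nu))     # shear modulus
    return euler_tip(h, E0) + F * L / (kappa * G * b * h)

# Matching y_pre(h1, E) = y_test(h1) at the single training point h1 = 8 mm
# has a closed-form solution because y_pre is proportional to 1/E.
h1 = 8.0
E_cal = 4.0 * F * L**3 / (b * h1**3 * timoshenko_tip(h1))
rel_err = abs(E_cal - E0) / E0      # parameter bias from ignoring discrepancy
```

With κ = 5/6 the calibrated modulus comes out around 60,100 MPa, i.e., a double-digit percentage bias of the same character as reported in the paper.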

### Case II: Calibration Based on Two Data Points (*h*_{1} = 8, *h*_{2} = 10)

#### Conventional Calibration Methods Using Least Squares.

When two data points (*h* = 8 and 10 mm) are available, it may be impossible to match the predictions and the test data simultaneously at both points unless the model form is perfect. In this case, the residuals between predictions and test data are minimized using optimization as

When two data points are used, the optimum parameter turns out to be *E* = 60,200 MPa, which is about 11.5% relative error. Therefore, using two data points did not improve the calibration results.

#### Sensitivity-Based Calibration Method.

In the case of the sensitivity-based calibration method, the unknown parameter is determined by matching the difference between two data points, followed by identifying the constant value of discrepancy term.

By utilizing Eq. (6) at *h*_{1} = 8 and *h*_{2} =10 mm, the parameter is calibrated as *E* = 64,914 MPa, which corresponds to 4.5% relative error. Then, the constant discrepancy $\delta $ was obtained by substituting the *E* = 64,914 into Eq. (7), to yield *δ* = 0.0189.

As can be seen from Eq. (11), the discrepancy function is not constant. The error in parameter calibration is due to the assumption that the discrepancy is constant. If an accurate form of the discrepancy is used, it is possible that the sensitivity-based calibration method can identify the exact Young's modulus, but in general, it is impossible to know the form of discrepancy function. However, the relative error of 4.5% in the calibration parameter is much better than 11.5% of the conventional least squares method.

Figure 5 compares the performance of sensitivity-based calibration method with that of the conventional least squares method. It was observed that the sensitivity-based method can capture the trend of the model accurately before introducing the discrepancy term (blue dashed curve). With the constant discrepancy term, this method can predict the true model more accurately than the conventional least squares method (red dashed curve).
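The two-point sensitivity-based calibration can be sketched numerically as well. As in the Case I sketch, κ = 5/6 is an assumed shear coefficient, so the result lands near, but not exactly at, the reported *E* = 64,914 MPa:

```python
# Worked sketch of Case II: sensitivity-based calibration with two points.
# kappa = 5/6 is an assumed shear coefficient (the paper's exact shear
# model is unstated), so E_cal only approximates the reported 64,914 MPa.
F, L, b = 600.0, 20.0, 2.0
E0, nu, kappa = 68000.0, 0.36, 5.0 / 6.0
G = E0 / (2.0 * (1.0 + nu))

def y_pre(h, E):
    """Euler simulation model, Eq. (10)."""
    return 4.0 * F * L**3 / (E * b * h**3)

def y_test(h):
    """Timoshenko 'experiment': Euler term plus shear term."""
    return y_pre(h, E0) + F * L / (kappa * G * b * h)

h1, h2 = 8.0, 10.0
# Eq. (6): y_pre(h2, E) - y_pre(h1, E) = y_test(h2) - y_test(h1).
# Since y_pre is proportional to 1/E, the condition solves in closed form.
num = 4.0 * F * L**3 / b * (1.0 / h2**3 - 1.0 / h1**3)
E_cal = num / (y_test(h2) - y_test(h1))

# Eq. (7): constant discrepancy compensating the remaining offset.
delta = y_test(h1) - y_pre(h1, E_cal)
```

The parameter error drops from roughly 11% (least squares) to under 5%, mirroring the comparison in the paper.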

#### Bayesian Calibration Method.

The Bayesian calibration process consists of hyperparameter and calibration parameter estimation and is carried out in three steps:

1. Estimate the hyperparameters of $ypre(h,\theta )$. Instead of directly using the simulation model in Eq. (10), the Bayesian calibration method builds an approximation using a GP model to represent the simulation response [17] in terms of the design variable *h* and the calibration parameter $\theta $. In this case, a quadratic trend function was used for the simulation model, with 24 sampling points in the design space $58,000 \le \theta \le 68,000$ and $7 \le h \le 13$ generated by Latin hypercube sampling.
2. Estimate the hyperparameters of the discrepancy function *δ*(*h*). A constant trend function is used for the discrepancy GP model. Since two test data points are available, the differences between the simulation responses and the test results are used as samples to fit the discrepancy model.
3. Estimate the calibration parameter *θ*. The calibration parameter $\theta $ of *E* = 64,911 MPa is estimated based on the previously estimated hyperparameters.

Note that the relative error in the calibration parameter is about 4.5%, which is similar to that of the sensitivity-based method. As shown in Fig. 6, the prediction from the Bayesian method (red cross marks) was also similar to that of the sensitivity-based method. It is interesting that the Bayesian method can estimate comparatively accurate parameters, together with several hyperparameters of the Gaussian model, from only two observations. This may be because these hyperparameters are correlated with each other. It indicates that if the calibration parameters are correlated, it is still possible to estimate their values even when the number of observations is less than the number of calibration parameters.

### Case III: Calibration Based on More Than Two Data Points.

An important observation in calibration under model error is that adding more data may not improve the calibration and prediction accuracy. This is related to the lack of knowledge of the model. To study the effect of the number of data points on the calibration and validation accuracy, the same procedure as in Sec. 3.2 is applied with three (*h*_{1} = 8, *h*_{2} = 10, *h*_{3} = 12), four (*h*_{1} = 8, *h*_{2} = 9, *h*_{3} = 10, *h*_{4} = 12), and five (*h*_{1} = 8, *h*_{2} = 9, *h*_{3} = 10, *h*_{4} = 11, *h*_{5} = 12) training points. Figure 7 shows the results for all the methods. For the conventional and sensitivity-based methods, using more data does not improve the parameter and prediction accuracy much, because a constant discrepancy function is used. This can be further improved if better knowledge of the model error is available; of course, a more complicated discrepancy requires more test data. For example, if the true discrepancy form were used, the calibration would recover the exact *E* = 68,000 MPa with a perfect prediction. For the Bayesian calibration method, however, the calibration and validation results can be slightly improved by using more observation data.

### Case IV: Calibration of Two Parameters Based on Three Observations.

To demonstrate the performance of sensitivity-based calibration when multiple parameters need to be calibrated, the cantilever beam example is extended to two calibration parameters ($\theta $ = {*E*, *L*}). In this case, three observations were used as the training points (*h* = 8, 10, 12 mm) and one observation as the validation point (*h* = 13 mm). As shown in Table 1, the sensitivity-based calibration method generally outperforms the conventional least squares calibration method and has an accuracy similar to that of the Bayesian method.

| Method | *E* (MPa) | Relative error (%) | *L* (mm) | Relative error (%) | *V*_{tip} (mm) at validation (*h* = 13) | Relative error (%) |
|---|---|---|---|---|---|---|
| True value | 68,000 | | 20 | | 0.08394 | |
| Conventional method | 53,770 | 20.93 | 19.31 | 3.46 | 0.07313 | 12.88 |
| Sensitivity-based method | 66,667 | 1.96 | 18.62 | 6.92 | 0.08504 | −1.31 |
| Bayesian calibration | 66,691 | 1.92 | 20.19 | −0.95 | 0.08495 | −1.20 |

## Honeycomb Tube Example

### Problem Description.

The honeycomb structure is one of the most commonly used cellular structures in engineering applications. For lightweight automotive applications, very thin plates are often used, whose material parameters are difficult to measure accurately from specimen tests. In this study, quasi-static crush experiments on aluminum honeycomb tubes were used to calibrate the material properties.

Three cylindrical honeycomb tube designs with different combinations of cell size *c* and thickness *t* are used in this paper (see Fig. 9). As shown in Fig. 8, C3T8 denotes the tube with a cell size of 3 mm and a thickness of 0.08 mm. The other two cross-sectional configurations are C3T7 (*c* = 3 mm, *t* = 0.07 mm) and C6T5 (*c* = 6 mm, *t* = 0.05 mm). The cross-sectional diameters of the tubes are 57.35 mm, and their lengths are 150 mm. It should be noted that, under the assumption that the discrepancy between the test and simulation results is independent of the design variables, the sensitivity-based method can be applied to high-dimensional design problems; at least two test data points are needed even in the high-dimensional case.

### Finite Element Modeling.

The crushing behavior of the above-mentioned honeycomb tubes was simulated with the commercial finite element analysis code LS-DYNA. As shown in Fig. 9, the finite element model is composed of the honeycomb tube with a given cross-sectional configuration, the striker, and the base. The striker, with a mass of 600 kg, moves down from the top of the tube at a constant velocity of *v* = 1 m/s to simulate the quasi-static load. The base fully constrains the bottom end of the tube. Belytschko-Lin-Tsay shell elements with five integration points through the thickness were used to model the honeycomb tubes. To avoid volumetric locking, the reduced integration technique was utilized with hourglass control to suppress spurious zero-energy deformation modes. Based on a mesh convergence study, a mesh size of 0.5 mm was selected for the tube. The interfaces between the tube and the striker, as well as between the tube and the rigid base, were simulated with the “automatic node to surface” algorithm. To avoid interpenetration during crushing, “automatic single surface” contact was employed to simulate the contact among the tube walls. A friction coefficient of 0.15 was selected for the Coulomb friction model on all contact surfaces [18].

An elastoplastic material model (MAT3) in LS-DYNA was used for Aluminum 3003-H18, as the honeycomb tubes undergo large plastic deformation [19]. The mechanical properties are: density = 2700 kg/m^{3}, Poisson's ratio = 0.33, Young's modulus = 68 GPa, and initial yield stress $\sigma y$ = 185 MPa [20–22]. Since the yield stress has a significant effect on the energy absorption capability, it was selected as the calibration parameter.

### Axial Crush Tests for Honeycomb Tubes.

As shown in Fig. 10(a), nine honeycomb tubes (three replicates for each of the design points C6T5, C3T7, and C3T8) were prepared for the crushing test. Wire-cut electrical discharge machining was used to cut the honeycomb block into cylindrical tubes with a precision of 20 *μ*m. The material of the specimens was the same as in the simulation model, namely, Aluminum 3003-H18. The specimens also have the same geometry as the simulation model.

The axial quasi-static tests for the three honeycomb designs were conducted on an MTS 322 testing machine with 100 kN capacity (Fig. 10(b)). Figure 11 plots the load–displacement curves for the three honeycomb designs. The reaction force increases with increasing thickness and decreases with increasing cell size. All of the honeycomb tubes deformed in a stable progressive buckling mode: the force increases rapidly in the initial stage and then oscillates during the crush. The energy absorption capacity is mainly governed by the oscillating region, which is closely related to the average force (*F*_{avg}). Therefore, the average force is selected as the performance criterion to evaluate the crashworthiness of the honeycomb tubes.

### Calibration and Validation Results Analysis and Discussion.

In this study, C6T5 and C3T8 were used as the training points and C3T7 as the validation point, as shown in Fig. 8. The objective is to calibrate the yield stress and predict *F*_{avg} accurately at the validation point. To find the calibration parameter, 12 uniformly distributed samples in the range $100\ \text{MPa} \le \sigma_y \le 200\ \text{MPa}$ were used to build the surrogate model. A linear polynomial response surface was employed to approximate *F*_{avg} for each honeycomb tube for the conventional least squares and sensitivity-based calibration methods. Five random validation points were generated to evaluate the fitting accuracy of the response surface; the relative errors between the surrogate model and the simulation results at these points were less than 5%, indicating that the surrogate provides an acceptable approximation for the subsequent calibration and validation. The Bayesian calibration method, on the other hand, used a GP model to represent the simulation responses.
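With linear response surfaces, the sensitivity-based calibration reduces to a closed-form solve. The surrogate coefficients and mean crush forces below are illustrative placeholders, not the paper's fitted values; only the procedure mirrors the study:

```python
# Sensitivity-based calibration through linear response surfaces.
# All numbers below are assumed for illustration, NOT the paper's data.
surrogate = {                         # F_avg(sigma_y) ~ a + b * sigma_y  [kN]
    "C6T5": (0.40, 0.0100),           # (a, b)  # assumed
    "C3T8": (1.10, 0.0260),           # (a, b)  # assumed
}
F_test = {"C6T5": 2.20, "C3T8": 5.80}  # assumed measured F_avg [kN]

(a1, b1) = surrogate["C6T5"]
(a2, b2) = surrogate["C3T8"]

# Eq. (6) analogue: match the difference of surrogate predictions across
# the two training designs to the difference of the test values; linearity
# in sigma_y makes this a one-line solve.
sigma_y = ((F_test["C3T8"] - F_test["C6T5"]) - (a2 - a1)) / (b2 - b1)

# Eq. (7) analogue: constant discrepancy shared by all designs.
delta = F_test["C6T5"] - (a1 + b1 * sigma_y)
```

The calibrated `sigma_y` plus `delta` then predicts *F*_{avg} at the held-out design, here C3T7, through its own surrogate.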

The calibrated parameters and validation results are plotted in Fig. 12. The yield stress obtained from the sensitivity-based method is 177 MPa, which is the closest to the true value of 185 MPa. The yield stress of 176 MPa from the Bayesian method is similar and also outperforms the conventional least squares result (167 MPa) in parameter calibration. For this example, the prediction accuracy at both the validation point and the training points was evaluated by the sum of the prediction errors, which indicates that the sensitivity-based method achieves better accuracy over the whole design range than the conventional method.

The relative errors of the calibration and validation results are depicted in Fig. 13. To exclude the effect of the choice of validation point, all three possible combinations of the test results were used to obtain calibration and validation results (see Fig. 13). For example, when C6T5 and C3T7 were selected as the training points, C3T8 was used as the validation point, and vice versa. The calibration errors are the relative errors of the estimated calibration parameter with respect to the true yield stress [20–22]. The validation errors are the sum of the relative errors at both the training points and the validation point. The sensitivity-based method performed generally well for all combinations, obtaining comparatively accurate calibration parameters and good prediction capability over the whole design range, while the Bayesian calibration yielded a poor validation result when C3T7 and C3T8 were used as training points. This might be related to the difficulty of finding the maximum likelihood in a high dimension.

## Conclusions

This study applied a sensitivity-based approach to engineering problems with model error and numerical error but a limited number of observation data. The method uses a constant discrepancy to account for model error, numerical error, and/or measurement error when the performance is mildly nonlinear in the design space, even if it is highly nonlinear in the physical domain, as in the crash problem of Sec. 4. In this regard, the sensitivity-based calibration method matches the sensitivity of the simulation predictions with that of the observations to estimate the calibration parameters, and then finds the constant value that compensates for the discrepancy between the predictions and the observations. A cantilever beam model and a honeycomb tube example were used to demonstrate the benefits of the proposed method.

For the cantilever beam example, the Timoshenko beam model was regarded as the true model and the Euler beam model as the simulation model. Three calibration methods, i.e., the conventional calibration method using least squares, the sensitivity-based calibration method, and the Bayesian calibration method, were compared. Both the sensitivity-based and the Bayesian calibration methods performed well in the two-data case; however, the Bayesian calibration method requires building a GP model from many simulation samples. It was also observed that when a model error is present, having more data does not improve the accuracy of calibration and prediction. The accuracy depends on the correctness of the discrepancy function, which requires knowledge of the model error.

In the second example, the yield stress of the aluminum honeycomb structure was calibrated based on crushing test results at two training design points, and the prediction was evaluated at a validation point. This is a real engineering two-dimensional design problem. The calibration and validation results obtained from the sensitivity-based method and the Bayesian method were similar to each other, and the estimated parameters from these two methods were more accurate than those of the conventional method. Their prediction accuracies at both the calibration and validation points were much better than those of the conventional least squares method, indicating that the sensitivity-based and Bayesian calibrations have good prediction capability over the whole design range.

Overall, the sensitivity-based and Bayesian calibration methods generally perform better than the conventional least squares method for parameter estimation and model validation. However, the Bayesian calibration method requires many simulations to build the GP model. Furthermore, it is important to include the model discrepancy (due to model error or numerical error in the simulation) to obtain accurate calibration and prediction. It should be noted that the sensitivity-based calibration method can be applied to high-dimensional design problems; all it needs is at least two test observations. It is also acknowledged that when more data and more knowledge of the model error are available, a more complicated form of discrepancy function can be included to improve the prediction accuracy.

## Acknowledgment

The first author is a recipient of the doctoral scholarships from China Scholarship Council (CSC).

## Funding Data

U.S. Department of Energy, National Nuclear Security Administration (Contract No. DE-NA0002378).

### Appendix

The simulation GP model represents the simulation response in terms of the design variables **d** and the unknown calibration parameters $\theta $, together with its prediction uncertainty, as a Gaussian distribution with hyperparameters $\psi m={bm,\sigma m,\lambda m,\lambda \theta}$. Here, $bm$ and $\sigma m$ denote a trend coefficient vector and a process standard deviation, and $\lambda m$ and $\lambda \theta $ are the roughness parameters for the design variables **d** and the calibration parameters $\theta $, respectively. It should be noted that once the calibration parameters are fixed at $\theta *$, the calibrated simulation response, $ymodel(d,\theta *)$, is a function of **d** only, while the uncalibrated model $ymodel(d,\theta )$ is a function of both **d** and $\theta $.

Similarly, the discrepancy GP model $\delta (d)$ has the hyperparameter vector $\psi \delta ={b\delta ,\sigma \delta ,\lambda \delta}$.

The maximum likelihood method is often used to estimate the parameters of GP models, where the likelihood function is obtained using the combined GP model in the form of a multivariate normal distribution.

When the simulation data set *y*_{pre} is given at the data points *D*_{1} = {(**x**_{1}, $\theta 1$),…,(**x**_{n}, $\theta n$)} and the test data set *y*_{test} at *D*_{2} = {**x**_{1},…,**x**_{m}}, the GP model for the given data sets is expressed as

It is noted that many simulations are required to build the GP model. In general, it is assumed that a relatively large number of data points are generated from simulations, while only a small number come from tests; that is, $n \gg m$.

The model parameters are found by maximizing the likelihood function. However, it is impractical to find the calibration parameter and the hyperparameters simultaneously for most problems because of the large number of unknowns. Kennedy and O'Hagan [7] introduced a sequential parameter estimation approach. In this framework, however, the calibrated parameter $\theta $ may not be physical, because the framework calibrates the parameters by maximizing the likelihood of the simulation together with the other hyperparameters. Therefore, correlation between parameters can lead to inaccurate calibration.
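As a minimal, numpy-only sketch of the quantity being maximized, the log marginal likelihood of a zero-mean GP is shown below; the squared exponential kernel and the small jitter term are assumptions for illustration, not the paper's exact choices:

```python
import numpy as np

# Squared exponential kernel with process std sigma and roughness lam
# (assumed forms; other kernels/trends fit the same likelihood template).
def sq_exp_kernel(X, Y, sigma, lam):
    d2 = (X[:, None] - Y[None, :])**2
    return sigma**2 * np.exp(-lam * d2)

def gp_log_likelihood(x, y, sigma, lam, noise=1e-8):
    """Log marginal likelihood -0.5*(y' K^-1 y + log|K| + n log 2*pi)."""
    K = sq_exp_kernel(x, x, sigma, lam) + noise * np.eye(len(x))
    Lc = np.linalg.cholesky(K)                 # K = Lc @ Lc.T
    alpha = np.linalg.solve(Lc.T, np.linalg.solve(Lc, y))
    logdet = 2.0 * np.sum(np.log(np.diag(Lc))) # log|K| from the Cholesky factor
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * len(x) * np.log(2 * np.pi)
```

Hyperparameter estimation then amounts to maximizing `gp_log_likelihood` over `sigma` and `lam` (and, in the full framework, the calibration parameter as well), which is exactly where the many-unknowns difficulty noted above arises.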