## Abstract

Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design. However, there is currently a lack of general techniques that can jointly fuse multiple data sets with varying fidelity levels while also estimating calibration parameters. To address this gap, we introduce a novel approach that, using latent-map Gaussian processes (LMGPs), converts data fusion into a latent space learning problem where the relations among different data sources are automatically learned. This conversion endows our approach with some attractive advantages such as increased accuracy and reduced overall costs compared to existing techniques that need to take a combinatorial approach to fuse multiple datasets. Additionally, we have the flexibility to jointly fuse any number of data sources and the ability to visualize correlations between data sources. This visualization allows an analyst to detect model form errors or determine the optimum strategy for high-fidelity emulation by fitting LMGP only to the sufficiently correlated data sources. We also develop a new kernel that enables LMGPs to not only build a probabilistic multi-fidelity surrogate but also estimate calibration parameters with high accuracy and consistency. The implementation and use of our approach are considerably simpler and less prone to numerical issues compared to alternative methods. Through analytical examples, we demonstrate the benefits of learning an interpretable latent space and fusing multiple (in particular more than two) sources of data.

## 1 Introduction

Computer models are increasingly employed in the analysis and design of complex systems. For a particular system, there are typically various models available whose fidelity is generally related to their costs; i.e., accurate models are generally more expensive. In such a scenario, *multi-fidelity modeling* techniques are adopted to balance costs and accuracy when using all these models in the analyses [1,2]. Additionally, computer models typically have some *calibration* parameters which are estimated by systematically comparing their predictions to experiments/observations [3]. These parameters either correspond to some properties of the underlying system being modeled or act as tuning knobs that compensate for the model deficiencies. In this paper, we introduce a versatile, efficient, and unified approach for emulation-based multi-fidelity modeling and calibration (henceforth, we use the term data fusion to refer to both multi-fidelity modeling and calibration because both involve fusing or assimilating multiple sources of data). Our approach is based on latent-map Gaussian processes and its core idea is to convert data fusion into a learning process where different data sources are related in a nonlinearly learned manifold.

Over the past few decades, many data fusion techniques have been developed for outer-loop applications such as design optimization, sequential sampling, or inverse parameter estimation. For example, multi-fidelity modeling can be achieved via space mapping [4–6] or multi-level [7–9] techniques where the inputs of the low-fidelity data are mapped via $\boldsymbol{x}_l = \boldsymbol{F}(\boldsymbol{x}_h)$. In this equation, $\boldsymbol{x}_l$ and $\boldsymbol{x}_h$ are the inputs of low- and high-fidelity data sources, respectively, and $\boldsymbol{F}(\cdot)$ is the transformation function whose *predefined* functional form is calibrated such that $y_l(\boldsymbol{F}(\boldsymbol{x}_h))$ approximates $y_h(\boldsymbol{x}_h)$ as closely as possible. These techniques are particularly useful in applications where higher fidelity data are obtained by successively refining the discretization of the simulation domain [7,9], e.g., by refining the mesh when simulating the flow over an airfoil. The main disadvantage of space mapping techniques is that choosing a near-optimal functional form for $\boldsymbol{F}(\cdot)$ is iterative and very cumbersome.

Two of the most important aspects of multi-fidelity modeling are choosing the emulators that surrogate the data sources and formulating the relation between these emulators. Correspondingly, several methods have been developed based on Gaussian processes (GPs) [3], Co-Kriging [10], polynomial chaos expansions [11,12], and moving least squares [13]. The interested reader is referred to Refs. [2,14] for more comprehensive reviews on multi-fidelity modeling and how it benefits outer-loop applications.

Multi-fidelity modeling is closely related to the calibration of computer models since the latter also involves working with at least two data sources where typically the low-fidelity one possesses the calibration parameters. Besides the traditional ways of estimation that are ad hoc and involve trial and error, there are more systematic methods that are based on generalized likelihood [15] or Bayesian principles [16].

Among existing methods for multi-fidelity modeling and calibration, the most popular emulator-based method in engineering design is that of Kennedy and O’Hagan (KOH) [3] which assimilates and emulates two data sources while estimating calibration parameters of the low-fidelity source (if there are any such parameters). KOH’s approach is one of the first attempts that considers a broad range of uncertainty sources arising during the calibration and subsequent uses of the emulator. This approach has been used in many applications including climate simulations [17], materials modeling [18], and modeling shock hydrodynamics [19].

KOH’s approach assumes that the discrepancies between the two data sources are additive^{2} and that both data sources and the discrepancy between them can be modeled via GPs. The approach then uses (fully [20,21] or modular [18,22–25]) Bayesian inference to find the posterior estimates of the GPs as well as the calibration parameters. The fully Bayesian version of KOH’s method offers advantages such as low computational costs for small data sets or quantifying various uncertainty sources (e.g., lack of data, noise, model form error, and unknown simulation parameters). However, obtaining the joint posteriors via Markov chain Monte Carlo (MCMC) is quite effortful and expensive, especially in high dimensions or with relatively large datasets. The modular version of KOH’s approach addresses this limitation by typically using point estimates for the GP hyperparameters of the low-fidelity data [3,23]. These estimates are obtained via maximum likelihood estimation (MLE) and, while they result in a small under-estimation of uncertainties with small data, provide accurate mean predictions.

A major limitation of KOH’s approach and other reviewed data fusion techniques is that they only accommodate two data sources at a time. That is, the fusion process must be repeated *p* times if there are *p* low-fidelity and one high-fidelity data sources. In addition to being tedious and expensive, this repetitive process does not provide a straightforward diagnostic mechanism for comparing the low-fidelity sources to identify, e.g., which one(s) perform similarly or have the smallest model form error.

In this paper, we aim to address the abovementioned limitations of the existing technologies for data fusion. Our primary contributions are threefold and summarized as follows. First, we convert multi-fidelity modeling into a latent space learning problem. This conversion is achieved via latent-map Gaussian processes (LMGPs) and endows our approach with important advantages such as flexibility to jointly fuse any number of data sources and ability to visualize correlations between them. This visualization provides the user with an easy-to-interpret diagnostic measure for identifying the relations between different data sources. We believe the joint fusion (of more than two sources) and the accompanying visualization aids reduce the overall costs of multi-fidelity modeling compared to reviewed methods since they eliminate the iterative process of data source selection and link the fusion results across the iterations (note that our approach is also applicable to problems with two data sources). Second, we develop a new kernel function that enables LMGPs to not only build a probabilistic multi-fidelity surrogate but also estimate calibration parameters with high accuracy and consistency. Third, the implementation of our approach is considerably simpler and less prone to numerical issues compared to the reviewed technologies (especially KOH’s approach).

The rest of the paper is organized as follows. In Sec. 2, we briefly review the relevant technical background on GPs and LMGPs (see Sec. 7 for Nomenclature). In Sec. 3, we introduce our approach to multi-fidelity modeling and calibration while demonstrating its performance on four pedagogical examples. In Sec. 4, we validate our approach against GPs and KOH’s method on six analytic and engineering examples. We conclude the paper in Sec. 5 by discussing the advantages and limitations of our approach, considerations that should be made in its application, and its application to multi-response problems.

## 2 Emulation via Latent-Map Gaussian Processes

We review emulation via GPs and a variation of GPs (i.e., LMGP) for data sets that include categorical inputs. Throughout, symbols or numbers enclosed in parentheses encode sample numbers and are used either as subscripts or as superscripts. For example, $x_{(i)}$ or $x^{(i)}$ denotes the $i$th sample in a training data set while $x_i$ indicates the $i$th component of the vector $\boldsymbol{x} = [x_1, x_2, \ldots, x_{d_x}]^T$. We use $h$ and $l$ either as superscript or as subscript to denote high- and low-fidelity data sources. For instance, $\boldsymbol{x}_h^{(i)}$ and $y_h^{(i)}$ denote, respectively, the inputs and output of the $i$th sample in the high-fidelity data set. In cases where there is more than one low-fidelity source, we add a number to the $l$ symbol, e.g., $y_{l3}(\boldsymbol{x})$ denotes the third low-fidelity source. Lastly, we distinguish between the data source (or the underlying function) and samples by specifying the functional dependence (e.g., $y(\boldsymbol{x})$ is a function while $y$ and $\boldsymbol{y}$ are, respectively, a scalar and a vector of values).

We denote the inputs and output of a training sample by the $d_x$-dimensional vector $\boldsymbol{x} = [x_1, x_2, \ldots, x_{d_x}]^T$ and the scalar $y$, respectively. Assume the training data come from a realization of a GP defined as $\eta(\boldsymbol{x}) = \boldsymbol{f}(\boldsymbol{x})\boldsymbol{\beta} + \xi(\boldsymbol{x})$ where $\boldsymbol{f}(\boldsymbol{x}) = [f_1(\boldsymbol{x}), \ldots, f_h(\boldsymbol{x})]$ are a set of pre-determined parametric functions and $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_h]^T$ are the unknown coefficients. $\xi(\boldsymbol{x})$ is a zero-mean GP whose parameterized covariance function is

$$\mathrm{cov}(\xi(\boldsymbol{x}), \xi(\boldsymbol{x}')) = c(\boldsymbol{x}, \boldsymbol{x}') = \sigma^2 r(\boldsymbol{x}, \boldsymbol{x}') \quad (1)$$

where $\sigma^2$ is the process variance and $r(\cdot, \cdot)$ is a user-defined parametric correlation function. There are many types of correlation functions [26,27], but the most common one is the Gaussian kernel

$$r(\boldsymbol{x}, \boldsymbol{x}') = \exp\{-(\boldsymbol{x} - \boldsymbol{x}')^T \Omega_x (\boldsymbol{x} - \boldsymbol{x}')\} \quad (2)$$

where $-\infty < \omega_i < \infty$ are the roughness or scale parameters (in practice the ranges are limited to $-10 < \omega_i < 6$ to ensure numerical stability [26,28]) and $\Omega_x = \mathrm{diag}(10^{\boldsymbol{\omega}})$.
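As a concrete reference, the Gaussian kernel above can be sketched in a few lines of NumPy; the function and variable names are ours, not from the paper:

```python
import numpy as np

def gaussian_corr(X1, X2, omega):
    """Gaussian correlation r(x, x') = exp{-(x - x')^T Omega_x (x - x')}
    with Omega_x = diag(10^omega); X1 is n1 x d and X2 is n2 x d."""
    scales = 10.0 ** np.asarray(omega, float)    # diagonal of Omega_x
    diff = X1[:, None, :] - X2[None, :, :]       # pairwise differences: n1 x n2 x d
    return np.exp(-np.sum(scales * diff ** 2, axis=-1))

X = np.array([[0.0, 0.0], [0.5, 0.2], [1.0, 1.0]])
R = gaussian_corr(X, X, omega=[0.0, 0.0])        # omega = 0 gives Omega_x = I
```

Note that the resulting matrix has unit diagonal and is symmetric, as any valid correlation matrix must be.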

The correlation function in Eq. (2) depends on the distance between two arbitrary input points $\boldsymbol{x}$ and $\boldsymbol{x}'$. Hence, traditional GPs cannot accommodate categorical inputs (such as gender and zip code) as they do not possess a distance metric. This issue is well established in the literature, and there exist a number of strategies that address it by reformulating the covariance function such that it can handle categorical variables [29–32]. In this paper, we use LMGPs [33] which are recently developed and shown to outperform previous methods.

Assume the inputs also include the $d_t$ categorical variables $\boldsymbol{t} = [t_1, \ldots, t_{d_t}]^T$ where the number of levels of $t_i$ is $m_i$. For instance, $t_1 = \{92697, 92093\}$ and $t_2 = \{math, physics, chemistry\}$ are two categorical inputs that encode zip code ($m_1 = 2$ levels) and course subject ($m_2 = 3$ levels), respectively. Inputs for mixed (numerical and categorical) training data are collectively denoted by $\boldsymbol{u} = [\boldsymbol{x}; \boldsymbol{t}]$, which is a column vector of size $(d_x + d_t) \times 1$. To handle mixed inputs, LMGP maps categorical variables to some points in a manifold. This mapping allows using any standard correlation function such as the Gaussian which is reformulated as follows:

$$r(\boldsymbol{u}, \boldsymbol{u}') = \exp\{-(\boldsymbol{x} - \boldsymbol{x}')^T \Omega_x (\boldsymbol{x} - \boldsymbol{x}') - (\boldsymbol{z} - \boldsymbol{z}')^T (\boldsymbol{z} - \boldsymbol{z}')\}$$

where $\boldsymbol{z} = \boldsymbol{z}(\boldsymbol{t})$ denotes the latent representation of $\boldsymbol{t}$. To find these points in the latent space, LMGP first assigns a unique vector (i.e., a prior representation) to each combination of categorical variables. Then, it uses matrix multiplication to map each of these vectors to a point in a manifold of dimensionality $d_z$:

$$\boldsymbol{z}(\boldsymbol{t}) = \boldsymbol{\zeta}(\boldsymbol{t}) \boldsymbol{A}$$

where $\boldsymbol{\zeta}(\boldsymbol{t})$ is the $1 \times \sum_{i=1}^{d_t} m_i$ unique prior vector representation of $\boldsymbol{t}$ and $\boldsymbol{A}$ is a $\sum_{i=1}^{d_t} m_i \times d_z$ matrix that maps $\boldsymbol{\zeta}(\boldsymbol{t})$ to $\boldsymbol{z}(\boldsymbol{t})$. In this paper, we use $d_z = 2$ since it simplifies visualization and has also been shown to provide sufficient flexibility for learning the latent relations [33].
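A minimal sketch of the latent mapping and the reformulated kernel, with an illustrative (not learned) mapping matrix `A`; all names are our own:

```python
import numpy as np

def mixed_corr(x1, x2, z1, z2, omega):
    """Reformulated Gaussian kernel for mixed inputs: the quantitative part uses
    Omega_x = diag(10^omega) while the latent part uses an identity metric."""
    scales = 10.0 ** np.asarray(omega, float)
    dx = np.asarray(x1, float) - np.asarray(x2, float)
    dz = np.asarray(z1, float) - np.asarray(z2, float)
    return float(np.exp(-np.sum(scales * dx ** 2) - np.sum(dz ** 2)))

# Map one-hot priors zeta(t) to d_z = 2 latent points via z(t) = zeta(t) A.
zeta = np.array([[1.0, 0.0],      # prior for level 1 of a 2-level variable
                 [0.0, 1.0]])     # prior for level 2
A = np.array([[0.0, 0.0],         # illustrative (not learned) 2 x 2 map
              [0.3, -0.1]])
Z = zeta @ A                      # latent positions, one row per level
```

Identical latent points leave the correlation untouched, while distant points shrink it, which is exactly how the latent space ends up encoding the relations between categorical levels.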

We can construct $\boldsymbol{\zeta}$ in a number of ways; see Ref. [33] for more information on selecting the priors. In this paper, we use a form of one-hot-encoding. Specifically, we first construct the $1 \times m_i$ vector $\boldsymbol{\nu}^i = [\nu_1^i, \nu_2^i, \ldots, \nu_{m_i}^i]$ for the categorical variable $t_i$ such that $\nu_j^i = 1$ when $t_i$ is at level $j$ and $\nu_k^i = 0$ when $t_i$ is at level $k \neq j$ for $k \in \{1, 2, \ldots, m_i\}$. Then, we set $\boldsymbol{\zeta}(\boldsymbol{t}) = [\boldsymbol{\nu}^1, \boldsymbol{\nu}^2, \ldots, \boldsymbol{\nu}^{d_t}]$. For instance, in the above example with two categorical variables, $t_1 = \{92697, 92093\}$ and $t_2 = \{math, physics, chemistry\}$, we encode the combination $\boldsymbol{t} = [92093, physics]^T$ by $\boldsymbol{\zeta}(\boldsymbol{t}) = [0, 1, 0, 1, 0]$ where the first two elements encode zip code while the rest encode the subject.
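The one-hot prior construction can be sketched as follows (a hypothetical helper, not the authors' code); it reproduces the $[0, 1, 0, 1, 0]$ encoding of the zip-code/subject example:

```python
def zeta_prior(t, levels):
    """One-hot prior zeta(t): concatenate, per categorical variable, an indicator
    vector with a 1 at the active level and 0 elsewhere."""
    out = []
    for value, lvls in zip(t, levels):
        out += [1 if value == lvl else 0 for lvl in lvls]
    return out

levels = [[92697, 92093], ["math", "physics", "chemistry"]]
enc = zeta_prior([92093, "physics"], levels)   # -> [0, 1, 0, 1, 0]
```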

The hyperparameters $\boldsymbol{A}$, $\boldsymbol{\beta}$, $\boldsymbol{\omega}$, and $\sigma^2$ must be determined based on the data. These estimates can be found via either cross-validation (CV) or MLE. Alternatively, Bayes' rule can be applied to find posterior distributions of the hyperparameters if prior knowledge is available. In this paper, MLE is employed because it provides a high generalization power while minimizing the computational costs [27,34]. MLE works by estimating $\boldsymbol{A}$, $\boldsymbol{\beta}$, $\boldsymbol{\omega}$, and $\sigma^2$ such that they maximize the likelihood of the $n$ training data being generated by $\eta(\boldsymbol{x})$. This optimization can be equivalently expressed as

$$[\hat{\boldsymbol{\omega}}, \hat{\boldsymbol{A}}] = \underset{\boldsymbol{\omega}, \boldsymbol{A}}{\operatorname{argmin}} \; L = \frac{n}{2} \log(\hat{\sigma}^2) + \frac{1}{2} \log(|\boldsymbol{R}|)$$

where $\boldsymbol{R}$ and $\hat{\sigma}^2$ are now functions of both $\boldsymbol{\omega}$ and $\boldsymbol{A}$, $\log(\cdot)$ is the natural logarithm, $|\cdot|$ denotes the determinant operator, $\boldsymbol{y} = [y_{(1)}, \ldots, y_{(n)}]^T$ is the $n \times 1$ vector of outputs in the training data, $\boldsymbol{R}$ is the $n \times n$ correlation matrix with the $(i, j)$th element $R_{ij} = r(\boldsymbol{x}_{(i)}, \boldsymbol{x}_{(j)})$ for $i, j = 1, \ldots, n$, and $\boldsymbol{F}$ is the $n \times h$ matrix with the $(k, l)$th element $F_{kl} = f_l(\boldsymbol{x}_{(k)})$ for $k = 1, \ldots, n$ and $l = 1, \ldots, h$. By setting the partial derivatives with respect to $\boldsymbol{\beta}$ and $\sigma^2$ to zero, their estimates can be solved in terms of $\boldsymbol{\omega}$ and $\boldsymbol{A}$ as follows:

$$\hat{\boldsymbol{\beta}} = (\boldsymbol{F}^T \boldsymbol{R}^{-1} \boldsymbol{F})^{-1} \boldsymbol{F}^T \boldsymbol{R}^{-1} \boldsymbol{y} \quad (6)$$

$$\hat{\sigma}^2 = \frac{1}{n} (\boldsymbol{y} - \boldsymbol{F} \hat{\boldsymbol{\beta}})^T \boldsymbol{R}^{-1} (\boldsymbol{y} - \boldsymbol{F} \hat{\boldsymbol{\beta}}) \quad (7)$$

By minimizing $L$, one can solve for $\hat{\boldsymbol{A}}$ and $\hat{\boldsymbol{\omega}}$ and subsequently obtain $\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^2$ using Eqs. (6) and (7). While many heuristic global optimization methods exist such as genetic algorithms [35] and particle swarm optimization [36], gradient-based optimization techniques based on, e.g., the L-BFGS algorithm [37], are generally preferred due to their ease of implementation and superior computational efficiency [26,38]. With gradient-based approaches, it is essential to start the optimization via numerous initial guesses to improve the chances of achieving global optimality [33,38].

After obtaining the hyperparameters via MLE, the response at any $x^*$ is estimated via $\mathbb{E}[y^*] = \boldsymbol{f}(x^*)\hat{\boldsymbol{\beta}} + \boldsymbol{g}^T(x^*)\boldsymbol{V}^{-1}(\boldsymbol{y} - \boldsymbol{F}\hat{\boldsymbol{\beta}})$ where $\mathbb{E}$ denotes expectation, $\boldsymbol{f}(x^*) = [f_1(x^*), \ldots, f_h(x^*)]$, $\boldsymbol{g}(x^*)$ is an $n \times 1$ vector with the $i$th element $c(\boldsymbol{x}_{(i)}, x^*) = \hat{\sigma}^2 r(\boldsymbol{x}_{(i)}, x^*)$, and $\boldsymbol{V}$ is the covariance matrix with the $(i, j)$th element $\hat{\sigma}^2 r(\boldsymbol{x}_{(i)}, \boldsymbol{x}_{(j)})$. Additionally, the posterior covariance between the responses at the two inputs $x^*$ and $\boldsymbol{x}'$ is $\mathrm{cov}(y^*, y') = c(x^*, \boldsymbol{x}') - \boldsymbol{g}^T(x^*)\boldsymbol{V}^{-1}\boldsymbol{g}(\boldsymbol{x}') + \boldsymbol{h}(x^*)(\boldsymbol{F}^T\boldsymbol{V}^{-1}\boldsymbol{F})^{-1}\boldsymbol{h}(\boldsymbol{x}')^T$ where $\boldsymbol{h}(x^*) = (\boldsymbol{f}(x^*) - \boldsymbol{F}^T\boldsymbol{V}^{-1}\boldsymbol{g}(x^*))$.
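A sketch of the posterior-mean formula above (helper name and example values are ours). A useful property to verify: when $x^*$ coincides with a training input, $\boldsymbol{g}(x^*)$ equals a column of $\boldsymbol{V}$, so the predictor interpolates the training data:

```python
import numpy as np

def gp_posterior_mean(f_star, g_star, F, V, y, beta):
    """E[y*] = f(x*) beta + g(x*)^T V^{-1} (y - F beta)."""
    return float(f_star @ beta + g_star @ np.linalg.solve(V, y - F @ beta))

# When x* is the i-th training point, g(x*) is the i-th column of V, so
# g^T V^{-1} picks out the i-th residual and the predictor returns y_i exactly.
V = np.array([[1.0, 0.3],
              [0.3, 1.0]])              # covariance matrix (sigma2_hat = 1)
F = np.ones((2, 1))                     # constant mean basis
y = np.array([1.0, 2.0])
beta = np.array([0.5])
pred = gp_posterior_mean(np.array([1.0]), V[:, 0], F, V, y, beta)
```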

The above formulations can be easily extended to cases where the data set is noisy. GPs (and hence LMGPs) can address noise and smoothen data by using a nugget or jitter parameter, $\delta$, which is incorporated into the correlation matrix. That is, $\boldsymbol{R}$ becomes $\boldsymbol{R}_\delta = \boldsymbol{R} + \delta \boldsymbol{I}_{n \times n}$ where $\boldsymbol{I}_{n \times n}$ is the identity matrix of size $n \times n$. If the nugget parameter is used, the estimated (stationary) noise variance in the data will be $\delta \hat{\sigma}^2$. The version of LMGP used in this paper finds only one nugget parameter and uses it for all categorical combinations; i.e., we assume that the noise level is the same for each data set. LMGP can be modified in a straightforward manner to have a separate nugget parameter (and hence a separate noise estimate) for each categorical combination.
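The nugget modification is a one-line change; the numbers below (the value of `delta` and the variance estimate) are purely illustrative:

```python
import numpy as np

# Nugget regularization: R_delta = R + delta * I. The implied stationary noise
# variance is delta * sigma2_hat (both values here are illustrative).
R = np.array([[1.0, 0.4],
              [0.4, 1.0]])
delta = 1e-2
R_delta = R + delta * np.eye(2)
noise_variance = delta * 3.5        # with a hypothetical sigma2_hat = 3.5
```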

## 3 Proposed Framework for Data Fusion

In this section, we first explain the core idea and rationale of our approach in Sec. 3.1. Then, we detail how it is used for multi-fidelity modeling and calibration in Secs. 3.2 and 3.3, respectively. In the latter two subsections, we provide pedagogical examples to facilitate the discussions and elaborate on the benefits of the learned latent space in diagnosing the results. The notation introduced in Sec. 2 is also used here (see Sec. 7 for Nomenclature).

### 3.1 The Rationale Behind Using a Latent Space for Data Fusion.

Factors that affect the fidelity of various data sources are either known or not; in either case, they typically cannot be easily used in the fusion process. Consider an engineering application on predicting the fracture toughness of an alloy where an engineer states "model A and model B achieve errors of 7% and 12% when their predictions are tested against experimental data." These inaccuracies and their 5% difference can be due to many underlying factors such as noise in the experiments, missing physics in either of the models (especially model B), uncertain material properties (i.e., calibration parameters) that affect the fracture behavior, or numerical errors associated with the computer models (e.g., coarse discretization). It is very difficult to quantitatively incorporate all these factors into data fusion. Hence, existing fusion methods such as that of Kennedy and O'Hagan [3] assign *labels* or *qualitative* variables to data (e.g., data from "model A" or data from "experiments") and then develop fusion formulas that break down if the underlying assumptions are incorrect or if there are many information sources.

We argue that data fusion should be based on *learned quantitative* variables instead of assigned qualitative labels to enable instruction-free and versatile fusion. We use LMGPs to learn these quantitative variables (other methods can be used as well) in a latent space that aims to encode the underlying factors which distinguish different data sources. The power of latent spaces in learning hidden factors is perhaps best exemplified in computer vision where deep neural networks encode high-dimensional images to a low-dimensional latent space where a single axis learns *smiling* (Fig. 1(a)).

As shown in Fig. 1(b), data fusion via LMGP is achieved via the following steps. First, we augment the various datasets with categorical inputs that aim to distinguish the data sources and also add unknown calibration parameters (if applicable). Then, we fit a single LMGP to the combined data set to obtain emulators of the data sources and estimates of the calibration parameters (if applicable). Finally, once the LMGP is trained, we visualize the learned latent space to analyze the relations between the sources. In the following subsections, we provide more details on each of these steps.

Following the above description, we summarize our goals in data fusion as building emulators for each data source (especially the high-fidelity one), estimating any unknown calibration parameters, and automatically obtaining the relation between the various data sources. We also note that our approach can simultaneously fuse any number of data sources with any level of fidelity. Without loss of generality, hereafter, we will assign only one source as high fidelity and treat the rest of the sources as low fidelity. This assignment is adopted to simplify the descriptions and does *not* affect our approach at all since we do not use any knowledge of the fidelity level during fusion (e.g., if there are two experimental and three simulation data sets, we can assign any one of them as high-fidelity and the rest as low-fidelity).

To quantify the accuracy of each low-fidelity source with respect to $y_h(\boldsymbol{x})$, we evaluate the relative root-mean-squared error (RRMSE)

$$\mathrm{RRMSE} = \sqrt{\frac{\sum_{j=1}^{n} \big(y_{li}^{(j)} - y_h^{(j)}\big)^2}{n \, \mathrm{var}(\boldsymbol{y}_h)}}$$

where $\boldsymbol{y}_{li}$ and $\boldsymbol{y}_h$ refer to the vectors containing the outputs of $y_{li}(\boldsymbol{x})$ and $y_h(\boldsymbol{x})$ at $n$ input points (we use $n = 10^4$ throughout), and $\mathrm{var}(\boldsymbol{y}_h)$ is the variance of $\boldsymbol{y}_h$.
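A one-line sketch of this error measure, assuming the mean squared error is normalized by the population variance of $\boldsymbol{y}_h$ (the paper's exact normalization convention may differ slightly):

```python
import numpy as np

def rrmse(y_low, y_high):
    """Relative RMSE of one source's outputs against the high-fidelity outputs."""
    y_low = np.asarray(y_low, float)
    y_high = np.asarray(y_high, float)
    return float(np.sqrt(np.mean((y_low - y_high) ** 2) / np.var(y_high)))
```

A perfectly matching source gives RRMSE of 0, while an RMSE equal to the standard deviation of $\boldsymbol{y}_h$ gives an RRMSE of 1.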

### 3.2 Multi-Fidelity Modeling via LMGP.

Using LMGP for multi-fidelity modeling is quite straightforward. Consider the case where multiple (i.e., two or more) data sources with different levels of accuracy are available, and the goal is to emulate each source while (1) having limited data, especially from the most accurate source, (2) accounting for potential noise with unknown variance, and (3) avoiding *a priori* determination of how different sources are related to each other. The last condition indicates that we do *not* know (1) how the accuracies of the low-fidelity models compare to each other, and (2) whether the low-fidelity models have inherent discrepancies, which may or may not be additive. While not necessary, we assume it is known which data source provides the highest fidelity because this source typically corresponds to either observations/experiments or a very expensive computer model.

We assume *n*_{h} high-fidelity samples are available whose inputs and output are denoted by *x*_{h} and *y*_{h}, respectively. We also presume that the data set obtained from the *i*th low-fidelity source has $nli$ samples where the inputs and outputs are denoted via $xli$ and $yli$, respectively.

With the above-mentioned points in mind, we use two examples in the following subsections to demonstrate our approach to multi-fidelity modeling.

#### 3.2.1 A Simple Analytical Example.

This example involves one high-fidelity and three low-fidelity data sources whose subscripts are *not* ordered by their accuracy with respect to $y_h(x)$ (Table 1). Note that we do not use this knowledge of relative accuracy during multi-fidelity modeling via LMGP. Rather, by only using the datasets in LMGP, we aim to inversely discover this relation between the fidelity levels.

To perform data fusion with LMGP, we first append the inputs with one or more categorical variables that distinguish the data sources. We can use any number of multi-level categorical variables. That is, we can either (1) select a single variable with at least as many levels as there are data sources or (2) use a few multi-level categorical variables with at least as many level combinations as there are data sources. For example, with one categorical variable, we can choose *t* = {*h*, *l*_{1}, *l*_{2}, *l*_{3}}, *t* = {1, 2, 3, 4}, *t* = {1, *a*, *ab*, 2}, or *t* = {*a*, *b*, *c*, *d*, *e*} for our pedagogical example with four data sources (in the last case level *e* does not correspond to any of the data sources).
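The appending-and-combining step can be sketched as follows (a hypothetical helper; here a single integer-valued categorical variable labels the sources, one level per source):

```python
import numpy as np

def combine_sources(datasets):
    """Append a source-identifying categorical column t and stack all data sets
    into a single mixed-input training set."""
    X_all, t_all, y_all = [], [], []
    for label, (X, y) in enumerate(datasets, start=1):
        X_all.append(np.asarray(X, float))
        t_all.append(np.full(len(y), label))   # same level for every sample of a source
        y_all.append(np.asarray(y, float))
    return np.vstack(X_all), np.concatenate(t_all), np.concatenate(y_all)

# Example: 3 samples from one source and 5 samples from another.
Xc, tc, yc = combine_sources([
    (np.zeros((3, 1)), [1.0, 2.0, 3.0]),
    (np.ones((5, 1)), [4.0, 5.0, 6.0, 7.0, 8.0]),
])
```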

For the remainder of this paper, we use two strategies for choosing categorical variables, see Fig. 2. Strategy 1 uses one categorical variable with as many levels as data sources, e.g., *t* = {*a*, *b*, *c*, *d*} or *t* = {1, 2, 3, 4}. We add the subscript *s* to an LMGP that uses this strategy since a single categorical variable is used to encode the data sources. Strategy 2 employs multiple categorical variables where the number of variables and their levels equals the number of data sources^{3}, e.g., *t*_{i} = {*a*, *b*, *c*, *d*} with *i* = 1, 2, 3, 4. We place the subscript *m* to an LMGP that uses strategy 2 to indicate that multiple categorical variables are employed. As we explain below, having more levels (or level combinations if more than one *t* is used) than data sources provides LMGP with more flexibility to learn the relation between the sources. This flexibility comes at the expense of having a larger $\boldsymbol{A}$ and higher computational costs. As we demonstrate in Sec. 4, the performance of LMGP is relatively robust to this modeling choice as long as there are sufficient training samples and the number of latent positions does not greatly exceed the number of hyperparameters in $\boldsymbol{A}$. Regarding the latter condition, note that when LMGP must find many latent positions with a small $\boldsymbol{A}$ (i.e., a very simple map), performance may suffer due to local optimality. For example, strategy 2 with 4 data sources results in $\prod_{i=1}^{d_t} m_i = 4^4 = 256$ latent positions (one for each possible categorical level combination, of which only 4 correspond to data sources) but there are only $d_z \times \sum_{i=1}^{d_t} m_i = 2 \times 16 = 32$ elements in $\boldsymbol{A}$. These elements are supposed to map the 256 points in the latent space such that the 4 points which encode the data sources have inter-distances that reflect the underlying relation between their corresponding data sources. Without sufficient data and regularization, the learned map may provide a locally optimal solution.

For this example, we use strategy 1 with *t* = {1, 2, 3, 4} where the number of levels equals the number of data sources. We assume the data sets are highly unbalanced and use a Sobol sequence to sample from the functions in Eq. (10) with $n_h = 3$ and $n_{l1} = n_{l2} = n_{l3} = 20$. Upon appending the categorical inputs, we combine the entire data into a single training data set that is directly fed into LMGP (in the combined set, the appended categorical input of all samples from a given source is constant, e.g., $t\,\mathbf{1}_{n \times 1}$ where $\mathbf{1}_{n \times 1}$ is an $n \times 1$ vector of ones). The fusion results are illustrated in Fig. 3(a) and indicate that LMGP is able to accurately emulate each data source, including $y_h(x)$ for which only three samples are provided. As illustrated in Fig. 3(b), a GP fitted to only data from $y_h(x)$ provides poor performance due to lack of data.

Plugging the latent positions into Eq. (12) shows that a relative distance of $\Delta z^2 = (\boldsymbol{z} - \boldsymbol{z}')^T(\boldsymbol{z} - \boldsymbol{z}')$ between two points scales the correlation function by $\exp(-\Delta z^2)$. Thus, we can interpret the latent space as being a distillation of the correlations between the data sources. Note, however, that the term $\exp\{-(\boldsymbol{x} - \boldsymbol{x}')^T \Omega_x (\boldsymbol{x} - \boldsymbol{x}')\}$, which accounts for the correlation between outputs at different points in the input space, remains the same as we change data sources. Thus, our modeling assumption is that this correlation is similar for all data sources. In layman's terms, we expect each data source to have a relatively similar shape. This is often true in multi-fidelity problems and if this modeling assumption is not met, LMGP estimates $\Omega_x$ to provide the best compromise between different sources, which may provide poor performance in emulation for some or all sources. To avoid making such a compromise, we can use the latent space to identify the dissimilar data source(s) and then repeat the fusion process after excluding them.

Note also that the objective function in Eq. (8) that is used to find the latent positions is invariant under translation and rotation. In order to find a unique solution, we enforce the following constraints in two dimensions (more constraints are needed for $d_z > 2$): latent point 1 is placed at the origin, latent point 2 is positioned on the positive $z_1$ axis, and latent point 3 is restricted to the $z_2 > 0$ half-plane. We assign $y_h(x)$ to position 1 for both of our strategies as it yields more readable latent plots, but this choice is arbitrary and does not affect the relative distances between the latent positions as shown in Sec. 4.
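These constraints amount to a rigid alignment of the latent points that leaves all pairwise distances unchanged; a sketch of such an alignment (our own helper, not the authors' implementation):

```python
import numpy as np

def canonicalize(Z):
    """Fix translation, rotation, and reflection of 2D latent points: point 1 at
    the origin, point 2 on the positive first axis, point 3 in the upper
    half-plane. Pairwise distances are unchanged."""
    Z = np.asarray(Z, float)
    Z = Z - Z[0]                                   # translate point 1 to the origin
    theta = np.arctan2(Z[1, 1], Z[1, 0])
    c, s = np.cos(-theta), np.sin(-theta)
    Z = Z @ np.array([[c, -s], [s, c]]).T          # rotate point 2 onto the axis
    if Z.shape[0] > 2 and Z[2, 1] < 0:
        Z[:, 1] *= -1.0                            # reflect about the first axis
    return Z
```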

Returning to our example with the above constraints in mind, we can see that the latent points corresponding to $y_h(x)$ and $y_{l2}(x)$ are close while the other points are relatively distant, especially the point representing $y_{l3}(x)$. This observation matches our knowledge of the relative accuracies of the underlying functions with respect to $y_h(x)$ (this knowledge is *not* provided to LMGP). In other words, LMGP has accurately determined the correlations between the data sources despite the sparse sampling for $y_h(x)$. Given that $y_{l2}(x)$ appears to be much more accurate than the other low-fidelity sources with respect to $y_h(x)$, one might consider fitting LMGP using only data from these two sources rather than all of the data to produce a more accurate high-fidelity emulator. The results of this approach, shown in Fig. 3(d), demonstrate that high-fidelity emulation performance is equivalent to the case where all sources are used; i.e., using less accurate sources does not make our estimate of $y_h(x)$ worse in this case because they include useful information about $y_h(x)$.

As an additional similarity measure, we compute the cosine similarity $S_c$ between each low-fidelity source and $y_h(x)$. The latent distances between the point corresponding to $y_h(x)$ and the points corresponding to $y_{l1}(x)$, $y_{l2}(x)$, and $y_{l3}(x)$ are, respectively, 0.21, 0.10, and 0.90, which correspond to correlations of, respectively, 0.96, 0.99, and 0.45 using Eq. (12). By contrast, the rough cosine similarities are, respectively, 0.994, 0.997, and 0.911. While both measures show the same relative relationships between data sources in terms of which source has the most/least correlation/similarity, LMGP finds a much starker difference between $y_{l3}(x)$ and $y_h(x)$ than the cosine similarity metric. The correlations found by LMGP better match both the RRMSE measures and the intuitive relative similarity of the functions based on looking at their plots. Note that while the cosine similarity is calculated using 10,000 test points from the analytic functions, LMGP calculates its correlation measurements based purely on the training data, i.e., three samples from the high-fidelity function and 20 samples from each low-fidelity function.
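The distance-to-correlation conversion used here is a direct evaluation of $\exp(-\Delta z^2)$; for the distances reported above:

```python
import numpy as np

# Latent distances between y_h(x) and y_l1(x), y_l2(x), y_l3(x) read off the plot.
distances = np.array([0.21, 0.10, 0.90])
correlations = np.exp(-distances ** 2)    # approx. 0.96, 0.99, 0.45 as in the text
```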

In order to support our assertion that a two-dimensional latent space is typically sufficient to encode the relationships between data sources, we show the latent space of an LMGP fitted to all data sources with $d_z = 3$ in Fig. 4. We enforce the following constraints in three dimensions: latent point 1 is placed at the origin, latent point 2 is positioned on the positive $z_1$ axis, latent point 3 is restricted to the $z_3 = 0$, $z_2 \geq 0$ half-plane, and latent point 4 is restricted to $z_3 \geq 0$. These constraints reduce the degrees-of-freedom by restricting translation, rotation, and reflection. In this case, we find that the relative distances between the latent points in Fig. 4 are nearly the same as those in Fig. 3(c), which indicates that two dimensions are sufficient to encode the relationships between the data sources.

#### 3.2.2 Effect of Categorical Variable Assignment.

For this example, we use $n_h = 3$, $n_{l1} = n_{l2} = 20$, and do not apply noise to the samples. We create 30 unique quasi-random iterations (hereafter referred to as repetitions) to examine the robustness of our approach to sampling variations. As shown in Fig. 5(a), $y_{l1}(x)$ and $y_{l2}(x)$ are equally accurate as they differ from $y_h(x)$ by a $\pm 0.1x^3$ term. This time, we fit LMGP using both strategies for categorical variable assignment and examine the effect of this choice as well as the size of the training data sets on the results. We use the subscript *All* to denote the fact that we fit LMGP to all available data and employ the subscript $l_i$ to refer to an LMGP fitted via only $\boldsymbol{y}_h$ and $\boldsymbol{y}_{li}$.

The latent space for LMGP using one categorical variable is demonstrated in Fig. 5(b) and shows that this strategy enables LMGP to learn that both sources have inaccuracy with respect to *y*_{h}(*x*). However, LMGP consistently finds one source to be significantly more accurate than the other as a result of the sparse sampling. By contrast, the positions found by LMGP using multiple categorical variables are very inconsistent across repetitions and often estimate one of the sources as being either extremely correlated or uncorrelated with *y*_{h}(*x*) (Fig. 5(c)). This inconsistency is because LMGP_{m All} has quite a few hyperparameters (1 roughness parameter and 18 parameters in the ** A** matrix), which are difficult to estimate with scarce data. Across the repetitions of LMGP_{m All}, at least one data source is always found to be well correlated with *y*_{h}(*x*), so high-fidelity predictions are still good and much better than fitting a traditional GP to only the high-fidelity data (Fig. 5(d)). When we increase the available data to *n*_{h} = 15, $nl1=nl2=50$, both LMGP_{s All} and LMGP_{m All} consistently (i.e., across repetitions) find latent positions for the low-fidelity functions that are approximately equidistant from *y*_{h}(*x*). We demonstrate this in Fig. 6, which shows histograms of the distances between the latent points for *y*_{h}(*x*) and $yl1(x)$ or $yl2(x)$ in (*a*) and (*b*), respectively. Notably, LMGP_{m All} is less consistent in both cases, with a few poor-performing outliers in Fig. 6(b). Interestingly, the positions for the two low-fidelity sources are in opposite directions from *y*_{h}(*x*), which agrees with the fact that the discrepancies are equal but of opposite sign (Figs. 5(e) and 5(f)). Notably, as we show in Fig. 7, this property is not a result of the constraints we apply to the latent points during fitting and persists even when no constraints are applied; i.e., all three points lie on a line.

While we did not apply noise to the samples in these pedagogical examples, as we demonstrate in Sec. 4, LMGP is fairly robust to noise both with respect to emulation performance and finding latent positions.

### 3.3 Calibration via LMGP.

Calibration problems closely resemble multi-fidelity modeling in that a number of high- and low-fidelity data sets are assimilated or fused together. However, in such problems, low-fidelity data sets^{4} typically involve calibration inputs which are not directly controlled, observed, or measured in the high-fidelity data (i.e., high-fidelity data have fewer inputs). Hence, in addition to building surrogate models, one seeks to *inversely* estimate these inputs during the calibration process.

We denote the latent and quantitative inputs by *z* and *x*, respectively (note that *z* encodes data sources as per Sec. 3.2). While these inputs are shared across all data sources, the low-fidelity data sources have additional quantitative inputs, *θ*, whose “best” values must be estimated using the high-fidelity data. We represent these “best” values by $\theta *$, which minimize the discrepancies between the low- and high-fidelity data sets based on an appropriate metric. In the case that one wishes to calibrate and assimilate multiple computer models simultaneously, we assume that the calibration parameters are shared between the low-fidelity data sets and are expected to have the same best value. Our estimate of $\theta *$ is denoted by $\theta ^$ and is obtained via MLE by modifying LMGP’s correlation function as

$r\big((x^{(i)},z^{(i)},\theta^{(i)}),\,(x^{(j)},z^{(j)},\theta^{(j)})\big)=\exp\{-(x^{(i)}-x^{(j)})^{T}\Omega_{x}(x^{(i)}-x^{(j)})-\|z^{(i)}-z^{(j)}\|_{2}^{2}-(\theta^{(i)}-\theta^{(j)})^{T}\Omega_{\theta}(\theta^{(i)}-\theta^{(j)})\}$  (15)

where *x*^{(i)}, *x*^{(j)}, $\Omega x$, *z*^{(i)}, and *z*^{(j)} are defined as before, *θ*^{(i)} denotes the calibration parameters of sample *i*, and Ω_{θ} is the diagonal matrix of roughness/scale parameters associated with *θ*. When one or both of the inputs to the correlation function lack calibration parameters (i.e., at least one of the inputs corresponds to a high-fidelity sample), we substitute $\theta ^$ in the last term of Eq. (15). If both inputs are from the high-fidelity data, the term exp{−(*θ*^{(i)} − *θ*^{(j)})^{T}Ω_{θ}(*θ*^{(i)} − *θ*^{(j)})} does not affect the correlation because exp{−($\theta ^$ − $\theta ^$)^{T}Ω_{θ}($\theta ^$ − $\theta ^$)} = exp{0} = 1.

Preprocessing the data for calibration via LMGP is schematically illustrated in Fig. 8. Following the same procedure described in Sec. 3.2, we append the inputs with categorical variables to distinguish data sources. We also augment the high-fidelity inputs with some unknown values to account for the missing calibration parameters. Once the mixed data set that contains *all* the low- and high-fidelity data are built, we directly use it in LMGP to not only build emulators for each data source but also estimate $\theta ^$. Similar to multi-fidelity modeling, any number of data sets can be simultaneously used via LMGP for calibration.
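The preprocessing step can be sketched as follows, assuming (for illustration) that the data source is encoded as a single integer categorical variable and that the missing calibration columns of the high-fidelity block are filled with placeholders that the trainer later replaces with its current estimate of *θ*:

```python
import numpy as np

def build_mixed_dataset(X_h, X_l_list):
    """Stack high- and low-fidelity inputs into one training set.
    Low-fidelity inputs are assumed to already contain the calibration
    columns; the high-fidelity block gets NaN placeholders for them."""
    n_theta = X_l_list[0].shape[1] - X_h.shape[1]
    blocks, labels = [], []
    # High-fidelity block: source label 0, unknown calibration values.
    blocks.append(np.hstack([X_h, np.full((len(X_h), n_theta), np.nan)]))
    labels.append(np.zeros(len(X_h), dtype=int))
    # Low-fidelity blocks: source labels 1, 2, ...
    for s, X_l in enumerate(X_l_list, start=1):
        blocks.append(X_l)
        labels.append(np.full(len(X_l), s, dtype=int))
    return np.vstack(blocks), np.concatenate(labels)

X_h = np.random.rand(3, 1)    # 3 high-fidelity samples, input x only
X_l1 = np.random.rand(25, 2)  # 25 samples with columns [x, theta]
X_l2 = np.random.rand(25, 2)
X, source = build_mixed_dataset(X_h, [X_l1, X_l2])
```

The integer `source` column is what the latent map turns into positions in the *z* space.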

We now illustrate the capabilities of LMGPs for calibration via two analytical examples where there is one high-fidelity data source *y*_{h}(*x*) and up to two low-fidelity data sources, denoted by $yl1(x)$ and $yl2(x)$. We presume that in both examples the goals are to accurately emulate the high-fidelity data source and estimate the calibration parameters. We note that once an LMGP is trained, it provides an emulator for each data source, but here we only evaluate accuracy for surrogating *y*_{h}(*x*) since far fewer data points are available from it, and hence, emulating it is more difficult.

#### 3.3.1 A Simple Calibration Problem.

We draw a small number of samples from *y*_{h}(*x*) and 25 samples from each of $yl1(x)$ and $yl2(x)$ (none of the datasets are corrupted with noise).

We set $\theta *$ to 0.1 because it is the true value of the coefficient of the leading *x*^{3} term. Note that $yl1(x)$ can match *y*_{h}(*x*) perfectly with an appropriate choice of *θ*; i.e., $yl1(x)$ has no model form error when $\theta ^=0.1$ (Fig. 9(a)). Conversely, no value of *θ* allows $yl2(x)$ to match *y*_{h}(*x*) since $yl2(x)$ has a linear model form error. When solving this calibration problem, we assume no knowledge of whether the low-fidelity models have discrepancies and expect the learned latent space of LMGP to provide diagnostic measures that indicate potential model form errors.

As shown in Fig. 9(b), the learned latent positions by LMGP are quite consistent with our expectations despite the fact that limited and unbalanced data are used in LMGP’s training. It is evident that the latent positions corresponding to *y*_{h}(*x*) and $yl1(x)$ are very close to each other, indicating negligible model form error. In contrast, the positions corresponding to *y*_{h}(*x*) and $yl2(x)$ are more distant which signals that $yl2(x)$ has model form error.

The learned latent positions in Fig. 9(b) suggest that $yl1(x)$ (when calibrated properly) captures the behavior of *y*_{h}(*x*) better than $yl2(x)$. Correspondingly, one may argue that calibrating $yl1(x)$ individually may improve performance. To assess this argument, we fit LMGPs to three combinations of the available data sets and compare the performance of these LMGPs in terms of estimating $\theta *$ and emulating *y*_{h}(*x*). In all three cases, we use a single categorical variable to encode the data source, and hence, the subscript *s* is appended to the model names (so, $LMGPsl1$ calibrates $yl1(x)$ via *y*_{h}(*x*) and uses a single categorical variable). The results are shown in Figs. 9(c) and 9(d) and indicate that using both low-fidelity data sets provides the best performance since (1) $\theta ^s$ are estimated more consistently as the distribution is centered at $\theta *$ with small variations, and (2) errors (measured in terms of mean squared error, MSE) for predicting *y*_{h}(*x*) are smaller. These observations can be explained by the fact that the highest relative distance between data sources in Fig. 9(b) is on the order of 0.05, which indicates that LMGP finds $yl2(x)$ to be very similar to *y*_{h}(*x*) and $yl1(x)$ as this distance scales the correlation function by exp{−(0.05)^{2}} ≈ 0.998. That is, LMGP can distill useful knowledge from the correlation between $yl2$ and other sources to improve its performance in estimating *θ* and emulating *y*_{h}(*x*). When $yl1(x)$ is excluded from the calibration process and only $yl2(x)$ is used in calibration, LMGP provides biased and less consistent estimates for *θ* and relatively large MSEs for predicting *y*_{h}(*x*).

While the distance in the latent space typically encodes model form error that is not reducible by adjusting *θ*, LMGP may mistake model form error for noise in the case that certain calibration parameters allow the low-fidelity model to closely match the high-fidelity function. This is the case if we fit LMGP to only *y*_{h}(*x*) and $yl2(x)$. As shown in Fig. 9(e), LMGP places the latent positions for *y*_{h}(*x*) and $yl2(x)$ very close to each other when $yl1(x)$ is excluded. We explain this observation by referring back to Fig. 9(c) where $LMGPsl2$ finds $\theta ^\u22480.25$. Plotting $yl2(x)$ for this value of *θ* reveals that it can nearly interpolate the training data (Fig. 9(g)). As such, LMGP mistakes 0.25 for the true value of *θ* and dismisses the small resultant error as noise. This also explains the aforementioned bias and inconsistency in estimating *θ* across repetitions, as the value that comes closest to interpolating *y*_{h}(*x*) differs depending on sampling variations. By contrast, LMGP fit to all data is able to leverage the information from $yl1(x)$ to determine that $yl2(x)$ has model form error. And, as expected, no model form error is indicated in the latent space if only $yl1(x)$ is used in calibration (Fig. 9(f)).

As this simple example clearly indicates, a simultaneous fusion of *multiple* (i.e., more than 2) data sources can decrease identifiability issues in calibration. This property is one of the main strengths of our data fusion approach.

#### 3.3.2 Calibration With Severe Model Form Error.

Based on Eq. (18), $\theta *$ can be either *π* or 10*π*, so the range of *θ* in *y*_{l}(*x*) is chosen wide enough to include both values. As shown in Fig. 10(a), considering $\theta *=\pi $ implies that the high-fidelity source is either noisy or has a high-frequency component that is missing from the low-fidelity source (note that in realistic applications the functional form of the data sources is unknown, so high-frequency trends can easily be misclassified as noise, in which case they are typically smoothed out, i.e., not learned). Conversely, considering $\theta *=10\pi $ implies that *y*_{l}(*x*) is expected to surrogate the high-frequency component of *y*_{h}(*x*) and that sin(*πx*) is the discrepancy. Note that the analytic MSEs and cosine similarities (both calculated by comparing *y*_{h}(*x*) and *y*_{l}(*x*) at 10,000 sample points equally spaced over the input range) are identical for either choice of *θ*; i.e., both choices yield a discrepancy of the same magnitude, and we cannot determine which choice is better *a priori* based on MSEs or cosine similarity. We are interested in finding out which value is a better estimate for $\theta *$ and whether LMGP is able to consistently infer this value purely from the low- and high-fidelity data sets. We do not corrupt the data sets with noise here and investigate the effect of noise in Sec. 4.2.
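Eq. (18) is not reproduced in this excerpt; assuming for illustration that *y*_{h}(*x*) = sin(*πx*) + sin(10*πx*) and *y*_{l}(*x*; *θ*) = sin(*θx*) on [0, 1] (hypothetical forms consistent with the discussion above), a quick numerical check confirms that both candidate values of *θ* leave discrepancies of identical magnitude:

```python
import numpy as np

x = np.linspace(0, 1, 10_000)  # dense grid over the assumed input range
y_h = np.sin(np.pi * x) + np.sin(10 * np.pi * x)
y_l = lambda theta: np.sin(theta * x)

mse_pi = np.mean((y_h - y_l(np.pi)) ** 2)         # discrepancy sin(10*pi*x)
mse_10pi = np.mean((y_h - y_l(10 * np.pi)) ** 2)  # discrepancy sin(pi*x)
print(mse_pi, mse_10pi)  # both ≈ 0.5: neither choice is better a priori
```

Both discrepancies are unit-amplitude sines over full half-periods, so their mean squares coincide, which is exactly why the data alone must break the tie.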

We now explore the effect of the low-fidelity data set size on performance while holding the number of high-fidelity samples constant. Specifically, we examine *n*_{l} = 30, 100, 200 with *n*_{h} = 15 in each case. Note that a standard GP trained on only the 15 available high-fidelity samples cannot learn the high-frequency behavior of *y*_{h}(*x*) and instead interprets it as noise.

As shown in Fig. 10(b), increasing *n*_{l} improves high-fidelity prediction and we can therefore consider the estimates of *θ* and the latent distances in the *n*_{l} = 200 case to be the most accurate since they maximize prediction performance. Shown in Fig. 11(a) are histograms of the latent distances over 30 repetitions for each case. When few low-fidelity data are available, the latent distances are close to zero; with plentiful data, the latent distances are clustered around 0.5. This indicates that LMGP interprets *y*_{h}(*x*) and *y*_{l}(*x*) as being closely correlated when we have few low-fidelity data, but consistently learns that *y*_{l}(*x*) has a noticeable error with respect to *y*_{h}(*x*) as we provide more data. Without sufficient low-fidelity data, LMGP learns the low-frequency behavior of *y*_{h}(*x*) which follows sin(*πx*) and dismisses the high-frequency behavior as noise. Consequently, LMGP finds a small latent distance since *y*_{l}(*x*) can capture sin(*πx*) without error.

We now examine the histogram of $\theta ^$ in Fig. 11(b). When few low-fidelity data are available, estimates are clustered around both *π* and 10*π*, while with plentiful data the estimates are tightly clustered around only 10*π*. This observation indicates that when little data are available, LMGP interprets *y*_{h}(*x*) to more closely resemble sin(*πx*) almost half of the time, which matches the observation on the learned latent distances; i.e., the high-frequency behavior is interpreted as noise and not learned. As more low-fidelity data become available, LMGP is able to learn the high-frequency behavior of *y*_{h}(*x*) using the low-fidelity data and interprets *y*_{h}(*x*) as more closely resembling sin(10*πx*).

Why does LMGP prefer $\theta ^=10\pi $ with more data? To answer this question, we note that in LMGP shifting the levels of the categorical variable is expected to reflect a change in data source. With $\theta ^=\pi $, the shift in the categorical variable is supposed to “model” sin(10*πx*), which is much more difficult than the alternative. In other words, LMGP is trying to learn the simplest function that must be represented by a shift in the categorical variable (Fig. 12(a)). We further explore this conjecture by fitting an LMGP to 100 noiseless samples from *y*_{h}(*x*) and 200 samples from *y*_{l}(*x*). This amount of data is sufficient to learn both the high-frequency behavior of *y*_{h}(*x*) and the high-frequencies of *y*_{l}(*x*) (i.e., the behavior of *y*_{l}(*x*) for large *θ*), and as such, we expect the latent positions and calibration estimates found by LMGP in this case to be optimal. As shown in Fig. 12(b), LMGP finds latent distances near 0.5 and *θ* = 10*π* very consistently; i.e., LMGP prefers to estimate the calibration parameters to minimize the complexity of the discrepancy function.

## 4 Results

To validate our approach on both multi-fidelity and calibration problems, we test our method on analytical functions and assess its performance against competing methods. In each example, we vary the size of the training data and the added noise variance and repeat the training process to account for randomness (20 times for the multi-fidelity problems and 30 times for the calibration problems). Knowledge of the value of the noise variance is *not* used in training. To measure accuracy, we use 10,000 noisy test samples to obtain the MSE (note that since the test data are noisy, the MSE obtained by an emulator cannot be smaller than the noise variance).

In our LMGP implementation, we always use *d*_{z} = 2 and select −3 ≤ *a*_{i,j} ≤ 3 during optimization, where *a*_{i,j} are the elements of the mapping matrix ** A**. When using LMGP for calibration, the search space for each element of $\theta ^$ is restricted to [−2, 3] after scaling the data to the range [0, 1] (i.e., we select a search space larger than the sampling range for *θ*). We use the modular version of KOH’s approach where we set a uniform prior for *θ* over the sampling range defined in each problem statement. All optimizations are done with the L-BFGS method, a quasi-Newton gradient-based optimization technique.

### 4.1 Multi-Fidelity Results.

We consider two analytical problems with high-dimensional inputs. In the first multi-fidelity problem, we consider a set of four functions that model the weight of a light aircraft wing [45]

These functions are ten-dimensional and have varying degrees of fidelity where, following the notation introduced in Sec. 3, *y*_{h}(** x**) has the highest fidelity. Note that in $yl3(x)$ we multiply *W*_{p} by zero, which is equivalent to reducing the dimensionality of the function by one. As enumerated in Table 2, the above functions are listed in decreasing order with respect to accuracy; that is, $yl1(x)$ and $yl3(x)$ are the most and least accurate models, respectively. Table 2 is generated by evaluating the four functions in Eq. (19) on the same 10,000 inputs as described in Sec. 3.2 (no noise is added to the outputs). This knowledge of the relative accuracy of the data sources is *not* used when fitting an LMGP.

| | $yl1(x)$ | $yl2(x)$ | $yl3(x)$ |
|---|---|---|---|
| RRMSE | 0.19912 | 1.1423 | 5.7484 |

Note: The functions are listed in decreasing order with respect to accuracy, with $yl3(x)$ being especially inaccurate. 10,000 points are used in calculating RRMSE.
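The exact RRMSE formula is not spelled out in this excerpt; one common definition, the RMSE of a low-fidelity model normalized by the standard deviation of the high-fidelity outputs, can be sketched as follows (the definition is our assumption, not the paper's):

```python
import numpy as np

def rrmse(y_lf, y_hf):
    """Relative RMSE of a low-fidelity model against high-fidelity outputs
    evaluated at the same inputs (assumed definition)."""
    return np.sqrt(np.mean((y_lf - y_hf) ** 2)) / np.std(y_hf)

# Toy check on 10,000 points: a model equal to y_hf has RRMSE 0, and a model
# whose error has the same spread as y_hf itself has RRMSE near 1.
y_hf = np.sin(np.linspace(0, 6, 10_000))
noise = np.std(y_hf) * np.random.default_rng(0).standard_normal(10_000)
print(rrmse(y_hf, y_hf), rrmse(y_hf + noise, y_hf))
```

Under this definition, values well above 1 (such as the 5.7484 of $yl3(x)$) flag a source that is worse than predicting the high-fidelity mean.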

We consider various amounts of available low-fidelity data, with and without noise. We also compare the two different settings introduced in Sec. 3.2, where the subscripts *s* and *m* indicate whether a single or multiple categorical variables are used to encode the data sources in LMGP. We only take 15 samples for *y*_{h}(** x**), which is a very small number given the high dimensionality of the input space. Additionally, we investigate the effect of fusing the four datasets jointly against fusing the high-fidelity data with each of the low-fidelity sources (in the former case the subscript *All* is appended to LMGP while in the latter case *l*_{1}, *l*_{2}, or *l*_{3} is used in the subscript depending on which source is used in addition to *y*_{h}(** x**)).

The results are summarized in Fig. 13 and indicate that the different versions of LMGP consistently outperform traditional GPs (fitted only to high-fidelity data) in all cases, even when only the least accurate data source is used to augment high-fidelity emulation. This superior performance of LMGP is due to taking advantage of the correlations between datasets, which compensates, to some extent, for the sparsity of the high-fidelity data. LMGP also has the advantage in consistency in that fewer outliers are observed in the MSEs compared to GP. This consistency indicates that our modeling assumptions (e.g., how to encode the data source) only marginally affect the performance in this example.

In cases without noise, i.e., Figs. 13(a) and 13(c), LMGPs fit to the data from $yl1(x)$ and *y*_{h}(** x**) perform on par with or better than the LMGPs that are fit to all data, and the small differences are mostly due to sample-to-sample variations. However, in cases with noise, i.e., Figs. 13(b) and 13(d), using all the data sets improves the performance of LMGP. We explain this observation as follows: in the noiseless cases, LMGP is able to quite accurately learn the behavior of *y*_{h}(** x**) using just $yl1(x)$, and using all four data sets provides no additional advantage in learning *y*_{h}(** x**) while (1) requiring the estimation of additional hyperparameters (in the ** A** matrix) and (2) compromising the estimates of $\Omega x$ to handle the discrepancies between the four sources. By contrast, in the cases with noise, one source is insufficient for LMGP to reach the threshold in emulation accuracy (which equals the noise variance) for *y*_{h}(** x**). Including additional data sources in these cases helps LMGP to differentiate noise from model form error.

For the remainder of this example, we investigate the most challenging version, which has the fewest available data and the highest level of noise. The latent space for this problem for LMGP_{s All}, shown in Fig. 14(a), is once again a powerful diagnostic tool. While LMGP only has access to 15 noisy samples from the ten-dimensional function *y*_{h}(** x**), the relative distances between latent positions match the relative accuracies of the data sources with respect to *y*_{h}(** x**). The distance between *y*_{h}(** x**) and $yl3(x)$ is ≈0.4, yielding an approximate correlation of exp{−(0.4)^{2}} ≈ 0.85, which means that LMGP still uses information from $yl3(x)$ in predicting the response for *y*_{h}(** x**) despite the former’s low accuracy with respect to the latter.
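The conversion from latent distance to correlation used above is simply the Gaussian kernel evaluated at that distance:

```python
import numpy as np

def latent_correlation(d):
    """Correlation contributed by a latent distance d under the Gaussian kernel."""
    return np.exp(-d ** 2)

# A latent distance of ~0.4 between y_h and y_l3 still leaves a sizable
# correlation, so LMGP keeps borrowing information from the least accurate source.
print(round(latent_correlation(0.4), 2))  # → 0.85
```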

We impose a number of constraints in order to obtain a unique solution for the latent positions since our objective function in Eq. (8) is invariant under translation and rotation. For a two-dimensional latent space, we fix the first position to the origin, the second position to the positive *z*_{1}-axis, and the third position to the *z*_{2} > 0 half-plane. As mentioned in Sec. 3.2, we also assign the data sources to positions sequentially (i.e., $[yh(x),yl1(x),yl2(x),yl3(x),\cdots]\rightarrow[1,2,3,4,\cdots]$) with *y*_{h}(** x**) at the origin for easier visualization of the relative correlations with $yli(x)$. While the assignment of data sources to latent positions affects the learned latent positions, the relative distances between them remain the same, as shown in Fig. 14(b). Since we typically know which data source has the highest fidelity, the learned latent space of LMGP provides an extremely easy way to assess the fidelity of the other sources with respect to it.
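One way to realize these constraints during optimization is to parameterize the latent positions so that they hold by construction; the use of `abs` below is one illustrative choice (bound constraints on the raw variables would work equally well):

```python
import numpy as np

def latent_positions_2d(free):
    """Map unconstrained optimization variables to constrained 2-D latent
    positions for four data sources: point 1 at the origin, point 2 on the
    positive z1-axis, point 3 in the z2 > 0 half-plane, point 4 free.
    This removes translation, rotation, and reflection ambiguity."""
    z1 = np.zeros(2)
    z2 = np.array([abs(free[0]), 0.0])      # positive z1-axis
    z3 = np.array([free[1], abs(free[2])])  # z2 >= 0 half-plane
    z4 = np.array(free[3:5])                # unconstrained
    return np.vstack([z1, z2, z3, z4])

Z = latent_positions_2d(np.array([-0.8, 0.3, -0.5, 0.2, -0.1]))
```

With this parameterization, only the relative distances (which are all the correlation function sees) remain as degrees of freedom.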

Prediction performance on the low-fidelity sources for LMGP_{s All}, shown in Fig. 15, follows the same trend as data source accuracy; i.e., it is best for $yl1(x)$ and worst for $yl3(x)$. When fitting LMGP to multiple data sources, we expect prediction accuracy to be high on sources that are well correlated with others, i.e., whose latent positions are close together or form a cluster. Leveraging information from a well-correlated source improves prediction performance more than the alternative, so each source in the cluster gains a boost in prediction performance from the information of the other sources in that cluster. In this case, *y*_{h}(** x**), $yl1(x)$, and $yl2(x)$ form a cluster and as such we see that MSEs for $yl1(x)$ and $yl2(x)$ are much lower than those for $yl3(x)$.

The above equations indicate that all low-fidelity functions have nonlinear model form discrepancy. To roughly quantify these discrepancies, we follow the same procedure as in the previous example and calculate the RRMSEs (Table 3). As can be seen, the accuracy of the models increases with *i* (unlike the previous example; LMGP is robust with respect to this choice).

| | $yl1(x)$ | $yl2(x)$ | $yl3(x)$ |
|---|---|---|---|
| RRMSE | 3.6671 | 1.3688 | 0.36232 |

Note: The functions are listed in increasing order with respect to accuracy, with $yl3(x)$ being the most accurate by a significant margin.

We consider various amounts of available low-fidelity data, with and without noise. We also use a few combinations for training LMGP based on the selected data sets or how data sources are encoded. The results are summarized in Fig. 16 where, once again, LMGP convincingly outperforms GP in high-fidelity emulation, especially with noisy data (Figs. 16(b) and 16(d)). The overall trends in performance between strategies for LMGP are consistent across the various cases, with LMGP fit to only one low-fidelity source performing worse than LMGP fit to all data sources and with LMGP_{s All} specifically performing the best. LMGP_{m All} yields inconsistent results with *n*_{l} = 50 or *n*_{l} = 100, especially in the latter case where the box plots have stretched to include the outliers. This behavior is due to overfitting and the fact that there are many latent positions that must be placed in the latent space via a simple matrix-based map (256 positions and 32 elements in the ** A** matrix). Note that even with these inconsistencies, LMGP_{m All} frequently outperforms GP, $LMGPsl1$, $LMGPml1$, $LMGPsl2$, and $LMGPml2$, which indicates that using more than two data sets in fusion is indeed beneficial.

The learned latent space for LMGP_{s All} for the most challenging version of this problem (noisy samples, fewest available data) is shown in Fig. 17(a) and clearly indicates that the relative distances among the positions match the relative accuracy between the low- and high-fidelity sources: The position for $yl3(x)$ is very close to that for *y*_{h}(** x**), so LMGP weighs data from $yl3(x)$ heavily when emulating *y*_{h}(** x**) and vice versa. The position for $yl2(x)$ is also close to both *y*_{h}(** x**) and $yl3(x)$, but it is relatively more distant from *y*_{h}(** x**) compared to $yl3(x)$.

Like in our first example, prediction performance on the low-fidelity sources for LMGP_{s All}, shown in Fig. 17(b), follows a similar trend to data source accuracy; i.e., it is best for $yl2(x)$ and $yl3(x)$ and worst for $yl1(x)$, which is the least accurate source. As we mentioned before, we expect prediction accuracy to be high on sources whose latent positions are close together or form a cluster. In this case, *y*_{h}(** x**), $yl2(x)$, and $yl3(x)$ form a cluster, and as such, we see that the MSEs for $yl2(x)$ and $yl3(x)$ are much lower than those for $yl1(x)$.

### 4.2 Calibration Results.

We compare our calibration approach to that of KOH by considering four test cases with varying degrees of complexity. Note that, while LMGP can simultaneously assimilate and calibrate any number of sources, KOH’s approach only works with two data sets at a time and relies on repeating the process as many times as there are low-fidelity sources.

*x*^{3} term (Table 4).

| | $yl1(x)$ | $yl2(x)$ |
|---|---|---|
| RRMSE | 0.22241 | 0.1285 |

Note: We find the RRMSE in calibration problems using the same method as before but with the calibration parameters fixed to their true values at all input points. Both low-fidelity functions are relatively accurate, with $yl2(x)$ more accurate than $yl1(x)$.

We show high-fidelity emulation performance for this problem in Fig. 18 where, similar to Sec. 4.1, LMGPs are trained under various settings in terms of which data sources are selected and how they are encoded. As can be observed, LMGP performs on par with or better than KOH’s approach in high-fidelity emulation accuracy in all cases, and LMGP_{s All} offers the most consistent performance in most cases. LMGP also performs particularly well in the cases with noise (Figs. 18(b) and 18(d)). Despite the inaccuracy of $yl2(x)$, LMGP fit to all data sources offers the most accurate emulation in all cases.

We next show calibration performance in Fig. 19 where LMGP_{s All} consistently outperforms KOH in both accuracy and consistency, especially in the noiseless cases (Figs. 19(a) and 19(c)). Notably, KOH’s approach fit with $yl2(x)$ yields biased estimates. With noise and little data (Fig. 19(b)), neither LMGP nor KOH’s approach is able to obtain a very consistent estimate for the calibration parameter across the repetitions. When more low-fidelity data are provided (Fig. 19(d)), LMGP is able to leverage the additional low-fidelity data to find a consistent estimate for *θ* while KOH’s approach does not improve in consistency.

We show the latent space from fitting LMGP to the most challenging version of this problem, i.e., *n*_{h} = 3, $nl1=nl2=15$, *σ*^{2} = 2 × 10^{−5}. As demonstrated in Fig. 20(a), LMGP is able to accurately infer the correlations with only three noisy high-fidelity samples as the relative latent distances match the relative accuracies of the data sources. Thus, we expect the low-fidelity performance to be better for $yl2(x)$ than for $yl1(x)$ as the position for $yl2(x)$ is relatively closer to *y*_{h}(*x*), which means that LMGP leverages more information from *y*_{h}(*x*) in predicting $yl2(x)$ than in predicting $yl1(x)$. We assess the veracity of our expectation by examining low-fidelity prediction performance in Fig. 20(b), which indicates that prediction performance is indeed better for $yl2(x)$ than for $yl1(x)$.

Next, we reconsider the example in Eq. (18) where $\theta *=\pi $ and $\theta *=10\pi $ are the two valid choices for the true calibration parameter as discussed in Sec. 3.3. We fit LMGP with two approaches to categorical variable selection and consider various amounts of available low-fidelity data all with noise (the noiseless case is considered in Sec. 3.3).

The high-fidelity emulation performance is summarized in Fig. 21, which indicates that LMGP outperforms KOH’s approach by a similar margin for each case. Notably, LMGP’s performance is robust to the choice of categorical variable assignment for this problem as we see a similar variation in performance over repetitions between LMGP_{s All} and LMGP_{m All}. We explain this by noting that since there are only two data sources, LMGP_{m All} finds a total of 2^{2} = 4 latent positions with (2 + 2) × 2 = 8 elements in ** A** which indicates that overfitting should not be a concern.
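The hyperparameter counting used here and in the earlier examples (4 positions and 8 elements for two sources, 27 and 18 for three, 256 and 32 for four) can be reproduced with a small helper (names are ours):

```python
def m_strategy_counts(n_sources, d_z=2):
    """Counts for the 'm' strategy, where each of the n data sources gets its
    own categorical variable with n levels: n**n level combinations (candidate
    latent positions), and a one-hot input of n*n columns mapped to d_z latent
    dimensions, i.e., d_z * n * n elements in A."""
    return n_sources ** n_sources, d_z * (n_sources * n_sources)

for n in (2, 3, 4):
    print(n, m_strategy_counts(n))  # → 2 (4, 8), 3 (27, 18), 4 (256, 32)
```

The rapid growth of the position count relative to the size of ** A** is what drives the overfitting seen with four sources.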

The estimates of the calibration parameters are provided in Fig. 22 and indicate that the estimation consistency of both approaches increases as *n*_{l} is increased from 30 to 200. This increase is more prominent for LMGP. However, while LMGP converges on *θ* = 10*π*, the estimates from KOH’s approach are approximately evenly split between *π* and 10*π*. This behavior is because the *L*_{2} distances of sin(10*πx*) and sin(*πx*) from *y*_{h}(*x*) are the same, and hence, KOH’s approach cannot favor one over the other [21,47,48]. As explained in Sec. 3.3, in this case, LMGP converges on *θ* = 10*π* as this choice not only provides a simpler discrepancy but also enables learning the high-frequency nature of *y*_{h}(*x*).

Finally, we show histograms of latent distances learned by LMGP in Fig. 23. The trends are quite similar to those seen in Sec. 3.3, with the latent distances being close to 0 for low amounts of low-fidelity data and converging on 0.5 as the amount of data is increased. When high-fidelity data are insufficient to learn the high-frequency behavior of *y*_{h}(*x*), LMGP treats the high-frequency behavior as noise and finds *y*_{h}(*x*) ≈ sin(*πx*). When low-fidelity data are also insufficient, LMGP cannot learn the behavior of *y*_{l}(*x*) at high frequencies (i.e., for large *θ*). Thus, LMGP finds *θ* = *π*, which implies *y*_{l}(*x*) = sin(*πx*), i.e., no model form error and a corresponding latent distance near zero. With sufficient low-fidelity data, however, LMGP learns the behavior of *y*_{l}(*x*) for large *θ* and finds that *θ* = 10*π* yields a less complex discrepancy between *y*_{h}(*x*) and *y*_{l}(*x*).

*T*_{u} has been omitted and replaced by a constant in both low-fidelity functions.

| | $yl1(x)$ | $yl2(x)$ |
|---|---|---|
| RRMSE | 0.049219 | 0.19838 |

Note: Both low-fidelity functions are relatively accurate, with $yl2(x)$ less accurate than $yl1(x)$.

We hold *n*_{h} = 25 and *n*_{l} = 100 constant and examine two cases, one without noise and one with noise applied to the samples (*σ*^{2} = 100 with Range(*y*_{h}(**x**)) ≈ 974 over the input range), and again fit LMGP with various strategies. In both cases, LMGP convincingly outperforms KOH’s approach in high-fidelity emulation, see Fig. 24. Notably, LMGP outperforms KOH’s approach given equivalent access to data, e.g., $LMGPsl1$ versus $KOHl1$. LMGP’s performance is also robust to the modeling choice, which we explain by noting that with three data sources the *m* strategy for categorical variable selection yields 3^{3} = 27 latent positions and 2 × (3 × 3) = 18 elements of **A**; i.e., the number of latent positions is on the same order of magnitude as the number of hyperparameters in **A**, and the size of the dataset is large relative to the number of hyperparameters.
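This counting is simple enough to verify directly. The sketch below reproduces the quoted arithmetic under our reading of the encoding (three categorical variables with three levels each, one-hot encoded and mapped to a two-dimensional latent space; the variable names are ours, not the paper's):

```python
# Strategy-2 (m) encoding with three data sources: three categorical
# variables, each with three levels, mapped to a 2D latent space.
n_vars, n_levels, d_z = 3, 3, 2

# Distinct categorical combinations, i.e., latent positions.
n_positions = n_levels ** n_vars            # 3**3 = 27

# Elements of the mapping matrix A: one row per latent dimension and,
# under our assumed one-hot encoding, one column per (variable, level) pair.
n_elements_A = d_z * (n_vars * n_levels)    # 2 * (3 * 3) = 18

print(n_positions, n_elements_A)            # 27 18
```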

As shown in Fig. 25(a) for the noiseless case, the latent positions found by LMGP_{s All} show no model form error for $yl1(x)$ and little model form error for $yl2(x)$; i.e., LMGP mistakes the model form error in $yl1(x)$ for noise since the error is so low. While these latent positions are not fully accurate, as $yl1(x)$ does still have model form error, the relative distances to the data sources correctly indicate which is more accurate. With noise, shown in Fig. 25(b), the relative distances to *y*_{h}(**x**) are nearly the same for both low-fidelity sources, although $yl1(x)$ is slightly closer to *y*_{h}(**x**) than $yl2(x)$ is, which indicates that LMGP has more difficulty determining the magnitudes of the errors in the low-fidelity data sources in this case. The magnitudes of the latent distances are quite small in both cases, which reflects the fact that both low-fidelity data sources are relatively accurate when calibrated appropriately.

Calibration performance, shown in Fig. 26, reveals inconsistent performance in estimating *θ*_{1} but consistent estimates of *θ*_{2} for both LMGP and KOH’s approach in all three cases. We explain this by noting that the main sensitivity indices (calculated using 10,000 inputs sampled via a Sobol sequence) for *θ*_{1} and *θ*_{2} are on the order of 10^{−4} and 10^{−1}, respectively, for the low-fidelity functions; i.e., variation in *θ*_{1} has very little effect on their outputs. Therefore, we expect *θ*_{1} to be very difficult to estimate. While LMGP’s estimates of *θ*_{1} suffer from high variance, the distributions are centered on the true parameter in both cases. By contrast, KOH’s approach produces biased estimates in all cases, although $KOHl2$ guesses nearly the correct parameter almost half the time in the case with noise (Fig. 26(b)). Both methods estimate *θ*_{2} quite accurately and consistently. KOH’s approach has lower variance in its estimates but more outliers when using $yl2(x)$ compared to LMGP’s estimates using all data sources.
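Main Sobol indices of the kind quoted above are commonly computed with a Saltelli-style estimator driven by a quasi-random Sobol sequence. The sketch below is an assumed implementation, not the paper's code: the function name, the sample size, and the toy model are ours, and inputs are assumed to be scaled to the unit hypercube.

```python
import numpy as np
from scipy.stats import qmc

def first_order_sobol(f, d, n=2**12, seed=0):
    """Saltelli-style estimator of the main (first-order) Sobol indices.

    f: vectorized model, f(X) -> shape (m,) for X of shape (m, d)
    d: number of inputs, assumed scaled to the unit hypercube [0, 1]^d
    """
    # Two independent quasi-random sample matrices A and B.
    M = qmc.Sobol(d=2 * d, scramble=True, seed=seed).random(n)
    A, B = M[:, :d], M[:, d:]
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S = np.empty(d)
    for i in range(d):
        # A with its i-th column replaced by B's i-th column.
        ABi = A.copy()
        ABi[:, i] = B[:, i]
        # Main-effect estimator for input i.
        S[i] = np.mean(fB * (f(ABi) - fA)) / var
    return S

# Toy model: the second input barely affects the output, so its main
# index is near zero, mirroring the situation described for theta_1.
S = first_order_sobol(lambda X: X[:, 0] + 0.01 * X[:, 1], d=2)
```

An input with a main index near zero, like *θ*_{1} here, is essentially unidentifiable from the output variance alone, which is why both methods struggle to estimate it.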

We examine one case with very small noisy data sets in which we set *n*_{h} = 15, *n*_{l} = 50, and *σ*^{2} = 16. We fit only LMGP_{s All} as it is generally the best-performing model. LMGP consistently outperforms KOH’s method in high-fidelity emulation (Fig. 27(a)). Additionally, the latent space learned by LMGP shows model form error for all three low-fidelity sources, with the relative distances between the sources roughly matching their relative accuracies (Fig. 27(b)). Both KOH’s method and LMGP perform poorly in calibration for all four parameters. We explain this by noting that this problem suffers from identifiability issues and that the calibration parameters have both small Sobol sensitivities and low interaction (i.e., even increasing the number of data points will not resolve the issue). Notably, while LMGP’s calibration estimates are inconsistent for each parameter, KOH’s method incorrectly shows consistent but biased estimates which are often quite far from the true calibration parameters (Fig. 28(c)). LMGP shows more uncertainty than KOH’s method, which more accurately reflects the nature of the problem, and its learned latent space can help the analyst detect identifiability issues.

## 5 Conclusion

In this paper, we present a novel latent-space-based approach for data fusion (i.e., multi-fidelity modeling and calibration) via latent-map Gaussian processes or LMGPs. Our approach offers unique advantages that can benefit engineering design in a number of ways such as improved accuracy and consistency compared to competing methods for data fusion. Additionally, LMGP learns a latent space where data sources are embedded with points whose distances can shed light on not only the relations among data sources but also potential model form discrepancies. These insights can guide diagnostics or determine which data sources cannot be trusted.

Implementation and use of our data fusion approach are quite straightforward as it primarily relies on modifying the correlation function of traditional GPs and assigning appropriate priors to the datasets. LMGP-based data fusion is also quite flexible in terms of the number of data sources. In particular, since we can assimilate multiple data sets simultaneously, we improve prediction performance and decrease non-identifiability issues that typically arise in calibration problems.
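To make the nature of this modification concrete, the sketch below shows a minimal LMGP-style correlation function in which the usual anisotropic Gaussian kernel on the numerical inputs is augmented by the squared distance between the latent points of the two data sources. This is an illustrative sketch under our own parameterization, not the paper's implementation; the function name and the two-dimensional latent space are assumptions.

```python
import numpy as np

def lmgp_correlation(x1, x2, s1, s2, omega, Z):
    """Correlation between inputs x1 and x2 drawn from sources s1 and s2.

    omega: roughness parameters for the numerical inputs (one per input).
    Z:     (n_sources, d_z) array of latent positions, one row per data
           source, as determined by the hyperparameter matrix A.
    """
    # Standard anisotropic Gaussian kernel on the numerical inputs ...
    r_x = np.sum(omega * (np.asarray(x1) - np.asarray(x2)) ** 2)
    # ... augmented by the squared latent distance between the sources.
    # For s1 == s2 this term vanishes and a standard GP kernel is recovered.
    r_z = np.sum((Z[s1] - Z[s2]) ** 2)
    return np.exp(-(r_x + r_z))

# Assumed latent positions for two sources: the farther apart the rows of
# Z, the weaker the learned correlation between the corresponding sources.
Z = np.array([[0.0, 0.0], [1.5, 0.0]])
omega = np.array([1.0])
same = lmgp_correlation([0.2], [0.2], 0, 0, omega, Z)   # within one source
cross = lmgp_correlation([0.2], [0.2], 0, 1, omega, Z)  # across sources
```

Because the latent positions enter the kernel as ordinary hyperparameters, they are estimated alongside the roughness parameters during standard GP training, which is why the approach requires little beyond a modified correlation function.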

Since LMGPs are extensions of GPs, they are not directly applicable to extrapolation or big/high-dimensional data. However, extensions of GPs that address these limitations [27,38,41–44,49] can be incorporated into LMGPs. In our examples, we assumed all data sources are noisy and hence used a single parameter to estimate the noise. To consider different (unknown) noise levels, we need to have a parameter for each data source. We also note that the performance of LMGP in fusing small data can be greatly improved by endowing its parameters with priors and using Bayes’ rule for inference. In this case, the latent space will have a probabilistic nature, the trained model will be more robust to overfitting, and prediction uncertainties will be more accurate. Lastly, we have studied small data scenarios and not explored the effects of large data sets on the consistency of hyperparameter estimation. A detailed convergence study is needed to determine how the hyperparameters and the learned manifold are affected as the data set sizes grow. These directions will be investigated in our future works.

Lastly, we note that the proposed method can be directly applied to multi-response data sets with no modifications: we would treat each response as if it were a data source and then apply our data fusion method directly. However, with this strategy each “data source” would have the exact same set of input points, which will most likely cause numerical issues. While LMGP can be applied to multi-response data sets with some modifications (which may be presented in a future paper), the user should bear in mind that we do not necessarily expect *a priori* any level of correlation between the responses, whereas with multi-fidelity problems we expect (but do not necessarily have) some correlation since all sources model the same system. Thus, we recommend fitting LMGP to all responses and examining the latent space to see which responses are well correlated. Then, fit individual emulators to the uncorrelated responses while fitting an LMGP to whichever groups of responses are correlated with each other.

## Footnotes

Multiplicative terms have also been introduced to KOH’s approach but are seldom adopted as they increase the identifiability issues and computational costs while negligibly improving the mean prediction accuracy.

We have tried a binary encoding version of this strategy where a data source is assigned its own categorical variable with two levels where 0 indicates the source is inactive and 1 indicates that the source is active. We found the results of this case to be similar to those of strategy 2 presented in the paper.

Generally built via computer simulations.

## Acknowledgment

This work was supported by the Early Career Faculty grant from NASA’s Space Technology Research Grants Program (Award Number 80NSSC21K1809).

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The data sets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

## Nomenclature

- *t* = matrix or vector encoding of the categorical combinations used in LMGP
- *A* = matrix of hyperparameters of LMGP which determines the latent positions of the categorical combinations
- *R* = correlation matrix for LMGP
- *n*_{h}, $nli$ = respectively, the number of training data for the high-fidelity source and the *i*th low-fidelity source. When all low-fidelity sources have the same number of training data, we simply use *n*_{l}
- **y**_{h}, $yhi$, *y*_{h}(**x**), *y*_{h} = respectively, a vector containing the training outputs, the *i*th training output, the underlying data source, and the output of the underlying data source
- *X*_{h}, $xhi$, **x**_{h} = respectively, the matrix of training inputs, the *i*th training input, and the input to the data source. In the case that the input is one-dimensional, these become *x*_{h}, $xhi$, and *x*_{h}, respectively
- $\Delta zyh,yli$ = distance between, e.g., *y*_{h}(**x**) and $yli(x)$ in the latent space. In the case that there are only two points in the latent space, we shorten this to just Δ*z*
- **θ**, *θ*^{*}, $\hat{\theta}$ = respectively, the calibration inputs, the true calibration parameters, and the estimated calibration parameters. In general, we use an asterisk to denote the true value of a parameter and a hat to denote an estimate
- *σ*^{2} = noise variance
- $\Omega x$, $\Omega \theta$ = matrices of roughness parameters *ω*_{i} for the numerical and calibration inputs, respectively

## Subscripts

- *h* = high-fidelity source
- *l*_{i} = *i*th low-fidelity source. We use this and the above subscript to denote data sources and their corresponding latent points, e.g., *y*_{h}(**x**) or $yl1(x)$. We also use this subscript to refer to strategies of KOH’s approach or LMGP which are fit to only *y*_{h} and $yli$
- *s*, *m* = respectively, strategy 1 and strategy 2 for categorical variable assignment during preprocessing of the data for LMGP. We combine these with the above subscripts to fully describe a fitting strategy, e.g., $LMGPsl1$ denotes LMGP fit to only *y*_{h} and $yl1$ using strategy 1 for preprocessing the data
- *All* = a strategy of LMGP fit to data from all available sources