---
title: "Should we correct the bias in Confidence Bands for Repeated Functional Data?"
author:
- name: Emilie Devijver
corresponding: true
email: emilie.devijver@univ-grenoble-alpes.fr
url: https://lig-aptikal.imag.fr/~devijvee/
affiliations:
- name: CNRS, Univ. Grenoble Alpes, Grenoble INP, LIG, 38000 Grenoble, France
- name: Adeline Samson
email: adeline.leclercq-samson@univ-grenoble-alpes.fr
url: http://adeline.e-samson.org
affiliations:
- name: Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
date: last-modified
date-modified: last-modified
description: |
abstract: >+
While confidence intervals for finite quantities are well-established, constructing confidence bands for objects of infinite dimension, such as functions, poses challenges. In this paper, we explore the concept of parametric confidence bands for functional data with an orthonormal basis. Specifically, we revisit the method proposed by Sun and Loader, which yields confidence bands for the projection of the regression function in a fixed-dimensional space. This approach can introduce bias in the confidence bands when the dimension of the basis is misspecified. Leveraging this insight, we introduce a corrected, unbiased confidence band. Surprisingly, our corrected band tends to be wider than what a naive approach would suggest. To address this, we propose a model selection criterion that allows for data-driven estimation of the basis dimension. The bias is then automatically corrected after dimension selection. We illustrate these strategies using an extensive simulation study. We conclude with an application to real data.
keywords: [functional data, repeated data, confidence band, Kac-Rice formulae, bias, dimension selection]
citation:
type: article-journal
container-title: "Computo"
doi: "xxxx"
url: https://computo.sfds.asso.fr/template-computo-quarto
publisher: "Société Française de Statistique"
issn: "2824-7795"
bibliography: references.bib
github-user: computorg
repo: "template-computo-r"
draft: true # set to false once the build is running
published: false # will be set to true once accepted
format:
computo-pdf: default
computo-html: default
---
# Introduction
Functional data analysis is widely used for handling complex data with smooth shapes, finding applications in diverse fields such as neuroscience (e.g., EEG data, @Zhang2020), psychology (e.g., mouse-tracking data, @Quinton-2017-escon), and sensor data from daily-life activities (@jacques2022). It consists in estimating a function, which is an object of infinite dimension.
It is important to accompany the function estimate with a measure of uncertainty, for example through a simultaneous confidence band. This task presents several challenges: the confidence band must control the simultaneous functional type-I error rate, as opposed to point-wise rates; it must strike a balance between being sufficiently conservative to maintain a confidence level while not being overly so as to render it meaningless; and the method used to construct this confidence band should be computationally feasible for practical application (@RamsayBook).
We consider several independent observations of the same function, i.e., noisy functional data.
To analyze this noisy data, a classic approach consists in averaging the data pointwise, applying a kernel function to smooth the noise, or projecting the data onto a functional space defined by a family of functions (@kokoszka2017introduction Chapter 3, @LI2022104806). The pointwise empirical mean is not a function but a vector of estimated discrete points. With the two other approaches, the targeted function is an approximation of the true function, and the resulting confidence bands are thus bands for this approximated function.
In such a context, several methods have been proposed to construct a confidence band that controls the simultaneous functional type-I error rate. In the case of a single individual (no repetition) but with many time points, some methods study the asymptotic distribution of the infinity norm between the targeted approximated function and its estimator, the asymptotics being in the number of time points (@Hall1991, @Claeskens2003). These approaches require a large number of time points and are likely to be too conservative otherwise.
For small samples, bootstrap methods have been developed to compute the confidence band (@Neumann1998, @Claeskens2003), but with a high computational cost. Another approach consists in constructing confidence bands based on the volume-of-tube formula. @sun1994 studied the tail probabilities of suprema of Gaussian random processes. This approach is based on an unbiased linear estimator of the regression function, which corresponds to a band of the approximated targeted function. @Zhou1998 used the volume-of-tube formula for estimation by regression splines. @Krivobokova2010 applied this method for the construction of confidence bands using penalized spline estimators. @bunea2011 proposed a threshold-type estimator and derived error bounds and simultaneous confidence bands.
In the case of several observations of the same function, @Liebl2019 proposed a method based on random field theory and the volume-of-tube formula, leveraging the Kac-Rice formula. Their approach introduces a quantile that varies with location, which allows them to achieve their fairness property. Their confidence band uses an unbiased estimator. Unlike other methods, it does not require estimating the full covariance function of the estimator, but only its diagonal, which reduces computational time.
From a practical viewpoint, @CB_survivalAnalysis2022 introduced a package to popularize simultaneous confidence bands in the context of survival analysis. @TELSCHOW202270 proposed a simultaneous confidence band based on the Gaussian kinematic formula. Again, it assumes access to an asymptotically unbiased estimator of the function of interest.
The coverage is thus guaranteed in the asymptotic setting, after removing the bias and by targeting the approximated smoothed function. Their paper covers the non-Gaussian and non-stationary cases.
@wang2022 proposed a simultaneous Kolmogorov-Smirnov confidence band by modeling the error distribution, thus avoiding the estimation of the covariance structure of the underlying stochastic process. They rely on B-splines for the estimation of the mean curve.
Note that recent extensions have been proposed in @Telschow2023 to construct simultaneous confidence bands, based on conformal prediction in @conformalPrediction2022, or with a prediction goal in mind in @Jacques2023 by considering functional time series data. These different papers use an approximation of the function of interest and do not deal with the associated non-asymptotic bias.
We want to work in the non-asymptotic setting and to propose a confidence band for the original true function, not only for its approximation (obtained by kernel smoothing or by projection). We do not work with the empirical mean estimator for two main reasons. First, it does not inherit the functional properties of the target, especially its regularity. Second, the dimension of the empirical mean estimator is larger than the dimension of a projection or kernel estimator, which induces a larger variance and a wider confidence band. Thus we focus on functional estimators.
The difference between the original function and its approximation is a functional (deterministic) bias that depends on the quality of the kernel or the projection. This bias may be neglected in the asymptotic setting where the number of observations goes to infinity, but not with finite sample sizes. It has to be taken into account in the construction of the confidence band. Both methods (kernel and projection) rely on the choice of a hyperparameter, the kernel bandwidth in the first case and the dimension of the functional projection basis in the second. The choice of this hyperparameter is crucial to control the type-I error rate of the confidence band viewed as a band of the true function in a non-asymptotic setting. This is the question we explore in this paper.
@sun1994 proposed a bias correction for a particular class of functions but left the choice of the smoothing parameter open, making the estimator unusable in practice.
In the nonparametric framework, the bias is approximated using the estimator of the second derivative of the underlying mean function (@Xia1998). But in general, there is a lack of discussion on how to handle the bias of the functional estimator, even in the simple case of a functional space of finite dimension.
Hard-thresholding approaches, cross-validation methods (@LI2022104806) or model selection framework could be used to select the best dimension.
However, these approaches need to be adapted to the specific case of controlling the level of a confidence band. Few references exist on this subject. For example, while the model selection paradigm has been extensively studied in the literature, in multivariate statistics or functional data analysis (e.g., @GoeppSubmitted, @ANEIROS2022104871, @BASNA2022104868), it has not been explored in the context of confidence band construction.
The objective of this paper is to address the bias problem in confidence band construction for a general function, in the non-asymptotic setting with respect to the number of individuals and the number of time points. We adopt the point of view of projecting data onto a functional space, because the functional bias is easier to study and control than when smoothing data with a kernel.
In particular, when the family is an orthonormal basis, e.g., the Legendre basis (with the standard scalar product) or the Fourier basis (with another scalar product), the projection is explicit and it is possible to obtain theoretical results.
Moreover, the functional space offers a key advantage: it reduces the problem of inference to the estimation of coefficients, for example by least squares or maximum likelihood estimation. The function estimator is then simply an average after projection onto the functional basis (@RamsayBook).
Our contributions are as follows:
- we disentangle the bias issue by explicitly defining the parameter of interest within the approaches of @sun1994 and @Telschow2023;
- we propose a confidence band for the true function of interest, including a bias correction. This provides a collection of debiased confidence bands. We also propose a criterion to select the best band, by splitting the sample into two sub-samples;
- we propose a second heuristic method for selecting the dimension of the approximation space, treating it as a model selection problem, with a trade-off between conservatism and confidence level assurance; this approach does not correct the bias of each band of the collection but selects a band with a negligible bias;
- we illustrate the proposed strategies and compare them to cross-validation or threshold approaches;
- we also illustrate the impact of the choice of the functional family, including non-orthonormal families.
The paper is organized as follows: @sec-model introduces the functional regression model, the considered functional family and the corresponding approximate regression models, as well as an estimator defined in the finite space, along with descriptions of the error terms. In @sec-bandSun, we propose a confidence band for the approximate regression function in the space of finite dimension, where the dimension is fixed. @sec-band2 proposes a strategy to construct a confidence band for the true function. This last confidence band being too conservative, @sec-modsel introduces a model selection criterion to select the best confidence band, making a trade-off between conservatism and confidence level assurance. @sec-simulation illustrates the different estimating procedures of the confidence band. @sec-realdata proposes an application on real data. @sec-conc concludes the paper with a discussion of perspectives.
# Statistical Model {#sec-model}
In this paper, we consider time series as discrete measurements of functional curves. We first present the general functional regression model (@sec-functional_model) where the regression function belongs to a finite functional family of dimension $L^*$.
In practice, this dimension $L^*$ is unknown and we will work on a functional space of dimension $L$. The regression model on the finite family of functions is presented in @sec-projection_model, and an estimator is proposed in @sec-estim, with a description of the error terms.
In the rest of the paper, we consider the space $L^2([0,1])$ with its standard scalar product $<f_1,f_2> = \int_0^1 f_1(t)f_2(t)\,dt$, for $f_1,f_2 \in L^2([0,1])$. The notation $Vect$ denotes the linear span.
\bigskip
## Functional regression model {#sec-functional_model}
Let $y_{ij}$ be the measurement at fixed time $t_{j} \in [a,b]$ for individual $i=1, \ldots, N$, with $j=1, \ldots, n$. The case where the observation times depend on the individual is a natural extension, but is not treated in this paper.
We restrict ourselves to $[a,b] = [0,1]$, without loss of generality.
We assume these observations are discrete-time measurements of individual curves, which are independent and noisy realisations of a common function $f$ that belongs to a functional space. Thus for each individual $i$, we consider the following functional regression model
\begin{equation*}
y_{ij} = f(t_{j}) + \varepsilon_{ij},
\end{equation*}
where $\varepsilon_{i.}=(\varepsilon_{i1}, \ldots, \varepsilon_{in})$ is the measurement noise; the vectors $\varepsilon_{i.}$ are assumed independent.
For each individual $i=1, \ldots, N$, we denote $y_{i.}=(y_{i1}, \ldots, y_{in})$, $t_{.}=(t_{1}, \ldots, t_{n})$ and $f(t_{.})=(f(t_{1}), \ldots, f(t_{n}))$ the $n \times 1$ vectors of the observations, times and function $f$ evaluated in $t_{.}$, respectively. We also denote $\mathbf{y} = (y_{1.}, \ldots, y_{N.})$ the whole matrix of observations.
The unknown regression function $f$ lives in an infinite-dimensional space and cannot be directly estimated. First, we reduce the dimension by projecting $f$ onto a function space. Let us consider $L^*$ functions $(B_\ell^{L^*})_{1\leq \ell \leq L^*}$ assumed to be linearly independent and the corresponding functional space $\mathcal{S}^{L^*} = Vect((t \mapsto B_\ell^{L^*}(t))_{1\leq \ell \leq L^*})$.
Then, for any $f \in \mathcal{S}^{L^*}$, there exists a unique vector of coefficients $(\mu_{\ell}^{L^*})_{1\leq \ell \leq L^*}$ such that, for all $t$, $f(t) = \sum_{\ell =1}^{L^*} \mu_{\ell}^{L^*} B_\ell^{L^*}(t)$.
The regression function $f$ verifies the following assumption:
***Assumption 1.***
The function $f$ belongs to the space $\mathcal{S}^{L^*}$ of dimension $L^*$. It is denoted ${f}^{L^*}$ and defined as:
$$f(t) = {f}^{L^*}(t) = \sum_{\ell = 1}^{L^*} \mu_\ell^{L^*} B^{L^*}_\ell(t). $$
Many functional families are available in the literature, such as the Spline, Fourier or Legendre families. We introduce the following assumption:
***Assumption 2.***
The functional family $(t \mapsto B^{L^*}_\ell(t))_{1\leq \ell\leq L^*}$ is orthonormal with respect to the standard scalar product $<.,.>$.
Note that if Assumption 2 holds, one gets $\mu_\ell^{L^*} = <{f}^{L^*},B^{L^*}_\ell>$ for $\ell = 1,\ldots, L^*$.
The Legendre family is orthonormal, the Fourier family is orthogonal for the standard scalar product (but not orthonormal), and the B-splines family is not orthogonal. We will illustrate the impact of using one family or the other in @sec-gen.
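As a quick numerical check of Assumption 2, the scalar products of the first shifted Legendre functions on $[0,1]$ can be approximated by quadrature. The sketch below is purely illustrative, with the three basis functions written out by hand; the Gram matrix should be close to the identity.
```{r}
#| eval: false
# Minimal sketch: numerical check that the first three shifted Legendre
# functions are orthonormal on [0, 1] for the standard scalar product.
B1 <- function(t) rep(1, length(t))
B2 <- function(t) sqrt(3) * (2 * t - 1)
B3 <- function(t) sqrt(5) * (6 * t^2 - 6 * t + 1)
basis.fun <- list(B1, B2, B3)
gram <- outer(1:3, 1:3, Vectorize(function(k, l)
  integrate(function(t) basis.fun[[k]](t) * basis.fun[[l]](t), 0, 1)$value))
round(gram, 10)  # approximately the 3 x 3 identity matrix
```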
We also consider a functional noise through the following assumption.
***Assumption 3.***
The sequence $\varepsilon_{i}$ is functional and admits a Karhunen-Loève $L^2$ representation:
there exists a sequence of coefficients $(c_{i\ell})_{\ell \geq 1}$ such that
$$\varepsilon_{ij} = \sum_{\ell \geq 1} c_{i\ell} \phi_\ell(t_{j}),$$
where the functions $(\phi_\ell)_{\ell \geq 1}$ can be written through the eigenvalues and eigenfunctions of the covariance $cov(\varepsilon_{ij}, \varepsilon_{ij'})$.
Practically, we assume this sum to be finite, as done for example in @Chen2015: there exists $L^\varepsilon$ such that $$\varepsilon_{ij} = \sum_{1 \leq \ell \leq L^\varepsilon} c_{i\ell} \phi_\ell(t_{j}).$$
We also assume that the coefficients are Gaussian: for all $i=1,\ldots, N$ and $\ell=1, \ldots, L^\varepsilon$,
$$c_{i \ell} \sim_{iid} \mathcal{N}(0,\sigma^2).$$
Note that we could also work with elliptic distributions instead of the Gaussian distribution. The results of the paper would be the same but require more technical details. For simplicity, we focus on the Gaussian case.
Assumption 1 and Assumption 3 imply that each curve $y_i$ admits a finite expansion: for $j=1,\ldots,n$,
$$ y_{ij} = \sum_{\ell=1}^{L^*} \mu_{\ell}^{L^*} B_\ell^{L^*}(t_{j}) + \sum_{\ell=1}^{L^\varepsilon} c_{i\ell} \phi_\ell(t_{j}). $$
As the observations are recorded at discrete time points $(t_j)_{1\leq j \leq n}$, for $L\in \mathbb{N}$, let us denote $\mathbf{B}^L$ the matrix of $n\times L$ with coefficient in row $j$ and column $\ell$ equal to $B^L_\ell(t_{j})$, and the basis for the noise $\Phi^{L^\varepsilon} = (\phi_\ell(t_j))_{1\leq \ell \leq L^\varepsilon, 1\leq j \leq n}$.
Let us introduce $c_{i.}=(c_{i1}, \ldots, c_{iL^\varepsilon})$ the $L^\varepsilon \times 1$ vector. Then $\varepsilon_{i.} = \Phi^{L^\varepsilon} c_{i.}$.
The vectors $y_{i.} \in \mathbb{R}^n$ are thus independent and $y_{i.}\sim \mathcal{N}_n(f(t_{.}), \sigma^2 \Sigma^{L^\varepsilon})$ with $\Sigma^{L^\varepsilon} = \Phi^{L^\varepsilon} (\Phi^{L^\varepsilon})^T$.
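For concreteness, the matrix $\Sigma^{L^\varepsilon}$ can be formed explicitly from the noise family. The sketch below uses a hypothetical two-function noise family on a coarse grid; both the family and the grid are arbitrary choices for illustration.
```{r}
#| eval: false
# Sketch: building Sigma^{L_eps} = Phi %*% t(Phi) of Assumption 3 with a
# hypothetical noise family (one sine, one cosine) on n = 5 time points.
t.grid <- seq(0, 1, length.out = 5)
Phi <- cbind(sin(2 * pi * t.grid), cos(2 * pi * t.grid))  # n x L_eps matrix
Sigma <- Phi %*% t(Phi)  # covariance of y_i. is sigma^2 * Sigma
```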
The objective of this paper is to construct a tight confidence band for $f^{L^*}$ using the data $(y_{ij})_{ij}$.
The main challenge is that the true dimension $L^*$ is unknown. In practice, we can only work with a projected version on the space $\mathcal{S}^{L}$ with $L \in \{L_{\min}, \ldots, L_{\max}\}$. This introduces two issues: the bias induced by the projection and the choice of $L$. In this paper, we treat both of them in a non-asymptotic setting to construct confidence bands with a controlled level.
In the rest of the paper, we will work with a collection of models defined by the dimension $L$ with $L\in \{L_{\min}, \ldots, L_{\max}\}$, where $L_{\max}$ is chosen sufficiently large by the user so that $L^*\leq L_{\max}$ can be expected; in particular, $L_{\max}$ has to be large enough to allow overfitting. We will study the functional bias and its asymptotic behavior. Then we will propose different strategies to choose the best dimension within the collection.
First, in @sec-projection_model and @sec-estim, we define for a fixed $L$ the corresponding regression model and its estimator. Then @sec-bandSun, @sec-band2 and @sec-modsel will introduce the different confidence bands.
## Approximation of the model on a finite family {#sec-projection_model}
Let $f^{L^*} \in \mathcal{S}^{L^*}$ with $L^*$ unknown, and consider the space $\mathcal{S}^L$ for $L$ fixed in $\{L_{\min}, \ldots, L_{\max}\}$.
As $\mathcal{S}^L$ is the linear span of linearly independent functions, there is always a unique vector $\mu^{L}$ of coefficients defining $f^{L}(t) = \sum_{\ell=1}^L \mu_\ell^{L} B_\ell^L(t)=B^L(t) \mu^{L}$ such that
$$f^{L} = \arg\min_{f \in \mathcal{S}^L}\{\|f^{L^*} - f\|_2^2\}.$$
If the family is orthonormal, it corresponds to the projected coefficients
$\mu_{\ell}^{L}$:
$$\mu_{\ell}^{L} :=<f^{L^*}, B_\ell^L>.$$
We know that under Assumption 1, $f^{L^*,L^*} = f^{L^*}$.
Moreover under Assumption 2,
$$\mu_{\ell}^{L} = \mu_\ell^{L^*} \quad \text{for } \ell =1, \ldots, \min(L, L^{*}).$$
It is interesting to note that this is not true when the basis is not orthonormal.
In practice, data are observed at discrete times, so we consider the operator $\mathbf{P}^L$ defined as the matrix $\mathbf{P}^L = ((\mathbf{B}^L)^T \mathbf{B}^L)^{-1} (\mathbf{B}^L)^T$ of size $L\times n$. This coincides with the orthogonal projection when we deal with an orthonormal basis, but the formula also holds for a non-orthonormal family, corresponding to the least squares estimator on the specified family.
Then we define the coefficients $\underline{\mu}^{L}$, the approximation of ${\mu}^{L}$ on the vector space, denoted $\mathbf{S}^L$, defined by the matrix $\mathbf{B}^L$:
$$\underline{\mu}^{L} := \mathbf{P}^L \mathbf{B}^{L^*}\mu^{L^*}.$$
Note that
- When $L\geq L^*$ and $n>L$, we have for $\ell=1, \ldots, L^*$,
$$ \underline{\mu}^{L}_\ell = \mu^{L^*}_\ell.$$
- When $L<L^*$, for $\ell =1, \ldots, L$,
$$\underline{\mu}^{L}_\ell \neq \mu^{L^*}_\ell.$$
The corresponding finite approximated regression function is denoted $\underline{f}^{L}$ and is defined, for all $t\in [0,1]$, as
$$\underline{f}^{L}(t) = B^L(t) \underline{\mu}^{L}.$$
We can observe the following, linking $L, L^*$ and the number of timepoints $n$:
under Assumption 1 and Assumption 2, the diagonal elements of $\mathbf{P}^L \mathbf{B}^{L^*}$ satisfy $[\mathbf{P}^L \mathbf{B}^{L^*}]_{\ell\ell}=1$ for $\ell=1, \ldots, \min(L, L^*)$.
Moreover,
- When $n\rightarrow\infty$, for $\ell = 1,\ldots, L$
$$\underline{\mu}_\ell^{L}\rightarrow \mu_\ell^{L^*}. $$
- If $n>L^*$, then $f^{L^*} = f^{L^*,L^*} = \underline{f}^{L^*,L^*}$.
The last point is particularly interesting, relating the number of timepoints $n$ to the true dimension $L^*$: as soon as $n > L^*$, the discretized function coincides with the true one.
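These properties can be checked numerically. The sketch below is illustrative only: it uses a hand-written shifted Legendre family of dimension $L^*=3$ and arbitrary coefficients. For $L = L^*$ the coefficients $\underline{\mu}^{L}$ coincide with $\mu^{L^*}$, while for $L < L^*$ the leading coefficients are only approximately preserved at finite $n$.
```{r}
#| eval: false
# Sketch: the discrete operator P^L and the approximated coefficients
# underline(mu)^L = P^L B^{L*} mu^{L*}, for L = 1, 2, 3 and L* = 3.
n <- 40
t.obs <- seq(0, 1, length.out = n)
B.Lstar <- cbind(rep(1, n),
                 sqrt(3) * (2 * t.obs - 1),
                 sqrt(5) * (6 * t.obs^2 - 6 * t.obs + 1))  # n x L* design
mu.star <- c(2, -1, 0.5)  # arbitrary true coefficients
for (L in 1:3) {
  B.L <- B.Lstar[, 1:L, drop = FALSE]
  P.L <- solve(t(B.L) %*% B.L) %*% t(B.L)    # L x n operator P^L
  print(drop(P.L %*% B.Lstar %*% mu.star))   # underline(mu)^L
}
```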
## Estimator {#sec-estim}
Let $L\in \{L_{\min}, \ldots, L_{\max}\}$.
This section presents the least squares estimator of the regression function on the space of dimension $L$ defined by the family $\mathbf{B}^L$ and discusses its error, i.e., its bias and its behavior with respect to $L$ and $n$.
### Estimation of the regression function
When considering the estimation of the regression function $f^{L^*}$ on the space of dimension $L$ defined by the family $\mathbf{B}^L$, we do not directly estimate $f^{L^*}$ but its projection on this finite space, which corresponds to the projected function $\underline{f}^{L}(t)$ and its associated coefficients $(\underline{\mu}_\ell^{L})_{1 \leq \ell \leq L}$.
::: {#def-proj2}
The vector of coefficients $(\underline{\mu}_\ell^{L})_{1 \leq \ell \leq L}$ is estimated by the least squares estimator $\hat{\underline{\mu}}^{L}$ defined as:
\begin{align*}
\hat{\underline{\mu}}^{L}:=\frac1N \sum_{i=1}^N
\mathbf{P}^L y_{i.}.
\end{align*}
For a fixed $t \in [0,1]$, the estimator of the function $\underline{f}^{L}(t)$ is defined by:
$$ \underline{\hat{f}_{}}^{L}(t) = \sum_{\ell=1}^L \underline{\hat \mu}_\ell^{L} B^L_\ell(t)= B^L(t)\hat{\underline{\mu}}^{L}.
$$ {#eq-fhatL}
:::
@eq-fhatL shows that the estimator is the empirical mean of the functional approximations of the individual observation vectors.
Because we work with least squares estimators, we can easily study the error of estimation of $\hat{\underline{\mu}}^{L}$ and $\underline{\hat{f}_{}}^{L}$.
::: {#prp-error}
Under Assumption 1 and Assumption 3, we have
\begin{align*}
\hat{\underline{\mu}}^{L}
&\sim \mathcal{N}_L\left(\underline{\mu}^{L}, \frac{\sigma^2}N\Sigma_{B}^{L,L^\varepsilon}
\right),
\end{align*}
where the $L\times L$ covariance matrix $\Sigma_{B}^{L,L^{\varepsilon}}$ is defined as
$\Sigma_{B}^{L,L^\varepsilon}:=\mathbf{P}^L \Sigma^{L^\varepsilon} (\mathbf{P}^L)^T$
with $\Sigma^{L^\varepsilon}= \Phi^{L^\varepsilon} (\Phi^{L^\varepsilon})^T$.
Moreover, $B^L\mathbf{P}^L y_i$ is a Gaussian process with mean $\underline{f}^{L}$ and covariance function $(s,t) \mapsto \sigma^2 B^L(s) \Sigma_{B}^{L,L^\varepsilon} (B^L(t))^T$, and $(\underline{\hat{f}_{}}^{L}- \underline{f}^{L})$ is a centered Gaussian process with covariance function $C^{L}: (s,t) \mapsto \frac{\sigma^2}N B^L(s) \Sigma_{B}^{L,L^\varepsilon} B^L(t)^T$.
:::
The proof is given in Appendix.
Even if the estimator $\underline{\hat{f}_{}}^{L}$ is defined on the functional space associated to $\mathbf{S}^L$, our approach consists in seeing it as an estimator of the function $f^{ L^*}$ which lies in the space $\mathcal{S}^{L^*}$. In that case, the error includes a functional approximation term due to the approximation of $f^{ L^*}$ on the space $\mathcal{S}^L$, which will be nonzero if $L\neq L^*$. It corresponds to the bias of the estimator $\underline{\hat{f}_{}}^{L}$, i.e. the difference between its expectation and the true $f^{ L^*}$. Indeed, recalling that $f^{L^*} = \underline{f}^{L^*,L^*}$, the error of estimation can be decomposed into
$$
\underline{\hat{f}}^{L}(t) -f^{L^*}(t)
= \underline{\hat{f}}^{L}(t) - \underline{f}^{L}(t) + \underline{f}^{L}(t) - \underline{f}^{L^*,L^*}(t) =: Stat_{L}(t) + Bias_{L}(t).
$$ {#eq-decomposition}
The first term $Stat_{L}(t) = \underline{\hat{f}}^{L}(t) - \underline{f}^{L}(t)$ is the (unrescaled) statistics of the model. The second term $Bias_{L}(t) = \mathbb{E}(\underline{\hat{f}}^{L}(t)) - \underline{f}^{L^*,L^*}(t)$ is the bias of the estimator $\underline{\hat{f}}^{L}(t)$ when estimating the true function $\underline{f}^{L^*,L^*}(t)$.
Let us remark that this bias is different from the bias of the estimator $\underline{\hat{f}}^{L}(t)$ when estimating the projected function $\underline{f}^{L}$, which is 0.
The two terms defined in @eq-decomposition are detailed in the next two subsections.
### Statistics
The statistics of the model, $t\mapsto Stat_{L}(t) = \underline{\hat{f}}^{L}(t) - \underline{f}^{L}(t)$, is a random functional quantity which depends on the estimator $\underline{\hat{f}}^{L}$. From @prp-error, for any $t\in [0,1]$, we define the centered and rescaled statistics $Z_L(t)$:
$$Z_L(t):= \frac{Stat_{L}(t)}{\sqrt{\text{Var}(Stat_{L}(t))}}
= \frac{\underline{\hat{f}}^{L}(t) - \underline{f}^{L}(t)}{\sqrt{C^{L}(t,t)}}\sim \mathcal{N}(0,1). $$
The covariance function can be naturally estimated using the observations $y_{i.}$ as
$$\hat C^{L}(s,t) = \frac1{N-1}\sum_{i=1}^{N} (B^L(s)\mathbf{P}^L y_{i.} - \underline{\hat{f}_{}}^{L}(s))(B^L(t)\mathbf{P}^L y_{i.} - \underline{\hat{f}_{}}^{L}(t)).$$
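A minimal sketch of this estimator on the observation grid is given below. It reuses the design `B.Lstar` and coefficients `mu.star` of the sketch in @sec-projection_model and, for simplicity, replaces the functional noise of Assumption 3 with i.i.d. pointwise Gaussian noise (a simplifying assumption, not the paper's model).
```{r}
#| eval: false
# Sketch: empirical covariance hat(C)^L(t_j, t_j') on the observation grid,
# for N = 25 simulated curves.
set.seed(1)
N <- 25
y <- drop(B.Lstar %*% mu.star) + matrix(rnorm(n * N, sd = 0.5), n, N)
L <- 3
B.L <- B.Lstar[, 1:L]
fitted <- B.L %*% solve(t(B.L) %*% B.L, t(B.L) %*% y)  # n x N projected curves
f.hat <- rowMeans(fitted)                              # hat(f)^L at the t_j's
C.hat <- tcrossprod(fitted - f.hat) / (N - 1)          # n x n matrix hat(C)^L
```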
### Bias
The bias is due to the fact that the estimation is potentially performed in a different (finite) space than the space where the true function $\underline{f}^{L^*,L^*}$ lives. This is a functional bias, which is not random. It corresponds to the approximation of $f^{L^*}$ from $\mathcal{S}^{L^*}$ to the space $\mathcal{S}^{L}$. It can be written as follows:
$$Bias_{L}(t) = B^L(t) \underline{\mu} ^{L} - B^{L^*}(t) \mu^{L^*}.$$
It depends on $L$ and on the number of timepoints $n$ through $\underline{\mu} ^{L}$. Let us describe its behavior. When $L<L^*$ and the family is orthonormal, the approximation is the orthogonal projection and we have that
$$Bias_{L}(t)= \sum_{\ell=1}^L B_\ell^L(t)\underline{\mu}_\ell^{L} - \sum_{\ell=1}^{L^*} B_\ell^{L^*}(t)\underline{\mu}_\ell^{L^*} = \sum_{\ell = L+1}^{L^*} B_\ell^{L^*}(t)\underline{\mu}_\ell^{L^*}.$$
From @prp-error, we can directly deduce the following proposition:
::: {#prp-bias}
Under Assumption 1 and Assumption 3, the bias is, for all $t\in [0,1]$,
- for $L<L^*$, $Bias_{L}(t)\neq 0$,
- for $L\geq L^*$, $Bias_{L}(t) = 0$.
:::
Note that the bias is not $0$ even when $n\rightarrow \infty$, as soon as $L<L^*$. Thus, a correct selection of the dimension $L$ is an issue.
\bigskip
In the next section, we explain how we use this property to derive confidence bands of $\underline{f}^{L}$ and $f^{L}$.
# Preliminary step: Confidence Bands of $\underline{f}^{L}$ and $f^{L}$ for a fixed $L$ {#sec-bandSun}
In this section, we present some well-known results on constructing confidence bands for $\underline{f}^{L}$ and $f^{L}$. As their definitions and properties are essential for the next steps towards a confidence band for $f^{L^*}$, we recall them in detail.
The objective is to construct a confidence band for the two functions $\underline{f}^{L}$ and $f^{L}$, based on the observations $\mathbf{y}$, for a given value $L\in \{L_{\min}, \ldots, L_{\max}\}$. The band for $\underline{f}^{L}$ enters the framework proposed by @sun1994, which relies on an unbiased and linear estimator of the function, as $\underline{\hat f}^{L}$ is an unbiased estimator of $\underline{f}^{L}$. We recall in @sec-bandSun2 the construction of this confidence band which attains a given confidence level in a non-asymptotic setting, that is, for a finite number of observations $n$ for each individual. Then in @sec-bandfLLstar, we prove that the confidence band proposed by @sun1994 can be viewed as a confidence band for $f^{L}$ with an asymptotic confidence level, the asymptotic framework being $n\rightarrow\infty$.
\bigskip
## Confidence band for $\underline{f}^{L}$ {#sec-bandSun2}
Consider a fixed confidence level $1-\alpha$. The aim is to find a function $d^L$ such that
$$\mathbb{P}\left( \forall t \in [0,1],\; \underline{\hat{f}_{}}^{L}(t) -d^L(t)\leq \underline{f}^{L}(t)\leq \underline{\hat{f}_{}}^{L}(t) +d^L(t)\right) = 1-\alpha.$$
Consider the normalized statistics $Z_L(t)$, which is a standardized Gaussian process. We want to find the quantile $q^L$ satisfying
$$q^L =\arg\min_{q} \left\{ \mathbb{P}\left(\max_{t \in [0,1]} \left|Z_L(t)\right| \leq q \right) = 1 - \alpha\right\}.$${#eq-qL}
Then we can take $d^L(t) = q^L \sqrt{C^{L}(t,t)}$. The covariance function $C^{L}(t,t)$ can be replaced by its estimator $\hat{C}^{L}(t,t)$, making the distribution a Student's distribution with $N-1$ degrees of freedom. Thus, it only remains to compute the critical value $q^L$.
This can be done following @sun1994 who propose a confidence band for a centered Gaussian process. Their procedure is based on an unbiased linear estimator of the function of interest, which is the case for $\underline{\hat{f}_{}}^{L}$ when we consider a band for $\underline{f}^{L}$.
We recall their result in the following proposition; the computation of the value $q^L$ is detailed thereafter. Note that the presentation of @TELSCHOW202270 is similar to the one adopted in this paper.
::: {#thm-sunLoader}
## @sun1994
Suppose Assumption 1 and Assumption 3 hold and fix a probability $\alpha\in [0,1]$. Then, we have
$$\mathbb{P}\left(\forall t \in [0,1], \left|\underline{\hat{f}}^{L}(t)-\underline{f}^{L}(t)\right| \leq \hat d^L(t)\right) = 1- \alpha$$
with
\begin{align*}
\hat d^L(t) = \hat q^L \sqrt{\hat C^{L}(t,t)/N}
\end{align*}
and $\hat q^L$ defined as the solution of the following equation, seen as a function of $q^L$:
$$
\alpha = \mathbb{P}\left(|t_{N-1}|>q^L\right) +\frac{\| \tau^L\|_1}{\pi}\left( 1+\frac{(q^L)^2}{N-1}\right)^{-(N-1)/2} ,
$${#eq-cL}
with $(\tau^L)^2(t)= \partial_{12} c(t,t) = Var(Z'_L(t))$, where $c(s,t)$ denotes the correlation function of the process $Z_L$ and $\partial_{12}c(t,t)$ its partial derivative with respect to the first and second coordinates, evaluated at $s=t$.
:::
We can thus deduce a confidence band of level $1-\alpha$ for $\underline{f}^{L}$:
\begin{align*}
CB_1(\underline{f}^{L})& = \left\{ \forall t\in [0,1], \left [\underline{\hat{f}}^{L}(t) -\hat d^L(t) ; \underline{\hat{f}}^{L}(t) +\hat d^L(t) \right ]\right\}.
\end{align*}
The value $\hat q^L$ is defined implicitly in @eq-cL, which involves the unknown, non-explicit quantity $t\mapsto \tau^L(t)$. @Liebl2019 propose to estimate $\tau^L(t)$, for all $t$, by
\begin{align*}
\hat \tau^L(t) &= \left(\widehat{Var}\left(({U}^{L})'_{1}(t), \ldots, ({U}^{L})'_{N}(t)\right)\right)^{1/2}\\
&= \left(\frac{1}{N-1}\sum_{i=1}^N\left(({U}^{L})'_{i}(t)-\frac1N\sum_{j=1}^N({U}^{L})'_{j}(t)\right)^2\right)^{1/2},
\end{align*}
where ${U}^L_{i}(t) = (B^L(t)\mathbf{P}^L y_{i.}-\underline{\hat{f}}^{L}(t))/(\hat C^{L}(t,t))^{1/2}$ and $({U}^{L})'_{i}$ is a smoothed version of the differentiated function ${U}^L_{i}$. Then we take the $L_1$-norm of $\hat \tau^L$.
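To make the computation concrete, @eq-cL can be solved by one-dimensional root finding once $\|\hat\tau^L\|_1$ is available. The sketch below uses a placeholder value `tau.l1`; the implementation used in @sec-simulation (`compute.c.L`) follows the same idea, written with the two-sided level split as $\alpha/2$.
```{r}
#| eval: false
# Sketch: solving eq-cL for hat(q)^L by root finding; tau.l1 stands for the
# estimated L1 norm of hat(tau)^L (placeholder value here).
alpha <- 0.05; N <- 25; tau.l1 <- 4
eq.cL <- function(q) {
  2 * pt(q, df = N - 1, lower.tail = FALSE) +
    (tau.l1 / pi) * (1 + q^2 / (N - 1))^(-(N - 1) / 2) - alpha
}
q.hat <- uniroot(eq.cL, interval = c(0.5, 10))$root
```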
Let us describe the behavior of $\hat d^L$:
- $\|\hat d^L\|_\infty$ increases with $L$.
- When the functions $(B_\ell^L)_{1\leq \ell \leq L}$ form an orthonormal family, $\|\hat d^L\|_\infty$ increases with $L$ until $L=L^*$ and then $\|\hat d^L\|_\infty$ is constant with $L$.
This behavior will be illustrated with different function families in @sec-simulation.
## Asymptotic confidence band for $f^{L}$ {#sec-bandfLLstar}
Note that in the asymptotic framework $n\rightarrow\infty$, the previous definition of $\hat d^L$ induces a natural asymptotic confidence band for the function $f^{L}$. Indeed, we can prove the following.
::: {#thm-CB_Liebl_asymptotic}
Suppose Assumption 1 and Assumption 3 hold and fix a probability $\alpha\in [0,1]$. Then, we have
$$\lim_{n \rightarrow +\infty} \mathbb{P}\left(\forall t \in [0,1], |\underline{\hat{f}}^{L}(t)-f^{L}(t)| \leq \hat d^L(t)\right) \geq 1-\alpha,$$
with $\hat d^L(t) = \hat q^L \sqrt{\hat C^{L}(t,t)/N}$ and $\hat q^L$ is defined as the solution of @eq-cL.
:::
The proof is given in Appendix.
Then a confidence band for $f^{L}$ at the asymptotic confidence level $1-\alpha$ for a large number of observations $n$ is given by
\begin{align*}
CB(f^{L})& = \left\{ \forall t\in [0,1], \left[\underline{\hat{f}}^{L}(t) -\hat d^L(t) ; \underline{\hat{f}}^{L}(t) +\hat d^L(t) \right]\right\}.
\end{align*}
# Method 1: Confidence Band of $f^{L^*}$ by correcting the bias {#sec-band2}
The function of interest is $f^{L^*}=\underline{f}^{L^*, L^*}$, rather than $\underline{f}^{L}$. Therefore, our objective is to construct a confidence band for $f^{L^*}$. However, an unbiased estimator of $f^{L^*}$ is not available by definition, since the true dimension $L^*$ is unknown. We propose instead to work with the estimator $\underline{\hat f}^{L}$ and to debias the corresponding confidence band.
To do this, we use the decomposition between the bias term and the statistical term, outlined in @eq-decomposition. The idea is to bound the infinity norm of these two terms. A first strategy is to bound each term separately, then add the two bounds to construct the confidence band. However, this approach tends to produce a band that is too large and too conservative. This is because applying the infinity norm to each term before bounding them does not take into account the functional nature of the two terms.
A second strategy is to retain the functional aspect by bounding the infinity norm of the sum of the two functional terms. This approach is detailed in this section.
In @sec-construction2, we first rewrite the band as a band around $\underline{f}^{L}(t)$. We need to estimate the band bound and the bias. To do this, we divide the sample into two sub-samples. This choice is not ideal because it increases the variability of the estimators. However, it provides independence between the estimators of the two quantities, which makes it possible to establish the theoretical coverage of the final band. More precisely, we use a first subsample $\mathbf{y}^1$ to estimate the bound defined in @sec-bandSun. A second subsample $\mathbf{y}^2$ is used to estimate the bias term (without the infinity norm). This results in a pointwise correction of the bias, and the final confidence band is centered around $\underline{\hat{f}}^{L_{\max},L^*}$.
This procedure provides a collection of confidence bands of variable width, for $L\in \{L_{\min}, \ldots, L_{\max}\}$. Then, in @sec-SelectionL2, we propose a criterion to select the "best" band by minimizing its width. We discuss the band thus obtained and its limitations at the end of the section.
## Construction of the band of $f^{L^*}$ for a given $L$ {#sec-construction2}
We introduce two independent sub-samples $\mathbf{y}^1$ and $\mathbf{y}^2$ of $\mathbf{y}$ of length $N_1$ and $N_2$
such that $N_1+N_2=N$.
We use the first sub-sample $\mathbf{y}^1$ to calculate $\underline{\hat{f}_1}^{L}(t)$, an estimator of $\underline{f}^{L}(t)$, and a functional bound denoted $\hat d_1^L$ that controls the statistical term $\underline{f}^{L}(t)-\underline{\hat{f}_1}^{L}(t)$. This bound is defined in @sec-bandSun applied to $\mathbf{y}^1$, for a given level $\alpha$, such that:
$$
\mathbb{P}\left(\forall t\in[0,1], -\hat d_1^L(t) \leq \underline{f}^{L}(t) - \underline{\hat{f}}_1^{L}(t) \leq \hat d_1^L(t)\right) = 1-\alpha.
$$ {#eq-def-dL1}
Next, we need to control the bias $Bias_{L}(t) = \underline{f}^{L}(t)-f^{L^*}(t)$. Recall that when $L_{\max}$ is sufficiently large and $n> L_{\max}$, we have $f^{L^*} = \underline{f}^{L_{\max},L^*}$. We therefore need to control $Bias_{L}(t) = \underline{f}^{L}(t) - \underline{f}^{L_{\max},L^*}(t)$. It would be tempting to replace $Bias_{L}(t)$ by its estimation based on the second sample $\mathbf{y}^2$, but this would introduce an estimation error that we also need to control, in the same spirit as what is done in @Pascal2017. We can again use @sec-bandSun to compute the function $\hat d_2^{L,L_{\max}}(t)$ on the sample $\mathbf{y}^2$, and the functional estimators $\underline{\hat{f}}_2^{L}(t)$ and $\underline{\hat f}_2^{L_{\max}, L^*}(t)$ of $\underline{f}^{L}(t)$ and $\underline{ f}^{L_{\max}, L^*}(t)$, respectively. This allows us to construct the following band for $\underline{f}^{L}(t) - \underline{f}^{L_{\max},L^*}$ for a confidence level $1-\beta$,
$$
\mathbb{P}\left(\forall t\in[0,1], - \hat d_2^{L,L_{\max}}(t) \leq \underline{ f}^{L_{\max}, L^*}(t)- \underline{ f}^{L}(t) - (\underline{\hat f}_2^{L_{\max}, L^*}(t)-\underline{\hat f}_2^{L}(t)) \leq \hat d_2^{L,L_{\max}}(t)\right) = 1-\beta.
$$ {#eq-def-dL2}
Combining @eq-def-dL1 and @eq-def-dL2, we can provide a debiased confidence band of $f^{L^*}(t)$.
::: {#prp-CBf}
Let us define
\begin{align*}
\hat\theta_1^L(t) &:= -\hat d_1^L(t) - \hat d_2^{L,L_{\max}}(t) +\underline{\hat f}_2^{L_{\max}, L^*}(t)-\underline{\hat f}_2^{L}(t) \\
\hat\theta_2^L(t)&:= \hat d_1^L(t) + \hat d_2^{L,L_{\max}}(t) +\underline{\hat f}_2^{L_{\max}, L^*}(t) - \underline{\hat f}_2^{L}(t),
\end{align*}
where $\hat d_1^L(t)$ is defined on sample $\mathbf{y}^1$ by @eq-def-dL1 for a level $\alpha$ and $\hat d_2^{L,L_{\max}}(t)$ is defined on sample $\mathbf{y}^2$ by @eq-def-dL2 for a level $\beta$.
Then we have
\begin{align*}
\mathbb{P}\left(\forall t \in [0,1], \quad \hat \theta_1^L(t) \leq f^{L^*}(t)- \underline{\hat f}_1^{L}(t)\leq \hat\theta_2^L(t)\right)\geq 1-\alpha\beta.
\end{align*}
:::
The proof is given in Appendix.
This yields a confidence band that can be centered either around $\underline{\hat{f}_{1}}^{L}$:
\begin{align*}
CB_2(\underline{f}^{L^*})=\left\{\forall t\in[0,1], \left[\underline{\hat{f}_1}^{ L, L^*}(t) + \hat\theta_1^{ L}(t) \, ;\, \underline{\hat{f}_1}^{ L, L^*}(t) + \hat\theta_2^{ L}(t) \right] \right\}
\end{align*}
or around $\underline{\hat f}_2^{ L_{\max}, L^*}$:
\begin{align*}
CB_2(\underline{f}^{L^*})=\left\{\forall t\in[0,1], \left[\underline{\hat f_2}^{ L_{\max}, L^*}(t) + \bar \theta_1^L(t) \, ; \, \underline{\hat f_2}^{ L_{\max}, L^*}(t) +\bar \theta_2^L(t) \right]\right\}.
\end{align*}
with $\bar \theta_1^L(t) = \underline{\hat{f}_{1}}^{L}(t)-\underline{\hat f_2}^{ L, L^*}(t) -\hat d_1^L(t) - \hat d_2^{L,L_{\max}}(t)$
and $\bar \theta_2^L(t) = \underline{\hat{f}_{1}}^{L}(t)-\underline{\hat f_2}^{ L, L^*}(t) + \hat d_1^L(t) + \hat d_2^{L,L_{\max}}(t)$.
::: {#rem-dLLmax}
The two functions $\hat d_1^L(t)$ and $\hat d_2^{L,L_{\max}}(t)$ are of the same order because they are constructed using the same approach. They depend on the lengths of the samples. To obtain the thinnest band, the best strategy is to divide the sample into two sub-samples of equal length $N_1=N_2=N/2$.
:::
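A sketch of the whole construction of @prp-CBf is given below; it relies on the helper functions `CB1`, `estimator` and `d.L.Lmax` implemented in @sec-simulation, and is a simplified outline rather than the exact code used in the experiments.
```{r}
#| eval: false
# Sketch of the debiased band of prp-CBf: split the sample, compute
# hat(d)_1^L on y^1, then the bias correction and hat(d)_2^{L, Lmax} on y^2.
debiased.band <- function(data, basis, L, Lmax = 30, alpha, beta) {
  N <- ncol(data$y)
  idx <- sample(N, N %/% 2)  # two sub-samples of (almost) equal size
  data1 <- data; data1$y <- data$y[, idx, drop = FALSE]
  data2 <- data; data2$y <- data$y[, -idx, drop = FALSE]
  band1 <- CB1(data1, basis, L, alpha)          # hat(f)_1^L +/- hat(d)_1^L
  bias.hat <- estimator(data2, basis, Lmax)$f.hat -
    estimator(data2, basis, L)$f.hat            # pointwise bias correction
  d2 <- d.L.Lmax(data2, basis, L, Lmax, beta)   # hat(d)_2^{L, Lmax}
  list(low = band1$f.hat.low + bias.hat - d2,   # hat(f)_1^L + hat(theta)_1^L
       up  = band1$f.hat.up  + bias.hat + d2)   # hat(f)_1^L + hat(theta)_2^L
}
```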
The behavior of $\hat d_1^L$ was described in @sec-bandSun. Let us describe the behavior of $\hat d_2^{L,L_{\max}}$:
- $\|\hat d_2^{L,L_{\max}}\|_\infty$ decreases with $L$.
- When $L>L^\varepsilon$, $\|\hat d_2^{L,L_{\max}}\|_\infty$ is constant with $L$ and the probability in @eq-def-dL2 is equal to $1$.
- When $L^{*}<L<L^\varepsilon$, $\|\hat d_2^{L,L_{\max}}\|_\infty$ is constant with $L$ when the functions $B_\ell^L$ form an orthonormal family. Otherwise, the behavior is erratic.
This means that when the band defined in @prp-CBf is calculated for $L>L^\varepsilon$, the confidence level is $1-\alpha$ instead of $1-\alpha\beta$.
The advantage of this approach is that the band bias is corrected and the level for the true function $f^{L^*}$ is guaranteed when $L^\varepsilon$ is large. This was the main aim of the paper.
The main limitation of this approach is that the band is constructed from samples of half size, leading to less precision. This will be illustrated in @sec-simulation.
Nevertheless, this method gives finer confidence bands than cross-validation, and with the right level of confidence.
A natural question is then the choice of the dimension $L$. This is the purpose of the next section.
## Influence of $L$ {#sec-SelectionL2}
This approach produces a collection of debiased confidence bands for different values of $L$. The confidence bands have different widths but the same confidence level $1-\alpha\beta$. It is therefore natural to want to select one of them. This means we want to select the best dimension $L$ among the collection $\{L_{\min}, \ldots, L_{\max}\}$. We need to define what "best" means. It is quite intuitive to focus on the finest band, fine in the sense of a certain norm. Here we consider the infinity norm of the width of the confidence band. This gives preference to smooth bands. We therefore define the following criterion for selecting $L$.
$$
\hat L = \arg\min_L \left\{\sup_{t} |\hat\theta_2^L(t)-\hat\theta_1^L(t)|\right\} = \arg\min_L \left\{\sup_t | \hat d_1^L(t) + \hat d_2^{L,L_{\max}}(t)|\right\}.
$$ {#eq-def-Lhat}
This global approach guarantees that each band of the collection is debiased and then the dimension is selected. It will be illustrated in @sec-simulation.
In the next section, instead of debiasing each band, we employ another strategy focusing on the construction of a selection criterion that guarantees that the bias is negligible.
# Method 2: Selection of the best confidence band with a criterion taking the bias into account {#sec-modsel}
In this section, we propose a new method, in the non-asymptotic setting, to provide a confidence band for $f^{L^*}$ without correcting the bias but taking it into account in the selection procedure.
Our method uses the collection of confidence bands defined in @sec-bandSun. Instead of correcting their bias, the strategy is to propose a selection criterion that is a trade-off between this bias and the dimension of the basis. To do this, we propose a new heuristic criterion linked to the definition of the band itself, considering the estimation of the band as the estimation of a quantile of a certain empirical process. The criterion is inspired by model selection tools for choosing the best dimension $L$. In the following, we assume that $L_{\max}$ is large enough such that $\underline f^{L_{\max},L^*}=f^{L^*}$.
We work on the quantile $q^L$ introduced in @eq-qL, its oracle version $q^{L^*}$ for dimension $L^*$ and its estimate $\hat q^L$. All are scalars, belonging to a collection indexed by $L \in \{L_{\min}, \ldots, L_{\max}\}$. A natural criterion for choosing the best $L$ would be for the estimator $\hat q^L$ to minimize the quadratic error $\mathbb{E}\left( \|q^{L^*}-\hat q^L\|^2\right)$. However, this quadratic error is unknown, as $q^{L^*}$ is unknown, so we cannot use it directly.
Instead, we study $\|\hat{q}^{L_{\max}}-\hat q^L\|^2$. While the theoretical quadratic error $\mathbb{E}\left( \|q^{L^*}-\hat q^L\|^2\right)$ decreases when $L<L^*$ and increases when $L>L^*$, its approximation $\|\hat{q}^{L_{\max}}-\hat q^L\|^2$ keeps decreasing when $L>L^*$, as illustrated in @sec-simulation.
We see a behavior similar to a bias, high when the dimension is small, and small when the dimension is large. Selecting a dimension using this criterion will always overfit the data. We therefore propose to penalize this quantity by the dimension $L$ divided by the sample size $N$, as usual in model selection criteria. To do this, we introduce a regularisation parameter $\lambda>0$ which balances the two terms. A natural criterion to select the best $L$ is then
$$\widetilde{crit}(L) = \|\hat q^{L_{\max}}-\hat q^L\|^2 +\lambda \frac{L}{N}.$$
Then we define
$$\tilde L = \arg\min_L \widetilde{crit}(L),$$
and take the band centered around $\underline{\hat{f}}^{\tilde L, L^*}$:
$$CB_3(\underline{f} ^{L^*}) = CB_1(\underline{f}^{\tilde L, L^*})$$
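A sketch of the resulting selection rule is given below, assuming a vector `q.hat` of critical values $\hat q^L$ over the grid of dimensions (obtained, e.g., with `compute.c.L` from @sec-simulation, the last entry corresponding to $L_{\max}$) and a user-chosen $\lambda$.
```{r}
#| eval: false
# Sketch of Method 2: penalized selection of L from the critical values.
# q.hat[k] is hat(q)^L for L = L.grid[k]; lambda balances the two terms.
select.L <- function(q.hat, L.grid, N, lambda = 1) {
  crit <- (q.hat[length(q.hat)] - q.hat)^2 + lambda * L.grid / N
  L.grid[which.min(crit)]
}
```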
This criterion is illustrated in @sec-simumodsel. We also compare it with two other standard heuristic approaches, namely cross-validation, used to select the dimension $L$ which minimizes the reconstruction error, and a thresholding method which keeps the largest dimension $L$ whose coefficients are large enough. These two methods are less oriented towards the objective of controlling the level of the selected confidence band.
# Simulation study {#sec-simulation}
In this section, we illustrate the different results of the paper on simulated data. First, in @sec-dgp, we describe the data generating process and illustrate the linear estimator considered in this paper. In @sec-simu-band, we illustrate the first confidence band, for a fixed level, as introduced in @sec-bandSun. Then, we illustrate the debiased confidence band in @sec-simudebias, and discuss the model selection criterion in @sec-simumethod2, comparing both of them with state-of-the-art methods in @sec-simumodsel. We finally study the generalization of the method outside the class of models in @sec-gen.
## Data generating process {#sec-dgp}
To illustrate the model, we simulate a functional regression model with $n=40$ equally spaced timepoints per individual and $N=25$ individuals. In @fig-BasisModel, the function $f$ (black curve) belongs to the Fourier (resp. Legendre and Spline) family with $L^*=7$, and the noisy individual observations $(y_{ij})_{1\leq i \leq N, 1\leq j \leq n}$ (grey curves) have a functional noise of dimension $L^{\varepsilon}=15$, also in the Fourier (resp. Legendre and Spline) family, on the left plot (resp. middle and right).
```{r message=FALSE, fig.height=3, fig.width=7}
#| label: fig-BasisModel
#| fig-cap: "Functional regression with different bases. We generate a regression functional model using three different basis families: Fourier (left), Legendre (middle), and Splines (right). The black curve represents the true mean function, while the gray curves show individual noisy observations."
library(ggplot2)
library(fda)
library(orthopolynom)
library(wavelets)
library(basefun)
library(fields)
library(ggpubr)
func.basis = function(time, L, basis){
if (basis == 'Splines'){
B = bsplineS(time,breaks = seq(min(time), max(time), length.out = L-1), norder = 3, 0)
} else if (basis == 'Fourier'){
B = fourier(time,nbasis = L)
} else if (basis == 'Legendre'){
B = Legendre_basis(numeric_var("x", support = c(min(time), max(time))),
order = L-1)(time)
}
return(B)
}
dgp = function(n=40,N=25,sd=1,L.star=7, L.eps = 15, alpha.mu = NULL, basis = 'Fourier', f.true = NULL){
### n: number of timepoints
### N: number of individuals
### sd: noise level (standard deviation)
### L.star: dimension of the functional basis for the signal
### L.eps: dimension of the functional basis for the noise
### alpha.mu: vector of coefficients
### basis: functional basis, to be chosen in 'Legendre', 'Fourier', 'Splines'
### f.true: if one wants to fix the true function
time = seq(0, 1, length.out = n)
if (basis == 'Fourier'){
if (L.star%%2 == 0){ L.star = L.star+1}
if (L.eps%%2 == 0){ L.eps = L.eps+1}
}
if (is.null(f.true)){
### Create the functional basis
B = func.basis(time, L.star, basis)
### Generate randomly coefficients, if needed
if (is.null(alpha.mu)){
alpha.mu = sample(x = c(-5:-2, 2:5),size=L.star,replace=TRUE)}
if (basis == 'Fourier'){
alpha.mu = alpha.mu / 3
sd = sd/2
}
f.true = B %*% alpha.mu
}
B.eps = func.basis(time, L.eps, basis)
y = matrix(0,nrow = N, ncol = n)
for (i in 1:N){
y[i,] = f.true + B.eps %*% matrix(rnorm(L.eps, 0, sd))
}
return(list(y = t(y), alpha.mu = alpha.mu, time = time, L.star=L.star, f.true = f.true))
}
n=40
N=25
sd=1
L.star=7
L.eps = 15
alpha.mu = NULL
set.seed(1)
basis = "Legendre"
data = dgp(n=n,N=N,sd=sd,L.star=L.star, L.eps = L.eps, alpha.mu = alpha.mu, basis = basis, f.true = NULL)
dfLegendre = data.frame( Time = rep(data$time,N+1),
basis = rep(basis, (N+1)*length(data$time)),
f = c(data$y, data$f.true),
type = rep(c(rep("Obs", N), "True mean"), each = n), ind = rep(1:(N+1), each = n))
basis = "Fourier"
data = dgp(n=n,N=N,sd=sd,L.star=L.star, L.eps = L.eps, alpha.mu = alpha.mu, basis = basis, f.true = NULL)
dfFourier = data.frame( Time = rep(data$time,N+1),
basis = rep(basis, (N+1)*length(data$time)),
f = c(data$y, data$f.true),
type = rep(c(rep("Obs", N), "True mean"), each = n), ind = rep(1:(N+1), each = n))
basis = "Splines"
data = dgp(n=n,N=N,sd=2*sd,L.star=L.star, L.eps = L.eps, alpha.mu = alpha.mu, basis = basis, f.true = NULL)
dfSplines = data.frame( Time = rep(data$time,N+1),
basis = rep(basis, (N+1)*length(data$time)),
f = c(data$y, data$f.true),
type = rep(c(rep("Obs", N), "True mean"), each = n), ind = rep(1:(N+1), each = n))
df = rbind(dfLegendre, dfFourier, dfSplines)
ggplot(data = df, aes(x = Time, y = f, color = type, group = ind)) +
geom_line(linewidth= 0.5)+
scale_colour_manual(values = c("Obs" = "lightgrey","True mean" = "black")) +
facet_grid(~basis)
```
## Confidence band for a fixed level {#sec-simu-band}
The general band for $f^L$ derived in @thm-sunLoader is illustrated in @fig-bandL. It displays, on the top row, functional data generated under either the Fourier family (left), Legendre (middle) or Spline (right); on the middle row, the confidence bands of $\underline f^{L}$ for $L \in \{3,5,7,11,15\}$; and on the bottom row, the bound $\hat d^L$. On the middle row, the true functions $\underline f^{L}$ are displayed in black and the confidence bands in color. The bands are very precise for each $L$. As expected, $\hat d^L$ increases with $L$. As $d^L$ can be seen as a variance, $\hat d^L(t)$ is larger on the boundary of the time domain, as there are fewer observations near 0 and 1.
We also evaluate numerically the levels of the obtained confidence bands. For this, 1000 datasets are simulated, the confidence band is estimated for each of them.
The empirical confidence level is then evaluated as the proportion of confidence bands that contain the true function over 5000 test timepoints. @tbl-levelL presents the empirical confidence levels for different values of $L$, several sample sizes $n \in \{40, 150\}$ and $N \in \{10, 25, 60\}$, and the three basis families.
When $N=25$ and $N=60$, the level is the expected one whatever the value of $L$. We will see in the next sections that this is not the case for the debiased confidence band. When $N=10$, the level is too small, especially when $L$ is large. This might be due to the large number of parameters to be estimated in the covariance matrix with a small number of observations $N$.
```{r }
estimator = function(data, basis, L){
### data: the observations, only the timepoints and the individual functions
### basis: functional basis, to be chosen in 'Legendre', 'Fourier', 'Splines'
### L: number of functional basis for the signal
if (basis == 'Fourier'){
if (L%%2 == 0){ L = L+1}
}
B = func.basis(data$time, L, basis)
### Data projection
Hi = solve(t(B) %*% B) %*% t(B) %*% data$y
mu.hat = rowMeans(Hi)
f.hat = c(B %*% mu.hat)
data.proj = data
data.proj$y = B %*% Hi
cov.emp = crossprod(t(data.proj$y - f.hat)) / ncol(data.proj$y)
return(list(f.hat=f.hat, mu.hat = mu.hat, cov.emp = cov.emp, data.proj = data.proj))
}
compute.c.L = function(data.scl, alpha = 0.05){
### data.scl: the rescaled data
### alpha: the level for the confidence band
x_approx = apply(data.scl$y, 2,FUN=function(yy){
fn = stats::splinefun(x = data.scl$time, y = yy, method = "natural")
pracma::fderiv(f = fn, x = data.scl$time, n = 1, h = 1e-6, method = "central")
})
N = ncol(data.scl$y)
hat.tau = apply(x_approx, 1, stats::sd)
hat.tau[which(is.na(hat.tau))] = 0
tau_01 = (hat.tau)[-1]%*%diff(data.scl$time)
myfun = function(c){stats::pt(c, lower.tail = FALSE, df = N-1) + tau_01 *
(1 + c^2/(N-1))^(-(N-1)/2)/(2 * pi) - alpha/2}
c.L = stats::uniroot(f = myfun,interval = c(.5,8))$root
return(c.L)
}
CB1 = function(data, basis, L, alpha = 0.05){
### data: the observations, only the timepoints and the individual functions
### basis: functional basis, to be chosen in 'Legendre', 'Fourier', 'Splines'
### L: number of functional basis for the signal
### alpha: the level for the confidence band
est = estimator(data, basis, L)
data.proj = est$data.proj
data.scl = data.proj
data.scl$y = (data.proj$y - est$f.hat)/sqrt(diag(est$cov.emp))
c.L = compute.c.L(data.scl, alpha)
d.L = c.L %*% sqrt(diag(est$cov.emp)/ncol(data.scl$y))
f.hat.up = est$f.hat + d.L
f.hat.low = est$f.hat - d.L
return(list(f.hat.up = f.hat.up, f.hat.low = f.hat.low, d.L = d.L, c.L = c.L, width = max(abs(f.hat.up-f.hat.low))))
}
CB.empmean = function(data, alpha=0.05){
### data: the observations, only the timepoints and the individual functions
### alpha: the level for the confidence band
est = rowMeans(data$y)
cov.emp = crossprod(t(data$y - est)) / ncol(data$y)
data.scl = data
data.scl$y = (data$y - est)/sqrt(diag(cov.emp))
c.L = compute.c.L(data.scl, alpha)
d.L = c.L %*% sqrt(diag(cov.emp)/ncol(data.scl$y))
f.hat.up = est + d.L
f.hat.low = est - d.L
return(list(f.hat.up = f.hat.up, f.hat.low = f.hat.low, d.L = d.L, c.L = c.L, width = max(abs(f.hat.up-f.hat.low))))
}
d.L.Lmax = function(data, basis, L, Lmax=30, alpha = 0.05){
### data: the observations, only the timepoints and the individual functions
### basis: functional basis, to be chosen in 'Legendre', 'Fourier', 'Splines'
### L, Lmax: dimensions of the two projection spaces
### alpha: the level for the confidence band
est.2.f = estimator(data, basis, L)
data.2.proj = est.2.f$data.proj
est.2.f.Lmax = estimator(data, basis, Lmax)
data.2.proj.Lmax = est.2.f.Lmax$data.proj
residuals.L.Lmax = data.2.proj.Lmax$y - data.2.proj$y - (est.2.f.Lmax$f.hat - est.2.f$f.hat)
cov.emp.rest = crossprod(t(residuals.L.Lmax)) / ncol(residuals.L.Lmax)
data.2.scl = data.2.proj
### standardize pointwise by the diagonal of the empirical covariance
data.2.scl$y = residuals.L.Lmax/sqrt(diag(cov.emp.rest))
c.L.Lmax.2 = compute.c.L(data.2.scl, alpha)
d.L.Lmax.2 = c.L.Lmax.2 %*% sqrt(diag(cov.emp.rest)/ncol(data.2.scl$y))
return(d.L.Lmax.2)
}
```
```{r , fig.height = 7, fig.width=7}
#| label: fig-bandL
#| fig-cap: "Confidence bands for $f^L$. For the three basis families (Fourier, on the left, Legendre, on the middle, and Splines on the right) we display: (top row) the observed functional data; (middle row) the estimated confidence bands for increasing values of L (3, 5, 7, 11 and 15); and (bottom row) the associated bound dL."
dfObs = data.frame(Time = double(),
basis = character(),
f = double(),
ind = integer())
dfTrue = data.frame(Time = double(),
basis = character(),
f = double())
dfTrueProjected = data.frame(Time = double(),
basis = character(),
f = double(),
dimension = character())
dfBand <- data.frame(Time = double(),
basis = character(),
flow = double(),
fup = double(),
dimension = character())
df.emp <- data.frame(Time = double(),basis = character(),
flow = double(),
fup = double())
dfdL <- data.frame(Time = double(),
basis = character(),
dL = double(),
dimension = character())
dfdL.emp <- data.frame(Time = double(),
basis = character(),
dL = double())
vec.L = c(3,5,7, 11, 15)
for (basis in c("Legendre", "Fourier", "Splines")){
N = 25
if (basis == "Splines"){
data = dgp(sd=2*1,alpha.mu = NULL, basis = basis, f.true = NULL)
} else {data = dgp(alpha.mu = NULL, basis = basis, f.true = NULL)}
dfObs = rbind(dfObs,data.frame(Time = rep(data$time, N),
basis = rep(basis, length(c(data$y))),
f = c(data$y),
ind = rep(1:N, each = length(data$time))))
dfTrue = rbind(dfTrue,data.frame(Time = data$time,
basis = rep(basis, length(data$time)),
f = data$f.true))
for (L in vec.L){
conf.band = CB1(data, basis, L)
conf.band.emp = CB.empmean(data)
B = func.basis(data$time, L, basis)
f.true.L = t(B %*% solve(t(B) %*% B) %*% t(B) %*% data$f.true)
dfBand = rbind(dfBand, data.frame(Time = data$time,
basis = rep(basis, length(data$time)),
flow = c(conf.band$f.hat.low),
fup = c(conf.band$f.hat.up),
dimension = as.factor(rep(L, length(data$time)))))
dfTrueProjected = rbind(dfTrueProjected, data.frame(Time = data$time,
basis = rep(basis, length(data$time)),
f = c(f.true.L),
dimension = rep(L, length(data$time))))
dfdL = rbind(dfdL, data.frame(Time = data$time,
basis = rep(basis, length(data$time)),
dL = c(conf.band$d.L),
dimension = as.factor(rep(L,length(data$time)))))
}
}
p1 <- ggplot() +
geom_line(data = dfObs, aes(x = Time, y = f, group = ind), linewidth = 0.1, color = "grey") +
geom_line(data = dfTrue, aes(x = Time, y = f), colour = "black", linewidth = 0.5) + facet_grid(~basis)
p2 <- ggplot() +
geom_line(data = dfBand, aes(x = Time, y = flow, group = dimension, color = dimension), linewidth = 0.8) +
geom_line(data = dfBand, aes(x = Time, y = fup, group = dimension, color = dimension), linewidth = 0.8) +
geom_line(data = dfTrueProjected, aes(x = Time, y = f, group = dimension), colour = "black", linewidth = 0.2)+
facet_grid(~basis) + ylab("confidence band")
p3 <- ggplot() +
geom_line(data = dfdL, aes(x = Time, y = dL, group = dimension, color = dimension)) +
facet_grid(~basis)
ggarrange(p1, p2, p3, nrow = 3, common.legend = TRUE, legend = "bottom")
```
```{r}
#| label: tbl-levelL
#| tbl-cap: "Empirical coverage of CB for different basis families. The empirical confidence level of the confidence band CB is estimated over 1000 repetitions, for alpha = 0.05. Confidence bands are calculated with the Fourier (top), Legendre (middle) and Splines (bottom) basis families. Each row corresponds to a different number of basis functions L, and each column to a different pair of sample sizes (n,N)."
library(knitr)
library(kableExtra)
run = FALSE
vec.L = c(3,5,7, 11,15)
vec.n = c(40, 150)
vec.N = c(10, 25, 60)
if (run == TRUE){
nb.repeat = 1000
n.test = 5000
vec.basis = c('Fourier', "Legendre", "Splines")
time = seq(min(data$time), max(data$time), length.out = n.test)
perf = array(NA, dim = c(max(vec.L),length(vec.n),length(vec.N), length(vec.basis)))
for (ind.basis in 1:length(vec.basis)){
for (ind.n in 1:length(vec.n)){
for (ind.N in 1:length(vec.N)){
for (L in vec.L){
cpt = 0
for (rep in 1:nb.repeat){
set.seed(rep)
if (vec.basis[ind.basis] == "Splines"){ data = dgp(n=vec.n[ind.n], N = vec.N[ind.N], basis = vec.basis[ind.basis], sd = 2)
} else { data = dgp(n=vec.n[ind.n], N = vec.N[ind.N], basis = vec.basis[ind.basis]) }
conf.band = CB1(data, vec.basis[ind.basis], L, alpha= 0.05)
if (vec.basis[ind.basis] == 'Fourier'){
if (L%%2 == 0){ L = L+1}
}
B.old = func.basis(data$time, L, vec.basis[ind.basis])
B = func.basis(time, L, vec.basis[ind.basis])
f.true.L = t(B %*% solve(t(B.old) %*% B.old) %*% t(B.old) %*% data$f.true)
f.hat.up = stats::spline(x = data$time, y = conf.band$f.hat.up, xout = time, method = "natural")$y
f.hat.low = stats::spline(x = data$time, y = conf.band$f.hat.low, xout = time, method = "natural")$y
if (sum(f.true.L< f.hat.up)+sum(f.true.L> f.hat.low) == 2*n.test){
cpt = cpt+1
}
}
perf[L,ind.n, ind.N, ind.basis] = cpt/nb.repeat
}
}
}
}
save(perf, file = "res_data/Res_Tab1.RData")
} else {load("res_data/Res_Tab1.RData") } ### charger les données
perf.F = cbind(perf[vec.L,,1,1],perf[vec.L,,2,1],perf[vec.L,,3,1])
perf.L = cbind(perf[vec.L,,1,2],perf[vec.L,,2,2],perf[vec.L,,3,2])
perf.S = cbind(perf[vec.L,,1,3],perf[vec.L,,2,3],perf[vec.L,,3,3])
rownames(perf.F) = vec.L
rownames(perf.L) = vec.L
rownames(perf.S) = vec.L
colnames(perf.F) = c('40/10', '150/10', '40/25', '150/25', '40/60', '150/60')
colnames(perf.L) = c('40/10', '150/10', '40/25', '150/25', '40/60', '150/60')
colnames(perf.S) = c('40/10', '150/10', '40/25', '150/25', '40/60', '150/60')
perf.F %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c("L" = 1, "n/N" = 6))
perf.L %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c("L" = 1, "n/N" = 6))
perf.S %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c("L" = 1, "n/N" = 6))
```
## Method 1: confidence bands by correcting the bias {#sec-simudebias}
We illustrate the confidence band of $f^{L^*}$ given in @prp-CBf.
In @fig-Lstar, top row, we plot the confidence bands obtained for different dimensions $L \in \{3,5,7,11,15\}$ with the Fourier, Legendre and Splines families and $\alpha=\beta=\sqrt{0.05}\approx 0.22$. All the confidence bands look alike; in particular, they are unbiased, even for $L=3$. A larger dimension $L$ yields a smoother band.
On the middle and bottom rows of @fig-Lstar, we illustrate the two terms that enter the confidence band, $t \mapsto \hat d_1^L(t)$ and $t\mapsto \hat d_2^{L,L_{\max}}(t)$. They behave similarly over time. The function $\hat d_1^L(t)$ can be interpreted as a variance term, which is why it is larger near 0 and 1, where there are fewer observations. The function $\hat d_2^{L,L_{\max}}(t)$ is smaller than $\hat d_1^L(t)$ because it only controls the remainder after the projection. Note that, as expected, when $L>L^\varepsilon$, $\hat d_2^{L,L_{\max}}(t)$ is close to 0. As explained before, the influence of $L$ differs for the two functions: when $L$ increases, $\hat d_1^L(t)$ increases while $\hat d_2^{L,L_{\max}}(t)$ decreases.
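Explicitly, in terms of the half-sample estimators computed in the `CB2` function below (subscripts 1 and 2 refer to the two halves of the sample), the corrected band reads
$$\left[\underline{\hat f}_2^{L_{\max}, L^*}(t)-\underline{\hat f}_2^{L}(t)+\underline{\hat f}_1^{L}(t) \pm \left(\hat d_1^L(t)+\hat d_2^{L,L_{\max}}(t)\right)\right],$$
which matches the `f.hat.up` and `f.hat.low` lines of the code.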
```{r, fig.height = 7, fig.width=7}
#| label: fig-Lstar
#| fig-cap: "Visualization of confidence bands by correcting the bias. For a given dataset, we plot several confidence bands (top row), functions dL (middle row) and dLLmax (bottom row). Bands and functions are estimated with Fourier (left column), Legendre (middle column) and Spline (right column) basis and several dimensions L (3, 5, 7, 11, 15), and Lmax = 25."
CB2 = function(data, basis, L, Lmax = 30, alpha = 0.05, beta = 0.05){
### data: the observations, only the time-points and the individual functions
### basis: functional basis, to be chosen in 'Fourier', 'Legendre', 'Splines'
### L: number of basis functions for the signal
### Lmax: dimension of the larger space used to estimate the bias
### alpha, beta: the levels of the two sub-bands (overall level 1 - alpha*beta)
data.1 = data.2 = data
ind.1 = sample(1:dim(data$y)[2], dim(data$y)[2]/2)
data.1$y = data$y[,ind.1]
data.2$y = data$y[,-ind.1]
### d.L.1
est.1.f = estimator(data.1, basis, L)
data.1.proj = est.1.f$data.proj
data.1.scl = data.1.proj
data.1.scl$y = (data.1.proj$y - est.1.f$f.hat)/sqrt(diag(est.1.f$cov.emp))
c.L.1 = compute.c.L(data.1.scl, alpha)
d.L.1 = c.L.1 %*% sqrt(diag(est.1.f$cov.emp)/ncol(data.1.scl$y))
### d.L.2
est.2.f = estimator(data.2, basis, L)
data.2.proj = est.2.f$data.proj
est.2.f.Lmax = estimator(data.2, basis, Lmax)
data.2.proj.Lmax = est.2.f.Lmax$data.proj
residuals.L.Lmax = data.2.proj.Lmax$y - data.2.proj$y - (est.2.f.Lmax$f.hat - est.2.f$f.hat)
cov.emp.rest = crossprod(t(residuals.L.Lmax)) / ncol(residuals.L.Lmax)
data.2.scl = data.2.proj
data.2.scl$y = residuals.L.Lmax/sqrt(diag(cov.emp.rest))
c.L.Lmax.2 = compute.c.L(data.2.scl, beta)
d.L.Lmax.2 = c.L.Lmax.2 %*% sqrt(diag(cov.emp.rest)/ncol(data.2.scl$y))
f.hat.up = est.2.f.Lmax$f.hat - est.2.f$f.hat + est.1.f$f.hat + d.L.1 + d.L.Lmax.2
f.hat.low = est.2.f.Lmax$f.hat - est.2.f$f.hat + est.1.f$f.hat - d.L.1 - d.L.Lmax.2
return(list(f.hat.up = f.hat.up, f.hat.low = f.hat.low, d.L = d.L.1, d.L.Lmax = d.L.Lmax.2, width = max(abs(f.hat.up-f.hat.low))))
}
set.seed(1)
dfTrue = data.frame(Time = double(),
f = double(),
basis = character()
)
dfBandup = data.frame(Time = double(),
bandup = double(),
basis = character(),
dimension = character()
)
dfBandlow = data.frame(Time = double(),
bandlow = double(),
basis = character(),
dimension = character()
)
dfdL = data.frame(Time = double(),
dL = double(),
basis = character(),
dimension = character()
)
dfdLLmax = data.frame(Time = double(),
dLLmax = double(),
basis = character(),
dimension = character()
)
vec.L = c(3,5,7,11,15)
for (basis in c("Legendre", "Fourier", "Splines")){
if (basis == "Splines"){ data = dgp(sd=2*1, alpha.mu = NULL, basis = basis, f.true = NULL)
} else { data = dgp(sd=1, alpha.mu = NULL, basis = basis, f.true = NULL)}
dfTrue = rbind(dfTrue, data.frame(Time = data$time,
f = data$f.true,
basis = rep(basis, length(data$time))))
crit.sel = matrix(NA, ncol = 3, nrow = 22)
for (L in vec.L){
conf.band.L.star.2 = CB2(data, basis, L, Lmax = 25, alpha = 0.05)
crit.sel[L,] =c(max(conf.band.L.star.2$d.L), max(conf.band.L.star.2$d.L.Lmax),
sum(abs(conf.band.L.star.2$d.L+conf.band.L.star.2$d.L.Lmax)))
dfBandup = rbind(dfBandup, data.frame(Time = data$time,
bandup = t(conf.band.L.star.2$f.hat.up),
basis = rep(basis, length(data$time)),
dimension = as.factor(rep(L, length(data$time)))))
dfBandlow = rbind(dfBandlow, data.frame(Time = data$time,
bandlow = t(conf.band.L.star.2$f.hat.low),
basis = rep(basis, length(data$time)),
dimension = as.factor(rep(L, length(data$time)))))
dfdL = rbind(dfdL, data.frame(Time = data$time,
dL = t(conf.band.L.star.2$d.L),
basis = rep(basis, length(data$time)),
dimension = as.factor(rep(L, length(data$time)))))
dfdLLmax = rbind(dfdLLmax, data.frame(Time = data$time,
dLLmax = t(conf.band.L.star.2$d.L.Lmax),
basis = rep(basis, length(data$time)),
dimension = as.factor(rep(L, length(data$time)))))
}
}
p1 <- ggplot()+
geom_line(data = dfTrue, aes(x = Time, y = f)) +
geom_line(data = dfBandup, aes(x = Time, y = bandup, group = dimension, color = dimension), linewidth = 0.4) +
geom_line(data = dfBandlow, aes(x = Time, y = bandlow, group = dimension, color = dimension), linewidth = 0.4) +
facet_grid(~ basis)
p2 <- ggplot() +
geom_line(data = dfdL, aes(x = Time, y = dL, group = dimension, color = dimension), linewidth = 0.4) +
facet_grid( ~ basis) + ylab("d.L")
p3 <- ggplot() +
geom_line(data = dfdLLmax, aes(x = Time, y = dLLmax, group = dimension, color = dimension), linewidth = 0.4) +
facet_grid( ~ basis) + ylab("d.L.Lmax")
ggarrange(p1, p2, p3, nrow = 3, common.legend = TRUE, legend = "bottom")
```
In @tbl-levLstar, we simulate 1000 repeated datasets for each of the three basis families, with two sample sizes $n=40$ and $n=150$ and $N=25$. For each dataset, we compute the confidence band defined in @prp-CBf with a theoretical confidence level of $1-\alpha\beta=0.95$, taking $\alpha=\beta=\sqrt{0.05}$, and for different values of $L$. The confidence level is then approximated as the proportion of confidence bands containing the true function $f$ over 5000 test timepoints. Remark that when $L<L^{\varepsilon}$, the level is the expected one, that is 0.95. When $L>L^{\varepsilon}$, the level is no longer guaranteed, as explained before. Indeed, the term $\hat d^{L,L_{\max}}$ is essentially equal to 0 when $L_{\max}$ is large enough, and the level is then close to $1-\alpha\approx 0.78$ instead of $1-\alpha\beta=0.95$. This does not happen for the band of @sec-bandSun, since this phenomenon is due to the correction of the bias.
```{r}
#| label: tbl-levLstar
#| tbl-cap: "Empirical coverage of CB2. The table reports the empirical level of confidence of the proposed confidence band CB2, computed for various values of L (rows) and n (columns), with fixed N=25, and various basis (top: Fourier, middle: Legendre, bttom: Splines). The nominal confidence level is set to 0.05, using parameters alpha=beta=0.2, such that the combined coverage is 1-0.95."
run = FALSE
table_list = list()
if (run == TRUE){
nb.repeat = 1000
n.test = 5000
vec.L = c(3,5,7,11,15)
vec.n = c(40, 150)
time = seq(min(data$time), max(data$time), length.out = n.test)
for (basis in c( "Fourier", "Legendre", "Splines")){
perf = matrix(NA, nrow = max(vec.L)+1, ncol = 2)
for (ind.n in 1:length(vec.n)){
for (L in vec.L){
cpt = 0
for (rep in 1:nb.repeat){
set.seed(rep)
if (basis == "Splines"){ data = dgp(sd=2*1, alpha.mu = NULL, basis = basis, f.true = NULL)
} else { data = dgp(sd=1, alpha.mu = NULL, basis = basis, f.true = NULL)}
conf.band.L.star.2 = CB2(data, basis, L, Lmax = 25, alpha = sqrt(0.05), beta=sqrt(0.05))
f.true = stats::spline(x = data$time, y = data$f.true, xout = time, method = "natural")$y
f.hat.up = stats::spline(x = data$time, y = conf.band.L.star.2$f.hat.up, xout = time, method = "natural")$y
f.hat.low = stats::spline(x = data$time, y = conf.band.L.star.2$f.hat.low, xout = time, method = "natural")$y
if (sum(f.true< f.hat.up)+sum(f.true> f.hat.low) == 2*n.test){
cpt = cpt+1}
}
perf[L,ind.n] = cpt/nb.repeat
}
}
perf = perf[vec.L,]
rownames(perf) = vec.L
colnames(perf) = vec.n
save(perf, file = paste0("res_data/Res_Tab2_", basis, ".RData"))
kbl_obj = perf %>% kable("latex", booktabs = TRUE) %>% add_header_above(c("L" = 1, "n" = 2))
table_list[[basis]] <- kbl_obj
}
}
load(paste0("res_data/Res_Tab2_Fourier.RData"))
perf.F = perf
load(paste0("res_data/Res_Tab2_Legendre.RData"))
perf.L = perf
load(paste0("res_data/Res_Tab2_Splines.RData"))
perf.S = perf
perf.F %>% kable("latex", booktabs = TRUE) %>% add_header_above(c("L" = 1, "n" = 2))
perf.L %>% kable("latex", booktabs = TRUE) %>% add_header_above(c("L" = 1, "n" = 2))
perf.S %>% kable("latex", booktabs = TRUE) %>% add_header_above(c("L" = 1, "n" = 2))
```
We illustrate the different terms involved in @eq-def-Lhat in @fig-dLLmax: for a given dataset, we plot the infinity norm of the width of the band $\hat d^L(t) + \hat d^{L,L_{\max}}(t)$ (top), of $\hat d^L(t)$ (middle) and of $\hat d^{L,L_{\max}}(t)$ (bottom), obtained with the Fourier (left column), Legendre (middle column) and Splines (right column) basis families. As already noted, $\|\hat d^L\|_{\infty}$ increases with $L$ while $\|\hat d^{L,L_{\max}}\|_{\infty}$ decreases (and is zero when $L>L^{\varepsilon}$). As expected, the width of the band does not exhibit a $U$-shape as a function of $L$: since $\hat d^{L,L_{\max}}$ remains smaller than $\hat d^{L}$, the total width is dominated by the increasing term $\hat d^{L}$. It is therefore difficult to minimize this criterion, and the selection of $\hat L$ is not stable. Nevertheless, whatever the value of $\hat L$, the corresponding band in the collection is debiased. We will see in the next sections that its width is smaller than that of standard approaches.
The performance of the selection is also illustrated in the next section.
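For concreteness, a minimal sketch of the selection of $\hat L$ (assuming the `CB2` function above and a dataset `data` generated by `dgp`; not evaluated here): the dimension minimizing the width of the corrected band is retained.
```{r}
#| eval: false
# Sketch (not run): select L.hat by minimizing the width of the corrected
# band CB2 over a grid of candidate dimensions.
vec.L <- c(3, 5, 7, 11, 15)
widths <- sapply(vec.L, function(L) {
  CB2(data, basis = "Legendre", L, Lmax = 25,
      alpha = sqrt(0.05), beta = sqrt(0.05))$width
})
L.hat <- vec.L[which.min(widths)]
```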
```{r, out.height = "50%", fig.width=7}
#| label: fig-dLLmax
#| fig-cap: "Norm of the confidence band width and associated bounds. For a given dataset, we compute: (top row) the norm of the width of the confidence band; (middle row) the norm of the function dL; and (bottom row) the norm of the function dLLmax, for various dimensions L. Results are shown for the Fourier (left column), Legendre (middle column), and Splines (right column) basis families."
df = data.frame(Norm = double(),
type = character(),
basis = character(),
dimension = integer()
)
for (basis in c("Legendre", "Fourier", "Splines")){
if (basis == "Splines"){data = dgp(basis = basis, sd = 2)
} else {data = dgp(basis = basis)}
vec.L = seq(3,15, by=1)
if (basis == 'Fourier'){vec.L = seq(3,15, by=2)}
L.max = 20
length.band = d.L = d.L.Lmax = NULL
for (L in vec.L){
conf.band.L.star.2 = CB2(data, basis, L, Lmax = L.max, alpha = sqrt(0.05), beta = sqrt(0.05))
length.band = c(length.band, conf.band.L.star.2$width)
d.L = c(d.L, max(conf.band.L.star.2$d.L))
d.L.Lmax = c(d.L.Lmax, max(conf.band.L.star.2$d.L.Lmax))
}
df = rbind(df, data.frame (Norm = c(length.band, d.L, d.L.Lmax),
type = rep(c("Band width norm", "d.L norm", "d.L.Lmax norm"), each = length(vec.L)),
basis = rep(basis, 3* length(vec.L)),
dimension = rep(vec.L, 3)))
}
ggplot(df, aes(x = dimension, y = Norm, fill = type)) +
geom_line() +
facet_grid(type ~ basis, scales = 'free') + xlab("L")
```
## Method 2: a model selection criterion to take into account the bias {#sec-simumethod2}
In @fig-critlast, we illustrate the behavior of the selection criterion introduced in @sec-modsel on simulated data, with $\lambda\in\{0, 0.5, 1, 2\}$, for the three basis families. We can see that $\tilde L$ is overestimated. When considering nested spaces, this ensures that $\tilde L$ tends to be larger than $L^*$, so that the confidence band is automatically unbiased.
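As implemented in the chunk below, the criterion compares the thresholds $\hat c^{L}$ and $\hat c^{L_{\max}}$ and penalizes the dimension: for a regularization parameter $\lambda\geq 0$ and $n$ timepoints,
$$\widetilde{crit}(L) = \left|\hat c^{L_{\max}} - \hat c^{L}\right| + \lambda \frac{L}{n}, \qquad \tilde L = \mathop{\mathrm{arg\,min}}_{L}\, \widetilde{crit}(L).$$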
```{r, fig.height = 3, fig.width=7}
#| label: fig-critlast
#| fig-cap: "Behavior of selection criteria as a function of model dimension. For a given simulated dataset, we display the evolution of the selection criteria with respect to the dimension L, for several values of the regularization parameter lambda. Results are shown for the Fourier (left), Legendre (middle), and Splines (right) basis families."
L.max = 20
dfCriterion = data.frame(criterion = double(),
dimension = integer(),
lambdas = double(),
basis = character())
for (basis in c("Legendre","Fourier", "Splines")){
vec.L = seq(3,13, by=1)
if (basis == 'Fourier'){ vec.L = seq(3,13, by=2) }
length.band = c.L = c.L.Lmax = c()
if (basis == "Splines"){ data = dgp(basis = basis, L.eps = 15, sd=2)
} else {data = dgp(basis = basis, L.eps = 15, sd = 1)}
conf.band.L.max = CB1(data, basis, L = L.max, alpha = 0.05)
c.Lmax = conf.band.L.max$c.L
for (L in vec.L){
conf.band.L = CB1(data, basis, L, alpha = (0.05))
length.band[L] = conf.band.L$width
c.L[L] = max(conf.band.L$c.L)
}
for (lambda in c(0, 0.5, 1, 2)){
crit = abs(rep(c.Lmax, length(vec.L)) - c.L[vec.L]) + lambda * vec.L/((dim(data$y)[1]))
dfCriterion = rbind(dfCriterion, data.frame(criterion = crit, dimension = vec.L, lambdas = lambda,
basis = rep(basis, length(vec.L))))
}
}
ggplot(dfCriterion, aes(x = dimension, y = criterion, color = lambdas)) +
geom_point()+
facet_grid(~basis)
```
## Comparison of the methods with the state-of-the-art {#sec-simumodsel}
We evaluate the performance of the two selection criteria introduced in this paper, and compare the two strategies $CB_2$ and $CB_3$ with some standard approaches. More precisely,
we simulate 1000 repeated datasets. The different confidence bands and the norms of their widths are computed for several $L$. We apply the selection criteria and plot the distribution of the estimated dimension in @fig-modelsel, for the three basis families and for several model selection criteria: $\hat L$, $\tilde L$, cross-validation and hard thresholding. The dimension $\hat L$ is almost always larger than the true $L^*=7$. This is not a problem in itself, because the selected band is unbiased and has the correct level as soon as $L^\varepsilon$ is large. However, the criterion tends to select a band that is (too) smooth.
We can see that the selected dimension $\tilde{L}$ is smaller in distribution, and closer to the true value than $\hat{L}$. In addition, as we then use the confidence band of @sec-bandSun, the confidence level is guaranteed to be as expected.
The model selected by cross-validation is rather good for all the basis families considered. On the other hand, hard thresholding performs poorly, particularly for a non-orthonormal basis; both baselines are sketched below.
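As a rough sketch of the hard-thresholding baseline (a minimal illustration, assuming the `estimator` helper above and a dataset `data`; the threshold 0.1 matches the one used in the chunk below), the selected dimension is the largest count of estimated coefficients exceeding 0.1 in absolute value:
```{r}
#| eval: false
# Hard thresholding (sketch, not run): count the coefficients with
# |mu.hat| > 0.1 for each candidate L, and keep the largest such count.
vec.L <- 3:13
coeff.L0 <- sapply(vec.L, function(L) {
  sum(abs(estimator(data, "Legendre", L)$mu.hat) > 0.1)
})
L.L0 <- max(coeff.L0)
```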
```{r, fig.height = 7, fig.width=7}
#| label: tbl-levLstar2
#| tbl-cap: "Comparison with the state-of-the-art for the empirical confidence level. The table reports the performance of the proposed confidence band CB3, estimated over 1000 repetitions, for various model selection criteria and competitive methods (rows) and for various basis families (columns). Results are shown for alpha=0.05, with n=40 and N=25."
run = FALSE
if (run == TRUE){
library(ffscb)
repet = 1000
length.CBL.CV = length.CBL.L0 = length.CBL.ffscb = length.emp.mean = c(NA)
dfLengthSelect.CV = data.frame(length = double(), basis = character())
dfLhatSelect.CV = dfLhatSelect.L0 = data.frame(Lhat = double(), L = integer(), basis = character())
dfBandWidth = data.frame(Band.width = double(),
basis = character(),
dimension = integer())
dfLhat = data.frame( L = integer(), selected = integer(), basis = character(), dimension = character())
ind.basis = 0
cpt = matrix(0, nrow = 8, ncol = 3)
for (basis in c("Fourier", "Legendre", "Splines")){
ind.basis = ind.basis + 1
vec.L = seq(3,13, by=1)
mod.sel = mod.sel.L0 = mod.sel.CV = coeff.L0 = err = conf.band.ffscb = conf.band.emp.mean = c()
if (basis == 'Fourier'){ vec.L = seq(3,13, by=2)}
mod.sel = matrix(NA, nrow = repet, ncol = 5)
L.mod.sel = c.L = c()
Lmax = 25
ind = c()
for (rep in 1:repet){
set.seed(rep)
if (basis == "Splines"){data = dgp(basis = basis, sd = 2)
} else {data = dgp(basis = basis)}
length.band = NULL
data.train = data.test = data
data.train$y = data$y[, 1:(dim(data$y)[2]/2)]
data.test$y = data$y[,(dim(data$y)[2]/2+1) : (dim(data$y)[2])]
tot.VC <- rep(1:10, each = dim(data.train$y)[2] / 10)
err <- sapply(vec.L, function(L) {
mean(sapply(1:10, function(VC) {
train <- data.train
train$y <- train$y[,-which(tot.VC == VC)]
test <- data.train
test$y <- test$y[, which(tot.VC == VC)]
est.L <- estimator(train, basis, L)
sqrt(mean((test$y - est.L$f.hat)^2))
}))
})
for (L in vec.L){
CB2.L = CB2(data, basis = basis, L, Lmax = Lmax, alpha = sqrt(0.05), beta = sqrt(0.05))
length.band = c(length.band, CB2.L$width)
conf.band.L = CB1(data, basis, L, alpha = (0.05))
c.L[L] = max(conf.band.L$c.L)
est.L = estimator(data.train, basis, L)
coeff.L0[L] = sum(abs(est.L$mu.hat) > 0.1)
}
mod.sel[rep,1:2] = c(min(length.band), vec.L[which.min(length.band)])
CB2.Lhat = CB2(data, basis = basis, mod.sel[rep,2], Lmax = Lmax, alpha = sqrt(0.05), beta = sqrt(0.05))
conf.band.L.max = CB1(data, basis, L = Lmax, alpha = 0.05)
c.Lmax = conf.band.L.max$c.L
mod.sel[rep,3] = conf.band.L.max$width
conf.band.L.star = CB1(data, basis, L = 7, alpha = 0.05)
mod.sel[rep,4] = conf.band.L.star$width
lambda = 1
crit = abs(rep(c.Lmax, length(vec.L)) - c.L[vec.L]) + lambda * vec.L/((dim(data$y)[1]))
L.mod.sel[rep] = vec.L[which.min(crit)]
CB3 = CB1(data, basis, L = L.mod.sel[rep], alpha = 0.05)
mod.sel[rep,5] = CB3$width
mod.sel.CV[rep] = vec.L[which.min(err)]
conf.band.CV = CB1(data.test, basis, mod.sel.CV[rep], alpha = (0.05))
length.CBL.CV[rep] = conf.band.CV$width
mod.sel.L0[rep] = max(coeff.L0, na.rm=TRUE)
conf.band.L0 = CB1(data.test, basis, mod.sel.L0[rep], alpha = (0.05))
length.CBL.L0[rep] = conf.band.L0$width
N = dim(data$y)[2]
mod.ffscb.test <- confidence_band(x = rowMeans(data$y),
cov.x = crossprod(t(data$y - rowMeans(data$y))) / (N^2),
tau = tau_fun(data$y),
df = N-1,
type = c("FFSCB.t"),
conf.level = 0.95,
n_int = 3)
conf.band.ffscb$f.hat.up = mod.ffscb.test[,2]
conf.band.ffscb$f.hat.low = mod.ffscb.test[,3]
length.CBL.ffscb[rep] = max(abs(conf.band.ffscb$f.hat.up - conf.band.ffscb$f.hat.low))
conf.band.emp.mean = CB.empmean(data)
length.emp.mean[rep] = max(abs(conf.band.emp.mean$f.hat.up - conf.band.emp.mean$f.hat.low))
## coverage
n.test = 5000
time = seq(min(data$time), max(data$time), length.out = n.test)
f.true = stats::spline(x = data$time, y = data$f.true, xout = time, method = "natural")$y
models = list(conf.band.L.star, conf.band.L.max, CB2.Lhat, CB3, conf.band.L0, conf.band.CV, conf.band.emp.mean, conf.band.ffscb)
for (mod in 1:8){
f.hat.up = stats::spline(x = data$time, y = models[[mod]]$f.hat.up, xout = time, method = "natural")$y
f.hat.low = stats::spline(x = data$time, y = models[[mod]]$f.hat.low, xout = time, method = "natural")$y
if (sum(f.true< f.hat.up)+sum(f.true> f.hat.low) == 2*n.test){cpt[mod,ind.basis] = cpt[mod,ind.basis] +1
}
}
}
dfBandWidth = rbind(dfBandWidth, data.frame(Band.width = c(mod.sel[ ,1], mod.sel[ ,3], mod.sel[ ,4], mod.sel[ ,5], length.CBL.CV, length.CBL.L0, length.CBL.ffscb, length.emp.mean),
basis = rep(basis, repet*8),
dimension = rep(c("Lhat", "Lmax", "Lstar", "Ltilde", "L.CV", "L.L0", "FFSCB", "Mean"), each = repet)))
dfLhat = rbind(dfLhat, data.frame(
L = rep(vec.L, 4),
selected = c(matrix(table(factor(mod.sel[,2], levels = vec.L))), matrix(table(factor(L.mod.sel, levels = vec.L))), matrix(table(factor(mod.sel.CV, levels = vec.L))), matrix(table(factor(mod.sel.L0, levels = vec.L)))),
basis = rep(basis, 4*length(vec.L)),
dimension = rep(c("Lhat", "Ltilde", "L.CV", "L.L0"), each = length(vec.L))))
}
cpt = cpt/repet
save(cpt, file = "res_data/Res_Tab4.RData")
save(dfBandWidth, file = "res_data/Res_Tab3_BandWidth.RData")
save(dfLhat, file = "res_data/Res_Tab3_Lhat.RData")
} else {
load('res_data/Res_Tab4.RData')}
row.names(cpt) = c("Lstar", "Lmax", "Lhat","Ltilde", "L.L0", "L.CV", "Mean", "FFSCB")
cpt %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c("model" = 1, "Fourier" = 1, "Legendre" = 1, "Splines" = 1))
```
```{r, out.height = "45%", fig.width=10}
#| label: fig-modelsel
#| fig-cap: "Distribution of the selected model dimension across selection methods. Based on 1000 simulated datasets, we report the distribution of the estimated dimension L for four model selection methods: from top to bottom — the debiased confidence band CB2 with Lhat, the band CB3 with Ltilde , cross-validation, and hard-thresholding. The true model dimension is Lstar = 7."
load('res_data/Res_Tab3_Lhat.RData')
dfLhat$dimension = factor(dfLhat$dimension, levels = c('Lhat', 'Ltilde', 'L.CV', 'L.L0'))
ggplot(data = dfLhat, aes(x = L, y = selected))+
geom_bar(stat="identity")+
facet_grid(dimension~basis)
```
The reformulation of the band around $\underline{\hat f}_2^{L_{\max}, L^*}$ is close to the band presented in @sec-bandSun for $L=L_{\max}$, that is, a band centered at $\underline{\hat f}^{L_{\max}, L^*}$. A natural question is what is gained by doing so rather than using the band from @sec-bandSun with $L=L_{\max}$, namely the band $\left[\underline{\hat f}^{L_{\max}, L^*}(t)-\hat d^{L_{\max}}(t); \underline{\hat f}^{L_{\max}, L^*}(t)+\hat d^{L_{\max}}(t)\right]$. To answer this, we have to understand the behavior of the different terms. As it is difficult to compare the widths of the two bands theoretically, we compare them using simulations.
For 1000 repeated datasets, we compute several confidence bands: the $CB_1$ band constructed in @sec-bandSun with $L_{\max}$, the $CB_{2}$ band defined in @prp-CBf with $\hat L$, the $CB_{3}$ band defined in @sec-modsel with $\tilde L$, and the ideal (and not accessible) band constructed in @sec-bandSun with the true $L^*$. In @fig-width-2, we present boxplots of the norms of the band widths with $\hat L$, $L_{\max}$, $L^*$ and $\tilde{L}$. We also include the empirical mean and the FFSCB method (@Liebl2019).
The width of the confidence band with the true $L^*$ is the smallest, which is expected but unfortunately not achievable in practice. The width of the confidence band $CB_1$ is smaller than that of the band $CB_2$. This can be explained by the fact that, for $CB_2$, we estimate two different quantities, on smaller datasets, with more conservative levels ($1-\alpha$ and $1-\beta$ respectively), in order to finally achieve the confidence level $1-\alpha\beta$. This also explains why the cross-validation and hard-thresholding methods, which also split the sample into two parts, do not give good results either. The model given by the heuristic selection criterion $\widetilde{crit}$ achieves good performance.
Note that the width of the band with the selected dimension $\tilde L$ is smaller than the width of the confidence band with the large dimension $L_{\max}$, which one would have to use to avoid model selection. The bands based on the empirical mean and on FFSCB are slightly wider.
```{r, out.height = "25%", fig.width=10}
#| label: fig-width-2
#| fig-cap: "Width of the confidence bands accross selection methods. Boxplots show the distribution of the confidence band width over 1000 repetitions, for the dimension selected by the various criteria introduced in this paper (as well as standard baselines). Results are shown for the Fourier (left), Legendre (middle), and Splines (right) basis families."
load('res_data/Res_Tab3_BandWidth.RData')
dfBandWidth$dimension <- factor(dfBandWidth$dimension, levels = c('Lstar','Lmax','Lhat','Ltilde','L.CV','L.L0', 'FFSCB', 'Mean'))
ggplot(dfBandWidth, aes(x = dimension, y = Band.width)) +
geom_boxplot() +
facet_wrap(~basis, scales = "free_y")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
## Generalization outside the model {#sec-gen}
We now illustrate the behaviour of the bands when the basis used for estimation is misspecified. We simulate 1000 datasets with a Splines basis and estimate the confidence bands with the Fourier and Legendre bases, for different values of $n$ and $N$. The coverage rates are presented in @tbl-levelL-2. The Fourier basis does not give a correct rate. On the other hand, the Legendre basis gives very satisfactory coverage rates for $L>11$.
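In the chunk below, coverage is measured with respect to the projection of the true function onto the generating space (Splines, dimension 7), computed in matrix form as
$$f^{7}(t) = B_7(t)\left(\mathbf{B}_7^T \mathbf{B}_7\right)^{-1}\mathbf{B}_7^T\, \mathbf{f},$$
where $\mathbf{B}_7$ stacks the basis functions at the observed timepoints and $\mathbf{f}$ is the vector of true values; this corresponds to the `f.true` line of the code.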
```{r}
#| label: tbl-levelL-2
#| tbl-cap: "Empirical coverage of confidence bands under model misspecification. The empirical confidence level is estimated over 1000 repetitions. Data are generated using a Splines basis, while confidence bands are computed using the Fourier (top) and Legendre (bottom) basis families. Each row corresponds to a different value of L, and each column to a different pair of sample sizes (n,N)."
library(knitr)
library(kableExtra)
run = FALSE
vec.L = c(5,7, 11,15)
if (run == TRUE){
nb.repeat = 1000
n.test = 5000
vec.n = c(40, 150)
vec.N = c(10, 25)
basis = "Splines"
basis.2 = c('Fourier', 'Legendre')
time = seq(min(data$time), max(data$time), length.out = n.test)
perf = array(NA, dim = c(max(vec.L),2,2,2))
for (ind.n in 1:length(vec.n)){
for (L in vec.L){
for (ind.N in 1:length(vec.N)){
for (ind.basis in 1:2){
cpt = 0
for (rep in 1:nb.repeat){
set.seed(rep)
if (basis == "Splines"){
data = dgp(n=vec.n[ind.n], N = vec.N[ind.N], basis = basis, sd = 2)
} else {data = dgp(n=vec.n[ind.n], N = vec.N[ind.N], basis = basis)}
conf.band = CB1(data, basis.2[ind.basis], L)
if (basis.2[ind.basis] == 'Fourier'){
if (L%%2 == 0){ L = L+1}
}
B.old = func.basis(data$time, L = 7, basis)
B = func.basis(time, L = 7, basis)
f.true = t(B %*% solve(t(B.old) %*% B.old) %*% t(B.old) %*% data$f.true)
f.hat.up = stats::spline(x = data$time, y = conf.band$f.hat.up, xout = time, method = "natural")$y
f.hat.low = stats::spline(x = data$time, y = conf.band$f.hat.low, xout = time, method = "natural")$y
if (sum(f.true< f.hat.up)+sum(f.true> f.hat.low) == 2*n.test){
cpt = cpt+1
}
}
perf[L,ind.n, ind.N, ind.basis] = cpt/nb.repeat
}
}
}
}
save(perf, file = "res_data/Res_Tab_other.RData")
} else {load("res_data/Res_Tab_other.RData") } ### charger les données
perf.Fourier = cbind(perf[vec.L,,1,1],perf[vec.L,,2,1])
rownames(perf.Fourier) = vec.L
colnames(perf.Fourier) = c("40/10", "150/10", "40/25", "150/25")
perf.Fourier %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c("L" = 1, "n/N" = 4))
perf.Legendre = cbind(perf[vec.L,,1,2],perf[vec.L,,2,2])
rownames(perf.Legendre) = vec.L
colnames(perf.Legendre) = c("40/10", "150/10", "40/25", "150/25")
perf.Legendre %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c("L" = 1, "n/N" = 4))
```
Next, we illustrate the $\tilde L$ dimension selection method and compare it to cross-validation. @tbl-levelL-3 presents the coverage rates of the corresponding confidence bands estimated with the Fourier and Legendre bases, in the case $N=25$ and $n\in \{40,150\}$. Once again, the Fourier basis does not give good results, whether the dimension is selected by cross-validation or by $\tilde L$. On the other hand, with the Legendre basis, the $\tilde L$ method gives a satisfactory coverage rate, even if it is slightly below the nominal level, whereas cross-validation performs very poorly.
Moreover, the widths of the confidence bands selected with $\tilde L$ and by cross-validation are represented as boxplots in @fig-width-3. The cross-validation approach provides wider bands, even though their confidence level is not guaranteed. The method proposed in this paper provides a narrower band with a correct confidence level. We thus recommend using the Legendre family with the criterion $\tilde L$.
```{r}
#| label: tbl-levelL-3
#| tbl-cap: "Empirical coverage of confidence bands across selection methods under model misspecification. The empirical confidence level is estimated over 1000 repetitions. Data are generated using a Splines basis, while confidence bands are computed using the Fourier (top) and Legendre (bottom) basis families. The rows correspond to different model selection criteria and dimensions L, and the columns to various combinations of sample sizes (n,N)."
library(knitr)
library(kableExtra)
run = FALSE
vec.L = c(5,7, 11,15)
vec.n = c(40, 150)
if (run == TRUE){
nb.repeat = 1000
n.test = 5000
vec.N = c(25)
basis = "Splines"
basis.2 = c('Fourier', 'Legendre')
time = seq(min(data$time), max(data$time), length.out = n.test)
perf.sel = array(0, dim = c(2,2,1,2))
err = coeff.L0 = c()
L.mod.sel = mod.sel.CV = length.CBL.CV = c()
mod.sel = matrix(NA, nrow = nb.repeat, ncol = 2)
width = array(NA, dim=c(2, nb.repeat))
dfBandWidth = data.frame(Band.width = double(),
basis = character(),
dimension = integer())
for (ind.n in 1:length(vec.n)){
for (ind.N in 1:length(vec.N)){
for (ind.basis in 1:2){
cpt = 0
for (rep in 1:nb.repeat){
set.seed(rep)
if (basis == "Splines"){
data = dgp(n=vec.n[ind.n], N = vec.N[ind.N], basis = basis, sd = 2)
} else {data = dgp(n=vec.n[ind.n], N = vec.N[ind.N], basis = basis)}
data.train = data
data.train$y = data$y[, 1:(dim(data$y)[2]/2)]
data.test = data
data.test$y = data$y[,(dim(data$y)[2]/2+1) : (dim(data$y)[2])]
tot.VC <- rep(1:10, each = dim(data.train$y)[2] / 10)
err <- sapply(vec.L, function(L) {
mean(sapply(1:10, function(VC) {
train <- data.train
train$y <- train$y[,-which(tot.VC == VC)]
test <- data.train
test$y <- test$y[, which(tot.VC == VC)]
est.L <- estimator(train, basis, L)
sqrt(mean((test$y - est.L$f.hat)^2))
}))
})
### bands are estimated with the (possibly misspecified) basis basis.2[ind.basis]
conf.band.L.max = CB1(data, basis.2[ind.basis], L = 25, alpha = 0.05)
c.Lmax = conf.band.L.max$c.L
length.band = NULL
c.L = c()
for (L in vec.L){
CB2.L = CB2(data, basis = basis.2[ind.basis], L, Lmax = 25, alpha = sqrt(0.05), beta = sqrt(0.05))
length.band = c(length.band, CB2.L$width)
conf.band.L = CB1(data, basis.2[ind.basis], L, alpha = 0.05)
c.L[L] = max(conf.band.L$c.L)
}
mod.sel[rep,1:2] = c(min(length.band), vec.L[which.min(length.band)])
lambda = 1
crit = abs(rep(c.Lmax, length(vec.L)) - c.L[vec.L]) + lambda * vec.L/((dim(data$y)[1]))
L.mod.sel[rep] = vec.L[which.min(crit)]
CB3 = CB1(data, basis = basis.2[ind.basis], L = L.mod.sel[rep], alpha = 0.05)
mod.sel.CV[rep] = vec.L[which.min(err)]
conf.band.CV = CB1(data.test, basis = basis.2[ind.basis], mod.sel.CV[rep], alpha = (0.05))
## coverage
n.test = 5000
time = seq(min(data$time), max(data$time), length.out = n.test)
B.old = func.basis(data$time, L = 7, basis)
B = func.basis(time, L = 7, basis)
f.true = t(B %*% solve(t(B.old) %*% B.old) %*% t(B.old) %*% data$f.true)
models = list(CB3, conf.band.CV)
width[1,rep] = CB3$width
width[2,rep] = conf.band.CV$width
for (mod in 1:2){
f.hat.up = stats::spline(x = data$time, y = models[[mod]]$f.hat.up, xout = time, method = "natural")$y
f.hat.low = stats::spline(x = data$time, y = models[[mod]]$f.hat.low, xout = time, method = "natural")$y
if (sum(f.true< f.hat.up)+sum(f.true> f.hat.low) == 2*n.test){perf.sel[mod,ind.n, ind.N, ind.basis] = perf.sel[mod,ind.n, ind.N, ind.basis] +1}
}
}
dfBandWidth = rbind(dfBandWidth, data.frame(Band.width = c(width[1,], width[2,]),
basis = rep(basis.2[ind.basis], nb.repeat*2), dimension = rep(c("L tilde", "L.CV"),
each = nb.repeat)))
}
}
}
perf.sel = perf.sel / nb.repeat
save(perf.sel, file = "res_data/Res_Tab_other_2.RData")
save(dfBandWidth, file = "res_data/Res_Tab_other_2_width.RData")
} else {load("res_data/Res_Tab_other_2.RData") } ### charger les données
perf.Fourier = matrix(perf.sel[,,1,1], nrow=2, ncol=2)
rownames(perf.Fourier) = c('L tilde', 'L.CV')
perf.Fourier %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c("L" = 1, "n=40, N=25" = 1, "n=150, N=25" = 1))
perf.Legendre = matrix(perf.sel[,,1,2], nrow=2, ncol=2)
rownames(perf.Legendre) = c('L tilde', 'L.CV')
perf.Legendre %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c("L" = 1, "n=40, N=25" = 1, "n=150, N=25" = 1))
```
```{r, out.height = "25%", fig.width=7}
#| label: fig-width-3
#| fig-cap: "Width of confidence bands with model selection under model misspecification. The average width of the confidence bands is evaluated over 1000 repetitions. Data are generated using a Splines basis, while bands are computed with the Fourier (top) and Legendre (bottom) basis families. Rows correspond to combinations of model selection criteria and dimensions L, and columns to different sample size pairs (n,N)."
load("res_data/Res_Tab_other_2_width.RData")
ggplot(dfBandWidth, aes(x = dimension, y = Band.width)) +
geom_boxplot() +
facet_wrap(~basis, scales = "free_y")
```
# Real data analysis {#sec-realdata}
In this section, we illustrate the proposed method on the Berkeley Growth Study data (available in the `fda` R package).
It consists of the heights, in centimeters, of 39 boys measured at 31 ages from 1 to 18 years.
We approximate these curves with the three basis families: Legendre, Splines and Fourier.
We select the dimension of each basis using the method introduced in @sec-modsel.
```{r, out.height = "25%", fig.width=7}
#| label: fig-realdata
#| fig-cap: "Real data analysis example. We display the confidence bands for Fourier (left), Legendre (middle) and Splines (right) basis on the Berkeley Growth Study data. Black curves correspond to the confidence bands with $L_{max}$, while colored one are the confidence bands with L tilde. "
#|
library(fda) # provides the Berkeley Growth Study data
data(growth)
data = list()
data$time = (growth$age - min(growth$age))/(max(growth$age) - min(growth$age))
data$y = growth$hgtm
L.max = 25
vec.L = 3:24
alpha = 0.2
dfBandlow = dfBandup = df.max = res_growth = c()
for (basis in c("Legendre", "Splines", "Fourier")){
mod.sel= c()
if (basis == 'Fourier'){vec.L = seq(3,24, by=2)}
c.L = c.L.Lmax = c()
conf.band.L.max = CB1(data, basis, L = L.max, alpha = alpha)
c.L.Lmax = max(conf.band.L.max$c.L)
df.max = rbind(df.max, data.frame(Time = growth$age,
band.up = t(conf.band.L.max$f.hat.up),
band.low = t(conf.band.L.max$f.hat.low),
basis = rep(basis, length(data$time))))
for (L in vec.L){
conf.band.L = CB1(data, basis, L, alpha = (alpha))
c.L[L] = max(conf.band.L$c.L)
}
lambda = 1
crit = abs(rep(c.L.Lmax, length(vec.L)) - c.L[vec.L]) + lambda * vec.L/((dim(data$y)[1]))
mod.sel = vec.L[which.min(crit)]
conf.band.L.star.2 = CB1(data, basis, mod.sel, alpha = alpha)
dfBandup = rbind(dfBandup, data.frame(Time = growth$age,
bandup = t(conf.band.L.star.2$f.hat.up),
basis = rep(basis, length(data$time))))
dfBandlow = rbind(dfBandlow, data.frame(Time = growth$age,
bandlow = t(conf.band.L.star.2$f.hat.low),
basis = rep(basis, length(data$time))))
res_growth = round(cbind(res_growth, c(conf.band.L.max$c.L, conf.band.L.star.2$c.L, mod.sel)),2)
}
colnames(res_growth) = c("Legendre", "Splines", "Fourier")
row.names(res_growth) = c("Width Lmax", "Width selected", "Model selected")
df.data = data.frame(Time = rep(growth$age,dim(data$y)[2]), f = c(data$y), ind = rep(1:(dim(data$y)[2]), each = dim(data$y)[1]))
p1 <- ggplot()+ geom_line(data=df.data, aes(x=Time, y =f, group=ind), color = "lightgrey",linewidth= 0.5)+
geom_line(data = dfBandup, aes(x = Time, y = bandup,col = basis, group = basis), linewidth = 0.4) +
geom_line(data = dfBandlow, aes(x = Time, y = bandlow,col = basis, group = basis), linewidth = 0.4) +
geom_line(data = df.max, aes(x=Time, y = band.up, group=basis), linewidth = 0.4) +
geom_line(data = df.max, aes(x=Time, y = band.low, group=basis), linewidth = 0.4) +
facet_grid(~ basis)
p1
```
```{r, out.height = "25%", fig.width=7}
#| label: tbl-realdata
#| tbl-cap: "Real data analysis example, Berkeley Growth Study data. We display the width of the confidence bands for Fourier, Legendre and Splines basis for the confidence band of @sec-bandSun with Lmax and the confidence band of @sec-modsel. We also precise the dimension of the selected model."
res_growth %>%
kable("latex", booktabs = TRUE) %>%
add_header_above(c(" " = 1, "Basis" = 3))
```
In @fig-realdata, we display the confidence bands associated with @sec-bandSun in black and those associated with @sec-modsel in color, for the three basis families. As the data are not periodic, the Fourier basis is ill-suited, as is the associated confidence band, whatever the dimension considered. The Splines and Legendre bases give similar confidence bands. Comparing the widths of the bands in @tbl-realdata with those obtained with $L_{\max}$, we find that the selected bands are less smooth but also narrower; from our empirical study, we interpret this as a trade-off between bias and variance.
# Conclusion {#sec-conc}
This paper discusses the construction of confidence bands when considering a functional model. Depending on the nature of the family (an orthogonal or orthonormal basis, or simply a vector space), the theoretical guarantees of the linear estimator are recalled and illustrated. Several confidence bands are then proposed. An extensive experimental study on the Fourier, Legendre, and Splines basis families illustrates the theoretical and methodological propositions, and a real data study concludes the paper.
First, when considering a functional family with fixed dimension, we discuss the confidence band derived from @sun1994. It is biased if the dimension is not high enough to approximate the true function well. We then propose a new confidence band that corrects this bias: the bias is estimated and the additional randomness is controlled. A selection criterion is proposed to select the best dimension. Unfortunately, the two sources of randomness lead to a wider confidence band, and this result is therefore no more interesting than the naive one, which consists of taking the largest possible dimension $L_{\max}$. Finally, a heuristic selection criterion is proposed to select the dimension for the first confidence band, which does not correct the bias. It takes both the bias and the variance into account, so as to select a moderate dimension. Numerical experiments show that this criterion, combined with the Legendre basis, achieves the best performance in terms of both the confidence level and the width of the corresponding simultaneous confidence band.
An interesting next step, beyond the scope of this paper, is a theoretical study of this criterion. To our knowledge, no such result exists for confidence bands in the supremum norm. The Euclidean norm has been extensively studied, but it is not of interest here, where we want to ensure that the tube is valid as a whole. The supremum norm, on the other hand, is difficult to study theoretically. Furthermore, a key point here is the randomness of the criterion, which must also be taken into account, for example through an oracle inequality.
# References {.unnumbered}
::: {#refs}
:::
# Appendix: proofs
## Proof of @prp-error
Let us prove the first point.
We have
$$ \mathbb{E}(\hat{\underline{\mu}}^{L}) = (\mathbf{B}_L^T \mathbf{B}_L)^{-1} \mathbf{B}_L^T \mathbb{E}(\mathbf{y}) =(\mathbf{B}_L^T \mathbf{B}_L)^{-1} \mathbf{B}_L^T \mathbf{B}_{L^*}\mu^{L^*}=:\underline{\mu}^{L}.$$
The theory of the linear model gives that the variance of $\hat{\underline{\mu}}^L$ is equal to
$\sigma^2 (\mathbf{B}^T\mathbf{B})^{-1} \mathbf{B}^T\Sigma \mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1}$
with $\Sigma=Diag(\Sigma_1, \ldots, \Sigma_N)$ the $nN \times nN$ covariance matrix of $\mathbf{y}$.
So finally, writing $\Sigma_{B}^{L, L^\varepsilon} := (\mathbf{B}^T\mathbf{B})^{-1} \mathbf{B}^T\Sigma \mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1}$, we have
$$
\hat{\underline{\mu}}^{L}
\sim \mathcal{N}\left(\underline{\mu}^{L}, \sigma^2\Sigma_{B}^{L, L^ \varepsilon}
\right).
$$
Now we can easily deduce the distribution of $\hat{\underline{f}}^{L}(t)$,
for each $t\in [0,1]$:
$$\hat{\underline{f}}^{L}(t)- \mathbf f^{L}(t) \sim \mathcal{N}\left(0, \sigma^2 B(t)\Sigma_{B}^{L, L^\varepsilon} B(t)^T\right).$$
To prove that $(\hat{\underline{f}}^{L}-f^{L})$ is a Gaussian process, we consider any finite sequence of times $(t_1, \ldots, t_d)\in [0,1]$. The sequence $(\hat{\underline{f}}^{L}(t_1)-f^{L}(t_1), \ldots, \hat{\underline{f}}^{L}(t_d)-f^{L}(t_d))$ is Gaussian, centered, and the covariance is equal to $cov(\hat{\underline{f}}^{L}(t_1)-f^{L}(t_1), \hat{\underline{f}}^{L}(t_2)-f^{L}(t_2)) = \sigma^2 B(t_1)\Sigma_{B}^{L, L^\varepsilon} B(t_2)^T$. Thus, the process is Gaussian.
## Proof of @thm-CB_Liebl_asymptotic
For all $t$ and all $\omega \in \Omega$,
$$\lim_{n\rightarrow +\infty} \underline{f}_n^{L}(t) - f^{L}(t) = 0,$$
which means that for all $\varepsilon >0$, there exists $N_0$ such that for all $n>N_0$,
$$ |\underline{f}_n^{L}(t) - f^{L}(t)| \leq \varepsilon.$$
Then, we have, with probability $1-\alpha$,
$$|\hat{\underline{f}}^{L}(t)-\underline{f}_n^{L}(t)|+ |\underline{f}_n^{L}(t)- f^{L}(t)| \leq \hat d^L(t) +\varepsilon,$$
with $\hat d^L(t) = \hat c^L \sqrt{\hat C_L(t,t)/N}$ and $\hat c^L$ defined as the solution of @eq-cL.
Then, by the triangle inequality, with probability larger than $1-\alpha$,
$$|\hat{\underline{f}}^{L}(t)- f^{L}(t)| = |\hat{\underline{f}}^{L}(t)-\underline{f}_n^{L}(t) + \underline{f}_n^{L}(t)- f^{L}(t)| \leq \hat d^L(t) +\varepsilon,$$
and letting $\varepsilon$ tend to $0$ yields the announced asymptotic confidence band.
## Proof of @prp-CBf
To simplify the notations, let us denote $a(t) = \underline{f}^{L}(t) - \underline{\hat{f}}_1^{L}(t)$ and $b(t) = \underline{f}^{L_{\max},L^*}(t)- \underline{f}^{L}(t)-(\underline{\hat f}_2^{L_{\max}, L^*}(t)-\underline{\hat f}_2^{L}(t))$.
We have
\begin{align*}
P\left( \exists t,\ |a(t)+b(t)|\geq \hat d_1^L(t) + \hat d_2^{L,L_{\max}}(t)\right) &\leq
P\left( \exists t,\ |a(t)|+|b(t)|\geq \hat d_1^L(t) + \hat d_2^{L,L_{\max}}(t)\right)\\
&= P\left( \exists t,\ |a(t)| \geq \hat d_1^L(t) \right)P\left( \exists t,\ |b(t)|\geq \hat d_2^{L,L_{\max}}(t)\right) = \alpha\beta.
\end{align*}
The last equality holds thanks to the independence of the two sub-samples.
# Appendix: more experiments
The properties of the coefficients are illustrated in @fig-coefficients. The true dimension is $L^*=11$. Three families are considered: Fourier, Legendre and Splines. The plots display the absolute difference between the coefficients $\mu_\ell^{L^*}$ and the projected coefficients $\underline{\mu}^{L}$, for different indices $\ell$ on the x-axis and for different values of $L$ and $n$ on the y-axis, namely a case with $L<L^*$ and two values of $n$ ($L=7$, $n=20$ and $L=7$, $n=100$), and a case with $L>L^*$ and two values of $n$ ($L=15$, $n=20$ and $L=15$, $n=100$). The absolute difference is represented as a color gradient, adapted to each functional family. We can see that, as the Legendre (resp. Fourier) family is orthonormal (resp. orthogonal), the differences are close to $0$ when $L=15$, whatever the value of $n$. When $L<L^*$, the difference is close to $0$ when $n$ is large.
This property does not hold for the Splines family, which is not orthogonal; a quick numerical check is sketched below.
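As a quick numerical check (a minimal sketch, assuming the `func.basis` helper defined above and that the Legendre family is normalized on $[0,1]$; not evaluated here), one can inspect the empirical Gram matrix of each family: it should be close to the identity for an orthonormal family, and far from it for Splines.
```{r}
#| eval: false
# Empirical Gram matrices (sketch, not run), on an equispaced grid of [0,1].
tt <- seq(0, 1, length.out = 200)
B.leg <- func.basis(tt, L = 5, "Legendre")
round(crossprod(B.leg) / length(tt), 2) # approximately the identity
B.spl <- func.basis(tt, L = 5, "Splines")
round(crossprod(B.spl) / length(tt), 2) # not the identity
```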
```{r, fig.height=5, fig.width=7}
#| label: fig-coefficients
#| fig-cap: "Illustrative example. The true dimension is 11, we generate the coefficients with three families, Fourier (which is orthogonal), Legendre (which is orthonormal) and the splines (which are not orthogonal wrt the standard scalar product). In the y-axis, two dimensions of the family (7 or 15) and two numbers of timepoints (20 or 100) are compared. We plot in x-axis the value of the absolute difference between the true coefficients and their approximations for the first 7 coefficients of the basis. The color scale is adapted to each functional basis. "
set.seed(1)
library(RColorBrewer)
library(fields) # for image.plot
cols = brewer.pal(9, "BuPu")
pal <- colorRampPalette(cols)
Time = seq(data$time[1], data$time[length(data$time)], length.out = 100)
L.star = 11
vec.L = c(7,15)
par(mfrow=c(3,1))
for (basis in c("Fourier", "Legendre", "Splines")){
if (basis == "Splines"){
data = dgp(n=20,N=40,sd=2*1,L.star=L.star, L.eps = 20, alpha.mu = NULL, basis = basis, f.true = NULL)
} else { data = dgp(n=20,N=40,sd=1,L.star=L.star, L.eps = 20, alpha.mu = NULL, basis = basis, f.true = NULL)
}
alpha.tot = c()
for (L in vec.L){
set.seed(1)
B = func.basis(data$time, L, basis)
B.n.large = func.basis(Time, L, basis)
B.Lstar.n.large = func.basis(Time, L.star, basis)
alpha.true.L = solve(t(B) %*% B) %*% t(B) %*% data$f.true
alpha.true.L.n.large = solve(t(B.n.large) %*% B.n.large) %*% t(B.n.large) %*% B.Lstar.n.large %*% data$alpha.mu
alpha.tot = cbind(alpha.tot, alpha.true.L[1:min(c(L.star, vec.L))]-data$alpha.mu[1:min(c(L.star, vec.L))], alpha.true.L.n.large[1:min(c(L.star, vec.L))] - data$alpha.mu[1:min(c(L.star, vec.L))])
}
image.plot(abs(alpha.tot), main = paste0(basis, "'s basis"), col = pal(20), yaxt="n", xaxt = "n", xlab = "index of the basis" )
axis(2, at = seq(0, 1, length = ncol(alpha.tot)), labels = c("L=7, n=20", "L=7, n=100", "L=15, n=20", "L=15, n=100"), las = 2, cex.axis = 0.6)
axis(1, at = seq(0,1,length.out = 7), labels = c("1", "2", "3", "4", "5", "6", "7"))
}
```