Dynamic Factor Models with Inﬁnite-Dimensional Factor Spaces: One-Sided Representations



Dynamic factor models
High-dimensional factor model methods can be traced back to two seminal papers by Chamberlain (1983) and Chamberlain and Rothschild (1983). The recent and fast-growing literature on the subject, however, starts with the contributions of Forni et al. (2000), Forni and Lippi (2001), Stock and Watson (2002a,b), Bai and Ng (2002) and Bai (2003). Fostered by their success in applications, factor model methods have since attracted considerable attention. The literature in the area is so abundant that even a brief review is impossible here, and we restrict ourselves to a short and unavoidably somewhat subjective selection of "representative" references. Apart from some minor features, most factor models considered in the literature are particular cases of the so-called Generalized Dynamic Factor Model (GDFM) introduced in Forni et al. (2000). Consider a countable set {x_it}, i ∈ N, of observable stationary stochastic processes. The GDFM relies on a decomposition of the form

x_it = χ_it + ξ_it = b_i1(L) u_1t + b_i2(L) u_2t + · · · + b_iq(L) u_qt + ξ_it,   (1.1)

i ∈ N, t ∈ Z, where u_t = (u_1t u_2t · · · u_qt)′ is a q-dimensional orthonormal unobservable white noise vector and the b_if(L), i ∈ N, f = 1, . . . , q, are square-summable filters (L, as usual, stands for the lag operator). Moreover: (I) u_t is orthogonal to ξ_{i,t−k} for all i ∈ N, t ∈ Z and k ∈ Z; (II) cross-covariances among the ξ_it's are "weak".
By "weak", we mean that, while some cross-covariance among the ξ's is allowed, all sequences of weighted cross-sectional averages of the form n i=1 w ni ξ it such that lim n→∞ n i=1 w 2 ni = 0 tend to zero in mean square as n → ∞ (the sequence of arithmetic averages n −1 n i=1 ξ it being a particular case). 1 Note that E(ξ 2 it ) ≤ M for all i and E(ξ it ξ jt ) = 0 for all i = j, is sufficient, but not necessary for (II) to hold (we refer to Section 2 for a detailed presentation and discussion).
Weak cross-covariance of the ξ_it's motivates calling them idiosyncratic, while the χ_it's, being driven by the low-dimensional vector of common shocks u_ft, f = 1, 2, . . . , q, are called common components. The model implies that cross-covariances among the observable variables x_it are essentially accounted for by the common components χ_it.
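To fix ideas, the following minimal simulation sketch generates data from (1.1); the AR(1) filters b_if(L) = (1 − α_if L)^{−1}, the cross-sectionally MA(1) idiosyncratic structure, and all numerical values are illustrative assumptions, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, q = 100, 500, 2

u = rng.standard_normal((T, q))              # orthonormal common shocks u_t
alpha = rng.uniform(-0.8, 0.8, size=(n, q))  # one AR(1) filter per pair (i, f)

s = np.zeros((n, q))                         # state of each filter b_if(L) u_ft
chi = np.zeros((T, n))                       # common components chi_it
for t in range(T):
    s = alpha * s + u[t]                     # s_{if,t} = alpha_if s_{if,t-1} + u_ft
    chi[t] = s.sum(axis=1)                   # chi_it = sum_f b_if(L) u_ft

e = rng.standard_normal((T, n + 1))
xi = e[:, :n] + 0.5 * e[:, 1:]               # weakly cross-correlated idiosyncratics
x = chi + xi                                 # observables x_it = chi_it + xi_it
```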
The problem consists in recovering the unobserved common and idiosyncratic components χ_it and ξ_it, the common shocks u_t, and the filters b_if(L) from finite realizations (i = 1, . . . , n; t = 1, . . . , T) of the process {x_it}, as n and T both tend to infinity. The main tool so far has been a principal component analysis (PC) of the variables x_it, either standard or in the frequency domain (Brillinger's concept of dynamic principal components), depending on the assumptions made. The results obtained can be summarized as follows.

¹ Weak cross-covariance among the ξ's, as opposed to cross-sectional orthogonality (that is, the much stronger assumption of no cross-covariances at all), is the reason for using the term "generalized" in the denomination of the GDFM. It constitutes a major difference with respect to the dynamic factor models studied in Sargent and Sims (1977), Geweke (1977) and Quah and Sargent (1993), which, being based on a finite number n of equations of the form (1.1), require strict cross-sectional orthogonality.
(i) The finite-dimension assumption. Most authors assume that, denoting by span(. . .) the space generated by a collection of random variables,² span(χ_it, i ∈ N), for given t, has finite dimension r, where r ≥ q. Under that assumption, model (1.1) can be rewritten as

x_it = λ_i1 F_1t + λ_i2 F_2t + · · · + λ_ir F_rt + ξ_it,   F_t = N(L) u_t,   (1.2)

i ∈ N, t ∈ Z. This is fairly easy to prove, see Forni et al. (2009), Remark R, Section 2. In this case, we say that (1.1) admits a static representation. If, in addition, N(L) = N(0), so that F_t is a white noise vector, then (1.1) is a static factor model. Criteria to determine r consistently are given in Bai and Ng (2002) (see also Alessi et al., 2010). The vector F_t and the loadings λ_ij can be estimated consistently using the first r standard principal components, see Stock and Watson (2002a,b) and Bai and Ng (2002). Moreover, the second equation in (1.2) is usually specified as a singular VAR, so that (1.2) becomes

x_it = λ_i1 F_1t + λ_i2 F_2t + · · · + λ_ir F_rt + ξ_it,   F_t = D_1 F_{t−1} + D_2 F_{t−2} + · · · + D_p F_{t−p} + R u_t,   (1.3)

where the matrices D_j are r × r while R is r × q. Under (1.3), Bai and Ng (2007) and Amengual and Watson (2007) provide consistent criteria to determine q.
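The principal-component step can be sketched as follows; this is a schematic version of the standard static estimator (factors and loadings are identified only up to rotation and normalization), with hypothetical function names.

```python
import numpy as np

def pc_factors(x, r):
    """First-r principal components of a T x n panel: static factor estimates."""
    x = x - x.mean(axis=0)                          # center each series
    T, n = x.shape
    eigval, eigvec = np.linalg.eigh(x.T @ x / T)    # eigenvalues in ascending order
    lam = eigvec[:, ::-1][:, :r]                    # loadings, up to rotation
    F = x @ lam                                     # estimated factors, T x r
    return F, lam, F @ lam.T                        # factors, loadings, common part
```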
VAR estimation, and therefore, up to multiplication by an orthogonal matrix, estimation of u_t in (1.3), is standard.

² More precisely, span(ζ_i, i ∈ N), where ζ_i belongs to the Hilbert space of square-integrable random variables defined over some probability space, equipped with the corresponding L² norm, is the closed subspace of all mean-square convergent linear combinations of the ζ_i's and limits of convergent sequences thereof.
(ii) Obtaining the static representation. Let us point out that (1.2) and (1.3) are convenient "reduced forms" of other, more explicitly dynamic, representations. For example, an interesting dynamic factor model is

x_it = µ_i0 f_t + µ_i1 f_{t−1} + · · · + µ_is f_{t−s} + ξ_it,   (1.4)

where f_t is a q-dimensional stationary vector, µ_ij is 1 × q and D(L) f_t = u_t.
The moot point is that such assumptions (maintained, for instance, in Bai and Ng, 2007) are far from being innocuous. For instance, (1.2) is so restrictive that even the very elementary model

x_it = (1 − α_i L)^{−1} u_t + ξ_it,   (1.5)

where q = 1, u_t is scalar white noise, and the coefficients α_i are drawn from a uniform distribution over (−1, 1), is ruled out. Indeed, the space spanned, for a given t, by the common components χ_it, i ∈ N, is easily seen to be infinite-dimensional. Infinite-dimensional span(χ_it, i ∈ N)'s a fortiori occur if the AR common component in (1.5) is replaced by more general ARMA ones.
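The infinite-dimensionality of the factor space in (1.5) is easy to check numerically: the covariance matrix of (χ_1t, . . . , χ_nt) has (i, j) entry Σ_{k≥0} α_i^k α_j^k = 1/(1 − α_i α_j), a Cauchy-like matrix that has full rank whenever the α_i are distinct. A small sketch, with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (2, 4, 8):
    alpha = rng.uniform(-0.9, 0.9, size=n)
    cov = 1.0 / (1.0 - np.outer(alpha, alpha))   # cov(chi_it, chi_jt) for model (1.5)
    print(n, np.linalg.matrix_rank(cov))         # rank = n: no finite-r static form
```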
But even when the dimension of span(χ_it, i ∈ N) is finite, there are interesting cases for which the dynamically unrestricted model and related methods provide an advantage over the static approach. Consider the model

x_1t = u_t + a u_{t−1} + ξ_1t,   x_it = u_t + ξ_it, i > 1,   (1.6)

where u_t is a scalar white noise, and suppose that we are interested in the first variable x_1t. Of course this model, unlike (1.5), can be written in the static form (1.2), with F_1t = u_t and F_2t = u_{t−1}. However, it does not fulfill a basic assumption of the static two-factor model, since u_{t−1} is "non-pervasive" (see Assumption B.2, Section 2). As a consequence, the impulse response function of x_1t, i.e. 1 + aL, cannot be obtained with the standard principal component method. By contrast, as shown in Section 2, model (1.6) can be easily accommodated within the dynamic approach proposed here.⁴ Such examples provide a strong theoretical motivation for solving the one-sidedness problem in model (1.1) without turning to the finite-dimension restriction and the related assumptions and methods. This is done in the present paper under assumptions that include rational spectral density for the common components χ_it.⁵ On the other hand, we must also point out that, even when the finite-dimension assumption holds, the static approach may remain competitive; we come back to this point when discussing estimation. Under rational spectral density, each filter b_if(L) in (1.1) is a ratio of polynomials in L. More precisely, we assume the following representation for the common components:

χ_it = [c_i1(L)/d_i1(L)] u_1t + [c_i2(L)/d_i2(L)] u_2t + · · · + [c_iq(L)/d_iq(L)] u_qt,   (1.7)

where c_if(L) and d_if(L) are polynomials in L of degrees s_1 and s_2, respectively. The assumption that s_1 and s_2 are independent of i is very convenient, though not necessary. As for the idiosyncratic components, we do not make any parametric assumptions, nor do we restrict their cross-covariance structure, except of course for the "weak cross-correlation assumption" that characterizes idiosyncrasy, as described above. Our model, in that sense, is a semiparametric one, with a huge nuisance; in particular, the autocorrelation structures of the idiosyncratic components remain completely unspecified.
The main features of our approach can be summarized as follows. (i) Generically in the parameter space, the common-component vector χ_t = (χ_1t χ_2t · · ·)′ admits a blockwise autoregressive representation

A(L) χ_t = R u_t,   (1.8)

where A(L) is an ∞ × ∞ block-diagonal matrix with finite-degree (q + 1) × (q + 1) blocks A^k(L) on the diagonal, R is ∞ × q, and χ^k_t denotes the k-th (q + 1)-dimensional subvector of χ_t. Thus the construction of (1.8) consists of (a) obtaining an autoregressive representation for each of the vectors χ^k_t, and then (b) knitting together such autoregressive representations.
(ii) As regards (a), each of the subvectors χ^k_t has dimension q + 1 and rank q (i.e. its spectral density has rank q for all θ ∈ [−π, π]), and is therefore singular (i.e. its dimension is greater than its rank). For singular (or reduced-rank) vectors with rational spectral density, existence of a finite-degree autoregressive representation, for generic values of the parameters, has been proved in Anderson and Deistler (2008a,b). We contribute to this literature by showing that, when the dimension is equal to q + 1, the minimum-lag autoregressive representation is generically unique. As regards (b), obtaining the same u_t for all the vectors χ^k_t requires the additional assumption that, for each k, span(χ^k_{t−h}, h ≥ 0) = span(χ_{t−h}, h ≥ 0). We will motivate this restriction by a genericity argument.
(iii) The matrices A^k(L) and R^k can be obtained starting with the spectral density matrix of the observable variables x_it. The vector z_t results from the application of one-sided filters to the variables x_it, see (1.10). Lastly, u_t can be obtained using the first q principal components of the variables z_it, i.e. only current values of the variables z_it. Our procedure thus solves the one-sidedness problem.
(iv) Moreover, the matrices A^k(L) and R^k, which are (q + 1) × (q + 1) and (q + 1) × q respectively, result from separate low-dimensional calculations. Thus we do not run into "curse of dimensionality" problems.
In Section 2, we state the main assumptions underlying the GDFM and review some basic results from the previous literature. In Section 3, we prove some general results on stochastic vectors that are infinite-dimensional with finite rank, like χ_t, under the assumption of rational spectral density; rational spectral density is assumed for χ_t throughout the paper. In Section 4, we present results on autoregressive representations of singular stochastic vectors. Such results are then used to construct the blockwise autoregressive representation (1.8) for χ_t and to transform the original variables x_it into another set of variables for which a static factor model holds. Lastly, we briefly outline the correspondence between our representation results and the estimation procedure studied in the companion paper Forni et al. (2014).

Notation
The GDFM (1.1) can be thought of as (i) a double-indexed stochastic process {x_it, i ∈ N, t ∈ Z}, (ii) a family of stationary processes {x_it, t ∈ Z} indexed by i ∈ N, or (iii) a stationary family of cross-sections {x_it, i ∈ N} indexed by t ∈ Z, i.e. a stationary infinite-dimensional stochastic process.⁶ We find the third option convenient, and accordingly write x_t for (x_1t x_2t · · · x_nt · · ·)′. The notation χ_t, ξ_t and x_t = χ_t + ξ_t is used in a similar way, with obvious componentwise counterparts. Associated with this infinite-dimensional vector notation, we also consider infinite-dimensional matrices, such as A(L) or R (see (1.10)), which are ∞ × ∞ and ∞ × q, respectively.
The reader will easily check that we never produce infinite sums of products, so that our infinite-dimensional matrices are no more than a notational convenience. Given the infinite-dimensional process y_t = (y_1t y_2t · · · y_nt · · ·)′, we use the following notation: (i) y_st is the s-dimensional process (y_1t y_2t · · · y_st)′; (ii) H^y_t = span(y_iτ, i ∈ N, τ ≤ t) and H^{ys}_t = span(y_iτ, i ≤ s, τ ≤ t). If y_t is s-dimensional, we use the notation H^y = span(y_it, i ≤ s, t ∈ Z), H^y_t = span(y_iτ, i ≤ s, τ ≤ t) (we never need sub-vectors of finite-dimensional vectors).
It is convenient, though not necessary, to assume throughout the paper that all white-noise vectors are orthonormal.

Basic assumptions
All the stochastic variables x it , χ it and ξ it below have mean zero and finite variance.
Assumption A.1 For all n ∈ N, the vector x_nt is weakly stationary (stationary henceforth), and has a spectral density (an absolutely continuous spectral measure).
Under Assumption A.1 we can define, for θ ∈ [−π, π], the nested spectral density matrices Σ^x_n(θ) of the vectors x_nt = (x_1t x_2t · · · x_nt)′. The matrix Σ^x_n(θ) is Hermitian and non-negative definite, and therefore has non-negative real eigenvalues for all θ ∈ [−π, π]; we denote them, in decreasing order, by λ^x_nj(θ), j = 1, . . . , n. The notation Σ^χ_n(θ), Σ^ξ_n(θ), λ^χ_nj(θ), λ^ξ_nj(θ) is used in a similar way. Our second assumption is

Assumption A.2 For some q ∈ N: (i) λ^x_nq(θ) → ∞ as n → ∞, for almost all θ in [−π, π]; (ii) there exists a real Λ such that λ^x_{n,q+1}(θ) ≤ Λ for all n and almost all θ in [−π, π].
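For intuition, the dynamic eigenvalues λ^x_nj(θ) can be gauged from data by taking the eigenvalues of a lag-window estimate of Σ^x_n(θ); the Bartlett kernel and bandwidth below are illustrative choices, not prescriptions of this paper.

```python
import numpy as np

def dynamic_eigenvalues(x, n_freq=33, bandwidth=None):
    """Eigenvalues of a lag-window spectral density estimate of a T x n panel."""
    T, n = x.shape
    x = x - x.mean(axis=0)
    M = bandwidth or int(np.sqrt(T))
    gammas = [x[k:].T @ x[:T - k] / T for k in range(M + 1)]  # sample autocovariances
    thetas = np.linspace(-np.pi, np.pi, n_freq)
    eig = np.empty((n_freq, n))
    for h, th in enumerate(thetas):
        S = gammas[0].astype(complex)
        for k in range(1, M + 1):
            w = 1 - k / (M + 1)                                # Bartlett weight
            S = S + w * (gammas[k] * np.exp(-1j * k * th)
                         + gammas[k].T * np.exp(1j * k * th))
        S = S / (2 * np.pi)
        eig[h] = np.linalg.eigvalsh(S)[::-1]                   # decreasing order
    return thetas, eig
```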

Forni and Lippi (2001) prove that
Theorem A Assumptions A.1 and A.2 imply that x_t can be represented as in (1.1), i.e.

x_it = χ_it + ξ_it = b_i1(L) u_1t + b_i2(L) u_2t + · · · + b_iq(L) u_qt + ξ_it,   (2.1)

where (i) λ^ξ_n1(θ) ≤ Λ for some real Λ, for all n and almost all θ in [−π, π]; (ii) λ^χ_nq(θ) → ∞ as n → ∞, for almost all θ in [−π, π]; (iii) u_t is orthogonal to ξ_{i,t−k} for all i ∈ N, t ∈ Z and k ∈ Z. Moreover, the components χ_it and ξ_it are unique.
Conversely, if x t can be represented as in (2.1) with χ χ χ t and ξ ξ ξ t fulfilling (i), (ii) and (iii), then x t satisfies Assumptions A.1 and A.2.
An infinite-dimensional vector satisfying (i) is called an idiosyncratic vector.
Under the restriction that the dimension of span(χ_it, i ∈ N) is finite, so that the model has representation (1.2), or (1.3), the basic assumptions are:

Assumption B.1 For all n ∈ N, the vector x_nt is weakly stationary. Denoting by Γ^x_n its variance-covariance matrix and by µ^x_nj the j-th eigenvalue of Γ^x_n, in decreasing order, we set µ̄^x_j = lim_{n→∞} µ^x_nj (the sequence µ^x_nj being non-decreasing in n, the limit exists, finite or infinite); the notation Γ^χ_n, Γ^ξ_n, µ̄^χ_j, µ̄^ξ_j is used in a similar way.

Assumption B.2 For some r ∈ N, µ̄^x_r = ∞, whereas µ̄^x_{r+1} < ∞.

Theorem B (Chamberlain and Rothschild, 1983) Assumptions B.1 and B.2 imply that x_t can be represented as

x_it = λ_i1 F_1t + λ_i2 F_2t + · · · + λ_ir F_rt + ξ_it,   (2.2)

where F_t = (F_1t F_2t · · · F_rt)′ is a weakly stationary r-dimensional vector. Moreover, (i) ξ_t satisfies Assumption B.1 and µ̄^ξ_1 < ∞; (ii) χ_t satisfies Assumption B.1 and µ̄^χ_r = ∞ (note that µ̄^χ_{r+s} = 0 for all s > 0); (iii) ξ_t and F_t are uncorrelated for all t ∈ Z; (iv) the integer r and the components χ_it and ξ_it are unique.
Conversely, if x_t can be represented as in (2.2) with χ_t and ξ_t fulfilling (i), (ii) and (iii), then x_t satisfies Assumptions B.1 and B.2. Note that, unlike Theorem A, Theorem B is obtained without resorting to spectral techniques.

Model (1.6) nicely highlights a noticeable difference between Assumptions A.2 and B.2, corresponding to a basic difference between the dynamic and the static approaches.
Using the dynamic approach, we see that the first eigenvalue of the spectral density matrix diverges and Assumption A.2 is fulfilled with q = 1. Hence the common component of the first variable is u_t + a u_{t−1} and its idiosyncratic component is ξ_1t.
Using the techniques of the present paper, the (bivariate) VAR corresponding to the first block (x_1t, x_2t) accounts for the lag structure of χ_1t, while all other bivariate blocks (x_{2j+1,t}, x_{2(j+1),t}), j = 1, 2, . . ., have A^k(L) = I_2 and R^k = (1 1)′, so that we obtain the correct representation (1 + aL) u_t for χ_1t, that is, the correct response of x_1t to the common shock u_t.
On the other hand, using the static approach, we find that only the first eigenvalue of the variance-covariance matrix diverges. Assumption B.2 is fulfilled with r = 1, namely, by Theorem B, the model has a static factor representation with just one factor, i.e. u_t, whereas u_{t−1}, being non-pervasive, is not a common factor. The common component of the first variable is u_t and the term a u_{t−1} is absorbed by the idiosyncratic component, so that the model fails to correctly represent the reaction of x_1t to the shock u_t.⁷

⁷ The resulting lagged covariance between the common and the idiosyncratic component of x_1t is ignored within the static approach.
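The contrast can be checked numerically on the covariance side: with χ_1t = u_t + a u_{t−1} and χ_it = u_t for i ≥ 2, the covariance matrix of χ_nt is a matrix of ones plus a² in entry (1, 1), so only its first eigenvalue grows with n. A minimal sketch, with illustrative values:

```python
import numpy as np

a = 0.9
for n in (10, 100, 1000):
    cov = np.ones((n, n))                  # cov(chi_it, chi_jt) = 1 for the example
    cov[0, 0] += a * a                     # var(chi_1t) = 1 + a^2
    mu = np.linalg.eigvalsh(cov)[::-1]
    print(n, round(mu[0], 2), round(mu[1], 4))   # mu_1 ~ n diverges, mu_2 <= a^2
```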

Infinite-dimensional processes with finite rank
Of course, uniqueness of χ_t and ξ_t in (2.1) does not imply that u_t or b(L) in χ_t = b(L)u_t are unique: for example, χ_t = [b(L)Q′][Q u_t], where Q is an arbitrary q × q orthogonal matrix, or, more generally, representations obtained by applying to u_t a q × q filter k(L) such that k(L)u_t is again orthonormal white noise. More importantly, Theorem A does not ensure that χ_t admits a one-sided moving average representation, i.e., a representation of the form χ_t = e(L)w_t, where w_t is q-dimensional orthonormal white noise and e(L) = e_0 + e_1 L + · · · . For example, if

χ_it = u_{t+i}, i ∈ N,   (3.1)

where u_t is one-dimensional white noise (q = 1), then statement (ii) of Theorem A holds true, so that χ_t is the common component of some process x_t satisfying A.1 and A.2, but χ_t has no one-sided representations (this is quite obvious from Lemma 1 below).⁸ The existence of one-sided moving average representations of infinite-dimensional stochastic vectors is analyzed in Lemmas 1 and 2 under the assumptions of rational spectral density and finite rank. A precise statement of those lemmas requires giving some further definitions and recalling a few results on rational-spectrum finite-dimensional stochastic vectors.
Definition 1 Assume that y_t fulfills Assumption A.1. We say that y_t has rank q if there exists a positive integer s such that rank(Σ^y_n(θ)) = q for n ≥ s and almost all θ in [−π, π].
Definition 2 Let y_t denote an infinite-dimensional stationary stochastic vector with a moving average representation

y_t = b(L) v_t,   (3.2)

where v_t is q-dimensional orthonormal white noise and b(L) is an ∞ × q square-summable filter. We say that (3.2) is a fundamental representation if (1) b(L) is one-sided, and (2) v_t belongs to H^y_t. In that case, we also say that the white noise v_t is fundamental for y_t. Note that if v_t is fundamental for y_t, then H^y_t = H^v_t. Now suppose that y_t is n-dimensional with representation

y_t = b(L) v_t,   (3.3)

where v_t is q-dimensional orthonormal white noise and b(L) is an n × q square-summable filter. Fundamentalness of (3.3) and of v_t are defined as in Definition 2. Moreover: (I) if (3.3) is fundamental and y_t = c(L)w_t, with w_t orthonormal, is another fundamental representation, then w_t has dimension q and w_t = Qv_t for some q × q orthogonal matrix Q (Rozanov, 1967, pp. 56-57); (II) if (3.3) is fundamental, then rank(b(z)) = q for all complex z such that |z| < 1 (Rozanov, 1967, p. 63, Remark 3). In particular, rank(b_0) = rank(b(0)) = q.
A finite-dimensional stationary process with a spectral density does not necessarily possess a fundamental representation (see footnote 8). However: (III) if y_t has rational spectral density, then it has fundamental representations, which are rational; (IV) suppose that y_t has rational spectral density, that b(L) is n × q, rational, square-summable and one-sided, that v_t is q-dimensional orthonormal white noise, and that rank(b(z)) = q for all z such that |z| < 1: then (3.3) is fundamental (Hannan, 1970, pp. 62-67).
We say that the infinite-dimensional process y t has rational spectral density if y nt has rational spectral density for all n.
Lemma 1 Suppose that the infinite-dimensional process y_t has rational spectral density and rank q. The following statements are equivalent: (i) y_t has a one-sided rational moving average representation y_t = b(L)v_t (the entries of b(L) being rational functions of L); (ii) there exists a positive integer s such that H^{ys}_t = H^y_t.
Proof. Assume (ii). By (III) there exists a one-sided rational fundamental representation y_st = b_s(L)v_t. By assumption, y_{s+k,t} ∈ H^{ys}_t and, therefore, y_{s+k,t} ∈ H^v_t, so that

y_{s+k,t} = b_{s+k}(L) v_t   (3.4)

for some one-sided square-summable 1 × q filter b_{s+k}(L). The white noise v_t is fundamental for y_st, hence also for (y′_st y_{s+k,t})′. Thus representation (3.4) is fundamental, so that, by (III), b_{s+k}(L) must be rational. The conclusion follows. Assume now that (i) holds. We say that β is a zero of b(L) if the determinants of the q × q submatrices of b(β) all vanish. Assume that α is a zero of b(L) and that |α| < 1. There exists a unitary q × q matrix B_α such that all the entries of the first column of b(L)B_α vanish at α. Defining γ_α(L) as the q × q diagonal matrix with diagonal entries ((1 − αL)(L − α)^{−1} 1 · · · 1), we have

y_t = [b(L) B_α γ_α(L)] [γ_α(L)^{−1} B̃_α v_t],

where a tilde denotes transposition and conjugation. This is an alternative one-sided rational representation of y_t in which the zero α has been moved outside the unit circle; iterating over the finitely many zeros of modulus smaller than one, we obtain a one-sided rational representation with no such zeros which, by (IV), is fundamental, and (ii) follows.

Lemma 2 Suppose that the infinite-dimensional process y_t has rational spectral density and rank q. Then: (i) if y_t has a one-sided rational representation y_t = b(L)v_t, then y_t has a fundamental (rational) representation; (ii) if y_t = b(L)v_t and y_t = c(L)w_t are fundamental, with v_t and w_t q-dimensional and orthonormal, then w_t = Qv_t and c(L) = b(L)Q′ for some q × q orthogonal matrix Q.

Proof. Statement (i) is part of the proof of Lemma 1. As for (ii), suppose that y_t = b(L)v_t and y_t = c(L)w_t both are fundamental. By Lemma 1, there exists s such that H^{ys}_t = H^y_t. As a consequence, both v_t and w_t belong to H^{ys}_t, and therefore are fundamental for y_st. This implies, by (I), that w_t = Qv_t, where Q is orthogonal. Thus c(L)Qv_t = b(L)v_t, so that c(L) = b(L)Q′. Q.E.D.
Summing up, given the infinite-dimensional vector y t , assuming A.1, finite rank, rational spectral density, and the existence of a one-sided rational moving average representation, we obtain the existence of a rational fundamental representation for y t , which is unique up to multiplication by an orthogonal matrix. Moreover, for some s, the space spanned by the current and past values of y st coincides with the space spanned by current and past values of the whole vector y t (equivalently, a fundamental white noise of y st is a fundamental white noise of y t ).
If, in addition to rational spectral density, we assume that χ_t has a one-sided rational representation or, equivalently (by Lemma 1), that H^{χs}_t = H^χ_t for some s, so that cases like (3.1) cannot occur, then Lemma 2 ensures that χ_t has a rational fundamental representation.
More precisely,

χ_it = [c_i1(L)/d_i1(L)] u_1t + [c_i2(L)/d_i2(L)] u_2t + · · · + [c_iq(L)/d_iq(L)] u_qt   (3.5)

for i ∈ N, where c_if(L) and d_if(L) are polynomials in L, and u_t is fundamental for χ_t.
However, in Assumption A.3 (see Section 4.2), we will require more than the existence of an integer s such that H^{χs}_t = H^χ_t. Rather, we suppose that the space spanned by χ_{i_1τ}, χ_{i_2τ}, . . . , χ_{i_{q+1}τ}, τ ≤ t, coincides with H^χ_t for all (q + 1)-tuples i_1 < i_2 < · · · < i_{q+1}. Thus, u_t in (3.5) is fundamental for any (q + 1)-dimensional subvector of χ_t, not only for the subvector χ_st associated with some s. This stronger requirement is motivated in Section 4. We prove that, under a quite general parameterization, the stronger condition holds generically, i.e. outside of a negligible subset, as defined in Section 4, of the parameter space.

General results for singular stochastic vectors
Consider an n-dimensional vector y_t such that

y_it = [c_i1(L)/d_i1(L)] v_1t + [c_i2(L)/d_i2(L)] v_2t + · · · + [c_iq(L)/d_iq(L)] v_qt,   (4.1)

where v_t = (v_1t v_2t · · · v_qt)′ is q-dimensional orthonormal white noise and

c_if(L) = c_{if,0} + c_{if,1} L + · · · + c_{if,s_1} L^{s_1},   d_if(L) = 1 + d_{if,1} L + · · · + d_{if,s_2} L^{s_2}.   (4.2)

We assume that for any i the filters in (4.1) are such that: (I) the coefficients of the polynomials c_if(L) and d_if(L), f = 1, 2, . . . , q, are collected in a vector taking values in a compact set Π ⊂ R^ν, ν = q(s_1 + s_2 + 1); (II) d_if(L) has no root of modulus smaller than or equal to one, for f = 1, 2, . . . , q.
Thus, there exists a real φ > 1 such that all the roots of the polynomials d if (L) are of modulus greater than or equal to φ.
As a consequence, the vector y_t is described by a parameter vector taking values in Π^n = Π × Π × · · · × Π (n factors), which is the closure of a non-empty open subset of R^µ, with µ = nν.
We are interested in the case n > q. Such "tall systems" have been studied recently by Anderson, Deistler and their coauthors (see, in particular, Anderson and Deistler, 2008a,b). One of their results is that, when n > q, there exists a nowhere dense set N ⊂ Π^n, i.e. a set whose closure has no interior points, such that if the parameter vector lies in Π^n − N, then y_t has an autoregressive representation of the form

A(L) y_t = R v_t,   (4.3)

where (i) R is n × q, with rank(R) = q; (ii) A(L) is an n × n finite-degree matrix polynomial.
When a property holds in Π^n − M and M is nowhere dense in Π^n, we say that the property holds generically in Π^n. As R has generically full rank, (4.3) implies that, generically, v_t is fundamental for y_t.⁹ To provide an intuition for this result and Proposition 1 below, let us consider the following elementary example, in which n = 2, q = 1, and

y_1t = (a_1 + b_1 L) v_t,   y_2t = (a_2 + b_2 L) v_t,   (4.4)

with parameter (a_1, b_1, a_2, b_2) in R² × R². Outside of the nowhere dense subset in which a_1b_2 − a_2b_1 = 0, the white noise v_t can be recovered from current values of y_t:

v_t = d (b_2 y_1t − b_1 y_2t),   (4.5)

where d = 1/(a_1b_2 − a_2b_1). Using (4.5) to get rid of v_{t−1} in (4.4), we obtain the AR(1) representation

y_1t = d b_1 (b_2 y_{1,t−1} − b_1 y_{2,t−1}) + a_1 v_t,   y_2t = d b_2 (b_2 y_{1,t−1} − b_1 y_{2,t−1}) + a_2 v_t.   (4.6)

Note that:

⁹ Results on the existence of autoregressive representations for singular vectors are given in Miamee and Pourahmadi (1987). Without assuming rational spectral density, they provide sufficient conditions. However, the existence of finite-degree autoregressive representations is not considered.
(i) If a_1b_2 − a_2b_1 = 0, no finite-degree autoregressive representation exists, unless b_1 = b_2 = 0. Moreover, fundamentalness of v_t for y_t requires that the root of a_1 + b_1L (which is also the root of a_2 + b_2L) has modulus larger than one.
(ii) However, as soon as a_1b_2 − a_2b_1 ≠ 0, v_t is fundamental for y_t even if both the roots of a_i + b_iL, i = 1, 2, are smaller than one in modulus.
(iii) Quite obviously, a_1b_2 − a_2b_1 ≠ 0 if and only if y_{1,t−1} and y_{2,t−1} are linearly independent. Therefore, generically, the projection (4.6) is unique, i.e. generically no other autoregressive representation of degree one exists.
(iv) But higher-degree autoregressive representations do exist. Rewriting (with obvious definitions of A and a) (4.6) as y_t = Ay_{t−1} + av_t, we get y_t = A²y_{t−2} + Aav_{t−1} + av_t. Using (4.5) to get rid of v_{t−1}, we obtain another autoregressive representation, of degree two. Such non-uniqueness does not occur for square systems (when n = q).
(v) Now suppose that a third equation y_3t = (a_3 + b_3L)v_t is added to (4.4), so that n = 3 and q = 1; generically, an AR(1) representation still exists. However, the variables y_{i,t−1}, i = 1, 2, 3, are not linearly independent, so that such a minimum-lag autoregressive representation is not unique.
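Remarks (i)-(iii) are easily checked by simulation. In the sketch below (illustrative parameter values, with the root of a_1 + b_1L inside the unit circle), v_t is recovered exactly from current values of y_t through (4.5):

```python
import numpy as np

rng = np.random.default_rng(2)
a1, b1, a2, b2 = 1.0, 2.0, 1.0, -0.5   # root of a_1 + b_1 L is -0.5: inside unit circle
T = 200
v = rng.standard_normal(T)
vlag = np.r_[0.0, v[:-1]]              # v_{t-1}, initialized at zero
y1 = a1 * v + b1 * vlag                # y_1t = (a_1 + b_1 L) v_t
y2 = a2 * v + b2 * vlag
d = 1.0 / (a1 * b2 - a2 * b1)
v_hat = d * (b2 * y1 - b1 * y2)        # equation (4.5): current values only
print(np.max(np.abs(v_hat - v)))       # ~ 1e-16: v_t is fundamental for y_t
```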
Let us show that remark (iii) can be generalized. Precisely, if n = q + 1, then, generically, there exists only one minimal-lag autoregressive representation.
Proposition 1 Consider an n-dimensional vector y_t with representation (4.1)-(4.2), and assume that n = q + 1. There exists a set N ⊂ Π^{q+1}, nowhere dense in Π^{q+1}, such that, if the parameter vector lies in Π^{q+1} − N:

(i) y_t has a finite-degree autoregressive representation A(L)y_t = Rv_t, where R is (q + 1) × q with R_if = c_if(0) and rank(R) = q, and A(L) is (q + 1) × (q + 1) with degree not exceeding S = qs_1 + q²s_2. This implies that v_t is fundamental for y_t.

(ii) Suppose that (a) A*(L) is a (q + 1) × (q + 1) polynomial matrix whose degree does not exceed S, and (b) A*(L)y_t = R*w_t, where R* is (q + 1) × q and w_t is q-dimensional orthonormal white noise. Then A*(L) = A(L) and R*w_t = Rv_t, so that w_t = Qv_t and R* = RQ′ for some q × q orthogonal matrix Q.

See Appendix A for the proof.
Part (i) of Proposition 1 has already been proved in the papers by Anderson and Deistler, as we have mentioned above. However, the parameters in Anderson and Deistler's papers are the entries of the matrices in the state-space representation of the rational-spectrum vector y t , whereas our parameters are the coefficients of the rational functions in representation (4.1).
Note that Proposition 1 does not claim that, generically, the process y_t corresponding to a parameter vector in Π^{q+1} has no non-fundamental representations. What it claims is that, generically, such non-fundamental representations are not parameterized in Π^{q+1}. For example, representation (4.4) is generically fundamental in R² × R². On the other hand, given any real a with |a| > 1, the process y_t also has the representation

y_it = (a_i + b_i L)(L − a)(1 − aL)^{−1} w_t   (4.7)

for i = 1, 2, where w_t = (1 − aL)(L − a)^{−1} v_t is white noise (this is easily proved by showing that its spectral density is constant).
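Indeed, for real a, the spectral density of w_t is

σ_w(θ) = |(1 − a e^{−iθ})/(e^{−iθ} − a)|² σ_v(θ) = [(1 − 2a cos θ + a²)/(1 − 2a cos θ + a²)] σ_v(θ) = σ_v(θ) = 1/2π,

which is constant, since v_t is orthonormal white noise.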
Thus, y_t has the non-fundamental representation (4.7). The latter, however, cannot be parameterized within Π², the denominator 1 − aL having a root of modulus smaller than one, in violation of (II). Now assume that y_t is infinite-dimensional, with y_it modeled as in (4.1) for i ∈ N.
The vector y_t is parameterized in Π^∞ = Π × Π × · · · . We define negligible sets and genericity in Π^∞ with respect to the product topology.¹⁰ We say that a subset of Π^∞ is negligible if it is meagre, i.e. the union of a countable family of nowhere dense subsets, and that a property holds generically in Π^∞ if the subset where it does not hold is meagre. Define the set M_m, for m ≥ q + 1, as the set of points in Π^∞ such that all vectors y^{i_1,i_2,...,i_{q+1}}_t = (y_{i_1t} y_{i_2t} · · · y_{i_{q+1}t})′, with i_1 < i_2 < · · · < i_{q+1} ≤ m, admit a representation of the form

A^{i_1,i_2,...,i_{q+1}}(L) y^{i_1,i_2,...,i_{q+1}}_t = R^{i_1,i_2,...,i_{q+1}} v_t,   (4.8)

where A^{i_1,i_2,...,i_{q+1}}(L) is at most of degree S and unique in the sense of Proposition 1(ii).

¹⁰ Let us recall that a basis for the open sets of Π^∞ in the product topology is the family of all products O_1 × O_2 × · · ·, where each O_i is an open subset of Π and O_i = Π for all but finitely many i.
From Proposition 1, we see that N_m = Π^∞ − M_m is a nowhere dense subset in the product topology of Π^∞, so that the set N = ∪_{m=q+1}^∞ N_m, being a countable union of nowhere dense subsets of Π^∞, is a meagre subset. We thus have the following.
Lemma 3 Assume that y_t is infinite-dimensional, modeled as in (4.1) for i ∈ N and parameterized in Π^∞. Generically in Π^∞, all the vectors y^{i_1,i_2,...,i_{q+1}}_t, with i_1 < i_2 < · · · < i_{q+1}, can be represented as in (4.8), where A^{i_1,i_2,...,i_{q+1}}(L) is at most of degree S and unique in the sense of Proposition 1(ii).
Defining negligible subsets of Π^∞ as meagre subsets has a good motivation in the fact that (i) the complement of a meagre subset of Π^∞ is not meagre, and (ii) if a subset of Π^∞ is not meagre, obtaining it as the union of a family of nowhere dense subsets requires an uncountable family.¹¹ Moreover, assuming that the parameter space indexing the polynomials c_if(L) and d_if(L) does not depend on i, as we do in (4.1), is convenient but not necessary.
With the dimension of the parameter space depending on i, a more general version of Proposition 1 holds as well as the meagreness result for infinite-dimensional vectors y t .
However, the gain in generality does not seem to justify the substantial additional complications in the proof of Proposition 1 and in the determination of the degree of A(L).

¹¹ Let us recall that: (I) because Π is a closed subset of R^ν, the space Π^∞ is the Cartesian product of a countable family of complete metric spaces and is therefore a complete metric space (Dunford and Schwartz, 1988, p. 32, Lemma 4); (II) in complete metric spaces the complement of a meagre subset is not meagre (same reference, Baire Category Theorem, p. 20).

Existence of AR representations of χ_t
Let us now turn our attention to the common-component vector χ χ χ t . As we have seen, assuming that χ χ χ t has rational spectral density and a one-sided rational representation implies, by Lemma 2, that χ χ χ t has a fundamental rational representation of the form (4.1). The meagreness argument developed in Section 4.1, as summarized in Lemma 3, provides a motivation for assuming more.
Assumption A.3 For all (q + 1)-tuples i_1 < i_2 < · · · < i_{q+1}: (i) the vector χ^{i_1,i_2,...,i_{q+1}}_t = (χ_{i_1t} χ_{i_2t} · · · χ_{i_{q+1}t})′ has an autoregressive representation

A^{i_1,i_2,...,i_{q+1}}(L) χ^{i_1,i_2,...,i_{q+1}}_t = R^{i_1,i_2,...,i_{q+1}} u_t,   (4.9)

where A^{i_1,i_2,...,i_{q+1}}(L) is a polynomial matrix of degree not greater than S; (ii) representation (4.9) is unique in the sense of Proposition 1(ii).
An immediate consequence of Assumption A.3 is that χ_t can be represented as in (1.8), that is,

A(L) χ_t = R u_t,   (4.10)

where A(L) is the ∞ × ∞ block-diagonal matrix with the (q + 1) × (q + 1) diagonal blocks A^k(L) = A^{(k−1)(q+1)+1,...,k(q+1)}(L), and R is the ∞ × q matrix obtained by stacking the matrices R^k = R^{(k−1)(q+1)+1,...,k(q+1)}.

Construction of the AR representations of χ χ χ t
Assumption A.3 ensures existence and uniqueness of the autoregressive representation (4.10). We now show how (4.10), i.e. the matrices A k (L) and (up to multiplication by an orthogonal matrix) R k , can be constructed from the spectral density of the χ's.
(i) Assume that the population spectral density of the vector χ χ χ t is known, i.e. that the nested spectral density matrices Σ Σ Σ χ n (θ), n ∈ N, are known.
(ii) Denote by χ^k_t the k-th (q + 1)-dimensional subvector of χ_t appearing in (4.10), and write Σ^χ_jk(θ) for the (q + 1) × (q + 1) cross-spectral density between χ^j_t and χ^k_t. Then, denoting by Γ^χ_{jk,s} the covariance between χ^j_t and χ^k_{t−s},

Γ^χ_{jk,s} = ∫_{−π}^{π} e^{isθ} Σ^χ_jk(θ) dθ.   (4.11)

(iii) Using the autocovariance function Γ^χ_{kk,s}, we obtain the minimum-lag matrix polynomial A^k(L) and the autocovariance function of the unobservable vectors

Ψ^k_t = A^k(L) χ^k_t = R^k u_t   (4.14)

and

Γ^Ψ_jk = E(Ψ^j_t Ψ^{k′}_t) = Σ_{g=0}^{S} Σ_{h=0}^{S} A^j_g Γ^χ_{jk,h−g} A^{k′}_h,   (4.15)

where A^j_g denotes the coefficient of L^g in A^j(L). Writing A^k(L) = I_{q+1} − B^k_1 L − · · · − B^k_S L^S, the coefficients B^k_s are those of the projection of χ^k_t on its first S lagged values. We have

[B^k_1 B^k_2 · · · B^k_S] = [Γ^χ_{kk,1} Γ^χ_{kk,2} · · · Γ^χ_{kk,S}] (C^χ_kk)^{ad} / det(C^χ_kk),   (4.16)

where C^χ_kk is the S(q + 1) × S(q + 1) variance-covariance matrix of the stacked vector (χ^{k′}_{t−1} χ^{k′}_{t−2} · · · χ^{k′}_{t−S})′ and C^{ad} stands for the adjoint of a square matrix C. Invertibility of C^χ_kk, hence of (C^χ_kk)^{ad}, is a consequence of Assumption A.3.
(iv) The ∞ × ∞ matrix Γ^Ψ obtained by piecing together the matrices Γ^Ψ_jk is of rank q (indeed, by (4.14), Γ^Ψ_jk = R^j R^{k′}) and can therefore be represented as Γ^Ψ = SS′, where S is an ∞ × q matrix. On the other hand, Γ^Ψ is the covariance matrix of the right-hand side terms in (4.10), so that S = RH, where H is q × q and orthogonal.
Lastly, using x_t = χ_t + ξ_t, and letting Z_t = A(L)x_t and Φ_t = A(L)ξ_t, we obtain

Z_t = R u_t + Φ_t.   (4.17)

In conclusion, starting with the spectral density of the χ's, we obtain the filter A(L), the matrix R (up to post-multiplication by an orthogonal matrix), and the variables Z_it, which follow the static factor model (4.17).
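Computationally, steps (iii) and (iv) amount to a small Yule-Walker projection per block followed by a rank-q factorization. The sketch below takes the population autocovariances as given and uses hypothetical function names; it is an illustration under the assumptions of this section, not the estimator of the companion paper.

```python
import numpy as np

def block_ar(gamma, S):
    """Minimum-lag AR of one block from its autocovariances.
    gamma(s) must return Gamma^chi_{kk,s} (m x m, m = q+1) for |s| <= S,
    with gamma(-s) = gamma(s).T."""
    m = gamma(0).shape[0]
    # C^chi_kk: covariance of the stacked lags (chi_{t-1}, ..., chi_{t-S})
    C = np.block([[gamma(h - g) for h in range(1, S + 1)] for g in range(1, S + 1)])
    G = np.hstack([gamma(s) for s in range(1, S + 1)])   # cov(chi_t, stacked lags)
    B = G @ np.linalg.inv(C)                             # projection coefficients
    # A^k(L) = I - B_1 L - ... - B_S L^S, returned as the list of its coefficients
    return [np.eye(m)] + [-B[:, (s - 1) * m:s * m] for s in range(1, S + 1)]

def shock_loadings(Gamma_Psi, q):
    """Step (iv): factor the rank-q matrix Gamma^Psi as S S', S of full column rank."""
    val, vec = np.linalg.eigh(Gamma_Psi)
    idx = np.argsort(val)[::-1][:q]                      # q largest eigenvalues
    return vec[:, idx] * np.sqrt(np.clip(val[idx], 0.0, None))   # equals R H
```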

Normalization of Z t
Under our assumptions, the dynamic factor model for the variables x_it has been transformed into model (4.17), which has the form (2.2) for the variables Z_it, with r = q and F_t = u_t. Application of standard principal components to estimate u_t and R requires that Assumptions B.1 and B.2 be fulfilled. The latter are equivalent to statements (i), (ii) and (iii) of Theorem B, see Section 2.2. In particular, the first eigenvalue of the variance-covariance matrix of Φ_nt should be bounded. We show below that this is not a consequence of our assumptions so far.
To see this, let us resort again to the simple case in which q = 1 and the common components are MA(1):

χ_it = u_t + c_i u_{t−1},   x_it = χ_it + ξ_it.

Considering 2-dimensional vectors χ^k_t formed with χ_{k−1,t} and χ_{k,t}, we see from (4.6) that the entries of A^k(L) contain the factor 1/(c_k − c_{k−1}). Assumption A.3 implies that c_k − c_{k−1} ≠ 0 for all k (and all possible groupings), but no more. In particular, it does not imply that |c_k − c_{k−1}| ≥ d for some d > 0 and all k.
As a consequence, the variance of the components of Φ_t = A(L)ξ_t is not necessarily bounded, as it should be if Φ_t were idiosyncratic. We therefore rescale the variables Z_it. Define: (i) w_i as the standard deviation of Z_it; (ii) V as the ∞ × ∞ diagonal matrix with w_i^{−1} in entry (i, i); (iii) z_t = VZ_t, r = VR and φ_t = VΦ_t.
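The need for the rescaling can be made concrete in the MA(1) example above; the sketch below (illustrative values) computes var(Φ_it) for a block with nearly equal c's, where the factor 1/(c_k − c_{k−1}) blows up.

```python
import numpy as np

def phi_variance(c1, c2, var_xi=1.0):
    """var(Phi_it) for one 2 x 2 block of the q = 1, MA(1) example,
    with unit-variance, mutually orthogonal white-noise idiosyncratics."""
    d = 1.0 / (c2 - c1)                   # a_i = 1, b_i = c_i in (4.5)-(4.6)
    A1 = d * np.array([[c1 * c2, -c1 ** 2],
                       [c2 ** 2, -c1 * c2]])
    # Phi_t = xi_t - A1 xi_{t-1}  =>  var(Phi_it) = var_xi * (1 + ||row_i(A1)||^2)
    return var_xi * (1.0 + (A1 ** 2).sum(axis=1))

print(phi_variance(0.50, 0.51))           # huge: consecutive c's nearly equal
print(phi_variance(0.20, 0.80))           # moderate
```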

Equation (4.17) becomes

z_t = r u_t + φ_t.   (4.18)

Adding the following assumption is sufficient, though not necessary, to prove that φ_t fulfills statement (i) of Theorem B.

Assumption A.4 There exists a real number B such that |a^k_j(e^{−iθ})| w^{−1}_{(k−1)(q+1)+j} ≤ B for all k ∈ N, j = 1, 2, . . . , q + 1, and θ ∈ [−π, π], where a^k_j(e^{−iθ}) denotes the j-th row of A^k(e^{−iθ}) and | · | the Euclidean norm.
Proposition 2 Let Γ^φ_n denote the variance-covariance matrix of φ_nt and µ^φ_n1 its first eigenvalue. Under Assumptions A.1, A.2, A.3 and A.4, there exists a real number M such that µ^φ_n1 ≤ M for all n.
Proof. It is convenient here to assume, without loss of generality, that the number n of variables increases by blocks of size q + 1. Thus n = m(q + 1), where m is the number of blocks. Let b be a 1 × n vector with |b| = 1. The notation b = (b^1 b^2 · · · b^m), where b^k is 1 × (q + 1), and φ_nt = (φ^{1′}_t φ^{2′}_t · · · φ^{m′}_t)′ is used in an obvious way. We denote by Σ^{ξk}(θ) the spectral density matrix of ξ^k_t, by a^k_j(e^{−iθ}) the j-th row of A^k(e^{−iθ}), for j = 1, 2, . . . , q + 1, and by V^k the k-th diagonal block of V. Let c = (c^1 c^2 · · · c^m), and suppose that c^j = 0 if j ≠ k. Then cΣ^ξ_n(θ)c̃ = c^kΣ^{ξk}(θ)c̃^k. As a consequence, if d is 1 × (q + 1), then dΣ^{ξk}(θ)d̃ ≤ λ^ξ_n1(θ)|d|² ≤ Λ|d|², Λ being the bound in Theorem A(i). Since φ_nt = V_nA_n(L)ξ_nt, with A_n(L) block-diagonal,

bΓ^φ_n b̃ = (1/2π) ∫_{−π}^{π} [bV_nA_n(e^{−iθ})] Σ^ξ_n(θ) [bV_nA_n(e^{−iθ})]~ dθ ≤ Λ (1/2π) ∫_{−π}^{π} Σ_{k=1}^m |b^kV^kA^k(e^{−iθ})|² dθ ≤ Λ(q + 1)B² Σ_{k=1}^m |b^k|² = Λ(q + 1)B²,

by Assumption A.4, which implies that µ^φ_n1 = max_{|b|=1} bΓ^φ_n b̃ is bounded. Q.E.D.
Let us now consider statements (ii) and (iii) of Theorem B. The definition of φ_t and statement (iii) of Theorem A imply that φ_t and η_t = ru_t fulfill statement (iii). As regards statement (ii), let again q = 1 and

χ_it = c_i0 u_t + c_i1 u_{t−1},

so that Z_it = c_i0 u_t + Φ_it and w_i² = c_i0² + var(Φ_it). The corresponding representation (4.18) is

z_it = (c_i0/w_i) u_t + φ_it.

We have

µ^η_n1 = Σ_{i=1}^n c_i0²/w_i² = Σ_{i=1}^n [1 + var(Φ_it)/c_i0²]^{−1}.

We see that divergence of λ^χ_n1(θ) almost everywhere in [−π, π] does not imply divergence of µ^η_n1. However, convergence of µ^η_n1 occurs only if var(Φ_it)/c_i0² diverges. Sufficient conditions for this are (1) var(Φ_it) → ∞ and c_i0² bounded away from zero, or (2) var(Φ_it) bounded away from zero and c_i0² → 0. Regarding (1), though we do not assume that var(Φ_it) is bounded, divergence of var(Φ_it) requires a very special sequence of coefficients (c_i0, c_i1). Regarding (2), even if we do not assume a positive lower bound for c_i0, convergence to zero of c_i0² can be ruled out as very special. Even more far-fetched are the cases in which the ratio var(Φ_it)/c_i0² diverges though neither (1) nor (2) holds, with both var(Φ_it) and c_i0² tending to zero at different rates. Extending these considerations to q > 1 and more complex models for χ_t does not seem worthwhile. We believe that the analysis of the simple example above is sufficient to motivate the following assumption on the q-th eigenvalue of the variance-covariance matrix of z_t.

Assumption A.5 The q-th eigenvalue µ^z_nq of the variance-covariance matrix of z_nt diverges as n → ∞.

Estimation
The construction leading from the x's to the z's has a natural counterpart in the estimation procedure developed in the companion paper Forni et al. (2014).
(I) We start with an estimate of Σ^x_n(θ), the spectral density of the observable variables x_it; call Σ̂^x_n(θ) such an estimate. (II) An estimate of the spectral density of the common components, call it Σ̂^χ_n(θ), is then obtained using the first q dynamic principal components of Σ̂^x_n(θ). (III) From Σ̂^χ_n(θ), mimicking the construction of Section 4, we obtain estimates of the matrices A^k(L) and R^k, hence of the variables z_it and of the common shocks u_t. The static approach, on the other hand, estimates the parameters λ_if and the matrices D(L) and R. It is easily seen that an a priori assessment of the relative merits of the two methods is impossible, the situation being much more complicated than the problem we face when deciding which ARMA specification should be chosen for a medium-size stochastic vector.
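Step (II) can be sketched as follows; spectral_common is a hypothetical helper that keeps, frequency by frequency, the first q eigenvalues and eigenvectors of the estimated spectral density. This is a schematic version, not the companion paper's exact estimator.

```python
import numpy as np

def spectral_common(S_x, q):
    """S_x: (n_freq, n, n) array of estimated spectra Sigma^x_n(theta_h).
    Returns the rank-q 'common' spectra Sigma^chi_n(theta_h)."""
    S_chi = np.empty_like(S_x)
    for h in range(S_x.shape[0]):
        val, vec = np.linalg.eigh(S_x[h])        # Hermitian eigendecomposition
        P = vec[:, -q:]                          # first q dynamic eigenvectors
        S_chi[h] = P @ np.diag(val[-q:]) @ P.conj().T
    return S_chi
```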
A simple illustration of the difficulty can be obtained by considering example (1.5) again. In this case the dynamic approach seems definitely superior: even though a good approximation can be obtained using the static approach, we may argue that there is no good reason to use a moving average of static factors when the data have been generated by the autoregressive filters in (1.5). The empirical results in the companion paper suggest that the model proposed in the present paper is a competitive specification for dynamic factor models.

Conclusion
We have argued that assuming a finite-dimensional factor space strongly restricts the applicability of dynamic factor models, as even models as simple as (1.5) are ruled out. On the other hand, without that assumption, only two-sided estimators have been proposed in the literature so far.
The present paper provides a solution to this problem by means of a feasible autoregressive representation of the high-dimensional common-component vector χ χ χ nt . The key result is that if a stochastic vector χ χ χ nt has dimension n and rank q, where q is fixed whereas n is huge and growing, then, under some mild assumptions, for generic values of the parameters, an autoregressive representation for χ χ χ nt can be determined piecewise. We do not need a huge, unfeasible, n × n VAR, in which each y it is projected on all y jt−k , j = 1, 2, . . . , n. A sequence of small (q + 1) × (q + 1) VAR's is sufficient.
Using the autoregressive representation of χ χ χ nt , we transform the original variables x it into variables z it that are governed by a static factor model. All the steps of our construction have a natural counterpart in an estimation procedure.
Lemma A.1 Suppose that v_t is q-dimensional orthonormal white noise and let

y_it = γ_i1(L) v_1t + γ_i2(L) v_2t + · · · + γ_iq(L) v_qt   (A.1)

for i = 1, 2, . . . , n, where the filters γ_if(L) are square-summable. In compact form, y_t = Γ(L)v_t, where Γ(L) is n × q. For R ≥ 1 consider the nR-dimensional stack Y_t = (y′_t y′_{t−1} · · · y′_{t−R+1})′ and the 1 × q filter

β(L) = β_1(L)γ_1(L) + β_2(L)γ_2(L) + · · · + β_n(L)γ_n(L),

where γ_i(L) denotes the i-th row of Γ(L) and the β_i(L) are scalar polynomials. Linear independence of the entries of Y_t requires that β(L) = 0 implies either that β_i(L) = 0 for all i or that the degree of β_i(L) is greater than R − 1 for some i.
Proof. Consider the stack Y_{t−1}, whose entries are linear combinations of v_{t−1}, v_{t−2}, . . ., with coefficients collected in a matrix P_R. The matrix P_R is (q + 1)R × q(R + r). Setting R = rq, P_R is square. By assumption, the entries of Y_{t−1} are linearly independent. Thus the matrix P_R is non-singular, so that v_{t−1}, . . . , v_{t−r} can be recovered as linear combinations of the entries of Y_{t−1}. Substituting v_{t−1}, . . . , v_{t−r} into (A.5), we get (A.6). Q.E.D.
We have that the remaining terms in the resultant of ν_k(L) and δ(L) contain powers c^{S−h}_{q+1,1,s_1}, with 0 < h ≤ S. Thus the three-term product within square brackets on the right-hand side of (A.12) is the coefficient of c^S_{q+1,1,s_1} in the representation of the resultant as a polynomial in c_{q+1,1,s_1}. As each of the three terms is generically non-zero, the coefficient is generically non-zero, so that the resultant is generically non-zero.
The results above on τ_k(L) imply that generically the degree of β_{q+1}(L) and β_k(L) is at least S. Q.E.D.
We can now proceed with the proof of Proposition 1. Rewrite (A.7) as in (A.13)-(A.14). The polynomials β_j(L)h_j(L) have degrees not greater than s_1q + s_2q(q − 1) − 1 + s_2q = s_1q + s_2q² − 1. By Lemma A.2, G(L)v_t generically has an autoregressive representation of degree s_1q + s_2q(q − 1), so that, by (A.13)-(A.14), y_t generically has an autoregressive representation

y_t = K_1 y_{t−1} + K_2 y_{t−2} + · · · + K_S y_{t−S} + E(0) v_t   (A.17)

of degree S = s_1q + s_2q². Moreover, Lemma A.3 proves that generically the components of the stack (y′_{t−1} y′_{t−2} · · · y′_{t−S})′ are linearly independent. The uniqueness part of the proposition follows. Q.E.D.