Exact and heuristic approaches for the index tracking problem with UCITS constraints

Index tracking aims at determining an optimal portfolio that replicates the performance of an index or benchmark by investing in a smaller number of constituents or assets. The tracking portfolio should be cheap to maintain and update, i.e., invest in a smaller number of constituents than the index, have low turnover and low transaction costs, and should avoid large positions in few assets, as required by the European Union Directive UCITS (Undertakings for Collective Investment in Transferable Securities) rules. The UCITS rules make the problem hard to model satisfactorily and to solve to optimality: so far, only heuristics and no exact methods have been proposed. The aim of this paper is twofold. First, we present the first Mixed Integer Quadratic Programming (MIQP) formulation for the constrained index tracking problem with UCITS rules compliance. This allows us to obtain exact solutions for small- and medium-size problems based on real-world datasets. Second, we compare these solutions with those provided by the state-of-the-art heuristic Differential Evolution and Combinatorial Search for Index Tracking (DECS-IT), obtaining information about the heuristic's performance and its reliability on large-size problems that cannot be solved with the exact approach. Empirical results show that DECS-IT is indeed appropriate to tackle the index tracking problem in such cases. Furthermore, we propose a method that combines the good characteristics of the exact and of the heuristic approaches.

Keywords Index tracking · Mixed integer quadratic programming · Stochastic search heuristics · Differential evolution · Cardinality constraints

Introduction
In the last decade, passive management products have attracted increasing attention from both small and large investors. The recent crises and the disbelief in the persistence of asset managers' alpha have further strengthened investors' desire to invest in cheap and transparent financial products, such as Exchange-Traded Funds (ETFs). The idea of implementing quantitative approaches for index tracking (IT) or benchmark replication, not new in the financial literature, has thus become even more appealing, given that it makes it possible to build simple, transparent, and low-cost passive investment products. In fact, IT aims at replicating a given index or benchmark by selecting a subset of assets, or constituents, together with their weights so as to best track the performance of the index. Considering only a smaller number of constituents reduces the administrative, monitoring, and transaction costs and avoids holding small and illiquid positions.
The basic IT problem can be formulated as a constrained optimization problem: minimize a given distance between the index and the tracking portfolio using at most K assets out of the n available. From a complexity viewpoint, even this basic version of cardinality constrained IT with a quadratic tracking error measure was shown to be NP-Hard in Ruiz-Torrubiano and Suárez (2009) via a reduction from the Subset Sum problem. Furthermore, choosing only a small number of constituents could result in ineffective passive management and in holding excessively large amounts of few assets. Thus, in order to take into account practical limitations on the portfolio composition, other constraints should be considered. For instance, to guarantee a cheap rebalancing strategy, a turnover constraint is imposed in order to avoid large changes in asset weights over time. It is also reasonable to require that the weights of the assets included in the tracking portfolio be larger than a given lower bound, to reduce monitoring costs due to small and illiquid positions, and smaller than a given upper bound, to avoid holding excessively large positions. The latter requirement is also stated in official regulations, such as the European Union Directive UCITS (Undertakings for Collective Investment in Transferable Securities) rules, which require, among other things, that the sum of all asset weights exceeding 5 % must be smaller than 40 %.
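The concentration rule quoted above can be checked mechanically for a candidate weight vector. The following is a minimal sketch: the function name `satisfies_ucits` is hypothetical, the default thresholds are the 5 % and 40 % figures from the text, and treating the 40 % limit as a non-strict inequality (with a small numerical tolerance) is an assumption made here for illustration.

```python
def satisfies_ucits(weights, large_threshold=0.05, max_large_sum=0.40, tol=1e-12):
    """Return True if the weights above `large_threshold` sum to at most `max_large_sum`.

    Hypothetical helper for illustration only; the tolerance guards against
    floating-point noise and the non-strict limit is an assumption.
    """
    large_sum = sum(w for w in weights if w > large_threshold + tol)
    return large_sum <= max_large_sum + tol


# Ten positions of 10 % each: every weight exceeds 5 %, so the "large" sum is 100 %.
concentrated = [0.10] * 10
# Four positions of 9 % plus sixteen of 4 %: only 36 % sits in "large" positions.
diversified = [0.09] * 4 + [0.04] * 16
print(satisfies_ucits(concentrated), satisfies_ucits(diversified))
```

The rule does not cap individual weights by itself; it only limits how much capital may sit in positions above the 5 % threshold taken together.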
In this work, we fill a gap in the literature by proposing an exact approach to the IT problem with the hard real-world constraints previously described. More precisely, we propose the first Mixed Integer Quadratic Programming (MIQP) formulation for the index tracking problem subject to the cardinality constraint, the turnover constraint, the buy-in thresholds, and to the additional constraint imposed by the UCITS rules compliance. We note that UCITS rules are often binding for many portfolio managers in the EU, and should be therefore taken into account in real-world practice. However, to our knowledge, so far they have been modeled in the literature as general nonlinear constraints, and have been dealt with only by means of heuristic approaches (see Derigs and Nickel 2003;Derigs and Nickel 2004;Krink et al. 2009). This implies that the exact solution for the IT problem with UCITS constraints has not yet been obtained, even for markets with few assets, before this study. In fact, using our new MIQP formulation we can solve the index tracking problem for small and medium size markets with up to 225 assets by using a standard solver (CPLEX). In particular, we apply our method to financial data from the Dow Jones 65 (65 assets), the Dax 100 (98 assets), the S&P 100 (98 assets), and the Nikkei 225 (225 assets). For each dataset, we adopt a rolling window scheme to evaluate the in-sample and the out-of-sample tracking performance of the portfolios obtained. We then compare the results with those obtained with the Differential Evolution and Combinatorial Search for Index Tracking (DECS-IT) developed in Krink et al. (2009). We confirm that DECS-IT is indeed an efficient and accurate heuristic approach that can be fruitfully used to tackle IT problems, and is one of the few tools available for large-size problems. 
Furthermore, we observe that for large-size multi-period problems one could devise a hybrid approach that uses the heuristic method to provide a good solution for the first period in a reasonable time and the exact solver to efficiently find optimal solutions for all subsequent periods.
The index tracking problem has received considerable attention in the literature. Comprehensive reviews can be found in Beasley et al. (2009) and, more recently, in Canagkoz and Beasley (2008). In the last decade, index tracking models have become increasingly complex due to the inclusion of new application-oriented constraints in the effort to better adapt the models to real-world practice. Much debate has also focused on establishing which tracking error measure is more appropriate (see, e.g., Beasley et al. 2009; Canagkoz and Beasley 2008; Coleman et al. 2006; Lobo et al. 2007).
Currently, there is no generally accepted mathematical model for IT (see Canagkoz and Beasley 2008). For instance, in Canagkoz and Beasley (2008) the authors propose a mixed integer linear programming formulation for the IT problem with transaction costs and a cardinality constraint, and use a standard solver (CPLEX) for providing optimal tracking portfolios. Numerical examples are presented for datasets taken from Beasley's OR Library involving up to 2151 stocks. In Okay and Akman (2003) a constraint aggregation method for the mathematical formulation introduced in Beasley et al. (2009), Canagkoz and Beasley (2008) is presented, thus obtaining a mixed integer non linear programming problem that is solved for the Hang Seng 31 stock market. Fang and Wang (2005) formulate the Index Tracking problem as a bi-objective programming problem with the aim of optimizing the excess return and the absolute downside deviation from the index return. A fuzzy approach leading to the solution of a linear programming problem is presented and applied to the Shanghai 180 index with 30 stocks.
Other authors consider hybrid approaches for tackling IT problems (see, e.g., Bianchi and Gargano 2011;Maringer and Kellerer 2003). Jansen and van Dijk (2002) introduce an objective function that takes into account both the tracking error and the (integer) number of assets to be included in the portfolio. Once the number of stocks has been decided, the amount of total budget invested in each asset is found by solving a quadratic programming problem. Some experiments have been reported for financial datasets of up to 250 stocks. A similar strategy is adopted in Ruiz-Torrubiano and Suárez (2009), where the authors combine an evolutionary algorithm with quadratic programming for solving IT problems with respect to some financial datasets from the OR Library (Beasley et al. 2009).
Stochastic programming has also attracted interest in the literature. For instance, Yu et al. (2006) provide a downside risk approach for the index tracking problem formulated via a Markowitz model. Small examples based on subsets of five assets taken from the Hang Seng stock market are reported. Multistage stochastic programming is another recent field of research for IT problems (see Barro and Canestrelli 2004 and the references therein).
In Barro and Canestrelli (2004) the authors formulate and solve a multistage tracking error model in a stochastic programming framework, and test their model by dynamically replicating the MSCI Euro index using an increasing number of scenarios, although with very few assets (up to 9). A two-stage stochastic program is also presented in Stoyan and Kwon (2007) and applied to track a Canadian S&P/TSX index composed by a universe of 1150 stocks.
In general, the exact solution of the IT problem through mathematical programming formulations and methods poses serious challenges in terms of the computational time required, especially for large-size problems. As a consequence, several heuristic procedures have been proposed in the literature. In particular, much attention has been devoted to stochastic search heuristics. These procedures are usually more flexible in dealing with the increased complexity of the models due to the introduction of new real-world constraints. For example, Beasley et al. (2009) propose to use Genetic Algorithms (see also Holland 1975); Gilli and Kellezi (2002), Dueck and Scheuer (1990) and Dueck and Winker (1992) rely on a Threshold Accepting heuristic procedure and, more recently, Krink et al. (2009), Maringer and Oyewumi (2007), Fastrich et al. (in press), and Storn and Price (1997) suggest using a Differential Evolution procedure for the index tracking problem with the cardinality constraint. In particular, in Krink et al. (2009) the authors test their algorithm on two well-known financial datasets, namely Dow Jones 65 and Nikkei 225. The reader is referred to Gilli and Winker (2009) and Gilli and Schumann (2012) for a recent and comprehensive description and discussion of several well-known heuristic techniques and their use in tackling actual financial problems.
The paper is organized as follows. Section 2 is the methodological core of this work. Here, we state the IT optimization problem and we provide its first reformulation as a mixed integer quadratic programming problem. Section 3 briefly describes the state-of-the-art stochastic search heuristic we use for comparison with our exact approach, and Sect. 4 reports the main empirical findings on real-world datasets and presents a hybrid approach that combines the exact and the heuristic methods.

Problem formulation
We set up the IT problem as a minimization problem with the tracking error volatility as objective function. Formally, we minimize the objective (1) subject to constraints (2a)-(2f), where $n$ is the number of available assets composing the index; $w$ is the $1 \times n$ vector whose entries $w_i$ are the fractions of a given capital invested in asset $i$; $B_t$ is the index or benchmark value at time $t$, with $t = 1, \dots, T$; and, for each $t = 1, \dots, T$, a $1 \times n$ vector collects the log-returns of the $n$ assets at time $t$.
Constraints (2a) and (2b) are the so-called budget and no-short-selling constraints, while (2c) and (2d) limit the quantity and the number of assets to be included in the tracking portfolio, where $\varepsilon_i, \xi_i \in [0, 1]$, with $\varepsilon_i < \xi_i$ for $i = 1, \dots, n$, are the lower and upper bounds for each individual asset weight $w_i$, and $L$ and $K$ are the lower and upper bounds on the number of assets. These two constraints are the well-known Buy-in Threshold and Cardinality constraints, respectively. Constraint (2e), which compares each $w_i$ with the fraction of capital invested in asset $i$ in the previous time period, is the so-called Turnover constraint. It sets an upper bound $TO \in (0, 1)$ on the total change in the portfolio composition, expressed in asset weights, between two consecutive time periods, and thus makes it possible to control transaction costs when updating the tracking portfolio over time. Finally, constraint (2f) models the concentration limits of the UCITS rules, where $Lb$ is the threshold above which an asset weight is considered "very large", and $Ub$ is the maximal allowed value of the sum of the "very large" asset weights (Krink et al. 2009).
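For orientation, the constraint set can be written out as it reads from the verbal description above. This is a sketch under assumptions rather than a transcription of the original display: the symbol $\bar{w}_i$ for the previous-period weight of asset $i$ is introduced here for illustration, and the exact form of each inequality (for instance, strict versus non-strict) is assumed.

```latex
\begin{align*}
& \sum_{i=1}^{n} w_i = 1 && \text{(budget, cf. (2a))} \\
& w_i \ge 0, \quad i = 1,\dots,n && \text{(no short selling, cf. (2b))} \\
& w_i = 0 \ \text{ or } \ \varepsilon_i \le w_i \le \xi_i, \quad i = 1,\dots,n && \text{(buy-in thresholds, cf. (2c))} \\
& L \le \bigl|\{\, i : w_i > 0 \,\}\bigr| \le K && \text{(cardinality, cf. (2d))} \\
& \sum_{i=1}^{n} \lvert w_i - \bar{w}_i \rvert \le TO && \text{(turnover, cf. (2e))} \\
& \sum_{i \,:\, w_i > Lb} w_i \le Ub && \text{(UCITS concentration, cf. (2f))}
\end{align*}
```

Written this way, the sources of difficulty are visible: the disjunction in the buy-in condition, the counting set in the cardinality condition, the absolute values in the turnover condition, and the weight-dependent index set in the concentration condition.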
The above optimization problem is non linear and non convex due to the presence of constraints (2c)-(2f). Such problems are typically quite difficult to solve, particularly when the problem size is large. However, we now show that the index tracking problem can be reformulated as a more structured problem for which exact solution methods are available.

Problem reformulation
We present here the first Mixed Integer Quadratic Programming (MIQP) formulation for the Index Tracking problem subject to real-world constraints including the UCITS rule described by constraint (2f).
First, note that constraints (2c) and (2d) are commonly modeled (see, e.g., Jobst et al. 2001; Speranza 1996) by using binary variables $y_i \in \{0, 1\}$, with $i = 1, \dots, n$, so that they can be rewritten as the linear constraints (3) and (4). The sum of absolute values in the Turnover constraint (2e) can be linearized as in Konno and Yamazaki (1991) by adding new variables $\gamma_i \ge 0$ that bound the absolute change in the weight of asset $i$ with respect to the previous period, for $i = 1, \dots, n$, so that constraint (2e) becomes equivalent to the system of linear constraints (5) and (6).
We now describe the first mixed integer linear reformulation of the UCITS constraint. To this end, we need to introduce new variables $v_i \in \{0, 1\}$ and $u_i \ge 0$, for $i = 1, \dots, n$, satisfying conditions (7) and (8). We can then replace the nonlinear UCITS constraint (2f) with the linear constraints (9)-(13). The proposed reformulation is justified by the following new result.
Thus, the claimed equivalence holds in all cases.
Due to the monotonicity of the square root, the objective function (1) can be squared without changing the set of global minimizers of the IT problem, so that (1) can be equivalently rewritten as the quadratic function (14). Thus, we can reformulate the Index Tracking problem as a Mixed Integer Quadratic Programming problem that minimizes function (14) subject to the linear constraints (2a), (2b), (3), (4), (5), (6), and (9)-(13). The resulting model has 3n continuous variables and 2n binary variables. Obviously, the theoretical complexity status of the problem does not change, as the MIQP problem still falls into the class of NP-Hard problems. However, it can now be tackled with standard efficient optimization solvers such as CPLEX, which finds exact solutions in a reasonable amount of time for small- and medium-size problems, as reported in Sect. 4.
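The linearizations discussed in this section can be sketched as follows. The first two groups are the usual binary-variable and auxiliary-variable constructions for buy-in/cardinality and turnover. The third group is only one possible big-M-style linearization of the concentration limit using $v_i$ and $u_i$; it is an assumption made for illustration and is not claimed to coincide with constraints (9)-(13). The symbol $\bar{w}_i$ for the previous-period weight is likewise introduced here, and $Lb \le \xi_i$ is assumed.

```latex
\begin{align*}
& \varepsilon_i y_i \le w_i \le \xi_i y_i, \qquad L \le \sum_{i=1}^{n} y_i \le K, \qquad y_i \in \{0,1\} \\
& -\gamma_i \le w_i - \bar{w}_i \le \gamma_i, \qquad \sum_{i=1}^{n} \gamma_i \le TO, \qquad \gamma_i \ge 0 \\
& w_i \le Lb + (\xi_i - Lb)\, v_i, \qquad u_i \ge w_i - Lb\,(1 - v_i), \qquad \sum_{i=1}^{n} u_i \le Ub, \qquad v_i \in \{0,1\},\ u_i \ge 0
\end{align*}
```

In this sketch, $w_i > Lb$ forces $v_i = 1$ and hence $u_i \ge w_i$, so the sum of the $u_i$ bounds the total weight of the large positions from above. The variable count (continuous $w$, $\gamma$, $u$ and binary $y$, $v$) matches the 3n continuous and 2n binary variables stated in the text.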

A heuristic approach
Stochastic search heuristics work by iteratively refining candidate solutions to an optimization problem using mathematical operators, usually inspired by natural and biological processes. They work by promoting the survival of the fittest candidates, the best of which is taken as the solution at the end of the run. Among them, Simulated Annealing (SA) (Kirkpatrick et al. 1983) and Threshold Accepting (Dueck and Scheuer 1990) work by refining a single solution until convergence, while Genetic Algorithms (GA) (Fogel et al. 1966 and Holland 1975), Particle Swarm Optimization (PSO) (Kennedy and Eberhart 1995) and Differential Evolution (DE) (Storn and Price 1997) work by simultaneously evolving a population of candidate solutions. One of their main advantages is their capability of tackling an optimization problem as it is, however complex, without requiring any rigid mathematical assumption, such as linearity and continuity, or any additional information, such as the gradient or the Hessian matrix. They have often been criticized because of their long runtime, the need for extensive parameter tuning and the lack of a theoretical guarantee of convergence to the global optimum. However, the current progress in hardware development and the possibility of running in parallel have reduced their runtime by orders of magnitude, decreasing the gap with alternative methods. Moreover, although often considered a drawback, the possibility of using different parameter settings can also provide significant advantages, given that it often allows better control of the exploitation-exploration trade-off, avoiding premature convergence to local optima. Furthermore, it has been shown that some heuristics, such as Differential Evolution, are rather insensitive to parameter tuning. They are currently becoming standard tools in many application fields, including finance. The reader is referred to Gilli and Winker (2009) and Gilli et al. (2011) for a more comprehensive introduction and discussion of their usage in financial and economic applications.
Index tracking is no exception. As Beasley et al. (2009) and Canagkoz and Beasley (2008) report in their literature review sections, many studies have shown that different heuristics can effectively tackle this problem: for example, among the many contributions, Beasley et al. (2009) propose to use Genetic Algorithms, Gilli and Kellezi (2002) propose to use Threshold Accepting, and Maringer and Oyewumi (2007) rely on Differential Evolution for index tracking. More recently, in Krink et al. (2009) the authors have provided a new heuristic, Differential Evolution and Combinatorial Search for Index Tracking (DECS-IT), which combines Differential Evolution and an ad hoc combinatorial search operator. DECS-IT is a population-based heuristic based on Differential Evolution, originally introduced by Storn and Price (1997), which requires little parameter tuning and has shown better performance in continuous numerical problems and less sensitivity to parameter tuning when compared to other heuristics, such as genetic algorithms and particle swarm optimization (see, e.g., Paterlini and Krink 2006 for clustering problems and Krink et al. 2009 for IT). One of the main challenges in index tracking is given by the presence of the cardinality constraint in a large search space. This requires selecting the optimal subset of assets (a combinatorial problem) and then fine-tuning the optimal asset weights (a continuous problem). In Krink et al. (2009) a position-swapping operator is introduced, which helps DECS-IT avoid premature convergence and better explore different asset subsets by swapping active weights with zero weights using a probabilistic scheme. After generating and evaluating an initial population, different evolutionary operators are iteratively applied until a termination condition is satisfied.
First, the population undergoes mutation and recombination, according to the 'Rand/1/Exp' scheme (Storn and Price 1997). Second, solutions are scaled to sum to one to satisfy the budget constraint. Third, unfeasible solutions are repaired (see Krink et al. 2009, Sect. 3.3, for a description of the constraint handling mechanism). Fourth, the position swapping is applied. Finally, solutions in the current population are evaluated and if they are better than the ones in the previous iteration, they survive to the next iteration. The reader is referred to Krink et al. (2009) for a detailed description of DECS-IT, extensive investigation on parameter tuning, and empirical results on the IT problem with real-world data.
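The sequence of steps just described (mutation and recombination, rescaling, repair, position swapping, selection) can be illustrated with a small self-contained sketch. It is a simplified stand-in and not the DECS-IT implementation of Krink et al. (2009): the repair rule (clip negatives, keep the k largest weights), the swap operator, the root-mean-square tracking error, and all function names are assumptions made for illustration, and only the cardinality and budget constraints are handled. The defaults for the scaling factor (0.3) and crossover rate (0.8) are the values reported later in the text; the population size and iteration count are deliberately small.

```python
import math
import random


def tracking_error(w, asset_returns, index_returns):
    """Root mean squared difference between portfolio and index returns."""
    sq = 0.0
    for r_t, b_t in zip(asset_returns, index_returns):
        sq += (sum(wi * ri for wi, ri in zip(w, r_t)) - b_t) ** 2
    return math.sqrt(sq / len(index_returns))


def repair(w, k):
    """Simplified repair: clip negatives, keep the k largest weights, rescale to sum to one."""
    w = [max(x, 0.0) for x in w]
    keep = set(sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:k])
    w = [x if i in keep else 0.0 for i, x in enumerate(w)]
    total = sum(w)
    if total <= 0.0:  # degenerate candidate: equal weights on the kept assets
        return [1.0 / len(keep) if i in keep else 0.0 for i in range(len(w))]
    return [x / total for x in w]


def position_swap(w, rng, prob):
    """With probability `prob`, move one active weight onto a currently unused asset."""
    active = [i for i, x in enumerate(w) if x > 0.0]
    idle = [i for i, x in enumerate(w) if x == 0.0]
    if active and idle and rng.random() < prob:
        i, j = rng.choice(active), rng.choice(idle)
        w = list(w)
        w[i], w[j] = w[j], w[i]
    return w


def de_sketch(asset_returns, index_returns, k, pop_size=20, iters=200,
              f=0.3, cr=0.8, swap_prob=0.1, seed=0):
    """Toy differential evolution loop; requires pop_size >= 4."""
    rng = random.Random(seed)
    n = len(asset_returns[0])

    def fit(w):
        return tracking_error(w, asset_returns, index_returns)

    pop = [repair([rng.random() for _ in range(n)], k) for _ in range(pop_size)]
    scores = [fit(w) for w in pop]
    for _ in range(iters):
        for p in range(pop_size):
            a, b, c = rng.sample([q for q in range(pop_size) if q != p], 3)
            trial = list(pop[p])
            # rand/1 mutation with exponential crossover: overwrite a contiguous
            # (circular) run of components, starting at a random index.
            j, copied = rng.randrange(n), 0
            while True:
                trial[j] = pop[a][j] + f * (pop[b][j] - pop[c][j])
                j, copied = (j + 1) % n, copied + 1
                if copied >= n or rng.random() >= cr:
                    break
            trial = position_swap(repair(trial, k), rng, swap_prob)
            score = fit(trial)
            if score <= scores[p]:  # greedy selection between parent and trial
                pop[p], scores[p] = trial, score
    best = min(range(pop_size), key=lambda q: scores[q])
    return pop[best], scores[best]
```

Because the repair step always rescales to a unit sum and the swap only exchanges an active weight with a zero one, every candidate in the population satisfies the budget and cardinality limits by construction, so selection compares feasible portfolios only.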

The data
The empirical analysis has been performed on the daily log-returns of four equity indexes and their constituents (see Fig. 1).
DECS-IT has been implemented in MATLAB, initializing the population randomly at the first iteration and setting the population size to 200, the number of iterations to 10000, the scaling factor to 0.3 and the crossover rate to 0.8. We note, however, that the DECS-IT performance, as shown in Appendix A of Krink et al. (2009), is rather insensitive to parameter tuning. The largest cost in terms of runtime depends on the number of function evaluations, which corresponds to the product of the population size and the number of iterations, and on the repairing of unfeasible solutions during constraint handling. On the other hand, increasing the problem size while keeping the same number of iterations does not require much longer runtime. The reader is referred to Appendix A in Krink et al. (2009) for an extensive discussion regarding the parameter tuning.
The Mixed Integer Quadratic Programming problem has been solved using the toolbox TOMLAB/CPLEX 11.2 in a MATLAB 7.11.0 environment on a workstation with Intel Core2 Duo CPU (T7500, 2.2 GHz, 4 GB RAM) under the Windows operating system (MIQP-CPLEX). We also set both the absolute and the relative tolerance on the gap between the best integer objective found and the objective of the best node remaining in the branch and bound tree to $10^{-8}$.

Heuristic versus exact method
Our analysis relies on two different sets of experiments. In the first set we aim at comparing DECS-IT and MIQP-CPLEX by considering single-period optimization, K = 20 and K = 40, and the different problem sizes of the four datasets (n = 65 for DJ65, n = 98 for Dax100 and S&P100, and n = 225 for the Nikkei225 market). In the second set of experiments we use a rolling window mechanism: by using a window size of 200 observations, we solve the IT problem for overlapping windows built by moving forward in time with step size 20 for 12 time periods. This corresponds to monthly rebalancing. We then compare the objective function values of the portfolios obtained by DECS-IT and by MIQP-CPLEX in each time period. Furthermore, we propose a hybrid method using both approaches that works best for the multi-period case.
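The rolling window mechanism can be made concrete with a short index-generating sketch. The window size (200), step (20) and number of periods (12) are taken from the text; the function name `rolling_windows` is hypothetical, and pairing each in-sample window with an out-of-sample block of the same length as the step is an assumption made here for illustration.

```python
def rolling_windows(n_obs, window=200, step=20, periods=12):
    """Yield (in_sample, out_of_sample) index ranges for each rebalancing period."""
    for p in range(periods):
        start = p * step
        end = start + window
        if end + step > n_obs:
            raise ValueError("not enough observations for the requested periods")
        yield range(start, end), range(end, end + step)


# With the default settings, 200 + 11 * 20 + 20 = 440 observations are needed.
for in_sample, out_of_sample in rolling_windows(440):
    print(in_sample, out_of_sample)
```

Consecutive in-sample windows overlap in 180 of their 200 observations, which is one reason why the previous period's portfolio is a natural reference point for the turnover constraint.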

Single-period results
The first set of experiments provides some insights regarding the ongoing debate on the use of heuristics versus exact methods in IT problems. Here, we solve the Index Tracking optimization problem on the first time window, thus excluding the turnover constraint. In other words, we determine the optimal solutions by only using the first 200 observations of our datasets. Since DECS-IT is a stochastic search heuristic, we run the algorithm 30 times, restarting each time from a randomly chosen initial population. By contrast, a single run of MIQP-CPLEX is performed.
To compare the performance of DECS-IT and MIQP-CPLEX, we consider the Percentage Relative Error (PRE) between the optimal solution found by MIQP-CPLEX (MIQPopt) and the best solutions found by DECS-IT (bestDECS$_j$, $j = 1, \dots, 30$) in the single-period optimization problem. The Percentage Relative Error in the $j$-th run (PRE$_j$) is defined as:
$$\mathrm{PRE}_j = 100 \cdot \frac{\mathrm{bestDECS}_j - \mathrm{MIQPopt}}{\mathrm{MIQPopt}}.$$
Table 2 reports the minimum, mean, median, maximum, and standard deviation values of PRE for K = 20 and K = 40 for the four datasets. We notice a great variability in the maximum PRE, which ranges from 11.93 % to 91.23 %; on the other hand, the mean and especially the median values are small, pointing out that the heuristic procedure is able on average to obtain solutions very close to the optimal (or nearly optimal) ones found by CPLEX. Furthermore, the minimum values are very small, always less than 5.30 %, thus giving further evidence of the good performance of the DECS-IT heuristic. Actually, in one case (i.e., S&P100 for K = 20) the PRE is slightly negative, showing that the best solution found by the DECS-IT heuristic is better than the best solution found by CPLEX within two hours of computing time. It should be observed, however, that running the heuristic 30 times requires around 27000 seconds (i.e., about 900 seconds per run for 30 runs) in this case, which is much longer than the time allocated to CPLEX. We also observe that standard deviations are relatively large (more than 8 %) only in two of the most challenging cases, reaching at most the value of 18.73 %. The last two rows of Table 2 report the time spent for finding an optimal solution using MIQP-CPLEX and DECS-IT (average time in 30 runs), respectively. As expected, when the problem size n increases, the time required by MIQP-CPLEX to find an optimal solution increases considerably. Hence, we decided to stop the runs after two hours of computing (7200 seconds) for the S&P100 and the Nikkei225 datasets, thus obtaining a good (possibly optimal) solution but without a guarantee of optimality. On the other hand, increasing the problem size has a smaller effect on the DECS-IT average time. The good performance of DECS-IT is also confirmed in Table 3, where we report some statistics about the (absolute) differences between the weights of the portfolios found by CPLEX and the best portfolios found by DECS-IT. The maximum, mean and standard deviation values in the first three rows of the table account for this comparison. The minimum of such differences is always equal to zero. The last row also reports the sum of the absolute differences of the portfolio weights. Table 3 points out that the gaps found in the objective function values are indeed mostly due to small differences in the asset weight composition.
Table 3 Descriptive statistics of the differences and the sum of the absolute differences in the assets weights composition between the portfolios found by MIQP-CPLEX and the best portfolios found by DECS-IT
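The summary statistics of the kind reported for the PRE can be computed as in the following sketch. It assumes that the PRE of a run is the percentage difference between the heuristic objective value and the exact objective value, relative to the exact value, which is consistent with a negative PRE meaning that the heuristic found a better solution. The function names and the objective values in the usage example are hypothetical and are not results from the paper; using the sample standard deviation is also an assumption.

```python
import statistics


def percentage_relative_errors(heuristic_values, exact_value):
    """PRE of each heuristic run relative to the exact (or best known) objective value."""
    return [100.0 * (h - exact_value) / exact_value for h in heuristic_values]


def summarize(values):
    """Minimum, mean, median, maximum and sample standard deviation of a list."""
    return {
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
        "std": statistics.stdev(values),
    }


# Hypothetical objective values for five heuristic runs and one exact solve.
runs = [0.0102, 0.0100, 0.0111, 0.0099, 0.0105]
print(summarize(percentage_relative_errors(runs, exact_value=0.0100)))
```

Reporting the median next to the mean is useful here because a single poor run can dominate the mean and the maximum while leaving the median almost unchanged.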

Multi-period results and a hybrid approach
In the second set of experiments, we compare the tracking errors of the portfolios found in all time windows by MIQP-CPLEX and by DECS-IT. Some caution is needed in order to provide a fair comparison between the values of the solutions obtained by MIQP-CPLEX and by DECS-IT in a given time window. Indeed, we recall that the turnover constraint depends on the fractions of the capital invested in each asset $i = 1, \dots, n$ in the previous window. Hence, such values determine the shape of the feasible region for all windows from 2 to 12. Recall that these previous-window weights are the solutions of the optimization problem in the previous window, and hence could be different if computed by MIQP-CPLEX or by DECS-IT. Thus, to correctly compare the quality of the solutions found by the two methods for all windows, the idea is to first run DECS-IT to find the solutions (portfolios) in all the 12 time periods, and then use the weights of the portfolios obtained in the corresponding turnover constraint when running MIQP-CPLEX. This guarantees that, in each window, the structure of the turnover constraint is always the same as the one considered in the heuristic approach, thus not affecting the feasible region. More precisely, in order to perform more experiments, among the 30 runs of the DECS-IT heuristic we choose the five sets of 12 solutions that give the best values of the objective function in the first window, and, as mentioned above, we use these solutions in the turnover constraints of the MIQP model to compute the optimal solution for the subsequent window with CPLEX.
In Table 4, for each dataset, for K equal to 20 or 40, and for each time period, we report the average Percentage Relative Errors computed with respect to the five sets of portfolios used, along with the cumulative times required by MIQP-CPLEX to solve all problems for windows 2-12. We observe that the average relative error is now considerably smaller and no more than 10 % in all cases. Furthermore, in this case CPLEX was able to solve to optimality all 11 problems in about one tenth of the time needed to solve to optimality (when possible) the single problem regarding the first window, which does not include the turnover constraints. These results suggest that the DECS-IT heuristic could be combined with the exact MIQP-CPLEX approach in a more efficient and accurate hybrid method: in the first window, the heuristic procedure provides nearly optimal solutions in a reasonable time. Such solutions are then used as reference points in the turnover constraint of the MIQP model, which is solved to optimality for the second window with CPLEX in a reasonable time even for medium/large problems. Then, for windows 3 to 12, the exact optimal solutions are again found with CPLEX, using as reference points in the turnover constraint the exact solutions of the previous window. This approach combines the good features of both methods for a fast and accurate solution of the IT problem.
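The control flow of the hybrid method described above can be sketched as follows. The two solver arguments are hypothetical callables standing in for DECS-IT and for the MIQP model solved with CPLEX; their names and signatures are assumptions, and the sketch only shows how the previous window's portfolio is passed on as the turnover reference.

```python
def hybrid_tracking(windows, heuristic_solve, exact_solve):
    """Heuristic on the first window, exact solver with turnover reference afterwards.

    `heuristic_solve(window)` and `exact_solve(window, previous_weights)` are
    hypothetical callables; each must return the portfolio weights for the window.
    """
    portfolios = []
    previous = None
    for idx, window in enumerate(windows):
        if idx == 0:
            # First window: no turnover constraint, so use the heuristic.
            weights = heuristic_solve(window)
        else:
            # Later windows: the previous portfolio anchors the turnover constraint.
            weights = exact_solve(window, previous)
        portfolios.append(weights)
        previous = weights
    return portfolios


# Toy stand-ins that only illustrate the call pattern.
result = hybrid_tracking(
    ["window 1", "window 2", "window 3"],
    heuristic_solve=lambda window: [0.5, 0.5],
    exact_solve=lambda window, previous: list(previous),
)
print(result)
```

Keeping the solvers as arguments separates the scheduling logic from the optimization itself, so either component can be replaced without touching the loop.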

Performance analysis
In this section, we aim to provide some insight regarding the in-sample and out-of-sample financial performance of the optimal portfolios obtained by solving the IT problem with MIQP-CPLEX and with DECS-IT. First, we solve the MIQP problem in each of the twelve 200-observation time windows (in-sample analysis), and we hold the optimal portfolio found for the following 20 periods (out-of-sample analysis). This is the same rolling window scheme used in Krink et al. (2009). The out-of-sample performance of MIQP-CPLEX is evaluated first by comparing it with the out-of-sample performance statistics of DECS-IT provided in Krink et al. (2009) for DJ65 and Nikkei225, where the optimal portfolio is held unchanged for the 20 observations following the in-sample period. Then, we further investigate the relationship between the in-sample and out-of-sample errors of MIQP-CPLEX and of DECS-IT, finding that the out-of-sample errors are, as expected, larger than the in-sample errors (see Fig. 2). Furthermore, the out-of-sample performances of the in-sample optimal portfolios found by MIQP-CPLEX are good but not always better than the out-of-sample performances of the 30 portfolios found by DECS-IT.
In Tables 5 and 6 we compare, for DJ65 and Nikkei225, the performances of the optimal portfolios found with MIQP-CPLEX applied to our exact model with those of the portfolios found with the DECS-IT heuristic in Krink et al. (2009). Similar results have been obtained with MIQP-CPLEX for the Dax100 and for the S&P100 indexes. Details are available upon request. We point out that, due to the problem dimension, for the Nikkei225 dataset we stopped MIQP-CPLEX after two hours of computing. However, for both datasets, we observe that the in-sample tracking volatility values are lower than the corresponding ones presented in Krink et al. (2009). The in-sample annualized Tracking Error volatility and the Excess Return correspond to the average of the annualized in-sample tracking error volatilities and to the annualized in-sample excess returns, respectively, for the 12 time periods considered in our experiments (see also Krink et al. 2009). The out-of-sample beta relates to the regression of the portfolio returns during the holding period against the returns from the index (Canagkoz and Beasley 2008). The closer to 1, the better the tracking performance of a portfolio. Indeed, a portfolio that perfectly tracks a market index has beta exactly equal to 1. The reader is referred to Krink et al. (2009) for a detailed description of the reported statistics.
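The beta and correlation statistics mentioned above can be computed from two return series as in this minimal sketch. It assumes that beta is the slope of an ordinary least squares regression of portfolio returns on index returns with an intercept; the function name and the return series in the usage example are hypothetical and are not data from the paper.

```python
import math


def beta_and_correlation(portfolio_returns, index_returns):
    """OLS slope of portfolio returns on index returns, and their Pearson correlation."""
    n = len(index_returns)
    mean_p = sum(portfolio_returns) / n
    mean_i = sum(index_returns) / n
    cov = sum((p - mean_p) * (i - mean_i)
              for p, i in zip(portfolio_returns, index_returns))
    var_i = sum((i - mean_i) ** 2 for i in index_returns)
    var_p = sum((p - mean_p) ** 2 for p in portfolio_returns)
    beta = cov / var_i
    corr = cov / math.sqrt(var_i * var_p)
    return beta, corr


# Hypothetical holding-period returns; a perfect tracker would reproduce the index.
index = [0.010, -0.020, 0.015, 0.000, 0.005]
portfolio = [0.011, -0.019, 0.014, 0.001, 0.004]
print(beta_and_correlation(portfolio, index))
```

Beta and correlation capture different things: a portfolio that always moves twice as much as the index has correlation 1 but beta 2, so both statistics need to be close to 1 for good tracking.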
From a tracking performance viewpoint we notice that, as expected, the in-sample tracking errors decrease when increasing the parameter K, with a steeper slope for smaller values of K that eventually flattens for larger values. Clearly there is no guarantee of such behavior for the out-of-sample tracking error. However, this decrease has been observed in all our experiments except for a slight inversion with K = 35 and K = 40 for the S&P100 index. Nevertheless, from a practical viewpoint, increasing the number of assets in the tracking portfolio leads to a more expensive investment in terms of transaction and management costs. We also note that no monotonic behavior seems to show up for the in-sample or out-of-sample excess returns, or for the beta and correlation with respect to the index. However, the latter two parameters turned out to be close to 1 for all values of K, thus confirming the excellent tracking properties of the optimized portfolios. Fig. 2 provides further evidence of the fact that in-sample optimization does not necessarily guarantee the best out-of-sample performance. Indeed, considering only the first 200 observations, we compare the in-sample and out-of-sample tracking errors of the optimal portfolios found with MIQP-CPLEX to those of the portfolios found by the DECS-IT heuristic in 30 runs. We observe that three out of four cases (i.e., Scatterplots (b), (c) and (d)) suggest the presence of a positive linear relationship between the in-sample and the out-of-sample tracking error, but such relationship is negative for DJ65 (i.e., Scatterplot (a)). Hence, further investigation on different datasets, time periods, and on a larger number of cases is advisable to ascertain the presence of a systematic relationship between the in-sample and the out-of-sample tracking performance.
Furthermore, when the in-sample tracking error of the optimal portfolio is the smallest, its corresponding out-of-sample performance is the best one in just one case (Nikkei225), although it is among the best ones in almost all cases. This partially justifies the search for the optimal in-sample solution with MIQP-CPLEX when possible. However, since DECS-IT is able to find near-optimal portfolios in reasonable time for the first 200-observation time window (i.e., without turnover constraints), it could be fruitfully used in combination with MIQP-CPLEX, which is faster and exact in the remaining time windows, to obtain in a timely manner near-optimal IT portfolios that have good out-of-sample performances.