Mathematical formulations for scheduling jobs on identical parallel machines with family setup times and total weighted completion time minimization

This paper addresses the parallel machine scheduling problem with family-dependent setup times and total weighted completion time minimization. In this problem, when two jobs j and k are scheduled consecutively on the same machine, a setup time is performed between the finishing time of j and the starting time of k if and only if j and k belong to different families. The problem is strongly NP-hard and is commonly addressed in the literature by heuristic approaches and by branch-and-bound algorithms. Achieving proven optimal solutions is a challenging task even for small instances. Our contribution is to introduce five novel mixed integer linear programs based on concepts derived from one-commodity, arc-flow and set covering formulations. Numerical experiments on more than 10000 benchmark instances show that one of the arc-flow models and the set covering model are quite efficient: they provide on average better solutions than state-of-the-art approaches within shorter computation times, and solve to proven optimality a large number of open instances from the literature.


Introduction
Let J = {1, ..., n} be a set of jobs to be scheduled on a set M = {1, ..., m} of identical parallel machines, with m ≤ n. Each job j ∈ J has an integer processing time p_j and an integer penalty weight w_j. In addition, each job j ∈ J belongs to a family i ∈ F = {1, ..., f}, with f ≤ n, so the set of jobs can be partitioned as J = ∪_{i∈F} J_i, where each set J_i contains the jobs of family i ∈ F. Let i(j) denote the index of the family of job j. An integer setup time s_i is associated with each family i ∈ F, meaning that if a job j is processed on a machine immediately after a job k and i(j) ≠ i(k), then a setup of s_{i(j)} units of time must be performed before starting processing j. Preemption is not allowed, so once the processing of a job is started it cannot be interrupted. The objective is to schedule all jobs on the machines by minimizing the sum of their weighted completion times. In the three-field classification of Graham et al. (1979), this problem is referred to as P|s_i|Σ w_j C_j, where C_j stands for the completion time of job j. The P|s_i|Σ w_j C_j is strongly NP-hard, as the simpler problem with unitary weights is already NP-hard (see Webster 1997).
The problem is interesting because it models many real-world situations in which some activities must be performed by a set of agents in parallel and each activity has a certain level of importance expressed by its weight. Many production problems can indeed be modeled as a P|s_i|Σ w_j C_j, and in such cases the weight usually defines the monetary importance of a job. Other relevant applications arise in the context of health care, where, for example, patients have to be assigned to surgery rooms that must be equipped by considering the type (i.e., the family) of surgery to be performed. In such cases, the weight usually models a level of urgency for the patient. As stressed in the surveys of Potts and Kovalyov (2000), Allahverdi et al. (2008) and Allahverdi (2015), the P|s_i|Σ w_j C_j has not received much attention in the recent literature. To our knowledge, the most recent works on the parallel machine scheduling problem with family setups and total weighted completion time minimization are those of Liao et al. (2012) and Tseng and Lee (2017). The former article proposes a heuristic that combines elements from tabu search, the least-loaded-processor rule and the comparison with schedules without setups, whereas the latter proposes an electromagnetism-like metaheuristic. Regarding exact methods, mainly branch-and-bound (B&B) algorithms have been developed, and the last relevant papers date back to the early 2000s (Webster and Azizoglu, 2001; Chen and Powell, 2003; Dunstall and Wirth, 2005b; Bettayeb et al., 2008). According to Allahverdi et al. (2008) and Allahverdi (2015), the papers of Chen and Powell (2003) and Dunstall and Wirth (2005b) still represent the state-of-the-art exact approaches for the P|s_i|Σ w_j C_j.
Some closely related problems, such as the 1||Σ C_j, the 1||Σ w_j C_j, the P||Σ C_j and the P||Σ w_j C_j, are well studied in the literature. The first three problems are polynomially solvable by the (weighted) Smith's ratio rule. The P||Σ w_j C_j, instead, was proven to be NP-hard by Bruno et al. (1974). Kramer et al. (2018) recently addressed this problem by means of an arc-flow (AF) formulation that models the sequences of jobs on the machines as paths in a network of pseudo-polynomial size. The AF formulation solved instances with up to 1000 jobs in a few minutes of computing effort, consistently improving previous results in the literature. These considerations motivated us to propose new mathematical formulations and exact methods, in an attempt to fill the gap in the literature concerning exact methods for the P|s_i|Σ w_j C_j. In particular, we propose and investigate five mixed integer linear programs (MILPs). All proposed formulations are novel and tackle the difficulties of the problem at hand in different ways. The first one is a one-commodity (OC) formulation that has polynomial size in the number of jobs. The next three MILPs are AF models that formulate the problem with pseudo-polynomial numbers of variables and constraints. The last one is a set covering (SC) formulation that uses an exponential number of variables. Enumerating all variables of the SC model is impractical, so we solve it by a tailored branch-and-price (B&P) approach. In addition, we also investigate simple but effective valid inequalities.
We performed extensive computational experiments on sets of benchmark instances from the literature. Our results show that one of the AF formulations and the SC one are able to solve to proven optimality several open instances from the literature. Despite these good results, the problem remains very challenging, especially because of the family setup times: all benchmark instances of the P||Σ w_j C_j with up to 400 jobs were solved to proven optimality by Kramer et al. (2018) with an AF formulation, whereas, despite our attempts, instances of the P|s_i|Σ w_j C_j with just 40 jobs remain unsolved.
The remainder of this work is organized as follows. A concise review of the literature on the P|s_i|Σ w_j C_j and closely related problems is provided in Section 2. The newly proposed mathematical formulations are introduced and described in Section 3. The extensive computational experiments that we performed are presented and discussed in detail in Section 4. Finally, Section 5 reports the concluding remarks and future research directions.

Brief literature review
Despite its importance in practice, the P|s_i|Σ w_j C_j has not received much attention in recent years. Nevertheless, interesting theoretical results were achieved in the 1990s and early 2000s. Many of these results were attained by extending known properties of closely related problems, such as the 1||Σ C_j, the P||Σ C_j, the 1||Σ w_j C_j, the P||Σ w_j C_j and the 1|s_i|Σ w_j C_j. All these problems have in common the fact that they can benefit from the well-known Smith's rule, also known as the shortest processing time (SPT) rule. The first and the second problems are polynomially solvable by the direct application of the SPT rule. The 1||Σ w_j C_j, in turn, is efficiently solved by a modified version of the SPT, denoted weighted shortest processing time (WSPT), in which jobs are sorted by non-decreasing order of their p_j/w_j ratio. Concerning the P||Σ w_j C_j, the 1|s_i|Σ w_j C_j and the P|s_i|Σ w_j C_j itself, the first and the last problem were proven NP-hard by Bruno et al. (1974) and Webster (1997), respectively, while the second can be solved in O(f^2 n^f) time by a dynamic programming (DP) algorithm proposed by Ghosh (1994). It is well known (see, e.g., Elmaghraby and Park 1974) that there exists an optimal solution of the P||Σ w_j C_j in which the sequence of jobs on each machine follows the WSPT order. Numerous algorithms take advantage of this property. Among these, we highlight the B&B methods of Azizoglu and Kirca (1999), Chen and Powell (1999) and van den Akker et al. (1999) (improved very recently by Kowalczyk and Leus 2018), as well as the enhanced AF formulations by Kramer et al. (2018).
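To make the rule concrete, the following Python sketch (with illustrative job data; all identifiers are ours) sorts jobs by non-decreasing p_j/w_j and evaluates a single-machine sequence without setups:

```python
# WSPT rule sketch: sort jobs by non-decreasing p_j / w_j
# (equivalently, non-increasing w_j / p_j).

def wspt_order(jobs):
    """jobs: dict j -> (p_j, w_j); returns job ids in WSPT order."""
    return sorted(jobs, key=lambda j: jobs[j][0] / jobs[j][1])

def total_weighted_completion(seq, jobs):
    """Total weighted completion time of a single-machine sequence, no setups."""
    t = z = 0
    for j in seq:
        p, w = jobs[j]
        t += p          # completion time of job j
        z += w * t
    return z

jobs = {1: (4, 2), 2: (2, 1), 3: (3, 3)}   # illustrative (p_j, w_j) data
print(wspt_order(jobs))                     # job 3 has the smallest ratio
```

On a single machine without setups this order is optimal; with families and setup times it only holds within batches of the same family, as the propositions recalled below state.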
The WSPT rule is also useful for the 1|s_i|Σ w_j C_j. Monma and Potts (1989) showed that in an optimal solution jobs of the same family must be scheduled according to the WSPT order. Mason and Anderson (1991) proved that there exists an optimal solution where the batches appear in the shortest weighted mean processing time (SWMPT) order. Some B&B and heuristic algorithms based on these properties were proposed by Crauwels et al. (1997, 1998) and Dunstall et al. (2000).
The above results on the 1|s_i|Σ w_j C_j were also extended to the parallel machine case, where in an optimal solution the sequence on any machine must fulfill the two mentioned conditions. Webster and Azizoglu (2001) proposed two DP algorithms to solve the P|s_i|Σ w_j C_j. When the numbers of families and machines are fixed, their methods are polynomial in the sum of the job weights and in the sum of the family setup and job processing times, respectively. Azizoglu and Webster (2003) and Dunstall and Wirth (2005a) developed B&B algorithms. These B&Bs are based on list-scheduling techniques, where partial schedules are constructed at each node of the B&B tree, and on lower bounding techniques that solve 1|s_i|Σ w_j C_j and P||Σ w_j C_j subproblems. These approaches have been able to solve instances with up to 25 jobs. Chen and Powell (2003) also tackled the problem, by means of a B&P algorithm in which each node is solved by invoking a column generation (CG) algorithm. With this approach, the authors were able to solve some instances with up to 40 jobs and 6 machines. More recently, Bettayeb et al. (2008) developed a B&B method that relies on tighter lower bounds than those adopted by Dunstall and Wirth (2005a) and is able to reduce the number of explored nodes, although at the expense of higher computational times. This method could solve some instances with 45 jobs and a limited number of families and machines.
Heuristic and metaheuristic methods were also proposed in the literature. In the work of Dunstall and Wirth (2005b), multiple heuristic methods based on list-scheduling, single machine subproblems and improvement phases were designed. The authors evaluated their methods by comparison with exact methods and lower bounds from the literature on instances containing up to 80 jobs, 5 machines and 16 families, and showed that their methods provide small optimality gaps on all such instances. More recently, Liao et al. (2012) and Tseng and Lee (2017) developed new heuristic and metaheuristic approaches, respectively. The approach by Liao et al. (2012) is based on a tabu search that solves parallel machine subproblems without setups through the SWMPT rule, while Tseng and Lee (2017) proposed an electromagnetism-like metaheuristic that relies on the attraction-repulsion mechanism of electromagnetic theory. In both works, the performance of the methods was evaluated on benchmark instances with up to 80 jobs.

Mathematical formulations
This section presents the novel mathematical models that we developed for solving the P|s_i|Σ w_j C_j. It also includes some simple but effective strengthening constraints for one of the AF models.
Before presenting our formulations, let us introduce a simple example, called Example 1 in the following, with 6 jobs divided into 3 families to be scheduled on 2 machines. The input data of Example 1 are given in Table 1, where T = 26 represents a valid upper bound on the completion time of the activities. An optimal solution, of value z* = 2 × 11 + 1 × 7 + 3 × 5 + 3 × 12 + 4 × 7 + 2 × 20 = 148, is depicted in Figure 1.

Table 1: Input data of Example 1

Families        Jobs
i   s_i         j   p_j   w_j   p_j/w_j
1   2           1   4     2     2.00
                2   2     1     2.00
                3   3     3     1.00
2   4           4   5     3     1.67
                5   3     4     0.75
3   3           6   6     2     3.00

Note that in the depicted optimal solution jobs belonging to the same family are scheduled consecutively on the same machine, but this is not mandatory. This situation is nonetheless likely to occur often, as it allows avoiding setup times. A family setup is only performed between successive jobs on the same machine when the jobs belong to different families.
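With the data of Table 1, the cost of a schedule is easy to check directly. The Python sketch below (our own helper, not part of the formulations) pays a family setup before the first job of each machine and at every family change, and reproduces the optimal value z* = 148 on the two machine sequences of Figure 1:

```python
# Evaluating a schedule for the P|s_i|sum w_j C_j with the data of Table 1.
# A machine schedule is a job sequence; the setup s_i is paid before the
# first job of a machine and at every family change.

p = {1: 4, 2: 2, 3: 3, 4: 5, 5: 3, 6: 6}
w = {1: 2, 2: 1, 3: 3, 4: 3, 5: 4, 6: 2}
fam = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
setup = {1: 2, 2: 4, 3: 3}

def schedule_cost(machines):
    z = 0
    for seq in machines:
        t, prev_fam = 0, None
        for j in seq:
            if fam[j] != prev_fam:   # initial setup or family change
                t += setup[fam[j]]
            t += p[j]                # completion time of j
            z += w[j] * t
            prev_fam = fam[j]
    return z

# Optimal solution of Example 1: machine 1 runs 3, 2, 1, 6; machine 2 runs 5, 4.
print(schedule_cost([[3, 2, 1, 6], [5, 4]]))  # -> 148
```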
In the solution, a family setup of 3 units of time is performed between job 1 and job 6, both processed on machine 1. Notice that, before starting the processing of the first job on each machine, the setup of its family must be performed. Notice also that the batches of consecutive jobs of the same family are scheduled according to the WSPT rule, which is compliant with the propositions of Monma and Potts (1989) and Dunstall and Wirth (2005a). The value T = 26 in Example 1 represents a time horizon estimation, i.e., an upper bound on the completion time of all activities (makespan). The value of T can be estimated by using Property 2 of Chen and Powell (2003) as follows. For each job j, we compute an upper bound T_j on its completion time by first determining two estimates, α_j and β_j, based on the set K_{i,j} of jobs of family i(j) that appear before job j in the WSPT order, where i(j) represents the family of job j. Then T_j is simply obtained by setting T_j = min(α_j, β_j), for j ∈ J, and the time horizon value T is calculated as T = max_{j∈J} {T_j}.
In the remainder of this work, the value of T is computed in this way for all the developed formulations.

One-commodity formulation
Our OC formulation relies on an idea first introduced by Gavish and Graves (1978) for the traveling salesman problem, and widely used since then to formulate many variants of scheduling (e.g., Unlu and Mason 2010), traveling salesman (e.g., Salazar-González and Santos-Hernández 2015) and vehicle routing problems (e.g., Bruck and Iori 2017), among many others. Essentially, the idea is to use flow variables that carry the time at which the processing of one job ends and that of the next one starts.
Let us build a graph G = (V, A), with V = J ∪ {0, n+1} and A = {(j, k) : j ∈ V \ {n+1}, k ∈ V \ {0}, j ≠ k}, where 0 and n+1 represent origin and destination dummy nodes, respectively. Let σ_jk, for (j, k) ∈ A, represent the setup time between jobs j and k, taking value s_{i(k)} (where s_{i(k)} is the setup related to the family i(k) of job k) if i(j) ≠ i(k), and 0 otherwise; in particular, σ_0k = s_{i(k)} for all k ∈ J, and σ_{j,n+1} = p_{n+1} = 0. The variables used by the OC model are: (i) a binary decision variable x_jk, taking value 1 if job j is scheduled immediately before job k, 0 otherwise, for all (j, k) ∈ A; (ii) a non-negative continuous variable C_j representing the completion time of job j; and (iii) a continuous variable τ_jk that represents the starting time of job k if scheduled immediately after job j, for all (j, k) ∈ A. The OC formulation is then:

min Σ_{j∈J} w_j C_j                                                  (1)
s.t.
Σ_{k∈V\{0}: k≠j} x_jk = 1                         j ∈ J              (2)
Σ_{k∈V\{n+1}: k≠j} x_kj = 1                       j ∈ J              (3)
Σ_{k∈J} x_0k = m                                                     (4)
Σ_{j∈J} x_{j,n+1} = m                                                (5)
Σ_{k∈V\{0}: k≠j} τ_jk = C_j                       j ∈ J              (6)
C_j = Σ_{k∈V\{n+1}: k≠j} (τ_kj + σ_kj x_kj) + p_j  j ∈ J             (7)
x_jk ∈ {0, 1}                                     (j, k) ∈ A         (8)
0 ≤ τ_jk ≤ (T − σ_jk − p_k) x_jk                  (j, k) ∈ A         (9)

The objective function (1) seeks the minimum total weighted completion time. Constraints (2)-(5) are flow constraints ensuring that all jobs are processed exactly once and that m machines are used. Constraints (6) and (7) define an ordered sequence of finishing and starting times of the jobs and their completion times. Constraints (8) define the domain of the x_jk variables, and constraints (9) impose the bounds of the τ_jk variables and their relation with the x_jk variables. Note that the model contains a polynomial number of variables, O(n^2), and constraints, O(n). Note also that the C_j variables are simply support variables, as they could be replaced by the summation in (7). Regarding Example 1, the OC optimal solution is: C_1 = 11, C_2 = 7, C_3 = 5, C_4 = 12, C_5 = 7, C_6 = 20; x_05 = x_54 = x_47 = x_03 = x_32 = x_21 = x_16 = x_67 = 1; τ_05 = 0, τ_54 = 7, τ_47 = 12, τ_03 = 0, τ_32 = 5, τ_21 = 7, τ_16 = 11, τ_67 = 20 (and all other variables take the value 0).
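The time-propagation logic of the formulation can be verified on the solution above: along each machine path, the flow value on a selected arc (j, k) equals C_j, and the completion time of the successor is obtained by adding the setup (if any) and its processing time. A small Python check of this recursion on the Example 1 data (our own code):

```python
# Checking the OC solution of Example 1: for every selected arc (j, k),
# C_k = tau_jk + sigma_jk + p_k, where sigma_jk is the setup between
# consecutive jobs j -> k (j = 0 is the dummy origin).

p = {1: 4, 2: 2, 3: 3, 4: 5, 5: 3, 6: 6}
fam = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
setup = {1: 2, 2: 4, 3: 3}

def sigma(j, k):
    """Setup paid between consecutive jobs j -> k."""
    if j != 0 and fam[j] == fam[k]:
        return 0
    return setup[fam[k]]

# Selected arcs x_jk = 1 (to non-dummy successors) and their tau_jk values.
tau = {(0, 5): 0, (5, 4): 7, (0, 3): 0, (3, 2): 5, (2, 1): 7, (1, 6): 11}
C = {}
for (j, k), t in tau.items():
    C[k] = t + sigma(j, k) + p[k]   # completion of k = start + setup + processing
print(C)
```

The computed completion times match the values C_1 = 11, ..., C_6 = 20 reported above.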

Arc-flow formulations
AF formulations represent combinatorial optimization problems by means of flows on a capacitated network. In scheduling problems, these flows can be naturally associated with schedules. AF formulations have been successfully applied to bin packing and cutting stock problems (see, e.g., Valério de Carvalho 1999; Delorme et al. 2016) and vehicle routing problems (Macedo et al. 2011). Very recently, good results were also obtained in the scheduling field, but without setup times, by Kramer et al. (2018) and Mrad and Souayah (2018). To the best of our knowledge, this is the first time AF formulations are used to model a scheduling problem with setup times. We propose three different ways of achieving this.

Family layered arc-flow formulation
Our first AF model represents the m sequences on the machines as paths in a network composed of a source node, a sink node and intermediate nodes distributed on f layers, one for each family. The nodes are connected by arcs of the following types: job arcs, which model the processing of a job; setup arcs, which model the transition from one family to another; initial setup arcs, which model the setup of the first job processed on a machine; and loss arcs, which simply lead to the sink node. Each path starts at the source node and finishes at the sink node.
Formally, for this AF model, which we call family layered arc-flow (AF_fl), we are given a directed acyclic multigraph G = (N, A). Each node in N is defined by a pair (i, q), where i ∈ F ∪ {0} is either the index of a family or the dummy index 0, and q stands for a time instant. Instead of simply using all values 0 ≤ q ≤ T, we consider a reduced set by taking q ∈ N_T, where N_T ⊆ {0, ..., T} is the set of normal patterns, i.e., the set of all feasible combinations of job processing times and family setup times up to T (we refer the reader to Côté and Iori 2018 for a formal definition of normal patterns). The set of nodes is partitioned in such a way that N = ∪_{i∈F} N_i ∪ {(0, 0), (0, T)}, with N_i = {(i, q) : q ∈ N_T} representing all possible starting and ending times of jobs in layer i, for all i ∈ F, and (0, 0) and (0, T) representing the source and sink nodes, respectively. The set of arcs is partitioned as A = A_J ∪ S ∪ L ∪ D. The set A_J = {(i, j, q, r) : i ∈ F; j ∈ J_i; q, r ∈ N_T; r = q + p_j} contains the job arcs from node (i, q) to node (i, r), representing the fact that job j, belonging to family i, is processed during the time interval [q, r). Note that the arcs in A_J remain in the same layer i. Differently from A_J, the set S = {(i, h, q, r) : i, h ∈ F; i ≠ h; q, r ∈ N_T; r = q + s_h} is the set of setup arcs from node (i, q) to node (h, r), which represent a transition between the processing of two jobs from different families. In other words, an arc (i, h, q, r) states a change from family i to family h starting at time q, lasting s_h and consequently finishing at r = q + s_h. The set L = {(i, 0, q, T) : i ∈ F; q ∈ N_T} models the loss arcs that link node (i, q) to the sink node (0, T). Finally, the set D = {(0, i, 0, s_i) : i ∈ F} contains the arcs from the origin node (0, 0) to node (i, s_i), thus modeling the initial setup times and indicating how many times each family provides the first job on a machine.
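The set of normal patterns can be sketched as a simple subset-sum reachability pass over the processing and setup times. The Python code below is our own simplification: it allows every item to repeat, so it may compute a superset of the normal patterns actually used by the authors, which is still a valid node set:

```python
# Reachable time instants in [0, T] as sums of job processing times
# and family setup times (repetitions allowed for simplicity).

def normal_patterns(p, setup, T):
    reach = [False] * (T + 1)
    reach[0] = True
    items = list(p.values()) + list(setup.values())
    for t in range(T + 1):       # forward DP over time instants
        if reach[t]:
            for d in items:
                if t + d <= T:
                    reach[t + d] = True
    return [t for t in range(T + 1) if reach[t]]

p = {1: 4, 2: 2, 3: 3, 4: 5, 5: 3, 6: 6}   # Example 1 data
setup = {1: 2, 2: 4, 3: 3}
N = normal_patterns(p, setup, 26)
print(len(N))   # for Example 1 only the instant 1 is unreachable in [0, 26]
```

Example 1 gives little reduction because the smallest durations are 2 and 3; on instances with larger and more homogeneous durations the reduction can be substantial.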
Our AF_fl model is based on the following variables: (i) a binary variable x_ijqr taking value 1 if job arc (i, j, q, r) is selected, 0 otherwise; (ii) an integer variable s_ihqr giving the number of times setup arc (i, h, q, r) is chosen; (iii) a continuous variable l_iq indicating how many loss arcs (i, 0, q, T) are selected; and (iv) a continuous variable d_i giving the number of times initial setup arc (0, i, 0, s_i) is used. The P|s_i|Σ w_j C_j can then be modeled as:

min Σ_{job arcs (i,j,q,r)} w_j r x_ijqr                               (10)
s.t.
Σ_{job arcs (i(j),j,q,r)} x_i(j)jqr ≥ 1            j ∈ J              (11)
Σ_{i∈F} d_i = m                                                       (12)
Σ_{loss arcs (i,0,q,T)} l_iq = m                                      (13)
Σ_{job arcs (i,j,q',q)} x_ijq'q + Σ_{setup arcs (h,i,q',q)} s_hiq'q + [q = s_i] d_i
  = Σ_{job arcs (i,j,q,r)} x_ijqr + Σ_{setup arcs (i,h,q,r)} s_ihqr + l_iq
                                                   (i, q) ∈ N_i, i ∈ F (14)
x_ijqr ∈ {0, 1}                                    for all job arcs    (15)
s_ihqr ∈ Z_+                                       for all setup arcs  (16)
l_iq ≥ 0                                           for all loss arcs   (17)
d_i ≥ 0                                            i ∈ F               (18)

where [q = s_i] equals 1 if q = s_i and 0 otherwise. Constraints (11) guarantee that all jobs are scheduled at least once; constraints (12) and (13) ensure that m machines are used. Constraints (14) impose flow conservation and thus ensure that only feasible schedules are obtained. The domains of the variables are defined by constraints (15)-(18). The AF_fl formulation has a pseudo-polynomial number of variables, O((n + f^2)T), and constraints, O(fT). Figure 2 shows an optimal solution of Example 1 obtained with the AF_fl formulation (10)-(18). The depicted solution contains two paths, each representing the schedule of a machine. The first path starts with an arc from the origin (0, 0) to node (1, 2); then all jobs from family 1 are sequenced and, before sequencing job 6 from family 3 and closing the path with variable l_3,20, the setup arc (1, 3, 11, 14), representing the change from family 1 to family 3, is used. The second path concerns the sequencing of the jobs from family 2, where d_2 models the initial setup time for the family.

Arc-flow formulation with alternating paths
Our second AF formulation, denoted AF_ap, relies on the idea of shrinking the AF_fl layers into a unique layer. The AF_ap still models the P|s_i|Σ w_j C_j as the problem of finding m paths from a source node 0 to a sink node T in such a way that all jobs are scheduled. This time, however, each path must alternate between (i) a job arc and (ii) either a setup, a dummy or a loss arc. To this aim, we consider a directed acyclic multigraph G = (N, A), where the set of nodes N ⊆ {0, ..., T} is the set of normal patterns previously discussed. Set A contains job arcs A_J = {(q, r, j) : j ∈ J, q ∈ N, r = q + p_j}, representing the fact that job j starts at time q ∈ N and finishes at r = q + p_j; setup arcs S = {(q, r, i) : i ∈ F, q ∈ N, r = q + s_i}, representing a transition to family i starting at q ∈ N and finishing at r = q + s_i; dummy arcs D = {(q, q, i) : i ∈ F, q ∈ N}, representing a dummy setup of zero units of time between two jobs of the same family; and loss arcs L = {(q, T) : q ∈ N \ {0, T}}, leading the path to the sink node T. In addition, let us define A_j as the set of job arcs of job j ∈ J, and δ+(q) and δ−(q) as the sets of all arcs starting and ending at time q ∈ N, respectively.
The AF_ap formulation makes use of four sets of variables, namely: (i) a binary variable x_qrj taking value 1 if job arc (q, r, j) is selected, 0 otherwise; (ii) an integer variable s_qri indicating how many setup arcs (q, r, i) are selected; (iii) a continuous variable d_qqi for each dummy arc (q, q, i), giving the number of times a dummy arc from q to q of family i is selected; and (iv) a continuous variable l_qT giving the number of loss arcs (q, T) that are taken. Then, the P|s_i|Σ w_j C_j can be modeled as follows. Constraints (20) guarantee that all jobs are scheduled at least once; constraints (21)-(22) force m machines to be used, starting with a setup arc and ending with either a loss or a job arc. Constraints (23)-(26) impose flow conservation (in the way described in detail below), while constraints (27)-(30) define the variable domains. Figures 3 and 4 illustrate constraints (23) and (26), respectively. From Figure 3, it is possible to note that constraints (23) state that every time a job arc finishes at time q, it must be followed by either a setup, a dummy or a loss arc starting at q. Following the same line of reasoning, constraints (26), depicted in Figure 4, model the fact that every time a setup or a dummy arc of family i is used, it must be followed by a job arc associated with this family. Concerning Example 1, an AF_ap optimal solution is shown in Figure 5. It is possible to identify two paths (one for each machine), starting with setup arcs and finishing with loss arcs. When two jobs belonging to the same family are scheduled consecutively, a dummy arc is present between them (this is the case for d_5,5,1, d_7,7,2 and d_9,9,1). If instead two jobs from different families are sequenced one after the other, then a setup arc is needed (this is the case of s_11,14,3, which models the change from family 1 to family 3).

Arc-flow formulation with doubled jobs
Our third AF formulation, called AF with doubled jobs (AF_dj), still seeks m paths on a capacitated network, but relies on the idea of creating additional "long" job arcs that model both the setup and the processing of a job. The AF_dj formulation makes use of a directed acyclic multigraph G = (N, A). The set of nodes N has the same meaning as in Section 3.2.2, whereas the set of arcs is now partitioned as A = A_J ∪ A_L ∪ L. Sets A_J and L have the same meaning as the sets of job and loss arcs in Section 3.2.2, while A_L = {(q, r, j) : j ∈ J; q ∈ N; r = q + p_j + s_i(j)} is the set of long job arcs. Each job and long job arc has a binary variable x_qrj associated with it, taking value 1 if the arc is selected, 0 otherwise. Similarly to AF_ap, δ+(q) and δ−(q) are the sets of all arcs (job and long job ones) starting at time q ∈ N and ending at time q ∈ N, respectively. The P|s_i|Σ w_j C_j is then modeled through an objective and constraints (35)-(41), which cover all jobs, impose m paths, enforce flow conservation, and require x_qrj ∈ {0, 1} for all (q, r, j) ∈ A_J ∪ A_L. An AF_dj optimal solution for Example 1 is presented in Figure 6, where x_5,9,1, x_7,12,4 and x_9,11,2 are job arcs; x_0,7,5, x_0,5,3 and x_11,20,6 are long job arcs; and l_12,26 and l_20,26 are loss arcs. Note that the long arc x_0,7,5 models the setup time of family i(5) = 2 (s_i(5) = 4) plus the processing time of job 5 (p_5 = 3).

Figure 6: AF_dj optimal solution of Example 1
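The arc construction can be sketched in a few lines (our own code; for simplicity the node set is taken as all of {0, ..., T} instead of the normal patterns), checking that the arcs of Figure 6 indeed exist:

```python
# AF_dj arc sets for Example 1: plain job arcs (q, r, j) with r = q + p_j,
# and long job arcs with r = q + p_j + s_i(j) embedding the setup.

p = {1: 4, 2: 2, 3: 3, 4: 5, 5: 3, 6: 6}
fam = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
setup = {1: 2, 2: 4, 3: 3}
T = 26

job_arcs = {(q, q + p[j], j) for j in p for q in range(T + 1)
            if q + p[j] <= T}
long_arcs = {(q, q + p[j] + setup[fam[j]], j) for j in p for q in range(T + 1)
             if q + p[j] + setup[fam[j]] <= T}

# Arcs used by the optimal solution depicted in Figure 6:
print((5, 9, 1) in job_arcs, (0, 7, 5) in long_arcs, (11, 20, 6) in long_arcs)
# -> True True True
```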

Set covering formulation
SC formulations are commonly used to model various combinatorial optimization problems, including, among others, bin packing (Valério de Carvalho, 1999), cutting stock (Delorme et al., 2016), vehicle routing (Feillet, 2010) and scheduling (van den Akker et al., 1999). In our case, we are given a set Ω containing all feasible single machine schedules, upper bounded by the time horizon T and respecting family and setup constraints. Each schedule ω ∈ Ω has a cost c_ω. Furthermore, let a_jω be a coefficient taking value 1 if job j ∈ J is present in schedule ω, 0 otherwise, and let x_ω be a binary variable taking value 1 if schedule ω is selected, 0 otherwise. The SC formulation is:

min Σ_{ω∈Ω} c_ω x_ω                      (44)
s.t.
Σ_{ω∈Ω} a_jω x_ω ≥ 1        j ∈ J        (45)
Σ_{ω∈Ω} x_ω = m                          (46)
x_ω ∈ {0, 1}                ω ∈ Ω        (47)

The objective function (44) minimizes the total cost of the selected schedules, constraints (45) impose that each job is covered by at least one schedule, and constraint (46) forces exactly m schedules, one per machine, to be selected. Model (44)-(47) contains an exponential number of variables, so explicitly enumerating all of them is impractical. Therefore, we propose a branch-and-price (B&P) algorithm to solve it. In general terms, a B&P algorithm is a B&B in which each node of the tree is solved by means of a column generation (CG) approach. The CG method decomposes the model into a master problem (MP), which is initialized with a small subset of the columns of Ω, and one or more subproblems, called pricing problems. To find promising columns, the CG algorithm benefits from the linear programming (LP) dual information given by the optimal solution of the continuous relaxation of the MP. The columns obtained by solving the pricing problems are added on the fly to the MP. Master and pricing problems are then solved iteratively until no more attractive columns exist. In this way, it is possible to solve the SC formulation without explicitly enumerating all columns. A detailed explanation of CG algorithms can be found in, e.g., Lübbecke and Desrosiers (2005).
In our B&P algorithm, the CG pricing subproblem consists of finding columns in Ω, i.e., feasible single machine schedules in the time interval [0, T], with negative reduced cost. To this aim, a first option would be to choose as subproblem the NP-hard elementary shortest path problem with resource constraints (ESPPRC). As finding columns with the ESPPRC is a difficult task, we opted instead for obtaining columns in a larger set Ω' ⊇ Ω, by solving a relaxed version of the ESPPRC in which a job can be included more than once in a column, but not two or more times consecutively. The relaxed subproblem is defined as the problem of finding paths with negative reduced cost on a network G = (N, A), where N = {(j, q) : j ∈ J; s_i(j) ≤ q ≤ T − p_j} ∪ {(0, T)} is the set of nodes, with (0, T) representing the sink node, and A = {(j, q, k, r) : (j, q), (k, r) ∈ N; r = q + p_j if i(j) = i(k), r = q + p_j + s_i(k) otherwise} is the set of arcs. This problem can be solved efficiently in pseudo-polynomial time by a modified version of the DP algorithm presented in Pessoa et al. (2010) for the P||Σ w_j T_j. The DP algorithm that we use is shown in Algorithm 1.
In Algorithm 1, λ_0 and λ_j, j ∈ J, are the dual variables associated with constraint (46) and constraints (45), respectively. State F(j, q) represents the minimum reduced cost of a path that finishes with the processing of job j started at time q.
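Algorithm 1 is not reproduced here, but the shape of its recursion can be sketched as follows. The Python code below is our own approximation of such a label-setting DP: states F[j, q] are processed in increasing start time, immediate repetition of a job is forbidden (the relaxation described above), closing a path subtracts the machine dual λ_0, and the dual values passed in are illustrative:

```python
import math

def price(p, w, fam, setup, T, lam, lam0):
    """Most negative reduced cost of a relaxed single-machine schedule."""
    jobs = list(p)
    F = {(j, q): math.inf for j in jobs for q in range(T + 1)}
    for j in jobs:                      # a path starts with an initial setup
        q = setup[fam[j]]
        if q + p[j] <= T:
            F[j, q] = w[j] * (q + p[j]) - lam[j]
    best = math.inf
    for q in range(T + 1):              # states in increasing start time
        for j in jobs:
            if F[j, q] == math.inf:
                continue
            best = min(best, F[j, q] - lam0)   # close the path after j
            end = q + p[j]
            for k in jobs:
                if k == j:              # relaxation: no immediate repetition
                    continue
                r = end + (0 if fam[k] == fam[j] else setup[fam[k]])
                if r + p[k] <= T:
                    cand = F[j, q] + w[k] * (r + p[k]) - lam[k]
                    if cand < F[k, r]:
                        F[k, r] = cand
    return best
```

With all duals at zero the cheapest column is a single job and the returned value is positive; once the master duals grow, columns with negative reduced cost appear and are added to the MP.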
Our B&P is initialized with the columns corresponding to the WSPT heuristic solution and with the columns of the solution obtained by the metaheuristic of Kramer and Subramanian (2017), which we invoke before executing the B&P. The WSPT heuristic solution is obtained by iteratively sequencing the jobs (previously sorted according to the WSPT order) on the current least-loaded machine, without considering the setup times. Once all jobs are sequenced, the setup times are included, thus obtaining a feasible solution of the P|s_i|Σ w_j C_j. Then, at each iteration of the CG algorithm, the column corresponding to the pricing solution with the most negative reduced cost, given by Algorithm 1, is introduced into the MP. If the node LP solution is not integer and its lower bound is smaller than the best known upper bound, then branching is performed using the same branching rule as in van den Akker et al. (1999). The rule first identifies a job j satisfying Σ_{ω∈Ω*} C_j(ω) x*_ω > min{C_j(ω) : x*_ω > 0}, where Ω* is the set of columns in the optimal solution of the MP, x*_ω is the value of variable x_ω in this optimal solution, and C_j(ω) is the completion time of job j in column ω. It then creates two nodes, one by imposing C_j ≤ min{C_j(ω) : x*_ω > 0} and the other by imposing C_j ≥ min{C_j(ω) : x*_ω > 0} + 1. In practice, branching is obtained by imposing new release dates and deadlines on the jobs at each node. The nodes are explored according to the best bound strategy.
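The initialization heuristic can be sketched as follows on the Example 1 data (our own code; ties in the WSPT ratio are broken by job index here, which the authors may handle differently):

```python
# WSPT list-scheduling sketch: sequence jobs in WSPT order on the current
# least-loaded machine ignoring setups, then re-evaluate the resulting
# sequences with the setup times included.

p = {1: 4, 2: 2, 3: 3, 4: 5, 5: 3, 6: 6}
w = {1: 2, 2: 1, 3: 3, 4: 3, 5: 4, 6: 2}
fam = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
setup = {1: 2, 2: 4, 3: 3}
m = 2

order = sorted(p, key=lambda j: p[j] / w[j])   # WSPT order
machines = [[] for _ in range(m)]
loads = [0] * m
for j in order:
    i = loads.index(min(loads))     # least-loaded machine, setups ignored
    machines[i].append(j)
    loads[i] += p[j]

def cost_with_setups(seq):
    t, z, prev = 0, 0, None
    for j in seq:
        if fam[j] != prev:          # initial setup or family change
            t += setup[fam[j]]
        t += p[j]
        z += w[j] * t
        prev = fam[j]
    return z

z = sum(cost_with_setups(seq) for seq in machines)
print(machines, z)   # -> [[5, 4, 6], [3, 1, 2]] 150
```

On Example 1 this yields a feasible solution of value 150, close to the optimum 148, which illustrates why this solution is a useful set of starting columns for the MP.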

Computational experiments
In this section, we present the computational experiments carried out to assess the performance of the methods proposed in Section 3. The models were coded in C++ and solved using Gurobi Optimizer 7.0. The tests were executed on a single thread of a computer equipped with a quad-core Intel Xeon E5530 2.40GHz processor, 20GB of RAM and the Ubuntu 14.04.5 LTS operating system. In the experiments, all proposed methods received an initial upper bound as cutoff value. This cutoff was obtained by applying the iterated local search (ILS) metaheuristic of Kramer and Subramanian (2017), originally devised to deal with a large class of scheduling problems. For each instance, we ran the ILS with a time limit of n/30 seconds. In Section 4.1, we describe the benchmark instances from the literature. The computational results are presented in Section 4.2.

Benchmark instances
In our experiments, we considered three sets of benchmark instances. The first and the second were originally proposed by Dunstall and Wirth (2005a) and Dunstall and Wirth (2005b), respectively. The first set is composed of 4800 instances, with n ∈ {10, 15, 20, 25}, m ∈ {2, 3, 5} and f ∈ {1, 3, 5, 8}. The job processing times and weights were randomly drawn from the intervals [p_min, p_max] = [1, 100] and [w_min, w_max] = [1, 10], respectively. The family setup times were randomly generated in the range [0, 50]. For each combination of (n, m, f), 100 instances were created. The second set is composed of large instances with n ∈ {40, 80}, m ∈ {2, 3, 5} and f ∈ {1, 3, 5, 8, 12, 16}. The processing times and weights of the jobs were created as in the previous set, but the setup times are now random integers in the range [0, s_max], with s_max ∈ {50, 100}. For each combination of (n, m, f, s_max), 100 instances were created, resulting in a total of 7200 benchmark instances.
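The generation scheme of set 1 can be sketched as follows (our own code; the original generator and its random streams are not available, and the uniform family assignment is our assumption):

```python
import random

def generate_instance(n, m, f, s_max=50, seed=0):
    """Random instance in the style of benchmark set 1.

    m does not affect generation; it is kept only to mirror the
    (n, m, f) description of the instance groups.
    """
    rng = random.Random(seed)
    p = [rng.randint(1, 100) for _ in range(n)]        # processing times in [1, 100]
    w = [rng.randint(1, 10) for _ in range(n)]         # weights in [1, 10]
    fam = [rng.randrange(f) for _ in range(n)]         # family of each job (assumed uniform)
    setup = [rng.randint(0, s_max) for _ in range(f)]  # family setup times in [0, s_max]
    return p, w, fam, setup

p, w, fam, setup = generate_instance(n=20, m=3, f=5)
```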
The three sets of instances from the literature used in this work are available for download at http://www.or.unimore.it/site/home/online-resources.html.

Results on benchmark set 1
Concerning the experiments on benchmark set 1, a time limit of 600 seconds has been imposed on each execution. First, we provide in Tables 2 and 3 a comparative analysis of our proposed formulations. For each method and group of 100 instances, the tables report the number of instances solved to proven optimality, #opt; the average final gap, gap(%); and the average gap between the linear programming (LP) relaxation lower bound and the best upper bound, gap_lp(%).
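For reference, the gap columns can be computed as in the following sketch. The exact formula used in the paper's tables is not stated in this section, so the convention below (relative to the upper bound) is an assumption.

```python
def percentage_gap(ub, lb):
    """Relative optimality gap in percent.

    A common convention is 100 * (ub - lb) / ub; gap(%) uses the final
    lower bound, while gap_lp(%) uses the LP relaxation bound instead.
    """
    if ub == 0:
        return 0.0
    return 100.0 * (ub - lb) / ub
```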
It can be noticed that the performance of OC is very poor, as the model fails to solve most of the instances with n > 15. This can be explained by its weak LP relaxation, whose gap is slightly above 50% on average. On the contrary, the continuous relaxations of the flow-based models AF fl, AF ap, AF+ap and AF dj are much stronger, especially AF dj, whose LP gap is around 1% in most cases. These tables also show the positive impact of the proposed strengthening constraints (31)-(33): indeed, AF+ap produces LP bounds around 1% better than those of AF ap, on average. Regarding the number of problems solved to proven optimality, AF dj and SC clearly outperform all other formulations. Both methods managed to solve all instances with up to 20 jobs. Concerning the problems involving 25 jobs, AF dj solved to proven optimality all of those with 5 machines, but missed 4 with 3 machines and 66 with 2 machines. The SC model, in turn, solved all 25-job instances but one.
We now compare, in Tables 4 and 5, our best methods, AF dj and SC, with those of Dunstall and Wirth (2005a). In their work, the authors proposed two B&B methods, both based on a list-scheduling approach in which partial schedules are built at each node of the tree, but differing in the branching scheme. The first method is based on a best-processor (BP) strategy, whereas the second one relies on a least-loaded-processor (LLP) strategy. As a stopping criterion, the authors imposed a limit of 10 million on the number of nodes created during the execution of their B&B algorithms. According to the single-thread results presented at https://www.cpubenchmark.net/, the Intel Xeon E5530 2.40GHz processor used in our experiments is about 2.5 times faster than an Intel Pentium III 1.40GHz processor, which is similar to the one used by Dunstall and Wirth (2005a) in their experiments. Thus, in Tables 4 and 5, the execution times reported by Dunstall and Wirth (2005a) have been divided by 2.5 to facilitate the comparison. For each method and group of instances in a line, the tables report the number of instances solved to proven optimality, #opt; the average number of enumerated nodes, #nodes; and the average execution time in seconds, t(s).
With regard to the results in Table 4, all methods were able to solve very quickly all considered instances with 10 and 15 jobs. The main difference among the methods concerns the number of nodes explored to prove optimality. While AF dj and SC very often prove optimality at the root node, the B&B methods require a higher effort, exploring a larger number of nodes. The same trend is also observed in Table 5, but on a larger scale. The results in this table indicate that, when the instance size and the number of machines increase, the number of nodes explored by BP and LLP grows very rapidly. Regarding the instances with 20 jobs, it can be noticed that AF dj and SC managed to solve all of them in short computational times, thus closing 3 open instances. Concerning the results on instances with 25 jobs, all 1200 instances have been solved by either AF dj or SC. Overall, SC performed better than AF dj on this group, as it managed to solve all instances but one.

Results on benchmark set 2
The second set of instances was proposed by Dunstall and Wirth (2005b). In their work, the authors developed heuristic algorithms and evaluated them by comparison with lower bounds from the literature. In Tables 6 and 7 we thus opted to compare the results of our best methods with the best lower and upper bounds of Dunstall and Wirth (2005b). Table 6 shows the results obtained for all 3600 instances with n = 40 within a time limit of 600 seconds, whereas Table 7 presents the aggregated results for a subset of 180 out of the 3600 instances with n = 80, solved within a larger time limit of 1 hour. In these tables, the best results by Dunstall and Wirth (2005b) are reported under the label D&W. Column gap(%) shows the average percentage gap for each method. Concerning the results from the literature, the gaps are computed with respect to the best lower and upper bounds available. Finally, columns lb_lp, lb and ub under the tag #best indicate how many times the corresponding method provided the best continuous relaxation (only for AF dj and SC), the best lower bound and the best upper bound, respectively. From Table 6, it can be noticed that AF dj and SC are both able to improve most of the best lower and upper bounds from the literature. Indeed, all such values were improved or at least equaled by either AF dj or SC. Out of a total of 3600 instances, AF dj and SC solved to proven optimality 1847 and 3395, respectively. This result indicates the superiority of SC over AF dj, due in part to the stronger LP relaxation provided by the SC formulation; indeed, SC always provided the best LP bound. The performance of SC seems to worsen when the number of families decreases (but with f > 1) and the number of machines increases. In turn, AF dj seems to perform better when the number of machines increases. Thus, one could say that these methods somehow complement each other.
In general, both methods provided very low average gaps (around 1.1% for AF dj and 0.1% for SC) within reasonable average execution times (around 5 minutes for AF dj and less than a minute for SC). By considering the best results of AF dj and SC together, we were able to solve to proven optimality 3490 out of the 3600 instances with n = 40. The results on the instances with n = 80 are provided in Table 7. Again, it can be noticed that AF dj and SC improve the best lower and upper bounds by Dunstall and Wirth (2005b). On these instances, AF dj solved 32 to proven optimality (30 of which have a single family of jobs). SC, in turn, was able to prove the optimality of 135 out of 180 instances. In addition, SC provided all the best continuous bounds and most of the best final lower and upper bounds. On average, for the instances with 80 jobs, AF dj provided a gap of 2.5% and SC a gap lower than 0.1%.

Results on benchmark set 3
This section presents the computational results for the benchmark instances by Liao et al. (2012). This set was also used by Tseng and Lee (2017) to evaluate the performance of their metaheuristic. We found some small numerical inconsistencies in the results reported in both works, so we opted not to compare our results with theirs. Thus, we only performed a comparative analysis between our two best methods, AF dj and SC. On this set, our methods obtained an average gap lower than 0.1%, with 4595 instances solved to proven optimality. AF dj solved all instances with n ≤ 25, while SC missed three of them. To obtain these results, AF dj needed an average computational time of 42 seconds; SC, in turn, required around 36 seconds on average. In total, of the 4800 instances considered, only 81 are left open.

Conclusions
In this work, the problem of scheduling jobs on identical parallel machines with family setup times to minimize the total weighted completion time, P |s_i| Σ w_j C_j, has been tackled by exact methods. Five novel formulations have been proposed to solve the problem, namely a one-commodity formulation (OC), three arc-flow formulations (AF fl, AF ap and AF dj) and a set-covering formulation (SC). Extensive computational experiments on benchmark instances from the literature have been performed, and our results have been compared with those of state-of-the-art methods. Among the proposed models, AF dj and SC showed the best performance, being able to deal with instances containing up to 80 jobs in reasonable computational times. These good results can be explained, in part, by the strong continuous bounds provided by these models.
The P |s_i| Σ w_j C_j proved to be a very challenging problem; nevertheless, the AF and SC formulations obtained relevant results and appear to be promising methods in this area. Our research is therefore now directed toward the application of AF and SC formulations to other machine scheduling problems involving different features, such as, e.g., release dates and unrelated parallel machines.