Using a lag-balance property to tighten tardiness bounds for global EDF

Several tardiness bounds for global EDF and global-EDF-like schedulers have been proposed over the last decade. These bounds contain a component that is explicitly or implicitly proportional to how much the system may be cumulatively lagging, in serving tasks, behind an ideal schedule. This cumulative lag is in turn upper-bounded by bounding each per-task component in isolation, and then summing the individual per-task bounds. Unfortunately, this approach leads to an over-pessimistic cumulative upper bound: it does not take into account a lag-balance property enjoyed by any work-conserving scheduling algorithm. In this paper we show how to obtain a new tardiness bound for global EDF by integrating this property with the approach used to prove the first tardiness bounds proposed in the literature. In particular, we compute a new tardiness bound for implicit-deadline tasks, scheduled by preemptive global EDF on a symmetric multiprocessor. According to our experiments, as the number of processors increases, this new tardiness bound becomes increasingly tighter than the tightest bound available in the literature, with a maximum tightness improvement of 29%. A negative characteristic of this new bound is that computing its value takes exponential time with a brute-force algorithm (no faster exact or approximate algorithm is available yet). As a more general result, the property highlighted in this paper might help to improve the analysis of other scheduling algorithms, possibly on different systems and with other types of task sets. In this respect, our experimental results also point out the following negative fact: existing tardiness bounds for global EDF, including the new bound we propose, may become remarkably loose if every task has a low utilization (ratio between the execution time and the minimum inter-arrival time of the jobs of the task), or if the sum of the utilizations of the tasks is lower than the total capacity of the system.


Introduction
Many time-sensitive applications have soft real-time requirements, i.e., tolerate deadline misses, provided that some appropriate service-quality requirement is met. Small-scale examples range from infotainment to non-safety-critical control systems, while large-scale examples range from financial to IPTV services. In many cases, a sufficient service-quality requirement is that an application-specific, maximum tardiness is guaranteed with respect to deadlines (Kenna et al (2011)). Meeting this requirement may even allow all deadlines to be met in cases where buffers can be used to compensate for fluctuations of job completion times.
Guaranteeing deadlines, or at least a bounded tardiness, to time-sensitive applications is complicated by the fact that the growing computational demand of these applications can now be met only using multiprocessors. Optimal multiprocessor scheduling algorithms, guaranteeing all deadlines to feasible task sets, have been devised by Anderson and Srinivasan (2004); Baruah et al (1996); Megel et al (2010); Regnier et al (2011). These algorithms are relatively complex, as guaranteeing all deadlines on a multiprocessor is not a trivial task. Sub-optimal but simpler algorithms are available too (Devi and Anderson (2005); Valente and Lipari (2005b); Erickson and Anderson (2012); Erickson et al (2014)). One such algorithm is Global Earliest Deadline First (G-EDF). G-EDF has been proved to guarantee a bounded tardiness to feasible task sets by Devi and Anderson (2005); Valente and Lipari (2005b). From empirical results, Bastoni et al (2010) have also inferred that G-EDF is an effective solution for soft real-time (SRT) tasks on (clusters of) up to 8 processors. Variants of G-EDF have been defined as well (Erickson and Anderson (2012); Erickson et al (2014)). In these variants, named G-EDF-like (GEL) schedulers, each job is assigned a fixed priority as in G-EDF, but, differently from G-EDF, this priority is not a function of only the deadline of the job. In more detail, the priority of each job is computed so as to optimize further goals in addition to guaranteeing a bounded tardiness.
Both G-EDF and GEL schedulers, besides being simpler than optimal scheduling algorithms, cause a lower overhead than the latter (apart from possible GEL schedulers with job priorities computed through complex formulas or algorithms). Finally, differently from optimal algorithms, G-EDF and GEL schedulers enjoy the job-level static-priority (JLSP) property, which is required in most synchronization solutions (Brandenburg (2011)).
On the downside, since G-EDF and GEL schedulers are not optimal, they may or may not succeed in meeting the time requirements of an application, depending on the maximum tardiness that they can guarantee to the tasks of the application. Maximum tardiness also determines buffer sizes where buffers are used to conceal deadline misses, and memory may be a critical resource in, e.g., embedded systems. As a consequence, tightness is an important property of tardiness bounds for these schedulers. In this respect, in this paper we compute a new tardiness bound for G-EDF and implicit-deadline tasks. This bound proved to be tighter than existing ones within the scope of our experiments. Additionally, although we focus only on this particular pair of scheduling algorithm and task model, we leverage a property that holds for any work-conserving scheduler, as discussed in the description of our contribution.

Related work
Initial tardiness bounds for G-EDF were computed by comparing G-EDF against an ideal schedule (Devi and Anderson (2005); Valente and Lipari (2005b)).¹ Devi and Anderson (2005) also proposed, without proof, an improvement over the original bound reported in the paper. The same authors subsequently proved this improvement in (Devi and Anderson (2008)).
A new technique for computing worst-case tardiness bounds for G-EDF was then proposed by Erickson et al (2010). Using this technique, named compliant-vector analysis (CVA), the authors obtained tardiness bounds that dominate the one proved in (Devi and Anderson (2005)). They also obtained bounds that dominate the one computed in (Devi and Anderson (2008)), by combining CVA with the same improvement proved in (Devi and Anderson (2008)). Erickson and Anderson (2012) then presented an alternative improvement over CVA, which they named PP Uniform Reduction Law. Finally, Erickson et al (2014) turned the computation of the smallest-possible tardiness bounds with CVA and the PP Uniform Reduction Law into a linear program. More precisely, the last two papers focused on job lateness, defined simply as the difference between completion time and deadline, but the provided results can of course also be used to compute bounds on the tardiness (defined, instead, as the maximum between 0 and the lateness).
To discuss the tightness of the bounds described so far, we can consider the following common, best-case definition: a tardiness bound for a scheduling algorithm S is tight, for a number of processors M, if there exists at least one task set such that, if the task set is scheduled with the algorithm S on M processors, then the maximum tardiness experienced by at least one task of the set is equal to the value of the bound for that task. Unfortunately, except for the case M = 2 (Devi and Anderson (2008)), existing tardiness bounds for G-EDF have not been proved to be tight, even with respect to this best-case definition. This leaves room for the existence of tighter bounds, which is exactly what we show, experimentally, in this paper.

Contribution
In this paper we show how to get a new tardiness bound for G-EDF, by integrating the approach used by Devi and Anderson (2008) with a balance property that G-EDF shares with any work-conserving scheduling algorithm. In particular, we compute such a bound for the following case: (1) implicit-deadline sporadic tasks scheduled by preemptive G-EDF, (2) a system made of M identical, unit-speed processors, and (3) no synchronization among tasks.
To compute this new bound, we use the same approach as in Devi and Anderson (2005, 2008); Valente and Lipari (2005b). As in these works, we compute a bound made of two components.

¹ Unfortunately, the proofs in (Valente and Lipari (2005b)) contain an error. As shown in an amended, but not peer-reviewed version of the paper (Valente and Lipari (2005a)), if the part of the proofs containing that error is removed, then the rest of the proofs still allow a tardiness bound to be proved. The latter bound, however, is larger than both the problematic bound in (Valente and Lipari (2005b)) and the bound in (Devi and Anderson (2005)).

The first component accounts for the fact that, after a new job is released, G-EDF may fail to fully utilize all the processors while executing the jobs scheduled before the new job and the new job itself. This may lower the number of jobs per second executed in parallel, i.e., the total speed of the system. In particular, the system may become slow enough, in executing jobs, to finish the new job later than its deadline.
In addition, exactly because of the above issue, there may be several late jobs at some point in time. Suppose that a new, unlucky job arrives at such a time instant. Some of the late jobs may have a higher priority (earlier deadline) than this new job. The new job may then suffer a tardiness caused not only by the above sub-optimal job-packing issue, but also by the need to wait for the completion of several late jobs before being executed. The second component accounts for this additional waiting.² The first component is quite easy to compute, whereas most of the paper is devoted to computing the second component. To this purpose, we turn the last qualitative consideration into the following quantitative relations. First, given the extra work that the system has to do to complete late jobs, we prove (trivially) that the time that the new job has to wait before being executed increases with this extra work. Then, we prove that this extra work grows, in its turn, with how much the system is cumulatively lagging, when the new job is released, behind an ideal schedule, or equivalently an ideal system, in which every job is completed by its deadline. In the end, the second component is proportional to this cumulative lag. Accordingly, we obtain the second component by computing an upper bound on this cumulative lag (more precisely, on the cumulative lag for a special subset of tasks, as we explain in Section 4.5). The peculiarity of the proofs reported in this paper lies in how we compute this upper bound.

² Actually, the situation is a little more complex, because: 1) some processor may become available to execute the new job even before all higher-priority late jobs have been completed, and 2) the execution of the new job may happen to be suspended and restarted several times (as may happen to any job). But, as we show formally through the lemmas proved in this paper (and as has already been proved in the literature), the essence of the problem is the same: the more remaining work the system has to do to complete higher-priority late jobs, the more time the new job may have to wait before being executed.

To highlight this peculiarity, we need to add the following piece of information: the cumulative lag is defined as the sum of individual per-task lags, where each per-task lag measures how much the system is lagging behind the ideal system in executing the jobs of the task. We get an upper bound on the cumulative lag through the same two steps as in Devi and Anderson (2005, 2008): first, we compute an upper bound on each per-task lag, and then we sum these per-task upper bounds. But, differently from Devi and Anderson (2005, 2008), we do not compute the upper bounds on per-task lags in isolation from each other. Instead, we prove and use the following lag-balance property: for each task, the bound on its lag contains a negative component proportional to the sum of part of the other contributing lags. This sort of internal counterbalancing brings down the value of the whole sum. Finally, as for a comparison against CVA analysis (Erickson et al (2010); Erickson and Anderson (2012); Erickson et al (2014)), in CVA the components of a special vector play a role similar to that of the lags contributing to Σ. But no balance property is used to upper-bound the sum of these components.
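To fix ideas, the lag-balance property gives each per-task bound the following schematic shape (the symbols A_i and β_i are placeholders of our own; the actual expressions are derived in Section 8):

$$
\mathrm{lag}_i \;\le\; A_i \;-\; \beta_i \sum_{\substack{k \in \tau' \\ k \neq i}} \mathrm{lag}_k ,
$$

so that, when the per-task bounds are summed to bound the cumulative lag, the negative components partially cancel the positive ones, instead of the per-task bounds simply adding up as in isolated per-task analyses.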
We tag the whole tardiness bound as harmonic, because of the relation, highlighted in Section 3, between the dominant term in the bound and a harmonic number. We could have computed an even tighter bound for task sets with a total utilization strictly lower than the system capacity, but this would have made formulas more complicated and proofs longer (see the comment after Lemma 10 for details). Finally, the value of the bound apparently remains the same regardless of whether the improvement proved in (Devi and Anderson (2008)) is applied, i.e., whether it is assumed that one of the lags contributing to the cumulative lag is not higher than the maximum job length. We do not investigate this issue further in this paper.
We evaluated, experimentally, the tightness of the harmonic bound and of the other bounds available in the literature. To this purpose, we simulated the execution of a large number of random task sets. According to our results with up to 8 processors, the bound obtained by combining CVA with the improvement proved in (Devi and Anderson (2008)), which we name CVA2, is the tightest one available in the literature. On the opposite end, the bound computed in (Devi and Anderson (2008)), named DA in our results, quickly becomes substantially looser than CVA2 as the number of processors M increases, until it is up to about 40% looser than CVA2 with M = 8. Nevertheless, the harmonic bound, obtained by integrating the lag-balance property with the approach used to compute DA, is always at least as tight as CVA2, and becomes increasingly tighter than CVA2 as M increases. In particular, the harmonic bound is from 18% to 29% tighter than CVA2 with M = 8.
On the opposite end, a negative characteristic of the harmonic bound is that its formula is quite complex, and a brute-force algorithm takes exponential time to compute the second component of the bound (fortunately, this component has to be computed only once per task set, as it is the same for all the tasks in a task set). The constants in the formula of the cost of the algorithm are however so small that computing the harmonic bound was feasible for almost every group of 1000 task sets considered in our experiments (Section 9). The only exceptions were some of the cases where using a tardiness bound is apparently not necessary (discussed below). In addition, the fact that in this paper we show only an exponential algorithm does not imply that polynomial-time, exact or approximate algorithms cannot be devised. Investigating such algorithms is outside the scope of this paper.
As a general result, the lag-balance property described in this paper might help to reduce pessimism in worst-case response times also with other scheduling algorithms, systems and types of task sets. In fact, the property is so general that it may well be possible to extend the harmonic bound to non-preemptive G-EDF or stochastic task-set models, since in both cases existing bounds are already computed using the same approach as the harmonic bound itself (Devi and Anderson (2008); Mills and Anderson (2010)). It would also be interesting to compare the harmonic bound with bounds for improved versions of G-EDF, such as G-FL (Erickson and Anderson (2012); Erickson et al (2014)).
Finally, our experiments also provide the following general, negative information: all the bounds considered in the experiments, including the harmonic bound, become quite loose when all tasks have a low utilization, or when the total utilization of the task set is lower than the total capacity of the system. Fortunately, as we discuss in detail in Section 9, in the first case tardiness bounds are apparently not very relevant, whereas the second case is exactly the one for which there is room for improvement for the harmonic bound.

Organization of this paper
In Section 2 we describe the system model. Then we report the harmonic bound in Section 3. In Section 4 we provide the minimal set of definitions needed to give a detailed outline of the proof of the bound. We provide such an outline in Section 5. Although the lag-balance property is the key property that allows us to compute the harmonic bound, the proof of the lag-balance property does not highlight the general characteristics of G-EDF and a multiprocessor system that enable us to prove this property. For this reason we devote Section 6 to the intuition behind this property, or, more precisely, behind the time-balance property, which is the preliminary property from which we derive the lag-balance property. The core of the paper then follows: after proving a set of preliminary lemmas in Section 7, we report the proof in Section 8. Finally we report our experimental results in Section 9.

Task and service model
In this section we introduce the basic notation used in the paper. All notation is also summarized in Table 1. To justify an equality or an inequality, we often write the reason why that relation holds on the equality/inequality sign. In most cases, we write just the number of the equation or lemma by which that relation holds (see, e.g., the first and last equalities in (2)). As for time intervals, we write [t_1, t_2], [t_1, t_2) or (t_1, t_2) to refer to all the times t such that t_1 ≤ t ≤ t_2, t_1 ≤ t < t_2 or t_1 < t < t_2, respectively. For brevity, we often use the notation

$$ f(t)\big|_{t_1}^{t_2} \;\equiv\; f(t_2) - f(t_1), $$

where f(t) is any function of time and any ordering between t_1 and t_2 may hold, i.e., t_1 may be either smaller than or larger than t_2. From this definition, it follows that

$$ f(t)\big|_{t_1}^{t_3} \;=\; f(t)\big|_{t_1}^{t_2} + f(t)\big|_{t_2}^{t_3}, $$

regardless of the ordering among the three time instants. Similarly,

$$ f(t)\big|_{t_1}^{t_2} \;=\; -\, f(t)\big|_{t_2}^{t_1}. $$

To simplify the notation, in summations over sets of tasks we use the symbol of the set of tasks to denote, instead of the set of tasks, the set of the indexes of the tasks in the set. Finally, to avoid special considerations for corner cases, we assume that, for any expression x and any pair of integers n_1 and n_2, ∑_{i=n_1}^{n_2} x = 0 if n_1 > n_2 (note that x may or may not be a function of the index i). In the following two subsections we describe the task and service models.
C_i — Worst-case computation time of the jobs of task τ_i
T_i — Period/minimum inter-arrival time of the jobs of task τ_i
r_i^j, d_i^j — Release time and absolute deadline of the j-th job of task τ_i
f_i^j — Completion time of the j-th job of task τ_i in the MPS

MPS — Real system, made of M unit-speed processors
M — Number of processors of the MPS
DPS — Ideal, reference system, made of one processor per task; the speed of each processor is equal to the utilization of the corresponding task
… — Amount of service (Subsection 4.3) given to task τ_i by the MPS/DPS up to time t
l_i^j — Length of the j-th job of task τ_i
L_i = max_j l_i^j — Maximum length of the jobs of task τ_i; numerically equal to C_i, but measured in service units
… — Blocking tasks, i.e., subset of tasks owning, at time t, pending jobs with a deadline earlier than or equal to that of Ĵ in the MPS
τ_DPS(Ĵ, t) ⊆ τ — Subset of tasks generated by the algorithm in Definition 6
τ(Ĵ, t) ⊆ τ — Extended set of blocking tasks, defined as …
… — Maximum total lag up to time b_k, with the function λ(p) defined by (24)
τ̄ — Subset of tasks in τ(Ĵ, b_k) with a positive lag at time b_k
G — Cardinality of τ̄

Task model
We consider a set τ of N tasks τ_1, τ_2, . . ., τ_N, with each task τ_i consisting of an infinite sequence of jobs J_i^1, J_i^2, . . . to execute. The j-th job J_i^j of τ_i is characterized by: a release (arrival) time r_i^j; a computation time c_i^j, equal to the time needed to execute the job on a unit-speed processor; a finish time f_i^j; an absolute deadline d_i^j, within which the job should be finished; and a tardiness, defined as max(0, f_i^j − d_i^j). Each task τ_i is in its turn characterized by a pair (C_i, T_i), where C_i ≡ max_j c_i^j and T_i is the minimum inter-arrival time of the jobs of the task, i.e., ∀j: r_i^{j+1} ≥ r_i^j + T_i. No offset is specified for the release time of the first job of any task. There is no synchronization between tasks. The deadline of any job J_i^j is implicit, i.e., equal to r_i^j + T_i. We say that a job is pending during [r_i^j, f_i^j]. Note that, according to this definition, a job is pending also while it is being executed.
We define as tardiness of a task the maximum tardiness experienced by any of its jobs. Finally, for each task τ_i we define its utilization as U_i ≡ C_i/T_i ≤ 1. We denote by U_sum the total utilization ∑_{i∈τ} U_i of the task set (note that, according to what was previously said, in the summation we use the symbol τ to denote the set of the indexes of the tasks in τ). Figure 1.A shows a possible sequence of job arrivals for a four-task set with total utilization 3. The tasks are characterized by the following pairs (C_i, T_i): (3, 4), (3, 4), (2, 3) and (5, 6). Every job is represented as a rectangle, with an up-arrow on top of its left side: the projection onto the x-axis of the left extreme of the rectangle represents the release time of the job, whereas the length of the base of the rectangle is the time needed to complete the job on a unit-speed processor. Finally, every job is followed by a down-arrow, representing its deadline. The first job of every task is released at time 0, except for J_3^1. The job J_2^3 is the only non-first job that is released later than the deadline of the previous job of the same task.
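As a quick check, the utilizations of the example task set can be computed directly from the (C_i, T_i) pairs:

```python
# Utilizations for the example task set of Figure 1.A.
# Each task is a (C_i, T_i) pair: worst-case computation time and period.
tasks = [(3, 4), (3, 4), (2, 3), (5, 6)]

utilizations = [c / t for c, t in tasks]  # U_i = C_i / T_i
u_sum = sum(utilizations)                 # U_sum = 3/4 + 3/4 + 2/3 + 5/6 = 3

print(utilizations, u_sum)
```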

Service model
We consider a symmetric multiprocessor made of M identical, unit-speed processors, and name this system MPS (MultiProcessor System). We assume that M < N, that U_sum ≤ M and that jobs are scheduled according to the global, preemptive EDF (G-EDF) policy: 1) each time a processor becomes idle, the pending (and not yet executing) job with the earliest deadline is dispatched on it; 2) if a job with an earlier deadline than some of the jobs in execution arrives when all the processors are busy, then the job with the latest deadline among those in execution is preempted, i.e., it is suspended, and the newly arrived job is started. Ties are broken arbitrarily. From this policy, the following property immediately follows.
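The dispatching policy above reduces, at every scheduling event, to a single selection step: the (at most M) pending, non-blocked jobs with the earliest deadlines run. A minimal sketch of this step (the job representation and the id-based tie-breaking are our own assumptions, since ties may be broken arbitrarily):

```python
def gedf_select(pending_jobs, M):
    """Jobs to execute under preemptive G-EDF: the (at most M) pending jobs
    with the earliest absolute deadlines. Each job is a (deadline, job_id)
    pair; sorting on the pair breaks deadline ties by job id."""
    return sorted(pending_jobs)[:M]

# With M = 3 processors and four pending jobs, the job with the latest
# deadline is the one left waiting (blocked, in the terminology below).
running = gedf_select([(4, "a"), (6, "b"), (3, "c"), (9, "d")], 3)
print(running)  # [(3, 'c'), (4, 'a'), (6, 'b')]
```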
Lemma 1 In the MPS, at all times the jobs in execution have the earliest deadlines among all pending jobs. Ties are broken arbitrarily.

We say that a task is being served while one of its jobs is being executed. In general, the execution of a job may be interrupted several times to grant the processor to other jobs with earlier deadlines. For any job J_i^j, we use the term portion as a short form to refer to any of the maximal parts of the job that are executed, without interruption, during [r_i^j, f_i^j]. We call start time of a portion the time instant at which the portion starts to be executed, and use the notation Ĵ ⊆ J_i^j to mean that Ĵ is a portion of a job J_i^j. The portion Ĵ is pending from time r_i^j to its completion. For brevity, hereafter we say deadline of Ĵ to refer to the deadline d_i^j of the job the portion belongs to. As can be deduced from this notation, Ĵ may also be the only portion of J_i^j, which happens if, once started, J_i^j is executed without interruption until it is finished. Accordingly, for brevity, in the rest of the paper we use the term portion to refer both to a proper contiguous slice of a job and to a whole job. To avoid ambiguity, we stress that with the term job we always refer to a whole job.
We say that a pending job portion is blocked if it cannot start to be executed. In our model a job portion can be blocked for only one of the following two reasons.
1. The first portion of a job J_i^j cannot be started before the preceding job J_i^{j−1} is finished. In this respect, we say that a pending job portion Ĵ ⊆ J_i^j is blocked by precedence at time t if a previous portion of J_i^j or any portion of J_i^{j−1} is still pending at time t. 2. If a pending portion Ĵ ⊆ J_i^j is not blocked by precedence at a given time t, then the only other reason why it cannot be started at that time is that all processors are busy serving portions of jobs with deadlines earlier than or equal to d_i^j. In this case, we say that the portion is blocked by priority at time t.
For the sake of clarity, we emphasize that only portions not blocked by precedence may be deemed blocked by priority. Finally, we often use the following phrase for brevity: we say just that a portion is blocked by priority, without specifying the time instant at which, or the time interval during which the portion is in such a state, to mean that the portion happens to be blocked by priority (at least) right before it starts to be executed.
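The two blocking conditions can be summarized as a simple classification rule (a sketch under our own abstraction of the scheduler state; the argument names are hypothetical):

```python
def blocking_state(prev_job_pending, earlier_portion_pending,
                   busy_higher_or_equal_priority, M):
    """Classify why a pending, not-yet-executing portion of J_i^j cannot
    start: 'precedence' if J_i^{j-1} or an earlier portion of J_i^j is
    still pending, 'priority' if all M processors are serving portions
    with deadlines <= d_i^j, or None if the portion can start."""
    if prev_job_pending or earlier_portion_pending:
        # Checked first: only portions not blocked by precedence
        # may be deemed blocked by priority.
        return "precedence"
    if busy_higher_or_equal_priority >= M:
        return "priority"
    return None
```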
Before discussing the above definitions further, we illustrate them through Figure 1.B, which shows the execution of the jobs depicted in Figure 1.A on an MPS with three unit-speed processors. For each processor, each rectangle represents the execution of a job portion on that processor, with the projection onto the x-axis of the left/right side of each rectangle representing the start/finish time of the portion. Finally, each (whole) job completion time is highlighted with a ⊤. As an example of job portions, when J_3^1 arrives at time 0.5, J_4^1 is suspended. The second and last portion of J_4^1 is then started at time 2.5, after being blocked by priority during [0.5, 2.5), and is finished at time 7. The only other portion blocked by priority is the whole job J_2^2, while J_2^3 is an example of a job portion blocked by precedence.
Finally, given a maximal time interval during which a task τ_i is continuously served, we define as chain (of portions of jobs) of task τ_i the sequence of portions of jobs of τ_i executed during that time interval. An example of a chain, in Figure 1.B, consists of J_2^2 and J_2^3, executed back-to-back during [5.5, 10]. As a degenerate case, a chain may be made of only one portion. In this respect, recall also that we use the term portion as a short form to refer to both an actual portion and a whole job. In addition, a job that starts to be executed right after being blocked by precedence necessarily belongs to a chain, as it starts to be executed right after the previous job of the same task. We define as head of a chain the job portion executed first in the sequence. We conclude this section with the following note.
Note 1 If a chain contains at least two job portions, then the head of the chain is necessarily the last portion of a job. In fact, since each portion is maximal, two consecutive portions in a chain cannot belong to the same job, and thus a new job of the same task starts to be executed right after the head.

The harmonic bound
In this section, we report, first, the harmonic bound, and then a brute-force algorithm to compute it. The formula of the harmonic bound is relatively complex. Therefore, to help the reader better understand the formula and get an idea of the order of magnitude of the bound, we also instantiate the formula for two extreme cases in which it becomes simpler. To the same purpose, we show, through a numerical example, how to compute the bound with the brute-force algorithm. Finally, we provide a brief analysis of the computational cost of the algorithm, plus some examples of the execution time of an implementation of the algorithm.
To write the harmonic bound, we start by introducing the following definition and symbols. The motivation for introducing the following symbols is only ease of notation, and none of them represents, alone, a relevant quantity (they are just components of relevant quantities). The only exception is Ω , for which we describe the quantity it represents when commenting on the harmonic bound.
Definition 1 For any set of tasks τ′ ⊆ τ, containing G tasks, and for all g ∈ {1, 2, . . . , G}, let g′ denote the index of the g-th task in τ′.
Using this notation, we can define the terms that appear in the formula of the bound, namely (5) and (6) (there, v′ is the index of the v-th task in τ′ according to Definition 1). We can now state the harmonic bound. In particular, first we report the bound, and then we show that its value is non-negative. After that, we explain where the components of the bound stem from. Finally, we analyze the bound in two extreme cases, and show why we name it harmonic.
Theorem 1 (Harmonic Bound) For every job J_i^j, the tardiness is bounded by the RHS of (7).

As stated in the following lemma, the right-hand side (RHS) of (7) is non-negative. Therefore it is not necessary to take the maximum between its value and 0 to turn it into a tardiness bound (recall that tardiness is a non-negative quantity).
Lemma 2 The RHS of (7) is non-negative, and is equal to 0 if and only if M = 1.
Proof First, since … holds in both (5) and (6), we have … . This inequality, together with U_{v′} > 0 and C_{v′} > 0, implies that all the terms in (5) and (6), including the sums, are non-negative. In particular, if M = 1, then all the sums are null. In fact, in this case |τ′| = 0 in both (5) and (6). □

In the introduction we said that the tardiness bound reported in this paper contains two components. The term ((M − 1)/M) C_i in the RHS of (7) is the first component mentioned in the introduction. That is, this term accounts for the maximum tardiness that can be caused by the fact that the MPS is sub-optimal in packing jobs on processors, and hence in utilizing the full speed of the system. For example, the MPS fails to utilize all the processors while the job J_4^1 is pending in Figure 1.B, and (the second portion of) J_4^1 ends up being completed after its deadline. By properly breaking jobs into smaller portions and scheduling these portions in a more clever way, it would have been possible to fully utilize all processors and complete J_4^1 by its deadline (Anderson and Srinivasan (2004)).
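For a concrete feel of the first component, consider the example of Figure 1.B: with M = 3 and C_i = 5 (task τ_4), the term (M − 1)/M · C_i evaluates to 10/3.

```python
def first_component(M, C_i):
    """First component of the harmonic bound: (M - 1) / M * C_i.
    It vanishes for M = 1, consistently with Lemma 2."""
    return (M - 1) / M * C_i

print(first_component(3, 5))  # 10/3 for task tau_4 of the example
print(first_component(1, 5))  # 0.0: on a uniprocessor this component is null
```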
Instead, the quantity Ω is the second component of the bound we mentioned in the introduction. As we already stated, this component stems from the following issue: exactly because of the above sub-optimal job-packing problem, the MPS may be already late in executing other higher-priority jobs when a new job arrives. By Lemma 1, the MPS may have to complete part of these late jobs before one processor becomes available to execute the new job. The component Ω accounts for the additional delay caused by this fact. In particular, as we also already said, given an ideal system that completes every job by its deadline, this additional delay is proportional to how much the MPS may be cumulatively lagging behind the ideal system, in terms of amount of service received by tasks, when a new job arrives. Therefore, we obtain Ω by computing an upper bound to this cumulative lag.
As a final consideration on Ω , we can note that Ω has the same value for every task. This may pave the way to even tighter bounds. In fact, for example, the bound computed by Erickson et al (2010) contains, for each task, a component equivalent to Ω , but whose value varies (also) with the parameters of the task. That bound is tighter than previous ones exactly because of this characteristic.
Finally, we consider two extreme cases to get an idea of the order of magnitude of the bound (7). In the first case, U_{g′} → 0 for all τ′ and for all g ≤ |τ′|. This implies M_g^{τ′} → M. Defining C ≡ max_{τ_i∈τ} C_i, and considering also that |τ′| ≤ M − 1 by (8), we get (10). In the opposite case, U_{g′} → 1, we get (13). As M increases, the RHS of (13) quickly becomes higher than the RHS of (10), because of the term C ∑_{g=2}^{M} 1/g. In particular, as M increases, this term becomes the dominant one in the RHS of (13). In this respect, the sum ∑_{g=2}^{M} 1/g is equal to the harmonic number H_M minus 1, while the whole term C ∑_{g=2}^{M} 1/g is an upper bound on the quantity to which the sum ∑_{g=1}^{|τ′|} … in (5) and (6) tends as U_{g′} → 1. We name the bound (7) harmonic after these facts.
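The relation to harmonic numbers can be made concrete with a small computation (C and M are free parameters here):

```python
def harmonic_number(M):
    """H_M = sum_{g=1}^{M} 1/g."""
    return sum(1 / g for g in range(1, M + 1))

def dominant_term(C, M):
    """C * sum_{g=2}^{M} 1/g = C * (H_M - 1), the term that dominates the
    RHS of (13) as M grows; it grows roughly like C * log M."""
    return C * (harmonic_number(M) - 1)

# H_M grows logarithmically, so doubling M adds only about C * log(2):
for M in (2, 4, 8, 16):
    print(M, dominant_term(1.0, M))
```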
It is easy to show that the RHS of (13) grows less than linearly with M (a coarse upper bound to the order of growth of the RHS of (13) is log M). Similarly, one can see numerically that the RHS of (7), and in particular its component Ω, grows less than linearly with M as well. The small slope of the RHS of (7) is the key property by which the harmonic bound proves to be tighter than existing ones in our experiments. As we highlight in Section 9.1, such a small slope follows from the fact that, differently from the literature, we use the lag-balance property when computing an upper bound to the cumulative lag of the MPS with respect to an ideal system.
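The sub-linear growth of the harmonic term can be checked numerically. The sketch below is an illustration we add here (the function name and the sample values of M are ours, not part of the original experiments): it computes C·∑_{g=2}^{M} 1/g = C·(H_M − 1) and compares its growth with log M.

```python
import math

def harmonic_term(C, M):
    # C * sum_{g=2}^{M} 1/g = C * (H_M - 1), the dominant term in the RHS of (13)
    return C * sum(1.0 / g for g in range(2, M + 1))

# H_M - 1 is upper-bounded by ln M, so the term grows only logarithmically with M
for M in (2, 8, 64, 1024):
    print(M, harmonic_term(1.0, M), math.log(M))
```

Doubling M adds a roughly constant amount to the term, which is the numerical signature of logarithmic growth.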

A brute-force algorithm for computing the harmonic bound
The hardest part in computing the value of the harmonic bound is of course computing Γ and Ω. A brute-force algorithm for computing these values is the following. First, compute Γ. To this purpose, generate, as a first step, all the possible subsets τ′ ⊆ τ of ⌈U_sum⌉ − 1 tasks. Then, for every subset τ′ generated, compute the value of the sum in (5) for all the possible orderings of the tasks in τ′ (i.e., all the |τ′|-tuples containing all the tasks in τ′). In fact, for a given subset τ′ and for every g ≤ |τ′|, the value of the sum in (5) may change depending on which task is considered to be the g-th task in τ′ (this point should become clearer from the example that we give in a moment). As a final step, take the maximum value of the sum obtained from the previous steps and multiply it by M to get Γ. Once Γ has been computed, use the same approach to compute Ω, taking into account that, this time, all the subsets of at most ⌈U_sum⌉ − 1 tasks need to be generated (and not only the subsets of exactly ⌈U_sum⌉ − 1 tasks).
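The enumeration structure of this brute-force algorithm can be sketched as follows. Since the arguments of the max functions in (5) and (6) are not reproduced here, the parameter `arg` is a placeholder for them: the caller passes a function that evaluates the relevant sum on an ordered tuple of tasks. Everything else (names, signature) is ours, for illustration only.

```python
from itertools import combinations, permutations
from math import ceil

def brute_force_max(tasks, usum, arg, exact_size=True):
    """Enumerate ordered tuples of tasks and return the maximum of `arg`.

    `tasks` is a list of (C_i, T_i) pairs; `arg` is a placeholder for the
    argument of the max function in (5) or (6), evaluated on an ordered
    tuple of tasks. With exact_size=True only subsets of exactly
    ceil(usum) - 1 tasks are generated (as for Gamma); otherwise all
    subsets of at most ceil(usum) - 1 tasks are generated (as for Omega).
    """
    k = ceil(usum) - 1
    sizes = [k] if exact_size else range(1, k + 1)
    best = float("-inf")
    for size in sizes:
        for subset in combinations(tasks, size):
            # For a given subset, the value of the sum may change with the
            # order of the tasks, so all orderings must be tried.
            for ordering in permutations(subset):
                best = max(best, arg(ordering))
    return best
```

With `exact_size=True`, Γ is then obtained by multiplying the returned maximum by M.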
This algorithm can be refined so as to reduce, when possible, the number of subsets to generate. To introduce this improvement, consider that, given two subsets of tasks τ′_A and τ′_B, it is not necessary that τ′_A = τ′_B holds for the two subsets to yield the same value for the max functions in (5) and (6). In fact, a sufficient condition is that |τ′_A| = |τ′_B|, and, for every g = 1, 2, . . ., |τ′_A|, both the g-th task in τ′_A and the g-th task in τ′_B are characterized by the same pair (C_i, T_i). As a consequence, to compute the max functions in (5) and (6) it is enough to restrict the search (and thus the generation of subsets) to any maximal set of subsets such that, for each element of the set, there is no other element for which the above condition holds.
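The refinement above can be sketched by keying each ordered tuple on its sequence of (C_i, T_i) pairs and evaluating only one representative per key; the function name and the event of iterating over indices are our own illustrative choices.

```python
from itertools import combinations, permutations

def representative_tuples(tasks, size):
    """Yield one representative per class of ordered task tuples.

    Two ordered tuples need not be evaluated separately if, position by
    position, their tasks have the same (C_i, T_i) pair: by the sufficient
    condition in the text, they yield the same value for the max functions
    in (5) and (6).
    """
    seen = set()
    for subset in combinations(range(len(tasks)), size):
        for ordering in permutations(subset):
            signature = tuple(tasks[i] for i in ordering)  # the (C_i, T_i) sequence
            if signature not in seen:
                seen.add(signature)
                yield signature

# Task set of the example below: three identical tasks plus one
tasks = [(4, 5), (4, 5), (4, 5), (3, 5)]
reps = list(representative_tuples(tasks, 2))
# Only 3 classes survive out of the 12 ordered pairs, matching the maximal
# set {(tau_1, tau_2), (tau_1, tau_4), (tau_4, tau_1)} used in the example
```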
We now show an example of the steps taken by this brute-force algorithm on a real task set (more in general, this also provides a practical example of how the harmonic bound can be computed). Consider a task set made of four tasks, characterized by the following pairs (C_i, T_i): τ_1 → (4, 5), τ_2 → (4, 5), τ_3 → (4, 5) and τ_4 → (3, 5). Accordingly, U_sum = 3 and hence ⌈U_sum⌉ − 1 = 2. All the possible sets τ′ in (5) therefore consist of two tasks, and the argument of the max function in (5), which we denote by Arg_Γ(τ_g, τ_h), can be rewritten as in (14). The whole formula can then be rewritten as in (15). According to the above considerations on how to reduce the number of subsets to generate, to compute the value of the RHS of (15) it is enough to consider any possible maximal set of pairs of tasks (τ_g, τ_h), such that each pair in the set may yield a different value for the argument of the max function in (15). Since τ_1, τ_2 and τ_3 have the same parameters, one such possible maximal set is: {(τ_1, τ_2), (τ_1, τ_4), (τ_4, τ_1)}.
Using this maximal set, we get, from (15), the value of Γ reported in (16). The next step is computing Ω. According to (6) and to the fact that ⌈U_sum⌉ − 1 = 2, the sets τ′ to consider may consist of one or two tasks. If τ′ contains just one task, i.e., if τ′ = {τ_g} for some τ_g ∈ τ, then the argument of the max function in (6) is a function of τ_g. Denoting this argument by Arg_Ω(τ_g), from (6) we get (17). In a similar vein, if τ′ contains two tasks, i.e., τ′ = {τ_g, τ_h}, then the argument of the max function in (6), say Arg_Ω(τ_g, τ_h), becomes (18). Accordingly, we can rewrite (6) as in (19). Similarly to the steps taken to compute Γ from (15), we can compute the value of the first argument of the external max function in (19) by considering any maximal set of tasks such that the value of Arg_Ω(τ_g) may vary over the elements of the set. A possible maximal set with this property is {τ_1, τ_4}. Instead, regarding Arg_Ω(τ_g, τ_h), a possible maximal set of pairs of tasks (τ_g, τ_h) such that the value of Arg_Ω(τ_g, τ_h) may differ over the elements of the set is: {(τ_1, τ_2), (τ_1, τ_4), (τ_4, τ_1)}. Expanding the RHS of (19) according to these two sets, i.e., with the same procedure as that by which we obtained (16) from (15), and computing the value of the resulting expression, we get Ω = 3.15. Replacing this value in (7), we get a tardiness upper bound of 5.82 for every job of τ_1, τ_2 or τ_3, and of 5.15 for every job of τ_4.

Computational cost of the brute-force algorithm
We now evaluate the worst-case computational cost of the brute-force algorithm defined in Section 3.1. To this purpose, we suppose that all the possible subsets τ′ of ⌈U_sum⌉ − 1 tasks have to be considered to compute Γ in (5), and that all the possible subsets τ′ of at most ⌈U_sum⌉ − 1 tasks have then to be considered to compute Ω in (6).
In such a case, the order of the number of subsets to generate, and therefore the order of the cost of the brute-force algorithm as a function of N, is exponential in N; the second inequality in the derivation of this order is one of the well-known simple upper bounds on the central binomial coefficient (Koshy (2008)). Finally, as an example of the actual execution time of the brute-force algorithm, we report figures obtained with the C implementation we made for the experiments (Experiment-scripts (2014)). To compute the bound for all the tasks in a worst-case task set, this C implementation takes, on an Intel Core i7-2760QM, from about 23 µs if M = 2 and N = 3, to about 17 s if M = 8 and N = 17. Devising more efficient algorithms to compute or approximate the harmonic bound is out of the scope of this paper.
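A quick way to gauge this cost is to count the ordered tuples the algorithm evaluates: for each subset size g it generates C(N, g) subsets and g! orderings of each, hence ∑_{g=1}^{k} C(N, g)·g! tuples with k = ⌈U_sum⌉ − 1. The helper below is an illustration of this count, not the paper's complexity bound.

```python
from math import comb, factorial

def tuple_count(N, k):
    # Number of ordered tuples evaluated when all subsets of at most
    # k = ceil(U_sum) - 1 tasks, in all their orderings, are generated
    return sum(comb(N, g) * factorial(g) for g in range(1, k + 1))

# The count explodes quickly with N and k, consistently with the measured
# jump from microseconds (N = 3) to seconds (N = 17)
print(tuple_count(3, 1), tuple_count(17, 7))
```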

Preliminary definitions
Unfortunately, our proof of the harmonic bound is rather long. To help the reader not get lost in such a long proof, we break it as much as possible into manageable pieces. To this purpose, we start by introducing, in this preliminary section, only the definitions needed to provide a detailed outline of the proof itself. Most of these definitions basically coincide with corresponding definitions in (Devi and Anderson (2008); Valente and Lipari (2005b)). Some concepts are however expressed in a slightly different way, to better comply with the proof strategy adopted in this paper.
Most of the following time intervals and quantities are defined as a function of a generic portion Ĵ, belonging to a job J_i^j. These are the portion and the job that we use as a reference, in the proof, to prove (7).

Busy interval and blocking tasks
The following time interval basically coincides with that defined in (Devi and Anderson (2008) [Definition 4]).
Definition 2 Given a job portion Ĵ ⊆ J_i^j blocked by priority and starting at time ŝ, we define as busy interval for the portion Ĵ the maximal time interval, starting not before time r_i^j and ending at time ŝ, during which all processors are constantly busy executing jobs with a deadline earlier than or equal to that of Ĵ.
For example, as shown in Figure 1.C, the busy interval for the second portion of J_4^1 is [0.5, 2.5). The busy interval for a job portion Ĵ is a superset of the time interval during which the portion is blocked by priority. In fact, by definition, while Ĵ is blocked by priority, all processors are constantly busy executing jobs with a deadline earlier than or equal to that of Ĵ. But the same may happen also while Ĵ is blocked by precedence.
To be able to block the execution of Ĵ, a task must own at least one pending job with a deadline earlier than or equal to d_i^j. This holds also for τ_i itself, if the portion Ĵ is blocked by precedence. In fact, the jobs of τ_i that precede J_i^j have an earlier deadline than d_i^j. To give a concise name to all the tasks that can cause a portion Ĵ to be blocked, we provide the following definition.
Definition 3 We define as set of blocking tasks τ^MPS(Ĵ, t) ⊆ τ for the portion Ĵ ⊆ J_i^j at time t, the set of tasks owning pending jobs, in the MPS and at time t, with deadlines earlier than or equal to d_i^j.
We add the superscript MPS in τ^MPS(Ĵ, t) to distinguish this set from the extended set of blocking tasks that we define in Section 4.5. We stress that, by Definition 3, the set τ^MPS(Ĵ, t) may include τ_i itself. To help visualize this set, Figure 1.D shows both the cardinality of τ^MPS(J_2^2, t) and the contents of the set (through the labels on the curve). Every job that is pending during [0, 5.5) happens to have a deadline earlier than or equal to d_2^2 = 8. Therefore τ^MPS(J_2^2, t) consists, at every time instant in [0, 5.5), of the set of all the tasks with pending jobs.
Unlike the busy interval, the set of blocking tasks for a portion Ĵ ⊆ J_i^j is intentionally well-defined also before time r_i^j. We discuss the reason behind this choice in Section 4.5.

Service unit, processor speed and amount of service
Our proof of the harmonic bound contains several lemmas about amounts of service received by tasks, or about differences between amounts of service. To measure the amount of service received by a task, we assume, first, that any job portion is made of a number of service units to execute, equal to the number of time units needed to execute the portion on a unit-speed processor (such as the processors of the MPS). We define this number as the length of the portion. Accordingly, we define the length l_i^j of a job J_i^j as the sum of the lengths of its portions, and, for the overall task τ_i, we define L_i ≡ max_j l_i^j. Of course, numerically, l_i^j = c_i^j and hence L_i = C_i (we use different symbols for lengths because the latter are measured in service units and not in time units). Finally, we (re)define the speed of a processor as the number of service units per time unit that the processor completes while executing a job.
As a next step to define a measure of the service received by a task τ_i in a system S, we denote by I_i^1(t), I_i^2(t), . . ., I_i^V(t) the maximal time intervals, in [0, t], such that, during each of these time intervals, τ_i is continuously served on the same processor of the system S. Finally, we denote by I_i(t) the union of these time intervals (Figure 1 provides an example). The amount of service received by τ_i during a generic interval I_i^v(t) is in its turn equal to the length of the time interval I_i^v(t), multiplied by the speed of the processor on which the jobs of τ_i are executed during I_i^v(t). Using this definition, we define the amount of service W_i^S(t) received by a task τ_i in a system S during [0, t] as the sum of the amounts of service received by the task during the time intervals I_i^1(t), I_i^2(t), . . ., I_i^V(t).

Dedicated-Processor System (DPS)
As we discussed in the introduction and in Section 3, we compute the component Ω in (7) by computing an upper bound to how much the MPS may be cumulatively lagging behind an ideal system when a new job arrives. In this subsection we introduce the ideal system by which we compute this cumulative lag. Such a system is equivalent to the PS schedule used, e.g., in (Devi and Anderson (2005, 2008)). Definition 4 Given a task set τ, we define as Dedicated-Processor System (DPS) an ideal system containing, for each task in τ, a dedicated processor that executes only the jobs of that task, at a constant speed equal to the utilization U_i of the task.
The execution time of a job J_i^j on the dedicated processor associated with τ_i is then l_i^j / U_i ≤ C_i / U_i = T_i. This property, plus the fact that each processor executes only the jobs of its associated task, guarantees that the DPS completes every job no later than its deadline.
As an example, Figure 1.E shows both the characteristics of the DPS associated to the task set in Figure 1.A, and the execution of the jobs on that DPS. The execution of the jobs is represented with rectangles, whose left and right extremes have the same meaning as for the MPS in Figure 1.B. The height of each rectangle is equal to the speed of the dedicated processor executing the job, with the same scale as in Figure 1.B. In other words, the area of the rectangle representing the execution of a job in the DPS is equal to the sum of the areas of the rectangles representing the execution of the portions of the same job in the MPS. Since, for each task τ_i, every job in Figure 1.A apart from J_2^3 has an execution time equal to C_i, the DPS completes every job apart from J_2^3 exactly on its deadline.
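The DPS completion times are simple to compute, since each dedicated processor serves one task at constant speed U_i = C_i/T_i. The sketch below (an illustration under our own naming, assuming jobs are served in release order) makes the deadline guarantee concrete: a job of full length C_i takes exactly C_i/(C_i/T_i) = T_i time units.

```python
def dps_finish_times(C, T, releases, lengths):
    """Completion times of one task's jobs on its dedicated DPS processor.

    The processor runs at constant speed U = C/T and serves the task's jobs
    in release order. For implicit-deadline sporadic jobs (length <= C,
    releases at least T time units apart) every job finishes by its
    deadline r + T.
    """
    U = C / T
    finish = 0.0
    out = []
    for r, l in zip(releases, lengths):
        start = max(r, finish)   # wait for the previous job of the task, if any
        finish = start + l / U   # execution time on a speed-U processor
        out.append(finish)
    return out
```

With C = 4, T = 5 and full-length jobs released every 5 time units, each job completes exactly on its deadline, mirroring the behavior described for Figure 1.E.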

Lag of a task
As we already said in the introduction, we define the cumulative lag of the MPS as the sum of per-task lags. We define the latter quantity as follows (the following definition is the counterpart of the lag as defined by Equation (8) in (Devi and Anderson (2008))). Definition 5 Given a task τ_h, a generic time instant t, a generic deadline d of some job, and the maximum index m ≡ max_{d_h^n ≤ d} n, i.e., the index m of the latest-deadline job of τ_h among those with a deadline earlier than or equal to d, we define lag_h^d(t) through the expressions (21) and (22), where l_h^n is the length of the n-th job of the task τ_h. To measure how much the MPS is lagging behind the DPS in serving τ_h, one would have probably expected the simpler and more natural difference W_h^DPS(t) − W_h^MPS(t). Instead, we consider a lower-bounded difference that takes into account the service provided to τ_h by the MPS only with respect to jobs with a deadline at most equal to d. We use this lower-bounded difference because it greatly simplifies several formulas and proof steps. As an example, one can continue the example made at the end of Section 4.3, using the deadline d_2^2 as a reference. We already said that we use a generic job J_i^j as a reference in the proof. As a consequence, we use the quantity lag_h^{d_i^j}(t) very often. For this reason, for ease of notation, we define in (23) the short form lag_h(t) ≡ lag_h^{d_i^j}(t), and, in the rest of the paper, we always use the short form lag_h(t) instead of the full expression lag_h^{d_i^j}(t). In particular, we refer to the quantity lag_h(t) also when we use the phrase lag of τ_h at time t. Finally note that, though the terms tardiness and lag may be somehow interchangeable in natural language, the concept of tardiness, as defined in Section 2, must not be confused with the concept of lag, as defined in (22) (and measured in service units).

Total lag
As we already explained in Section 3, we compute Ω by computing an upper bound to the cumulative lag of the MPS with respect to the DPS. We now define this cumulative lag precisely. By Lemma 1, given the last portion Ĵ of the job J_i^j in (7), the only jobs that determine the delay with which Ĵ is completed are the jobs with a deadline earlier than or equal to d_i^j. Accordingly, by Definition 3, the natural set of tasks to consider for computing the cumulative lag that influences the tardiness of J_i^j is τ^MPS(Ĵ, t). Unfortunately, considering just this set of tasks further complicates proofs. Instead, we get simpler proofs if we define the cumulative lag as a function of a more artificial superset of τ^MPS(Ĵ, t). This extended set may also contain some tasks that do not belong to τ^MPS(Ĵ, t). To define this extended set, we start from the following intermediate set.
Definition 6 Given a job portion Ĵ ⊆ J_i^j blocked by priority and a generic time instant t, we define the set of tasks τ^DPS(Ĵ, t) as the set generated by the following algorithm. Initial state. At system start-up the set is empty, i.e., τ^DPS(Ĵ, 0) = ∅. Entry (insertion) of a task. A task enters τ^DPS(Ĵ, t) when it exits τ^MPS(Ĵ, t), if the lag of the task is non-positive. More formally, if t_e is a generic time instant at which a task τ_h exits τ^MPS(Ĵ, t_e), but such that lag_h(t_e) ≤ 0 holds, then τ_h enters τ^DPS(Ĵ, t_e). Exit (extraction) of a task. A task exits τ^DPS(Ĵ, t) if its lag becomes strictly positive. More formally, if t_e is a generic time instant at which lag_h(t) becomes strictly positive, then τ_h exits τ^DPS(Ĵ, t_e). For example, according to Figure 1.D, the task τ_3 is the first task to exit τ^MPS(J_2^2, t). It happens at time 2.5, because the task stops having pending jobs in the MPS. But, according to Figure 1.E, the task still has a pending job in the DPS at time 2.5. Thus lag_3(2.5) ≤ 0, and hence τ_3 enters τ^DPS(J_2^2, t) at time 2.5. The first task to exit τ^DPS(J_2^2, t) is τ_2, right after time 4, because lag_2(4^+) > 0. Finally, it is pointless to consider the contents of the set τ^DPS(J_2^2, t) after the start time, 5.5, of J_2^2. Using the set τ^DPS(Ĵ, t), we can define the set of tasks that we use to define the cumulative lag.
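The entry and exit rules of Definition 6 can be sketched as a small event-driven simulation. The event feed (tuples of time, task, event kind, and lag value) is a simplification we introduce for illustration, with hypothetical lag values loosely patterned on the example above.

```python
def dps_set_membership(events):
    """Track the contents of tau^DPS over time.

    `events` is a chronological list of (time, task, kind, lag) tuples, where
    kind is "exit_mps" (the task exits tau^MPS at `time`, with the given lag)
    or "lag_positive" (the task's lag becomes strictly positive at `time`).
    Returns the set contents after each event.
    """
    members = set()
    history = []
    for time, task, kind, lag in events:
        if kind == "exit_mps" and lag <= 0:
            members.add(task)      # entry rule: non-positive lag on exit from tau^MPS
        elif kind == "lag_positive":
            members.discard(task)  # exit rule: lag becomes strictly positive
        history.append((time, frozenset(members)))
    return history

# Hypothetical events echoing the example: tau_3 exits tau^MPS at time 2.5
# with non-positive lag; tau_2 enters and then leaves right after time 4
hist = dps_set_membership([(2.5, "tau3", "exit_mps", -1.0),
                           (4.0, "tau2", "exit_mps", 0.0),
                           (4.0, "tau2", "lag_positive", 0.1)])
```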
Definition 7 We define as extended set τ(Ĵ, t) of blocking tasks for a job portion Ĵ ⊆ J_i^j blocked by priority, the union τ^MPS(Ĵ, t) ∪ τ^DPS(Ĵ, t).
We can now define formally the cumulative lag of the MPS. We use the name total lag for the exact quantity that we use in the proof, to distinguish it from the informal quantity mentioned so far.
Definition 8 We define as total lag for a job portion Ĵ ⊆ J_i^j blocked by priority, the sum ∑_{τ_h ∈ τ(Ĵ,t)} lag_h(t).
The above quantity is the counterpart of the total lag in (Devi and Anderson (2008) [Equation (11)]). Recall that, by (23), lag_h(t) = lag_h^{d_i^j}(t) in Definition 8. Figure 1.F shows the graph of the total lag for J_2^2 during [0, 5.5). As can be seen by comparing Figure 1.F with Figures 1.D, 1.B and 1.E, the total lag for J_2^2 decreases or is at most constant while |τ^MPS(J_2^2, t)| ≥ 3 holds, i.e., while |τ^MPS(J_2^2, t)| is greater than or equal to the number of processors M. As can be deduced from the proof of Lemma 16, the reason is that during these time intervals the MPS provides the tasks in τ^MPS(J_2^2, t), and hence in τ(J_2^2, t), with a total amount of service per time unit equal to M, while the total speed of the DPS can be at most equal to U_sum ≤ M.
In contrast, the total lag happens to grow while |τ^MPS(J_2^2, t)| ≤ 2 holds, i.e., during [3, 4), because the sum of the speeds of the busy dedicated processors in the DPS happens to be higher than 2, i.e., than the sum of the speeds of the busy processors in the MPS.
Like τ^MPS(Ĵ, t), the set τ(Ĵ, t) is intentionally well-defined also before the release time r_i^j of J_i^j. The purpose of having these sets well-defined also before r_i^j is to have the total lag well-defined before r_i^j too. This is instrumental in our approach for computing an upper bound to the total lag, because we compute this upper bound by following the evolution of the total lag also during the time interval [0, r_i^j]. In particular, denoting by ŝ the start time of Ĵ, we compute an upper bound to the total lag by induction on the beginning of every sub-interval of [0, ŝ] during which the total lag does not grow. In the next subsection we define the concepts we use to implement this strategy.

Growing-Lag and non-growing-Lag intervals
To compute an upper bound to the total lag, we consider, separately, the time intervals during which the total lag grows or does not grow, respectively. To give a shorter name to these intervals, we borrow from Devi and Anderson (2008) the capitalized term Lag as a synonym of total lag. Using this term, we give the following definition.

Definition 9
We define as growing-Lag or non-growing-Lag interval for a portion Ĵ ⊆ J_i^j blocked by priority and starting at time ŝ, every maximal time interval in [0, ŝ) such that the total lag for Ĵ (Definition 8) is, respectively, strictly increasing at all times in the interval or not increasing at any time in the interval.
During non-growing-Lag intervals the total lag may then even decrease. Figure 1.G shows non-growing-Lag intervals and, by difference, growing-Lag intervals for the job portions blocked by priority (for the moment do not consider the numbers above the non-growing-Lag intervals). The graph of the total lag for J_2^2 in Figure 1.F allows non-growing-Lag intervals to be immediately deduced for J_2^2. On the other hand, the non-growing-Lag interval for the second portion of J_4^1, namely [0, 2.5), can be deduced from the fact that the deadlines of the jobs released during [0, 2.5) are earlier than both d_4^1 and d_2^2. Then, according to Definition 3, during [0, 2.5) the sets of blocking tasks for the second portion of J_4^1 and for J_2^2 coincide. Hence, during [0, 2.5) the graph of the total lag for the second portion of J_4^1 coincides with the graph of the total lag for J_2^2 in Figure 1.F.

Definitions used in the computation of an upper bound to the total lag
As we already stated, one of the main steps of the proof is computing an upper bound to the cumulative lag, i.e., to the total lag introduced in Definition 8. We compute this bound by induction on the start times of non-growing-Lag intervals. To this purpose, we order non-growing-Lag intervals, globally, by start times, as detailed in the following definition.
Definition 10 Consider the union of all the non-growing-Lag intervals for all the portions blocked by priority. Consider then the global sequence of the start times of these non-growing-Lag intervals, i.e., the union of the sequences of the start times of the non-growing-Lag intervals for each portion blocked by priority. Given this global sequence of start times, we define as k-th non-growing-Lag interval, with k ≥ 1, the non-growing-Lag interval whose start time is the k-th one in the global sequence. If two non-growing-Lag intervals start at the same time, we assume that the tie is broken arbitrarily.
The numbers reported above non-growing-Lag intervals in Figure 1.G are an example of a possible valid ordering. Hereafter we use the symbols b_k and f_k to denote the start and finish times of the generic k-th non-growing-Lag interval [b_k, f_k). Note that we assume non-growing-Lag intervals to be right-open intervals, in accordance with the fact that a busy interval for a portion is a right-open interval, and, as we prove in Lemma 19, terminates at the same time instant as the last non-growing-Lag interval for the portion.
To implement our proof by induction of an upper bound to the total lag, we also use a helper quantity Λ(k). To define this quantity, we define, firstly, the maximum total lag λ(k) for the k-th non-growing-Lag interval as in (24), where Ĵ_k is the job portion for which the total lag at the start time b_k of the k-th non-growing-Lag interval is maximum, among the portions whose last non-growing-Lag interval is the k-th one (and where the deadline considered is that of the job Ĵ_k belongs to). In (24) we cannot use the short expression lag_h(b_k) as in Definition 8, because the job to which Ĵ_k belongs may not be J_i^j as in Definition 8. As an example of λ(k), consider the third non-growing-Lag interval in Figure 1.G, i.e., the interval [b_3, f_3) = [4, 5.5). According to Figure 1.B, this interval is the last non-growing-Lag interval only for J_2^2. Moreover, from the graph of the total lag for J_2^2 in Figure 1.F and the values of the lags computed in the example right after Definition 5, we have that the total lag for J_2^2 is equal to 1.16 at time b_3 = 4. As a consequence, λ(3) = 1.16. Using the function λ(·), we define Λ(k) as the maximum among the values of λ(·) at times b_1, b_2, . . ., b_k, i.e., Λ(k) ≡ max_{1 ≤ p ≤ k} λ(p). (25) Figure 2 shows an example of how λ(k) and Λ(k) may vary with k. Note that, since Λ(k) is the maximum possible value of the total lag at the beginning of every non-growing-Lag interval starting in [0, b_k], it is easy to show that Λ(k) is more in general the maximum possible value of the total lag, for any job portion, at all times in [0, b_k]. Therefore, to compute an upper bound to the total lag for any possible job portion, we compute an upper bound to Λ(k) that holds for all k.
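In (25), Λ(k) is simply a running maximum of λ(·). A minimal sketch (with made-up sample values for λ, echoing λ(3) = 1.16 from the example):

```python
from itertools import accumulate

def capital_lambda(lams):
    # Lambda(k) = max_{1 <= p <= k} lambda(p): a running maximum, hence
    # monotonically non-decreasing even though lambda(k) may fluctuate
    return list(accumulate(lams, max))

# lambda(k) may fluctuate with k, as in Figure 2
lam = [0.4, 1.16, 0.9, 1.0, 1.3]
Lam = capital_lambda(lam)  # [0.4, 1.16, 1.16, 1.16, 1.3]
```

The running maximum makes the monotonicity claim of Figure 2 immediate: Λ(k) ≥ λ(p) for all p ≤ k.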
In particular, we achieve the latter goal by computing an upper bound to λ(k) that holds for all k, and, in more detail, we get an upper bound to λ(k) by computing an upper bound to ∑_{τ_h ∈ τ(Ĵ,b_k)} lag_h(b_k) that holds for any job portion Ĵ ⊆ J_i^j for which the k-th non-growing-Lag interval is the last non-growing-Lag interval (recall (23)).

Fig. 2: Possible graphs of λ(k) and Λ(k): λ(k) may fluctuate with k, whereas Λ(k) is monotonically non-decreasing. In particular, for all k and for all p ≤ k, Λ(k) ≥ λ(p).

A non-trivial aspect related to this step is that τ(Ĵ, b_k) may contain up to all the N tasks in τ. Fortunately, exploiting the fact that the sum ∑_{τ_h ∈ τ(Ĵ,b_k)} lag_h(b_k) is of course at most equal to the sum of only its positive terms, we can upper-bound it by considering far fewer tasks. In particular, we can consider only the following set of tasks.
Definition 11 Assuming that Ĵ ⊆ J_i^j is one of the job portions for which [b_k, f_k) is a non-growing-Lag interval, we define τ̂ as the subset of τ(Ĵ, b_k) whose tasks have a positive lag at time b_k, i.e., as in (26). To simplify the notation, we have not included any dependence on k or Ĵ in the symbol of the set. In fact, in all lemmas we always use the set τ̂ only in relation to the k-th non-growing-Lag interval and the portion Ĵ. From (26), we have the inequality (27). It is then enough to compute an upper bound to the RHS of (27) to get an upper bound to the total lag at the beginning of the generic k-th non-growing-Lag interval. This simplifies the proofs, because τ̂ contains at most ⌈U_sum⌉ − 1 tasks, as stated in Lemma 20.
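The positive-terms bound behind (27) is straightforward to illustrate; the helper and the sample lag values below are our own, chosen so that the bound matches the λ(3) = 1.16 figure of the earlier example.

```python
def total_lag_upper_bound(lags):
    """Upper-bound a sum of per-task lags by the sum of its positive terms.

    `lags` maps each task in tau(J, b_k) to its lag at b_k. The sum over the
    whole set is at most the sum over tau_hat, the subset with positive lag.
    """
    tau_hat = {task: lag for task, lag in lags.items() if lag > 0}
    return sum(tau_hat.values()), tau_hat

# Hypothetical per-task lags: the full sum is -0.54, but the bound, 1.16,
# involves only the two tasks with positive lag
bound, tau_hat = total_lag_upper_bound(
    {"tau1": 0.8, "tau2": -0.5, "tau3": 0.36, "tau4": -1.2})
```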

Proof outline
In this section we outline the proof of the harmonic bound, which we then provide in Section 8. In particular, in this section we highlight the main lemmas. The proofs of these lemmas are then provided in Section 8, apart from that of the next preliminary lemma. Our proof of the harmonic bound starts from the following fact.

Lemma 3 If a job J_i^j is fully executed without interruption as soon as it is released, then its tardiness is null.
Otherwise, suppose that J_i^j is still fully executed without interruption, but starts to be executed only (right) after being blocked by precedence. In this case, if Ĵ is the head of the chain J_i^j belongs to, and Ĵ ⊆ J_i^e, with e < j, we have that (28) holds, where f_i^e and d_i^e are the completion time and the deadline of J_i^e.
Instead, suppose that the job J_i^j is blocked by precedence and then executed without interruption, as, e.g., J_2^3 in Figure 1.B. In this case, we prove (28) by induction, using the relations (29), where the last inequality follows from r_i^j ≥ d_i^{j−1}. As for the base case, if j − 1 = e, then (28) follows directly from (29). As for the inductive step, the inductive hypothesis is (30). In this case the thesis follows by substituting (30) in (29). □ From this lemma we can derive the following central corollary for the proof reported in this paper.
Corollary 1 A necessary condition for a job to experience a non-null tardiness is that either (a) the last portion of the job is blocked by priority, or (b) the whole job is executed after being blocked by precedence, and the head of the chain the job belongs to is in its turn blocked by priority.
Proof We prove the thesis by contradiction. Suppose that the condition is false. This implies that either the job starts as soon as it is released (because it is not blocked for any reason), or the job is blocked by precedence but the head of the chain it belongs to is a whole job that starts as soon as it is released (by definition of a head, this is the only possibility if the head is not blocked by priority; in fact the head of a chain cannot start right after being blocked only by precedence, because the task is not in service right before the head starts to be executed). In the first case, the tardiness of the job is 0 by Lemma 3, whereas in the second case it is the head that has a null tardiness by Lemma 3. But, by the second part of Lemma 3, the tardiness of the job is therefore 0 also in the second case. □

Fig. 3: Dependencies among main lemmas, and between the latter and Theorem 1.
According to Note 1 at the end of Section 2.2, the head of the chain in sub-condition (b) in Corollary 1 is necessarily the last portion of some job. In the end, considering this fact and sub-condition (a) in Corollary 1, to get an upper bound to the tardiness of any job, we need to focus only on last portions blocked by priority. This is exactly what we do in most of the proof of the harmonic bound, which consists of the following two main steps. To help the reader visualize the proof flow, we report in Figure 3 the dependencies among main lemmas, and between the latter and Theorem 1 (the figure also shows lemmas and sub-steps that we describe later in Section 8). Finally, the proofs of the lemmas reported in the following steps are provided in Section 8.
Step 1: compute an upper bound to the tardiness as a function of the total lag. As we discussed in Section 3, a job J_i^j may experience a positive tardiness, because the service scheme of the MPS is sub-optimal in packing jobs on the processors so as to fully utilize all processors. In particular, exactly because of this sub-optimal job packing, the MPS may also be cumulatively lagging behind in executing some jobs when J_i^j is released. Our first main step is to compute an upper bound to the tardiness as a function of these facts.
Lemma 4 For every job J_i^j, there exists at least one integer k for which the inequality (31) holds, where, with some abuse of notation, the denominator of the first fraction is measured in service units per time unit, whereas the coefficient (M − 1)/M is a pure number.
The quantity Λ(k)/M is an upper bound to the component of the difference f_i^j − d_i^j caused by the cumulative lag of the MPS, i.e., by the total lag of the MPS with respect to the DPS. Instead, as in the RHS of (7), the fixed component ((M − 1)/M)·C_i in the RHS of (31) accounts for the sub-optimal job packing of the MPS.
Step 2: compute an upper bound to the total lag. We bound the term Λ(k)/M in (31) by computing an upper bound to Λ(k) that holds for all k. This bound is reported in the next lemma.
then we have, for all k ≥ 1, the bound (33). We already encountered all the terms in (33) in Section 3, except for Λ and L_{g′}. In practice, these two terms are the counterparts of Γ and C_{g′} in (5) and (6). In more detail, Λ turns into Γ, and L_{g′} turns into C_{g′}, after replacing (33) in (31) (because the denominator in Λ(k)/M is a speed, and amounts of service are then turned into times). This replacement is actually our final sub-step for proving Theorem 1.
Regarding the computation of the bound (33), we already said in Section 4.7 that we compute it by computing an upper bound to ∑_{τ_g ∈ τ(Ĵ,b_k)} lag_g(b_k) that holds for any job portion Ĵ ⊆ J_i^j for which the k-th non-growing-Lag interval is the last non-growing-Lag interval. In the same section, we also pointed out that, thanks to (27), we achieve this goal by computing an upper bound to the sum ∑_{τ_g ∈ τ̂} lag_g(b_k).
The peculiarity of the proof reported in this paper lies in how we compute an upper bound to the sum ∑_{τ_g ∈ τ̂} lag_g(b_k). In this respect, we compute this upper bound with the same two steps as in (Devi and Anderson (2005, 2008)): first, we compute a bound to each lag in the sum, and then we sum these per-task bounds. But we do get a smaller value for the sum of the per-task bounds than that in (Devi and Anderson (2005, 2008)). The reason is that we compute each per-task bound using the lag-balance property, which we can now state more precisely as follows: the upper bound to lag_g(b_k) for each task τ_g ∈ τ̂ decreases as the sum of the lags of part of the other tasks in τ̂ increases. This internal counterbalance enables us to bring down the value of the sum of the per-task bounds over all the tasks in τ̂.
We state the lag-balance property formally in Lemma 24 in Section 8.3 (where we describe the sub-steps by which we implement Step 2). Instead, in the rest of this outline, we focus on the preliminary property from which we derive the lag-balance property. We call this preliminary property the time-balance property, and we focus on it here because, to avoid repeating similar steps twice, we use this property also to compute the bound (31) in Step 1. For this reason, proving this property is a preliminary Step 0 in our proof.
Step 0: prove the time-balance property

The next lemma states the time-balance property.
Lemma 6 (Time-balance property) Given: (a) a job portion Ĵ ⊆ J_i^j blocked by priority, … where l̂ is the size, in service units, of the part of the job J_i^j not yet executed in the MPS by time ŝ, and the quantity M − ∑_{v∈ϒ} U_v is measured in service units per time unit (i.e., it is a speed).
The above degrees of freedom in choosing t_1 and ϒ are key to using the time-balance property for proving the lag-balance property as a part of Step 2. Instead, we use the special case t_1 = b̂ and ϒ = ∅ to prove Lemma 4 in Step 1. Although the time-balance property is the cornerstone on which the harmonic bound depends, the proof of Lemma 6 does not highlight the core properties of G-EDF that enable the time-balance property, and hence the lag-balance property, to hold (the proof of Lemma 6 consists only of algebraic steps). For this reason, we devote the next section to the intuition behind the time-balance property.

Intuition behind the time-balance property
The time-balance property, and therefore the lag-balance property, follow from a general balance property that G-EDF shares with any work-conserving scheduler. Hereafter we refer to this property simply as the general property. In this section we first describe this property and give an intuitive explanation of why it holds. Then we show, again intuitively, the link between the general property and the time-balance property. The reader not interested in these aspects may skip to the proof machinery in Section 7.

The general property
Lemma 7 (General property) Given: (a) a job portion Ĵ ⊆ J_i^j that starts to be executed at time ŝ; (b) any time instant t_1, with t_1 ≤ ŝ and such that at least M tasks have pending jobs at all times in [t_1, ŝ); (c) any subset ϒ ⊆ τ not containing τ_i (including the empty set); and
1. the parameters of every task in τ and the arrival time of every job;
2. the amounts of service received by every task, by time t_1, in the MPS and in the DPS, respectively;
3. the service pattern of the DPS from time 0 on;
4. the amount of service received in the MPS by every task in τ \ ϒ during [t_1, ŝ);
and, assuming (for simplicity) that for every task τ_v ∈ ϒ the equality lag_v(t) = … holds at all times t in [t_1, ŝ), we have that the following implication holds: if all the quantities considered in items 1–4 above are fixed, and if any work-conserving algorithm is used to schedule jobs in the MPS, then the difference ŝ − d_i^j decreases as the sum ∑_{v∈ϒ} lag_v(ŝ) increases.

In keeping with the purpose of this section, we discuss and justify Lemma 7 intuitively. First, the general property is more general than the time-balance property because, differently from Lemma 6, in Lemma 7: (a) the scheduling algorithm can be any work-conserving algorithm, including, but not limited to, G-EDF; (b) t_1 is not constrained to belong to a busy interval; and (c) ϒ is any subset of tasks that does not contain τ_i. In particular, a task in ϒ may or may not be served during [t_1, ŝ).
To visualize the property, let Scenario A and Scenario B be two generic scenarios such that: for every parameter and quantity in items 1–4 of Lemma 7, the parameter or quantity has the same value in both scenarios; and in Scenario A, the value of the sum of the lags ∑_{v∈ϒ} lag_v(ŝ) is higher than in Scenario B.
The top half of Figure 4 shows the service provided by the MPS and the DPS during [t_1, ŝ) for a possible Scenario A. The execution of job portions is represented with rectangles. In particular, the job portions of the tasks in/not in ϒ are represented with filled/empty rectangles, apart from Ĵ, which is represented with a dark rectangle. Finally, the bottom half of Figure 4 shows the service provided by the MPS and the DPS in a possible Scenario B. Note that, according to item 3 in Lemma 7, the service of the DPS is identical in both scenarios.
As for the service of the MPS in Scenario B, recall that each of the parameters and quantities in items 1–3 of Lemma 7 has the same value in both scenarios. Therefore the only possibility for the sum ∑_{v∈ϒ} lag_v(ŝ) to be lower in Scenario B (with respect to Scenario A) is that, in Scenario B, the tasks in ϒ receive more total service in the MPS during [t_1, ŝ) than in Scenario A. Let this extra total service be equal to ∆ service units.

Since, in Scenario B, every task not belonging to ϒ receives the same amount of service in the MPS as in Scenario A (by item 4), it follows that the overall total work done by the MPS in Scenario B during [t_1, ŝ) is larger than the total work done by the MPS in Scenario A by exactly ∆ service units. As shown by Figure 4 for the case M = 3, this implies that in Scenario B the length of [t_1, ŝ), i.e., the difference ŝ − t_1, is larger by ∆/M time units with respect to Scenario A. The reason is that the MPS works at constant total speed M during [t_1, ŝ), which, in its turn, happens because at least M tasks have pending jobs at all times in [t_1, ŝ), and the scheduling algorithm is work-conserving.
Before completing the justification of the general property, it is worth stressing that, since the length of [t_1, ŝ) in Scenario B is larger than in Scenario A, the DPS too may happen to provide the tasks in ϒ with more total service in Scenario B than in Scenario A. This implies that the difference between the values of ∑_{v∈ϒ} lag_v(ŝ) in Scenario B and Scenario A may be less than ∆, whereas one may have expected, in the first place, that it was equal to ∆. The exact relation between ∆ and the values of ∑_{v∈ϒ} lag_v(ŝ) is (implicitly) taken into account correctly in the proof of Lemma 6. In any case, what matters for the general property to hold is that, if the sum of the lags ∑_{v∈ϒ} lag_v(ŝ) in Scenario B is lower than in Scenario A, then the length of [t_1, ŝ) is larger too.
To complete the justification of the property, suppose, on the opposite end, that we move from Scenario B to Scenario A, i.e., from an original generic scenario to an alternative scenario in which the value of every parameter and quantity in items 1–4 of Lemma 7 is again the same as in the original scenario, but in which the sum of the lags ∑_{v∈ϒ} lag_v(ŝ) is higher, and not lower, than in the original scenario. Reversing the above arguments and visualizing what happens through Figure 4, we deduce that now the difference ŝ − t_1 decreases by ∆/M. Putting the two cases together, and considering that, t_1 being fixed, the difference ŝ − d_i^j of course grows/decreases with ŝ − t_1, we get the general property as stated in Lemma 7.
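The ∆/M step above can be checked with a minimal numeric sketch (all values are hypothetical; the computation only restates the speed argument): while at least M tasks have pending jobs, a work-conserving scheduler keeps all M unit-speed processors busy, so the length of [t_1, ŝ) equals the total work done divided by M.

```python
# Numeric sketch (hypothetical values) of the interval-length argument:
# with all M unit-speed processors busy, interval length = total work / M.

M = 3                      # number of processors (assumed)
work_outside_upsilon = 12  # service to tasks not in Upsilon (fixed in A and B)
work_upsilon_A = 6         # service to tasks in Upsilon, Scenario A
delta = 3                  # extra service the Upsilon tasks get in Scenario B

len_A = (work_outside_upsilon + work_upsilon_A) / M
len_B = (work_outside_upsilon + work_upsilon_A + delta) / M

# The interval grows by exactly delta / M time units.
assert len_B - len_A == delta / M
```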

From the general to the time-balance property
The time-balance property, as stated in Lemma 6, is an instance of the general property in which: (1) the scheduling algorithm is G-EDF; (2) the time interval [t_1, ŝ) is a subset of the busy period for the job portion Ĵ; and (3) ϒ ⊆ τ_MPS(Ĵ, t_1). The second fact is essential for the RHS of (34) in Lemma 6 to contain a component proportional to the total lag. And the presence of this component is what allows us to use Lemma 6 for proving Lemma 4 in Step 1. On the other hand, the relation ϒ ⊆ τ_MPS(Ĵ, t_1) is instrumental in turning the time-balance property into a property (the lag-balance property) that allows us to compute a tighter upper bound to the total lag in Step 2.
Accordingly, the lag-balance property too, as reported in this paper in Lemma 24, happens to be an instance of a more general lag-balance property. In fact, by repeating roughly the same steps by which we derive the lag-balance property from the time-balance property (proof of Lemma 24), but applying these steps to the general property instead of the time-balance property, one would obtain a general lag-balance property that holds for any work-conserving scheduler.
On the opposite end, the time-balance property in Lemma 6 holds under more relaxed constraints than requiring that all the parameters and quantities in items 1–4 of Lemma 7 be fixed. In fact, for the RHS of (34) to decrease as ∑_{v∈ϒ} lag_v(ŝ) increases, it is enough that only all the other variables in the RHS of (34) are fixed.
The time-balance property in (34) holds under the above weaker constraints because the bound (34) is computed assuming that the DPS provides the most unfavorable service it could provide, i.e., the service that causes the difference ŝ − d_i^j to reach its maximum possible value. The reader will find the details in the proof of Lemma 6. In particular, in that proof we exploit (implicitly) the fact that, after assuming that the above worst-case DPS service occurs, the general property holds for any work-conserving scheduler, provided that only the parameters in item 1 of Lemma 7 and the sum ∑_{h∈τ(Ĵ,t_1)} lag_h(t_1) are fixed. Unfortunately, even if the service of the DPS is fixed, the general property becomes quite hard to visualize when only these two weaker invariants hold. Hence we do not illustrate the property for this case.

Proof machinery
As a support to the proof of the harmonic bound, in this section we provide the proof machinery containing all the properties we need to implement the steps described in the outline in Section 5. As with the definitions in Section 4, most properties basically coincide with corresponding properties of the machineries used in Devi and Anderson (2008) and Valente and Lipari (2005b). Some properties are slightly generalized with respect to those works, to prepare the ground for the use of the lag-balance property.
The machinery is a fairly rich collection of lemmas; therefore, although in the following subsections we try to describe, intuitively, the role of each property in the proof, the full picture and the contribution of every component will probably become completely clear only after all the pieces are put together in Section 8. Alternatively, the reader may skip directly to the proof in Section 8, and jump back to the lemmas below as needed.

Busy interval and blocking tasks
An obvious condition for Lemma 6 to hold in Step 0 is that the time instant t 1 defined in that lemma exists. The next lemma guarantees this fact.
Lemma 8 Given a job portion Ĵ ⊆ J_i^j blocked by priority and the release time r_i^j of J_i^j, the start time b̂ of the busy interval for Ĵ belongs to [r_i^j, d_i^j).
Proof First, b̂ ≥ r_i^j holds by Definition 2. Second, we prove the remaining inequality b̂ ≤ d_i^j by contradiction. To this purpose, suppose that the opposite holds, i.e., b̂ > d_i^j, and let Θ denote the set of M jobs in execution at time b̂. By Definition 2, all the jobs in Θ have a deadline earlier than or equal to d_i^j. Hence they have all been released before time d_i^j. This fact, plus the fact that all the jobs in Θ are still pending at time b̂ (recall that, according to our definition of pending, a job in execution is still pending), imply that there exists a minimal time instant t_s ∈ [r_i^j, d_i^j] such that all the jobs in Θ are pending throughout [t_s, b̂]. Therefore, since Θ contains M jobs, all with a deadline earlier than or equal to d_i^j, by Lemma 1 all the processors of the MPS would have been busy executing jobs with deadlines earlier than or equal to d_i^j throughout [t_s, b̂] (i.e., executing either jobs in Θ or jobs with an even earlier deadline). But, by Definition 2, this implies that the busy interval for Ĵ should have started at or before time t_s, which contradicts our assumption that b̂ > d_i^j (because t_s ≤ d_i^j).

□
We conclude this subsection with the following property, which we use in various lemmas to implement Step 2.
Lemma 9 Given a job portion Ĵ ⊆ J_i^j starting at time ŝ after being blocked by priority, all the tasks in τ_MPS(Ĵ,t) are in service (i.e., have their jobs in execution) at all times t in every sub-interval of [0, ŝ) during which |τ_MPS(Ĵ,t)| ≤ M holds, whereas only tasks in τ_MPS(Ĵ,t) are in service at all times t in every sub-interval of [0, ŝ) during which |τ_MPS(Ĵ,t)| > M holds.
Proof The thesis trivially follows from Lemma 1 and the fact that the tasks in τ_MPS(Ĵ,t) own, of course, the portions with the earliest deadlines among all the portions pending during any sub-interval of [0, ŝ). □

Dedicated-Processor System (DPS)
The next lemma states the main property of the DPS that allows us to prove Lemma 4 in Step 1. Informally, this property says that the maximum tardiness of a job grows with the difference between the total work that the MPS may do while the last portion of the job waits to be started, and the total work that the DPS has to do while executing the same job. The link between this property and Lemma 4 is that, as we also prove, this difference is in its turn upper-bounded by a quantity that grows with Λ(k).
Before reporting the lemma, we stress two points. First, instead of directly providing an upper bound to f_i^j − d_i^j, the lemma provides an upper bound to ŝ − d_i^j for a generic job portion Ĵ, where ŝ is the start time of Ĵ. This intermediate result will come in handy for proving the time-balance property in Step 0. Second, the lemma refers to two time intervals that start from a generic time instant t_1 ∈ [b̂, min{d_i^j, ŝ}) (where the upper limit min{d_i^j, ŝ} for t_1 is needed for both time intervals to be well-defined, as explained in the proof). As we already discussed in the description of Step 0, we exploit this degree of freedom to get a tighter upper bound to the total lag in Step 2.
Lemma 10 Given a job portion Ĵ ⊆ J_i^j blocked by priority, the busy interval [b̂, ŝ) for Ĵ, and any time instant t_1 ∈ [b̂, min{d_i^j, ŝ}), we have

ŝ − d_i^j ≤ ( ∑_{h∈τ} W_h^MPS(t_1, ŝ) − ∑_{h∈τ} W_h^DPS(t_1, d_i^j) ) / M,   (35)

where W_h^MPS(t_1, ŝ) and W_h^DPS(t_1, d_i^j) denote the service received by the task τ_h in the MPS during [t_1, ŝ) and in the DPS during [t_1, d_i^j], respectively.

Proof We prove the thesis by: 1) computing ŝ − t_1 as a function of the total work done by the MPS during [t_1, ŝ), i.e., of the sum ∑_{h∈τ} W_h^MPS(t_1, ŝ); 2) computing a lower bound to d_i^j − t_1 as a function of the total work done by the DPS during [t_1, d_i^j], i.e., of the sum ∑_{h∈τ} W_h^DPS(t_1, d_i^j); and 3) subtracting this lower bound from the value computed for ŝ − t_1 in the first step.
As for the first step, all the processors of the MPS are constantly busy throughout [t_1, ŝ), by definition of busy interval and because [t_1, ŝ) ⊆ [b̂, ŝ). Therefore the MPS works at constant total speed M during [t_1, ŝ). It follows that

ŝ − t_1 = ( ∑_{h∈τ} W_h^MPS(t_1, ŝ) ) / M,   (36)

where the denominator is of course measured in service units per time unit.
Regarding the second step, and considering that, by Definition 4, the maximum total speed of the DPS is ∑_{h∈τ} U_h, with ∑_{h∈τ} U_h ≤ M, we get

d_i^j − t_1 ≥ ( ∑_{h∈τ} W_h^DPS(t_1, d_i^j) ) / M.   (37)

Subtracting (37) from (36), we get the thesis.

□
Note that we could correctly replace M with ∑_{h∈τ} U_h in (37), and obtain a tighter upper bound than (35) for the case ∑_{h∈τ} U_h < M. We use M in (37) instead, to get simpler formulas.
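The subtraction in the proof of Lemma 10 can be illustrated with a tiny numeric sketch (the work totals are assumed values standing for the two sums in (36) and (37)):

```python
# Hypothetical numbers illustrating the subtraction in Lemma 10: the MPS
# runs at total speed M over [t1, s_hat), while the DPS runs at total
# speed at most M over [t1, d], so the lateness s_hat - d is bounded by
# the difference of the two amounts of work, divided by M.

M = 4        # number of processors (assumed)
W_mps = 40   # total MPS work in [t1, s_hat)  (assumed value)
W_dps = 28   # total DPS work in [t1, d]      (assumed value)

s_minus_t1 = W_mps / M          # step 1: exact, all processors busy
d_minus_t1_lb = W_dps / M       # step 2: lower bound, DPS total speed <= M
lateness_ub = s_minus_t1 - d_minus_t1_lb

assert lateness_ub == (W_mps - W_dps) / M
```

Using the actual DPS speed ∑ U_h instead of M in step 2, as the note above observes, would only raise the lower bound and thus tighten the result.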

Lag
The next lemma provides us with two sufficient conditions for the quantity lag_h(t) to be equal to just the difference between the amounts of service received by the task τ_h in the DPS and in the MPS. We use this property in almost all the lemmas related to lags.
Lemma 11 Given a job portion Ĵ ⊆ J_i^j, if a task τ_h belongs to τ_MPS(Ĵ,t) at time t, or, more generally, if only jobs of τ_h with a deadline earlier than or equal to d_i^j have been executed by time t, then

lag_h(t) = W_h^DPS(t) − W_h^MPS(t).   (38)

Proof We start by proving the thesis for the more general case, where the MPS has not yet executed any job of τ_h with a deadline strictly later than d_i^j by time t. This implies W_h^MPS(t) ≤ ∑_{n=1}^m l_h^n, with m defined as in Definition 5. Then (38) holds from (21). Therefore, (39) follows. As for the special case where τ_h belongs to τ_MPS(Ĵ,t) at time t, by Definition 3 the task has at least one pending job with a deadline earlier than or equal to d_i^j at time t. This implies that by time t the MPS has not yet executed any job of τ_h with a deadline strictly later than d_i^j. Then (38), and hence (39), hold also in this case.

□
The next lemma tells us that, given a portion Ĵ ⊆ J_i^j, the lag lag_h(t) of a task belonging to τ_MPS(Ĵ,t) does not increase while the task is served in the MPS. We use this property in two other lemmas in Step 2.
Lemma 12 Given a job portion Ĵ ⊆ J_i^j, the lag lag_h(t) of a task τ_h decreases by at least (1 − U_i)(t_2 − t_1) service units in a time interval [t_1, t_2] during which the task continuously belongs to τ_MPS(Ĵ,t) and is continuously served in the MPS.
Proof During [t_1, t_2], the task receives an amount of service W_h^MPS(t_1, t_2) = t_2 − t_1 in the MPS, whereas the service given to the task in the DPS is at most U_i · (t_2 − t_1). In addition, since τ_h continuously belongs to τ_MPS(Ĵ,t), (38) holds by Lemma 11. Using these relations, we can write …

□
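The rate argument behind Lemma 12 can be sketched numerically (the utilization value below is assumed, and stands for the utilization term appearing in the lemma): a continuously served task receives service at speed 1 in the MPS, while the DPS serves it at a speed given by a utilization, so its lag drops at least at rate one minus that utilization.

```python
# Sketch (assumed values) of the lag-decrease rate in Lemma 12.

u = 0.4            # utilization term of the lemma (assumed value)
t1, t2 = 10.0, 15.0

mps_service = 1.0 * (t2 - t1)    # unit-speed processor, continuous service
dps_service_max = u * (t2 - t1)  # DPS serves the task at speed at most u

min_lag_decrease = mps_service - dps_service_max
assert abs(min_lag_decrease - (1 - u) * (t2 - t1)) < 1e-9
```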
The following two lemmas link the maximum lag of a task, at the start time of a generic job portion, with how late that portion is. In the first lemma, we assume that the length of the job the portion belongs to is maximal. As we explain in more detail in Section 7.7, this assumption helps us simplify proofs in Step 2.

Lemma 13 Given a job portion Ĵ ⊆ J_i^j blocked by priority, and a job portion Ĵ …
In the end, the last inequality holds for any ordering between s_h and d_h^l. We can now compute our bound to lag_h(ŝ) with the following derivations, where the first equality follows from Lemma 11 and the fact that τ_h belongs to τ_MPS(Ĵ, s_h): …

The last property of the lag of a task that we use in Step 2 concerns the possible values of this function at the time instants at which the task enters or exits the sets τ_MPS(Ĵ,t) and τ(Ĵ,t).
Lemma 15 Given a job portion Ĵ ⊆ J_i^j blocked by priority, we have that, if t_e is a time instant at which a task τ_e enters the set τ_MPS(Ĵ,t) or the set τ(Ĵ,t), then lag_e(t_e) = 0. If, instead, the task τ_e exits τ(Ĵ,t) at time t_e, then lag_e(t_e) > 0.
Proof As for the case of a task entering τ_MPS(Ĵ,t) or τ(Ĵ,t), according to Definitions 6 and 7, a task can enter τ(Ĵ,t) only by entering τ_MPS(Ĵ,t). Then, focusing on a task that enters τ_MPS(Ĵ,t) at time t_e, we note, first, that this cannot happen because of a job completion in the MPS. In fact, when a job is completed in the MPS, either the task has no more pending jobs in the MPS, or the next job to execute for the task has a strictly later deadline than the just-completed one. Second, if a task already has pending jobs in the MPS right before time t_e, then the task cannot enter τ_MPS(Ĵ,t) because of the arrival of a new job at time t_e either. In fact, the newly released job has a strictly later deadline than the pending ones.
In the end, the only possibility for a task τ_e to enter τ_MPS(Ĵ,t) at time t_e is that the task has no pending job in the MPS right before time t_e, and a new job of the task is released at time t_e. Since the DPS has certainly finished all previous jobs of the task by time t_e too (by Definition 4), it follows that W_e^DPS(t_e) = W_e^MPS(t_e). This implies lag_e(t_e) = 0 by Lemma 11.
As for the exit from τ(Ĵ,t), we consider two alternatives. First, the exit happens because the task τ_e exits τ_MPS(Ĵ,t) at time t_e. In this case, by Definitions 6 and 7, the only possibility for the task not to enter τ_DPS(Ĵ, t_e^+), and hence to exit also τ(Ĵ,t), is that lag_e(t_e) > 0. The second alternative is that τ_e exits τ(Ĵ,t) for a reason other than exiting τ_MPS(Ĵ,t) at time t_e. In this second case, the only possibility for the task to exit τ(Ĵ,t) at time t_e is that the task belongs to τ_DPS(Ĵ, t_e) and exits the latter set at time t_e. For this to happen, lag_e(t_e) > 0 must hold, again by Definition 6. □

Total lag
In Section 4.5 we highlighted that, according to Figures 1.D and 1.F, the number of tasks in τ_MPS(J_2^2, t) is lower than the total utilization of the task set while the total lag grows. This is a concrete example of the general property stated in the following lemma.

Lemma 16 If the total lag for a job portion Ĵ ⊆ J_i^j blocked by priority is strictly increasing at all t in …
Proof To prove the thesis, we start by noting that the total lag cannot increase when the set τ(Ĵ,t) changes. In fact, by Lemma 15, if a task enters τ(Ĵ,t), then its lag is zero, while if a task exits τ(Ĵ,t), then its lag is positive.
Consider then any sub-interval [t_1, t_2) ⊆ [t_a, t_b) such that neither the set τ(Ĵ,t) nor the set τ_MPS(Ĵ,t) changes at any time in [t_1, t_2). If τ(Ĵ,t) changes at time t_2, then, as noted above, the total lag cannot increase at time t_2. In addition, if only τ_MPS(Ĵ,t) changes at time t_2, then the total lag does not change at all, because the total lag is a continuous function in all time intervals during which τ(Ĵ,t) does not change (as lags are continuous functions). It follows that, if the thesis holds for any sub-interval [t_1, t_2) defined as above, then the thesis holds also for the whole time interval [t_a, t_b). Since τ(Ĵ,t) = τ(Ĵ,t_1) holds for all t ∈ [t_1, t_2), and defining, for brevity, …, the variation of the total lag during [t_1, t_2) is equal to … (45), where the last inequality follows from τ_MPS(Ĵ,t_1) ⊆ τ(Ĵ,t_1) and … As for the sum in the last line of (45), by Lemma 11 we have … (46). In addition, by Lemma 9, a number of processors at least equal to |τ_MPS(Ĵ,t_1)| is busy serving tasks in τ_MPS(Ĵ,t_1) during [t_1, t_2). Since every processor of the MPS has unit speed, it follows that … (47). Replacing (46) and (47) in (45), we get … (48). Since U_sum ≤ M, the only possibility for the RHS of (48) to be strictly positive is that |τ_MPS(Ĵ,t_1)| < U_sum. In this respect, since |τ_MPS(Ĵ,t_1)| is an integer, the last relation implies that |τ_MPS(Ĵ,t_1)| can be at most equal to the largest integer strictly lower than U_sum, i.e., that |τ_MPS(Ĵ,t_1)| ≤ ⌈U_sum⌉ − 1. □
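The final counting step of the proof can be sanity-checked numerically (the total utilization below is an assumed value): the total lag can be strictly increasing only while the number of served τ_MPS tasks stays strictly below U_sum, i.e., at most ⌈U_sum⌉ − 1.

```python
import math

# Numeric check (assumed values) of the condition in Lemma 16: the total
# lag can grow at rate at most U_sum - |tau_MPS|, so strict growth
# requires |tau_MPS| < U_sum, i.e. |tau_MPS| <= ceil(U_sum) - 1.

U_sum = 3.5  # total utilization (assumed), with U_sum <= M

def can_grow(n_mps_tasks):
    # rate upper bound from (48): U_sum minus the number of served tasks
    return U_sum - n_mps_tasks > 0

threshold = math.ceil(U_sum) - 1   # largest integer strictly below U_sum
assert can_grow(threshold)         # 3 < 3.5: growth still possible
assert not can_grow(threshold + 1) # 4 >= 3.5: growth ruled out
```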

An upper bound to the extra work done by the MPS with respect to the DPS
The following lemma states that the total lag is an upper bound to the difference between the total work done by the MPS and the total work done by the DPS during the two time intervals considered in Lemma 10. This result, combined with Lemma 10 itself, allows us to prove the bound (31) in Step 1. Actually, the lemma provides a relation between two quantities more general than just two total amounts of work: the relation concerns the total service given to any subset of tasks in τ. This generalization, plus the fact that, as in Lemma 10, the two time intervals of interest start from a generic time instant t_1 ∈ [b̂, min{d_i^j, ŝ}), is instrumental in exploiting the lag-balance property to get a tighter upper bound to the total lag in Step 2. The proof of the lemma is relatively long, so, for ease of exposition, we state the lemma first, and then: (a) introduce the idea behind the proof, (b) present the per-task bounds used to prove the lemma as a separate lemma, and (c) provide the proof.

and l̂ is the size, in service units, of the part of the job J_i^j not yet executed in the MPS by time ŝ.
To compute an upper bound to the left-hand side (LHS) of (49), we focus on a set of tasks with the following two properties: (1) it contains at least all the tasks, except for those in ϒ, that are served by the MPS during [t_1, ŝ]; and (2) for every task in the set, it is possible to lower-bound the service that the task receives in the DPS during [t_1, d_i^j] as a function of the service that it receives in the MPS during [t_1, ŝ]. As for the first property, to compute the value of the first sum in the LHS of (49) it is necessary to consider at least all the tasks served during [t_1, ŝ], except for those in ϒ. The second property, instead, allows us to lower-bound the value of the second sum in the LHS of (49) as a function of the first sum. In particular, it allows us to prove that the difference between the two sums is upper-bounded exactly by the RHS of (49). To define such a set, we start from the following superset.
Definition 12 Referring to the notation of Lemma 17, we define as τ̄ the union of the set of tasks served in the MPS during [t_1, ŝ] and the set τ(Ĵ, t_1).
The set we need is τ̄ \ ϒ, where ϒ is defined as in Lemma 17. In fact, as for the first property, the following equality trivially holds: … On the other hand, the second property follows from the fact that the superset τ̄ itself enjoys the second property, as stated in the following lemma.

Lemma 18 Given: (a) a portion Ĵ ⊆ J_i^j blocked by priority, (b) the busy interval [b̂, ŝ) for Ĵ, (c) any time instant t …
In addition, τ_i ∈ τ̄, and … where l̂ is the size, in service units, of the part of the job J_i^j not yet executed in the MPS by time ŝ.
Proof First, as we demonstrated at the beginning of the proof of Lemma 10, both time intervals [t_1, ŝ] and [t_1, d_i^j] are well-defined. In addition, J_i^j is pending at time t_1, by definition of t_1 and by Lemma 8. Therefore τ_i belongs to τ_MPS(Ĵ, t_1), and hence to τ(Ĵ, t_1). This implies that τ_i belongs to τ̄, because τ(Ĵ, t_1) ⊆ τ̄. Using this fact, we prove the rest of the thesis by considering, separately: (a) the tasks in τ̄ \ {τ_i} that receive some service in the MPS during [t_1, ŝ); (b) the task τ_i itself; and (c) the tasks in τ̄ \ {τ_i} that do not receive any service in the MPS during [t_1, ŝ).
As for the first subset, consider any task τ_h ∈ τ̄ \ {τ_i} for which some jobs are executed, completely or in part, by the MPS during [t_1, ŝ). These jobs have deadlines earlier than or equal to d_i^j. This implies that, by time t_1, the MPS has not yet executed any job of τ_h with a deadline later than d_i^j. Therefore, by Lemma 11 we have … Moreover, the fact that the jobs of τ_h executed during [t_1, ŝ) have deadlines earlier than or equal to d_i^j also implies that the DPS must have completed all of these jobs by time d_i^j. Of course, by time d_i^j the DPS may also have executed, completely or in part, other jobs of τ_h. In the end, the following inequality holds: … (54). Using the last two relations, we can write: … (55). Consider now τ_i. The DPS must of course have finished J_i^j by time d_i^j, whereas the MPS still has to execute at least l̂ service units of that job at time ŝ. Repeating the derivations in (55), but with the last inequality in place of (54), we get (52).
We are now left with the tasks in τ̄ \ {τ_i} whose jobs are not executed at all by the MPS during [t_1, ŝ). If τ_h is one such task, we have W_h^MPS(t_1, ŝ) = 0. In addition, if lag_h(t_1) ≥ 0 holds for the task, then W_h^DPS(t_1, d_i^j) + lag_h(t_1) ≥ 0 trivially holds. The last inequality implies that (51) holds as well, because W_h^MPS(t_1, ŝ) = 0.
On the other hand, in the case where lag_h(t_1) < 0 holds, we can consider that, by (23), (22) and (21), lag_h(t_1) ≥ W_h^DPS(t_1) − ∑_{n=1}^m l_h^n, i.e., |lag_h(t_1)| is at most equal to the sum of the sizes of the job portions, with a deadline earlier than or equal to d_i^j, that the DPS still has to execute at time t_1. This sum is in its turn at most equal to W_h^DPS(t_1, d_i^j), because during [t_1, d_i^j] the DPS must finish both these job portions and possibly other jobs of τ_h, released during [t_1, d_i^j) and with a deadline earlier than or equal to d_i^j. In the end, also in this case W_h^DPS(t_1, d_i^j) + lag_h(t_1) ≥ 0 holds, which, combined with W_h^MPS(t_1, ŝ) = 0, proves (51) again.

□
Of course τ(Ĵ, t_1), and hence τ̄, may not contain all the tasks that receive service in the DPS during [t_1, d_i^j]. Thus the bounds (51) and (52) may be loose in some scenarios. This is not a problem, however, because we are interested in worst-case scenarios. Using the relations proved so far, we can now prove Lemma 17.
Proof (Lemma 17) Considering that, by Lemma 18, τ̄ contains τ_i, we have … (56). From (56), we have that the thesis holds if ∑_{h∈(τ̄\τ(Ĵ,t_1))\ϒ} lag_h(t_1) ≤ 0 holds. We prove the latter inequality by proving that lag_h(t_1) ≤ 0 holds for every task τ_h in (τ̄ \ τ(Ĵ,t_1)) \ ϒ. To this purpose, we first prove, by contradiction, that τ_h cannot have pending jobs at time t_1. Suppose then that τ_h has some pending job in the MPS at time t_1. It follows that one or more of these pending jobs are executed, at least in part, during [t_1, ŝ). In fact, by definition of τ̄, for τ_h to belong to (τ̄ \ τ(Ĵ,t_1)) \ ϒ, the task must receive some service during [t_1, ŝ). But the deadlines of the jobs executed during [t_1, ŝ) have to be earlier than or equal to d_i^j, by Definition 2. This implies that τ_h belongs to τ_MPS(Ĵ, t_1), and therefore to τ(Ĵ, t_1) as well, by Definitions 3 and 7. This contradicts the fact that τ_h belongs to (τ̄ \ τ(Ĵ,t_1)) \ ϒ. Now, since τ_h has no pending job in the MPS at time t_1, W_h^MPS(t_1) ≥ W_h^DPS(t_1) holds. In addition, we can consider that, according to the above arguments, jobs of τ_h with a deadline earlier than or equal to d_i^j are executed during [t_1, ŝ). Since these jobs are released after time t_1, it follows that no job with a deadline strictly later than d_i^j may have been released by time t_1. Therefore, at time t_1 the DPS can have served only jobs with a deadline earlier than or equal to d_i^j. From this fact and the inequality W_h^MPS(t_1) ≥ W_h^DPS(t_1), it follows that lag_h(t_1) ≤ 0 by (22). □

Busy intervals are subsets of non-growing-Lag intervals
By comparing Figures 1.C and 1.G, we can see that the busy interval for each portion blocked by priority happens to be a sub-interval of the last non-growing-Lag interval for the same portion. This is not an accident, but a general property, as stated in the following lemma.

Lemma 19 The busy interval [b̂, ŝ) for a job portion Ĵ blocked by priority is a sub-interval of the last non-growing-Lag interval for that portion. In particular, if the last non-growing-Lag interval for Ĵ is …
Proof Since the last non-growing-Lag interval for Ĵ must finish by time ŝ by Definition 9, we have that, if (57) holds, then the first part of the lemma holds, i.e., [b̂, ŝ) ⊆ [b_k, f_k) and f_k = ŝ. To prove (57), we can consider that, by definition of busy interval, |τ_MPS(Ĵ, t_1)| ≥ M > ⌈U_sum⌉ − 1 holds for all t_1 in [b̂, ŝ). Therefore, by Lemma 16, (57) holds for all t_1 in [b̂, ŝ).

□
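The inequality M > ⌈U_sum⌉ − 1 used in this proof follows from U_sum ≤ M; a quick check with assumed values:

```python
import math

# Check (assumed values) of the inequality used in the proof of Lemma 19:
# since U_sum <= M, we have M > ceil(U_sum) - 1, so during a busy interval
# |tau_MPS| >= M always exceeds the growth threshold of Lemma 16, and the
# total lag cannot grow.

M = 4  # number of processors (assumed)

def exceeds_threshold(U_sum):
    # busy interval: at least M tasks pending, so |tau_MPS| >= M
    return M > math.ceil(U_sum) - 1

# Holds for any total utilization not exceeding M, including U_sum = M.
results = [exceeds_threshold(u) for u in (1.0, 2.5, 4.0)]
assert all(results)
```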
Thanks to the last lemma, the total lag at any time instant in a busy interval is not higher than the total lag at the beginning of the non-growing-Lag interval the busy interval belongs to. This is a necessary condition in our proof of an upper bound to Λ(k) in Step 2.

7.7 Reduced set of tasks used to upper-bound the total lag in Step 2

In this subsection we report the properties of the set τ̂ defined in Definition 11, plus several abbreviations and assumptions that help simplify the proof of an upper bound to Λ(k) in Step 2. We start with the following lemma.
Lemma 20 Suppose that the k-th non-growing-Lag interval is a non-growing-Lag interval for a job portion Ĵ ⊆ J_i^j blocked by priority.

Otherwise, defining as b̂_q the beginning of the growing-Lag interval for Ĵ that terminates at time b_k: (a) for every task τ_g in τ̂, there exists a minimal time instant s_g such that s_g ≤ b̂_q, and τ_g is served in the MPS at all times t in [s_g, b_k) and belongs to τ_MPS(Ĵ,t) at all times t in [s_g, b_k]; (b) denoting by Ĵ_g ⊆ J_g^l the head of the chain of τ_g that starts at time s_g, we have that s_g > r_g^l holds (r_g^l is the release time of J_g^l), and that Ĵ_g is blocked by priority; (c) the set τ̂ is made of at most ⌈U_sum⌉ − 1 tasks.
Proof As for item 1, if b_k = 0, the thesis holds trivially. Instead, we prove the thesis for k = 1 after proving item 2.a. Regarding item 2.a, consider any task τ_g ∈ τ̂. By Definitions 6 and 7, for τ_g to belong to τ̂ ⊆ τ(Ĵ, b_k) and, at the same time, have a positive lag at time b_k (as provided by (26)), τ_g must belong to τ_MPS(Ĵ, b_k), because τ_DPS(Ĵ, b_k) contains only tasks with a non-positive lag. In particular, by Definition 3, this implies that τ_g has at least one pending job at time b_k. Defining as t_g the minimum time instant such that [t_g, b_k] is a maximal time interval during which τ_g always has pending jobs in the MPS, from the previous property we have that this time instant exists (as a degenerate case, t_g = b_k). We prove item 2.a using t_g. In particular, we first prove that at all times t in the intersection [t_g, b_k) ∩ [b̂_q, b_k), the task τ_g is being served in the MPS and belongs to τ_MPS(Ĵ,t). Then we prove that t_g < b̂_q, and that this inequality, plus the previous property, implies item 2.a.
As for the first step, from the fact that τ g belongs to τ MPS (Ĵ, b k ), i.e., owns at least one pending job with a deadline earlier than or equal to d j i in the MPS at time b k , and the fact that, by definition of t g , τ g always has pending jobs in the MPS throughout [t g , b k ], it follows that τ g always has at least one pending job with a deadline earlier than or equal to d j i in the MPS throughout [t g , b k ] (because the jobs of any task are both released and executed in increasing-deadline order). But, still by Definition 3, this implies that τ g belongs to τ MPS (Ĵ,t) at all times t in [t g , b k ]. The latter property implies that τ g is continuously served during [t g , b k ) ∩ [b̂ q , b k ).

As for the second step, we consider first that, if t g < b̂ q , then there exists a time instant s g such that t g ≤ s g ≤ b̂ q and τ g is constantly in service during [s g , b k ). In fact, as a degenerate case, s g = b̂ q , because from the previous step and t g < b̂ q we have that τ g is constantly in service at least during [b̂ q , b k ). We then prove that s g exists by proving, by contradiction, that t g < b̂ q . Suppose that t g ≥ b̂ q holds. By definition of t g , τ g starts to have pending jobs at time t g . As a consequence, according to what we proved in the previous step, τ g enters τ MPS (Ĵ,t) at time t g , and therefore lag g (t g ) = 0 holds by Lemma 15. This finally implies lag g (b k ) ≤ 0 by Lemma 12, because τ g is constantly served, and continuously belongs to τ MPS (Ĵ,t), during [t g , b k ). But this is absurd by (26). In the end, the above-defined time instant s g exists.
The fact that τ g is constantly served during [s g , b k ) also implies that τ g has pending jobs throughout [s g , b̂ q ). In this respect, the jobs of τ g executed before time b̂ q of course have a deadline earlier than or equal to that of the job of τ g in execution at time b̂ q . As a consequence, since τ g belongs to τ MPS (Ĵ, b̂ q ), τ g continuously belongs to τ MPS (Ĵ,t) also during [s g , b̂ q ). This completes the proof of item 2.a.
We now prove item 2.b. Using the same arguments as above, we also have, again by Lemma 12 and since τ g is continuously served during [s g , b k ), that for τ g to have a strictly positive lag at time b k , the lag of the task must be strictly positive at time s g . For this to happen, by Lemma 14, s g > r l g must hold. In this respect, by definition of a head, the only possibility for the head Ĵ g to start after time r l g is that the head is blocked by priority.
As for item 2.c, from item 2.a we have that every task τ g of τ̂ constantly belongs to τ MPS (Ĵ,t) during [s g , b k ), and hence during [b̂ q , b k ). But, by Lemma 16, |τ̂| ≤ ⌈U sum ⌉ − 1 holds at all times during [b̂ q , b k ), which proves the item.

We can finally prove also the sub-case k = 1 of item 1. If k = 1, then the k-th non-growing-Lag interval is the very first non-growing-Lag interval. As a consequence, by Lemma 19, no job may have been blocked by priority before time b k . Thanks to item 2.b, τ̂ is therefore the empty set. ⊓ ⊔

In Figure 5 we show an example of the chains of the tasks in τ̂ in execution at the beginning b̂ q of the growing-Lag interval that precedes the k-th non-growing-Lag interval [b k , f k ). The MPS has 7 processors. Rectangles represent job portions executed on the processors as in Figure 1.B. We assume that the non-growing-Lag interval [b k , f k ) shown in Figure 5 is the last non-growing-Lag interval for a portion Ĵ ⊆ J j i blocked by priority. The portion Ĵ is depicted as a filled, dark rectangle, whereas filled, light rectangles represent the chains, of the tasks in τ̂, in execution at the beginning b̂ q of the growing-Lag interval that precedes [b k , f k ). Some rectangles are drawn with dashed lines during [b̂ q , b k ), meaning that the portions they represent may or may not be executed during [b̂ q , b k ). In fact, by Lemma 16, what matters for [b̂ q , b k ) to be a growing-Lag period is only that these portions have a strictly later deadline than d j i . In the figure we assume that |τ̂| = 4. Note that the figure shows the possible case where a chain happens to terminate exactly at time b k (on P 4 ). The figure also shows other information that we use later.
We now introduce some abbreviations and assumptions that we use in the rest of this paper to simplify the proofs. With the following assumptions we do not lose generality, because we just change the values of the indexes of the tasks in τ̂.

Assumption 1 For each task τ g ∈ τ̂, hereafter we refer to its chain in execution at time b̂ q (with b̂ q defined as in Lemma 20) as just its chain, and we denote by s g the start time of the chain. In addition, we suppose that τ̂ = {τ 1 , τ 2 , . . . , τ G } and that s 1 ≤ s 2 ≤ . . . ≤ s G holds, i.e., that the indexes of the tasks in τ̂ are ordered by the start times of the chains. In particular, for two consecutive tasks τ v and τ v+1 , both belonging to τ̂ and such that their chain heads start at the same time s v = s v+1 , we assume that the chain head of τ v has a deadline earlier than or equal to that of the chain head of τ v+1 . Finally, we also assume that the chain starting at time s g , with 1 ≤ g ≤ G, is executed on the g-th processor.
By Assumption 1 and item 2.c in Lemma 20, we have G = |τ̂| ≤ ⌈U sum ⌉ − 1. In Figure 5 the start times of the chains and the allocation of the chains on the processors reflect Assumption 1. Note also that each chain head starts in a time interval during which all the processors are busy. This happens because, by item 2.b of Lemma 20, the heads of the chains are all blocked by priority.
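The cardinality bound in item 2.c can be checked quickly in code; the function below is a hypothetical helper of our own (not from the paper's experiment scripts), with illustrative values of U sum:

```python
import math

def max_tau_hat_size(u_sum):
    """Upper bound on the size of the reduced task set from item 2.c of
    Lemma 20: the set contains at most ceil(U_sum) - 1 tasks."""
    return math.ceil(u_sum) - 1

# For instance, with a total utilization of 5.2, at most 5 tasks can belong
# to the reduced set; the Figure 5 example, with 4 tasks in the set on 7
# processors, is consistent with any total utilization above 4.
```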
Based on Assumption 1, we can prove the following general property on the sum of the variations of the lags of the tasks in τ̂. This property comes in handy in a number of proofs.
⊓ ⊔

As we already said in the description of Step 0, we use the time-balance property stated in Lemma 6 to compute a lag-balance property that, in its turn, allows us to get a tighter upper bound to the total lag. The following lemma reports both the subset ϒ g that we use to obtain the lag-balance property from the time-balance property, and a time instant t max g that allows us to use Lemma 6 with this subset, i.e., such that the hypotheses of Lemma 6 hold with ϒ = ϒ g after setting t 1 = t max g . Since it may not be easy to visualize the job portions and the time instants considered in the lemma and in its proof, we show an example before proving the lemma.

Lemma 22 Given a task τ g ∈ τ̂ with g > 1, and defining: Ĵ g ⊆ J l g as the head of the chain of τ g ; ϒ g ⊆ {τ 1 , τ 2 , . . . , τ g−1 } as any subset of the first g − 1 tasks in τ̂; b̂ v as the beginning, for each task τ v ∈ ϒ g ∪ {τ g }, of the busy interval of the job portion of τ v in execution at time s g ; and t max g ≡ max τ v ∈ϒ g ∪{τ g } b̂ v , we have that: (1) t max g ∈ [b̂ g , min{s g , d l g }); (2) every task τ v ∈ ϒ g belongs to τ MPS (Ĵ g , t max g ).

Suppose that the chains of the tasks in τ̂ in Lemma 22 are those shown in Figure 5, and that g = 3 and ϒ 3 = {τ 1 , τ 2 }. With these assumptions, the job portions of the tasks in ϒ 3 in execution at time s g = s 3 are the ones in execution on processors P1 and P2 at time s 3 .
Proof We start by proving item 1, i.e., that b̂ g ≤ t max g and t max g < min{s g , d l g } hold. The first inequality trivially holds by definition of t max g . As for the second inequality, we prove first that t max g < s g and then that t max g < d l g . Regarding t max g < s g , since the heads of the chains of the tasks in τ̂ are all blocked by priority, we have that b̂ v < s v for all τ v in ϒ g ∪ {τ g }. Therefore, by definition of t max g , t max g = max τ v ∈ϒ g ∪{τ g } b̂ v < max τ v ∈ϒ g ∪{τ g } s v = s g , where the last equality follows from Assumption 1.
Given a task τ v ∈ ϒ g , let Ĵ u be the portion of the task in execution at time s g , and let ŝ u and d̂ u be the start time and the deadline of Ĵ u . Figure 5 shows both Ĵ u and ŝ u assuming g = 3 and v = 2. Since Ĵ g is blocked by priority at least right before time s g (by item 2.b of Lemma 20), but Ĵ u is already in execution by time s g (time s 3 in Figure 5), we have that d̂ u ≤ d l g must hold, by Lemma 1, for Ĵ g not to preempt Ĵ u before time s g . Now let Ĵ v be the head of the chain Ĵ u belongs to, namely the portion of τ 2 starting at time s 2 in Figure 5. If d̂ v is the deadline of Ĵ v , then we also have that d̂ v ≤ d̂ u (because either Ĵ u = Ĵ v , or Ĵ v belongs to an earlier job of τ v than the one Ĵ u belongs to). Combining this inequality with d̂ u ≤ d l g , we get d̂ v ≤ d l g . Since the last inequality holds for any task τ v ∈ ϒ g , it follows that, denoting by d̂ v the deadline of the head of the chain of each task τ v ∈ ϒ g in execution at time s g , and defining d̂ g ≡ d l g , we have d̂ v ≤ d l g for all τ v ∈ ϒ g ∪ {τ g }. Considering also Lemma 8, this implies that b̂ v < d̂ v ≤ d l g for all τ v ∈ ϒ g ∪ {τ g }. As a consequence, by definition of t max g , t max g = max τ v ∈ϒ g ∪{τ g } b̂ v < d l g .

We now prove item 2, i.e., that every task τ v in ϒ g belongs to τ MPS (Ĵ g , t max g ). Consider the chain of a task τ v ∈ ϒ g in execution at time s g and the deadline d̂ u of the above-defined portion Ĵ u in execution at time s g . As shown in Figure 5 for g = 3 and v = 2, the set of the portions of the chain of τ v executed during [s v , s g ] ([s 2 , s 3 ] in the figure) consists of Ĵ u , plus possibly other portions executed before Ĵ u (if Ĵ u is not the head of the chain). The latter portions therefore belong to earlier jobs of τ v with respect to the job Ĵ u belongs to. As a consequence, since we already proved that d̂ u ≤ d l g , all these portions have a deadline earlier than or equal to d l g .
Thanks to the above property, and by the definition of τ MPS (Ĵ g ,t) (Definition 3), to prove item 2 we prove that at least one of these portions is pending in the MPS at time t max g . To this purpose, suppose first that t max g < s v (i.e., t max g < s 2 in Figure 5). In this case we can consider that, by definition, t max g ≥ b̂ v , and, by Lemma 8, b̂ v is at least equal to the release time of the job Ĵ v belongs to. As a consequence, the head Ĵ v is already pending (although still blocked by priority) at time t max g . As for the other alternative, namely t max g ≥ s v , we already proved, at the beginning of this proof, that t max g < s g also holds. In the end, t max g ∈ [s v , s g ) (namely, t max g ∈ [s 2 , s 3 ) in Figure 5), i.e., one of the portions of τ v executed during [s v , s g ] is executing, and thus pending (according to our definition of pending), at time t max g . Thus, in both alternatives, a portion with a deadline earlier than or equal to d l g is pending at time t max g .

⊓ ⊔
To further simplify proofs, we also make the following last assumption in the lemmas by which we compute an upper bound to the total lag.

Assumption 2 Given any job portion Ĵ ⊆ J j i blocked by priority, and: (a) assuming that the k-th non-growing-Lag interval is the last non-growing-Lag interval for that portion, (b) making Assumption 1, and (c) denoting, for every task τ g in τ̂, by J l g the job to which the head of the chain of τ g starting at time s g belongs, we assume that l l g = L g holds.
This assumption does not affect the correctness of the proof of the harmonic bound. When needed, we use the following lemma to prove this fact.
Lemma 23 Suppose that the k-th non-growing-Lag interval is a non-growing-Lag interval for a job portion Ĵ ⊆ J j i blocked by priority, and consider any task τ g ∈ τ̂. Let Ĵ g ⊆ J l g be the head of the chain of τ g starting at time s g . If the length of J l g is strictly lower than the maximum possible job length for the task, i.e., l l g < L g , then the start time ŝ of the portion Ĵ does not decrease if the job J l g is replaced with a job with a length equal to L g (without changing any other parameter).
Proof Let J g be the longer job with which the shorter job J l g is replaced. For brevity, we call true scenario the original scenario, and artificial scenario the one in which J l g is replaced with J g . Since this replacement changes neither the deadline of any job, nor the length of any job but J l g , it follows that, by Lemma 1, the schedule of all the jobs in the MPS during [0, s g ] is exactly the same in both scenarios.
Consider then the time interval [s g , b k ). By item 2.a of Lemma 20 (and as shown in Figure 5 for every task in {τ 1 , τ 2 , τ 3 , τ 4 }), in both scenarios the g-th processor is constantly busy serving task τ g during [s g , b k ). In addition, in both scenarios, all the parameters of all the jobs of all the tasks but τ g are the same. Thus, during [s g , b k ), the schedule of all the jobs of all the tasks but τ g is exactly the same in both scenarios. In particular, this implies that the value of b k is the same in both scenarios, because, by Lemma 16, the k-th non-growing-Lag period starts when at least one new task enters τ MPS (Ĵ,t), and τ g already belongs to τ MPS (Ĵ, b k ) by item 2.a of Lemma 20.
As a last step, we analyze what happens during [b k , ŝ]. To this purpose, we first define as f g the time instant at which the chain of τ g terminates. Since the parameters of all the jobs but J l g are the same in both scenarios, whereas the head Ĵ g is longer in the artificial scenario (because J g is longer than J l g and the schedule of the jobs of τ g is identical in both scenarios up to time s g ), it follows that the value of f g in the artificial scenario cannot be lower than in the true scenario. Consider for example the chain of τ 1 in Figure 5, and visualize what happens after increasing the size of the portion of τ 1 that starts at time s 1 .
We now consider two alternatives for the true scenario. The first is that f g ≥ ŝ. This is what happens, e.g., with the chain of τ 2 in Figure 5. As highlighted by the figure, since Ĵ is blocked by priority, and hence τ i cannot be in service right before time ŝ, τ g ≠ τ i holds. Moreover, recall that, in the artificial scenario, f g cannot be lower than in the true scenario. Finally, the parameters of all the jobs but J l g are the same in both scenarios. In the end, during [b k , ŝ] the schedule of all the job portions of all the tasks but τ g is exactly the same in both scenarios. Therefore ŝ has the same value in both scenarios.
Finally, consider the second alternative, i.e., f g < ŝ. This is what happens in Figure 5 for all the chains except for that of τ 2 . According to the above arguments, the amount of service received by every task by time b k is exactly the same in both scenarios. In addition, in the artificial scenario the value of f g cannot be lower than in the true scenario. Considering also that the parameters of all the jobs but J l g , which becomes longer, are the same in both scenarios, it follows that, in the artificial scenario, the MPS cannot have less work to do during [b k , ŝ] than in the true scenario. To visualize this fact, suppose that τ g = τ 1 in Figure 5, and imagine that, in the artificial scenario, the chain of τ 1 terminates at the same time as or later than in the true scenario. In the end, since the MPS works at constant total speed M during [b k , ŝ] in both scenarios, in the artificial scenario ŝ cannot be lower than in the true scenario. ⊓ ⊔

8 Proof of the harmonic bound
Using the machinery provided in Section 7, in this section we prove Theorem 1 by the steps outlined in Section 5 (and shown in Figure 3).

8.1 Step 0: prove the time-balance property

Step 0 consists in proving Lemma 6 (Section 5). For the reader's convenience, we sum up the statement of the lemma before providing the proof. Given: (a) the busy interval [b̂, ŝ) for a portion Ĵ ⊆ J j i , (b) any time instant t 1 ∈ [b̂, min{d j i , ŝ}), and (c) any subset ϒ ⊆ τ MPS (Ĵ,t 1 ) not containing τ i , including the empty set, we have the inequality of Lemma 6, where l̂ is the size, in service units, of the part of the job J j i not yet executed in the MPS by time ŝ.
Proof (Lemma 6) We start from (35) and split both the work done by the MPS and the work done by the DPS into two components: the total service provided to the tasks in τ \ ϒ, and the total service provided to the tasks in ϒ. This split helps the time-balance property emerge. We have then the relations in (61), where the inclusion in the last equality follows from ϒ ⊆ τ MPS (Ĵ,t 1 ) ⊆ τ(Ĵ,t 1 ).

The last two terms in the numerator of the last line of (61) enable the time-balance property to come into play. To this purpose, we first rewrite these terms as in (62). Now, from ϒ ⊆ τ MPS (Ĵ,t 1 ) it follows that the MPS can have executed, by time t 1 and for any task in ϒ, only jobs with a deadline earlier than or equal to d j i . The same then holds also by time ŝ, because, by Definition 2, the MPS can execute only jobs with a deadline earlier than or equal to d j i during [t 1 , ŝ). As a consequence, (63) follows by Lemma 11. Replacing (63) in the last line of (62), we get (64), where d e i denotes the deadline of J e i . Since the head is blocked by priority and, according to the final note in Section 2.2, is the last portion of the job J e i , the difference f e i − d e i can be upper-bounded by repeating the same steps (and hence obtaining the same upper bound) as in (68). The thesis then follows from Lemma 3. ⊓ ⊔

8.3 Step 2: upper-bound the total lag

Step 2 consists in proving Lemma 5. First we describe the strategy and the relations by which we prove the lemma, and then we report the main sub-steps by which we get to prove it. We prove Lemma 5 by induction. Denoting, for brevity, by B the RHS of (33), Lemma 5 states that Λ (k) ≤ B holds for all k. In this respect, as we show in Section 8.7, Λ (1) ≤ B is trivial to prove for any B ≥ 0.
Instead, regarding the inductive step, consider that, denoting by Ĵ k the portion for which the total lag at time b k is maximum, and by d̂ k the deadline of Ĵ k , we have (69). By (69), the implication (70) holds. The consequent in (70) is exactly what has to be proved, as an inductive step, to prove by induction that Λ (k) ≤ B holds for all k. As a consequence, we can prove that Λ (k) ≤ B holds for all k by proving, as a base case, that Λ (1) ≤ B, and, for the inductive step, the antecedent in (70), i.e., the implication (71).

To prove the implication (71), which holds for the job portion Ĵ k for which the total lag at time b k is maximum, we prove that the same implication holds for any job portion Ĵ ⊆ J j i blocked by priority and for which the k-th non-growing-Lag interval is the last non-growing-Lag interval. In particular, we use the relations (72), where lag g (s g , b k ) ≡ lag g (b k ) − lag g (s g ), to prove that, if the antecedent in (71) holds, then the consequent in (71) holds for the above generic portion Ĵ ⊆ J j i . In view of (72), we prove that ∑ h∈τ(Ĵ,b k ) lag h (b k ) ≤ B holds by, first, computing an upper bound to ∑ g∈τ̂ lag g (s g ), and then adding ∑ g∈τ̂ lag g (s g , b k ) to this bound. To get an upper bound to ∑ g∈τ̂ lag g (s g ), we compute a bound to each lag in the summation, and then sum these per-task bounds. We can now state, even more precisely, that the peculiarity of our proofs lies in how we compute a bound to each lag in the sum ∑ g∈τ̂ lag g (s g ).
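Schematically, the induction just described can be summarized as follows (our own rendering; the numbered relations (69)–(71) are those referenced above):

```latex
\underbrace{\Lambda(1) \le B}_{\text{base case}}
\;\;\wedge\;\;
\underbrace{\bigl(\Lambda(k-1) \le B \;\Rightarrow\; \Lambda(k) \le B\bigr)}_{\text{inductive step, i.e., the implication (71)}}
\;\;\Longrightarrow\;\;
\Lambda(k) \le B \quad \forall k \ge 1 .
```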
We take the following four sub-steps to get to prove Lemma 5. The first three sub-steps are also shown in Figure 3, to further help understand their role in the proof. The bounds computed in the first three sub-steps are all in open form, in that they contain at least one term whose value has not yet been computed (Λ (k − 1)). To simplify proofs, in all the next lemmas we assume that Assumption 2 holds, which implies, in particular, that the head of the chain of each task τ g starting at time s g belongs to a job of length L g . The proofs of the lemmas are provided from Section 8.4 onwards.

Sub-step 2.1: Compute an upper bound in open form to lag g (s g ) for τ g ∈ τ̂
As a first sub-step, we prove the following lemma, stating the lag-balance property.
Lemma 24 (Lag-balance property) If ϒ g ⊆ {τ 1 , τ 2 , . . . , τ g−1 } is any subset of the first g − 1 tasks in τ̂, and l̂ g is the size, in service units, of the head Ĵ g ⊆ J l g starting at time s g , then inequality (73) holds for the g-th task.

If ϒ g = {τ 1 , τ 2 , . . . , τ g−1 }, then the bound (73) to lag g (s g ) decreases as the sum of the first g − 1 addends in ∑ g∈τ̂ lag g (s g ) increases (in this respect, recall Assumption 1 in Section 7.7 regarding the indexes of the tasks in τ̂). This internal counterbalance among the lags of the tasks in τ̂ is the key property that allows us to get a tighter bound to the whole sum ∑ g∈τ̂ lag g (s g ), and hence to the total lag by (72).

Sub-step 2.2: Compute an upper bound to ∑ g∈τ̂ lag g (s g ) in open form
In this sub-step, we compute an upper bound to ∑ g∈τ̂ lag g (s g ) using Lemma 24. To simplify the notation, in this and in the next subsections we use the following definitions and properties. Note that M g is a special case of the more general quantity M τ ′ g in (4). In addition, recall that we denote by G the number of tasks in τ̂ (Assumption 1 in Section 7.7).

Lemma 25 The following inequality holds:

Sub-step 2.3: Compute an upper bound to the total lag in open form
As a penultimate sub-step, we compute an upper bound to ∑ h∈τ(Ĵ,b k ) lag h (b k ) by means of (72) and, in particular, by adding ∑ g∈τ̂ lag g (s g , b k ) to the bound (77).

Sub-step 2.4: Compute an upper bound to the total lag and to the tardiness in closed form
As a last sub-step, we prove first Lemma 5, i.e., that Λ (k) ≤ B holds for all k. Then we prove Theorem 1 by just replacing the bound Λ (k) ≤ B in (31).
8.4 Sub-step 2.1: Compute an upper bound in open form to lag g (s g ) for τ g ∈ τ̂

For the reader's convenience, we repeat the statement of Lemma 24 here. If ϒ g ⊆ {τ 1 , τ 2 , . . . , τ g−1 } is any subset of the first g − 1 tasks in τ̂, and l̂ g is the size, in service units, of the head Ĵ g ⊆ J l g starting at time s g , then inequality (79) holds for the g-th task.

Proof (Lemma 24) First, we derive a bound to s g − d l g , and then turn this bound into (79). Defining b̂ g as the beginning of the busy interval of Ĵ g and t max g as in Lemma 22, we have, by Lemma 22, that t max g ∈ [b̂ g , min{s g , d l g }) and ϒ g ⊆ τ MPS (Ĵ g , t max g ). This implies that Lemma 6 holds for Ĵ g after setting Ĵ = Ĵ g , J j i = J l g , b̂ = b̂ g , ŝ = s g , t 1 = t max g , ϒ = ϒ g and l̂ = l̂ g . We have then (80), where we used the general form (22) for the two lag functions. In fact, in both expressions the reference deadline is d l g , and not d j i as in (23).

As a first step to turn (80) into (79), we prove that the first sum of lags in the RHS of (80) is upper-bounded by Λ (k − 1). Let [b̂ g , s g ) and [b p , f p ) be, respectively, the busy interval and the last non-growing-Lag interval for Ĵ g . Since t max g ∈ [b̂ g , s g ), we have also that Lemma 19 holds for Ĵ g and t max g after setting Ĵ = Ĵ g , b̂ = b̂ g , ŝ = s g , t 1 = t max g and b k = b p . Replacing these equalities in (57) and writing explicitly the first argument d l g in the lag functions, we get (81).

We now prove that p ≤ k − 1, from which we derive that the RHS of (81) is upper-bounded by Λ (k − 1). By item 2.a of Lemma 20, s g ≤ b̂ q holds, where b̂ q is the beginning of the growing-Lag interval that terminates at time b k (see Figure 5). Since b̂ q < b k , it follows that s g < b k holds too. But, by Definition 2, b̂ g < s g , and, still by Lemma 19, b p ≤ b̂ g . Combining the last three inequalities, we have b p < b k , which implies p ≤ k − 1 by the ordering among busy intervals (Definition 10).
See for example Figure 5, showing a possible b p in case g = 3. We have then that the RHS of (81) is upper-bounded by Λ (k − 1).

As for the other sum, ∑ v∈ϒ g lag v d l g (s g ), recall that ϒ g is a subset of τ̂ by Assumption 1. Using the deadlines d l g and d j i of J l g and J j i (J j i is the reference job in the definition of τ̂), we define m(v) ≡ max { n : d n v ≤ d l g } and m ′ (v) ≡ max { n : d n v ≤ d j i }, i.e., we define m(v) and m ′ (v) as the indexes of the latest-deadline jobs of τ v among those with a deadline earlier than or equal to, respectively, d l g and d j i . Since d l g ≤ d j i holds by Definition 3 and item 2.a of Lemma 20, it follows that m(v) ≤ m ′ (v) holds for all v in ϒ g . Using this inequality, we can write the corresponding bound on the sum. Using all the inequalities proved so far, we get (84).

We can now compute our bound to lag g (s g ) from (84). Considering that: 1) by Assumption 2, we replace J l g with a job of length L g if l l g < L g ; 2) τ g belongs to τ MPS (Ĵ, s g ) by item 2.a of Lemma 20; and 3) s g > r l g by item 2.b of Lemma 20, we have that Ĵ g satisfies the conditions of Lemma 13 after setting h = g. As a consequence, (79) follows. ⊓ ⊔

To prove the upper bound to ∑ g∈τ̂ lag g (s g ) stated in Lemma 25, we use the following (algebraic) properties of the coefficients M g .

Lemma 27
The following two relations hold:

In particular, the last inequality implies, after setting g = Q,

As we already said, we prove this bound by adding ∑ g∈τ̂ lag g (s g , b k ) to the bound (77). As can be seen, the only difference between the RHS of (99) and the RHS of (77) is the absence of the term −M G+1 ∑ G g=1 U g ∆ g / (M g M g+1 ) in the RHS of (99). To prove that this term cancels from the RHS of (77) after adding ∑ g∈τ̂ lag g (s g , b k ), we use the following property.

Lemma 28
The following inequality holds:

The above inequality holds for purely algebraic reasons, and its proof is relatively long. We report this proof in the appendix so as not to break the main flow. We can now prove Lemma 26.

Proof (Lemma 26)
We prove the thesis in two steps. First, we substitute, in (72), the sum ∑ g∈τ̂ lag g (s g ) with the RHS of (77), and the sum ∑ g∈τ̂ lag g (s g , b k ) with the upper bound below. Second, we prove that the sum of the RHS of (77) and this upper bound is smaller than or equal to the RHS of (99). As for the upper bound to ∑ g∈τ̂ lag g (s g , b k ), we can write the relations in (101). Using (101), we can write (102). ⊓ ⊔

8.7 Sub-step 2.4: Compute an upper bound to the total lag and to the tardiness in closed form

In this last sub-step we prove, firstly, Lemma 5, i.e., that Λ (k) ≤ B holds for all k, with B equal to the RHS of (33). To achieve this goal, we start by proving (71), i.e., the implication that lets the inductive step (Λ (k − 1) ≤ B ⇒ Λ (k) ≤ B) hold. In particular, as we already said, we prove (71) by focusing on a generic job portion Ĵ ⊆ J j i for which the k-th non-growing-Lag interval is the last non-growing-Lag interval. In this respect, the problem with computing a quantity for which (71) holds (after replacing Ĵ k with Ĵ) is that, according to (78) and depending on the value of Λ (k − 1), the sum ∑ h∈τ(Ĵ,b k ) lag h (b k ) may be higher than Λ (k − 1). Fortunately, as we prove in the next lemma, and still according to (78), there exists a saturation value, which we denote as Λ , such that (103) holds. Therefore, intuitively, if Λ (k) reaches or exceeds the threshold Λ , then it cannot grow any more with k, until it becomes again lower than or equal to Λ . This is, informally, the property we use to prove (71). In the next lemma we prove that such a threshold Λ is equal to the quantity already denoted by Λ in Lemma 5.
Lemma 29 If Λ is defined as in (32), then (103) holds for all k > 1 and for every portion Ĵ for which [b k , f k ) is the last non-growing-Lag interval.
We can then continue our derivations as follows:

⊓ ⊔
Using the previous lemma, we can finally prove Lemma 5.
Proof (Lemma 5) We prove the thesis, i.e., that Λ (k) ≤ B for all k, by induction. Since B differs from the RHS of (7) only in that C v ′ is replaced by L v ′ and Γ is replaced by Λ , we have that B ≥ 0 follows from exactly the same steps as the proof of Lemma 2, after substituting L v ′ for C v ′ and Λ for Γ .
Base case: k = 1. According to (25), Λ (1) ≤ B holds for any B ≥ 0.

Inductive step: we prove that, if Λ (k − 1) ≤ B holds, then Λ (k) ≤ B holds. We consider two alternatives. First, B < Λ , with Λ defined as in Lemma 29. In this case, considering also the inductive hypothesis, we have Λ (k − 1) ≤ B < Λ , and hence Λ (k − 1) < Λ . Then we have the sequence of relations in (107), where the last inequality follows from the fact that, according to the second inequality in (58), i.e., |τ̂| ≤ ⌈U sum ⌉ − 1, plus Definition 1 and (4), the expression in the LHS of the last line yields the maximum possible value for the expression in the penultimate line. The thesis, i.e., Λ (k) ≤ B, follows from replacing the bounds Λ (k − 1) ≤ B (which holds by the inductive hypothesis) and (107) in the first and in the second argument of the last max function in (69); that is, restarting from that last max function, we get (108).

The second case is that B ≥ Λ . For this case we consider two further alternatives: Λ (k − 1) < Λ and Λ (k − 1) ≥ Λ . For the first alternative, the thesis follows from exactly the same steps as for the first case above. For the second alternative, i.e., if Λ (k − 1) ≥ Λ , we have (109). Similarly to the first case, the thesis, i.e., Λ (k) ≤ B, follows from replacing Λ (k − 1) ≤ B (which holds by the inductive hypothesis) and (109) in (69).

⊓ ⊔
We can now easily prove the harmonic bound.
Proof (Theorem 1) Substituting (33) in (31), we get (110), under the assumption that the denominator in the first fraction, namely M, is a speed. Moving from this assumption to the assumption that this denominator is a pure number, we have to replace L g ′ with C g ′ , and Λ with Γ , in (110). This yields (7), according to (6).
To complete the proof, we note that we obtained (110) from (33) and (31), and we proved (33) and (31) using Assumption 2. Nevertheless, having derived (7) from (110) is enough for the bound (7) to hold also if Assumption 2 does not hold. In fact, denoting by ŝ the start time of the last portion of J j i , by Lemma 23 we have that the maximum possible value for ŝ in the case Assumption 2 does not hold is not higher than the maximum possible value for ŝ in the case Assumption 2 holds. The same property therefore holds also for the maximum possible value of f j i . Thus, if the difference f j i − d j i is at most equal to the RHS of (7) under Assumption 2, then this difference is at most equal to the same quantity also if Assumption 2 does not hold. ⊓ ⊔

9 Experiments
In this section we compare the tightness of the harmonic bound with that of existing bounds for implicit-deadline tasks. To obtain the results reported in this section, we simulated the execution of random task sets, generated according to the distributions of utilizations and periods considered in previous work about tardiness or lateness (e.g., Erickson and Anderson (2012); Ward et al (2013); Erickson et al (2014)). In the next paragraphs we describe how we generated the task sets, simulated their execution and measured tightness. Then we report our results. Both the code used in the experiments and our full results can be found in (Experiment-scripts (2014)).

Systems and task sets. We generated task sets for systems with two to eight processors, as eight has been inferred to be the largest number of processors for which G-EDF is an effective solution to provide SRT guarantees (Bastoni et al (2010)). In particular, for each number of processors M, we considered task sets with total utilizations U sum in the range [M/2, M], increasing in steps of M/10. Regarding individual task utilizations, we considered three uniform and three bimodal distributions. As for uniform distributions, we considered a light, a medium and a heavy one, with task utilizations distributed in, respectively, [0.001, 0.1], [0.01, 0.99] and [0.5, 0.99]. Instead, in the three bimodal distributions, task utilizations were uniformly distributed in either [0.01, 0.5] or [0.5, 0.99], with probabilities, respectively, 8/9 and 1/9, 6/9 and 3/9, and 4/9 and 5/9. We call these three bimodal distributions light, medium and heavy, respectively. Finally, we considered three possible uniform distributions for task periods, denoted as short, moderate and long, with ranges [3ms, 33ms], [10ms, 100ms] and [50ms, 250ms], respectively.
We generated 1000 sets of implicit-deadline periodic tasks for every combination of: number of processors M in [2, 8], total utilization U_sum in [M/2, M], and distribution of task utilizations and periods. For brevity, we hereafter refer to each set of 1000 task sets generated with the same combination of these parameters simply as a group of task sets.
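The generation procedure described above can be sketched as follows. This is a minimal illustration, not the paper's actual scripts; all function and dictionary names are ours, and we assume tasks are drawn until the target total utilization is reached, clamping the last task's utilization so that U_sum is matched exactly.

```python
import random

# The six utilization distributions described above (names are ours).
UTIL_DISTS = {
    # Uniform distributions.
    "uniform-light":  lambda: random.uniform(0.001, 0.1),
    "uniform-medium": lambda: random.uniform(0.01, 0.99),
    "uniform-heavy":  lambda: random.uniform(0.5, 0.99),
    # Bimodal distributions: light range with the given probability,
    # heavy range otherwise.
    "bimodal-light":  lambda: random.uniform(0.01, 0.5)
                      if random.random() < 8/9 else random.uniform(0.5, 0.99),
    "bimodal-medium": lambda: random.uniform(0.01, 0.5)
                      if random.random() < 6/9 else random.uniform(0.5, 0.99),
    "bimodal-heavy":  lambda: random.uniform(0.01, 0.5)
                      if random.random() < 4/9 else random.uniform(0.5, 0.99),
}

# The three uniform period distributions, in milliseconds.
PERIOD_DISTS = {
    "short":    (3, 33),
    "moderate": (10, 100),
    "long":     (50, 250),
}

def generate_task_set(u_sum, util_dist, period_dist):
    """Return a list of (execution_time, period) pairs whose total
    utilization equals u_sum (implicit deadline = period)."""
    draw_util = UTIL_DISTS[util_dist]
    p_lo, p_hi = PERIOD_DISTS[period_dist]
    tasks, total = [], 0.0
    while total < u_sum:
        u = draw_util()
        if total + u > u_sum:
            u = u_sum - total   # clamp the last task to hit u_sum exactly
        period = random.uniform(p_lo, p_hi)
        tasks.append((u * period, period))
        total += u
    return tasks
```

A group of task sets then simply consists of 1000 calls to `generate_task_set` with the same parameters.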
Simulation. We simulated the execution of the task sets using RTSIM (2011). We let each simulation last for the maximum duration supported by RTSIM, which happened to be, for any task set, at least 8K times as long as the longest task period.
Tardiness Bounds. We considered the harmonic bound (HARM) and the three tardiness bounds for G-EDF computed, respectively, with the analysis of Devi and Anderson (2008) (DA), the CVA analysis using the PP Reduction Rule (CVA), and the CVA analysis using the alternative optimization rule proved by Devi and Anderson (2008), as defined by Erickson et al (2010) (CVA2). We calculated the values of the latter three bounds using SchedCAT (2014).
Tightness index and normalized error. Given a tardiness bound, a simulation run for a task set, and a task in the set, we define the observed tightness index of the bound for that task as the ratio between the value of the bound for the task and the maximum tardiness that the task experiences in the run. We use this index as a tightness measure in our results because it is unbiased with respect to a change of the time scale, i.e., its value does not change if the execution time, the period and the arrival times of the jobs of every task are all multiplied by a common factor. This invariance does not hold, for example, for the difference between the value of a bound and the maximum observed tardiness.
A problem with the observed tightness index, however, is that it is well-defined only for tasks that experience a non-null tardiness. Fortunately, there happened to be a clear distinction among groups of task sets in terms of experienced tardiness: for each group of task sets, either all the tasks of all the task sets experienced a non-null tardiness, or almost all the tasks of all the task sets experienced a null tardiness. In particular, as we highlight when discussing the results, in the second case all bounds happened to be remarkably loose.
To measure the tightness of the bounds also in the second case, we introduce a second measure of tightness. This second metric may be less robust with respect to scaling issues than the observed tightness index, but it is well-defined also for tasks that experience a null tardiness. We call this metric the observed normalized error, and define it as the difference between the value of the bound and the actual maximum tardiness experienced by the task, divided by the period of the task. The idea behind this normalization is that the period of a task, especially with implicit deadlines, represents a reference time interval for understanding how tolerable a given error on the tardiness is for that task. In other words, a given absolute error on the tardiness is likely to be less relevant for a task with a large period than for a task with a short period. The purpose of the normalization is of course also to offset the above-discussed time-scale problems.
Measures and statistics. At the end of each simulation run we computed, for each bound and for each task, the observed tightness index and the normalized error of the bound for that task. Then, for each bound and each group of task sets, we computed the minimum and the average values of the observed tightness indexes and of the normalized errors for that bound, over all the tasks of all the task sets in the group. For brevity, hereafter we call these four quantities the minimum and the average tightness index, and the minimum and the average normalized error, for that group of task sets. We also computed the 95% confidence intervals for the average tightness index and the average normalized error. For both average values, and over all the groups of task sets, the confidence interval was never above 15% of the average and, for most groups of task sets, it was below 5%. For this reason, to reduce clutter, we do not show confidence intervals in the figures that follow.
For both the minimum and the average tightness indexes, it is worth noting that values lower than the ideal value, one, are of course not possible. In this respect, according to the definition of tightness reported in the introduction, the closer the minimum tightness index of a bound B is to one for at least one of the groups of task sets generated for a given number of processors, the closer the bound is to being tight for that number of processors. In view of these facts, given two bounds B1 and B2 with minimum tightness indexes I_B1 and I_B2 for a group of task sets G, we measure how much tighter B1 is than B2 as a function of how much closer I_B1 is to one, and not to zero, with respect to I_B2. In formulas, assuming that I_B2 > 1 holds (otherwise it is enough to swap B1 and B2 and reverse the statement), we say that B1 is x% tighter or x% looser than B2 for G, where x = 100 · |I_B2 − I_B1| / (I_B2 − 1), and I_B1 < I_B2 (tighter) or I_B2 < I_B1 (looser).
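The relative-tightness formula above can be written as a signed variant (our own convenience; the paper's x is the absolute value), where a positive result means B1 is tighter and a negative result means B1 is looser:

```python
def relative_tightness(i_b1, i_b2):
    """Signed relative tightness of bound B1 with respect to bound B2,
    as a percentage, measured against the distance of I_B2 from the
    ideal value one. Assumes i_b2 > 1 (otherwise swap the arguments
    and reverse the statement, as in the text).
    Positive: B1 tighter; negative: B1 looser."""
    return 100.0 * (i_b2 - i_b1) / (i_b2 - 1.0)
```

For example, indexes I_B1 = 1.5 and I_B2 = 2.0 yield 50, i.e., B1 is 50% tighter than B2, because I_B1 covers half of the distance between I_B2 and the ideal value one.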

Results
Full total utilization and non-light per-task utilizations. First we focus on task sets with a total utilization equal to M, and on all distributions of utilizations except for the light uniform distribution. Figures 6, 7, 8 and 9 show the minimum and the average tightness indexes of the four bounds for four representative cases in this first subset of groups of task sets. Specifically, these figures show the performance of the bounds for four different combinations of medium or heavy utilizations, and of short or long periods. The figures highlight the following general results, which hold not only for the combinations of parameters considered in the figures, but for all the groups of task sets in this subset (full results can be found in (Valente (2014))):
- The relative order among both the minimum and the average tightness indexes of the bounds is the same for every value of M, apart from M = 3, for which the average tightness index of HARM is slightly higher than that of CVA2. This happens also with some other groups of task sets (see (Valente (2014))).
- In accordance with the experimental results available in the literature (Ward et al (2013)), the minimum and average tightness indexes of DA are substantially larger than those of CVA and CVA2. In particular, DA is up to 40% looser than CVA2 (Figure 6).
- In all cases, both the minimum and the average tightness indexes of HARM are significantly lower than the corresponding tightness indexes of DA. In particular, HARM is up to 50% tighter than DA (Figure 6). This gap highlights the effectiveness of the lag-balance property, and is the reason behind the following positive results.
- Both the minimum and the average tightness indexes of HARM are always at least as low as those of the second best-performing bound, namely CVA2, apart from the above-mentioned cases, with M = 3, where the average tightness index of HARM is slightly higher than that of CVA2.
- HARM outperforms CVA2 from M = 4 on.
In particular, the minimum and average tightness indexes of HARM become lower and lower than those of CVA2 as M increases, apart from some occasional fluctuations (i.e., the slope of HARM is smaller than the slope of CVA2). As a general consideration, both the average and the minimum tightness indexes grow with M, although not for every group of task sets (Figure 6): occasional fluctuations affect a few groups, as can be seen in the extended version of this paper (Valente (2014)). The reason is apparently just that luckier scenarios occasionally occur for some bounds (an issue that may deserve further, non-trivial investigation).
Light per-task utilizations. Things change dramatically with the light uniform utilization distribution. Fortunately, the tightness of tardiness bounds may not be very relevant in these cases, as we discuss after showing the performance of the bounds. Figure 10.a reports the minimum tightness index for a representative case with light uniform utilizations and total utilization equal to M (we comment on the average tightness index and Figure 10.b in a moment). First, with light uniform utilizations, the generated task sets quickly become very large as the number of processors increases. Because of this fact, the figure reports results only for M ≤ 6, as with higher values of M it was unfeasible to compute the value of the harmonic bound for all the tasks. Regardless of this limitation, the figure clearly shows that all bounds become very loose if all task utilizations are light. Fluctuations become very large too. We cannot show the average tightness index because it is infinite for all bounds and values of M: in all runs, almost every task experiences a null tardiness, whereas, for every task, all bounds happen to be at least in the order of the overall maximum execution time for the task set. This result differs substantially from the previous cases, where in all runs every task happened to experience a non-null tardiness. To show the average performance of the bounds also with light uniform utilizations, we resort to the average normalized error in Figure 10.b. The bounds are quite loose also on average: as M increases, the average error ranges from 1/4 to almost 1/2 of the period.
Fortunately, the case of light utilizations is exactly one of those for which tardiness bounds may not be very relevant, for the following two reasons. First, defining U_max ≡ max_{τ_i ∈ τ} U_i, Goossens et al (2003) proved that G-EDF meets all deadlines for every implicit-deadline task set whose total utilization U_sum satisfies the following inequality:

U_sum ≤ M − (M − 1) · U_max.    (111)

This implies that, if U_max is very small, as is the case with light uniform utilizations, then it is enough to keep the total utilization slightly below M to make sure that all deadlines are met with G-EDF.
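Condition (111) is trivial to check; the following sketch (the function name is ours) illustrates the point made above:

```python
def gfb_schedulable(utilizations, m):
    """Goossens et al (2003) test for G-EDF on M processors:
    an implicit-deadline task set meets all deadlines if
    U_sum <= M - (M - 1) * U_max."""
    u_sum = sum(utilizations)
    u_max = max(utilizations)
    return u_sum <= m - (m - 1) * u_max
```

For instance, with U_max = 0.25 on M = 8 processors, any task set with U_sum up to 8 − 7 · 0.25 = 6.25 is guaranteed to meet all deadlines; the smaller U_max, the closer this threshold gets to M.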
Secondly, if partitioned EDF can be used instead of G-EDF (and if all utilizations are light), then all deadlines can be met at the price of an even lower loss of total utilization than with G-EDF. In more detail, the scenario in question is the one where there is no hindrance to partitioning tasks among processors and scheduling each per-processor subset of tasks with EDF. In fact, given a generic task set with U_sum ≤ M, consider the sum, say S, of the utilizations of the tasks that may fail to be accommodated in a feasible partitioning. If all task utilizations are very low, then S can be at most equal to a very low fraction of the total utilization of the task set (in the worst possible case, S ≤ U_max · (M − 1) holds).
In addition, a partitioned scheme has a lower overhead than a global one, which increases the actual achievable total utilization. Since all tasks have a very low utilization, this increase may easily offset the above loss S. In the end, when all task utilizations are low, it may be possible to meet all deadlines with no or negligible loss of total utilization.
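The partitioning argument above can be illustrated with a simple first-fit heuristic (one of many possible partitioning schemes; the function name is ours). On a uniprocessor, EDF meets all deadlines of an implicit-deadline task set whenever its utilization is at most one, so each bin of capacity 1.0 models one processor:

```python
def first_fit_partition(utilizations, m):
    """Partition tasks onto M unit-capacity processors with first-fit
    decreasing. Returns the per-processor utilizations and the total
    utilization S of the tasks that could not be placed."""
    bins = [0.0] * m
    leftover = 0.0
    for u in sorted(utilizations, reverse=True):
        for i in range(m):
            if bins[i] + u <= 1.0:   # task fits on processor i
                bins[i] += u
                break
        else:                        # no processor can accommodate it
            leftover += u
    return bins, leftover
```

With light per-task utilizations, a task fails to fit only when every processor is already filled to more than 1 − U_max, so the unplaced utilization S stays small, in line with the worst-case bound S ≤ U_max · (M − 1) stated above.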
Total utilization lower than M. Similar tightness problems occur, with all distributions of utilizations, if the total utilization U_sum is lower than M. Figure 11.a shows, e.g., the minimum tightness index for the same distributions of utilizations and periods as in Figure 7, but with U_sum = 0.9 · M. As can be seen, after decreasing U_sum by just 0.1 · M, the minimum tightness index becomes much higher than with U_sum = M. In particular, all the bounds are quite loose for M > 5. As with light uniform utilizations, the average tightness index is infinite for all groups of task sets, bounds, and values of M. Hence, also in this case, we show the average performance of the bounds through the average normalized error, in Figure 11.b. The situation is now much worse than with light uniform utilizations, because the average normalized error ranges from about 0.7 to more than 3.
Although all bounds quickly become remarkably loose as M grows, it is worth noting that the harmonic bound outperforms the other bounds more and more, in terms of both minimum tightness index and average normalized error, as M increases beyond 5. In particular, the harmonic bound is the only one to preserve a minimum tightness index at most equal to 2.5, and an average normalized error at most equal to 2.1.
Going down to U_sum = 0.8 · M, both the minimum tightness index and the average normalized error again have high values, as shown in Figure 12 (the average tightness index is of course again infinite). They also fluctuate more with M. Before commenting on the relative performance of the harmonic bound, we highlight that the situation becomes quite critical with M = 2, because the actual tardiness experienced by the tasks tends to be very small.
In this respect, if we further reduce the total utilization to 0.7 · M, then even the minimum tightness index becomes infinite for all bounds and groups of task sets with M = 2, except for CVA. As a consequence, to show the performance of the bounds for U_sum = 0.7 · M, in Figure 13.a we report the minimum normalized error instead of the minimum tightness index. This error is negative for CVA with M = 2, because CVA is actually a lateness bound, and takes negative values for some tasks with M = 2.
Finally, Figure 13.b reports the average normalized error for U_sum = 0.7 · M, and highlights that also in this case, as expected, the bounds are definitely loose. Regarding the trend of the average normalized errors, they decrease moving from Figure 11.b to Figure 13.b. The reason is simply that the values of all the bounds decrease as U_sum decreases, while the tardiness of the tasks remains unchanged, i.e., equal to 0.
Although the actual problem is that all bounds are very loose, we can also point out that the performance of the harmonic bound degrades slightly more than that of the other bounds as U_sum decreases, and that the harmonic bound is no longer the best-performing one with U_sum ≤ 0.8 · M. This result is consistent with the fact that, as we already highlighted in the comments after the proof of Lemma 10 in Section 7.2, we computed the harmonic bound in a simplified way. In fact, this simplification makes the bound explicitly looser as the ratio U_sum/M decreases.
With U_sum = 0.6 · M, the performance of the bounds is about the same as in the case U_sum = 0.7 · M. We do not show results for U_sum = 0.6 · M and below, for reasons similar to those given for the above case of light utilizations. First, the probability that a generic task set meets (111) is not negligible with U_sum = 0.6 · M (unless the task set contains tasks with a very high utilization), and this probability increases as U_sum decreases. In addition, and probably even more relevant, there are partitioned scheduling algorithms, including variants of EDF itself, with which a task set with U_sum = 0.6 · M is very likely to be schedulable, while every task set with U_sum ≤ 0.5 · M is schedulable (Davis and Burns (2011)).
To sum up, tardiness bounds, and thus their tightness, may be of little relevance for U_sum ≤ 0.6 · M. In contrast, there is a band of total utilizations of interest, ranging from about 0.7 · M to about 0.9 · M, for which all bounds happen to be remarkably loose. Fortunately, the harmonic bound has room for improvement in this band of total utilizations, because, as we already pointed out above, it is currently computed in a simplified way that makes it looser and looser as the ratio U_sum/M decreases.

Conclusion and future work
In this paper we showed how to compute a new tardiness bound for preemptive global EDF and implicit-deadline tasks, by integrating a lag-balance property, enjoyed by any work-conserving scheduling algorithm, with the approach used to compute one of the first tardiness bounds for G-EDF (Devi and Anderson (2008)). According to our experiments, the new bound, which we named the harmonic bound, is up to 50% tighter than the original bound obtained through the same approach (the maximum improvement is reached with M = 8 and a total utilization equal to M). As a consequence of this improvement, despite the fact that the original bound turns out to be, in the worst case, 40% looser than the bounds proposed in the intervening years, the harmonic bound is up to 29% tighter than the best available bound.
Such a result may open new ways for obtaining tighter response-time or utilization bounds, with existing or new scheduling algorithms. As next steps, we plan to generalize the harmonic bound to also cover non-preemptive global EDF and task sets with a total utilization lower than the system capacity. We also want to investigate more efficient algorithms for computing, or at least approximating, the bound. In fact, the brute-force algorithm reported in this paper has an exponential running time, although it proved to be feasible for all the task sets considered in the experiments, except for some of the cases where tardiness bounds are probably not very relevant.
Finally, in this paper we also highlighted a general negative result: with light utilization distributions, as well as with total utilizations lower than the total system capacity, all bounds proved to be quite loose.