CMSC 27200 — Lecture 6

Minimizing lateness

Because scheduling is such a general problem, we can come up with many different variants of the problem. Let's consider the following variant.

Suppose that instead of being given a start and finish time for jobs, we're simply given how long each job takes and a due date that we're trying to aim for. This would be like if you have four assignments due this week and you're really good at predicting how much time you'd need. Unfortunately, it may not be possible to meet all the due dates, but we should try our best to do so. What we would like is to schedule our jobs to minimize how late we are across all jobs.

The scheduling to minimize lateness problem is: Given a list of $n$ jobs $j_1, j_2, \dots, j_n$, where job $j_i$ has processing time $t(i)$ and deadline $d(i)$,

schedule all jobs, one at a time without overlap, to minimize the maximum lateness $L = \max\limits_{1 \leq i \leq n} \ell(i)$, where $\ell(i) = \max\{0, f(i) - d(i)\}$ is the lateness of job $j_i$ and $f(i)$ is its finish time in the schedule.
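For concreteness, here is a minimal sketch in Python of computing the maximum lateness of a given job order, assuming each job is a (processing time, deadline) pair and the jobs run back to back starting at time 0; the job data below is made up purely for illustration.

```python
# Compute the maximum lateness L = max_i max(0, f(i) - d(i)) of a schedule
# that runs the given jobs in order, back to back, starting at time 0.
# Jobs are (processing time, deadline) pairs; the data is illustrative.

def max_lateness(jobs):
    """Return the maximum lateness of running `jobs` in the given order."""
    t = 0  # current time; the next job starts here
    L = 0  # maximum lateness so far; starting at 0 encodes max(0, f - d)
    for length, deadline in jobs:
        t += length                  # this job finishes at time t
        L = max(L, t - deadline)     # lateness of this job is t - deadline
    return L

jobs = [(3, 6), (2, 8), (1, 9), (4, 9)]
print(max_lateness(jobs))  # → 1 (only the last job is late, finishing at 10)
```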

Below, we have a list of jobs together with their deadlines and a possible arrangement of these jobs.

The latenesses of the jobs that are late are shown in yellow. We take the largest of these to be the maximum lateness $L$ as defined in the problem.

Again, we have a number of metrics to choose from to attempt a greedy strategy.

  1. shortest processing time,
  2. earliest deadline,
  3. smallest slack ($d(i) - t(i)$).

The correct choice is strategy (2). Let $J = \{(t_i,d_i) \mid 1 \leq i \leq n\}$ be our list of jobs' processing times and deadlines. We have the following algorithm.

    \begin{algorithmic}
    \PROCEDURE{earliest-deadline}{J}
        \STATE Sort jobs by deadlines and renumber jobs so that $d_1 \leq d_2 \leq \cdots \leq d_n$.
        \STATE $t = 0$
        \FOR{$j$ from 1 to $n$}
            \STATE $s_j = t$
            \STATE $f_j = t+t_j$
            \STATE $t = t+t_j$
        \ENDFOR
    \RETURN{$\{(s_i,f_i) \mid 1 \leq i \leq n\}$}
    \ENDPROCEDURE
    \end{algorithmic}
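The procedure above translates directly into Python. This sketch assumes jobs are (processing time, deadline) pairs and relies on Python's stable sort to break ties in the deadline ordering; the sample jobs are made up for illustration.

```python
# A sketch of the earliest-deadline procedure: sort jobs by deadline, then
# run them back to back starting at time 0. Jobs are (processing time,
# deadline) pairs; the returned schedule lists (start, finish) times in the
# same order as the sorted jobs.

def earliest_deadline(jobs):
    """Schedule jobs back to back in order of nondecreasing deadline."""
    ordered = sorted(jobs, key=lambda job: job[1])  # sort by deadline: O(n log n)
    schedule = []
    t = 0
    for length, _deadline in ordered:
        schedule.append((t, t + length))  # job starts at t, finishes at t + length
        t += length
    return ordered, schedule

jobs = [(2, 8), (3, 6), (4, 9), (1, 9)]
ordered, schedule = earliest_deadline(jobs)
print(schedule)  # → [(0, 3), (3, 5), (5, 9), (9, 10)]
```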

This algorithm is due to Jackson in 1955, which makes it one of the earliest examples of a greedy algorithm for an optimization problem. Now, you might be wondering what the heck they were trying to schedule back in 1955, and the answer is factory production lines.

Much like the earliest finish time algorithm, we can get a pretty simple running time analysis for the earliest deadline algorithm.

The earliest deadline first algorithm has running time $O(n \log n)$.

Our first step is sorting the deadlines, which takes $O(n \log n)$ time. Then we have a for loop which goes through each job and computes the start and finish times. These operations take $O(1)$ time, so the loop has running time $O(n)$ in total. So the algorithm is dominated by the sorting step, giving it $O(n \log n)$ running time.

One of the things we might notice about this algorithm is that there's no downtime between jobs. In fact, it's not too hard to convince yourself that you can take any optimal schedule that minimizes lateness and just get rid of any idle time in it and you would still have a schedule that minimizes lateness.

What this means is that we can restrict our attention to idle-free schedules: take any schedule and slide all the jobs together so there is no idle time. Then to get from one idle-free schedule to another, we just start swapping jobs.
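The idle-time-removal step can be sketched as follows, assuming a schedule is given as a list of (start, finish) intervals; every job keeps its order but is shifted earlier, so finish times can only decrease and no job's lateness increases. The intervals below are made up for illustration.

```python
# "Compress" a schedule: keep the job order but shift every job earlier so
# there is no idle time. Each (start, finish) interval keeps its length.

def remove_idle_time(schedule):
    """Return the same jobs, in start-time order, run back to back from 0."""
    compressed = []
    t = 0
    for start, finish in sorted(schedule):  # process jobs in start-time order
        length = finish - start             # job lengths are preserved
        compressed.append((t, t + length))
        t += length
    return compressed

print(remove_idle_time([(0, 2), (3, 5), (7, 8)]))  # → [(0, 2), (2, 4), (4, 5)]
```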

Consider our earliest-deadline schedule that is created by our algorithm. The jobs have been renumbered so that they are listed in order from 1 through $n$.

Given a schedule $S$, an inversion is a pair of jobs $i,j$ such that

  1. $i \lt j$, and
  2. $j$ is scheduled before $i$.

The idea of an inversion is more general than just schedules; we can apply it to permutations of lists. If we have a list $1, \dots, n$, then an inversion in a permutation $\pi$ is a pair $i,j$ with $i \lt j$ and $j$ appearing before $i$ in the permutation.
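As a quick illustration, here is a small Python sketch that lists the inversions of a permutation of $1, \dots, n$; the example permutation is made up.

```python
# List the inversions of a permutation of 1..n: pairs (i, j) with i < j
# but j appearing before i in the permutation.

def inversions(perm):
    """Return the inverted pairs (i, j), with i < j and j appearing first."""
    pairs = []
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:                 # larger value appears first
                pairs.append((perm[b], perm[a]))  # record as (smaller, larger)
    return pairs

print(inversions([2, 4, 1, 3]))  # → [(1, 2), (1, 4), (3, 4)]
```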

Here, we would like to consider inversions in schedules with respect to our earliest deadline schedule. If all deadlines are distinct, it is the only idle-free schedule with no inversions; if some deadlines tie, there may be several idle-free schedules with no inversions, but they all have the same maximum lateness.

Suppose an idle-free schedule has an inversion. Then it contains two adjacent inverted jobs.

Let $(i,j)$ be an inversion whose two jobs are scheduled closest together, and suppose for contradiction that they are not adjacent. Let $k$ be the job scheduled immediately after $j$. If $k \lt j$, then $(k,j)$ is an inversion whose jobs are adjacent, hence closer together. If $k \gt j$, then $(i,k)$ is an inversion that is closer together, since $i \lt j \lt k$ and $k$ is still scheduled before $i$. Either way we contradict our choice of $(i,j)$, so the closest inversion must consist of adjacent jobs.

Exchanging two adjacent inverted jobs $i$ and $j$ reduces the number of inversions by 1 and does not increase maximum lateness.

We denote $\ell(i)$ to be the lateness for job $i$ in the schedule with the inversion and $\ell'(i)$ the lateness in the schedule after swapping the inversion.

For every job $k \neq i,j$, the lateness stays the same, and for job $i$ we have $\ell'(i) \leq \ell(i)$, since $i$ is moved earlier, to before $j$.

Now, consider job $j$. If $j$ is late after the swap, then

\begin{align}
\ell'(j) &= f'(j) - d(j) \\
&= f(i) - d(j) \\
&\leq f(i) - d(i) \\
&\leq \ell(i).
\end{align}

In other words, once swapped, while the lateness of $j$ can increase, it turns out that $j$ is no later than $i$ was before the swap. To get this calculation, we observe that once we swap $i$ and $j$, the finish time for $j$ after the swap, $f'(j)$, is the old finish time for $i$. Then to get the inequality, we recall that $i \lt j$ means that $d(i) \leq d(j)$. Ultimately, maximum lateness does not increase.
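We can sanity-check the exchange argument numerically. In the made-up instance below, the two jobs form an adjacent inversion (the job with the later deadline is scheduled first), and swapping them does not increase the maximum lateness; here it strictly decreases it.

```python
# A numerical check of the exchange argument: swapping an adjacent inverted
# pair of jobs does not increase maximum lateness. Jobs are (processing
# time, deadline) pairs run back to back from time 0; the data is made up.

def max_lateness(jobs):
    """Maximum lateness of running `jobs` in the given order from time 0."""
    t, L = 0, 0
    for length, deadline in jobs:
        t += length
        L = max(L, t - deadline)
    return L

# Numbered by deadline, the jobs are job 1 = (2, d=4) and job 2 = (4, d=7).
before = [(4, 7), (2, 4)]  # job 2 scheduled first: an adjacent inversion
after = [(2, 4), (4, 7)]   # the same jobs with the inversion swapped away

print(max_lateness(before), max_lateness(after))  # → 2 0
```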