CMSC 27200 — Lecture 12

Maximum subarray sum

Let's recap the basic outline of a divide and conquer algorithm.

  1. Base case.
  2. Recursive case.
    1. Divide the problem into subproblems.
    2. Recursively solve the subproblems.
    3. Combine the solutions of the subproblems.

Let's look at another relatively simple problem to reinforce this.

Suppose we have a list of data, as integers, that represents daily change: that is, each entry in the list is the change, either positive or negative, from the previous day. This is something you might be interested in if you have been looking obsessively at daily tracking numbers or something recently (perhaps for root vegetable prices or something of the sort).

Consider the following numbers from the future.

    Day             0    1    2    3    4    5
    Actual number  10   11    7   10    6    7
    Change               1   -4    3   -4    1

One simple strategy would be to look for the highest or lowest actual number and scan backwards or forwards from it to find the biggest difference. Here, that means scanning backwards from Day 1 (the maximum) or forwards from Day 4 (the minimum). But we observe that the biggest positive change actually occurs from Day 2 to Day 3.

We are looking instead for the period of maximum positive change. Intuitively, this is the contiguous portion of the list that has the largest sum, or the maximum sum of all subarrays.

Given an array $A$ of $n$ integers, find a contiguous subarray $A[i:j]$ with the maximum sum. That is, compute $\max\limits_{1 \leq i \leq j \leq n} \sum\limits^j_{k = i} A[k]$.

The simplest straightforward algorithm considers each range and computes its sum. Since there are $O(n^2)$ such ranges and computing each sum takes $O(n)$ time, we get $O(n^3)$. But we really don't need to compute each of those sums from scratch: if we maintain a running sum as the right endpoint grows, the sum computation becomes $O(1)$ per range, which brings us down to $O(n^2)$. Can we do better?
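To make the $O(n^2)$ idea concrete, here is a sketch in Python (the function name is my own choice); the running sum means each range costs $O(1)$ once its left endpoint is fixed.

```python
# O(n^2) maximum subarray sum: for each starting index i, grow the range
# rightward while maintaining a running sum, so each range costs O(1).
def max_subarray_quadratic(A):
    best = A[0]
    for i in range(len(A)):
        running = 0
        for j in range(i, len(A)):
            running += A[j]            # sum of A[i..j], updated in O(1)
            best = max(best, running)
    return best
```

On the change data from the table above, `max_subarray_quadratic([1, -4, 3, -4, 1])` returns 3, matching the Day 2 to Day 3 observation.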

We can attempt to apply the same strategy to this problem as we did for Mergesort: simply break the problem in half! So we recursively find the maximum subarray in each half and choose the one with the larger sum. Easy-peasy.

There is one possibility missing here, though: what if the maximum subarray crosses the point at which we break the array up? Such an array wouldn't be picked up by our recursion. The obvious solution is to also consider this array, but this poses a problem: if we're not careful, this could end up being very expensive as well.

We can make our lives easier by observing that this particular extra subarray must cross the midpoint of the array. In effect, the midpoint acts as an anchor. We can then turn this into the problem of finding two subarrays, each anchored on one side at the midpoint.

Why does this help? We make the observation that $\sum\limits_{k = i}^m A[k] = A[i] + \sum\limits_{k=i+1}^m A[k]$. In other words, each subarray ending at $m$ extends the next shorter one by a single element. If we keep a running sum as we extend, we avoid a lot of repeated computation.

This means that all we need to do is find the maximum on both sides of the array and add them up to get the maximum sum that crosses the midpoint. We can use a subroutine to do this.

    \begin{algorithmic}
    \PROCEDURE{max-crossing}{$A$}
        \STATE Let $n = |A|$
        \STATE $m = \left\lfloor \frac n 2 \right\rfloor$
        \STATE left-sum $ = -\infty$
        \STATE sum $ = 0$
        \FOR{$i$ from $m$ to 1}
            \STATE sum $=$ sum $+ A[i]$
            \STATE left-sum = max(sum, left-sum)
        \ENDFOR
        \STATE right-sum $ = -\infty$
        \STATE sum $ = 0$
        \FOR{$j$ from $m+1$ to $n$}
            \STATE sum $=$ sum $+ A[j]$
            \STATE right-sum = max(sum, right-sum)
        \ENDFOR
        \RETURN{left-sum $+$ right-sum}
    \ENDPROCEDURE
    \end{algorithmic}
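A direct Python rendering of the pseudocode may help (a sketch; the 0-based indexing and the use of `float("-inf")` for $-\infty$ are my own choices).

```python
# Best sum of a subarray that crosses the midpoint: scan leftward from the
# midpoint and rightward from it, keeping the best running sum on each side.
def max_crossing(A):
    m = len(A) // 2
    left_sum, s = float("-inf"), 0
    for i in range(m - 1, -1, -1):     # extend leftward from the midpoint
        s += A[i]
        left_sum = max(left_sum, s)
    right_sum, s = float("-inf"), 0
    for j in range(m, len(A)):         # extend rightward from the midpoint
        s += A[j]
        right_sum = max(right_sum, s)
    return left_sum + right_sum
```

For example, on the change data `[1, -4, 3, -4, 1]` the best crossing subarray is $[1, -4, 3]$ with sum 0, and `max_crossing` returns 0.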

Let's prove that this works.

Max-Crossing computes the maximum subarray sum that crosses $m$.

We argue that the maximum sum of such a subarray is necessarily the sum of the largest-sum subarray ending at index $m$ and the largest-sum subarray beginning at index $m+1$. Let $\ell$ be the sum of $A[i:m]$, the subarray with the largest sum ending at $m$, and let $r$ be the sum of $A[m+1:j]$, the subarray with the largest sum starting at $m+1$.

Suppose that some other subarray $A[i':j']$ that crosses $m$ has the largest sum, and call this sum $S$. Without loss of generality, suppose the sum of $A[i':m]$, call it $\ell'$, is less than $\ell$. Then $S - \ell' + \ell \gt S$, which contradicts our assumption that $S$ was maximal. The same argument can be made for $A[m+1:j']$.

Now, let us show that Max-Crossing correctly computes the maximum subarray sum. We will show that the algorithm correctly computes the maximum sum over subarrays of $A[1:m]$ ending at $m$ and over subarrays of $A[m+1:n]$ starting at $m+1$. We prove this for the left half by induction on $k$, where $i = m - k$; the argument is symmetric for the right half.

The base case is when $k = 0$, so $i = m$. In this case, $A[m:m] = A[m]$. The algorithm computes $\texttt{sum} = A[m]$ and assigns $\texttt{left-sum} = \max(A[m], -\infty) = A[m]$.

Now, consider $k \gt 0$ and assume that $\texttt{left-sum}$ correctly stores the maximal sum over subarrays ending at $m$ and starting at index $i+1 = m - (k-1)$ or later. We consider the sum of $A[i:m]$, where $i = m - k$, which is \[\texttt{sum} = \sum^m_{j = i} A[j] = A[i] + \sum^m_{j = i+1} A[j].\] By our inductive hypothesis, $\texttt{left-sum}$ holds the maximum over all subarrays considered so far, and it is updated exactly when $\texttt{sum} \gt \texttt{left-sum}$, so after this iteration it holds the maximum over all subarrays ending at $m$ and starting at index $i$ or later.

A symmetric argument holds for $A[m+1:j]$ and we can conclude that Max-Crossing correctly computes the maximum subarray sum that crosses $m$.

Then we can use this in our algorithm for maximum subarray sum which follows.

    \begin{algorithmic}
    \PROCEDURE{max-subarray}{$A$}
        \STATE Let $n = |A|$
        \IF{$n$ = 1}
            \RETURN{$A[1]$}
        \ELSE
            \STATE $m = \left\lfloor \frac n 2 \right\rfloor$
            \STATE $\ell = $ \CALL{max-subarray}{$A[1:m]$}
            \STATE $r = $ \CALL{max-subarray}{$A[m+1:n]$}
            \STATE $x = $ \CALL{max-crossing}{$A$}
            \RETURN{$\max\{\ell,r,x\}$}
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}
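Putting the pieces together, here is a runnable Python sketch of the whole algorithm (0-based indexing; the crossing helper from the pseudocode is repeated inside the block so it runs standalone).

```python
def max_crossing(A):
    # Best sum of a subarray that crosses the midpoint (see Max-Crossing).
    m = len(A) // 2
    left_sum, s = float("-inf"), 0
    for i in range(m - 1, -1, -1):
        s += A[i]
        left_sum = max(left_sum, s)
    right_sum, s = float("-inf"), 0
    for j in range(m, len(A)):
        s += A[j]
        right_sum = max(right_sum, s)
    return left_sum + right_sum

def max_subarray(A):
    n = len(A)
    if n == 1:
        return A[0]                       # base case: single element
    m = n // 2
    return max(max_subarray(A[:m]),       # entirely in the left half
               max_subarray(A[m:]),       # entirely in the right half
               max_crossing(A))           # crosses the midpoint
```

On the running example, `max_subarray([1, -4, 3, -4, 1])` returns 3.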

Now, to prove that this actually works. Since the algorithm is recursive, the proof is fairly straightforward, because induction mirrors the structure of recursion. The only complication is dealing with the combining step.

Max-Subarray computes the maximum subarray sum.

By (strong) induction on $n$, the length of $A$. For the base case, we have $n = 1$, in which case the sum of the array is the single element.

Now suppose $n \gt 1$ and assume that Max-Subarray correctly computes the maximum subarray sum for all arrays of length less than $n$. We observe that the maximum subarray is either contained entirely in one of the halves or crosses the middle of the array. In the first case, since each half has length less than $n$, the inductive hypothesis guarantees that the corresponding recursive call computes its sum. In the second case, the lemma above guarantees that Max-Crossing computes it. Taking the maximum of the three candidate values therefore yields the maximum subarray sum.

The running time of Max-Subarray is $O(n \log n)$.

As usual, let $T(n)$ be the worst-case time complexity of Max-Subarray on an input of size $n$. When $n = 1$, the base case, we have $T(1) = O(1)$. In the recursive case, we compute the midpoint $m$ in $O(1)$ time. Since we split our array in half and call Max-Subarray on each half, the two recursive calls contribute $T(n/2)$ each. The remaining cost is the call to Max-Crossing.

We observe that Max-Crossing computes the midpoint in $O(1)$ time and performs two loops, in sequence. Both loops perform additions, comparisons, and assignments in each iteration. Since each loop iterates over half the array, in total they take $O(n)$ time. Thus, Max-Crossing takes $O(n)$ time.

Finally, we take the maximum of $\ell,r,x$, which can be done in $O(1)$ time. Adding everything up, we get the following recurrence. \[T(n) = \begin{cases} O(1) & \text{if $n=1$}, \\ 2T(n/2) + O(n) & \text{if $n \gt 1$}. \end{cases}\]

Since this is the same recurrence as Mergesort, as we proved before, $T(n)$ is $O(n \log n)$.

You may have noticed that this algorithm gives the sum of the maximum subarray, but not its indices, though it's not too hard to modify the algorithm so that it reports those as well.
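One way to make such a modification (a sketch; the triple-valued return and the function names are my own choices, not from the notes) is to have each call return a triple $(\text{sum}, i, j)$ and to track the best endpoints during the crossing scan:

```python
def max_crossing_ix(A, lo, hi):
    # Best subarray crossing the midpoint of A[lo:hi], as (sum, i, j).
    m = (lo + hi) // 2
    left_sum, best_i, s = float("-inf"), m, 0
    for i in range(m - 1, lo - 1, -1):
        s += A[i]
        if s > left_sum:
            left_sum, best_i = s, i
    right_sum, best_j, s = float("-inf"), m, 0
    for j in range(m, hi):
        s += A[j]
        if s > right_sum:
            right_sum, best_j = s, j
    return left_sum + right_sum, best_i, best_j

def max_subarray_ix(A, lo=0, hi=None):
    # Returns (sum, i, j) for the maximum subarray A[i..j] of A[lo:hi].
    if hi is None:
        hi = len(A)
    if hi - lo == 1:
        return A[lo], lo, lo
    m = (lo + hi) // 2
    return max(max_subarray_ix(A, lo, m),
               max_subarray_ix(A, m, hi),
               max_crossing_ix(A, lo, hi))  # tuples compare by sum first
```

Note that Python's `max` on tuples compares the sums first, so ties are broken arbitrarily by index; on `[1, -4, 3, -4, 1]` this returns `(3, 2, 2)`, the single element at index 2.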

It turns out that this is not the best we can do—a linear time algorithm is possible and this algorithm has a hint for how to do it. These algorithms are discussed in an article by Jon Bentley from 1984.
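As a teaser, the linear-time scan (often attributed to Kadane and discussed in Bentley's article) builds on the same running-sum hint: the best subarray ending at position $j$ either extends the best one ending at $j-1$ or starts fresh at $j$. A minimal sketch:

```python
# O(n) maximum subarray sum: ending_here is the best sum of a subarray
# ending at the current position; best is the best seen anywhere so far.
def max_subarray_linear(A):
    best = ending_here = A[0]
    for x in A[1:]:
        ending_here = max(x, ending_here + x)  # extend or start fresh
        best = max(best, ending_here)
    return best
```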