CMSC 27200 — Lecture 12

Maximum subarray sum

Let's recap the basic outline of a divide and conquer algorithm.

  1. Base case.
  2. Recursive case.
    1. Divide the problem into subproblems.
    2. Recursively solve the subproblems.
    3. Combine the solutions of the subproblems.

Let's look at another relatively simple problem to reinforce this.

Suppose we have a list of data, as integers, that represents daily change: that is, each entry in the list is the change, either positive or negative, from the previous day. This is something you might be interested in if you have been looking obsessively at daily tracking numbers or something recently (perhaps for root vegetable prices or something of the sort).

Consider the following numbers from the future.

    Day             0    1    2    3    4    5
    Actual number  10   11    7   10    6    7
    Change               1   -4    3   -4    1

One simple strategy would be to look for the highest or lowest actual number and scan backwards or forwards from it to find the biggest difference. Here, that means scanning backwards from Day 1 (the maximum) or forwards from Day 4 (the minimum). But we observe that the biggest positive change actually occurs from Day 2 to Day 3.

We are looking instead for the period of maximum positive change. Intuitively, this is the contiguous portion of the list that has the largest sum, or the maximum sum of all subarrays.

Given an array $A$ of $n$ integers, find a contiguous subarray $A[i:j]$ with the maximum sum. That is, compute $\max\limits_{1 \leq i \leq j \leq n} \sum\limits^j_{k = i} A[k]$.

The simplest straightforward algorithm considers each range and computes its sum. Since there are $O(n^2)$ such ranges and computing each sum takes $O(n)$ time, we get $O(n^3)$. But we really don't need to compute each of those sums from scratch: if we maintain a running sum as the right endpoint grows, the sum computation becomes $O(1)$ per range, which brings us down to $O(n^2)$. Can we do better?
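To make the $O(n^2)$ idea concrete, here is a sketch in Python (the function name is my own choice); the running sum means each range costs $O(1)$ once its left endpoint is fixed.

```python
# O(n^2) maximum subarray sum: for each starting index i, grow the range
# rightward while maintaining a running sum, so each range costs O(1).
def max_subarray_quadratic(A):
    best = A[0]
    for i in range(len(A)):
        running = 0
        for j in range(i, len(A)):
            running += A[j]            # sum of A[i..j], updated in O(1)
            best = max(best, running)
    return best
```

On the change data from the table above, `max_subarray_quadratic([1, -4, 3, -4, 1])` returns 3, matching the Day 2 to Day 3 observation.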

We can attempt to apply the same strategy to this problem as we did for Mergesort: simply break the problem in half! So we recursively find the maximum subarray in each half and choose the one with the larger sum. Easy-peasy.

There is one possibility missing here, though: what if the maximum subarray crosses the point at which we break the array up? Such an array wouldn't be picked up by our recursion. The obvious solution is to also consider this array, but this poses a problem: if we're not careful, this could end up being very expensive as well.

We can make our lives easier by observing that this particular extra subarray must cross the midpoint of the array. In effect, the midpoint acts as an anchor. We can then turn this into the problem of finding two subarrays, each anchored on one side at the midpoint.

Why does this help? We make the observation that $\sum\limits_{k = i}^m A[k] = A[i] + \sum\limits_{k=i+1}^m A[k]$. In other words, each subarray ending at $m$ extends the next shorter one by a single element. If we keep a running sum as we extend, we avoid a lot of repeated computation.

This means that all we need to do is find the maximum on both sides of the array and add them up to get the maximum sum that crosses the midpoint. We can use a subroutine to do this.

    \begin{algorithmic}
    \PROCEDURE{max-crossing}{$A$}
        \STATE Let $n = |A|$
        \STATE $m = \left\lfloor \frac n 2 \right\rfloor$
        \STATE left-sum $ = -\infty$
        \STATE sum $ = 0$
        \FOR{$i$ from $m$ to 1}
            \STATE sum $=$ sum $+ A[i]$
            \STATE left-sum = max(sum, left-sum)
        \ENDFOR
        \STATE right-sum $ = -\infty$
        \STATE sum $ = 0$
        \FOR{$j$ from $m+1$ to $n$}
            \STATE sum $=$ sum $+ A[j]$
            \STATE right-sum = max(sum, right-sum)
        \ENDFOR
        \RETURN{left-sum $+$ right-sum}
    \ENDPROCEDURE
    \end{algorithmic}
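A direct Python rendering of the pseudocode may help (a sketch; the 0-based indexing and the use of `float("-inf")` for $-\infty$ are my own choices).

```python
# Best sum of a subarray that crosses the midpoint: scan leftward from the
# midpoint and rightward from it, keeping the best running sum on each side.
def max_crossing(A):
    m = len(A) // 2
    left_sum, s = float("-inf"), 0
    for i in range(m - 1, -1, -1):     # extend leftward from the midpoint
        s += A[i]
        left_sum = max(left_sum, s)
    right_sum, s = float("-inf"), 0
    for j in range(m, len(A)):         # extend rightward from the midpoint
        s += A[j]
        right_sum = max(right_sum, s)
    return left_sum + right_sum
```

For example, on the change data `[1, -4, 3, -4, 1]` the best crossing subarray is $[1, -4, 3]$ with sum 0, and `max_crossing` returns 0.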

Let's prove that this works.

Max-Crossing computes the maximum subarray sum that crosses $m$.

We argue that the maximum sum of such a subarray is necessarily the sum of the largest-sum subarray ending at index $m$ and the largest-sum subarray beginning at index $m+1$. Let $\ell$ be the sum of $A[i:m]$, the subarray with the largest sum ending at $m$, and let $r$ be the sum of $A[m+1:j]$, the subarray with the largest sum starting at $m+1$.

Suppose that some other subarray $A[i':j']$ that crosses $m$ has the largest sum, and call this sum $S$. Without loss of generality, suppose the sum of $A[i':m]$, call it $\ell'$, is less than $\ell$. Then $S - \ell' + \ell \gt S$, which contradicts our assumption that $S$ was maximal. The same argument can be made for $A[m+1:j']$.

Now, let us show that Max-Crossing correctly computes the maximum subarray sum. We will show that the algorithm correctly computes the maximum sum over subarrays of $A[1:m]$ ending at $m$ and over subarrays of $A[m+1:n]$ starting at $m+1$. We prove this for the left half by induction on $k$, where $i = m - k$; the argument is symmetric for the right half.

The base case is when $k = 0$, so $i = m$. In this case, $A[m:m] = A[m]$. The algorithm computes $\texttt{sum} = A[m]$ and assigns $\texttt{left-sum} = \max(A[m], -\infty) = A[m]$.

Now, consider $k \gt 0$ and assume that $\texttt{left-sum}$ correctly stores the maximal sum over subarrays ending at $m$ and starting at index $i+1 = m - (k-1)$ or later. We consider the sum of $A[i:m]$, where $i = m - k$, which is \[\texttt{sum} = \sum^m_{j = i} A[j] = A[i] + \sum^m_{j = i+1} A[j].\] By our inductive hypothesis, $\texttt{left-sum}$ holds the maximum over all subarrays considered so far, and it is updated exactly when $\texttt{sum} \gt \texttt{left-sum}$, so after this iteration it holds the maximum over all subarrays ending at $m$ and starting at index $i$ or later.

A symmetric argument holds for $A[m+1:j]$ and we can conclude that Max-Crossing correctly computes the maximum subarray sum that crosses $m$.

Then we can use this in our algorithm for maximum subarray sum which follows.

    \begin{algorithmic}
    \PROCEDURE{max-subarray}{$A$}
        \STATE Let $n = |A|$
        \IF{$n$ = 1}
            \RETURN{$A[1]$}
        \ELSE
            \STATE $m = \left\lfloor \frac n 2 \right\rfloor$
            \STATE $\ell = $ \CALL{max-subarray}{$A[1:m]$}
            \STATE $r = $ \CALL{max-subarray}{$A[m+1:n]$}
            \STATE $x = $ \CALL{max-crossing}{$A$}
            \RETURN{$\max\{\ell,r,x\}$}
        \ENDIF
    \ENDPROCEDURE
    \end{algorithmic}
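Putting the pieces together, here is a runnable Python sketch of the whole algorithm (0-based indexing; the crossing helper from the pseudocode is repeated inside the block so it runs standalone).

```python
def max_crossing(A):
    # Best sum of a subarray that crosses the midpoint (see Max-Crossing).
    m = len(A) // 2
    left_sum, s = float("-inf"), 0
    for i in range(m - 1, -1, -1):
        s += A[i]
        left_sum = max(left_sum, s)
    right_sum, s = float("-inf"), 0
    for j in range(m, len(A)):
        s += A[j]
        right_sum = max(right_sum, s)
    return left_sum + right_sum

def max_subarray(A):
    n = len(A)
    if n == 1:
        return A[0]                       # base case: single element
    m = n // 2
    return max(max_subarray(A[:m]),       # entirely in the left half
               max_subarray(A[m:]),       # entirely in the right half
               max_crossing(A))           # crosses the midpoint
```

On the running example, `max_subarray([1, -4, 3, -4, 1])` returns 3.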

Now, to prove that this actually works. Since the algorithm is recursive, the proof is fairly straightforward, because induction mirrors the structure of recursion. The only complication is dealing with the combining step.

Max-Subarray computes the maximum subarray sum.

By (strong) induction on $n$, the length of $A$. For the base case, we have $n = 1$, in which case the sum of the array is the single element.

Now suppose $n \gt 1$ and assume that Max-Subarray correctly computes the maximum subarray sum for all arrays of length less than $n$. We observe that the maximum subarray is either contained entirely in one of the halves or crosses the middle of the array. In the first case, since each half has length less than $n$, the inductive hypothesis guarantees that the corresponding recursive call computes its sum. In the second case, the lemma above guarantees that Max-Crossing computes it. Taking the maximum of the three candidate values therefore yields the maximum subarray sum.

The running time of Max-Subarray is $O(n \log n)$.

As usual, let $T(n)$ be the worst-case time complexity of Max-Subarray on an input of size $n$. When $n = 1$, the base case, we have $T(1) = O(1)$. In the recursive case, we compute the midpoint $m$ in $O(1)$ time. Since we split our array in half and call Max-Subarray on each half, the two recursive calls contribute $T(n/2)$ each. The remaining cost is the call to Max-Crossing.

We observe that Max-Crossing computes the midpoint in $O(1)$ time and performs two loops, in sequence. Both loops perform additions, comparisons, and assignments in each iteration. Since each loop iterates over half the array, in total they take $O(n)$ time. Thus, Max-Crossing takes $O(n)$ time.

Finally, we take the maximum of $\ell,r,x$, which can be done in $O(1)$ time. Adding everything up, we get the following recurrence. \[T(n) = \begin{cases} O(1) & \text{if $n=1$}, \\ 2T(n/2) + O(n) & \text{if $n \gt 1$}. \end{cases}\]

Since this is the same recurrence as Mergesort, as we proved before, $T(n)$ is $O(n \log n)$.

You may have noticed that this algorithm gives the sum of the maximum subarray, but not its indices, though it's not too hard to modify the algorithm so that it reports those as well.
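One way to make such a modification (a sketch; the triple-valued return and the function names are my own choices, not from the notes) is to have each call return a triple $(\text{sum}, i, j)$ and to track the best endpoints during the crossing scan:

```python
def max_crossing_ix(A, lo, hi):
    # Best subarray crossing the midpoint of A[lo:hi], as (sum, i, j).
    m = (lo + hi) // 2
    left_sum, best_i, s = float("-inf"), m, 0
    for i in range(m - 1, lo - 1, -1):
        s += A[i]
        if s > left_sum:
            left_sum, best_i = s, i
    right_sum, best_j, s = float("-inf"), m, 0
    for j in range(m, hi):
        s += A[j]
        if s > right_sum:
            right_sum, best_j = s, j
    return left_sum + right_sum, best_i, best_j

def max_subarray_ix(A, lo=0, hi=None):
    # Returns (sum, i, j) for the maximum subarray A[i..j] of A[lo:hi].
    if hi is None:
        hi = len(A)
    if hi - lo == 1:
        return A[lo], lo, lo
    m = (lo + hi) // 2
    return max(max_subarray_ix(A, lo, m),
               max_subarray_ix(A, m, hi),
               max_crossing_ix(A, lo, hi))  # tuples compare by sum first
```

Note that Python's `max` on tuples compares the sums first, so ties are broken arbitrarily by index; on `[1, -4, 3, -4, 1]` this returns `(3, 2, 2)`, the single element at index 2.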

It turns out that this is not the best we can do—a linear time algorithm is possible and this algorithm has a hint for how to do it. These algorithms are discussed in an article by Jon Bentley from 1984.
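As a teaser, the linear-time scan (often attributed to Kadane and discussed in Bentley's article) builds on the same running-sum hint: the best subarray ending at position $j$ either extends the best one ending at $j-1$ or starts fresh at $j$. A minimal sketch:

```python
# O(n) maximum subarray sum: ending_here is the best sum of a subarray
# ending at the current position; best is the best seen anywhere so far.
def max_subarray_linear(A):
    best = ending_here = A[0]
    for x in A[1:]:
        ending_here = max(x, ending_here + x)  # extend or start fresh
        best = max(best, ending_here)
    return best
```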