CMSC 28000 — Lecture 9

Derivatives of Regular Expressions

Let's take a deeper dive into regular expressions. While we have a pretty good understanding of how a DFA processes a word, we've been comparatively vague about how to think about the same process when matching a word against a regular expression. For example, how would we talk about a word that has partially matched a regular expression, in the same way that we could think about a DFA that's halfway through reading a word?

Suppose we have the regular expression $\mathtt{(a+b)ab(a+b)}$ and we read the first letter of a word, say $b$. We want to know what the possible matches are after we have read/matched that $b$. For this expression, it is easy to tell: they are the words $aba$ and $abb$. But we can also use a regular expression to describe this set: it is the original expression without the first factor, $\mathtt{ab(a+b)}$.

This idea of a regular expression describing the set of strings that remain after a partial match is called the derivative of the regular expression. The name fits in the sense that we'll be considering a regular expression after it has partially matched a word, capturing how the expression changes as it consumes the word.

First, we will need the following definition.

The quotient of a language $L$ by a word $w$ is the language $$w^{-1} L = \{x \in \Sigma^* \mid wx \in L\}.$$ Much like how ideas like concatenation, powers, and factors (subwords) invoke analogies to multiplication, the use of the term quotient in this case can be thought of in the same way. The quotient $w^{-1} L$ is what's left when we "factor out" $w$ from the front of the word.
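This definition is easy to realize directly for a finite language. The sketch below (`quotient` is a name chosen for this illustration, not from the notes) computes $w^{-1}L$ on the example from earlier, $L\left(\mathtt{(a+b)ab(a+b)}\right)$:

```python
# Quotient of a finite language by a word, straight from the definition:
# w^{-1} L = { x | wx in L }.
def quotient(w, L):
    return {x[len(w):] for x in L if x.startswith(w)}

# L((a+b)ab(a+b)) written out as a finite set of words
L = {"aaba", "aabb", "baba", "babb"}
print(sorted(quotient("b", L)))   # ['aba', 'abb']
```

As expected, quotienting by $b$ leaves exactly the suffixes $aba$ and $abb$, the language of $\mathtt{ab(a+b)}$.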

Note that we can take quotients on any language, not just those that are regular. However, a key observation that Janusz Brzozowski made in 1964 was that one can compute a quotient for a regular language via its regular expression. This is the genesis of the derivative.

First, we define the notion of the constant term of a regular expression.

The constant term of a regular expression $\mathbf r$, denoted $\mathsf c(\mathbf r)$, is defined by $$\mathsf c\left(\emptyset\right) = \emptyset, \qquad \mathsf c\left(\varepsilon\right) = \varepsilon, \qquad \mathsf c\left(a\right) = \emptyset \text{ for all } a \in \Sigma,$$

and for regular expressions $\mathbf r_1$ and $\mathbf r_2$, $$\mathsf c\left(\mathbf r_1 + \mathbf r_2\right) = \mathsf c\left(\mathbf r_1\right) + \mathsf c\left(\mathbf r_2\right), \qquad \mathsf c\left(\mathbf r_1 \mathbf r_2\right) = \mathsf c\left(\mathbf r_1\right)\mathsf c\left(\mathbf r_2\right), \qquad \mathsf c\left(\mathbf r_1^*\right) = \varepsilon,$$ where the right-hand sides are simplified using $\varepsilon + \varepsilon = \varepsilon + \emptyset = \emptyset + \varepsilon = \varepsilon$, $\emptyset + \emptyset = \emptyset$, $\varepsilon\varepsilon = \varepsilon$, and $\varepsilon\emptyset = \emptyset\varepsilon = \emptyset\emptyset = \emptyset$.

The consequence of this definition can be summed up as follows.

For a regular expression $\mathbf r$, $$\mathsf c(\mathbf r) = \begin{cases} \varepsilon & \text{if $\varepsilon \in L(\mathbf r)$,} \\ \emptyset & \text{if $\varepsilon \not \in L(\mathbf r)$.} \end{cases}$$

Observe that the result of $\mathsf c(\mathbf r)$ is a regular expression: it is either $\varepsilon$ or $\emptyset$, depending on whether or not $\varepsilon$ is matched by $\mathbf r$ (i.e. whether $\varepsilon$ is a member of the language of $\mathbf r$).
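As a sketch, here is the constant term computed over a small regular expression AST in Python. The tuple encoding — $\emptyset$ as `("empty",)`, $\varepsilon$ as `("eps",)`, and so on — is my own choice for these examples, not anything from the notes:

```python
# Regular expressions as tuples (an encoding chosen for this sketch):
# ("empty",) = the empty set, ("eps",) = the empty word, ("sym", a),
# ("alt", r, s) = r + s, ("cat", r, s) = rs, ("star", r) = r*

def c(r):
    """Constant term: ("eps",) iff the empty word is in L(r), else ("empty",)."""
    tag = r[0]
    if tag in ("eps", "star"):            # eps and r* always match the empty word
        return ("eps",)
    if tag in ("empty", "sym"):           # the empty set and single symbols never do
        return ("empty",)
    if tag == "alt":                      # c(r+s): eps if either side is nullable
        return ("eps",) if ("eps",) in (c(r[1]), c(r[2])) else ("empty",)
    # cat: c(rs) = c(r)c(s), so eps only if both sides are nullable
    return ("eps",) if c(r[1]) == c(r[2]) == ("eps",) else ("empty",)

# (a + b)* has constant term eps; ab does not
print(c(("star", ("alt", ("sym", "a"), ("sym", "b")))))   # ('eps',)
print(c(("cat", ("sym", "a"), ("sym", "b"))))             # ('empty',)
```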

Note also that a derivative will itself be a regular expression. The goal is to define a systematic way of computing derivatives from a given regular expression. The following definition tells us how.

Let $\mathbf r$ be a regular expression. Then the derivative of $\mathbf r$ with respect to $a \in \Sigma$, denoted $\frac{d}{da}\left(\mathbf r\right)$, is defined recursively by $$\frac{d}{da}\left(\emptyset\right) = \emptyset, \qquad \frac{d}{da}\left(\varepsilon\right) = \emptyset, \qquad \frac{d}{da}\left(b\right) = \begin{cases} \varepsilon & \text{if $b = a$,} \\ \emptyset & \text{if $b \neq a$,} \end{cases} \quad \text{for } b \in \Sigma,$$

and for regular expressions $\mathbf r_1$ and $\mathbf r_2$, $$\frac{d}{da}\left(\mathbf r_1 + \mathbf r_2\right) = \frac{d}{da}\left(\mathbf r_1\right) + \frac{d}{da}\left(\mathbf r_2\right), \qquad \frac{d}{da}\left(\mathbf r_1 \mathbf r_2\right) = \frac{d}{da}\left(\mathbf r_1\right)\mathbf r_2 + \mathsf c\left(\mathbf r_1\right)\frac{d}{da}\left(\mathbf r_2\right), \qquad \frac{d}{da}\left(\mathbf r_1^*\right) = \frac{d}{da}\left(\mathbf r_1\right)\mathbf r_1^*.$$
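These recursive rules transcribe almost verbatim into code. The sketch below uses a tuple AST of my own devising (`("empty",)`, `("eps",)`, `("sym", a)`, `("alt", r, s)`, `("cat", r, s)`, `("star", r)`), with a boolean `nullable` standing in for the constant term:

```python
def nullable(r):
    """True iff the empty word is in L(r), i.e. the constant term c(r) is eps."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])   # cat

def deriv(a, r):
    """The derivative d/da(r), one clause per rule (no simplification)."""
    tag = r[0]
    if tag in ("empty", "eps"):                # d/da(empty) = d/da(eps) = empty
        return ("empty",)
    if tag == "sym":                           # d/da(b) = eps if b == a else empty
        return ("eps",) if r[1] == a else ("empty",)
    if tag == "alt":                           # d/da(r+s) = d/da(r) + d/da(s)
        return ("alt", deriv(a, r[1]), deriv(a, r[2]))
    if tag == "cat":                           # d/da(rs) = d/da(r)s + c(r)d/da(s)
        left = ("cat", deriv(a, r[1]), r[2])
        right = ("cat", ("eps",) if nullable(r[1]) else ("empty",), deriv(a, r[2]))
        return ("alt", left, right)
    return ("cat", deriv(a, r[1]), r)          # d/da(r*) = d/da(r) r*
```

Note that `deriv` returns unsimplified expressions full of $\emptyset$ and $\varepsilon$ subterms; simplifying them away (as we do by hand below) is a separate concern.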

You might think this notation is kind of cute because it evokes the idea of derivatives from calculus. This notation is actually due to Jacques Sakarovitch, in Elements of Automata Theory, and I think it's nicer than the $D_a$ from the original Brzozowski paper.

The claim here is that if I have a regular expression $\mathbf r$, applying this definition for a symbol $a \in \Sigma$ will give me a regular expression $\frac{d}{da}\left(\mathbf r\right)$ that describes all words $x$ such that $ax$ was matched by $\mathbf r$. In other words, the derivative $\frac{d}{da}(\mathbf r)$ should describe the quotient $a^{-1}L(\mathbf r)$. Let's prove this.

Let $\mathbf r$ be a regular expression and let $a \in \Sigma$ be a symbol. Then, $$L\left(\frac{d}{da}\left(\mathbf r\right)\right) = a^{-1} L\left(\mathbf r\right) = \{x \in \Sigma^* \mid ax \in L\left(\mathbf r\right)\}.$$

We will prove this by induction on $\mathbf r$.

For the base cases, we have the following. If $\mathbf r = \emptyset$, then $L\left(\frac{d}{da}\left(\emptyset\right)\right) = L\left(\emptyset\right) = \emptyset = a^{-1}\emptyset$. If $\mathbf r = \varepsilon$, then $L\left(\frac{d}{da}\left(\varepsilon\right)\right) = \emptyset = a^{-1}\{\varepsilon\}$, since no word of $\{\varepsilon\}$ begins with $a$. If $\mathbf r = b$ for some $b \in \Sigma$, there are two cases: when $b = a$, we have $L\left(\frac{d}{da}\left(a\right)\right) = L\left(\varepsilon\right) = \{\varepsilon\} = a^{-1}\{a\}$, and when $b \neq a$, we have $L\left(\frac{d}{da}\left(b\right)\right) = L\left(\emptyset\right) = \emptyset = a^{-1}\{b\}$.

Now, let $\mathbf s$ and $\mathbf t$ be regular expressions and assume that for all $a \in \Sigma$, $L\left(\frac{d}{da}\left(\mathbf s\right)\right) = a^{-1}L\left(\mathbf s\right)$ and $L\left(\frac{d}{da}\left(\mathbf t\right)\right) = a^{-1}L\left(\mathbf t\right)$.

First, we have $\mathbf r = \mathbf s + \mathbf t$: \begin{align*} L\left(\frac{d}{da}\left(\mathbf s + \mathbf t\right)\right) &= L\left(\frac{d}{da}\left(\mathbf s\right)+ \frac{d}{da}\left(\mathbf t\right)\right) \\ &= L\left(\frac{d}{da}\left(\mathbf s\right)\right) \cup L\left(\frac{d}{da}\left(\mathbf t\right)\right) \\ &= a^{-1} L\left(\mathbf s\right) \cup a^{-1} L\left(\mathbf t\right) \\ &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s\right)\} \cup \{x \in \Sigma^* \mid ax \in L\left(\mathbf t\right)\} \\ &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s\right) \cup L\left(\mathbf t\right)\} \\ &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s + \mathbf t\right)\} \\ &= a^{-1} L\left(\mathbf s + \mathbf t\right) \end{align*}

Next, we have $\mathbf r = \mathbf s \mathbf t$. We will define $L\left(\mathbf s\right)^\circ = L\left(\mathbf s\right) - \{\varepsilon\}$. Note that $L\left(\mathbf s\right) = L\left(\mathbf s\right)^\circ \cup L\left(\mathsf c\left(\mathbf s\right)\right)$. \begin{align*} a^{-1} L\left(\mathbf s \mathbf t\right) &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s\right)L\left(\mathbf t\right)\} \\ &= \{x \in \Sigma^* \mid ax \in \left(L\left(\mathbf s\right)^\circ \cup L\left(\mathsf c\left(\mathbf s\right)\right)\right) L\left(\mathbf t\right)\} \\ &= \{x \in \Sigma^* \mid ax \in \left(L\left(\mathbf s\right)^\circ L\left(\mathbf t\right) \cup L\left(\mathsf c\left(\mathbf s\right)\right) L\left(\mathbf t\right)\right)\} \\ &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s\right)^\circ L\left(\mathbf t\right)\} \cup \{x \in \Sigma^* \mid ax \in L\left(\mathsf c\left(\mathbf s\right)\right) L\left(\mathbf t\right)\} \\ &= \{uv \in \Sigma^* \mid au \in L\left(\mathbf s\right)^\circ, v \in L\left(\mathbf t\right)\} \cup a^{-1} L\left(\mathsf c\left(\mathbf s\right)\mathbf t\right) \\ &= \left(a^{-1} L\left(\mathbf s\right)^\circ\right) L\left(\mathbf t\right) \cup a^{-1} L\left(\mathsf c\left(\mathbf s\right)\mathbf t\right) \\ \end{align*} Here, we make the observation that $a^{-1} L\left(\mathbf s\right)^\circ = a^{-1} L\left(\mathbf s\right)$, since $a^{-1} \{\varepsilon\} = \emptyset$. Therefore, $\left(a^{-1} L\left(\mathbf s\right)^\circ\right) L\left(\mathbf t\right) = L\left(\frac{d}{da}\left(\mathbf s\right)\mathbf t\right)$. We also observe that $a^{-1} L\left(\mathsf c\left(\mathbf s\right) \mathbf t\right) = L\left(\mathsf c\left(\mathbf s\right) \frac{d}{da}\left(\mathbf t\right)\right)$, since $\mathsf c\left(\mathbf s\right)$ is either $\emptyset$ or $\varepsilon$. 
Putting this together, we continue to get the following: \begin{align*} a^{-1} L\left(\mathbf s \mathbf t\right) &= \left(a^{-1} L\left(\mathbf s\right)^\circ\right) L\left(\mathbf t\right) \cup a^{-1} L\left(\mathsf c\left(\mathbf s\right)\mathbf t\right) \\ &= L\left(\frac{d}{da}\left(\mathbf s\right) \mathbf t\right) \cup L\left(\mathsf c\left(\mathbf s\right) \frac{d}{da}\left(\mathbf t\right)\right) \\ &= L\left(\frac{d}{da}\left(\mathbf s\right) \mathbf t + \mathsf c\left(\mathbf s\right) \frac{d}{da}\left(\mathbf t\right)\right) \\ &= L\left(\frac{d}{da}\left(\mathbf s \mathbf t\right)\right) \end{align*}

Therefore, $L\left(\frac{d}{da}\left(\mathbf s \mathbf t\right)\right) = a^{-1}L\left(\mathbf s \mathbf t\right)$.

Finally, we have $\mathbf r = \mathbf s^*$. \begin{align*} L\left(\frac{d}{da}\left(\mathbf s^*\right)\right) &= L\left(\frac{d}{da}\left(\mathbf s\right)\mathbf s^*\right) \\ &= \left(a^{-1}L\left(\mathbf s\right)\right) L\left(\mathbf s^*\right) \\ &= \{xy \in \Sigma^* \mid ax \in L\left(\mathbf s\right), y \in L\left(\mathbf s^*\right)\} \\ &= \{w \in \Sigma^* \mid aw \in L\left(\mathbf s^*\right)\} \\ &= a^{-1} L\left(\mathbf s^*\right) \end{align*} The fourth equality deserves a word: if $aw \in L\left(\mathbf s^*\right)$, then $aw$ is nonempty, so it factors as $aw = u_1 u_2 \cdots u_k$ with each $u_i \in L\left(\mathbf s\right)$ and $u_1$ nonempty; writing $u_1 = ax$ and $y = u_2 \cdots u_k$ exhibits $aw$ as a word $xy$ with $ax \in L\left(\mathbf s\right)$ and $y \in L\left(\mathbf s^*\right)$. The reverse inclusion is immediate.

 

We can define an obvious extension of derivatives to words: $\frac{d}{d\varepsilon}\left(\mathbf r\right) = \mathbf r$, and if $w = ax$ with $a \in \Sigma$ and $x \in \Sigma^*$, then $\frac{d}{dw}\left(\mathbf r\right) = \frac{d}{dx}\left(\frac{d}{da}\left(\mathbf r\right)\right)$. This leads to the following very interesting observation.

Let $\mathbf r$ be a regular expression and let $w \in \Sigma^*$. Then $w \in L\left(\mathbf r\right)$ if and only if $\varepsilon \in L\left(\frac{d}{dw}\left(\mathbf r\right)\right)$.

We have $\varepsilon \in L\left(\frac{d}{dw}\left(\mathbf r\right)\right) \iff \varepsilon \in w^{-1} L\left(\mathbf r\right) \iff w \varepsilon \in L\left(\mathbf r\right) \iff w \in L\left(\mathbf r\right).$
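This observation is a complete matching algorithm: differentiate by each letter of $w$ in turn, then check the constant term of what's left. Here is that idea as a self-contained sketch (same home-made tuple AST as before; `match`, `nullable`, and `deriv` are my names):

```python
from functools import reduce

def nullable(r):
    """True iff the empty word is in L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])   # cat

def deriv(a, r):
    """The derivative d/da(r)."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return ("empty",)
    if tag == "sym":
        return ("eps",) if r[1] == a else ("empty",)
    if tag == "alt":
        return ("alt", deriv(a, r[1]), deriv(a, r[2]))
    if tag == "cat":
        return ("alt", ("cat", deriv(a, r[1]), r[2]),
                       ("cat", ("eps",) if nullable(r[1]) else ("empty",),
                               deriv(a, r[2])))
    return ("cat", deriv(a, r[1]), r)          # star

def match(r, w):
    """w is in L(r) iff the empty word is in L(d/dw(r))."""
    return nullable(reduce(lambda q, a: deriv(a, q), w, r))

# r = (0+1)*0: binary strings ending in 0
r = ("cat", ("star", ("alt", ("sym", "0"), ("sym", "1"))), ("sym", "0"))
print(match(r, "0110"))   # True
print(match(r, "01"))     # False
```

This is naive — without simplification the derivatives grow with each letter — but it matches correctly, directly on the structure of the expression, with no automaton built in advance.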

Let's see where this idea takes us.

Let's compute the derivatives for the regular expression $\mathbf r = \mathtt{\left(0+1\right)^* 0}$. You may already recognize this as the language of binary strings that end in 0. This is admittedly pretty simple, but we'll see that the computations are quite long to do by hand. Let's start with $\frac{d}{d0}\left(\mathbf r\right)$. \begin{align*} \frac{d}{d0}\left(\mathbf r\right) &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*0}\right) \\ &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*}\right) \mathtt 0 + \mathsf c\left(\mathtt{\left(0+1\right)^*}\right)\frac{d}{d0}\left(\mathtt 0\right) \\ &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0 + \varepsilon \varepsilon \\ &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0 + \varepsilon \\ &= \frac{d}{d0}\left(\mathtt {0+1}\right)\mathtt{\left(0+1\right)^*0} + \varepsilon \\ &= \left(\frac{d}{d0}\left(\mathtt 0\right) + \frac{d}{d0}\left(\mathtt 1\right)\right)\mathtt{\left(0+1\right)^*0} + \varepsilon \\ &= \mathtt{\left(\varepsilon + \emptyset\right)\left(0+1\right)^*0 + \varepsilon} \\ &= \mathtt{\varepsilon \left(0+1\right)^*0 + \varepsilon} \\ &= \mathtt{\left(0+1\right)^*0 + \varepsilon} \end{align*}

(If you're interested in finding a set of rules for doing algebra on regular expressions, Kozen has a supplementary section on the Kleene Algebra starting on page 55, between Chapters 9 and 10. Otherwise, it's not too hard to reason through—things generally work the way you'd expect them to.)

Here, the derivative tells us that if we read a $0$, we can either stop ($\varepsilon$) because we skipped the star term $\mathtt{\left(0+1\right)^*}$, so we have a word that matches our regular expression, or we can go back for another round through the star term. Let's call this regular expression $\mathbf r_1$. Before considering the derivatives of $\mathbf r_1$, let's consider $\frac{d}{d1}\left(\mathbf r\right)$ first. \begin{align*} \frac{d}{d1}\left(\mathbf r\right) &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*0}\right) \\ &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0 + \mathsf c\left(\mathtt{\left(0+1\right)^*}\right)\frac{d}{d1}\left(\mathtt 0\right) \\ &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0+ \varepsilon \emptyset \\ &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0 \\ &= \frac{d}{d1}\left(\mathtt{0+1}\right)\mathtt{\left(0+1\right)^*0} \\ &= \mathtt{\left(\emptyset + \varepsilon\right)\left(0+1\right)^*0} \\ &= \mathtt{\varepsilon \left(0+1\right)^*0} \\ &= \mathtt{\left(0+1\right)^*0} \\ &= \mathbf r \end{align*} This derivative says that if we read a $1$, we can go through another round of the star term $\mathtt{\left(0+1\right)^*}$ but we haven't completed a match (i.e. $\varepsilon$ is not a possible match in this derivative). Now let's consider the derivatives on $\mathbf r_1$. We will see that our job will be slightly easier. 
\begin{align*} \frac{d}{d0}\left(\mathbf r_1\right) &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*0 + \varepsilon}\right) \\ &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*0}\right) + \frac{d}{d0}\left(\varepsilon\right) \\ &= \frac{d}{d0}\left(\mathbf r\right) + \emptyset \\ &= \mathbf r_1 \end{align*} \begin{align*} \frac{d}{d1}\left(\mathbf r_1\right) &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*0 + \varepsilon}\right) \\ &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*0}\right) + \frac{d}{d1}\left(\varepsilon\right) \\ &= \frac{d}{d1}\left(\mathbf r\right) + \emptyset \\ &= \mathbf r \end{align*}

So we seem to have arrived at two regular expressions that we've seen before. In fact, Brzozowski showed that this is guaranteed to happen.

Every regular expression $\mathbf r$ has a finite number of distinct derivatives, provided we identify derivatives up to the associativity, commutativity, and idempotence of $+$.

This has a few implications. First of all, to compute all the derivatives of a regular expression $\mathbf r$, we can do what we've done and just keep on computing them until we start getting the same expressions. Obviously, this is not the most efficient way to do this, but we're guaranteed the computation will stop.

But thinking back to the theoretical point of view, recall that these are sets of suffixes of words in our language. When we consume prefixes of words belonging to these sets, we get different sets, but there are only finitely many of them. And finally, we know that those sets that contain $\varepsilon$ correspond to words in the language. This looks suspiciously like a DFA: take the distinct derivatives of $\mathbf r$ as the states, with $\mathbf r$ itself as the initial state, define the transition function by $\delta\left(\mathbf q, a\right) = \frac{d}{da}\left(\mathbf q\right)$, and make a state $\mathbf q$ accepting exactly when $\mathsf c\left(\mathbf q\right) = \varepsilon$.

Unlike the Thompson construction we saw before or the position automaton I mentioned briefly, this directly gives us a DFA. However, if we want to use derivatives to compute an NFA, one way to do this was proposed by Antimirov, who, in 1996, defined partial derivatives of a regular expression. This method, like the position automaton that was mentioned earlier, produces an NFA of roughly the same size as the length of the regular expression.
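Here is a sketch of that closure computation in Python (tuple AST and all names are my own). One caveat baked into the code: the "smart constructors" `alt` and `cat` apply only light simplifications such as $\emptyset + \mathbf r = \mathbf r$, $\emptyset\mathbf r = \emptyset$, $\varepsilon\mathbf r = \mathbf r$, and $\mathbf r + \mathbf r = \mathbf r$, which is enough to make this example close; in general one must identify derivatives up to associativity, commutativity, and idempotence of $+$ to guarantee finitely many states.

```python
def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])   # cat

# Smart constructors that simplify as they build.
def alt(r, s):
    if r == ("empty",): return s
    if s == ("empty",): return r
    if r == s: return r
    return ("alt", r, s)

def cat(r, s):
    if ("empty",) in (r, s): return ("empty",)
    if r == ("eps",): return s
    if s == ("eps",): return r
    return ("cat", r, s)

def deriv(a, r):
    tag = r[0]
    if tag in ("empty", "eps"):
        return ("empty",)
    if tag == "sym":
        return ("eps",) if r[1] == a else ("empty",)
    if tag == "alt":
        return alt(deriv(a, r[1]), deriv(a, r[2]))
    if tag == "cat":
        return alt(cat(deriv(a, r[1]), r[2]),
                   cat(("eps",) if nullable(r[1]) else ("empty",), deriv(a, r[2])))
    return cat(deriv(a, r[1]), r)              # star

def derivative_dfa(r, alphabet):
    """Close r under derivatives: states are the reachable derivatives,
    delta[(q, a)] = d/da(q), and q accepts iff its constant term is eps."""
    states, delta, todo = {r}, {}, [r]
    while todo:
        q = todo.pop()
        for a in alphabet:
            d = deriv(a, q)
            delta[(q, a)] = d
            if d not in states:
                states.add(d)
                todo.append(d)
    return states, delta, {q for q in states if nullable(q)}

# (0+1)*0 from the worked example: two states, r and r1 = (0+1)*0 + eps
r = ("cat", ("star", ("alt", ("sym", "0"), ("sym", "1"))), ("sym", "0"))
states, delta, accepting = derivative_dfa(r, "01")
print(len(states), len(accepting))   # 2 1
```

On this input the closure finds exactly the two derivatives computed by hand above, with $\mathbf r_1 = \mathtt{(0+1)^*0} + \varepsilon$ as the lone accepting state.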

Here's another example.

Let's compute the derivatives for the regular expression $\mathbf r = \mathtt{\left(10^*1 + 0\right)^*}$. We have for $0$, \begin{align*} \frac{d}{d0}\left(\mathbf r\right) &= \frac{d}{d0}\left(\mathtt{10^*1+0}\right) \mathbf r \\ &= \left(\frac{d}{d0}\left(\mathtt{10^*1}\right) + \frac{d}{d0}\left(\mathtt 0\right)\right) \mathbf r \\ &= \left(\frac{d}{d0}\left(\mathtt{10^*1}\right) + \varepsilon\right) \mathbf r \\ &= \left(\frac{d}{d0}\left(\mathtt 1\right)\mathtt{0^*1} + \mathsf c\left(\mathtt 1\right)\frac{d}{d0}\left(\mathtt {0^*1}\right) + \varepsilon\right) \mathbf r \\ &= \left(\emptyset \mathtt{0^*1} + \emptyset \frac{d}{d0}\left(\mathtt{0^*1}\right) + \varepsilon\right) \mathbf r \\ &= \varepsilon \mathbf r \\ &= \mathbf r \end{align*} So, $\frac{d}{d0}\left(\mathbf r\right) = \mathbf r$, which makes sense: reading $0$ would allow us to start from the beginning of the regular expression since we can just take the star. But for $1$, we have \begin{align*} \frac{d}{d1}\left(\mathbf r\right) &= \frac{d}{d1}\left(\mathtt{10^*1+0}\right) \mathbf r \\ &= \left(\frac{d}{d1}\left(\mathtt {10^*1}\right) + \frac{d}{d1}\left(\mathtt 0\right)\right) \mathbf r \\ &= \left(\frac{d}{d1}\left(\mathtt {10^*1}\right) + \emptyset\right) \mathbf r \\ &= \left(\frac{d}{d1}\left(\mathtt 1\right)\mathtt{0^*1} + \mathsf c\left(\mathtt 1\right)\frac{d}{d1}\left(\mathtt{0^*1}\right)\right) \mathbf r \\ &= \left(\mathtt{\varepsilon 0^*1} + \emptyset \frac{d}{d1}\left(\mathtt{0^*1}\right)\right) \mathbf r \\ &= \mathtt{0^*1} \mathbf r \\ &= \mathtt{0^*1 \left(10^*1 + 0\right)^*} \end{align*} Here, this says that if we read $1$, then we are going to be partially matched to the first term inside the star, and we must finish the rest of the term before going for another round through the star. Let's call this new RE $\mathbf r_1$ and consider its derivatives.
For $0$, we have \begin{align*} \frac{d}{d0}\left(\mathbf r_1\right) &= \frac{d}{d0}\left(\mathtt{0^*1} \mathbf r\right) \\ &= \frac{d}{d0}\left(\mathtt{0^*}\right)\mathtt 1 \mathbf r + \mathsf c\left(\mathtt{0^*}\right) \frac{d}{d0}\left(\mathtt 1\mathbf r\right) \\ &= \frac{d}{d0}\left(\mathtt{0^*}\right)\mathtt 1 \mathbf r + \varepsilon \left(\frac{d}{d0}\left(\mathtt 1\right)\mathbf r + \mathsf c\left(\mathtt 1\right)\frac{d}{d0}\left(\mathbf r\right)\right) \\ &= \frac{d}{d0}\left(\mathtt{0^*}\right)\mathtt 1 \mathbf r + \emptyset \mathbf r + \emptyset \frac{d}{d0}\left(\mathbf r\right) \\ &= \frac{d}{d0}\left(\mathtt{0^*}\right)\mathtt 1 \mathbf r \\ &= \frac{d}{d0}\left(\mathtt 0\right) \mathtt{0^*1} \mathbf r \\ &= \varepsilon \mathtt{0^*1} \mathbf r \\ &= \mathtt{0^*1} \mathbf r \\ &= \mathbf r_1 \end{align*} Again, we have that reading $0$ doesn't change the regular expression, because we're still in the middle of reading $0$s. Then for $1$, we have \begin{align*} \frac{d}{d1}\left(\mathbf r_1\right) &= \frac{d}{d1}\left(\mathtt{0^*1} \mathbf r\right) \\ &= \frac{d}{d1}\left(\mathtt{0^*}\right)\mathtt 1 \mathbf r + \mathsf c\left(\mathtt{0^*}\right) \frac{d}{d1}\left(\mathtt 1\mathbf r\right) \\ &= \frac{d}{d1}\left(\mathtt{0^*}\right)\mathtt 1 \mathbf r + \varepsilon \left(\frac{d}{d1}\left(\mathtt 1\right)\mathbf r + \mathsf c\left(\mathtt 1\right)\frac{d}{d1}\left(\mathbf r\right)\right) \\ &= \frac{d}{d1}\left(\mathtt{0^*}\right)\mathtt 1 \mathbf r + \varepsilon \mathbf r + \emptyset \frac{d}{d1}\left(\mathbf r\right) \\ &= \frac{d}{d1}\left(\mathtt{0^*}\right)\mathtt 1 \mathbf r + \mathbf r \\ &= \frac{d}{d1}\left(\mathtt 0\right) \mathtt{0^*1} \mathbf r + \mathbf r \\ &= \emptyset \mathtt{0^*1} \mathbf r + \mathbf r \\ &= \mathbf r \end{align*} These derivatives give us the following DFA.

This method lends itself quite nicely to implementation in functional programming languages. In fact, Stuart Kurtz's CMSC 28000 course notes on derivatives pointed me to a paper detailing the use of derivatives in the regular expression implementations of Standard ML and Scheme (which you probably now know as Racket).

I first learned about derivatives as an undergrad, when I stumbled upon the first version of Matt Might's post about his paper on using derivatives for parsing. It's quite accessible if you have some formal languages and algorithms knowledge. He also has his own implementation of a very simple regular expression matcher that uses derivatives in Scheme, which should be easy enough to follow if you made it through CS 151 (or maybe don't look at it and try to implement it yourself).