CMSC 28000 — Lecture 9

Derivatives of Regular Expressions

Let's take a deeper dive into regular expressions. While we have a pretty good understanding of how a DFA processes a word, we've been comparatively vague about how to think about the same process when matching a word against a regular expression. For example, how would we talk about a word that has partially matched a regular expression, in the same way that we could think about a DFA that's halfway through reading a word?

Suppose we have a regular expression $\mathtt{(a+b)ab(a+b)}$ and I have the first letter of a word $b$. I want to know what the possible matches are after I have read/matched a $b$. For this expression, it is easy to tell: it is the words $aba$ and $abb$. But we can also use our regular expression to describe this set: it is the original expression without the first factor, $\mathtt{ab(a+b)}$.

This idea of a regular expression describing the set of strings after a partial match is called the derivatives of the regular expression. We can think of derivatives in the sense that we'll be considering a regular expression after it has partially matched a word, or capturing how the expression changes as it matches a word.

First, we will need the following definition from Solved Problem 1 of Problem Set 2.

The quotient of a language $L$ by a word $w$ is the language $$w^{-1} L = \{x \in \Sigma^* \mid wx \in L\}.$$ Much like how ideas like concatenation, powers, and factors (subwords) invoke analogies to multiplication, the use of the term quotient in this case can be thought of in the same way. The quotient $w^{-1} L$ is what's left when we "factor out" $w$ from the front of the word.

Note that we can take quotients on any language, not just those that are regular. However, a key observation that Janusz Brzozowski made in 1964 was that one can compute a quotient for a regular language via its regular expression. This is the genesis of the derivative.

First, we define the notion of the constant term of a regular expression.

The constant term of a regular expression $\mathbf r$, denoted $\mathsf c(\mathbf r)$, is defined by

and for regular expressions $\mathbf r_1$ and $\mathbf r_2$,

The consequence of this definition can be summed up as follows.

For a regular expression $\mathbf r$, $$\mathsf c(\mathbf r) = \begin{cases} \varepsilon & \text{if $\varepsilon \in L(\mathbf r)$,} \\ \emptyset & \text{if $\varepsilon \not \in L(\mathbf r)$,} \end{cases}$$ and therefore, $L(\mathsf c(\mathbf r)) = L(\mathbf r) \cap \{\varepsilon\}$.

Observe that the result of $\mathsf c(\mathbf r)$ is a regular expression: it is either $\varepsilon$ or $\emptyset$, depending on whether or not $\varepsilon$ is matched by $\mathbf r$ (i.e. whether $\varepsilon$ is a member of the language of $\mathbf r$).

Remember also that derivatives are regular expressions. The goal is to define a systematic way of computing the derivatives when given a regular expression. The following definition tells us how.

Let $\mathbf r$ be a regular expression. Then the derivative of $\mathbf r$ with respect to $a \in \Sigma$, denoted $\frac{d}{da}\left(\mathbf r\right)$, is defined recursively by

and for regular expressions $\mathbf r_1$ and $\mathbf r_2$,

You might think this notation is kind of cute because it evokes the idea of derivatives from calculus. This notation is due to Jacques Sakarovitch, in Elements of Automata Theory, and I think it's nicer than the $D_a$ from the original Brzozowski paper.

The claim here is that if I have a regular expression $\mathbf r$, applying this definition for a symbol $a \in \Sigma$ will give me a regular expression $\frac{d}{da}\left(\mathbf r\right)$ that describes all words $x$ such that $ax$ was matched by $\mathbf r$. In other words, the derivative $\frac{d}{da}(\mathbf r)$ should describe the quotient $a^{-1}L(\mathbf r)$. This is a statement that requires proof.

Let $\mathbf r$ be a regular expression and let $a \in \Sigma$ be a symbol. Then, $$L\left(\frac{d}{da}\left(\mathbf r\right)\right) = a^{-1} L\left(\mathbf r\right) = \{x \in \Sigma^* \mid ax \in L\left(\mathbf r\right)\}.$$

We will prove this by induction on $\mathbf r$.

For the base cases, we have the following

Now, let $\mathbf s$ and $\mathbf t$ be regular expressions and assume that for all $a \in \Sigma$, $L\left(\frac{d}{da}\left(\mathbf s\right)\right) = a^{-1}L\left(\mathbf s\right)$ and $L\left(\frac{d}{da}\left(\mathbf t\right)\right) = a^{-1}L\left(\mathbf t\right)$.

First, we have $\mathbf r = \mathbf s + \mathbf t$: \begin{align*} L\left(\frac{d}{da}\left(\mathbf s + \mathbf t\right)\right) &= L\left(\frac{d}{da}\left(\mathbf s\right)+ \frac{d}{da}\left(\mathbf t\right)\right) \\ &= L\left(\frac{d}{da}\left(\mathbf s\right)\right) \cup L\left(\frac{d}{da}\left(\mathbf t\right)\right) \\ &= a^{-1} L\left(\mathbf s\right) \cup a^{-1} L\left(\mathbf t\right) \\ &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s\right)\} \cup \{x \in \Sigma^* \mid ax \in L\left(\mathbf t\right)\} \\ &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s\right) \cup L\left(\mathbf t\right)\} \\ &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s + \mathbf t\right)\} \\ &= a^{-1} L\left(\mathbf s + \mathbf t\right) \end{align*}

Next, we have $\mathbf r = \mathbf s \mathbf t$. We will define $L\left(\mathbf s\right)^\circ = L\left(\mathbf s\right) - \{\varepsilon\}$. Note that $L\left(\mathbf s\right) = L\left(\mathbf s\right)^\circ \cup L\left(\mathsf c\left(\mathbf s\right)\right)$. \begin{align*} a^{-1} L\left(\mathbf s \mathbf t\right) &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s\right)L\left(\mathbf t\right)\} \\ &= \{x \in \Sigma^* \mid ax \in \left(L\left(\mathbf s\right)^\circ \cup L\left(\mathsf c\left(\mathbf s\right)\right)\right) L\left(\mathbf t\right)\} \\ &= \{x \in \Sigma^* \mid ax \in \left(L\left(\mathbf s\right)^\circ L\left(\mathbf t\right) \cup L\left(\mathsf c\left(\mathbf s\right)\right) L\left(\mathbf t\right)\right)\} \\ &= \{x \in \Sigma^* \mid ax \in L\left(\mathbf s\right)^\circ L\left(\mathbf t\right)\} \cup \{x \in \Sigma^* \mid ax \in L\left(\mathsf c\left(\mathbf s\right)\right) L\left(\mathbf t\right)\} \\ &= \{uv \in \Sigma^* \mid au \in L\left(\mathbf s\right)^\circ, v \in L\left(\mathbf t\right)\} \cup a^{-1} L\left(\mathsf c\left(\mathbf s\right)\mathbf t\right) \\ &= \left(a^{-1} L\left(\mathbf s\right)^\circ\right) L\left(\mathbf t\right) \cup a^{-1} L\left(\mathsf c\left(\mathbf s\right)\mathbf t\right) \\ \end{align*} Here, we make the observation that $a^{-1} L\left(\mathbf s\right)^\circ = a^{-1} L\left(\mathbf s\right)$, since $a^{-1} \{\varepsilon\} = \emptyset$. Therefore, $\left(a^{-1} L\left(\mathbf s\right)^\circ\right) L\left(\mathbf t\right) = L\left(\frac{d}{da}\left(\mathbf s\right)\mathbf t\right)$. We also observe that $a^{-1} L\left(\mathsf c\left(\mathbf s\right) \mathbf t\right) = L\left(\mathsf c\left(\mathbf s\right) \frac{d}{da}\left(\mathbf t\right)\right)$, since $\mathsf c\left(\mathbf s\right)$ is either $\emptyset$ or $\varepsilon$. Putting this together, we continue, to get the following: \begin{align*} a^{-1} L\left(\mathbf s \mathbf t\right) &= \left(a^{-1} L\left(\mathbf s\right)^\circ\right) L\left(\mathbf t\right) \cup a^{-1} L\left(\mathsf c\left(\mathbf s\right)\mathbf t\right) \\ &= L\left(\frac{d}{da}\left(\mathbf s\right) \mathbf t\right) \cup L\left(\mathsf c\left(\mathbf s\right) \frac{d}{da}\left(\mathbf t\right)\right) \\ &= L\left(\frac{d}{da}\left(\mathbf s\right) \mathbf t + \mathsf c\left(s\right) \frac{d}{da}\left(\mathbf t\right)\right) \\ &= L\left(\frac{d}{da}\left(\mathbf s \mathbf t\right)\right) \end{align*}

Therefore, $L\left(\frac{d}{da}\left(\mathbf s \mathbf t\right)\right) = a^{-1}L\left(\mathbf s \mathbf t\right)$.

Finally, we have $\mathbf r = \mathbf s^*$. \begin{align*} L\left(\frac{d}{da}\left(\mathbf s^*\right)\right) &= L\left(\frac{d}{da}\left(\mathbf s\right)\mathbf s^*\right) \\ &= \left(a^{-1}L\left(\mathbf s\right)\right) L\left(\mathbf s^*\right) \\ &= \{xy \in \Sigma^* \mid ax \in L\left(\mathbf s\right), y \in L\left(\mathbf s^*\right)\} \\ &= \{w \in \Sigma^* \mid aw \in L\left(\mathbf s^*\right)\} \\ &= a^{-1} L\left(\mathbf s^*\right) \end{align*}

 

Let's compute the derivatives for the regular expression $\mathbf r = \mathtt{\left(0+1\right)^* 0}$. You may already recognize this as the language of binary strings that end in 0. This is admittedly pretty simple, but we'll see that the computations are quite long to do by hand. Let's start with $\frac{d}{d0}\left(\mathbf r\right)$. \begin{align*} \frac{d}{d0}\left(\mathbf r\right) &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*0}\right) \\ &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*}\right) \mathtt 0 + \mathsf c\left(\mathtt{\left(0+1\right)^*}\right)\frac{d}{d0}\left(\mathtt 0\right) \\ &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0 + \varepsilon \varepsilon \\ &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0 + \varepsilon \\ &= \frac{d}{d0}\left(\mathtt {0+1}\right)\mathtt{\left(0+1\right)^*0} + \varepsilon \\ &= \left(\frac{d}{d0}\left(\mathtt 0\right) + \frac{d}{d0}\left(\mathtt 1\right)\right)\mathtt{\left(0+1\right)^*0} + \varepsilon \\ &= \mathtt{\left(\varepsilon + \emptyset\right)\left(0+1\right)^*0 + \varepsilon} \\ &= \mathtt{\varepsilon \left(0+1\right)^*0 + \varepsilon} \\ &= \mathtt{\left(0+1\right)^*0 + \varepsilon} \end{align*}

(If you're interested in finding a set of rules for doing algebra on regular expressions, Kozen has a supplementary section on the Kleene Algebra starting on page 55, between Chapters 9 and 10. Otherwise, it's not too hard to reason through—things generally work the way you'd expect them to.)

Here, the derivative tells us that if we read a $0$, we can either stop ($\varepsilon$) because we skipped the star term $\mathtt{\left(0+1\right)^*}$, so we have a word that matches our regular expression, or we can go back for another round through the star term. Let's call this regular expression $\mathbf r_1$. Before considering the derivatives of $\mathbf r_1$, let's consider $\frac{d}{d1}\left(\mathbf r\right)$ first. \begin{align*} \frac{d}{d1}\left(\mathbf r\right) &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*0}\right) \\ &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0 + \mathsf c\left(\mathtt{\left(0+1\right)^*}\right)\frac{d}{d1}\left(\mathtt 0\right) \\ &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0+ \varepsilon \emptyset \\ &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*}\right)\mathtt 0 \\ &= \frac{d}{d1}\left(\mathtt{0+1}\right)\mathtt{\left(0+1\right)^*0} \\ &= \mathtt{\left(\emptyset + \varepsilon\right)\left(0+1\right)^*0} \\ &= \mathtt{\varepsilon \left(0+1\right)^*0} \\ &= \mathtt{\left(0+1\right)^*0} \\ &= \mathbf r \end{align*} This derivative says that if we read a $1$, we can go through another round of the star term $\mathtt{\left(0+1\right)^*}$ but we haven't completed a match (i.e. $\varepsilon$ is not a possible match in this derivative). Now let's consider the derivatives on $\mathbf r_1$. We will see that our job will be slightly easier. \begin{align*} \frac{d}{d0}\left(\mathbf r_1\right) &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*0 + \varepsilon}\right) \\ &= \frac{d}{d0}\left(\mathtt{\left(0+1\right)^*0}\right) + \frac{d}{d0}\left(\varepsilon\right) \\ &= \frac{d}{d0}\left(\mathbf r\right) + \emptyset \\ &= \mathbf r_1 \end{align*} \begin{align*} \frac{d}{d1}\left(\mathbf r_1\right) &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*0 + \varepsilon}\right) \\ &= \frac{d}{d1}\left(\mathtt{\left(0+1\right)^*0}\right) + \frac{d}{d1}\left(\varepsilon\right) \\ &= \frac{d}{d1}\left(\mathbf r\right) + \emptyset \\ &= \mathbf r \end{align*}