Variational derivative

Zadorozhny is now giving lectures on what he calls "variational methods and random processes". Despite some common abuse of notation, the course seems to be quite useful. I'll post summaries of what he's teaching us. I'll use slightly different notation, though, so as always, all the good is his and all the wrong is mine. This is the part about variational derivatives.

Spaces

Let $(X, \|\cdot\|)$ be one of the following Banach spaces:

• the space $C_b(T)$ of bounded continuous functions defined on the interval $T\subset\mathbb{R}$, with the norm:

$$\|x\| = \sup_{t\in T}|x(t)|;$$

• the space $C_b^k(T)$ of functions possessing $k$ bounded continuous derivatives, with the norm:

$$\|x\| = \sum_{j=0}^{k}\sup_{t\in T}\left|x^{(j)}(t)\right|;$$

• the space $L^p(T),\ p\in\mathbb{N}$ of equivalence classes of functions (equal almost everywhere), with the norm:

$$\|x\|_p = \left(\int_T |x(t)|^p\,\mathrm{d}t\right)^{1/p}.$$

Definition

We consider a functional $u: X\to \mathbb{R}$ in the context of variational problems or optimal control problems. The common approach to investigating these problems is linearization. We could quite naturally begin with the Frechet derivative of $u$ (assuming it exists): $u(x+h) = u(x) + u'(x)(h) + o(\|h\|)$, where the bounded linear operator $u'(x)$ is the Frechet derivative of $u$ at the function $x$. Yet it turns out there's something even more convenient. As we saw earlier[1][2], the functional $u$ is often an integral one. This and some further exploration lead to the following definition:

The function $\phi:\mathbb{R}\times X\to \mathbb{R}$ is called a functional or variational derivative of $u$ if the Frechet derivative $u'(x)$ at any function $x\in X$ is a bounded integral operator with the kernel $t\mapsto \phi(t, x)$, that is,

$$u'(x)(h) = \int_T \phi(t, x)\,h(t)\,\mathrm{d}t\quad\text{for all } h\in X.$$

In simpler words, the variational derivative is a kernel of the Frechet derivative.

Nota Bene: while the Frechet derivative is unique, it may have infinitely many equivalent kernels (kernels that differ only on a set of measure zero define the same integral operator).
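To make the definition concrete, here is a minimal numerical sketch. It assumes a hypothetical example functional $u(x) = \int_T x(t)^2\,\mathrm{d}t$ on $T = [0, 1]$, whose variational derivative is $\phi(t, x) = 2x(t)$. It approximates $u'(x)(h)$ by a central difference and compares the result with the integral operator $h\mapsto\int_T \phi(t, x)h(t)\,\mathrm{d}t$. The grid, the midpoint rule and all helper names are illustrative choices, not part of the lectures.

```python
import math

# Hypothetical example on T = [0, 1]: u(x) = ∫_T x(t)^2 dt,
# whose variational derivative is phi(t, x) = 2 x(t).
N = 2000                                   # number of midpoint-rule cells
DT = 1.0 / N
GRID = [(i + 0.5) * DT for i in range(N)]


def integrate(values):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    return sum(values) * DT


def u(x):
    return integrate([x(t) ** 2 for t in GRID])


def phi(t, x):
    return 2.0 * x(t)


def directional_derivative(functional, x, h, eps=1e-5):
    """Central-difference approximation of the Frechet derivative u'(x)(h)."""
    plus = functional(lambda t: x(t) + eps * h(t))
    minus = functional(lambda t: x(t) - eps * h(t))
    return (plus - minus) / (2 * eps)


def kernel_action(kernel, h):
    """Apply the integral operator h -> ∫_T kernel(t) h(t) dt."""
    return integrate([kernel(t) * h(t) for t in GRID])


def h(t):
    return t ** 2


x = math.sin
lhs = directional_derivative(u, x, h)
rhs = kernel_action(lambda t: phi(t, x), h)
print(lhs, rhs)
```

Both numbers approximate $2\int_0^1 t^2\sin t\,\mathrm{d}t$. Because both sides use the same quadrature, they agree up to floating-point round-off.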

Notation

There's an established notation for the variational derivative. As you might have guessed, it's a bit messy. Although aware of the distinction, in his works Zadorozhny usually distinguishes neither a function from its value nor a derivative from the increment of the value associated with a specific increment of the argument (that is, the derivative applied to that increment). The Wikipedia page[3] is prone to a similar abuse of notation.

For now we will use the following notation for the value of the functional derivative at a specific function $x$ and time $t$:

$$\frac{\delta u(x)}{\delta x(t)} := \phi(t, x).$$

Sadly, Zadorozhny often calls this thing a function, which of course it isn't. He also uses the same symbol for the value of the derivative and for all of the functions $t\mapsto \frac{\delta u(x)}{\delta x(t)}$, $x\mapsto \frac{\delta u(x)}{\delta x(t)}$ and $(t, x) \mapsto \frac{\delta u(x)}{\delta x(t)}$. That's common, though. The real problem is that he also calls the following expression both a derivative and a differential (and not just their value):

$$\int_T \frac{\delta u(x)}{\delta x(t)}\,h(t)\,\mathrm{d}t.$$

I suppose that's because he talks too much with physics folks. The common notion of a derivative is that of a linear mapping which approximates the function under consideration as well as possible. The integral above isn't a derivative of $u$, but the function $h\mapsto \int_T \phi(t, x)h(t)\,\mathrm{d}t$ is. The notion of a "differential" is then nonsense, as opposed to the notion of a "differential associated with a specific change $h$", which is still excessive. That's a long and old story, though, which probably deserves its own write-up.

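To denote the function $\phi$ itself I'd rather propose the notation

$$\delta u := \phi,\qquad\text{so that}\qquad (\delta u)(t, x) = \phi(t, x) = \frac{\delta u(x)}{\delta x(t)}.$$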

Obviously, things get more complicated than that for functionals of several variables and for changes of variables. In these cases I believe multi-index notation could be used, just as with usual derivatives.

Inclusions

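It is important to note that the $C_b^k$ spaces are nested as normed spaces. For $k\ge 1$, if $A$ is a bounded linear operator on $C_b^1$, then it is also bounded on every $C_b^k$, and a remainder that is $o(\|h\|)$ in $C_b^1$ is also $o(\|h\|)$ in every $C_b^k$. The same holds for the spaces $L^p,\ p\in\mathbb{N}$, on a bounded interval $T$, with $L^1$ playing the role of $C_b^1$.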

Suppose we have two norms $\rho_1, \rho_2$ on the same space which satisfy, for some constant $C>0$,

$$\rho_1(h)\le C\rho_2(h)\quad\text{for all } h.$$

Then, for every $h\neq 0$,

$$\frac{|\omega(h)|}{\rho_2(h)} \le C\,\frac{|\omega(h)|}{\rho_1(h)}.$$

Thus whenever

$$\frac{\omega(h)}{\rho_1(h)}\to 0\quad\text{as}\quad\rho_1(h)\to 0,$$

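it's also true that

$$\frac{\omega(h)}{\rho_2(h)}\to 0\quad\text{as}\quad\rho_2(h)\to 0,$$

because $\rho_2(h)\to 0$ forces $\rho_1(h)\le C\rho_2(h)\to 0$.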

In other words, if $\omega(h) = o(\rho_1(h))$ as $\rho_1(h)\to 0$, then $\omega(h) = o(\rho_2(h))$ as $\rho_2(h)\to 0$.

Now it's easily seen that, for $k\ge 1$,

$$\|h\|_{C_b^1}\le\|h\|_{C_b^k},$$

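and[4], for a bounded interval $T$ and $1\le p\le q$,

$$\|h\|_{L^p}\le |T|^{\frac{1}{p}-\frac{1}{q}}\,\|h\|_{L^q},$$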

which proves our assertion about inclusions.
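A quick numerical sanity check of the $L^p$ inequality follows. It assumes a hypothetical interval $T = [0, 3]$, a sample function $h(t) = e^t\cos 5t$ and a midpoint-rule discretization. Hölder's inequality also holds for the discrete measure, so the check should pass for the discretized norms themselves, not just approximately.

```python
import math

# Hypothetical setup: T = [0, 3], h(t) = exp(t) cos(5t), midpoint rule.
LENGTH = 3.0
N = 5000
DT = LENGTH / N
GRID = [(i + 0.5) * DT for i in range(N)]


def h(t):
    return math.exp(t) * math.cos(5 * t)


def lp_norm(f, p):
    """Discrete approximation of (∫_T |f(t)|^p dt)^(1/p)."""
    return (sum(abs(f(t)) ** p for t in GRID) * DT) ** (1.0 / p)


def inclusion_holds(f, p, q):
    """Check ||f||_p <= |T|^(1/p - 1/q) ||f||_q for p <= q."""
    return lp_norm(f, p) <= LENGTH ** (1.0 / p - 1.0 / q) * lp_norm(f, q)


for p, q in [(1, 2), (1, 3), (2, 4)]:
    print(p, q, inclusion_holds(h, p, q))
```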

Properties or "The rules of variational differentiation"

Here one needs to be precise about the spaces in which the derivative is bounded and those in which it isn't. Despite that, I'll omit proofs for some trivial cases.

Homogeneity

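If $u$ has a variational derivative, then so does $cu$ for any constant $c\in\mathbb{R}$, and:

$$\frac{\delta (cu)(x)}{\delta x(t)} = c\,\frac{\delta u(x)}{\delta x(t)}.$$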

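If the functionals $u$ and $v$ have variational derivatives, then so does their sum and the following equality holds:

$$\frac{\delta (u+v)(x)}{\delta x(t)} = \frac{\delta u(x)}{\delta x(t)} + \frac{\delta v(x)}{\delta x(t)}.$$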
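Homogeneity and additivity can be checked together on a discretized example. The sketch below assumes two hypothetical functionals on $T = [0, 1]$ with known kernels, $u(x) = \int_T x(t)^2\,\mathrm{d}t$ with $(\delta u)(t, x) = 2x(t)$ and $v(x) = \int_T \sin x(t)\,\mathrm{d}t$ with $(\delta v)(t, x) = \cos x(t)$. It compares a finite-difference derivative of $cu + v$ with the kernel $c\,\delta u + \delta v$.

```python
import math

# Hypothetical functionals on T = [0, 1] with known variational derivatives:
#   u(x) = ∫ x(t)^2 dt     ->  (δu)(t, x) = 2 x(t)
#   v(x) = ∫ sin(x(t)) dt  ->  (δv)(t, x) = cos(x(t))
N = 1000
DT = 1.0 / N
GRID = [(i + 0.5) * DT for i in range(N)]


def integrate(values):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    return sum(values) * DT


def u(x):
    return integrate([x(t) ** 2 for t in GRID])


def v(x):
    return integrate([math.sin(x(t)) for t in GRID])


def delta_u(t, x):
    return 2.0 * x(t)


def delta_v(t, x):
    return math.cos(x(t))


def directional_derivative(functional, x, h, eps=1e-5):
    """Central-difference approximation of the Frechet derivative applied to h."""
    plus = functional(lambda t: x(t) + eps * h(t))
    minus = functional(lambda t: x(t) - eps * h(t))
    return (plus - minus) / (2 * eps)


def kernel_action(kernel, h):
    """Apply the integral operator h -> ∫_T kernel(t) h(t) dt."""
    return integrate([kernel(t) * h(t) for t in GRID])


C = 3.0


def w(x):
    """The functional c*u + v with c = C."""
    return C * u(x) + v(x)


def h(t):
    return 1.0 - t


x = math.exp
lhs = directional_derivative(w, x, h)
# Predicted kernel: c δu + δv.
rhs = kernel_action(lambda t: C * delta_u(t, x) + delta_v(t, x), h)
print(lhs, rhs)
```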

Derivative of a product

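If the functionals $u$ and $v$ have variational derivatives, then so does $uv:x\mapsto u(x)v(x)$ and:

$$\frac{\delta (uv)(x)}{\delta x(t)} = v(x)\,\frac{\delta u(x)}{\delta x(t)} + u(x)\,\frac{\delta v(x)}{\delta x(t)}.$$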

That is, $\delta(uv)(t, x) = v(x)(\delta u)(t, x) + u(x)(\delta v)(t, x)$.
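Here is a minimal numerical check of the product rule. It assumes hypothetical functionals $u(x) = \int_T x(t)^2\,\mathrm{d}t$ and $v(x) = \int_T \sin x(t)\,\mathrm{d}t$ on $T = [0, 1]$, with a midpoint-rule grid. The scalars $u(x)$ and $v(x)$ are computed once per kernel rather than once per grid point, which keeps the check fast.

```python
import math

# Hypothetical functionals on T = [0, 1] with known variational derivatives:
#   u(x) = ∫ x(t)^2 dt     ->  (δu)(t, x) = 2 x(t)
#   v(x) = ∫ sin(x(t)) dt  ->  (δv)(t, x) = cos(x(t))
N = 1000
DT = 1.0 / N
GRID = [(i + 0.5) * DT for i in range(N)]


def integrate(values):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    return sum(values) * DT


def u(x):
    return integrate([x(t) ** 2 for t in GRID])


def v(x):
    return integrate([math.sin(x(t)) for t in GRID])


def delta_u(t, x):
    return 2.0 * x(t)


def delta_v(t, x):
    return math.cos(x(t))


def uv(x):
    return u(x) * v(x)


def product_kernel(x):
    """Predicted kernel t -> v(x) (δu)(t, x) + u(x) (δv)(t, x)."""
    ux, vx = u(x), v(x)  # scalar factors, computed once
    return lambda t: vx * delta_u(t, x) + ux * delta_v(t, x)


def directional_derivative(functional, x, h, eps=1e-5):
    """Central-difference approximation of the Frechet derivative applied to h."""
    plus = functional(lambda t: x(t) + eps * h(t))
    minus = functional(lambda t: x(t) - eps * h(t))
    return (plus - minus) / (2 * eps)


def kernel_action(kernel, h):
    """Apply the integral operator h -> ∫_T kernel(t) h(t) dt."""
    return integrate([kernel(t) * h(t) for t in GRID])


def h(t):
    return math.cos(3 * t)


x = math.exp
lhs = directional_derivative(uv, x, h)
rhs = kernel_action(product_kernel(x), h)
print(lhs, rhs)
```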

Chain rule 1

Suppose $u:X\to\mathbb{R}$ has a variational derivative and $f:\mathbb{R}\to\mathbb{R}$ is a $C_b^1(\mathbb{R})$ function. Consider a functional

$$w(x) = f(u(x)).$$

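Then $w$ has a variational derivative and

$$\frac{\delta w(x)}{\delta x(t)} = f'(u(x))\,\frac{\delta u(x)}{\delta x(t)}.$$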
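The first chain rule can be checked numerically in the same way. This sketch assumes the hypothetical choices $u(x) = \int_T x(t)^2\,\mathrm{d}t$ on $T = [0, 1]$ and $f = \tanh$, which is in $C_b^1(\mathbb{R})$ with $f'(s) = 1 - \tanh^2 s$. The predicted kernel is $f'(u(x))\,(\delta u)(t, x)$.

```python
import math

# Hypothetical example on T = [0, 1]:
#   u(x) = ∫ x(t)^2 dt with (δu)(t, x) = 2 x(t),
#   f = tanh, which is C_b^1 with f'(s) = 1 - tanh(s)^2.
N = 1000
DT = 1.0 / N
GRID = [(i + 0.5) * DT for i in range(N)]


def integrate(values):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    return sum(values) * DT


def u(x):
    return integrate([x(t) ** 2 for t in GRID])


def delta_u(t, x):
    return 2.0 * x(t)


def w(x):
    """The composed functional w(x) = f(u(x)) with f = tanh."""
    return math.tanh(u(x))


def chain_rule_1_kernel(x):
    """Predicted kernel t -> f'(u(x)) (δu)(t, x)."""
    scale = 1.0 - math.tanh(u(x)) ** 2  # f'(u(x)), computed once
    return lambda t: scale * delta_u(t, x)


def directional_derivative(functional, x, h, eps=1e-5):
    """Central-difference approximation of the Frechet derivative applied to h."""
    plus = functional(lambda t: x(t) + eps * h(t))
    minus = functional(lambda t: x(t) - eps * h(t))
    return (plus - minus) / (2 * eps)


def kernel_action(kernel, h):
    """Apply the integral operator h -> ∫_T kernel(t) h(t) dt."""
    return integrate([kernel(t) * h(t) for t in GRID])


def x(t):
    return t


h = math.sin
lhs = directional_derivative(w, x, h)
rhs = kernel_action(chain_rule_1_kernel(x), h)
print(lhs, rhs)
```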

Chain rule 2

Suppose $u:X\to\mathbb{R}$ has a variational derivative and $f:\mathbb{R}\to\mathbb{R}$ is a $C_b^1(\mathbb{R})$ function. Consider a functional

$$w(x) = u(f\circ x).$$

If $(X, \|\cdot\|)$ is $L^p,\ p\in\mathbb{N}$, then $w$ has a variational derivative and:

$$\frac{\delta w(x)}{\delta x(t)} = \frac{\delta u(f\circ x)}{\delta (f\circ x)(t)}\,f'(x(t)).$$

Just in case you're not familiar with this notation, let's rewrite this definition using an auxiliary function $y=f\circ x$, defined by $y(t) = f(x(t))$ for a fixed $x$. Then $w(x) = u(y)$.

The statement above can then be rewritten as:

$$\frac{\delta w(x)}{\delta x(t)} = \frac{\delta u(y)}{\delta y(t)}\,f'(x(t)).$$
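A numerical check of the second chain rule follows. It assumes the hypothetical choices $u(y) = \int_T y(t)^2\,\mathrm{d}t$ on $T = [0, 1]$, so that $(\delta u)(t, y) = 2y(t)$, and $f = \sin$. With $y = f\circ x$, the predicted kernel is $(\delta u)(t, y)\,f'(x(t)) = 2\sin x(t)\cos x(t)$.

```python
import math

# Hypothetical example on T = [0, 1]:
#   u(y) = ∫ y(t)^2 dt with (δu)(t, y) = 2 y(t),
#   f = sin, which is C_b^1 with f' = cos.
N = 1000
DT = 1.0 / N
GRID = [(i + 0.5) * DT for i in range(N)]


def integrate(values):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    return sum(values) * DT


def u(y):
    return integrate([y(t) ** 2 for t in GRID])


def delta_u(t, y):
    return 2.0 * y(t)


def compose(f, x):
    """Pointwise composition t -> f(x(t))."""
    return lambda t: f(x(t))


def w(x):
    """w(x) = u(f∘x) with f = sin."""
    return u(compose(math.sin, x))


def chain_rule_2_kernel(x):
    """Predicted kernel t -> (δu)(t, f∘x) f'(x(t))."""
    y = compose(math.sin, x)
    return lambda t: delta_u(t, y) * math.cos(x(t))


def directional_derivative(functional, x, h, eps=1e-5):
    """Central-difference approximation of the Frechet derivative applied to h."""
    plus = functional(lambda t: x(t) + eps * h(t))
    minus = functional(lambda t: x(t) - eps * h(t))
    return (plus - minus) / (2 * eps)


def kernel_action(kernel, h):
    """Apply the integral operator h -> ∫_T kernel(t) h(t) dt."""
    return integrate([kernel(t) * h(t) for t in GRID])


def x(t):
    return 2.0 * t


def h(t):
    return t * (1.0 - t)


lhs = directional_derivative(w, x, h)
rhs = kernel_action(chain_rule_2_kernel(x), h)
print(lhs, rhs)
```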

Proof

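Let's consider an increment:

$$\begin{aligned}
w(x+h) - w(x) &= u\big(f\circ(x+h)\big) - u(f\circ x)\\
&= u\big(t\mapsto f(x(t)) + f'(x(t))h(t) + o(h(t))\big) - u(f\circ x)\\
&= u\big(f\circ x + [\,(f'\circ x)\,h + o(h)\,]\big) - u(f\circ x)\\
&= \int_T (\delta u)(t, f\circ x)\big(f'(x(t))h(t) + o(h(t))\big)\,\mathrm{d}t + o\big(\|(f'\circ x)\,h + o(h)\|_1\big)\\
&= \int_T (\delta u)(t, f\circ x)\,f'(x(t))\,h(t)\,\mathrm{d}t + o(\|h\|_1).
\end{aligned}$$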

In the second line we used a linear approximation of the value $f(x(t) + h(t))$ of a differentiable function $f$.

In the third line we've rewritten the preceding expression in the form $u(x_0 + \Delta x)$, where $x_0 = f\circ x$ and $\Delta x(t) = f'(x(t))h(t) + o(h(t))$ are two functions.

This allows us to use a linear approximation $u(x_0 + \Delta x) = u(x_0) + \int (\delta u)(t, x_0)\Delta x(t)\mathrm{d}t + o(\|\Delta x\|)$ in the fourth line.

Note the $o(\|h\|_1)$ at the end. It actually comes from $\int_T o(h(t))\,\mathrm{d}t$.

It follows that in $L^1$ (and, by inclusion, in $L^p$) the variational derivative of $w$ exists and:

$$(\delta w)(t, x) = (\delta u)(t, f\circ x)\,f'(x(t)).$$
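The remainder in the proof can be observed numerically along a single direction. This sketch assumes a hypothetical $w(x) = \int_T \sin^2 x(t)\,\mathrm{d}t$ on $T = [0, 1]$, that is $u(y) = \int_T y^2$ with $f = \sin$, together with the kernel $2\sin x(t)\cos x(t)$ given by the formula above. It prints the ratio of the remainder to $\|sh\|_1$ as $s$ shrinks. One direction only illustrates the $o(\|h\|_1)$ behaviour; it doesn't prove it.

```python
import math

# Hypothetical example on T = [0, 1]: w(x) = ∫ sin(x(t))^2 dt,
# with the kernel (δw)(t, x) = 2 sin(x(t)) cos(x(t)) from chain rule 2.
N = 1000
DT = 1.0 / N
GRID = [(i + 0.5) * DT for i in range(N)]


def integrate(values):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    return sum(values) * DT


def w(x):
    return integrate([math.sin(x(t)) ** 2 for t in GRID])


def delta_w(t, x):
    return 2.0 * math.sin(x(t)) * math.cos(x(t))


def remainder_ratio(x, h, s):
    """|w(x + s h) - w(x) - ∫ (δw)(t, x) s h(t) dt| / ||s h||_1."""
    increment = w(lambda t: x(t) + s * h(t)) - w(x)
    linear = integrate([delta_w(t, x) * s * h(t) for t in GRID])
    norm = integrate([abs(s * h(t)) for t in GRID])
    return abs(increment - linear) / norm


def x(t):
    return t


def h(t):
    return 1.0


ratios = [remainder_ratio(x, h, s) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

For this smooth example the ratio shrinks roughly in proportion to $s$, as expected from a second-order remainder.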