Going on with trivialities. I still haven't finished my story about autodiff, and since I've got to code it for my DL homework... well.

So, we consider \(F:M{=}\mathbb{R}^m\to N{=}\mathbb{R}^n\) and a point \(p\in M\). Tangent spaces look exactly like the original spaces; what differs is how their elements act: for \(g:N\to\mathbb{R}\), a tangent vector \(Y\in\mathcal{T}_{F(p)}N\) is basically a vector in \(\mathbb{R}^n\) and acts on \(g\) as a directional derivative, \(Y(g) = \langle \nabla_{F(p)} g, Y\rangle\).
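A quick numerical sanity check of this identification, as a sketch: it assumes a test function \(g(q) = q_0^2 q_1 + \sin q_1\) on \(\mathbb{R}^2\) that I made up for illustration, with \(q\) playing the role of \(F(p)\), and the helper names `act_by_gradient` and `act_by_derivation` are hypothetical. The tangent vector \(Y\), acting as a directional derivative of \(g\), should agree with \(\langle \nabla_q g, Y\rangle\).

```python
import math

def g(q):
    # hypothetical test function g: R^2 -> R
    return q[0] ** 2 * q[1] + math.sin(q[1])

def grad_g(q):
    # gradient of g, computed by hand
    return [2 * q[0] * q[1], q[0] ** 2 + math.cos(q[1])]

def act_by_gradient(q, y):
    # Y(g) = <grad_q g, Y>
    return sum(a * b for a, b in zip(grad_g(q), y))

def act_by_derivation(q, y, h=1e-6):
    # Y(g) = d/dt g(q + tY) at t = 0, approximated by a central difference
    plus = [qi + h * yi for qi, yi in zip(q, y)]
    minus = [qi - h * yi for qi, yi in zip(q, y)]
    return (g(plus) - g(minus)) / (2 * h)

q, y = [1.5, -0.5], [2.0, 1.0]
print(act_by_gradient(q, y), act_by_derivation(q, y))
```

The two numbers agree up to the finite-difference error, which is the whole content of "a tangent vector is basically \(\mathbb{R}^n\)".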

The pushforward \(F_*:\mathcal{T}_pM\to \mathcal{T}_{F(p)}N\) is defined by \((F_* X)(g) = X(g\circ F)\) for \(X\in\mathcal{T}_p M \sim \mathbb{R}^m\). Thus

\begin{equation*} \begin{split} (F_* X)(g) &= X(g\circ F) = \langle \nabla_{p} (g\circ F), x\rangle \\ &= \left\langle ( \left.Dg\right|_{F(p)} \left. DF \right|_p )^\top, x\right\rangle \\ &= \langle {\underbrace{J_p(F)}_{\left. DF \right|_p}}^\top \underbrace{\nabla_{F(p)} g}_{\left.Dg\right|_{F(p)}^\top}, x \rangle \\ &= \left\langle \nabla_{F(p)}g, J_p(F)x \right\rangle . \end{split} \end{equation*}

Here \(D\) denotes the Fréchet derivative (a linear map) and \(\nabla\) the gradient (a vector, the Riesz representation of that linear map); we also identify the linear map \(DF\) with the Jacobian matrix \(J(F)\). Also, I denote \(X\) cast to \(\mathbb{R}^m\) simply as \(x\). I don't like that I'm already using so many different notations (after all, that's exactly what I scorn differential geometers for), but at the moment it seems fit.

So, basically, the equation above means that in the Euclidean case the pushforward \(F_*\) acts on a tangent vector \(X\) merely by multiplication with the Jacobian \(J_p(F)\). In terms of matrix multiplication, \(F_* X\) is just \(J_p(F) x\in\mathbb{R}^n\).
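In autodiff terms, \(J_p(F)\,x\) is a Jacobian-vector product, which is what forward-mode differentiation computes without ever forming \(J_p(F)\). Below is a minimal sketch with dual numbers, assuming a toy map \(F(p) = (p_0p_1,\ \sin p_0,\ p_0+p_1)\) from \(\mathbb{R}^2\) to \(\mathbb{R}^3\) and hypothetical helper names (`Dual`, `pushforward`, `jacobian`). It carries a point and a tangent vector through \(F\) together and compares the result with the hand-written Jacobian.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    val: float  # coordinate of the point p
    tan: float  # coordinate of the tangent vector x attached to p

    def __add__(self, other):
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other):
        # product rule for the tangent part
        return Dual(self.val * other.val, self.tan * other.val + self.val * other.tan)

def sin(d):
    # chain rule: the tangent part gets multiplied by cos(val)
    return Dual(math.sin(d.val), math.cos(d.val) * d.tan)

def F(p):
    # toy map R^2 -> R^3
    return [p[0] * p[1], sin(p[0]), p[0] + p[1]]

def pushforward(F, p, x):
    # seed each input coordinate with the matching tangent coordinate
    outs = F([Dual(pi, xi) for pi, xi in zip(p, x)])
    return [o.tan for o in outs]  # equals J_p(F) x

def jacobian(p):
    # J_p(F) written out by hand, for comparison only
    return [[p[1], p[0]],
            [math.cos(p[0]), 0.0],
            [1.0, 1.0]]

p, x = [1.0, 2.0], [3.0, -1.0]
jx = [sum(a * b for a, b in zip(row, x)) for row in jacobian(p)]
print(pushforward(F, p, x), jx)
```

The point of the design is that each elementary operation only needs its own local derivative; the Jacobian of \(F\) never appears as a matrix.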

Further, the pullback \(F^*\) by definition maps a right cotangent \(\xi\in\mathcal{T}_{F(p)}^*N\) into a left cotangent \(F^*\xi\in\mathcal{T}_p^* M\), which acts as \((F^*\xi) X = \xi(F_*X)\):

\begin{equation*} \begin{split} (F^*\xi) X &= \xi(F_*X)\\ &= \xi^\top J_p(F)x . \end{split} \end{equation*}

That is, the pulled-back cotangent is just \(\xi^\top J_p(F)\in (\mathbb{R}^m)^*\) (acting on \(x\) from the left), and the pullback \(F^*\) itself is still the same \(J_p(F)\), except that cotangents multiply it from the left. Equivalently, the pullback acts on \(\operatorname{column}(\xi)\) as the transposed Jacobian \(J_p(F)^\top\):

\begin{equation*} \operatorname{column}(F^*\xi) = J_p(F)^\top \operatorname{column}(\xi). \end{equation*}
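Dually, \(J_p(F)^\top\operatorname{column}(\xi)\) is a vector-Jacobian product, and that is what reverse-mode autodiff (backpropagation) computes. A minimal sketch, assuming a toy map \(F(p) = (p_0p_1,\ \sin p_0,\ p_0+p_1)\) from \(\mathbb{R}^2\) to \(\mathbb{R}^3\) and hypothetical names (`Var`, `pullback`, `jacobian`): the forward pass records local partial derivatives, and the backward pass carries the cotangent \(\xi\) from the outputs back to the inputs. The last lines check the defining identity \((F^*\xi)x = \xi(F_*x)\).

```python
import math

class Var:
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # pairs (parent Var, local partial derivative)
        self.adj = 0.0          # cotangent component accumulated in the backward pass

    def __add__(self, other):
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

def sin(v):
    return Var(math.sin(v.val), ((v, math.cos(v.val)),))

def F(p):
    # toy map R^2 -> R^3
    return [p[0] * p[1], sin(p[0]), p[0] + p[1]]

def pullback(F, p, xi):
    inputs = [Var(pi) for pi in p]
    outputs = F(inputs)
    # topological order of the computation graph: parents before children
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    for out in outputs:
        visit(out)
    # seed the outputs with the cotangent xi, then sweep children before parents
    for out, c in zip(outputs, xi):
        out.adj += c
    for v in reversed(order):
        for parent, local in v.parents:
            parent.adj += local * v.adj
    return [v.adj for v in inputs]  # equals J_p(F)^T xi

def jacobian(p):
    # J_p(F) written out by hand, for the check below
    return [[p[1], p[0]],
            [math.cos(p[0]), 0.0],
            [1.0, 1.0]]

p, xi, x = [1.0, 2.0], [1.0, 2.0, -1.0], [3.0, -1.0]
J = jacobian(p)
lhs = sum(a * b for a, b in zip(pullback(F, p, xi), x))                     # (F^* xi) x
rhs = sum(c * sum(a * b for a, b in zip(row, x)) for c, row in zip(xi, J))  # xi(F_* x)
print(lhs, rhs)
```

Note that the backward sweep never transposes anything explicitly: multiplying each incoming cotangent by the local partial and sending it to the parent is already "\(J^\top\) acting on a column".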

Now, why pull back in gradient descent? Take \(F:\mathbb{R}^n \to\mathbb{R}\). A right cotangent is then just a number \(\alpha\), and the corresponding left cotangent is \(\alpha J_p(F)\), a row of length \(n\). It is such that

\begin{equation*} (F^*\alpha) X = \alpha(F_* X) = \alpha J_p(F)x . \end{equation*}
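With \(\alpha = 1\), the pulled-back cotangent \(J_p(F)\) is the row \((\nabla_p F)^\top\), which is why backpropagation starts by seeding the scalar loss with \(1\). A short sketch, assuming a hypothetical quadratic \(F(p) = \sum_i (p_i - i)^2\) with a hand-coded Jacobian row, a hypothetical function name `pullback_scalar`, and an arbitrary step size of \(0.1\): pull back \(\alpha = 1\), reuse the components of the resulting covector as a vector in \(\mathbb{R}^n\), and take one gradient-descent step.

```python
def F(p):
    # hypothetical scalar function R^n -> R, minimized at p = (0, 1, ..., n-1)
    return sum((pi - i) ** 2 for i, pi in enumerate(p))

def jacobian_row(p):
    # J_p(F), a 1 x n row, computed by hand
    return [2 * (pi - i) for i, pi in enumerate(p)]

def pullback_scalar(p, alpha):
    # F^* alpha = alpha * J_p(F), a covector on R^n
    return [alpha * j for j in jacobian_row(p)]

p = [3.0, -1.0, 0.5]
covector = pullback_scalar(p, 1.0)  # components of the gradient of F at p
step = 0.1                          # arbitrary learning rate
# treating the covector's components as a vector is the "transpose" step
p_next = [pi - step * gi for pi, gi in zip(p, covector)]
print(F(p), F(p_next))
```

The descent step itself only makes sense after the covector has been turned into a vector, which is exactly the Riesz identification of \(\nabla\) with \(D\) used earlier.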

What happens when we pull, transpose, and then push?

