# Pullbacks

Going on with trivialities. I still haven't finished my story with autodiff, and since I've got to code it for my DL homework anyway... Well.

So, we consider $F:M{=}\mathbb{R}^m\to N{=}\mathbb{R}^n$ and a point $p\in M$. Tangent spaces look exactly like the original spaces; what differs is how their elements act on functions. For $g:N\to\mathbb{R}$, a tangent vector $Y\in\mathcal{T}_{F(p)}N$ is basically an element of $\mathbb{R}^n$ and acts as the directional derivative $Y(g) = \langle \nabla_{F(p)} g, Y\rangle$.
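To make the "tangent vector acts as a directional derivative" reading concrete, here is a minimal sketch in plain Python. The function `g`, its hand-computed gradient `grad_g`, the helper `apply_tangent` and all the numbers are hypothetical, chosen only for illustration. `apply_tangent` approximates $Y(g)$ with a central finite difference, and that value should agree with $\langle \nabla g, Y\rangle$ up to finite-difference error.

```python
import math

# Hypothetical test function g: R^2 -> R and its hand-computed gradient.
def g(y):
    return y[0] ** 2 + math.sin(y[1])

def grad_g(y):
    return [2 * y[0], math.cos(y[1])]

def apply_tangent(Y, func, q, h=1e-6):
    """Y(func) at q, approximated as a directional derivative (central difference)."""
    plus = [qi + h * Yi for qi, Yi in zip(q, Y)]
    minus = [qi - h * Yi for qi, Yi in zip(q, Y)]
    return (func(plus) - func(minus)) / (2 * h)

q = [1.0, 0.5]   # base point, playing the role of F(p)
Y = [0.3, -2.0]  # a tangent vector at q: just an element of R^2

lhs = apply_tangent(Y, g, q)                      # Y(g)
rhs = sum(a * b for a, b in zip(grad_g(q), Y))    # <grad g, Y>
print(lhs, rhs)
```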

The pushforward $F_*:\mathcal{T}_pM\to \mathcal{T}_{F(p)}N$ is defined by $(F_* X)(g) = X(g\circ F)$ for $X\in\mathcal{T}_p M \cong \mathbb{R}^m$. Thus

\begin{equation*} \begin{split} (F_* X)(g) &= X(g\circ F) = \langle \nabla_{p} (g\circ F), x\rangle \\ &= \left\langle ( \left.Dg\right|_{F(p)} \left. DF \right|_p )^\top, x\right\rangle \\ &= \langle {\underbrace{J_p(F)}_{\left. DF \right|_p}}^\top \underbrace{\nabla_{F(p)} g}_{\left.Dg\right|_{F(p)}^\top}, x \rangle \\ &= \left\langle \nabla_{F(p)}g, J_p(F)x \right\rangle . \end{split} \end{equation*}

Here $D$ denotes the Fréchet derivative (a linear map) and $\nabla$ denotes the gradient (a vector: the Riesz representation of that linear map). We also identify the linear map $DF$ with the Jacobian matrix $J(F)$. I also write $x$ for $X$ viewed as a plain vector in $\mathbb{R}^m$. I don't like that I'm already juggling so many different notations (after all, that's exactly what I scorn differential geometers for), but for now it seems fit.

So the equation above means that, in the Euclidean case, the pushforward $F_*$ acts on a tangent vector $X$ simply by multiplication with the Jacobian $J_p(F)$. In matrix terms, $F_* X$ is just $J_p(F) x\in\mathbb{R}^n$.
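A quick numerical sanity check of this identity is sketched below in plain Python, which is roughly what forward mode (a Jacobian-vector product) computes. The map `F`, its hand-derived `jacobian_F`, the test function `g` and the helpers are hypothetical examples, not anything from a particular autodiff library. The left side approximates $X(g\circ F)$ by a central finite difference. The right side is $\langle \nabla_{F(p)} g, J_p(F)x\rangle$.

```python
import math

# Hypothetical example map F: R^2 -> R^3 and its Jacobian, derived by hand.
def F(x):
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def jacobian_F(x):
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0],
            [0.0, 2 * x[1]]]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def pushforward(p, x):
    """F_* X at p: multiply the tangent vector x by the Jacobian J_p(F)."""
    return matvec(jacobian_F(p), x)

# Hypothetical test function g: R^3 -> R and its gradient.
def g(y):
    return y[0] + y[1] * y[2]

def grad_g(y):
    return [1.0, y[2], y[1]]

p, x, h = [0.7, -1.2], [0.5, 2.0], 1e-6

def step(s):
    """The point p + s * x."""
    return [pi + s * xk for pi, xk in zip(p, x)]

# Left side: X(g o F), the directional derivative of g o F at p along x.
lhs = (g(F(step(h))) - g(F(step(-h)))) / (2 * h)
# Right side: <grad g at F(p), J_p(F) x>.
rhs = sum(a * b for a, b in zip(grad_g(F(p)), pushforward(p, x)))
print(lhs, rhs)
```

Note that `pushforward` never touches `g`: the Jacobian moves the tangent vector once, and any test function can then be applied at $F(p)$.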

Further, the pullback $F^*$ by definition maps a right (target-side) cotangent $\xi\in\mathcal{T}_{F(p)}^*N$ into a left (source-side) cotangent $F^*\xi\in\mathcal{T}_p^* M$, which acts as $(F^*\xi) X = \xi(F_*X)$:

\begin{equation*} \begin{split} (F^*\xi) X &= \xi(F_*X)\\ &= \xi^\top J_p(F)x . \end{split} \end{equation*}

That is, the pulled-back cotangent is just the row vector $\xi^\top J_p(F)\in (\mathbb{R}^m)^*$ (acting on $x$ from the left), and the pullback $F^*$ itself is still the same $J_p(F)$, except that now the cotangent multiplies it from the left. Equivalently, the pullback acts on $\operatorname{column}(\xi)$ as the transposed Jacobian $J_p(F)^\top$:

\begin{equation*} \operatorname{column}(F^*\xi) = J_p(F)^\top \operatorname{column}(\xi). \end{equation*}
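Here is a small sketch of this in plain Python. In autodiff terms it is a vector-Jacobian product. The Jacobian below belongs to a hypothetical example map $F(x) = (x_0x_1,\ \sin x_0,\ x_1^2)$, and the helper names are made up for illustration. The check is the defining identity $(F^*\xi)X = \xi(F_*X)$, which in coordinates reads $\langle J_p(F)^\top\xi, x\rangle = \langle \xi, J_p(F)x\rangle$.

```python
import math

# Jacobian of a hypothetical F: R^2 -> R^3, F(x) = (x0*x1, sin x0, x1**2).
def jacobian_F(x):
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0],
            [0.0, 2 * x[1]]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pushforward(p, x):
    """column(F_* X) = J_p(F) x, a vector in R^3."""
    return matvec(jacobian_F(p), x)

def pullback(p, xi):
    """column(F^* xi) = J_p(F)^T column(xi), a vector in R^2."""
    return matvec(transpose(jacobian_F(p)), xi)

p = [0.7, -1.2]
x = [0.5, 2.0]          # tangent vector at p
xi = [1.0, -3.0, 0.25]  # cotangent at F(p), stored as a column

pulled = dot(pullback(p, xi), x)      # (F^* xi)(X)
pushed = dot(xi, pushforward(p, x))   # xi(F_* X)
print(pulled, pushed)
```

The pullback only needs $\xi$ and the point $p$, never a particular tangent vector.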

Now, why pull back in gradient descent? Take $F:\mathbb{R}^n \to\mathbb{R}$. A right cotangent is then just a number $\alpha$, and its pullback, the left cotangent, is $\alpha J_p(F)$, a row vector in $(\mathbb{R}^n)^*$. It is such that

\begin{equation*} (F^*\alpha) X = \alpha(F_* X) = \alpha J_p(F)x. \end{equation*}
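In the scalar case the pullback is just a scaled row of partial derivatives. With $\alpha = 1$ it is the gradient, written as a row, which is the usual seed for reverse mode. A minimal sketch, assuming a hypothetical $F(x) = x_0^2 + 3x_0x_1$ with a hand-derived Jacobian. It checks $(F^*\alpha)X = \alpha (F_*X)$ against a central-difference estimate of $F_*X$.

```python
# Hypothetical scalar function F: R^2 -> R and its 1x2 Jacobian (a row).
def F(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

def jacobian_F(x):
    return [2 * x[0] + 3 * x[1], 3 * x[0]]

def pullback(p, alpha):
    """F^* alpha = alpha * J_p(F); for alpha = 1 this is the gradient."""
    return [alpha * j for j in jacobian_F(p)]

p, x, alpha, h = [1.0, 2.0], [0.5, -1.0], 2.0, 1e-6

# (F^* alpha) X: the pulled-back row acting on x from the left ...
lhs = sum(a * b for a, b in zip(pullback(p, alpha), x))
# ... versus alpha times F_* X, estimated by a central difference along x.
plus = [pk + h * xk for pk, xk in zip(p, x)]
minus = [pk - h * xk for pk, xk in zip(p, x)]
rhs = alpha * (F(plus) - F(minus)) / (2 * h)
print(pullback(p, 1.0), lhs, rhs)
```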

What happens when we pull, transpose, and then push?