Going on with trivialities.
I still haven't finished my story with autodiff,
and since I've got to code it for DL homework...
Well.

So, we consider \(F:M{=}\mathbb{R}^m\to N{=}\mathbb{R}^n\).
Tangent spaces look exactly like the original spaces, except
for how their elements act on functions:
for \(g:N\to\mathbb{R}\),
a tangent vector \(Y\in\mathcal{T}_{F(p)}N\)
is basically a vector in \(\mathbb{R}^n\) and
acts by \(Y(g) = \langle \nabla_{F(p)} g, Y\rangle\).

The pushforward \(F_*:\mathcal{T}M\to \mathcal{T}N\)
is defined by
\((F_* X)(g) = X(g\circ F)\)
for \(X\in\mathcal{T} M \sim \mathbb{R}^m\).
Thus

\begin{equation*}
\begin{split}
(F_* X)(g) &= X(g\circ F) = \langle \nabla_{p} (g\circ F), x\rangle \\
&= \left\langle ( \left.Dg\right|_{F(p)} \left. DF \right|_p )^\top, x\right\rangle \\
&= \langle {\underbrace{J_p(F)}_{\left. DF \right|_p}}^\top \underbrace{\nabla_{F(p)} g}_{\left.Dg\right|_{F(p)}^\top}, x \rangle \\
&= \left\langle \nabla_{F(p)}g, J_p(F)x \right\rangle
.
\end{split}
\end{equation*}

Here we use \(D\) to denote the Fréchet derivative (a linear map)
and nabla to denote the gradient (a vector: the Riesz representation of that linear map),
and we also identify the linear map \(DF\) with the Jacobian matrix \(J(F)\).
Also, I denote \(X\) cast to \(\mathbb{R}^m\) as just \(x\).
I don't like how I'm already using too many different notations
(after all, that's what I scorn differential geometers for)
but at the moment it seems fit.
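
The chain of equalities above can be checked numerically. The following is a minimal sketch, not a proof: the map `F`, the test function `g`, the point `p` and the tangent `x` are all made up for illustration, and the Jacobian and gradient are written out by hand. It compares \(X(g\circ F)\), estimated by a central difference, with \(\langle \nabla_{F(p)}g, J_p(F)x\rangle\).

```python
def F(p):
    """Hypothetical map R^2 -> R^3 (chosen only for this check)."""
    u, v = p
    return [u * v, u + 3.0 * v, u * u]

def jacobian_F(p):
    """Hand-derived Jacobian J_p(F), a 3x2 matrix stored as a list of rows."""
    u, v = p
    return [[v, u], [1.0, 3.0], [2.0 * u, 0.0]]

def g(y):
    """Hypothetical test function R^3 -> R."""
    return y[0] * y[1] + y[2]

def grad_g(y):
    """Hand-derived gradient of g."""
    return [y[1], y[0], 1.0]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

p = [2.0, 5.0]   # base point in M = R^2
x = [1.0, -1.0]  # tangent vector at p
h = 1e-5         # finite-difference step (an arbitrary small value)

# Left-hand side: X(g o F), the derivative of g o F at p along x.
p_plus = [pi + h * xi for pi, xi in zip(p, x)]
p_minus = [pi - h * xi for pi, xi in zip(p, x)]
lhs = (g(F(p_plus)) - g(F(p_minus))) / (2.0 * h)

# Right-hand side: <grad g at F(p), J_p(F) x>.
Jx = [dot(row, x) for row in jacobian_F(p)]
rhs = dot(grad_g(F(p)), Jx)

print(lhs, rhs)  # the two numbers should agree up to finite-difference error
```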

So, basically the equation above means that in the Euclidean case
the pushforward \(F_*\) acts on a tangent \(X\)
merely by multiplying with the Jacobian \(J_p(F)\).
In terms of matrix multiplication, \(F_* X\) is just \(J_p(F) x\in\mathbb{R}^n\).
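
This is what forward-mode autodiff computes: the product \(J_p(F)x\), without ever forming the Jacobian. Below is a minimal sketch using dual numbers, assuming only addition and multiplication are needed. The class `Dual`, the helper `_lift`, the function `pushforward` and the example map `F` are hypothetical names introduced here, not taken from any library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value with a tangent part: val + eps * tan, where eps**2 == 0."""
    val: float
    tan: float

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a b)' = a' b + a b'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__

def _lift(v):
    """Treat plain numbers as constants, i.e. with zero tangent."""
    return v if isinstance(v, Dual) else Dual(float(v), 0.0)

def F(p):
    """Hypothetical example map R^2 -> R^3."""
    u, v = p
    return [u * v, u + 3 * v, u * u]

def pushforward(F, p, x):
    """Return J_p(F) x by seeding each input p_i with tangent x_i."""
    outputs = F([Dual(pi, xi) for pi, xi in zip(p, x)])
    return [_lift(o).tan for o in outputs]

p = [2.0, 5.0]
x = [1.0, -1.0]
jx = pushforward(F, p, x)
print(jx)  # the components of J_p(F) x
```

One evaluation of `F` on dual numbers gives one Jacobian-vector product, so recovering the full Jacobian this way takes \(m\) passes, one per basis tangent.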

Further, the pullback \(F^*\) by definition maps
a right cotangent \(\xi\in\mathcal{T}_{F(p)}^*N\)
into a left cotangent \(F^*\xi\in\mathcal{T}_p^* M\),
which acts as: \((F^*\xi) X = \xi(F_*X)\).

\begin{equation*}
\begin{split}
(F^*\xi) X &= \xi(F_*X)\\
&= \xi^\top J_p(F)x
.
\end{split}
\end{equation*}

That is, the pulled-back cotangent is just \(\xi^\top J_p(F)\in (\mathbb{R}^m)^*\) (acting on \(x\) from the left)
and the pullback \(F^*\) itself is still the same \(J_p(F)\),
except that cotangents multiply it from the left.
It is equivalent to say that the pullback acts on \(\operatorname{column}(\xi)\)
as the transposed Jacobian \(J_p(F)^\top\):

\begin{equation*}
\operatorname{column}(F^*\xi) = J_p(F)^\top \operatorname{column}(\xi).
\end{equation*}
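
The duality between the two maps, \(\xi(F_*X) = (F^*\xi)X\), is just \(\langle \xi, J_p(F)x\rangle = \langle J_p(F)^\top\xi, x\rangle\), which is easy to check on numbers. A small sketch with an explicit Jacobian follows; the map behind `jacobian`, the point `p`, and the vectors `x` and `xi` are hypothetical choices made for illustration.

```python
def jacobian(p):
    """Hand-derived Jacobian of the hypothetical map F(u, v) = (u*v, u + 3*v, u*u)."""
    u, v = p
    return [[v, u], [1.0, 3.0], [2.0 * u, 0.0]]

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

p = [2.0, 5.0]
x = [1.0, -1.0]       # tangent at p, lives in R^2
xi = [1.0, 2.0, 0.5]  # cotangent at F(p), stored as a column in R^3

J = jacobian(p)
pushed = matvec(J, x)              # F_* X     =  J x,      in R^3
pulled = matvec(transpose(J), xi)  # F^* xi    =  J^T xi,   in R^2

lhs = dot(xi, pushed)  # xi(F_* X)
rhs = dot(pulled, x)   # (F^* xi) X
print(pulled, lhs, rhs)
```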

Now, why pull back in gradient descent?
Take \(F:\mathbb{R}^n \to\mathbb{R}\).
The right cotangent is just a number \(\alpha\).
The left cotangent would be \(\alpha J_p(F)\).
It is such that

\begin{equation*}
(F^*\alpha) X = \alpha(F_* X) = \alpha J_p(F)x.
\end{equation*}
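
With \(\alpha = 1\), the pulled-back cotangent \(\alpha J_p(F)\) is the row of partial derivatives of \(F\), which is what reverse-mode autodiff returns in a single backward pass. Here is a minimal sketch of that idea, assuming a tiny graph with only `+` and `*`. The class `Var`, the function `pullback` and the scalar function `F` are hypothetical names made up for this example, not any framework's API.

```python
class Var:
    """Node of a scalar computation graph; `grad` accumulates the pulled-back cotangent."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # pairs (parent node, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))
    __rmul__ = __mul__

def pullback(output, alpha=1.0):
    """Propagate the cotangent alpha from `output` back to all of its ancestors."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: parents are appended before their children.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = alpha
    # Reversed post-order: each node is fully accumulated before it is used.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

def F(u, v):
    """Hypothetical scalar function R^2 -> R."""
    return u * v + 3 * u

u, v = Var(2.0), Var(5.0)
y = F(u, v)
pullback(y, alpha=1.0)
grad = [u.grad, v.grad]  # alpha * J_p(F) with alpha = 1, i.e. [v + 3, u]
print(y.val, grad)
```

Any other \(\alpha\) just scales the result, matching \(\alpha J_p(F)\) above.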

What happens when we pull, transpose, and then push?