Going on with trivialities. I still haven't finished my story with autodiff, and since I've got to code it for DL homework... Well.

So, we consider \(F:M{=}\mathbb{R}^m\to N{=}\mathbb{R}^n\) and a point \(p\in M\). Tangent spaces look exactly like the original spaces except for where they go in multiplication: for \(g:N\to\mathbb{R}\), a tangent vector \(Y\in\mathcal{T}_{F(p)}N\) is basically an element of \(\mathbb{R}^n\) and acts by \(Y(g) = \langle \nabla_{F(p)} g, Y\rangle\).

Pushforward \(F_*:\mathcal{T}M\to \mathcal{T}N\) is defined by \((F_* X)(g) = X(g\circ F)\) for \(X\in\mathcal{T} M \sim \mathbb{R}^m\). Thus

\begin{equation*} \begin{split} (F_* X)(g) &= X(g\circ F) = \langle \nabla_{p} (g\circ F), x\rangle \\ &= \left\langle ( \left.Dg\right|_{F(p)} \left. DF \right|_p )^\top, x\right\rangle \\ &= \langle {\underbrace{J_p(F)}_{\left. DF \right|_p}}^\top \underbrace{\nabla_{F(p)} g}_{\left.Dg\right|_{F(p)}^\top}, x \rangle \\ &= \left\langle \nabla_{F(p)}g, J_p(F)x \right\rangle . \end{split} \end{equation*}

Here we use \(D\) to denote the Fréchet derivative (a linear map) and nabla to denote the gradient (a vector -- the Riesz representation of that linear map), and we also identify the linear map \(DF\) with the Jacobian matrix \(J(F)\). Also I denote \(X\) cast to \(\mathbb{R}^m\) as just \(x\). I don't like how I'm already using too many different notations (after all, that's what I scorn differential geometers for) but at the moment it seems fit.

So, basically the equation above means that in the Euclidean case the pushforward \(F_*\) acts on a tangent \(X\) merely by multiplying with the Jacobian \(J_p(F)\). In terms of matrix multiplication, \(F_* X\) is just \(J_p(F) x\in\mathbb{R}^n\).
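A quick numerical sanity check of that identity, as a minimal sketch in plain Python. The map `F`, the test function `g`, the point `p` and the direction `x` are all made up for illustration; the only claim being checked is that \(X(g\circ F)\), computed by finite differences, agrees with \(\langle \nabla_{F(p)} g, J_p(F)x\rangle\).

```python
import math

# Hypothetical example map F: R^2 -> R^3 and test function g: R^3 -> R.
def F(p):
    u, v = p
    return [u * v, math.sin(u), v ** 2]

def jacobian_F(p):
    u, v = p
    # Rows are outputs, columns are inputs: J[i][j] = dF_i / dp_j.
    return [[v, u], [math.cos(u), 0.0], [0.0, 2.0 * v]]

def g(q):
    a, b, c = q
    return a + 2.0 * b * c

def grad_g(q):
    a, b, c = q
    return [1.0, 2.0 * c, 2.0 * b]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def pushforward(p, x):
    """F_* X in coordinates: the Jacobian-vector product J_p(F) x."""
    return [dot(row, x) for row in jacobian_F(p)]

def directional_derivative(h, p, x, eps=1e-6):
    """Central finite difference of t -> h(p + t x) at t = 0."""
    plus = [pi + eps * xi for pi, xi in zip(p, x)]
    minus = [pi - eps * xi for pi, xi in zip(p, x)]
    return (h(plus) - h(minus)) / (2.0 * eps)

p = [0.7, -1.3]
x = [0.5, 2.0]

lhs = directional_derivative(lambda q: g(F(q)), p, x)  # X(g o F)
rhs = dot(grad_g(F(p)), pushforward(p, x))             # <grad g, J x>
```

The two numbers should agree up to finite-difference error. In autodiff terms `pushforward` is what forward mode computes (a Jacobian-vector product), except that a real implementation never builds the Jacobian explicitly.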

Further, the pullback \(F^*\) by definition maps a right cotangent \(\xi\in\mathcal{T}_{F(p)}^*N\) into a left cotangent \(F^*\xi\in\mathcal{T}_p^* M\), which acts as \((F^*\xi) X = \xi(F_*X)\).

\begin{equation*} \begin{split} (F^*\xi) X &= \xi(F_*X)\\ &= \xi^\top J_p(F)x . \end{split} \end{equation*}

That is, the pulled-back cotangent is just the row \(\xi^\top J_p(F)\in (\mathbb{R}^m)^*\) (acting on \(x\) from the left), and the pullback \(F^*\) itself is still the same \(J_p(F)\), except that now the cotangent row stands to its left. It is equivalent to say that the pullback acts on \(\operatorname{column}(\xi)\) as the transposed Jacobian \(J_p(F)^\top\):

\begin{equation*} \operatorname{column}(F^*\xi) = J_p(F)^\top \operatorname{column}(\xi). \end{equation*}
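The same kind of check for the pullback, again as a sketch with a made-up map (\(F(u,v) = (uv, \sin u, v^2)\)) and made-up vectors. Cotangents are stored as plain lists, i.e. as \(\operatorname{column}(\xi)\), so the pullback is a multiplication by \(J_p(F)^\top\), and the defining identity \((F^*\xi)X = \xi(F_*X)\) becomes an equality of two dot products.

```python
import math

def jacobian_F(p):
    """Jacobian of the hypothetical map F(u, v) = (u*v, sin u, v**2)."""
    u, v = p
    return [[v, u], [math.cos(u), 0.0], [0.0, 2.0 * v]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def pushforward(J, x):
    """J x: tangent in R^m -> tangent in R^n."""
    return [dot(row, x) for row in J]

def pullback(J, xi):
    """J^T xi: cotangent at F(p) (as a column) -> cotangent at p (as a column)."""
    n, m = len(J), len(J[0])
    return [sum(J[i][j] * xi[i] for i in range(n)) for j in range(m)]

p = [0.7, -1.3]
x = [0.5, 2.0]           # tangent at p
xi = [1.0, -2.0, 0.25]   # cotangent at F(p)
J = jacobian_F(p)

lhs = dot(pullback(J, xi), x)     # (F^* xi) X
rhs = dot(xi, pushforward(J, x))  # xi(F_* X)
```

`pullback` here is the vector-Jacobian product that reverse mode computes; the shapes show the direction reversal, since it takes a length-3 list at \(F(p)\) to a length-2 list at \(p\).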

Now, why pull back in gradient descent? Take \(F:\mathbb{R}^n \to\mathbb{R}\). The right cotangent is just a number \(\alpha\). The left cotangent would be \(\alpha J_p(F)\). It is such that

\begin{equation*} (F^*\alpha) X = \alpha(F_* X) = \alpha J_p(F)x . \end{equation*}

What happens when we pull, transpose, and then push?
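One possible reading of that question, sketched in plain Python under loud assumptions: the objective `F` is a made-up convex quadratic on \(\mathbb{R}^3\), its Jacobian (a single row) is written out by hand, and "transpose" means the usual Euclidean identification of a row with a column. Pulling back \(\alpha = 1\) gives the whole gradient in one pass, whereas pushing forward needs one pass per basis vector. Pushing the transposed result forward again gives \(J_p(F)J_p(F)^\top = \|\nabla_p F\|^2\), the first-order rate of change of \(F\) along its own gradient.

```python
def F(p):
    """Hypothetical scalar objective F: R^3 -> R (a convex quadratic)."""
    x, y, z = p
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2 + x * z + z ** 2

def jacobian_F(p):
    """The 1 x 3 Jacobian of F, stored as a flat row."""
    x, y, z = p
    return [2.0 * (x - 1.0) + z, 4.0 * (y + 0.5), x + 2.0 * z]

def pullback(p, alpha):
    """alpha J_p(F): a number at F(p) -> a row cotangent at p."""
    return [alpha * Jj for Jj in jacobian_F(p)]

def pushforward(p, x):
    """J_p(F) x: a tangent at p -> a number at F(p)."""
    return sum(Jj * xj for Jj, xj in zip(jacobian_F(p), x))

p0 = [0.0, 0.0, 0.0]
n = len(p0)

# Pull: one pass gives all n partial derivatives.
grad_by_pullback = pullback(p0, 1.0)

# Push: n passes, one per basis vector, give the same numbers.
basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
grad_by_pushforward = [pushforward(p0, e) for e in basis]

# Pull, transpose (row read as a column), then push: squared gradient norm.
rate = pushforward(p0, grad_by_pullback)

# Plain gradient descent, using the transposed pullback as the step direction.
p = list(p0)
step = 0.1  # assumed step size, small enough for this particular quadratic
for _ in range(500):
    grad = pullback(p, 1.0)
    p = [pi - step * gi for pi, gi in zip(p, grad)]
```

The update `p - step * grad` is where the row silently becomes a column: subtracting a cotangent from a point only makes sense after that identification.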

Dual to dual

Just realized how the double dual isn't really the original space even in the simple Euclidean case (which I was aware of, but somehow didn't feel I was understanding): with a vector being a column \(x\in\mathbb{R}^n\) and its dual a row \(x^\top\in(\mathbb{R}^n)^*\), the double dual \(x^{\top\top}\) is indeed a column, except when it acts on the rows in multiplication it goes TO THE RIGHT and not to the left:

\begin{equation*} \begin{split} x^\top(y) &= x^\top y,\\ x^{\top\top}(y^{\top}) &= y^\top x^{\top\top} = y^\top x. \end{split} \end{equation*}

This contrasts with rows acting on columns, where the dual (the row) acts from the left.

So exciting.

Update! It's much more than that! If we treat \(\mathbb{R}^n\) as a manifold, it turns out then that its tangent space looks more like double dual \((\mathbb{R}^n)^{**}\) rather than \(\mathbb{R}^n\) or \((\mathbb{R}^n)^*\), because when we consider a tangent vector acting on scalar functions \(\mathbb{R}^n\to\mathbb{R}\) in the special -- linear -- case, the tangent goes on the right and it does so as a column. Take a scalar function \((\mathbb{R}^n)^* \ni a^\top : \mathbb{R}^n \to \mathbb{R}\). Then a tangent \(X\in\mathcal{T}\mathbb{R}^n\) should act on \(a\) from the right:

\begin{equation*} X(a) = \left.\partial_t(t\mapsto a(p) + t\, a^\top x)\right|_{t=0} = a^\top x. \end{equation*}

Here \(p\) is the base point and \(x\in\mathbb{R}^n\) denotes \(X\in\mathcal{T}_p\mathbb{R}^n\) cast to \(\mathbb{R}^n\).

Browser multiplexing

Just realised I need a tmux for browser (as in, persistent remote BROWSER session I can attach to anytime from anywhere in the world).

That is aside from WEB ceasing to exist of course

SKDL19L2 take-outs

  • Dropout is similar to ensembling

  • Conjugate gradients are akin to momentum, mixing new direction with previous ones. That is quite different from my previous intuition that we "always move in new direction" which was correct... up to the metric.

  • Nesterov is about two things:

    • It is better to use gradient at the destination point than the gradient at the origin

    • With large momentum, we got a meaningful enough estimate of where we end up to compute that gradient

  • Mentioned this blog post again

  • Again this bullshit about SGD being "incorrect". It's not that SGD is not correct, it's that you define it the wrong way (being aware that it should be done in different way) and omit the explicit isomorphism.
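To make the Nesterov bullet concrete, here is a minimal sketch in plain Python comparing classical momentum with the look-ahead variant. Everything specific is an assumption for illustration: the ill-conditioned quadratic, the learning rate, the momentum coefficient and the step count. The only difference between the two update rules is the point at which the gradient is evaluated.

```python
def grad(p):
    """Gradient of the hypothetical quadratic f(x, y) = 0.5 * (x**2 + 25 * y**2)."""
    x, y = p
    return [x, 25.0 * y]

def momentum_step(p, v, lr, mu):
    # Classical momentum: gradient taken at the current point (the origin).
    g = grad(p)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    return [pi + vi for pi, vi in zip(p, v)], v

def nesterov_step(p, v, lr, mu):
    # Nesterov: gradient taken at p + mu * v, an estimate of the destination.
    lookahead = [pi + mu * vi for pi, vi in zip(p, v)]
    g = grad(lookahead)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    return [pi + vi for pi, vi in zip(p, v)], v

def run(step, n_steps=300, lr=0.03, mu=0.9):
    """Iterate the given update rule from a fixed start and return the final point."""
    p, v = [1.0, 1.0], [0.0, 0.0]
    for _ in range(n_steps):
        p, v = step(p, v, lr, mu)
    return p

final_momentum = run(momentum_step)
final_nesterov = run(nesterov_step)
```

With `mu` close to 1 the velocity dominates the next move, so `p + mu * v` is already a decent guess of where the step lands, which is the second bullet's point.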

DFS Mode

My brain seems to be working in a depth-first mode. I noticed it a long time ago, only then I thought it was only so for problems of a technical or scientific nature, and now I realize that it is a more general principle that applies to any activity I get involved in. Should I see anything that I'm able to admire (there are rather few such things), be that a mathematical problem or a person, I get obsessed with it; a new independent thread gets run in my head, performing depth-first expansions starting from that new point; it consumes me...

One more week, Night Feb 21-22

Now that everybody left... Another attempt to write down my thoughts and feelings. Already failing, jumping back and forth over lines writing different things in parallel, producing some pile of inconsistency. I'm at the same place again. Another week (nearly) gone. And everyone's gone too. Trying to decide where to go tonight. Just realized that yesternight was the first night of this week when I actually slept. Night after Monday I went to the dorm kind of late, slept for an hour and a half, took a shower, and left. I was going to Skoltech but then I figured it'd be a waste of time so when I got to Krestyanskaya Zastava, I changed the line. Failed stats deadline. Learned a few things about gyrovector spaces and the idea of combining matrix multiplication with logarithmic and exponential maps so as to make scalings and rotations along geodesics. Back to the place. Stayed there all night in a very enchanting company, desperately trying to work. Learned about some really good songs. With these people I'm learning a lot about humans and the world. About living. From outside, it seems like they really got this skill -- skill of living their lives. I couldn't really work though. Grinding. And so every week. By 4am we decided to watch Kill Bill and order a pizza. Wouldn't go to sleep after that though, because I had a meeting scheduled at 2pm. I shut down at some point and unintentionally hibernated for 30 minutes or so. Missed the clock, as I learned later. Then I kept trying to concentrate. Learned about exponential barycentres. Really cool thing. Basically we're saying that if we got a barycentre, then in its tangent space zero would be a barycentre of induced measure, except that we need to pick a way to define integrals and barycentres. Loveable. These two fields seem really cool to me. Optimization on manifolds and all that optimal transportation+curvature bounds stuff. But I really can't do anything right now.
The week before that I spent trying first to boot the mainline U-Boot and kernel, then trying to port mainline U-Boot from scratch, based on the vendors' legacy sources, hoping to learn the mainline source code's structure and overall design. Yet one sleepless night and I'm incapable again. Then an entire week of ever more devastating nights. And now another such week approaches its end. Though these three days promise to be long. I hate how things cluster and overlap with one another, also always leaving windows of void, so that I have time to look back and look one more time and find it all still running right down to hell. A week that got me back into the well from which I had barely escaped by taking off to Petersburg earlier. Also I got to choose where I want to sleep to-night. Last night I stayed at some really terrible hostel. It smelled like kero... Don't want to go to the dorm. Need to wash my clothes though. And, well, everybody left. I feel so grateful to all these people. They keep giving. And I keep distance. Wish I could tell. That evening by Petropavlovsky fortress, the lines of buildings, the bridges, and the wind, the cold whistling wind, the wind blowing that red hair. Them two having to scream sometimes to be heard, them two wearing that facade of disdainful satire, them two ironically mocking me and each other, them two telling stories, them two so authentically and convincingly admiring the place, the moment and the life. For the two days after I returned to Moscow the memory of that evening was the only thing that drove me. I've been occupied with that picture. Trying to recall even some details of it. The memory's fading, and the only thing I remember now is that I believed back then that I felt a revival and even some taste of life. Now as always, I'm wondering if it was real or if I've deceived myself again. But I have it fixed as a fact that I did have the belief. Which I did not ever have before.
That might be the only thing that matters after all. To have a belief. And they helped me to believe. But this cannot work forever. Because I've nothing to give back.

Got 3 minutes to pack the things and get to the metro, if I'm going to take it...

01:06. I'm in the sub. They shut the entrances, but let me and another late girl in through the exits. She seemed very excited about something. And talkative. I wouldn't talk. Nah, not that I wouldn't, but I did not. On second thought though... This may be the key thing. I wish that I would but I wouldn't. There's no excitement for me. I used to be excited about programming. I went all in and I burned out. Over the course of the last five years, I used to be even more excited about math, never escaping from the state of burnout though. Now there's still some fire about math, but not nearly enough to be doing things. 01:11 -- suddenly, Proletarskaya, got to run.

01:15 -- Krestyanskaya Zastava:

"the next station is -- Dubrovka"

So, all I got strength for is analyzing and judging in the background, giving advice people don't ask for, participating in discussions I wasn't invited to. So as to feel less useless. But really doing research? Really trying things out?

And here I come back to that point: I wish I would anything. I "don't would nothing".

Language stuff still gets me curious though. But now that I'm so experienced, I shall not ever learn anything rigorous about linguistics. I better be an amateur, a hobbyist. Be a middle that it never becomes another exhausting routine.

01:27 -- beginning to realize that I'm going to the dorm; unsure what for; haven't been there since Monday though; 01:29:30 -- wait, it's Lyublino already...

Also that thing about learning segmentation via classifier and GANs was kind of cool as well.

02:10. I'm in the dorm, ready to shut down. Day's over. Day of contemplating how people come and go. Feelings they show. How they fear. Work. Love. Live. 02:32. To-morrow I'll live one more day just to watch lives happening and to listen to people telling stories.

02:40. Still trying to figure out plans for tomorrow. So damn many things... Not so; so much things. Because they're not finite, nor enumerable. It's a continuum. Unbounded, open, dense. And I'm a compact. I'm a-bed.

Suddenly gnome-keyring-d started consuming all of my CPU, reducing estimated battery time from 11 hours to 3 and preventing new shells and htop from starting. Killed it.