Tommorow's too late (Posts about math)https://newkozlukov.gitlab.io/enContents © 2019 <a href="mailto:newkozlukov@gmail.com">Sergei Kozlukov</a> Fri, 13 Dec 2019 08:25:55 GMTNikola (getnikola.com)http://blogs.law.harvard.edu/tech/rss- Immersions, submersions, embeddingshttps://newkozlukov.gitlab.io/posts/immersions-submersions-embeddings/Sergei Kozlukov<div><p>Some tldr-excerpts from Lee and Spivak
formalizing embeddings and stuff.</p>
<hr>
<p><strong>Topological embedding</strong> -- an <em>injective</em> <em>continuous</em> map <script type="math/tex"> f: A\to X </script>
that is also a <em>homeomorphism onto its image</em> <script type="math/tex"> f(A) </script>.
We can think of <script type="math/tex"> f(A) </script> "as a homeomorphic copy of <script type="math/tex"> A </script> in <script type="math/tex"> X </script>" (Lee, 2011).</p>
<p>A smooth map <script type="math/tex"> F: M\to N </script> is said to have rank <script type="math/tex"> k </script> at <script type="math/tex"> p \in M </script>
if the linear map <script type="math/tex"> F_*: T_p M \to T_{F(p)} N </script> (the pushforward) has rank <script type="math/tex"> k </script>.
<script type="math/tex"> F </script> is of <em>constant rank</em> <script type="math/tex"> k </script> if it is of rank <script type="math/tex"> k </script>
at every point.</p>
<p><strong>Immersion</strong> -- smooth map <script type="math/tex"> F:M \to N </script>
whose pushforward <script type="math/tex"> F_* </script> is injective at every point,
that is <script type="math/tex"> \operatorname{rank} F = \operatorname{dim} M </script>.</p>
<p><strong>Submersion</strong> -- smooth map <script type="math/tex"> F:M \to N </script>
whose pushforward is surjective at every point,
that is <script type="math/tex"> \operatorname{rank} F = \operatorname{dim} N </script>.</p>
<p>(Smooth) <strong>Embedding</strong> (of a manifold) --
an <em>injective immersion</em> <script type="math/tex"> F: M\to N </script> that is also
a <em>topological embedding</em>.</p>
<p>So, a map <script type="math/tex"> F: M\to N </script> is an embedding, if</p>
<ol>
<li>
<script type="math/tex"> \operatorname{rank}F = \operatorname{dim} M </script>,</li>
<li>
<script type="math/tex"> F </script> is injective,</li>
<li>
<script type="math/tex"> F </script> is a homeomorphism onto <script type="math/tex"> F(M) </script> with subspace topology.</li>
</ol></div>https://newkozlukov.gitlab.io/posts/immersions-submersions-embeddings/Tue, 02 Jul 2019 21:01:57 GMT
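The rank condition above can be poked at numerically. A quick Python sketch (mine, not from Lee or Spivak; the map, step size and tolerance are my choices): for the circle map <code>F(t) = (cos t, sin t)</code> from R to R^2, the pushforward is the 2x1 Jacobian, i.e. the velocity vector, and F is an immersion iff that vector is nonzero everywhere, so that rank F_* = dim M = 1.

```python
import math

# F: R -> R^2, the circle map. An immersion, but not injective,
# hence not an embedding.
def F(t):
    return (math.cos(t), math.sin(t))

def pushforward(t, h=1e-6):
    # finite-difference approximation of F_* at t, i.e. F'(t),
    # the single column of the 2x1 Jacobian
    (x0, y0), (x1, y1) = F(t), F(t + h)
    return ((x1 - x0) / h, (y1 - y0) / h)

def is_immersion_at(t, tol=1e-3):
    # a 2x1 Jacobian has rank 1 = dim M iff its column is nonzero
    return math.hypot(*pushforward(t)) > tol

# the velocity (-sin t, cos t) has unit length, so the check passes everywhere
assert all(is_immersion_at(k / 10.0) for k in range(63))
```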
- Tangents in geoopthttps://newkozlukov.gitlab.io/posts/tangents-in-geoopt/Sergei Kozlukov<p>Write your post here.</p>https://newkozlukov.gitlab.io/posts/tangents-in-geoopt/Tue, 02 Jul 2019 21:01:34 GMT
- Conehttps://newkozlukov.gitlab.io/posts/cone/Sergei Kozlukov<div><p>A cone over a topological space <script type="math/tex"> X </script> is the quotient</p>
<p>
<script type="math/tex; mode=display">\left[X{\times}[0,+\infty)\right]/\left[X{\times}\{0\}\right].</script>
</p>
<p>A point <script type="math/tex">a</script> of that cone can be identified with a point <script type="math/tex">x\in X</script>
and the distance <script type="math/tex"> \lvert Ox \rvert </script> to the apex <script type="math/tex">O</script>, the point to which the fiber <script type="math/tex">X{\times}\{0\}</script> collapses.</p>
<p>The reason I care about cones is the notion of the <em>tangent cone</em>
of a metric space at a point.</p></div>https://newkozlukov.gitlab.io/posts/cone/Tue, 25 Jun 2019 16:13:05 GMT
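A toy model of the quotient (my own sketch, not from any text; the class name and representation are made up): represent a cone point as a pair (x, r) and build the identification of the whole fiber X x {0} into equality.

```python
# C(X) = [X x [0, inf)] / [X x {0}]: points are pairs (x, r),
# and every pair with r == 0 is the same point -- the apex O.
class ConePoint:
    def __init__(self, x, r):
        assert r >= 0
        self.x, self.r = x, r

    def __eq__(self, other):
        # the whole fiber X x {0} is identified to the apex
        if self.r == 0 and other.r == 0:
            return True
        return self.x == other.x and self.r == other.r

    def dist_to_apex(self):
        # the coordinate r recovers the distance |Ox| to the apex
        return self.r

assert ConePoint("a", 0) == ConePoint("b", 0)   # X x {0} collapses to one point
assert ConePoint("a", 1) != ConePoint("b", 1)
assert ConePoint("a", 2).dist_to_apex() == 2
```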
- Further procrastinationhttps://newkozlukov.gitlab.io/posts/further-procrastination/Sergei Kozlukov<ul class="simple">
<li><p>Just learned that Sussman (co-author of SICP) also authored a monograph on <a class="reference external" href="https://oapen.org/download?type=document&docid=1004028">differential geometry</a></p></li>
<li><p>And on <a class="reference external" href="https://www.fisica.net/mecanicaclassica/struture_and_interpretation_of_classical_mechanics_by_gerald_jay_sussman.pdf">classical mechanics</a></p></li>
<li><p>Moreover, the former follows the concept of <a class="reference external" href="https://booksdescr.com/item/detail/id/5c63f81a50b42539789a587e">Turtle Geometry</a>
and states in its Prologue the approach I have admired most since my early childhood: learning things by programming them,
thus forcing oneself to be precise and exact in judgements and claims. I'm recalling right now
again that first "lecture" on elementary notions of set theory the summer before admission to VSU...
Constructing a function as a set so it becomes a more "tangible" object.
The catharsis that followed.
I didn't realize back then that it's the same as in programming.
For five years I've been living with guilt and shame that I started as a coder
and not a Mathematician. For five years I felt programming was a disgusting and despicable thing to do.
And only now do I truly realize that the thing I loved about it in those first years
is the same thing I fell in love with Mathematics for that summer of 2014.</p></li>
<li><p>Also stumbled upon a tweet mentioning the following
<a class="reference external" href="https://books.google.ru/books?id=VsK_31_j0XgC&lpg=PA245&dq=intuitive%20interpretation%20of%20the%20laplacian%2C%20farlow&pg=PA246#v=onepage&q=intuitive%20interpretation%20of%20the%20laplacian,%20farlow&f=false">interpretation of the Laplace operator</a> as measuring how the average of a function over a small neighbourhood compares to its value at the point. Sort of trivial, and resembles how we derive sufficient min/max conditions, yet I had not noticed it.</p></li>
<li><p>The majority of these I found in: <a class="reference external" href="https://colab.research.google.com/github/google/jax/blob/master/notebooks/autodiff_cookbook.ipynb">JAX cookbook</a></p></li>
<li><p>Update! Accidentally found <a class="reference external" href="https://perso.uclouvain.be/pa.absil/Talks/ICIAM070717_oom_05.pdf">these slides by Absil</a>
giving some historical perspective on the subject</p></li>
<li><p>For instance, the slides mention <a class="reference external" href="https://eng.uok.ac.ir/mfathi/Courses/Advanced%20Eng%20Math/Linear%20and%20Nonlinear%20Programming.pdf">Luenberger (1973)</a> stating that "we'd perform line search along geodesics... if'twere feasible". Now we're closer to the roots of the whole thing</p></li>
</ul>https://newkozlukov.gitlab.io/posts/further-procrastination/Fri, 05 Apr 2019 13:24:05 GMT
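That averaging reading of the Laplacian is easy to check on a grid. A throwaway Python sketch (my own; the test function and step are mine): in 2D, (mean of f over the four grid neighbours) minus f(p) is approximately (h^2/4) times the Laplacian at p.

```python
# f(x, y) = x^2 + y^2 has Laplacian f = 2 + 2 = 4 everywhere,
# so the five-point estimate should return 4 at any point.
def f(x, y):
    return x * x + y * y

h, p = 1e-3, (0.3, -0.7)
neighbours = [(p[0] + h, p[1]), (p[0] - h, p[1]),
              (p[0], p[1] + h), (p[0], p[1] - h)]

# average over neighbours minus the value at the centre...
avg_minus_center = sum(f(x, y) for x, y in neighbours) / 4.0 - f(*p)
# ...rescaled by 4 / h^2 recovers the Laplacian
laplacian_estimate = 4.0 * avg_minus_center / (h * h)

assert abs(laplacian_estimate - 4.0) < 1e-5
```

For a quadratic the five-point formula is exact, which is why such a tight tolerance works here.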
- MD is not RSGD, but RSGD also does M from MDhttps://newkozlukov.gitlab.io/posts/mirror-descent-is-not-log-update-exp/Sergei Kozlukov<div><p>The whole idea of trying to parallel mirror descent
with following geodesics as in RSGD has come to naught.
And not the way one would expect, because MD still seems
"type-correct" and RSGD doesn't yet.
Long story short: in RSGD we're pulling back COTANGENTS
but updating along a TANGENT.</p>
<hr class="docutils">
<p>Update! Before updating, we're <strong>raising an index</strong> of the cotangent
by applying the inverse metric tensor,
thus making it a tangent! Thanks to <cite>@ferrine</cite> for the idea.</p>
<p>Following <span class="math">\(\mathbb{R}^m\to\mathbb{R}^n\)</span> analogy of previous posts:</p>
<div class="math">
\begin{equation*}
F:M\to N,
\end{equation*}
</div>
<div class="math">
\begin{equation*}
\xi = F^*\eta\in\mathcal{T}^*M,~\text{for}~\eta\in\mathcal{T}^*N,
\end{equation*}
</div>
<div class="math">
\begin{equation*}
X = \xi^\sharp = g^{-1}(\xi) = g^{-1} \xi^\top~\text{so it becomes a column}.
\end{equation*}
</div></div>https://newkozlukov.gitlab.io/posts/mirror-descent-is-not-log-update-exp/Fri, 05 Apr 2019 08:56:20 GMT
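The "raise an index" step is small enough to spell out in code. A Python sketch (mine, not geoopt's actual implementation; the 2x2 metric and the row cotangent are made-up numbers): X = xi^sharp = g^{-1} xi^T turns the pulled-back cotangent into the tangent we actually step along.

```python
# Solve g X = b for a 2x2 symmetric positive-definite metric g
# (Cramer's rule; in practice one would use a linear solver).
def solve2x2(g, b):
    (a, c), (c2, d) = g
    det = a * d - c * c2
    return ((d * b[0] - c * b[1]) / det,
            (a * b[1] - c2 * b[0]) / det)

g = ((2.0, 0.0), (0.0, 4.0))   # a diagonal Riemannian metric at a point
xi = (1.0, 1.0)                # the pulled-back cotangent (a row)
X = solve2x2(g, xi)            # the sharp: g^{-1} xi^T, now a tangent (a column)
assert X == (0.5, 0.25)
```

With the Euclidean metric g = I the sharp is the identity, which is why the distinction is invisible in plain SGD.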
- Pullbackshttps://newkozlukov.gitlab.io/posts/pullbacks/Sergei Kozlukov<div><p>Going on with trivialities.
I still haven't finished my story with autodiff,
and since I've got to code it for DL homework...
Well.</p>
<p>So, we consider <span class="math">\(F:M{=}\mathbb{R}^m\to N{=}\mathbb{R}^n\)</span>.
Tangent spaces look exactly like original spaces except
for where they go in multiplication:
for <span class="math">\(g:N\to\mathbb{R}\)</span>
a tangent vector <span class="math">\(Y\in\mathcal{T}_{F(p)}N\)</span>
is basically an element of <span class="math">\(\mathbb{R}^n\)</span> and
acts by <span class="math">\(Y(g) = \langle \nabla_{F(p)} g, Y\rangle\)</span>.</p>
<p>Pushforward <span class="math">\(F_*:\mathcal{T}M\to \mathcal{T}N\)</span>
is defined by
<span class="math">\((F_* X)(g) = X(g\circ F)\)</span>
for <span class="math">\(X\in\mathcal{T} M \sim \mathbb{R}^m\)</span>.
Thus</p>
<div class="math">
\begin{equation*}
\begin{split}
(F_* X)(g) &= X(g\circ F) = \langle \nabla_{p} (g\circ F), x\rangle \\
&= \left\langle ( \left.Dg\right|_{F(p)} \left. DF \right|_p )^\top, x\right\rangle \\
&= \left\langle {\underbrace{J_p(F)}_{\left. DF \right|_p}}^\top \underbrace{\nabla_{F(p)} g}_{(\left.Dg\right|_{F(p)})^\top}, x \right\rangle \\
&= \left\langle \nabla_{F(p)}g, J_p(F)x \right\rangle
.
\end{split}
\end{equation*}
</div>
<p>Here we use <span class="math">\(D\)</span> to denote Fréchet derivatives (linear maps),
<span class="math">\(\nabla\)</span> to denote the gradient (a vector -- the Riesz representation of that linear map),
and we also identify the linear map <span class="math">\(DF\)</span> with the Jacobian matrix <span class="math">\(J(F)\)</span>.
Also I denote <span class="math">\(X\)</span> cast to <span class="math">\(\mathbb{R}^m\)</span> as just <span class="math">\(x\)</span>.
I don't like how I'm already using too many different notations
(after all, that's what I scorn differential geometers for)
but at the moment it seems fit.</p>
<p>So, basically the equation above means that in the Euclidean case
the pushforward <span class="math">\(F_*\)</span> acts on a tangent <span class="math">\(X\)</span>
merely by multiplication with the Jacobian <span class="math">\(J_p(F)\)</span>.
In terms of matrix multiplication, <span class="math">\(F_* X\)</span> is just <span class="math">\(J_p(F) x\in\mathbb{R}^n\)</span>.</p>
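A finite-difference sanity check of that statement (my own sketch; the map F and the point are made up): the directional derivative of F at p along x should agree with J_p(F) x computed by hand.

```python
# F: R^2 -> R^2, F(u, v) = (u v, u + v); its Jacobian is [[v, u], [1, 1]].
def F(u, v):
    return (u * v, u + v)

def pushforward(p, x, h=1e-6):
    # directional derivative of F at p along x, approximating J_p(F) x
    (a0, b0) = F(p[0], p[1])
    (a1, b1) = F(p[0] + h * x[0], p[1] + h * x[1])
    return ((a1 - a0) / h, (b1 - b0) / h)

p, x = (2.0, 3.0), (1.0, 0.0)
Jx = pushforward(p, x)
# at p = (2, 3): J = [[3, 2], [1, 1]], so J x = (3, 1)
assert abs(Jx[0] - 3.0) < 1e-4 and abs(Jx[1] - 1.0) < 1e-4
```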
<p>Further, the pullback <span class="math">\(F^*\)</span> by definition maps
a right cotangent <span class="math">\(\xi\in\mathcal{T}_{F(p)}^*N\)</span>
into a left cotangent <span class="math">\(F^*\xi\in\mathcal{T}_p^* M\)</span>
which acts as <span class="math">\((F^*\xi) X = \xi(F_*X)\)</span>.</p>
<div class="math">
\begin{equation*}
\begin{split}
(F^*\xi) X &= \xi(F_*X)\\
&= \xi^\top J_p(F)x
.
\end{split}
\end{equation*}
</div>
<p>That is, the pulled-back cotangent is just <span class="math">\(\xi^\top J_p(F)\in (\mathbb{R}^m)^*\)</span> (acting on <span class="math">\(x\)</span> from the left)
and the pullback <span class="math">\(F^*\)</span> itself is still the same <span class="math">\(J_p(F)\)</span>,
except acting on cotangents from the left.
It is equivalent to say that pullback acts on <span class="math">\(\operatorname{column}(\xi)\)</span>
as transposed Jacobian <span class="math">\(J_p(F)^\top\)</span>:</p>
<div class="math">
\begin{equation*}
\operatorname{column}(F^*\xi) = J_p(F)^\top \operatorname{column}(\xi).
\end{equation*}
</div>
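The defining identity (F^* xi)(X) = xi(F_* X) can be verified with an explicit Jacobian. A Python sketch (mine; the map, point, cotangent and tangent are arbitrary made-up values):

```python
# F(u, v) = (u v, u + v) with Jacobian [[v, u], [1, 1]] at p.
p = (2.0, 3.0)
J = ((p[1], p[0]), (1.0, 1.0))

def matvec(M, x):
    # pushforward: J x (column out)
    return tuple(sum(M[i][j] * x[j] for j in range(2)) for i in range(2))

def vecmat(xi, M):
    # pullback: xi^T J (row cotangent acting from the left)
    return tuple(sum(xi[i] * M[i][j] for i in range(2)) for j in range(2))

xi, X = (1.0, -2.0), (0.5, 4.0)
pullback_xi = vecmat(xi, J)                         # F^* xi = xi^T J_p(F)
lhs = sum(a * b for a, b in zip(pullback_xi, X))    # (F^* xi)(X)
rhs = sum(a * b for a, b in zip(xi, matvec(J, X)))  # xi(F_* X)
assert abs(lhs - rhs) < 1e-12
```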
<hr class="docutils">
<p>Now, why pull back in gradient descent?
Take <span class="math">\(F:\mathbb{R}^n \to\mathbb{R}\)</span>.
A right cotangent is just a number <span class="math">\(\alpha\)</span>.
The left cotangent would be <span class="math">\(\alpha J_p(F)\)</span>.
It is such that</p>
<div class="math">
\begin{equation*}
(F^*\alpha)(X) = \alpha(F_* X) = \alpha J_p(F) x.
\end{equation*}
</div>
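For a scalar F that left cotangent alpha J_p(F) is exactly the "gradient row" reverse-mode autodiff produces from seed alpha. A tiny sketch (my own; the function and seed are made up):

```python
# F(u, v) = u^2 + 3 v, a scalar function R^2 -> R.
def F(u, v):
    return u * u + 3.0 * v

p = (2.0, 5.0)
J = (2.0 * p[0], 3.0)   # J_p(F), a 1x2 row: (2u, 3)
alpha = 0.5             # the right cotangent: just a number (the "seed")

# the pulled-back (left) cotangent alpha * J_p(F)
left_cotangent = tuple(alpha * j for j in J)
assert left_cotangent == (2.0, 1.5)
```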
<hr class="docutils">
<p>What happens when we pull, transpose, and then push?</p></div>https://newkozlukov.gitlab.io/posts/pullbacks/Thu, 04 Apr 2019 21:06:54 GMT
- Dual to dualhttps://newkozlukov.gitlab.io/posts/dual-to-dual/Sergei Kozlukov<div><p>Just realized how the double dual isn't really the original space
even in the simple Euclidean case (which I was aware of,
but somehow didn't feel I was understanding):
with a vector being a column <span class="math">\(x\in\mathbb{R}^n\)</span>,
its dual a row <span class="math">\(x^\top\in(\mathbb{R}^n)^*\)</span>,
the double dual <span class="math">\(x^{\top\top}\)</span> is indeed a column,
except when it acts on the rows in multiplication
it goes TO THE RIGHT and not to the left:</p>
<div class="math">
\begin{equation*}
\begin{split}
x^\top(y) &= x^\top y,\\
x^{\top\top}(y^{\top}) &= y^\top x^{\top\top} = y^\top x.
\end{split}
\end{equation*}
</div>
<p>This contrasts with rows acting on columns,
where the dual (the row) acts from the left.</p>
<p>So exciting.</p>
<hr class="docutils">
<p>Update! It's much more than that!
If we treat <span class="math">\(\mathbb{R}^n\)</span> as a manifold,
it turns out then that its tangent space
looks more like double dual <span class="math">\((\mathbb{R}^n)^{**}\)</span>
rather than <span class="math">\(\mathbb{R}^n\)</span> or <span class="math">\((\mathbb{R}^n)^*\)</span>,
because when we consider a tangent vector acting
on scalar functions <span class="math">\(\mathbb{R}^n\to\mathbb{R}\)</span>
in the special -- linear -- case,
the tangent goes on the right
and it does so as a column.
Take a scalar function <span class="math">\((\mathbb{R}^n)^* \ni a^\top : \mathbb{R}^n \to \mathbb{R}\)</span>.
Then a tangent <span class="math">\(X\in\mathcal{T}\mathbb{R}^n\)</span> should act on <span class="math">\(a^\top\)</span> from the right:</p>
<div class="math">
\begin{equation*}
X(a^\top) = \left.\partial_t\, a^\top(p + t x)\right|_{t=0} = a^\top x.
\end{equation*}
</div>
<p>Here <span class="math">\(x\in\mathbb{R}^n\)</span> denotes <span class="math">\(X\in\mathcal{T}\mathbb{R}^n\)</span> cast to <span class="math">\(\mathbb{R}^n\)</span>.</p></div>https://newkozlukov.gitlab.io/posts/dual-to-dual/Thu, 04 Apr 2019 16:26:33 GMT
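That action can be checked by differentiating a linear functional along a line. A quick sketch of mine (the vectors are arbitrary; the base point is the origin): the derivative of t -> a^T(p + t x) at t = 0 should equal a^T x, the column acting from the right.

```python
# the scalar function a^T : R^n -> R
def linear(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def tangent_action(a, x_of_X, point, h=1e-6):
    # finite-difference derivative of t |-> a^T(point + t x) at t = 0
    shifted = [p_i + h * v for p_i, v in zip(point, x_of_X)]
    return (linear(a, shifted) - linear(a, point)) / h

a, x = (1.0, 2.0), (3.0, -1.0)
val = tangent_action(a, x, point=(0.0, 0.0))
# a^T x = 1*3 + 2*(-1) = 1
assert abs(val - 1.0) < 1e-4
```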
- SKDL19L2 take-outshttps://newkozlukov.gitlab.io/posts/skdl19l2-take-outs/Sergei Kozlukov<ul class="simple">
<li><p>Dropout is similar to ensembling</p></li>
<li><p>Conjugate gradients are akin to momentum, mixing the new direction with the previous
ones. That is quite different from my previous intuition that we "always
move in a new direction" -- which was, in fact, correct... up to the metric.</p></li>
<li><p>Nesterov is about two things:</p>
<ul>
<li><p>It is better to use gradient at the destination point than the gradient at
the origin</p></li>
<li><p>With large momentum, we get a meaningful enough estimate of where we will end up
to compute that gradient there</p></li>
</ul>
</li>
<li><p>Mentioned <a class="reference external" href="https://distill.pub/2017/momentum/">this blog post again</a></p></li>
<li><p>Again this bullshit about SGD being "incorrect".
It's not that SGD is incorrect, it's that you define it the wrong way
(while being aware that it should be done in a different way)
and omit the explicit isomorphism.</p></li>
</ul>https://newkozlukov.gitlab.io/posts/skdl19l2-take-outs/Thu, 28 Mar 2019 08:07:52 GMT
- Betancourt: higher-order autodiffhttps://newkozlukov.gitlab.io/posts/betancourt-higher-order-autodiff/Sergei Kozlukov<div><p>Just stumbled upon an open tab with
Betancourt's <a href="https://arxiv.org/pdf/1812.11592.pdf">"Geometric theory of higher order automatic differentiation"</a>
which I started reading on New Year's night but quickly got distracted from.
I remember my first feeling was that it's slightly more verbose
than actually needed and that I'd have used some different wordings.
While this might be true, I'm finding the survey in its intro very clear and illuminating.
I must remind my imaginary interlocutor that I have not as of yet
gone through any course of differential geometry, only skimmed some textbooks and wikis.
I've been really struggling. Mainly because of the terrible notation and language
established by physicists, I believe, and employed in most classic texts.
There are exceptions of course.
Some seemingly good texts include e.g. the works of Lee.
I think I would've solved my problems if I had carefully read Lee's monographs
and walked through the exercises therein. I'm not yet ready to make this effort
(not ready to do anything at all).</p>
<p>As for higher-order differential-geometric structures,
I have only encountered jets when reading Arnold's "Lectures on PDEs",
where they were in fact treated (if my memory doesn't deceive me)
as "things" that appear in Taylor expansions, without actually specifying
their "datatypes".</p>
<p>Now, here's what I actually wanted to remember when I started typing this note:
a good survey rapidly introducing the principal concepts before the verbose and detailed body of a text
makes a lot of difference. Lee, say, pours on you quite an amount of information
that makes use of terms that haven't yet been made concrete. And that information
might give you a good intuition if you already have some very basic framework of concepts
and notations to add new nodes and connections to. Betancourt, on the other hand,
throws "pullbacks" and "pushforwards" at you in the very beginning. He throws them
pretty concretized, almost tangible, in the sense that he defines the domains
and ranges of the mappings, and the essential properties of their actions.
He doesn't spend too much time on it, and doesn't overload the reader
with questions like existence. Those questions are important later, for rigorous analysis,
but not for sketching the initial map of the field, not for an initial understanding
of its connections to other concepts.</p>
<p>The brain is a terribly lazy thing. At least mine is. The art of writing
(when the purpose of writing is to explain a subject and convey a message)
is in hacking the reader's brain so as to leave it no choice but to
comprehend the message. That's an obvious thing that I always knew too.
Yet I tend to forget it when actually writing.</p></div>readinghttps://newkozlukov.gitlab.io/posts/betancourt-higher-order-autodiff/Thu, 03 Jan 2019 12:55:15 GMT
- Eric Moulines' mini-coursehttps://newkozlukov.gitlab.io/posts/eric-moulines-mini-course/Sergei Kozlukov<div><p>I was a fool enough to miss most of Eric's course.
Well, just one lecture out of two, technically. But still.
Though it wasn't much of my fault either:
that Saturday had been reserved a long time ago.</p>
<p>So, I listened to his talk last Wednesday
which wasn't exactly impressive, then missed
the first lecture on MC, and today
finally arrived for his last lecture
which was really cool.
He's a great lecturer!</p>
<p>Now, I must remind you that I don't know
much about measures and probabilities, it's not my field.
So most of the stuff that I found cool
might in fact be quite trivial.
But here are things that were new to me and seemed cool:</p>
<ul>
<li>Hamiltonian trajectories are closed. For this reason,
when doing HMC we have to take random integration lengths,
lest our iterations end up just where they began</li>
<li>Total Variation is actually a norm on measures
which is dual to <script type="math/tex"> \infty </script>-norm on functions!
Really lovely.
And quite useful, because I couldn't persuade myself
to try and <em>really read</em> the definition of TV for a long time now.
You know, analytic definitions always make you feel
disrespected -- it is as if the author didn't care enough
to pre-process his idea into a form convenient enough
to be taken as an apparent geometrical truth.
<script type="math/tex; mode=display"> \|\xi\|_{\mathrm{TV}} = \sup_{\|f\|_\infty \leq 1} \xi(f), </script>
<script type="math/tex; mode=display"> \|\xi\|_{\mathrm{TV}} = \xi_+(\mathbb{X}) + \xi_{-}(\mathbb{X}). </script>
</li>
<li>There exists a coupling formulation for TV:
for measures <script type="math/tex"> \xi, \xi' \in \mathbb{M}_1(\mathbb{X}, \mathcal{X}) </script>
there exist <script type="math/tex"> \mathbb{X} </script>-valued random variables <script type="math/tex"> X \sim \xi </script> and <script type="math/tex"> X' \sim \xi' </script>
(on some probability space) such that
<script type="math/tex; mode=display"> \mathrm{d}_{\mathrm{TV}}(\xi, \xi') = \mathrm{Pr}(X\neq X'). </script>
</li>
<li>We've also covered Dobrushin coefficient.</li>
<li>We've even applied Banach's fixed-point theorem!
Felt like I was home. Not all this stochastic-differential-rubbish
that they don't care to define signatures of!</li>
</ul></div>https://newkozlukov.gitlab.io/posts/eric-moulines-mini-course/Wed, 21 Nov 2018 13:18:26 GMT
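The two TV formulas from the lecture agree on a finite set, where everything can be computed by brute force. A Python sketch of mine (the distributions are made up; I follow the lecture's convention, in which the TV norm of p - q is the full sum of |p_i - q_i|, not the halved one): the sup over functions with sup-norm at most one is attained at f = sign(xi), and equals the Jordan-decomposition expression xi_+(X) + xi_-(X).

```python
import itertools

# two probability distributions on a three-point set
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.5, "c": 0.3}
xi = {k: p[k] - q[k] for k in p}   # the signed measure xi = p - q

# Jordan decomposition: ||xi||_TV = xi_+(X) + xi_-(X)
tv_jordan = (sum(max(v, 0) for v in xi.values())
             + sum(max(-v, 0) for v in xi.values()))

# duality with the sup-norm ball: ||xi||_TV = sup_{||f||_inf <= 1} xi(f);
# on a finite set it suffices to scan f with values in {-1, +1}
tv_sup = max(sum(f_k * xi[k] for f_k, k in zip(f, xi))
             for f in itertools.product([-1.0, 1.0], repeat=len(xi)))

assert abs(tv_jordan - tv_sup) < 1e-12
assert abs(tv_jordan - 0.6) < 1e-9
```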