Tomorrow's too late (Notes on main)https://newkozlukov.gitlab.io/ruContents © 2019 <a href="mailto:newkozlukov@gmail.com">Sergei Kozlukov</a> Fri, 13 Dec 2019 08:25:54 GMTNikola (getnikola.com)http://blogs.law.harvard.edu/tech/rss- Immersions, submersions, embeddingshttps://newkozlukov.gitlab.io/ru/posts/immersions-submersions-embeddings/Sergei Kozlukov<div><p>Some tl;dr excerpts from Lee and Spivak
formalizing embeddings and stuff.</p>
<hr>
<p><strong>Topological embedding</strong> -- an <em>injective</em> <em>continuous</em> map <script type="math/tex"> f: A\to X </script>
that is also a <em>homeomorphism onto its image</em> <script type="math/tex"> f(A) </script>.
We can think of <script type="math/tex"> f(A) </script> "as a homeomorphic copy of <script type="math/tex"> A </script> in <script type="math/tex"> X </script>" (Lee, 2011).</p>
<p>A smooth map <script type="math/tex"> F: M\to N </script> is said to have rank <script type="math/tex"> k </script> at <script type="math/tex"> p \in M </script>
if the linear map <script type="math/tex"> F_*\colon T_p M \to T_{F(p)} N </script> (the pushforward at <script type="math/tex"> p </script>) has rank <script type="math/tex"> k </script>.
<script type="math/tex"> F </script> is of <em>constant rank</em> <script type="math/tex"> k </script> if it is of rank <script type="math/tex"> k </script>
at every point.</p>
<p><strong>Immersion</strong> -- smooth map <script type="math/tex"> F:M \to N </script>
whose pushforward <script type="math/tex"> F_* </script> is injective at every point,
that is <script type="math/tex"> \operatorname{rank} F = \operatorname{dim} M </script>.</p>
<p><strong>Submersion</strong> -- smooth map <script type="math/tex"> F:M \to N </script>
whose pushforward is surjective at every point,
that is <script type="math/tex"> \operatorname{rank} F = \operatorname{dim} N </script>.</p>
<p>(Smooth) <strong>Embedding</strong> (of a manifold) --
an <em>injective immersion</em> <script type="math/tex"> F: M\to N </script> that is also
a <em>topological embedding</em>.</p>
<p>So, a map <script type="math/tex"> F: M\to N </script> is an embedding if</p>
<ol>
<li>
<script type="math/tex"> \operatorname{rank}F = \operatorname{dim} M </script>,</li>
<li>
<script type="math/tex"> F </script> is injective,</li>
<li>
<script type="math/tex"> F </script> is a homeomorphism onto <script type="math/tex"> F(M) </script> with subspace topology.</li>
</ol></div>https://newkozlukov.gitlab.io/ru/posts/immersions-submersions-embeddings/Tue, 02 Jul 2019 21:01:57 GMT
- Tangents in geoopthttps://newkozlukov.gitlab.io/ru/posts/tangents-in-geoopt/Sergei Kozlukov<p>Write your post here.</p>https://newkozlukov.gitlab.io/ru/posts/tangents-in-geoopt/Tue, 02 Jul 2019 21:01:34 GMT
- Conehttps://newkozlukov.gitlab.io/ru/posts/cone/Sergei Kozlukov<div><p>A cone over a topological space <script type="math/tex"> X </script> is the quotient</p>
<p>
<script type="math/tex; mode=display">\left[X{\times}[0,+\infty)\right]/\left[X{\times}\{0\}\right].</script>
</p>
<p>A point <script type="math/tex">a</script> of the cone, other than the apex, can be identified with a pair: a point <script type="math/tex">x\in X</script>
and its distance <script type="math/tex"> \lvert Ox \rvert </script> to the apex <script type="math/tex">O = X{\times}\{0\}</script> (the collapsed fibre).</p>
<p>The reason I care about cones is the notion of the <em>tangent cone</em>
of a metric space at a point.</p></div>https://newkozlukov.gitlab.io/ru/posts/cone/Tue, 25 Jun 2019 16:13:05 GMT
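The quotient construction can be made tangible with a toy encoding (entirely my own sketch, not any standard library): represent a cone point as a pair (x, r) and glue the whole fibre X×{0} into a single apex by making all such pairs compare equal.

```python
class ConePoint:
    """Point of the cone over X, encoded as (x, r); all (x, 0) are glued."""
    def __init__(self, x, r):
        assert r >= 0
        self.x, self.r = x, r

    def __eq__(self, other):
        if self.r == 0 and other.r == 0:
            return True  # the whole fibre X x {0} collapses to one apex
        return self.r == other.r and self.x == other.x

apex1 = ConePoint("north", 0.0)
apex2 = ConePoint("south", 0.0)
assert apex1 == apex2  # identified in the quotient
assert ConePoint("north", 1.0) != ConePoint("south", 1.0)
```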
- Must needshttps://newkozlukov.gitlab.io/ru/posts/must-needs/Sergei Kozlukov<div><h3>Must needs be</h3>
<p>Whenever I encounter constructs like "<code>It_1</code> must needs be <code>X</code>"
I tend to decompose it into</p>
<blockquote>
<p>"<code>it</code> <strong>must</strong> be that <code>it_1</code> needs to be <code>X</code>",</p>
</blockquote>
<p>rather than into</p>
<blockquote>
<p>"it must be <code>X</code>" intensified by an <strong>adverb</strong> "needs",</p>
</blockquote>
<p>which seems to be the consensus.</p>
<h3>Needs must</h3>
<p>There's also archaic <a href="https://en.oxforddictionaries.com/definition/needs_must">"needs must"</a>
in which "needs" seems to be a noun and they (the "needs") actually "must":</p>
<blockquote>
<p>If needs must, I'll do it.</p>
</blockquote>
<p>Finally, there's another "must needs", in which "needs" acts as an amplifier
and some knowledgeable people identify it as an adverb:
</p><div class="figure "><a href="https://newkozlukov.gitlab.io/galleries/en/linguista-sum-must-needs.jpg" class="image-reference"><img src="https://newkozlukov.gitlab.io/galleries/en/linguista-sum-must-needs.thumbnail.jpg"></a>
<a href="https://t.me/linguista_sum/2204">https://t.me/linguista_sum/2204</a>
</div>
<h3>Shall have to</h3>
<p>I have also just encountered the construct "shall have to":</p>
<p></p><div class="figure "><a href="https://newkozlukov.gitlab.io/galleries/en/shall-have-to.jpg" class="image-reference"><img src="https://newkozlukov.gitlab.io/galleries/en/shall-have-to.thumbnail.jpg"></a>
<a href="https://www.pressreader.com/uk/daily-mail/20150321/282561606661928">"I <b>shall have to</b> be careful" from My Dear Bessie: A Love story in letters</a>
</div></div>https://newkozlukov.gitlab.io/ru/posts/must-needs/Sun, 02 Jun 2019 17:30:56 GMT
- Double negationshttps://newkozlukov.gitlab.io/ru/posts/double-negations/Sergei Kozlukov<div><p>It might seem that English discourages the use of double negations, so that
the following sentences, if parseable at all, are likely to be taken as a sign
of a lack of education:</p>
<ol>
<li>
<blockquote>
<p>I haven't got no money.</p>
</blockquote>
</li>
<li>
<blockquote>
<p>I never don't do that.</p>
</blockquote>
</li>
</ol>
<p>The reason these sentences feel smelly is that they contain double negations
which technically cancel each other, so that the sentences above might read:</p>
<ol>
<li>
<blockquote>
<p>I'm not in the state of lack of money</p>
</blockquote>
</li>
<li>
<blockquote>
<p>It never happens that I don't do that (e.g. never happens that I forget to do that)</p>
</blockquote>
</li>
</ol>
<p>But because the original constructs are so unlikely, one would rather assume
the message contains a mistake.</p>
<p>The second example contrasts with the situation we have in French and Russian,
where we use what might seem like double negatives:</p>
<ol>
<li>
<blockquote>
<p>Я (I) <strong>никогда</strong> (NEVER) <strong>не</strong> (NOT) делаю (do) этого (that)</p>
</blockquote>
</li>
<li>
<blockquote>
<p>Я (I) <strong>никогда</strong> (NEVER) <strong>не</strong> (NOT) курю (smoke).</p>
</blockquote>
</li>
<li>
<blockquote>
<p>Je (I) <strong>ne</strong> fais (not do) <strong>jamais</strong> (NEVER) ca (that)</p>
</blockquote>
</li>
</ol>
<p>Those aren't really double negatives, it's rather that the <strong>scopes of verbs
and negations are propagated differently</strong>,
and actually omitting the "никогда" or "jamais" would lead to a contradiction
in the message. For instance, the sentence</p>
<blockquote>
<p>Я (I) никогда (NEVER) курю (smoke)</p>
</blockquote>
<p>might be interpreted as comprised of claims:</p>
<ol>
<li>I do smoke ("я курю").</li>
<li>The modality of this event, i.e. the answer to the question "how often that happens?" is: "never" ("никогда").</li>
</ol>
<p>The two are in conflict with each other, and while one <strong>could</strong> try to use
this construct to deliver the idea of not smoking, its likelihood is
negligible.</p>
<p>Now it seems that in French (though I don't really understand French yet)
the situation is the same, as in:</p>
<blockquote>
<p>Jamais (NEVER) plus (more) Je <strong>ne</strong> (NOT) Te dirai (will say)</p>
</blockquote>
<p>while the sentence without "ne": </p>
<blockquote>
<p>Jamais plus je te dirai</p>
</blockquote>
<p>does sound contradictive, just as it would in Russian.</p>
<p>To emphasize the difference with English, let's note that we'd rather encode
the message "Je ne fais jamais ca" as</p>
<blockquote>
<p>I <strong>don't</strong> <strong>ever</strong> do that</p>
</blockquote>
<p>This can be decomposed into</p>
<ol>
<li>
<blockquote>
<p>I don't do that</p>
</blockquote>
</li>
<li>
<blockquote>
<p>My behaviour is consistent, i.e. I <strong>always</strong> ("ever") choose the policy "not do that"</p>
</blockquote>
</li>
</ol>
<p>My friend hinted that this might come from Latin, in which both
Russian and French have roots.</p>
<h3>Double negatives in English</h3>
<p>It wouldn't be true, however, to say that two negating terms cannot occur in
one sentence. First and most trivially, there are vernacular constructions like</p>
<blockquote>
<p>"I <strong>ain't</strong> got <strong>no</strong> money",</p>
</blockquote>
<p>which sound rather natural.</p>
<p>However, the case that got me curious is the use of "either" which I consider a
"negating term". So, a perfectly valid example of two negating terms going in
a row in English can be seen in:</p>
<blockquote>
<p>-- I'm not a linguist.</p>
<p>-- Me neither!</p>
</blockquote>
<p>Moreover, one can notice that it takes an effort to put a non-negative term
in place of "either", and the following exchange</p>
<blockquote>
<p>-- I'm not a linguist.</p>
<p>-- Me too.</p>
</blockquote><p>sounds noticeably off.</p></div>https://newkozlukov.gitlab.io/ru/posts/double-negations/Sun, 02 Jun 2019 07:14:50 GMT
- Further procrastinationhttps://newkozlukov.gitlab.io/ru/posts/further-procrastination/Sergei Kozlukov<ul class="simple">
<li><p>Just learned Sussman (author of SICP) also authored a monograph on <a class="reference external" href="https://oapen.org/download?type=document&docid=1004028">differential geometry</a></p></li>
<li><p>And on <a class="reference external" href="https://www.fisica.net/mecanicaclassica/struture_and_interpretation_of_classical_mechanics_by_gerald_jay_sussman.pdf">classical mechanics</a></p></li>
<li><p>Moreover, the former follows the concept of <a class="reference external" href="https://booksdescr.com/item/detail/id/5c63f81a50b42539789a587e">Turtle Geometry</a>
and states in its Prologue the approach I admired most since my early childhood: learning things by programming them,
thus forcing oneself to be precise and exact in judgements and claims. I'm recalling right now
again that first "lecture" on elementary notions of set theory the summer before admission to VSU...
Constructing a function as a set, so that it becomes a more "tangible" object.
The catharsis that followed.
I didn't realize back then that it's the same as in programming.
For five years I've been living with guilt and shame that I started as a coder
and not a Mathematician. For five years I felt programming was a disgusting and despicable thing to do.
And only now I truly realize that the thing I loved about it in those first years
is the same thing I've fallen in love with Mathematics for that summer of 2014.</p></li>
<li><p>Also stumbled upon a tweet mentioning the following
<a class="reference external" href="https://books.google.ru/books?id=VsK_31_j0XgC&lpg=PA245&dq=intuitive%20interpretation%20of%20the%20laplacian%2C%20farlow&pg=PA246#v=onepage&q=intuitive%20interpretation%20of%20the%20laplacian,%20farlow&f=false">interpretation of Laplace operator</a> as comparing the average value of a function around a point with its value at that point. Sort of trivial, and resembles how we derive sufficient min/max conditions, yet I had not noticed it.</p></li>
<li><p>Majority of these I found in: <a class="reference external" href="https://colab.research.google.com/github/google/jax/blob/master/notebooks/autodiff_cookbook.ipynb">JAX cookbook</a></p></li>
<li><p>Update! Accidentally found <a class="reference external" href="https://perso.uclouvain.be/pa.absil/Talks/ICIAM070717_oom_05.pdf">these slides by Absil</a>
giving some historical perspective on the subject</p></li>
<li><p>For instance, the slides mention <a class="reference external" href="https://eng.uok.ac.ir/mfathi/Courses/Advanced%20Eng%20Math/Linear%20and%20Nonlinear%20Programming.pdf">Luenberger (1973)</a> stating that "we'd perform line search along geodesics... if'twere feasible". Now we're closer to the roots of the whole thing</p></li>
</ul>https://newkozlukov.gitlab.io/ru/posts/further-procrastination/Fri, 05 Apr 2019 13:24:05 GMT
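The Farlow-style interpretation of the Laplacian mentioned above is easy to check numerically, via the identity Δf(p) ≈ (2n/r²)·(average of f on the r-sphere around p − f(p)). The helper below and its toy function are my own sketch:

```python
import numpy as np

def laplacian_via_average(f, p, r=1e-2, n_samples=10000):
    """Estimate the Laplacian of f: R^2 -> R at p from the identity
    laplacian(f)(p) ~ (2 * n / r**2) * (mean of f on the r-circle - f(p))."""
    theta = np.linspace(0.0, 2 * np.pi, n_samples, endpoint=False)
    circle = p + r * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    avg = np.mean([f(q) for q in circle])
    n = 2  # the dimension
    return 2 * n / r**2 * (avg - f(p))

f = lambda q: q[0]**2 + q[1]**2  # Laplacian is 4 everywhere
est = laplacian_via_average(f, np.array([1.0, -2.0]))
```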
- MD is not RSGD, but RSGD also does M from MDhttps://newkozlukov.gitlab.io/ru/posts/mirror-descent-is-not-log-update-exp/Sergei Kozlukov<div><p>The whole idea of trying to parallel mirror descent
with following geodesics as in RSGD has come to naught.
And not the way one would expect, because MD still seems
"type-correct" and RSGD doesn't yet.
Long story short: in RSGD we're pulling back COTANGENTS
but updating along a TANGENT.</p>
<hr class="docutils">
<p>Update! Before updating, we're <strong>raising an index</strong> of cotangent
by applying inverse metric tensor,
thus making it a tangent! Thanks to <cite>@ferrine</cite> for the idea.</p>
<p>Following <span class="math">\(\mathbb{R}^m\to\mathbb{R}^n\)</span> analogy of previous posts:</p>
<div class="math">
\begin{equation*}
F:M\to N,
\end{equation*}
</div>
<div class="math">
\begin{equation*}
\xi = F^*\eta\in\mathcal{T}^*M,~\text{for}~\eta\in\mathcal{T}^*N,
\end{equation*}
</div>
<div class="math">
\begin{equation*}
X = \xi^\sharp = g^{-1}(\xi) = g^{-1} \xi^\top~\text{so it becomes a column}.
\end{equation*}
</div></div>https://newkozlukov.gitlab.io/ru/posts/mirror-descent-is-not-log-update-exp/Fri, 05 Apr 2019 08:56:20 GMT
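The index-raising step above can be written out in matrix terms; a numpy sketch with a metric and numbers of my own choosing:

```python
import numpy as np

g = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # a toy (constant) metric tensor; SPD
xi = np.array([1.0, -3.0])   # a cotangent, thought of as a row

# raising the index: X = xi^sharp = g^{-1} xi^T -- now a tangent (a column)
X = np.linalg.solve(g, xi)

# sanity check: lowering the index back recovers the cotangent, g X = xi^T
assert np.allclose(g @ X, xi)
```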
- Pullbackshttps://newkozlukov.gitlab.io/ru/posts/pullbacks/Sergei Kozlukov<div><p>Going on with trivialities.
I still haven't finished my story with autodiff,
and since I've got to code it for DL homework...
Well.</p>
<p>So, we consider <span class="math">\(F:M{=}\mathbb{R}^m\to N{=}\mathbb{R}^n\)</span>.
Tangent spaces look exactly like original spaces except
for where they go in multiplication:
for <span class="math">\(g:N\to\mathbb{R}\)</span>
a tangent vector <span class="math">\(Y\in\mathcal{T}N\)</span>
is basically <span class="math">\(\mathbb{R}^n\)</span> and
acts by <span class="math">\(Y(g) = \langle \nabla_{F(p)} g, Y\rangle\)</span>.</p>
<p>Pushforward <span class="math">\(F_*:\mathcal{T}M\to \mathcal{T}N\)</span>
is defined by
<span class="math">\((F_* X)(g) = X(g\circ F)\)</span>
for <span class="math">\(X\in\mathcal{T} M \sim \mathbb{R}^m\)</span>.
Thus</p>
<div class="math">
\begin{equation*}
\begin{split}
(F_* X)(g) &= X(g\circ F) = \langle \nabla_{p} (g\circ F), x\rangle \\
&= \left\langle ( \left.Dg\right|_{F(p)} \left. DF \right|_p )^\top, x\right\rangle \\
&= \langle {\underbrace{J_p(F)}_{\left. DF \right|_p}}^\top \underbrace{\nabla_{F(p)} g}_{(\left.Dg\right|_{F(p)})^\top}, x \rangle \\
&= \left\langle \nabla_{F(p)}g, J_p(F)x \right\rangle
.
\end{split}
\end{equation*}
</div>
<p>Here we use <span class="math">\(D\)</span> to denote Fréchet derivatives (linear maps),
nabla to denote the gradient (a vector -- the Riesz representation of that linear map),
and we identify the linear map <span class="math">\(DF\)</span> with the Jacobian matrix <span class="math">\(J(F)\)</span>.
I also denote <span class="math">\(X\)</span> cast to <span class="math">\(\mathbb{R}^m\)</span> as just <span class="math">\(x\)</span>.
I don't like that I'm already using too many different notations
(after all, that's what I scorn differential geometers for),
but at the moment it seems fit.</p>
<p>So, basically the equation above means that in Euclidean case
pushforward <span class="math">\(F_*\)</span> acts on tangent <span class="math">\(X\)</span>
merely by multiplying with Jacobian <span class="math">\(J_p(F)\)</span>.
In terms of matrix multiplication, <span class="math">\(F_* X\)</span> is just <span class="math">\(J_p(F) x\in\mathbb{R}^n\)</span>.</p>
<p>Further, pullback <span class="math">\(F^*\)</span> by definition maps
right cotangent <span class="math">\(\xi\in\mathcal{T}_{F(p)}^*N\)</span>
into left cotangent <span class="math">\(F^*\xi\in\mathcal{T}_p^* M\)</span>,
which acts as <span class="math">\((F^*\xi) X = \xi(F_*X)\)</span>.</p>
<div class="math">
\begin{equation*}
\begin{split}
(F^*\xi) X &= \xi(F_*X)\\
&= \xi^\top J_p(F)x
.
\end{split}
\end{equation*}
</div>
<p>That is, the pulled-back cotangent is just <span class="math">\(\xi^\top J_p(F)\in (\mathbb{R}^m)^*\)</span> (acting on <span class="math">\(x\)</span> from the left),
and the pullback <span class="math">\(F^*\)</span> itself is still the same <span class="math">\(J_p(F)\)</span>,
except acting on cotangents from the left.
It is equivalent to say that pullback acts on <span class="math">\(\operatorname{column}(\xi)\)</span>
as transposed Jacobian <span class="math">\(J_p(F)^\top\)</span>:</p>
<div class="math">
\begin{equation*}
\operatorname{column}(F^*\xi) = J_p(F)^\top \operatorname{column}(\xi).
\end{equation*}
</div>
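In autodiff terms, the pushforward is a Jacobian-vector product and the pullback is a vector-Jacobian product. A numpy check of the duality (F*ξ)(x) = ξ(F_* x), with a toy F of my own choosing and its hand-written Jacobian:

```python
import numpy as np

# toy F: R^2 -> R^3 with a hand-written Jacobian at p
F = lambda p: np.array([p[0]**2, p[0] * p[1], np.sin(p[1])])
J = lambda p: np.array([[2 * p[0], 0.0],
                        [p[1],     p[0]],
                        [0.0,      np.cos(p[1])]])

p = np.array([1.0, 2.0])
x = np.array([0.5, -1.0])       # a tangent at p (a column)
xi = np.array([1.0, 2.0, 3.0])  # a cotangent at F(p) (a row)

push = J(p) @ x   # pushforward: F_* x = J_p(F) x, lives in R^3
pull = xi @ J(p)  # pullback:  F^* xi = xi^T J_p(F), lives in (R^2)^*

# duality: the pulled-back cotangent on x equals xi on the pushed-forward x
assert np.isclose(pull @ x, xi @ push)
```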
<hr class="docutils">
<p>Now, why do we pull back in gradient descent?
Take <span class="math">\(F:\mathbb{R}^n \to\mathbb{R}\)</span>.
The right cotangent is just a number <span class="math">\(\alpha\)</span>.
The left cotangent would be <span class="math">\(\alpha J_p(F)\)</span>.
It is such that</p>
<div class="math">
\begin{equation*}
(F^*\alpha) X = \alpha(F_* X) = \alpha J_p(F)X
\end{equation*}
</div>
<hr class="docutils">
<p>What happens when we pull, transpose, and then push?</p></div>https://newkozlukov.gitlab.io/ru/posts/pullbacks/Thu, 04 Apr 2019 21:06:54 GMT
- Dual to dualhttps://newkozlukov.gitlab.io/ru/posts/dual-to-dual/Sergei Kozlukov<div><p>Just realized how the double dual isn't really the original space
even in the simple Euclidean case (which I was aware of,
but somehow didn't feel I was understanding):
with vector being a column <span class="math">\(x\in\mathbb{R}^n\)</span>,
dual a row <span class="math">\(x^\top\in(\mathbb{R}^n)^*\)</span>,
the double dual <span class="math">\(x^{\top\top}\)</span> is indeed a column,
except when it acts on the rows in multiplication
it goes TO THE RIGHT and not to the left:</p>
<div class="math">
\begin{equation*}
\begin{split}
x^\top(y) &= x^\top y,\\
x^{\top\top}(y^{\top}) &= y^\top x^{\top\top} = y^\top x.
\end{split}
\end{equation*}
</div>
<p>This contrasts with rows acting on columns,
where the dual (the row) acts from the left.</p>
<p>So exciting.</p>
<hr class="docutils">
<p>Update! It's much more than that!
If we treat <span class="math">\(\mathbb{R}^n\)</span> as a manifold,
it turns out then that its tangent space
looks more like double dual <span class="math">\((\mathbb{R}^n)^{**}\)</span>
rather than <span class="math">\(\mathbb{R}^n\)</span> or <span class="math">\((\mathbb{R}^n)^*\)</span>,
because when we consider a tangent vector acting
on scalar functions <span class="math">\(\mathbb{R}^n\to\mathbb{R}\)</span>
in the special -- linear -- case,
the tangent goes on the right
and it does so as a column.
Take a scalar function <span class="math">\((\mathbb{R}^n)^* \ni a^\top : \mathbb{R}^n \to \mathbb{R}\)</span>.
Then a tangent <span class="math">\(X\in\mathcal{T}\mathbb{R}^n\)</span> should act on <span class="math">\(a\)</span> from the right:</p>
<div class="math">
\begin{equation*}
X(a) = \left.\partial_t \left( a(x) + t\, a^\top X \right)\right|_{t=0} = a^\top X.
\end{equation*}
</div>
<p>Here <span class="math">\(x\in\mathbb{R}^n\)</span> denotes <span class="math">\(X\in\mathcal{T}\mathbb{R}^n\)</span> cast to <span class="math">\(\mathbb{R}^n\)</span>.</p></div>https://newkozlukov.gitlab.io/ru/posts/dual-to-dual/Thu, 04 Apr 2019 16:26:33 GMT
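The "tangent acts from the right, as a column" claim is easy to verify directly: differentiate a linear functional along X and compare with a^⊤X. The numbers below are my own toy choices:

```python
import numpy as np

a = np.array([[3.0, -1.0, 2.0]])      # linear functional: a ROW in (R^n)^*
x = np.array([[1.0], [0.0], [2.0]])   # base point, a column
X = np.array([[0.5], [1.0], [-1.0]])  # tangent at x, also a column

# X acting on the scalar function a: differentiate t |-> a(x + t X) at t = 0
eps = 1e-6
XA = ((a @ (x + eps * X) - a @ (x - eps * X)) / (2 * eps)).item()

# ... and indeed it equals a^T X, with X sitting on the RIGHT, as a column
assert np.isclose(XA, (a @ X).item())
```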
- SKDL19L2 take-outshttps://newkozlukov.gitlab.io/ru/posts/skdl19l2-take-outs/Sergei Kozlukov<ul class="simple">
<li><p>Dropout is similar to ensembling</p></li>
<li><p>Conjugate gradients are akin to momentum, mixing new direction with previous
ones. That is quite different from my previous intuition that we "always
move in new direction" which was correct... up to the metric.</p></li>
<li><p>Nesterov is about two things:</p>
<ul>
<li><p>It is better to use gradient at the destination point than the gradient at
the origin</p></li>
<li><p>With large momentum, we get a meaningful enough estimate of where we will end up
to compute that gradient</p></li>
</ul>
</li>
<li><p>Mentioned <a class="reference external" href="https://distill.pub/2017/momentum/">this blog post again</a></p></li>
<li><p>Again this bullshit about SGD being "incorrect".
It's not that SGD is incorrect; it's that you define it the wrong way
(while being aware that it should be done in a different way)
and omit the explicit isomorphism.</p></li>
</ul>https://newkozlukov.gitlab.io/ru/posts/skdl19l2-take-outs/Thu, 28 Mar 2019 08:07:52 GMT
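The two Nesterov bullets above amount to the "lookahead" form of the update: evaluate the gradient at w + μv, where momentum alone would carry us, instead of at w. A minimal sketch on a toy quadratic of my own (not from the lecture):

```python
import numpy as np

def nesterov_step(grad, w, v, lr=0.1, momentum=0.9):
    """One Nesterov step: the gradient is taken at the LOOKAHEAD point
    w + momentum * v (a good guess of where we end up), not at w."""
    v_new = momentum * v - lr * grad(w + momentum * v)
    return w + v_new, v_new

# toy quadratic f(w) = 0.5 * ||w||^2, so grad f = w
grad = lambda w: w
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(200):
    w, v = nesterov_step(grad, w, v)
assert np.linalg.norm(w) < 1e-4  # converged to the minimizer at 0
```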