Some books and papers that I like or need to read. This is occasionally updated, though not very often. I'm hoping to build out this page more over time. Maybe I'll add some small notes as well.
This paper builds off of the above "Mad Max" paper by
visualizing the geometry that is explored in that paper. I had
an idea after reading "Mad Max" that motivated me to start
implementing a similar visualization. When I got stuck and was
searching online for help with my problems, I discovered
this paper!
This paper builds off of
Evan Miller's "Attention is Off By One" blog post,
which I think is a great (dare I say mathematically "morally
correct") interpretation of the deficiencies of using Softmax in
Attention. I ended up having some questions about Table 3 in
that paper, which benchmarks Evan Miller's Softmax variant,
so I posted an issue
on the GitHub page for their paper.