My Bookshelf
Some books and papers that I enjoy or want to read in the future. This page is only occasionally updated and will forever be out-of-date. For now, it can hopefully give you some insight into my interests and better visibility into the authors who inspire me.
Papers I've Enjoyed
Aside: This section is the most chronically out-of-date, since I try to read at least one paper per day, and narrowing down favorites retrospectively is a large task. I've written some code to fetch and parse data on what I've read via Zotero, but I have yet to decide how to organize and display this data to you, dear reader, so that it isn't an overwhelming firehose of information. Regardless, these papers have played a large role in shaping me and still deserve a place here.
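If you're curious about the fetching step, it's little more than a couple of calls to the Zotero Web API (v3). Below is a minimal sketch of that idea, with a placeholder USER_ID and API_KEY and a plain printed list standing in for whatever display I eventually settle on; it's an illustration of the approach, not the exact script I run.

```python
# A minimal sketch of pulling recently added items from a Zotero library via the
# Zotero Web API (v3). USER_ID and API_KEY are placeholders.
import requests

USER_ID = "1234567"       # placeholder: your numeric Zotero user ID
API_KEY = "your-api-key"  # placeholder: a read-only Zotero API key

def fetch_recent_items(limit=25):
    """Fetch the most recently added top-level items from the library."""
    resp = requests.get(
        f"https://api.zotero.org/users/{USER_ID}/items/top",
        headers={"Zotero-API-Key": API_KEY, "Zotero-API-Version": "3"},
        params={"format": "json", "limit": limit,
                "sort": "dateAdded", "direction": "desc"},
    )
    resp.raise_for_status()
    return resp.json()

def format_entry(item):
    """Turn one Zotero item into a rough 'Authors. "Title." Year.' string."""
    data = item["data"]
    authors = ", ".join(
        c.get("lastName", c.get("name", ""))
        for c in data.get("creators", [])
        if c.get("creatorType") == "author"
    )
    year = data.get("date", "")[:4]
    return f'{authors}. "{data.get("title", "")}" {year}.'

if __name__ == "__main__":
    for item in fetch_recent_items():
        print("-", format_entry(item))
```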
- Schlag, Imanol, Kazuki Irie, and Jürgen Schmidhuber. "Linear transformers are secretly fast weight programmers." International Conference on Machine Learning. PMLR, 2021.
- Von Oswald, Johannes, et al. "Transformers learn in-context by gradient descent." International Conference on Machine Learning. PMLR, 2023.
- Balestriero, Randall, and Richard Baraniuk. "Mad Max: Affine spline insights into deep learning." arXiv preprint arXiv:1805.06576, 2018.
- Little, W.A. "The existence of persistent states in the brain." Mathematical Biosciences, 1974.
- Humayun, Ahmed Imtiaz, et al. "Splinecam: Exact visualization and characterization of deep network geometry and decision boundaries." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
Some Notes
This paper builds off of the above "Mad Max" paper by visualizing the geometry explored there. I had an idea after reading "Mad Max" that motivated me to start implementing a similar visualization, and once I got stuck and went looking online for help with my problems, I discovered this paper!
- Ilyas, Andrew, et al. "Adversarial examples are not bugs, they are features." Advances in Neural Information Processing Systems, 2019.
- Xiao, Guangxuan, et al. "Efficient Streaming Language Models with Attention Sinks." arXiv preprint arXiv:2309.17453, 2023.
Some Notes
This paper builds off of Evan Miller's "Attention Is Off By One" blog post, which I think is a great (dare I say mathematically "morally correct") interpretation of the deficiencies of using softmax in attention. I ended up having some questions about Table 3 in that paper, which benchmarks Evan Miller's softmax variant, and so I posted an issue on the GitHub repository for their paper.
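For the unfamiliar, the variant in question just adds one to the softmax denominator so that an attention head can assign (near-)zero weight everywhere instead of being forced to hand out a full unit of attention. Here is a tiny sketch of the difference; this is my own illustration, not code from the paper or the blog post.

```python
# Standard softmax vs. the "off by one" variant: softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_one(x):
    # Shift by max(x, 0) for numerical stability; algebraically identical to the formula above.
    m = np.maximum(np.max(x), 0.0)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

scores = np.array([-4.0, -5.0, -6.0])   # a head that "wants" to attend to nothing
print(softmax(scores))      # still sums to 1: attention is forced to go somewhere
print(softmax_one(scores))  # sums to well under 1: the head can effectively abstain
```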
Books
You can also find me on Goodreads, which tends to be more up to date.
Books I Like
- The Art of Doing Science and Engineering: Learning to Learn by Richard Hamming [3]
- Computation: Finite and Infinite Machines by Marvin Minsky
- Computer Lib/Dream Machines by Ted Nelson
- Gödel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter [1]
- Remembrance of Earth's Past (Trilogy) by Liu Cixin
- Calculus, Fourth Edition by Michael Spivak (ISBN-10 0914098918) [2]
Books I'm Working Through
- Visual Group Theory by Nathan Carter
- To Mock a Mockingbird and Other Logic Puzzles by Raymond M. Smullyan
Books I Want to Read
- Chaos: Making a New Science by James Gleick
- The Road to Reality by Roger Penrose