My Bookshelf
Some books and papers that I enjoy or want to read in the future. This page is only occasionally updated and will forever be out-of-date. For now, it can hopefully give you some insight into my interests and better visibility into the authors who inspire me.
Papers I've Enjoyed
Aside: This section is the most chronically out-of-date, since I try to read at least one paper per day, and narrowing down favorites retrospectively is a large task. I've written some code to fetch and parse data on what I've read via Zotero, but I have yet to decide how to organize and display this data to you, dear reader, so that it isn't an overwhelming firehose of information. Regardless, these papers have played a large role in shaping me and still deserve a place here.
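If you're curious about the fetching step, it's little more than a couple of calls to the Zotero Web API (v3). Below is a minimal sketch of that idea, with a placeholder USER_ID and API_KEY and a plain printed list standing in for whatever display I eventually settle on; it's an illustration of the approach, not the exact script I run.

```python
# A minimal sketch of pulling recently added items from a Zotero library via the
# Zotero Web API (v3). USER_ID and API_KEY are placeholders.
import requests

USER_ID = "1234567"       # placeholder: your numeric Zotero user ID
API_KEY = "your-api-key"  # placeholder: a read-only Zotero API key

def fetch_recent_items(limit=25):
    """Fetch the most recently added top-level items from the library."""
    resp = requests.get(
        f"https://api.zotero.org/users/{USER_ID}/items/top",
        headers={"Zotero-API-Key": API_KEY, "Zotero-API-Version": "3"},
        params={"format": "json", "limit": limit,
                "sort": "dateAdded", "direction": "desc"},
    )
    resp.raise_for_status()
    return resp.json()

def format_entry(item):
    """Turn one Zotero item into a rough 'Authors. "Title." Year.' string."""
    data = item["data"]
    authors = ", ".join(
        c.get("lastName", c.get("name", ""))
        for c in data.get("creators", [])
        if c.get("creatorType") == "author"
    )
    year = data.get("date", "")[:4]
    return f'{authors}. "{data.get("title", "")}" {year}.'

if __name__ == "__main__":
    for item in fetch_recent_items():
        print("-", format_entry(item))
```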
- Schlag, Imanol, Kazuki Irie, and Jürgen Schmidhuber. "Linear transformers are secretly fast weight programmers." International Conference on Machine Learning. PMLR, 2021.
- Von Oswald, Johannes, et al. "Transformers learn in-context by gradient descent." International Conference on Machine Learning. PMLR, 2023.
- Balestriero, Randall, and Richard Baraniuk. "Mad Max: Affine spline insights into deep learning." arXiv preprint arXiv:1805.06576, 2018.
- Little, W.A. "The existence of persistent states in the brain." Mathematical Biosciences, 1974.
- Humayun, Ahmed Imtiaz, et al. "Splinecam: Exact visualization and characterization of deep network geometry and decision boundaries." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
Some Notes
This paper builds off of the above "Mad Max" paper by visualizing the geometry explored there. I had an idea after reading "Mad Max" that motivated me to start implementing a similar visualization, and once I got stuck and went looking online for help with my problems, I discovered this paper!
- Ilyas, Andrew, et al. "Adversarial examples are not bugs, they are features." Advances in Neural Information Processing Systems, 2019.
- Xiao, Guangxuan, et al. "Efficient Streaming Language Models with Attention Sinks." arXiv preprint arXiv:2309.17453, 2023.
Some Notes
This paper builds off of Evan Miller's "Attention Is Off By One" blog post, which I think is a great (dare I say mathematically "morally correct") interpretation of the deficiencies of using softmax in attention. I ended up having some questions about Table 3 in that paper, which benchmarks Evan Miller's softmax variant, and so I posted an issue on the GitHub repository for their paper.
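For the unfamiliar, the variant in question just adds one to the softmax denominator so that an attention head can assign (near-)zero weight everywhere instead of being forced to hand out a full unit of attention. Here is a tiny sketch of the difference; this is my own illustration, not code from the paper or the blog post.

```python
# Standard softmax vs. the "off by one" variant: softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_one(x):
    # Shift by max(x, 0) for numerical stability; algebraically identical to the formula above.
    m = np.maximum(np.max(x), 0.0)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

scores = np.array([-4.0, -5.0, -6.0])   # a head that "wants" to attend to nothing
print(softmax(scores))      # still sums to 1: attention is forced to go somewhere
print(softmax_one(scores))  # sums to well under 1: the head can effectively abstain
```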
Books
You can also find me on Goodreads, which tends to be more up to date.
Books I Like
- The Art of Doing Science and Engineering: Learning to Learn by Richard Hamming [3]
- Computation: Finite and Infinite Machines by Marvin Minsky
- Computer Lib/Dream Machines by Ted Nelson
- Gödel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter [1]
- Remembrance of Earth's Past (Trilogy) by Liu Cixin
- Calculus, Fourth Edition by Michael Spivak (ISBN-10 0914098918) [2]
Books I'm Working Through
- Visual Group Theory by Nathan Carter
- To Mock a Mockingbird and Other Logic Puzzles by Raymond M. Smullyan
Books I Want to Read
- Chaos: Making a New Science by James Gleick
- The Road to Reality by Roger Penrose