Projects
click a card to open the dossier
— 3 —
Mark Muchane, Sean Richardson, Kiho Park & Victor Veitch
Sparse autoencoders are a popular tool for pulling human-readable concepts out of a language model, but they ordinarily treat every concept as unrelated to the others. This work redesigns the architecture so it learns an explicit hierarchy — concepts nested under broader ones — and shows that doing so improves reconstruction, makes the features easier to interpret, and runs considerably faster.
2025 · first author · arxiv.org/abs/2506.01197
Mark Muchane, George Sokolik, Micah Goldblum & Sanae Lotfi
Small language models are usually just shrunken-down versions of big ones. This paper asks whether you can build them more cleverly by reusing the same weights across attention heads and layers, then capturing the small differences with low-rank adapters. Under careful parameter-matched experiments from 100M to 1B parameters, sharing whole layers and attention matrices holds up well and trades a little extra compute for a meaningfully smaller memory footprint — yielding a practical recipe for compact models.
2026 · first author · ICLR Sci4DL workshop · openreview.net/forum?id=gdMMeemjRB
Todd Nief, Harvey Yiyun Fu, Mark Muchane & Ari Holtzman
“Subliminal learning” is the surprising claim that a model can pass on a behavioral quirk — say, a fondness for cats — to a student model trained only on the teacher’s harmless-looking number sequences. We trace this effect back to LoRA finetuning: it depends on LoRA rank, vanishes under full finetuning, and only shows up around the exact tokens (like the default system prompt and chat template) shared between training and evaluation. In short, the phenomenon is a fragile artifact of LoRA settings and context rather than a reliable channel for transmitting behavior.
2026 · co-author · arxiv.org/abs/2606.00831v1