Projects

Sanity Checks for Attribution Methods on Vision Transformers (Paper)
Extended attribution faithfulness tests to Vision Transformers, comparing seven methods across ResNet-50 and ViT-B/16 and finding architecture-dependent disagreements.
Measuring the Reliability of Natural Language Autoencoders (Paper)
Used Generalizability Theory to measure how reliable natural language autoencoders are for LLM interpretability, with concrete sampling-budget recommendations.