Projects

  • Sanity Checks for Attribution Methods on Vision Transformers (Paper)

    Extended attribution faithfulness tests to Vision Transformers, comparing seven attribution methods across ResNet-50 and ViT-B/16 and finding architecture-dependent disagreements.

  • Measuring the Reliability of Natural Language Autoencoders (Paper)

    Used Generalizability Theory to measure how reliable natural language autoencoders are for LLM interpretability, with concrete sampling-budget recommendations.