Do transformers notice their own mistakes? Finding a linear hallucination detector inside LLMs
A linear signal in LLMs reveals hallucinations, can be detected by a frozen observer model, and can be steered with a single vector.
May 8, 2025
Parsed is an AI interpretability lab focused on supercharging model performance and robustness through the lens of evaluations and mechanistic interpretability.
Led by LocalGlobe and backed by notable angels including co-founder & CSO @ HuggingFace, ex-director @ DeepMind, director @ Meta AI Research, head of startups @ OpenAI, ex-chair of the NHS, etc.
We believe that Parsed is the most scalable way to actually improve lives. It applies horizontally across mission-critical use cases, is at the frontier of AI research, and has immediate impact for our customers. Deep expertise in interpretability is essential for this mission and is the DNA of our founding team. We’re growing a lean, all-star team.