Do transformers notice their own mistakes? Finding a linear hallucination detector inside LLMs
A linear signal inside LLMs reveals hallucinations: it can be read out by a frozen observer model and steered with a single vector.
May 8, 2025
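To make the one-line summary concrete before diving in, here is a minimal sketch, not the method from this post: it fits a logistic-regression probe on residual-stream activations to find a linear "hallucination" direction, then steers generation by adding that vector back in via a forward hook. The model (`gpt2`), layer index, toy labels, and steering coefficient are all illustrative assumptions.

```python
# Minimal sketch (not Parsed's actual pipeline): probe hidden states for a
# linear "hallucination" direction, then steer generation with that vector.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in for the frozen observer / subject model (assumption)
LAYER = 6       # hypothetical layer at which the signal is read out

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def last_token_state(text: str) -> np.ndarray:
    """Residual-stream activation of the final token at LAYER."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return out.hidden_states[LAYER][0, -1].numpy()

# Toy labeled data (illustrative): 1 = hallucinated claim, 0 = grounded claim.
texts = ["The Eiffel Tower is in Rome.", "The Eiffel Tower is in Paris.",
         "Water boils at 300 C at sea level.", "Water boils at 100 C at sea level."]
labels = [1, 0, 1, 0]
X = np.stack([last_token_state(t) for t in texts])

# Linear probe: if hallucination is linearly represented, a probe can find it.
probe = LogisticRegression(max_iter=1000).fit(X, labels)
direction = torch.tensor(probe.coef_[0], dtype=torch.float32)
direction = direction / direction.norm()

# Steering: subtract the hallucination direction from the residual stream
# at LAYER during generation, scaled by a hand-tuned coefficient.
ALPHA = 4.0

def steer_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden - ALPHA * direction.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(steer_hook)
ids = tok("The capital of France is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=10)[0]))
handle.remove()
```

The probe and the steering vector share one direction here for simplicity; the post itself examines where such a signal lives and how robustly it transfers.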
Parsed is an AI interpretability lab focused on supercharging healthcare model performance and robustness through the lens of mechanistic interpretability.
Led by LocalGlobe and backed by notable angels including the co-founder & CSO @ HuggingFace, an ex-director @ DeepMind, a director @ Meta AI Research, the head of startups @ OpenAI, an ex-chair of the NHS, and others.
We believe Parsed is the most scalable way to genuinely improve patient lives: it applies horizontally across healthcare, sits at the frontier of AI research, and delivers immediate impact for our customers. Deep dual expertise in both medicine and interpretability is essential for this mission and is the DNA of our founding team. We're growing a lean, all-star team.