BYO SWE-grep: automatically train blazing fast search sub-agents on your knowledge base (Pt. 1)
RL-trained search sub-agents that learn your knowledge base’s structure for fast, reliable retrieval
Research
Nov 11, 2025
We're not here to ride the AI hype. We believe robust, fundamental technical breakthroughs applied to real-world systems are what really move the needle.
Purpose-built LLMs for dental note-taking
Frontier thinking-model performance at a fraction of the latency.
Case study
Nov 5, 2025
Lumina: building self-improving evaluation through customer-in-the-loop refinement
Lumina: an adaptive evaluation engine that learns to judge like a subject matter expert.
Research
Oct 30, 2025
Upweight the strategy, not the tokens: faster training with explicit reasoning through Rationale-Guided Training (RGT)
Teach the why, not just the what: Rationale-Guided Training
Research
Oct 28, 2025
Attention-based attribution: what your model is actually looking at
Cosine similarity is cosplay. Attention is attribution.
Research
Oct 28, 2025
Robust, sample-efficient SFT with prompt mutations
Low-KL-divergence prompt mutations: better performance at a fraction of the cost.
Research
Oct 27, 2025
Training loss predicts evaluation performance, even for non-verifiable tasks
Loss: the cheapest evaluation you’ll ever run.
Research
Oct 27, 2025
Building production AI for regulated industries with a leading digital insurer
From frontier OpenAI/Google models to open source: delivering 8x the speed and exceeding GPT-5-level accuracy.
Case study
Oct 20, 2025
Iterative SFT (iSFT): dense reward learning
Iterative SFT: dense, high-bandwidth learning
Research
Oct 15, 2025
Write small, learn forever: rank-1 LoRA for continual learning
Why rank-1 LoRA updates might be the missing link between static fine-tuning and truly continuous, live-on-GPU learning.
Research
Oct 12, 2025
Practical LoRA research
Fine-tuning at scale: what LoRA gets right (and LoRA-XS doesn’t).
Research
Oct 10, 2025
A letter to the C-suite: the shifting role of MLEs
Your MLEs are brilliant, but you’re giving them the wrong job.
Position
Sep 8, 2025
Fine-tuning small open-source LLMs to outperform large closed-source models by 60% on specialized tasks
A 27B open-source model outperforms the biggest OpenAI/Anthropic/Google models on a real healthcare task.
Case study
Aug 15, 2025
Amnesiac generalist behemoths are not the future of language models
You don’t need a generic genius. You need a specialist learner.
Position
Jul 28, 2025
The bitter lesson of LLM evals
Turning expert judgment into a compounding moat. Because in LLM evals, scaling care beats scaling compute.
Position
Jul 13, 2025
Do transformers notice their own mistakes? Finding a linear hallucination detector inside LLMs
A linear signal inside LLMs reveals hallucinations; it can be detected by a frozen observer and steered with a single vector.
Research
May 8, 2025
Resurrecting the salmon: seeing clearer inside LLMs with domain-specific SAEs
A powerful, efficient, and domain-robust strategy for safeguarding medical-text generation.
Research
Feb 15, 2025
Why mechanistic interpretability needs a paradigm inversion
The conventional scaling paradigm for language models may be fundamentally misaligned with interpretability research.
Research
Jan 13, 2025
From training to deployment, we help you launch a specialist LLM that outperforms generic models, adapts automatically, and runs reliably at scale.