Inside the Black Box: How a New Tool Lets Engineers Actually Debug AI Models
A new tool called Silico is giving AI developers the ability to look inside language models and tweak their behavior with surgical precision, moving the field away from trial-and-error toward something closer to actual engineering. The San Francisco-based startup Goodfire released Silico as the first off-the-shelf tool that lets researchers and engineers examine individual neurons within AI models, understand what they do, and adjust them to reduce hallucinations, fix logical errors, or even change ethical reasoning.
For years, building large language models (LLMs), the AI systems behind ChatGPT and Gemini, has been more art than science. Engineers rely on rules of thumb rather than a deep understanding of how these systems actually work. "Nobody really knows exactly how or why they work, and that can make it hard to fix their flaws or block unwanted behaviors," says Goodfire CEO Eric Ho.
What Is Mechanistic Interpretability and Why Does It Matter?
Goodfire is pioneering a technique called mechanistic interpretability, which aims to understand what happens inside an AI model by mapping its neurons and the pathways between them. MIT Technology Review named mechanistic interpretability one of its 10 Breakthrough Technologies of 2026. The approach is being pursued by industry leaders including Anthropic, OpenAI, and Google DeepMind, but Goodfire is taking it a step further by using these insights not just to audit finished models, but to guide the design process itself.
The difference is significant. Most interpretability work happens after a model is trained, like performing an autopsy to understand what went wrong. Silico lets developers intervene during training, adjusting the "knobs and dials" that shape how a model learns. "We want to remove the trial and error and turn training models into precision engineering," Ho explained.
How Does Silico Actually Work?
Silico lets you zoom in on specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to see what those neurons do. The tool automates much of the complex interpretability work using AI agents, which Ho noted are now strong enough to handle tasks that previously required human researchers.
Here's what developers can do with the tool:
- Identify Problem Neurons: Check which inputs make different neurons activate and trace how each one is influenced by neurons upstream and how it influences neurons downstream in the network (see the sketch after this list).
- Adjust Behavior in Real Time: Boost or suppress specific neurons to change how a model responds, such as making it more transparent about ethical concerns or reducing hallucinations.
- Filter Training Data: Use interpretability insights to remove problematic training data before it shapes a model's parameters, preventing unwanted behaviors from forming in the first place.
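Silico's own interface isn't described in detail here, but the basic mechanic behind the first capability, reading out one unit's activation as inputs vary, can be sketched in plain PyTorch with a forward hook. Everything below (the toy network, the layer, the neuron index) is illustrative stand-in code, not Goodfire's API or a real LLM:

```python
import torch
import torch.nn as nn

# Toy stand-in for one hidden layer of a larger model; illustrative only.
model = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, 16),
)

NEURON = 7     # hypothetical index of the unit under study
captured = []  # activations recorded by the probe

def record_neuron(module, inputs, output):
    # Save this unit's activation for every example in the batch.
    captured.append(output[:, NEURON].detach().clone())

# Attach the probe to the hidden layer, then run inputs through.
handle = model[1].register_forward_hook(record_neuron)
with torch.no_grad():
    for _ in range(3):                # three probe batches
        model(torch.randn(8, 16))     # stand-ins for real prompts
handle.remove()

acts = torch.cat(captured)
print(f"neuron {NEURON}: mean={acts.mean():.3f}, max={acts.max():.3f}")
# In practice, you would rank real prompts by activation to see
# what kind of input the unit responds to.
```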
Goodfire demonstrated these capabilities with concrete examples. In one case, researchers found a neuron in the open-source model Qwen 3 associated with the trolley problem, a classic ethical dilemma. Activating this neuron changed the model's responses, making it frame its outputs as explicit moral questions.
In another example, when asked whether a company should disclose that its AI behaves deceptively in 0.3% of cases affecting 200 million users, a model initially said no, citing business impact. By boosting neurons associated with transparency, researchers flipped the answer to yes in nine out of 10 trials. "The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment," Ho noted.
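The "boosting" in that experiment amounts to nudging selected units upward during the forward pass so that downstream computation sees a stronger signal. A minimal sketch of that kind of intervention, again in generic PyTorch with an invented neuron index standing in for a "transparency" unit, and not Silico's actual mechanism:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))

TRANSPARENCY_NEURON = 7  # hypothetical unit found to track transparency
BOOST = 3.0              # how hard to push it upward

def boost_neuron(module, inputs, output):
    # Leave everything unchanged except the one targeted unit.
    output = output.clone()
    output[:, TRANSPARENCY_NEURON] += BOOST
    return output  # returning a tensor overrides the layer's output

x = torch.randn(1, 16)
baseline = model(x)

handle = model[1].register_forward_hook(boost_neuron)
steered = model(x)
handle.remove()

# Downstream layers now see the boosted unit, shifting the final output.
print((steered - baseline).abs().max())
```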
The tool can also fix logical errors. Many models incorrectly say that 9.11 is greater than 9.9. By looking inside the model, Goodfire found neurons associated with Bible verses and code repositories influencing the comparison. Using this information, developers can retrain the model to avoid those pathways when doing math.
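The confusion is easy to reproduce outside a neural network: in version numbers and chapter-and-verse references, 9.11 really does come after 9.9, and only decimal arithmetic gives the opposite answer. A two-line Python illustration:

```python
# In versioning or verse contexts, "9.11" follows "9.9"...
print((9, 11) > (9, 9))  # True: compared component-wise, like v9.11 vs v9.9
# ...but as decimal numbers the order is reversed.
print(9.11 > 9.9)        # False: 9.11 < 9.9
```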
Who Can Actually Use This Tool?
Currently, Silico works best with open-source models where developers have access to the model's inner workings. Most people won't be able to use it to examine closed models like ChatGPT or Gemini, which keep their internals proprietary.
However, the tool's release is significant because it puts techniques previously available only to a handful of top AI labs into the hands of smaller firms and research teams. Goodfire will charge a fee determined on a case-by-case basis, though the company declined to provide specific pricing details.
Leonard Bereska, a researcher at the University of Amsterdam who has worked on mechanistic interpretability, sees value in democratizing these tools. "Frontier labs already have internal interpretability teams," he noted. "Silico arms the next tier of companies, where the value is not having to hire interpretability researchers."
What Are the Practical Implications for AI Safety?
Tools like Silico could be essential for safety-critical applications in healthcare and finance, where understanding and controlling model behavior is non-negotiable. By making it easier to build trustworthy models, the tool addresses a real gap in the AI development ecosystem.
However, some experts urge caution about the framing. Bereska pushes back on Goodfire's loftier aspirations, arguing that "in reality, they are adding precision to the alchemy. Calling it engineering makes it sound more principled than it is."
Still, Ho's vision is clear. "If we can make training models a lot more like building software, there's no reason why there can't be many more companies designing models that fit their needs," he said.
The release of Silico marks a shift in how the AI industry approaches one of its thorniest problems: the opacity of large language models. By giving developers better tools to see inside these systems and adjust them with precision, the field moves closer to an era where AI development is less about scaling compute and more about understanding what's actually happening under the hood.