Why Your Favorite AI Model Might Just Be the One You Use Most Often
Your preference for a particular AI model probably says more about your situation than about the model itself. A new analysis of how developers and organizations choose between large language models (LLMs) reveals that decisions are driven far less by rigorous testing and far more by accident, employer approval, influencer recommendations, and how long someone has been using a tool. The result is a landscape where marketing budgets and access programs shape collective understanding more than actual capability comparisons.
How Are Organizations Actually Choosing Their AI Models?
In most corporate environments, the process of selecting an AI model happens almost by accident. Someone on the team tries Claude Code one weekend, gets excited, and suddenly the entire organization is using it. Nobody evaluated alternatives. Nobody ran a formal comparison. The decision was made by whoever had a company card and free time.
This matters because when that same person recommends their favorite model to others, they are really telling you which model they have gotten the most practice with. There is a genuine learning function at play: you get faster with the tool, your prompts improve, and the model starts to feel intuitive. But that is not the same as the model being objectively superior.
What Factors Actually Influence Model Selection?
- Access and Familiarity: Developers tend to prefer models they have spent the most time using, regardless of whether alternatives might perform better for their specific tasks.
- Employer Procurement Decisions: Corporate environments often lock teams into a single model based on initial adoption, without evaluating competing options.
- Marketing and Influencer Bias: Model providers spend significant budgets on early access programs, credits, and curated demo environments that create positive bias among people with large audiences.
- Cost Considerations: Developers paying for their own tokens tend to have more calibrated opinions about model selection, while those using corporate accounts may default to more expensive, powerful models unnecessarily.
- Geopolitical Concerns: Some developers deliberately avoid certain models based on their country of origin, while others adopt them precisely because they are cheap and capable, a trade-off that is real but rarely discussed openly.
The distinction between what people say they prefer and what actually works best is significant. When someone tells you they love Claude Opus, they might be reflecting genuine superiority, or they might be reflecting that their employer pays for tokens and they have never seriously tested alternatives.
How Should You Actually Evaluate AI Models?
A practical approach to model evaluation requires sustained, hands-on testing rather than casual experimentation. One developer who forced themselves to test outside their comfort zone spent a full week using Codex seriously and found it nearly indistinguishable from Claude Sonnet 4.6 for most coding tasks, while running at roughly half the cost when factoring in token efficiency. That developer set a minimum threshold of one week of sustained use before forming a firm opinion, noting that anything less is just rating a first impression.
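To make the "roughly half the cost" arithmetic concrete, here is a minimal sketch of a per-task cost comparison. Every figure in it is an illustrative assumption, as are the model_a and model_b names; substitute your providers' published per-token rates and your own measured token counts.

```python
# Hypothetical per-million-token prices in USD: (input, output).
# These are placeholder figures, not published rates.
PRICES = {
    "model_a": (3.00, 15.00),
    "model_b": (1.50, 6.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single task for the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Suppose model_b is cheaper per token but less token-efficient,
# emitting ~20% more output for the same task.
a = task_cost("model_a", input_tokens=8_000, output_tokens=2_000)
b = task_cost("model_b", input_tokens=8_000, output_tokens=2_400)
print(f"model_a: ${a:.4f}  model_b: ${b:.4f}  ratio: {b / a:.2f}")
# -> model_a: $0.0540  model_b: $0.0264  ratio: 0.49
```

The point is not these specific numbers but the habit they encode: cost per task, measured over real work, is the figure that matters, not the sticker price per token.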
The same principle applies to Anthropic's model lineup. Many users default to Haiku for well-scoped, mechanical tasks, use Sonnet for almost everything else, and reserve Opus for work that needs genuine breadth, such as architecture questions or strategic framing. Developers in corporate environments, however, often leave the dial on Opus permanently because they are not paying for tokens themselves. High-powered models can actually work against you on simple tasks: they overthink, add unnecessary abstractions, and restructure things that did not need restructuring. When you have a clearly templated class to write, Haiku gets it right at a tenth of the cost without second-guessing the design.
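That tiered "dial" is easy to encode explicitly. Below is a minimal sketch using Anthropic's official Python SDK; the tier names and the model IDs in MODEL_TIERS are illustrative placeholders, so swap in whatever Haiku, Sonnet, and Opus identifiers are current for your account.

```python
import anthropic  # official SDK: pip install anthropic

# Illustrative tier-to-model mapping; these model IDs are placeholders.
MODEL_TIERS = {
    "mechanical": "claude-haiku-latest",   # well-scoped, templated work
    "general":    "claude-sonnet-latest",  # the everyday default
    "strategic":  "claude-opus-latest",    # architecture and framing
}

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_task(prompt: str, tier: str = "general") -> str:
    """Route a prompt to the cheapest model tier that fits the task."""
    response = client.messages.create(
        model=MODEL_TIERS[tier],
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# A clearly templated class does not need Opus-level reasoning:
print(run_task("Write a Python dataclass for a 2D point.", tier="mechanical"))
```

Defaulting `tier` to "general" mirrors the habit described above: Sonnet for almost everything, with an explicit decision required to dial up or down.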
The benchmarks in this space are starting to feel managed, and influencer coverage is clearly shaped by access programs and marketing budgets. None of this means the models are bad; some are genuinely remarkable. But when you ask someone which model to use, you are getting an answer filtered through their employer's procurement decisions, the influencers they follow, what they can afford, and how long they have been using that particular tool. The answer tells you a lot about their situation. It tells you almost nothing about the model.
What Does This Mean for Your AI Strategy?
The practical implication is clear: take model recommendations with appropriate skepticism, including from people with large platforms. If you are currently locked into a single model because of how your organization adopted it, consider running a serious evaluation of alternatives. Set a minimum threshold of one week of sustained, real-world use before forming an opinion. Pay attention to total cost of ownership, not just capability, especially if you are paying for tokens yourself. And be aware that the models generating the most buzz might be benefiting from marketing budgets rather than superior performance for your specific use case.
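One lightweight way to keep such an evaluation honest is to log every real task during the trial week and compare totals afterwards, rather than trusting your recollection. Here is a sketch assuming a simple CSV log; the file name, column names, and binary success metric are hypothetical choices you would adapt to your own workflow.

```python
import csv
import statistics
from collections import defaultdict

def summarize(log_path: str = "model_eval_log.csv") -> None:
    """Aggregate a week of per-task records into per-model totals.

    Assumed columns: date, model, task, input_tokens, output_tokens,
    cost_usd, success (1 or 0).
    """
    by_model = defaultdict(list)
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            by_model[row["model"]].append(row)

    for model, tasks in by_model.items():
        total_cost = sum(float(t["cost_usd"]) for t in tasks)
        success_rate = statistics.mean(int(t["success"]) for t in tasks)
        print(f"{model}: {len(tasks)} tasks, "
              f"${total_cost:.2f} total, {success_rate:.0%} success")

if __name__ == "__main__":
    summarize()
```

A week of entries like these turns "which model do you prefer?" into a question you can answer with your own numbers instead of someone else's situation.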
Meanwhile, Anthropic continues to face separate challenges related to code security. The company recently experienced a leak of approximately 3,000 files containing nearly 500,000 lines of source code from its Claude Code tool, which Anthropic attributed to human error in the configuration of its content management system. While the core AI model was not compromised, experts warned that the exposed code could reveal important internal systems and potentially help competitors understand how the tool works.