How Teams Can Enforce Test-Driven Development With AI Coding Agents Using Context Files
Google Cloud Developer Advocates are recommending that teams create persistent context files to guide AI coding agents through a structured test-first workflow. According to DORA research, building security into daily work this way can help teams cut the time they spend remediating security issues in half, and the same structure helps prevent the cascading bugs that plague unguided AI code generation. Rather than letting a coding agent generate hundreds of lines of code in a single pass, teams can enforce a four-step cycle that catches vulnerabilities early, before code ever reaches the delivery pipeline.
What Is the "70% Problem" in AI-Generated Code?
When AI coding agents work without guardrails, they can rapidly prototype an application and reach about 70% functionality in a single burst. Completing the remaining 30%, however, becomes a security and stability nightmare. Developers fall into what experts call the "two steps back" pattern, where fixing one bug introduces new regressions elsewhere. More critically, applications built primarily by conversationally prompting an AI without formal structure, a practice often called "vibe coding," are highly susceptible to security vulnerabilities.
"Vibe coding is fun until you start leaking database credentials," noted Aron Eidelman, Google Cloud Developer Advocate.
Recent research from DORA (DevOps Research and Assessment), a long-running research program on software delivery performance, shows that adopting AI for code generation without constraints leads to several common problems:
- Massive Code Changes: Coding agents tend to make sweeping changes that overwhelm developers, making it difficult to track quality and security issues unless clear boundaries are established.
- Review Fatigue: Letting an agent run without constraints can result in hundreds of lines of code generated in a single batch, which increases review effort and elevates the risk of introducing vulnerabilities that static tools struggle to catch.
- Vulnerability Accumulation: Large, unguided code generation batches make it harder for both human reviewers and AI-assisted review tools to spot security issues before they reach production.
How Can Teams Enforce Test-Driven Development With AI Agents?
Google Cloud Developer Advocates recommend that teams create a persistent context file, such as GEMINI.md or ANTIGRAVITY.md, that guides AI agents through a structured four-stage workflow known as PRGR, which stands for Plan, Red, Green, Refactor. These context files are configuration documents that teams write themselves to set clear boundaries and expectations for how AI agents should approach code generation.
- Plan Stage: The agent articulates the architecture and design before writing any code, ensuring alignment with team goals and existing patterns.
- Red Stage: The agent writes a single failing test, ideally from the user's perspective, to prove that the test exercises new behavior before any implementation exists.
- Green Stage: The agent writes only the code needed to make that single test pass, keeping changes minimal and focused.
- Refactor Stage: Code is cleaned up to maintain quality, security scans are run to catch easy-to-fix issues, and the persistent context file is expanded to prevent recurring problems, all before any commit.
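To make the Red and Green stages concrete, here is a minimal Python sketch of one cycle. The feature, a hypothetical `mask_credentials` helper that hides the password in a database URL before it is logged, is an illustrative assumption and not part of the Google Cloud guidance; the point is the shape of the cycle: one failing test written first, then the smallest implementation that makes it pass.

```python
from urllib.parse import urlsplit, urlunsplit


# --- Red stage: a single failing test, written before any implementation ---
# It describes behavior from the caller's point of view: a connection URL
# that ends up in a log must never expose the password.
def test_mask_credentials_hides_password():
    url = "postgresql://app:s3cret@db.internal:5432/orders"
    assert mask_credentials(url) == "postgresql://app:***@db.internal:5432/orders"


# --- Green stage: only the code needed to make that one test pass ---
def mask_credentials(url: str) -> str:
    """Return `url` with its password replaced by '***'.

    Hypothetical helper for illustration. URLs without a username or
    password are out of scope for this cycle; a later Red stage would add
    a test for them before the code handles that case.
    """
    parts = urlsplit(url)
    netloc = f"{parts.username}:***@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Run under pytest, the test fails with a `NameError` until `mask_credentials` exists, which is exactly the "Red" signal; once the function is added it passes, and the Refactor stage can tidy the code without changing that behavior.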
This local structure ensures that outcomes are defined up front and results are reviewed, replacing line-by-line supervision by the developer with a systematic process. If, during the Plan or Refactor stage, the team realizes that the functionality needs to be split up, they can update the plan and start a new cycle, which keeps too many tests and too much functionality from being crammed into a single commit.
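A team might back up the "one small change per cycle" rule with tooling, for example a pre-commit check that rejects oversized commits. The sketch below is hypothetical: the 200-line budget and the function names are assumptions, not part of the Google Cloud guidance. It reads the output of `git diff --cached --numstat`, which prints added and deleted line counts per staged file, and reports "-" for binary files.

```python
import subprocess
import sys

MAX_CHANGED_LINES = 200  # hypothetical per-commit budget; tune per team


def count_changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        # Binary files are reported as "-\t-\t<path>"; they have no line counts.
        if added == "-" or deleted == "-":
            continue
        total += int(added) + int(deleted)
    return total


def main() -> int:
    """Return a non-zero exit code when the staged change exceeds the budget."""
    numstat = subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    changed = count_changed_lines(numstat)
    if changed > MAX_CHANGED_LINES:
        print(
            f"Commit touches {changed} lines (limit {MAX_CHANGED_LINES}). "
            "Update the plan and split the work into another "
            "Plan/Red/Green/Refactor cycle.",
            file=sys.stderr,
        )
        return 1
    return 0
```

Saved as a script ending in `sys.exit(main())` and registered as a Git pre-commit hook, this gives both the agent and the developer the same hard boundary. A line budget is a blunt proxy for "one behavior per commit," so teams would likely pair it with review rather than rely on it alone.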
What Does the Research Say About Speed Versus Security?
One of the most counterintuitive findings from DORA research is that the assumed trade-off between speed and stability is a myth. High-performing teams excel at both delivery speed and system stability simultaneously. This suggests that enforcing small, iterative changes does not slow teams down; instead, it reduces human review effort and lowers the risk of introducing vulnerabilities.
The research also indicates that integrating security objectives directly into daily activities, from design to coding to testing, lets teams spend 50% less time remediating security issues. The same principle applies to AI agents: giving the agent clear team conventions, APIs, and requirements keeps its scope in check and boosts individual effectiveness. The key insight is that security should not be a late-stage gate that forces developers to choose between shipping on time and shipping securely. When security is treated as a late-stage review, that choice becomes real: 81% of developers admit to knowingly shipping vulnerable code because of competing business priorities.
How Does the "Paved Road" Concept Change Security Culture?
To resolve the tension between speed and security, modern engineering teams are adopting what is called the "paved road" concept. This approach, which originated in Netflix's response to the Heartbleed vulnerability, bakes the security team's expertise directly into self-service tooling, making secure configurations the easiest and most natural path for developers. Instead of playing a reactive game of "whack-a-mole" against individual vulnerability findings, teams systematically eliminate entire classes of vulnerabilities by providing pre-configured, secure-by-default modules.
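As a small illustration of a secure-by-default module, a platform team might publish a helper that is the sanctioned way to obtain database connection settings. Everything here is a hypothetical sketch: the `DATABASE_URL` variable name, the `PavedRoadError` type, and the specific defaults are assumptions. The helper reads credentials only from the environment and adds libpq's `sslmode=require` setting unless the URL already chooses an SSL mode.

```python
import os
from typing import Mapping, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


class PavedRoadError(RuntimeError):
    """Raised when a caller tries to step off the paved road (hypothetical)."""


def database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the database URL with secure defaults applied.

    - Credentials come only from the environment, never from source code.
    - TLS is required by default (sslmode=require) unless the URL sets sslmode.
    """
    env = os.environ if env is None else env
    raw = env.get("DATABASE_URL")
    if not raw:
        raise PavedRoadError("DATABASE_URL is not set; do not hardcode credentials")
    parts = urlsplit(raw)
    query = dict(parse_qsl(parts.query))
    query.setdefault("sslmode", "require")  # secure default, explicit override allowed
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The design choice is that the secure path is also the shortest one: calling `database_url()` is less work than assembling a connection string by hand, so developers and AI agents alike have little reason to bypass it.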
Under this model, security scanning shifts from hunting for vulnerabilities to verifying that developers are using the paved patterns. Security moves from a reactive, late-stage review to a proactive, integrated practice in which the secure option is the default. As a result, teams catch issues in the local editor, which costs far less than handling a fire drill in the pipeline.
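A scan that verifies paved-pattern usage can be much simpler than a general vulnerability scanner. The sketch below uses Python's standard `ast` module to flag code that calls a raw database driver's `connect()` directly instead of going through a team's paved database helper. The driver list and the message are hypothetical assumptions, and a real check would also need to handle aliased imports such as `import psycopg2 as pg` or `from sqlite3 import connect`.

```python
import ast

# Hypothetical set of raw drivers that the paved database helper wraps.
RAW_DRIVERS = {"psycopg2", "pymysql", "sqlite3"}


def off_road_calls(source: str, filename: str = "<string>") -> list[str]:
    """Report direct `<driver>.connect(...)` calls that bypass the paved helper."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "connect"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id in RAW_DRIVERS
        ):
            findings.append(
                f"{filename}:{node.lineno}: direct {node.func.value.id}.connect(); "
                "use the paved database helper instead"
            )
    return findings
```

Because it answers a narrow yes/no question ("is this code on the paved road?"), a check like this is fast enough to run in the editor or during the Refactor stage, long before the pipeline.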
For teams adopting test-driven development practices with AI agents and using context files to guide their behavior, the message is clear: structure and guardrails do not slow down development. Instead, they accelerate it by reducing rework, catching bugs early, and building security into the development process from the start.