Even strong security programs have blind spots - and AI changes what’s possible to see
From the Tenzai Trenches is a series of real-world stories from building and deploying AI hacking agents in production enterprise environments. These posts share what we’re seeing firsthand — what works, what breaks, and what surprised us — as organizations put AI-driven offensive security to the test.
Life in the Trenches: “We Already Passed the Pentest”
In this case, our customer is a global provider of software and services with roughly 30,000 employees, and the target was an internal web application. The security team had already completed a penetration test with an external firm and closed every reported finding. In other words, the developers not only did their job, they did their job well.
The implicit assumption is that the pentest covered not just application-specific attacks but also the basics of web application security. This assumption is reasonable, practical, and wrong.
Then we ran Tenzai.
The Environment
This wasn’t an exotic target. It was a standard internal web application with authentication workflows, with framework defaults doing most of the heavy lifting. There was nothing unusual about the stack, and nothing that stood out in the code or the SBOM.
That’s exactly why this story matters. In real life, the failures tend to live in defaults and seams, not in complex or novel code.
Grinders, Not Gamblers
One pattern we keep seeing across real deployments is that agents are grinders.
Their advantage isn’t insight or creativity. It’s that they don’t get bored, don’t “sample and move on,” and don’t assume something is unlikely just because it’s tedious to check. Instead, they apply structured, systematic coverage across endpoints and states, repeating the same classes of checks until patterns emerge.
This isn’t brute force; the agent knows it can’t run all M attacks against all N endpoints. Instead, it builds an understanding of the application, the tech stack, and the deployment, then works through the plausible attack vectors. It also performs the checks that humans skip for good reason, because an agent does not get bored and can keep going indefinitely.
Nuanced Analysis, Accurate Results
One thing our agent demonstrates often is the ability to apply nuance when analyzing vulnerabilities.
In this test, while probing the application, our agent identified a set of endpoints that blindly trusted the X-Forwarded-Host header. This header is commonly added by proxies and CDNs to preserve the original Host header from the client's request - since proxies often rewrite the Host header when forwarding traffic to backend servers.
In our case, the agent observed that when it supplied this header itself, the application treated it as authoritative in places it shouldn’t have. In one case, the application returned a page containing dozens of CSS and JavaScript URLs pointing at the attacker domain.
Our agent observed similar behavior across eight different endpoints. While this behavior is certainly concerning, the agent reasoned about it with nuance: rather than flagging the issue as critical, it analyzed the context and recognized that exploitation requires specific conditions to be met. Based on that, it delivered an accurate, measured risk assessment.
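To make the reflection check concrete, here is a minimal sketch in Python of how a response could be inspected for a supplied X-Forwarded-Host value. It is not Tenzai's implementation. The canary host name, the function names, and the choice of HTML attributes to scan are all assumptions made for illustration, and the sketch deliberately leaves the HTTP request itself to whatever client you already use.

```python
import re
from urllib.parse import urlsplit

# Hypothetical canary value standing in for an attacker-controlled domain.
CANARY_HOST = "canary.example.test"

# Assumption: URLs of interest appear in href, src, or action attributes.
URL_ATTR = re.compile(r"""(?:href|src|action)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def probe_headers(canary=CANARY_HOST):
    """Headers to add to an otherwise normal request to the endpoint."""
    return {"X-Forwarded-Host": canary}


def reflected_urls(body, location=None, canary=CANARY_HOST):
    """Return URLs in the response whose host equals the canary.

    `body` is the response text; `location` is the Location header of a
    redirect response, if there was one.
    """
    candidates = URL_ATTR.findall(body)
    if location:
        candidates.append(location)

    hits = []
    for url in candidates:
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip values that are not parseable as URLs
        if host and host.lower() == canary.lower():
            hits.append(url)
    return hits
```

A non-empty result only shows that the endpoint reflects the header. Whether that is exploitable still depends on the deployment, for example whether an attacker-supplied header can reach the application at all, which is the contextual judgment described above.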
Why This Survives a Good Pentest
This is a question we keep returning to in the trenches: how do real vulnerabilities survive competent teams and processes?
Traditional pentests are time-boxed and, by necessity, prioritized. Even excellent testers go deep on only a subset of surfaces. Header trust issues often fall into the category of “known class, easy to test once” but are difficult to validate exhaustively across all the services that make up an application: asset generation, localization routes, authentication flows (login, OAuth callbacks, SAML handling, etc.), error paths, redirect handlers for assets and more.
Most testers know to check forwarded headers (it’s a classic attack!), but because it’s rarely an issue in real life, it tends to drop off the priority list. Agents don’t make that tradeoff. They keep grinding through the coverage.
The Enterprise Implication
Internal applications are dangerous precisely because they’re trusted by default. Employees trust internal login pages, and those systems often sit close to privileged workflows and sensitive data. Redirect hijacking and asset poisoning are especially effective in these environments because everything looks legitimate to internal users who want to get their work done.
The lesson here isn’t “test this header once.” It’s to test exhaustively, because edge cases live in the seams.
What Security Teams Should Take Away
If your application accepts forwarded headers at all, there are a few concrete steps that matter:
- Strip or overwrite X-Forwarded-* headers at the trust boundary so only trusted components can set them
- Explicitly configure and review server and application trusted-proxy settings, especially as infrastructure changes
- Enforce a canonical base URL or strict host allowlist for absolute URL generation, including redirects, OAuth flows, and asset URLs
- Keep SSO redirect and callback logic static and validated rather than dynamically derived from request headers. If you have to handle multiple environments, use a static configuration file per environment
- Log and alert on anomalous Host and X-Forwarded-Host values, particularly in authentication paths
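As a rough illustration of the first three steps plus the logging step, the following Python sketch strips forwarded headers from untrusted peers, checks the resulting host against an allowlist, and builds absolute URLs from a fixed base rather than from the request. Every name and value in it (the base URL, the allowed host, the proxy address, the function names) is a hypothetical placeholder, and a real deployment would normally rely on its framework's or proxy's own trusted-proxy settings rather than hand-rolled code like this.

```python
import logging

log = logging.getLogger("host_validation")

# Hypothetical static configuration; in practice, one file per environment.
CANONICAL_BASE_URL = "https://app.internal.example.test"
ALLOWED_HOSTS = {"app.internal.example.test"}
TRUSTED_PROXIES = {"10.0.0.5"}  # peers allowed to set X-Forwarded-* headers


def sanitize_headers(headers, peer_ip):
    """Drop X-Forwarded-* headers unless the direct peer is a trusted proxy."""
    if peer_ip in TRUSTED_PROXIES:
        return dict(headers)
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith("x-forwarded-")
    }


def effective_host(headers, peer_ip):
    """Return the allowlisted host for this request, or None if not allowed."""
    clean = {k.lower(): v for k, v in sanitize_headers(headers, peer_ip).items()}
    raw = clean.get("x-forwarded-host") or clean.get("host", "")
    # Use the first value of a comma-separated list and ignore any port.
    # Simplification: IPv6 literals are not handled here.
    host = raw.split(",")[0].strip().lower()
    hostname = host.rsplit(":", 1)[0] if ":" in host else host
    if hostname in ALLOWED_HOSTS:
        return hostname
    log.warning("rejected host %r from peer %s", raw, peer_ip)
    return None


def absolute_url(path):
    """Build absolute URLs from the canonical base, never from request headers."""
    return CANONICAL_BASE_URL + "/" + path.lstrip("/")
```

With this shape, a spoofed X-Forwarded-Host from an arbitrary client is discarded before it can influence anything, and even a trusted proxy cannot introduce a host that is not on the allowlist. Redirects, OAuth callbacks, and asset links would all go through `absolute_url`.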
What Comes Next from the Tenzai Trenches
This won’t be the last time we see a “clean” pentest miss something with real impact. As applications grow more complex - and as AI accelerates both development velocity and attack surface - the gap between what gets sampled and what actually exists continues to widen.
Stay tuned for the next story deep from the Tenzai trenches 👀
