The Generalist Advantage in Agentic Pentesting

Daniel Goldberg

Security Researcher

A real-life Tenzai agentic pentesting case study: From open registration to RCE on Oracle infrastructure via AI agent IDOR, SSH override, and cross-domain chain - six domains, one run.


Tenzai from the Trenches is a series of real-world stories from building and deploying AI hacking agents in production enterprise environments. These posts share what we’re seeing firsthand - what works, what breaks, and what surprised us - as organizations put AI-driven offensive security to the test.

Our agent is good at many things, but one thing Tenzai’s AI hacker is particularly good at is combining knowledge across different domains and turning a set of small leads into one large finding.

In a production application at a publicly traded multinational enterprise, Tenzai found a chain that crossed from classic web broken access control, into AI agent metadata leaks, and then into a tool execution call with broken authorization to reach code execution on production Oracle infrastructure.

Our target


In this case, the impact of the entire chain was RCE, but the route crossed web authorization, product semantics, agent tooling, SSH execution, Oracle/Linux assumptions, and repeated validation of failed paths. A narrow specialist might stop when the attack leaves their domain, or run out of time in a time-bounded pentest. The agentic workflow kept the thread alive across domains and finished in a few hours.

Easy entry through normal web flaws

The first useful crack was boring in the best way: registration. The app lets a fresh user create a Member account. No credential hunt. No social engineering detour.

Surprisingly, the Member role was already useful. From that account, the Tenzai Agent could call GET /api/admin/sessions/{id}/activities and read the API call history for other users, including Super Admins.

This is a session-management finding. Pretty bad on its own. The sneaky bit was hiding in the path strings. Activity logs are a strange kind of loot: half audit trail, half navigation history. Here the activities carried agent IDs from admin activity.

The first test had no special guidance; it roamed the app, touched 92 endpoints, and made 1,096 tool calls. Out of that pile came a jumble of logs that contained some nuggets: the admin request history leaked agent UUIDs. UUIDs are useful things to collect, and these were indeed used later on.
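To make the "path strings as loot" point concrete, here is a minimal sketch of pulling agent IDs out of logged request paths. It assumes the activity history can be reduced to a list of path strings; the function name, the sample paths, and the UUIDs themselves are made up for illustration and are not taken from the engagement.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID shape.
UUID_PATTERN = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Only capture UUIDs that directly follow an /agents/ path segment.
AGENT_PATH_RE = re.compile(r"/agents/(" + UUID_PATTERN + r")", re.IGNORECASE)


def agent_ids_from_paths(paths):
    """Return unique agent UUIDs (lowercased) in first-seen order."""
    seen = {}
    for path in paths:
        for match in AGENT_PATH_RE.finditer(path):
            seen.setdefault(match.group(1).lower(), None)
    return list(seen)


# Hypothetical activity entries; these UUIDs are invented for the example.
sample_paths = [
    "/api/agents/3f2b8c1e-9a4d-4e7b-8c21-0d5f6a7b8c9d/tools",
    "/api/agents/3F2B8C1E-9A4D-4E7B-8C21-0D5F6A7B8C9D/card",
    "/api/users/me",
    "/api/agents/7a1c2d3e-4f5a-4b6c-8d7e-9f0a1b2c3d4e/execute",
]

print(agent_ids_from_paths(sample_paths))
```

Lowercasing before deduplication keeps the same agent from being counted twice when a log mixes letter cases.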

An execution primitive that did not execute

The second useful crack was clearer. The platform let agents run SSH tools, and there was a profile flag meant to control whether a user could override the default SSH commands. In the UI, the control looked gated; in the API, nothing.

The backend accepted can_override_ssh_commands: true through PUT /api/agents/{id}/tools/{toolId} even when the user's profile setting said they should not have that power.
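The missing control is easy to state in code. The sketch below shows one way a backend could refuse the flag when the caller's profile does not grant it. Everything except the can_override_ssh_commands field name is an assumption: the Profile class, its may_override_ssh_commands attribute, and the Forbidden exception are hypothetical stand-ins, not the platform's real code.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when a request asks for a privilege the caller does not hold."""


@dataclass(frozen=True)
class Profile:
    user_id: str
    # Hypothetical name for the profile setting described in the text.
    may_override_ssh_commands: bool


def validate_tool_update(profile, payload):
    """Reject a tool update that requests an override the profile does not grant.

    The decision is made from the server-side profile, never from the
    request body alone.
    """
    wants_override = bool(payload.get("can_override_ssh_commands", False))
    if wants_override and not profile.may_override_ssh_commands:
        raise Forbidden("profile does not allow SSH command override")
    return payload


member = Profile(user_id="member-7", may_override_ssh_commands=False)
try:
    validate_tool_update(member, {"can_override_ssh_commands": True})
except Forbidden as exc:
    print("rejected:", exc)
```

The point of the design is that hiding the control in the UI is not an authorization check; the same rule has to run on the API path that writes the setting.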

Then the command path reached the SSH executor. Command strings were passed through to the SSH layer, and the SSH library tried to use them. The run failed at the socket, with 127.0.0.1:22 refusing the connection.

The agent had configured the target SSH server as localhost inside the application container. That container was serving the web app; it was not running sshd.

So, the finding remained at “High” and the agent itself said: “The vulnerability is fully confirmed. The only thing missing is a reachable SSH endpoint on the other side.”

UUIDs become agent intelligence

A UUID is a handle. As part of our planning sub-system, we ran a follow-up test focused on extracting information and improving coverage that was left open from the first run.

Among the follow-up leads were the agent UUIDs pulled out of admin activity logs, the owners around them, and the hope that one of those agents carried SSH access.

The follow-up was narrower in scope: 148 tool calls.

The obvious move was to try to run the tools used by agents the Member did not own. POST /api/agents/{id}/tools enforced ownership, so the lead, termed “direct non-owned tool injection,” was blocked right there.

At that point the Tenzai hacker realized it needed to understand the application further. The internal research question changed from “can we use the leaked UUID to touch someone else’s SSH tool?” to “what metadata does this product expose around someone else’s agent?” Following up on the second question made the agent investigate the /card endpoint.

Agent /card IDOR turns metadata into a real target

Investigating the /card endpoint paid off for the agent.

GET /api/agents/{id}/card returned metadata for agents the Member did not own. At scale, 71 of 104 tested UUIDs answered and one of them was a DBA agent.

Card endpoint information for the DBA server agent

The card exposed no credentials, but it provided the next hop: a real internal SSH target tied to Oracle administration work.

Then an internal note captured the agent realizing what it had: "the hostname is now known and the tool inventory is mapped." These internal chains of thought show how the agent realizes that what it needs to target has changed.

At this point our hacker needed to guess a lot less. It had the ID of an agent with relevant tooling, a specific hostname, and hopefully the right primitives to execute code.

Tenzai’s agent knows many things that no human being will ever remember. Beyond the typical LLM knowledge base, our AI has access to unique knowledge particularly around how we exploit and validate findings. In this case, Oracle boxes tend to have familiar OS users. DBA agents tend to touch RAC, ASM, Data Guard, backups, listener checks, disk groups.

Internal Tenzai task for one of our sub agents

At this point, the agent needed a way to execute these tools. Reads of the parent agent had an ownership check, while some sub-endpoints did not.

/card returned metadata for agents the Member did not own. /a2a did not leak credentials or execute commands, but it leaked something important about the backend: the request reached PostgreSQL task-handling logic for a non-owned agent and failed on a missing a2a_tasks relation. That was not enough on its own, but it taught the Tenzai agent that the ownership check was inconsistent across agent sub-endpoints. If /card and /a2a could both cross the ownership boundary, /execute was the next endpoint worth testing.
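The reasoning here is a small classification exercise: for each agent-scoped endpoint tried against a non-owned agent, did the request get denied, or did it get far enough to return data or fail deeper in the backend? A rough sketch follows, with the outcome labels and the function name invented for illustration. The observations mirror what this post describes; HTTP status codes are deliberately left out because the write-up does not state them.

```python
from enum import Enum


class Outcome(Enum):
    DENIED = "denied by ownership check"
    DATA_RETURNED = "returned data for a non-owned agent"
    BACKEND_ERROR = "reached backend logic, then failed"


def crossed_boundary(observations):
    """List endpoints where a non-owner's request got past authorization.

    Anything other than a clean denial means the handler ran for an agent
    the caller does not own, even if it later errored out.
    """
    return [
        endpoint
        for endpoint, outcome in observations.items()
        if outcome is not Outcome.DENIED
    ]


# Observations as described in the text, all for agents the Member did not own.
observations = {
    "parent agent read": Outcome.DENIED,
    "POST /api/agents/{id}/tools": Outcome.DENIED,
    "GET /api/agents/{id}/card": Outcome.DATA_RETURNED,
    "/a2a": Outcome.BACKEND_ERROR,
}

print(crossed_boundary(observations))
```

Treating a backend error as a crossing is the key choice: a failure on a missing relation still shows the request was processed for someone else's agent.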

/execute was the obvious next thing to try and the agent followed that track.

Final step: /execute turns product context into RCE

POST /api/agents/{id}/execute accepted a request against the DBA agent from a Member account. The platform ran the DBA agent's configured execution path anyway, using the access already stored with that agent.

The final route was short: Invoke the existing DBA agent and inherit its configured execution path.

That is our RCE. The web app authorization mistake crossed into agent execution, then into stored infrastructure access, then into commands running as the oracle OS user on a related Oracle server.
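One common way to avoid this class of gap is to route every agent-scoped handler through a single ownership guard instead of checking per endpoint. The sketch below is an illustration under stated assumptions, not the platform's implementation: the in-memory AGENT_OWNERS table, the requires_owner decorator, and the handler signatures are all hypothetical, and a real product may also need to model sharing or delegated access.

```python
import functools


class Forbidden(Exception):
    """Raised when the caller does not own the targeted agent."""


# Hypothetical ownership table; a real system would query its datastore.
AGENT_OWNERS = {
    "dba-agent": "admin-1",
    "my-agent": "member-7",
}


def requires_owner(handler):
    """Run the handler only if caller_id owns agent_id."""

    @functools.wraps(handler)
    def wrapper(caller_id, agent_id, *args, **kwargs):
        # Unknown agents map to None and are denied like non-owned ones.
        if AGENT_OWNERS.get(agent_id) != caller_id:
            raise Forbidden(f"{caller_id} does not own {agent_id}")
        return handler(caller_id, agent_id, *args, **kwargs)

    return wrapper


@requires_owner
def get_card(caller_id, agent_id):
    return {"agent_id": agent_id}


@requires_owner
def execute(caller_id, agent_id, task):
    # Stand-in for invoking the agent's configured execution path.
    return f"executed {task!r} via {agent_id}"


print(get_card("member-7", "my-agent"))
try:
    execute("member-7", "dba-agent", "listener check")
except Forbidden as exc:
    print("rejected:", exc)
```

With the check in one place, a new sub-endpoint inherits the boundary by default rather than by remembering to add it.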

Conclusion: One workflow kept the thread alive

Each finding was ordinary enough and pretty boring: admin metadata exposure to users, the ability to override SSH tool calls, agent metadata, and an /execute endpoint requiring a UUID to exploit.

Some might have been reported separately and never chained.

Many different tools can find these. Many different tools can create example code for each of these. A specialized offensive agent such as Tenzai can find all these findings and chain them together for a full compromise, with receipts.
