
The instructions are not the point

[Mar 25, 2026 - 7 min read]

Learn how Zip engineered a composable AI agent platform that lets enterprises build custom procurement agents—same blocks, different structures, no new code.

If you grew up with Lego, you know the feeling. You spend three hours following the instruction booklet for a Lego Eiffel Tower. Step 47: attach the 2x4 flat gray to the underside of the base plate. And when you're done, you have something genuinely impressive. You put it on a shelf. You're proud of it.

Then your younger sibling walks over, looks at it for five seconds and tears it apart.

Annoying in the moment, but in some ways that was kind of the point. The instruction booklet wasn’t the end goal, but rather a demonstration of one option for what the pieces could become. Once you deeply understood the potential of the blocks, the instructions became optional. You could build anything.

That is the idea behind Custom Agents at Zip. The out-of-the-box agents we ship, such as MSA Risk Review, SOC 2 analysis, and Adverse Media Research, are the Eiffel Towers. Faithful, functional, great for what they are. But enterprise procurement is not just one Eiffel Tower. It is a thousand different structures, each shaped by a different organization's policies, risk tolerances, and supplier relationships. Custom Agents let customers take the same blocks and build something that fits their world precisely, on top of the Integration Platform we had already built.

This is the story of how we got there.

The problem we are solving

Enterprise procurement is full of repetitive, judgment-heavy tasks. Someone needs to check whether a vendor's SOC 2 is current. Someone else needs to validate that the pricing on a renewal matches what was originally contracted. A third person needs to screen a new supplier for adverse media before the company commits to a multi-year agreement.

These tasks follow patterns, but they require nuance. They are exactly the kind of work where AI agents can deliver outsized value, but only if the agent understands your context. 

A generic MSA review agent might catch the standard risk categories. Your MSA review agent needs to know that your company has a hard stance on indemnification caps above $5M, that your legal team has flagged arbitration clauses in certain jurisdictions, and that there is a specific data residency requirement for any vendor touching your European customer data.

Our out-of-the-box agents cover the common cases. Custom Agents are for everything else. 

The challenge wasn't starting from scratch: we had Zip's App Studio, our workflow automation engine that already triggers actions across procurement workflows. But making it the foundation for a flexible, extensible agent platform required real engineering work.

We already knew what we wanted to build, so the question we had to ask ourselves was "how do we make what we have flexible enough to power any procurement use case a customer might bring, while keeping it reliable enough to live inside a live approval workflow?"

The answer, as it turns out, starts with the blocks.

The agent catalog, aka our Eiffel Towers

Before we talk about Custom Agents, it's worth understanding what we ship out of the box. Zip currently offers 10 purpose-built agents covering the most common procurement automation needs:

  • Data Validation: Reviews purchase request fields against uploaded documents to flag discrepancies and surface one-click corrections
  • MSA Risk Review: Analyzes Master Service Agreements against a company's legal playbook to categorize risks by severity
  • Procurement Pre-Check: Surfaces risks, gaps, and follow-ups from request documents requiring legal or security context
  • SOC 2 Risk Review: Structured SOC 2 report analysis covering scope, control design, audit findings, and gap identification
  • DORA Screening: Assesses whether a supplier falls within scope of the Digital Operational Resilience Act
  • Adverse Media Research: Scans global news and public sources for supplier risk signals
  • Price Benchmarking: Validates software purchase pricing against contracted terms, historical purchases, and market benchmarks
  • DPA Risk Review: Scans Data Processing Agreements against GDPR/CCPA standards to identify gaps
  • Renewal Assist: Compares renewal requests against original agreements to surface pricing and scope changes
  • Duplicate Vendor Check: Matches vendor names and email domains to prevent vendor duplication

Each one of these agents is a working Eiffel Tower. A customer can deploy any of them into their procurement workflow and immediately get value from it.

But the insight that made Custom Agents possible was this: every single one of these agents runs on the same underlying platform. They differ only in their configuration: the prompt that defines their behavior, the tools they are allowed to use, and the format of their output. Same blocks, different structure.

The Architecture: What the blocks actually are

All agents at Zip, pre-built and custom alike, run within our App Studio as a system action we call GenericAiAction. The App Studio is our workflow automation engine. It triggers actions at specific points in procurement workflows: when a request reaches an approval step, when a contract is uploaded, when a vendor is added. Agents are one type of action in this system.

The key architectural insight: agents are not standalone services. They are composable actions embedded in procurement workflows, with full access to the procurement data graph: requests, contracts, vendors, documents, and intake forms. When an agent runs, it does not operate in a vacuum. It runs in the context of a specific purchase request, with access to all of the documents and structured data attached to that request.

Configuration, not code

Each agent is defined as an Integration Platform recipe, which is a declarative configuration that specifies:

  • Model: Which LLM to use, or AUTO for dynamic selection
  • Tools: Which data sources the agent can access
  • Output Format: raw, markdown, or structured (with a JSON schema)
  • User Prompt: The instructions that define the agent's behavior
  • Citations: Whether to include source citations in output

This is the crucial part for Custom Agents: creating a new agent does not require writing new code. A customer's custom agent configuration is a recipe file that references the same underlying execution engine that powers our Data Validation agent; the blocks are shared, and only the instructions change.

When we built our Adverse Media Research agent, we didn't need to build a new execution engine; we wrote a new recipe for the same one. When a customer working with the Zip AI Lab needs an agent that evaluates supplier financial health against their specific risk threshold, same thing: new recipe, same engine underneath.
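To make this concrete, here is a hypothetical sketch of what a recipe could look like. The field names mirror the configuration fields listed above (model, tools, output format, user prompt, citations); the dict representation and the literal values are illustrative assumptions, not Zip's actual recipe format.

```python
# Hypothetical recipe sketch: field names follow the configuration
# described in this post; values and structure are illustrative only.
adverse_media_recipe = {
    "model": "auto",  # or a specific LLM identifier
    "tools": ["document_retrieval", "api_data"],
    "output_format": "markdown",  # raw | markdown | structured
    "user_prompt": (
        "Scan global news and public sources for risk signals "
        "about the referenced supplier and summarize findings."
    ),
    "include_citations": True,
}
```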

The execution engine: four nodes and a loop

The factory itself is a LangGraph state graph with four nodes in a linear flow:

PreprocessingNode → OrchestrationNode → FinalLlmCallNode → PostProcessingNode

Each node reads from and writes to a shared ReactAgentState. The entire graph executes within an AgentContext that carries the recipe configuration.

class ReactAgentState(TypedDict, total=False):
    # Procurement entities resolved from the prompt
    referenced_objects: list[ObjectContext]
    # User prompt forwarded for document search
    vector_search_string: str
    # Results gathered by tools
    accumulated_context: list[ToolExecutionResult]
    # LLM synthesis response
    final_llm_call_result: str
    # Processed output
    final_result: Any

class AgentContext(Context):
    provider: LLMProvider
    model: LLMModel  # e.g. gpt-4.1, claude-4-5-sonnet, auto
    tools: list[LLMTool]  # enabled data source categories
    output_format: LLMOutputFormat  # raw, markdown, structured
    user_prompt: str
    include_citations: bool

Let's walk through each node.
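Stripped of LangGraph specifics, the linear flow amounts to composing four functions over that shared state. A framework-free sketch, with stubbed node bodies standing in for the real implementations:

```python
from typing import Any

# Framework-free sketch of the four-node flow. In the real system the
# nodes are LangGraph nodes over ReactAgentState; here plain functions
# over a dict stand in, and the bodies are stubs.
State = dict[str, Any]

def preprocessing(state: State) -> State:
    # Resolve referenced procurement objects; pass the prompt forward.
    return {**state, "referenced_objects": [],
            "vector_search_string": state["user_prompt"]}

def orchestration(state: State) -> State:
    # The nested ReAct agent would gather tool results here.
    return {**state, "accumulated_context": ["tool results"]}

def final_llm_call(state: State) -> State:
    # Single synthesis call over the accumulated context.
    return {**state, "final_llm_call_result": "synthesized answer"}

def post_processing(state: State) -> State:
    # Citations, corrections, and output formatting.
    return {**state, "final_result": state["final_llm_call_result"]}

def run_agent(state: State) -> State:
    for node in (preprocessing, orchestration, final_llm_call, post_processing):
        state = node(state)
    return state
```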

Node 1: Preprocessing: Figure out what we’re working with

The preprocessing node does one job: figure out which procurement objects the user is asking about. Purchase requests, vendors, contracts, purchase orders, invoices; these are the entities that give an agent its context.

No LLM call is needed. GUIDs are extracted from the user prompt, resolved against the database and filtered to supported entity types. The raw user prompt is also passed forward as a vector_search_string for downstream document retrieval.

This simplicity is intentional. There is no intelligence required here, just extraction. We reserve the expensive operations for the steps that need them.
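The extraction step can be pictured as a regex pass plus a lookup. In this sketch, a set of known IDs stands in for the database resolution and entity-type filtering described above:

```python
import re

# Illustrative sketch: pull GUIDs out of a prompt with a regex, then
# filter against a set of known, supported entity IDs (a stand-in for
# the database lookup and entity-type filter).
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def extract_referenced_ids(prompt: str, supported_ids: set[str]) -> list[str]:
    found = GUID_RE.findall(prompt)
    return [g for g in found if g.lower() in supported_ids]
```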

Node 2: Orchestration: Where the intelligence lives

The orchestration node contains a nested ReAct agent, which is an autonomous agent-within-a-graph that dynamically decides which tools to call based on the user's request and the configuration of the recipe.

ReAct (Reason + Act) is a loop:

  1. Reason: The LLM analyzes the user's request and the data it has so far
  2. Act: It calls one or more tools to gather information
  3. Observe: It reads the tool results
  4. Repeat until it has enough context, or hits a recursion limit 
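The loop above can be sketched with a stubbed "LLM" policy: the policy inspects the gathered context and either names a tool to call or stops. In the real system the model makes this decision; here it is a plain function:

```python
from typing import Callable, Optional

# Minimal ReAct-style loop with a stubbed decision policy. Real
# implementations delegate the reason step to the LLM.
def react_loop(
    policy: Callable[[list[str]], Optional[str]],
    tools: dict[str, Callable[[], str]],
    recursion_limit: int = 5,
) -> list[str]:
    context: list[str] = []
    for _ in range(recursion_limit):
        tool_name = policy(context)         # Reason: decide the next action
        if tool_name is None:
            break                           # Enough context gathered
        context.append(tools[tool_name]())  # Act + Observe
    return context
```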

The agent is created via LangChain's create_agent() with the tools enabled in the recipe and a system prompt describing the available tools and the referenced procurement objects. Each tool call returns a ToolExecutionResult:

class ToolExecutionResult(BaseModel):
    tool_name: str  # e.g. "document_retrieval", "api_data"
    output_type: str  # category for downstream processing
    ai_readable_string: str  # human-readable summary for the LLM
    raw_output: Any  # full structured data for post-processing

That dual-output pattern deserves a moment. The orchestration agent sees a clean, human-readable summary of what the tool returned — just enough to reason about what to do next. Meanwhile, the full structured data travels forward to post-processing, where it powers features like citation linking and one-click corrections. One tool call, two consumers, each getting exactly the format they need. This separation is what lets us add new post-processing capabilities without changing a single tool.                        
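A tiny sketch of the two consumers at work. A dataclass stands in for the Pydantic model, and the chunk metadata is invented for illustration:

```python
from dataclasses import dataclass
from typing import Any

# One tool result, two consumers: the LLM sees the summary string,
# post-processing consumes the raw structured data.
@dataclass
class ToolResult:
    tool_name: str
    ai_readable_string: str  # what the orchestration LLM sees
    raw_output: Any          # what post-processing consumes

result = ToolResult(
    tool_name="document_retrieval",
    ai_readable_string="Found 2 relevant clauses in MSA.pdf",
    raw_output=[{"page": 4, "chunk_id": "c-17"},
                {"page": 9, "chunk_id": "c-31"}],
)

llm_view = result.ai_readable_string                       # fed to the ReAct loop
citation_pages = [c["page"] for c in result.raw_output]    # used for citation linking
```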

Node 3: Final LLM Call: Synthesis

Once the orchestration agent has gathered all the context it needs, the final LLM call node takes over and does the following:

  1. Reads all accumulated_context from the state and concatenates the ai_readable_string fields
  2. Builds a final system prompt that includes the accumulated context, citation templates (if enabled), and output formatting instructions
  3. Makes a single LLM call: [SystemMessage(final_prompt), HumanMessage(user_prompt)]

The separation between orchestration and final synthesis is deliberate. The orchestration agent optimizes for one thing: gathering the right information. The final LLM call optimizes for another: producing a high-quality, well-formatted response. These are different tasks, and conflating them into a single LLM call would mean asking one model to be both a diligent researcher and an eloquent writer simultaneously. Separating them lets us optimize each independently, including using different model tiers for each.
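The three steps can be sketched as a small prompt-assembly function. The message shapes follow the `[SystemMessage, HumanMessage]` pattern described above; the exact wording of the system prompt is an illustrative assumption:

```python
# Sketch of the synthesis step: concatenate the human-readable tool
# summaries into one system prompt, then issue a single two-message
# call. Plain (role, content) tuples stand in for message objects.
def build_final_messages(
    accumulated: list[str], user_prompt: str, include_citations: bool
) -> list[tuple[str, str]]:
    context_block = "\n\n".join(accumulated)
    system = "Use only the context below to answer.\n\n" + context_block
    if include_citations:
        system += "\n\nCite sources using the provided citation tags."
    return [("system", system), ("human", user_prompt)]
```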

For structured output, we handle the formatting at the provider level so that agent recipes don't need to care which LLM they're running on. Whether the underlying model is OpenAI or Anthropic, the recipe just declares a JSON schema and gets back conformant output. This means agent output can be consumed programmatically by downstream workflow steps — not just displayed to a human reviewer.
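A hedged sketch of what that shared validation pass could look like: whichever provider produced the text, the recipe's declared schema drives one common check. Only top-level required keys are verified here; a real implementation would use full JSON Schema validation:

```python
import json

# Provider-agnostic structured-output sketch: parse the model's text
# and check it against the recipe's declared schema. Simplified to a
# required-keys check for illustration.
def parse_structured_output(llm_text: str, schema: dict) -> dict:
    data = json.loads(llm_text)
    missing = [k for k in schema.get("required", []) if k not in data]
    if missing:
        raise ValueError(f"Output missing required fields: {missing}")
    return data
```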

Node 4: Post-Processing: Citations, corrections, and formatting

The post-processing node handles everything that happens after the LLM generates its response:

  • Citation processing: If the agent used document retrieval and citations are enabled, process_markdown_citations() maps LLM-generated citation tags back to specific document chunks, pages, and OCR word coordinates. A user reading an MSA risk review can click a citation and jump directly to the relevant clause in the original PDF.
  • Actionable corrections: When an agent identifies discrepancies between request fields and source documents, the post-processing node surfaces them as one-click corrections
  • Output formatting: Applies the requested output format — raw text, markdown, or structured JSON.
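The citation step can be pictured as a tag-rewriting pass. The `[cite:N]` tag syntax here is a hypothetical stand-in for illustration, not Zip's actual format:

```python
import re

# Illustrative citation post-processing: map inline tags like [cite:3]
# back to document chunk metadata so the UI can deep-link into the
# source PDF. The tag syntax is an assumption for illustration.
def link_citations(markdown: str, chunks: dict[int, dict]) -> str:
    def replace(match: re.Match) -> str:
        chunk = chunks[int(match.group(1))]
        return f"[{chunk['doc']}, p. {chunk['page']}]"
    return re.sub(r"\[cite:(\d+)\]", replace, markdown)
```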

The Tool System: The blocks themselves

If the recipe is the instruction booklet and the execution engine is the factory, then the tools are the Lego blocks themselves. Each tool is a structured interface to a different part of Zip's procurement data graph.

Every tool extends GenericAiToolBase[INPUT, OUTPUT], a generic base class that enforces a clean contract:

class GenericAiToolBase(ABC, Generic[INPUT, OUTPUT]):
    name: str  # "document_retrieval", "api_data", etc.
    description: str  # the LLM reads this to decide when to call the tool
    input_schema: type[INPUT]  # Pydantic model defining the tool's parameters
    output_type: str  # category tag carried into ToolExecutionResult

    async def execute(self, input: INPUT) -> OUTPUT: ...
    def get_ai_readable_string(self, output: OUTPUT) -> str: ...

    async def run(self, input: INPUT) -> ToolExecutionResult:
        raw_output = await self.execute(input)
        return ToolExecutionResult(
            tool_name=self.name,
            output_type=self.output_type,
            ai_readable_string=self.get_ai_readable_string(raw_output),
            raw_output=raw_output,
        )

Here is what each tool actually does:

  • document_retrieval: Vector search across all documents attached to referenced objects. Returns relevant chunks with metadata for citation linking.
  • api_data: Fetches structured metadata from purchase requests, contracts, POs, invoices — status, amounts, dates, vendor info, custom fields.
  • ai_company_context: Retrieves company-specific policies and procurement guidelines from a global reference library.

The ai_company_context tool deserves special attention in the context of Custom Agents. This is how a customer's specific policies and risk tolerances make it into the agent's reasoning. A customer's legal team can load their standard playbook into the reference library: indemnification thresholds, data residency requirements, approved vendor categories. The agent will consult that context during every run. The Eiffel Tower instructions become your instructions.

Tool dependencies

Tools can build on each other's output. An agent reviewing a renewal might first pull the original contract via document retrieval, then check that contract's terms against company policy via the context library. What's notable is that this sequencing isn't hardcoded anywhere. The ReAct agent figures it out from the tool descriptions: it gathers information first, then reasons over it. The blocks snap together because of how they are shaped, not because someone wrote a rule about what order to use them.

What building a custom agent actually looks like

A customer comes to Zip with a specific need. They work with suppliers across both their US and EU entities, and every renewal needs to validate that the data processing terms are consistent with both their US and EU legal playbooks, not just one. No out-of-the-box agent does this exactly.

We build them a recipe. It enables document_retrieval to read the renewal document and existing contracts, ai_company_context to pull both the US and EU policy documents from their reference library, and a custom prompt that instructs the agent to compare the renewal terms against both playbooks and flag any inconsistencies between them. The output is structured JSON, categorized by jurisdiction, consumed by a downstream workflow step that routes findings to the right legal reviewer automatically.
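That recipe might be sketched like this. The field names follow the configuration fields described earlier in this post; the schema shape and values are illustrative assumptions:

```python
# Hypothetical sketch of the dual-jurisdiction recipe: field names
# mirror this post's configuration fields; values are illustrative.
dual_playbook_recipe = {
    "model": "auto",
    "tools": ["document_retrieval", "ai_company_context"],
    "output_format": "structured",
    "output_schema": {
        "type": "object",
        "required": ["us_findings", "eu_findings"],
        "properties": {
            "us_findings": {"type": "array", "items": {"type": "string"}},
            "eu_findings": {"type": "array", "items": {"type": "string"}},
        },
    },
    "user_prompt": (
        "Compare the renewal's data processing terms against both the "
        "US and EU legal playbooks and flag inconsistencies by jurisdiction."
    ),
    "include_citations": True,
}
```

The structured, jurisdiction-keyed output is what lets a downstream workflow step route findings to the right legal reviewer without a human in between.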

No new tools were written. No new execution engine was built. The same platform that runs our SOC 2 Risk Review agent handled a use case we had never seen before, because the blocks were already there.

Closing: The instructions were always optional

The out-of-the-box agents we ship are complete. They do exactly what they're designed to do and they do it well. But the reason we built them on a shared, configurable platform rather than ten standalone services was always that they were demonstrations, or proof of what the blocks could build.

The real ambition was a platform where any procurement workflow, at any company, could be served by an agent that understood that company's specific context, policies, and risk tolerances. Not an approximation, and not a general-purpose tool with a thin configuration layer, but something that could read your legal playbook, reason against your supplier data, and surface exactly what your team needs at exactly the right moment in the workflow.

That's what Custom Agents are built for. And it turns out, the best way to get there was to stop thinking about the instructions and start thinking about the blocks.