
WebMCP AI Agents: How to Build Agents That Use WebMCP Tools

Mukul Dutt
12 min read

Most WebMCP content tells you how to add tools to your website. How to register a search function, how to expose your product catalog. How to make forms agent-discoverable.

But there's another side to this story that almost nobody is writing about: the agent side. Who's building the AI agents that actually discover and use those tools? And how does that work in practice?

If you want to build AI agents that use WebMCP, you need to understand tool discovery, schema parsing, execution patterns, and how to handle the messy reality of tools that vary wildly across websites. This guide covers all of it, from the basic architecture to advanced multi-site agent workflows.

The agent-builder perspective matters because it directly shapes how website owners should design their tools. If you understand how agents consume tools, you'll build better tools. And if you're building agents, you'll understand why some websites are easy to work with and others are a nightmare.

The agent-side architecture

How AI agents discover WebMCP tools

When your agent navigates to a webpage, the first question it needs to answer is: what can I do here?

That's where navigator.modelContext comes in. This browser API is the agent's entry point to every WebMCP tool on the page. Calling navigator.modelContext.tools() returns an array of tool objects, each with a name, description, and input schema. The agent doesn't need to guess what's available. The page tells it.

Here's what that looks like in practice:

// Agent lands on a page and discovers tools
const tools = await navigator.modelContext.tools();
// Returns something like:
// [
//   { name: "search-products", description: "Search products by keyword...", inputSchema: {...} },
//   { name: "get-pricing", description: "Get pricing for a specific plan...", inputSchema: {...} },
//   { name: "contact-support", description: "Submit a support request...", inputSchema: {...} }
// ]

That's the runtime discovery path. Your agent visits a page and reads what's available in real time.

But there's also manifest-based discovery for situations where you want to know what tools a site offers before navigating there. Some websites publish a WebMCP manifest file (similar to a robots.txt or sitemap.xml) that lists their available tools. Your agent can fetch this manifest, evaluate the tools, and decide whether it's worth navigating to the page at all.
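As a sketch of that pre-navigation triage, the snippet below fetches and filters a hypothetical manifest. The `/.well-known/webmcp.json` path and the manifest shape are assumptions for illustration; no manifest location is standardized yet.

```javascript
// Sketch: pre-navigation triage against a hypothetical WebMCP manifest.
// The /.well-known/webmcp.json path and the { tools: [...] } shape are
// assumptions, not part of any spec.
async function fetchManifest(origin, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${origin}/.well-known/webmcp.json`);
    if (!res.ok) return null;
    return await res.json();
  } catch {
    return null; // No manifest: fall back to runtime discovery
  }
}

// Decide whether a site is worth navigating to for a given task keyword
function manifestOffers(manifest, keyword) {
  if (!manifest || !Array.isArray(manifest.tools)) return false;
  const k = keyword.toLowerCase();
  return manifest.tools.some(t =>
    `${t.name} ${t.description}`.toLowerCase().includes(k)
  );
}
```

If the manifest is missing or malformed, the agent simply navigates and falls back to runtime discovery, so the check is cheap to add.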

The third approach is using a browser extension as the agent runtime. Extensions like the MCP-B connector sit between the browser and your agent's LLM backbone, forwarding tool registrations to the AI model. This is how Claude and other chat-based agents can consume WebMCP tools without needing to run inside the page's JavaScript context directly.

Tool selection: how agents choose which tools to call

Discovering tools is the easy part. The harder question is: which tool should my agent call for this particular user request?

This is where schema analysis and intent matching come in. Your agent has a user's request ("find me running shoes under $100") and a list of available tools. It needs to figure out which tool matches the intent, what parameters to pass, and how confident it is in the match.

The tool's description field is the most important signal. A well-written description like "Search products by keyword, with optional price and category filters" gives the agent clear context. A vague description like "Search stuff" forces the agent to guess.

After description matching, the agent examines the inputSchema. This tells it what parameters the tool accepts, what types they expect, and which ones are required. The agent uses this to construct a valid call:

// Agent analyzes schema and builds the call
const tool = tools.find(t => t.name === "search-products");
// tool.inputSchema shows: { query: string (required), maxPrice: number (optional), category: string (optional) }

const result = await navigator.modelContext.callTool("search-products", {
  query: "running shoes",
  maxPrice: 100
});

Confidence scoring matters when multiple tools could match a request. If a page has both search-products and search-articles, the agent needs to pick the right one. Most agent frameworks handle this by passing the tool list to the LLM and letting it reason about which tool fits the user's intent. The LLM reads the descriptions and schemas, then makes a selection.

When confidence is low, a good agent asks the user for clarification rather than guessing wrong.
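A crude lexical score can serve as a sanity check alongside LLM-based selection. This sketch is illustrative only: the 0.3 threshold and word-overlap heuristic are arbitrary choices, not a standard.

```javascript
// Sketch: keyword-overlap confidence score. A real agent would lean on the
// LLM's reasoning; this is a cheap cross-check and tie-breaker.
function scoreTool(tool, request) {
  const words = request.toLowerCase().split(/\W+/).filter(w => w.length > 2);
  const text = `${tool.name} ${tool.description}`.toLowerCase();
  const hits = words.filter(w => text.includes(w)).length;
  return words.length ? hits / words.length : 0;
}

function pickTool(tools, request, threshold = 0.3) {
  const ranked = tools
    .map(t => ({ tool: t, score: scoreTool(t, request) }))
    .sort((a, b) => b.score - a.score);
  const best = ranked[0];
  // Low confidence: return null so the agent asks the user instead of guessing
  return best && best.score >= threshold ? best.tool : null;
}
```

Returning `null` on low confidence is the important part: it gives the agent a concrete trigger for asking the user rather than calling the wrong tool.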

Building a basic WebMCP agent

Agent architecture overview

A WebMCP agent has two main components: an LLM backbone that handles reasoning and intent matching, and a browser runtime that handles tool discovery and execution.

The LLM backbone can be any capable model: GPT-4, Claude, Gemini, or an open-source model like Llama. Its job is to understand what the user wants, evaluate available tools, decide which one to call, and interpret the results. The model never touches the browser directly. It works through a structured interface.

The browser runtime is where WebMCP tools actually live. Your agent needs a real browser context (Chromium with navigator.modelContext support) to discover and call tools. This can be a headed browser the user sees, a headless browser running in the background, or a browser extension that bridges the gap.

The agent loop looks like this:

  1. Receive user request ("Find the cheapest flight to Tokyo next week")
  2. Navigate to relevant websites (airline sites, travel aggregators)
  3. Discover tools on each page via navigator.modelContext.tools()
  4. Match user intent to available tools using the LLM
  5. Execute the matched tool with appropriate parameters
  6. Process the response and present results to the user
  7. Repeat if the task requires multiple sites or follow-up actions

The key insight here is that the LLM and the browser have separate responsibilities. The LLM reasons about what to do. The browser does it. Keeping these concerns separate makes your agent architecture cleaner and easier to debug.
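The loop above can be sketched as a single turn, with the browser runtime and the LLM injected as separate dependencies. All names here (`browserRuntime`, `matchIntent`, and so on) are illustrative, not a fixed API.

```javascript
// Sketch of one agent turn. `browserRuntime` and `llm` are injected so the
// LLM's reasoning and the browser's execution stay cleanly separated.
async function runAgentTurn(userRequest, url, { browserRuntime, llm }) {
  await browserRuntime.navigate(url);                       // step 2: go to the site
  const tools = await browserRuntime.discoverTools();       // step 3: discover tools
  const choice = await llm.matchIntent(userRequest, tools); // step 4: match intent
  if (!choice) return { status: "no-match", tools };        // ask the user instead
  const result = await browserRuntime.callTool(choice.tool, choice.params); // step 5
  return { status: "ok", tool: choice.tool, result };       // step 6: report back
}
```

Because both sides are injected, you can unit-test the loop with stubs before wiring in a real browser or model.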

Implementing tool discovery

Let me walk through a concrete implementation. Your agent receives a request and needs to find relevant tools on a website.

First, navigate to the target page. If you're using Playwright or Puppeteer as your browser runtime (yes, they're still useful for driving the browser, even if WebMCP replaces the clicking part), this is straightforward:

const { chromium } = require('playwright');

async function discoverTools(url) {
  const browser = await chromium.launch({ channel: 'chrome-canary' });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle' });

  // Discover WebMCP tools
  const tools = await page.evaluate(async () => {
    if (!navigator.modelContext) return [];
    // tools() returns a promise, so await it before mapping
    const available = await navigator.modelContext.tools();
    return available.map(tool => ({
      name: tool.name,
      description: tool.description,
      schema: tool.inputSchema
    }));
  });

  return { page, browser, tools };
}

Once you have the tools, pass them to your LLM for intent matching. The LLM sees the tool descriptions and schemas and decides which one to call:

async function matchToolToIntent(tools, userRequest, llm) {
  const prompt = `Given these available tools:
${tools.map(t => `- ${t.name}: ${t.description}\n  schema: ${JSON.stringify(t.schema)}`).join('\n')}

User request: "${userRequest}"

Which tool should I call, and with what parameters?
Respond with JSON only: { "tool": "name", "params": {...} }`;

  const response = await llm.complete(prompt);
  return JSON.parse(response); // In production, guard this parse; models sometimes wrap JSON in prose
}

Then execute the tool call through the browser:

async function executeTool(page, toolName, params) {
  return await page.evaluate(async ({ name, params }) => {
    return await navigator.modelContext.callTool(name, params);
  }, { name: toolName, params });
}

That's a minimal but functional WebMCP agent. Discover tools, match intent, execute, return results.

Advanced agent patterns

Multi-site agent workflows

The real power of WebMCP agents shows up when they work across multiple websites.

Say a user asks: "Find me the best price for a Sony WH-1000XM5 headphone." A multi-site agent would navigate to several electronics retailers, discover their search-products tools, call each one with the same query, and aggregate the results for comparison.

Without WebMCP, this kind of cross-site comparison requires custom scraping logic for each retailer. Different DOM structures, different selectors, entirely different page layouts. With WebMCP, every site that has a product search tool exposes it through the same navigator.modelContext interface. The agent's code works the same regardless of whether it's talking to Amazon, Best Buy, or a small independent shop.

Here's what multi-site execution looks like:

async function compareAcrossSites(urls, query) {
  const results = [];

  for (const url of urls) {
    const { page, browser, tools } = await discoverTools(url);
    try {
      const searchTool = tools.find(t =>
        t.description.toLowerCase().includes('search') &&
        t.description.toLowerCase().includes('product')
      );

      if (searchTool) {
        const data = await executeTool(page, searchTool.name, { query });
        results.push({ site: url, products: data });
      }
    } finally {
      await browser.close(); // Always release the browser, even if a tool call throws
    }
  }

  return results;
}

The speed advantage is substantial here. Each tool call completes in roughly the time of the underlying request, instead of the seconds of DOM-driving and waiting that browser automation requires. A five-site comparison that might take a minute with Playwright can finish in a fraction of that time with WebMCP tools.

Error handling and recovery

Real-world WebMCP tool calls fail. Sites go down. Tools return unexpected data. Schemas change between visits. Your agent needs to handle all of this gracefully.

Tool execution failures are the most common issue. The tool might throw an error, return null, or time out. Wrap every tool call in error handling and decide whether to retry, skip, or fall back:

async function safeToolCall(page, toolName, params, retries = 2) {
  for (let i = 0; i <= retries; i++) {
    try {
      const result = await executeTool(page, toolName, params);
      if (result && !result.error) return result;
      if (i < retries) await new Promise(r => setTimeout(r, 1000));
    } catch (err) {
      if (i === retries) return { error: err.message, fallback: true };
    }
  }
  // All attempts returned empty or error results without throwing
  return { error: 'tool returned no usable result', fallback: true };
}

Schema validation errors happen when your agent sends parameters that don't match what the tool expects. Check the inputSchema before calling. If a required parameter is missing or a type is wrong, fix it before sending the call rather than letting it fail.
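A minimal pre-flight check might look like the sketch below, assuming a JSON-Schema-style inputSchema with `required` and `properties`. Real schemas can be much richer; this only covers missing required fields and primitive type mismatches.

```javascript
// Sketch: validate params against a JSON-Schema-style inputSchema before
// calling the tool. Returns a list of problems; empty means safe to send.
function validateParams(schema, params) {
  const errors = [];
  for (const field of schema.required || []) {
    if (!(field in params)) errors.push(`missing required param: ${field}`);
  }
  for (const [key, value] of Object.entries(params)) {
    const spec = (schema.properties || {})[key];
    if (!spec || !spec.type) continue;
    // JSON Schema "integer" maps to JavaScript's "number"
    const expected = spec.type === "integer" ? "number" : spec.type;
    if (typeof value !== expected) {
      errors.push(`param ${key}: expected ${spec.type}, got ${typeof value}`);
    }
  }
  return errors;
}
```

Running this before every call turns a cryptic tool-side failure into an error your agent can repair (often by re-prompting the LLM with the validation messages).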

And sometimes the best fallback is browser automation. If a site's WebMCP tools are broken or missing, your agent should be able to drop back to Playwright-style interaction. This hybrid approach means your agent works everywhere, just faster and more reliably on sites with working WebMCP tools.
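The hybrid fallback can be expressed as a small wrapper. Both callables are injected here for illustration: in practice `webmcpCall` would wrap a navigator.modelContext call and `automationFallback` would wrap Playwright-style interaction.

```javascript
// Sketch: prefer the WebMCP tool call; fall back to browser automation if it
// throws or returns an error result. Both callables are supplied by the caller.
async function hybridExecute(webmcpCall, automationFallback) {
  try {
    const result = await webmcpCall();
    if (result && !result.error) return { via: "webmcp", result };
  } catch {
    // Fall through to automation
  }
  const result = await automationFallback();
  return { via: "automation", result };
}
```

Tagging the result with `via` also gives you a cheap metric: the fraction of tasks served through WebMCP versus automation tells you how the ecosystem is maturing for your use cases.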

Which AI platforms support WebMCP agents?

Current platform support

The platform landscape for WebMCP agent development is still forming, but the pieces are falling into place.

Chrome's built-in AI integration is the most direct path. Chrome 146 Canary has native navigator.modelContext support behind a flag. Google is building Gemini's agent capabilities to work directly with this API, meaning Chrome itself becomes the agent runtime. Your agent code runs in the browser, discovers tools on whatever page the user visits, and calls them through the native API.

For ChatGPT and Claude, the path runs through browser extensions and MCP bridges. Claude's MCP ecosystem already understands tool schemas, so bridging from WebMCP's browser-side tools to Claude's tool consumption is architecturally straightforward. ChatGPT's browsing mode doesn't support WebMCP natively yet, but OpenAI's trajectory toward deeper browser integration suggests this is a matter of when, not whether.

Open-source agent frameworks like LangChain, CrewAI, and AutoGen are adding WebMCP support through community plugins. These frameworks already handle tool discovery and execution as core concepts. WebMCP is just another tool source, and the integration patterns map cleanly onto existing framework abstractions.

Building platform-agnostic agents

The smartest architecture for a WebMCP agent abstracts the LLM backend entirely.

Your tool discovery and execution code stays the same regardless of which LLM powers the reasoning. The browser interacts with navigator.modelContext the same way whether GPT-4 or Claude is making decisions about which tools to call. Only the intent-matching prompt changes.

This means you can swap LLM backends without touching your WebMCP integration code. Test with Claude for development and deploy with GPT-4 for production. Switch to an open-source model later if costs matter. The WebMCP layer doesn't care.

The tool schema acts as the universal interface between your agent and any website. As long as the site exposes tools with clear descriptions and typed schemas, your agent works with it. As long as your LLM can reason about tool selection, the backend works too. This decoupling is one of the strongest architectural benefits of building on WebMCP.
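Concretely, the decoupling can be as simple as depending on one method. This sketch assumes any backend adapter exposes `complete(prompt)`; the names are illustrative, not a fixed interface.

```javascript
// Sketch: the agent's WebMCP code depends only on an object exposing
// `complete(prompt)`, so LLM backends are swappable without touching discovery
// or execution. Names here are illustrative.
function createIntentMatcher(llm) {
  return async function matchIntent(userRequest, tools) {
    const prompt = [
      "Available tools:",
      ...tools.map(t => `- ${t.name}: ${t.description}`),
      `User request: "${userRequest}"`,
      'Respond with JSON only: { "tool": "name", "params": {} }'
    ].join("\n");
    const raw = await llm.complete(prompt);
    try {
      return JSON.parse(raw);
    } catch {
      return null; // Unparseable reply: caller can retry or ask the user
    }
  };
}
```

Swapping Claude for GPT-4 then means writing one new adapter object, not rewriting the agent.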

What this means for agent builders

The WebMCP agent ecosystem is early, and that's exactly why building now matters.

Most agent frameworks still default to browser automation or custom API integrations. Building WebMCP consumption into your agent gives it an advantage on every site that supports the protocol. And that list is growing.

If you're a website owner reading this from the other side, take note: the quality of your tool descriptions directly impacts how well agents can use your tools. Vague descriptions produce confused agents. Clear, specific descriptions with well-typed schemas make your tools easy to discover and call correctly.

The agent side and the tool side are two halves of the same system. Building one well requires understanding the other.

Frequently asked questions

Can I build a WebMCP agent without a browser extension?

Yes, if you're targeting Chrome 146 or later. The navigator.modelContext API is built into the browser natively (behind a flag in Canary, expected to ship enabled by default later in 2026). For older browsers, you'll need either the WebMCP polyfill loaded on the target page or a browser extension like MCP-B that injects WebMCP support. Your agent code stays the same either way.

How do agents handle authentication on WebMCP sites?

Sites can register auth-gated tools that only appear when the user is logged in. Your agent inherits the browser's session state, so if the user has authenticated with the website in their browser, the agent sees the authenticated tools. For headless agent setups, you'd need to handle cookie injection or OAuth flows before tool discovery. The tools themselves don't manage authentication. That's the browser's job.

What happens if a website's tools are poorly designed?

Your agent struggles. Vague tool descriptions make intent matching unreliable. Missing schema definitions mean your agent can't validate parameters before calling. Overly broad tools that try to do too much return noisy results. The best mitigation is confidence scoring: if your agent isn't confident about a tool match, it should ask the user for guidance rather than calling a tool it doesn't understand.

How is this different from building MCP server integrations?

MCP servers expose tools through a backend protocol. WebMCP exposes tools through the browser. An MCP server integration requires the website to run a server-side process. WebMCP tools run in the browser's JavaScript context and require no server infrastructure. The agent-side consumption pattern is similar (discover tools, match intent, call tool), but the runtime environment is different. Many agents will eventually support both.

What's the minimum viable WebMCP agent?

About 50 lines of JavaScript. You need Playwright (or similar) to launch a browser, page.evaluate() to call navigator.modelContext.tools() for discovery, a simple LLM call for intent matching, and another page.evaluate() to execute the matched tool. The code examples in this article form a working foundation. Add error handling and multi-site support, and you have a production-capable agent.
