# Oya Browser

> The browser built for AI agents. A real browser that agents control via MCP.

## Overview

Oya Browser is a desktop browser (Electron) that connects to the Oya Browser server via WebSocket. AI tools (Claude Desktop, Cursor, Windsurf, any MCP client) connect to the server via MCP to control the browser. Users run the browser on their machines with their real cookies, logins, and sessions.

The server is hosted at browser.oya.ai. Each user creates an API key from the dashboard and uses it to connect their browser.

Architecture:

- User's machine: Oya Browser app → WebSocket → Oya Server
- AI tool: MCP client → Oya Server → WebSocket → Browser

## Quickstart

1. Go to https://browser.oya.ai/dashboard and click Generate to create an API key
2. Download Oya Browser (macOS .dmg or Linux .AppImage)
3. macOS: after install run `xattr -cr /Applications/Oya\ Browser.app` (app is not notarized yet). To open multiple instances: `open -n "/Applications/Oya Browser.app" --args --user-data-dir=/tmp/oya-2`
4. Open the app, enter `wss://browser.oya.ai/ws` as server URL and paste your API key
5. Your browser appears in the dashboard. Connect AI tools via MCP

## MCP Endpoint

Each connected browser gets an MCP endpoint:

    https://browser.oya.ai/mcp/{BROWSER_ID}

MCP config for Cursor / Claude Desktop:

    {
      "mcpServers": {
        "oya-browser": {
          "url": "https://browser.oya.ai/mcp/BROWSER_ID",
          "transport": "streamable-http",
          "headers": {
            "Authorization": "Bearer YOUR_API_KEY"
          }
        }
      }
    }

## MCP Tools

### analyze_page

Analyzes the current page. Returns the full page as structured markdown with every interactive element numbered as [#id type "label"]. Includes viewport size, scroll position, visibility flags.

No parameters.

IMPORTANT: Always call analyze_page BEFORE click or type. Element IDs only exist after analysis and reset on every call. After navigating or clicking a link that changes the page, call analyze_page again.
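A client can enforce the analyze-before-act rule locally before it sends a click or type. The following is a minimal sketch only: `ElementIdGuard` and its method names are hypothetical and not part of Oya Browser or any MCP client library.

```python
class ElementIdGuard:
    """Tracks which element IDs are valid since the last analyze_page call.

    Hypothetical client-side helper; nothing here is provided by Oya Browser.
    """

    def __init__(self):
        self._valid_ids = set()

    def on_analyze(self, element_ids):
        # IDs reset on every analyze_page call, so replace rather than merge.
        self._valid_ids = set(element_ids)

    def on_page_change(self):
        # After a navigation (or a click that loads a new page) old IDs are invalid.
        self._valid_ids.clear()

    def require(self, element_id):
        # Refuse to act on an ID that the latest analysis did not report.
        if element_id not in self._valid_ids:
            raise LookupError(
                f"element {element_id} is stale or unknown; call analyze_page first"
            )
        return element_id


guard = ElementIdGuard()
guard.on_analyze([5, 9, 12])   # IDs reported by the latest analyze_page result
guard.require(5)               # safe to click(5)
guard.on_page_change()         # e.g. after navigate; IDs must be refreshed
```

Calling `guard.require(5)` after `on_page_change()` raises `LookupError`, which is the point at which the agent should call analyze_page again.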
analyze_page returns:

- Page metadata (url, title, viewport, scroll position)
- Full page content as markdown with inline element annotations
- Element index grouped by visible/off-screen

### navigate

Navigate the browser to a URL.

Parameters: url (string, required)

### click

Click an interactive element by its ID number from analyze_page.

Parameters: element_id (number, required)

### type

Type text into an input element. Clears existing content first, then types character by character.

Parameters: element_id (number, required), text (string, required)

### press_key

Press a keyboard key.

Parameters: key (string, required): "Enter", "Escape", "Tab", "Backspace", "ArrowDown", "ArrowUp", or any character

### screenshot

Capture the visible tab as a base64 PNG image.

No parameters.

### scroll

Scroll the page up or down.

Parameters: direction ("up" or "down", required), amount (number, optional, default 500)

### list_tabs

List all open tabs with ID, title, URL, and which is active.

No parameters.

### open_tab

Open a new browser tab.

Parameters: url (string, optional)

### switch_tab

Switch to a different tab.

Parameters: tab_id (number, required)

### close_tab

Close a tab. Closes the active tab if no tab_id is specified.

Parameters: tab_id (number, optional)

### wait

Wait for an element matching a CSS selector to appear.

Parameters: selector (string, required), timeout (number, optional, default 10000 ms)

### read_elements

List interactive elements on the page. Lighter than analyze_page.

Parameters: selector (string, optional), limit (number, optional, default 50)

## Workflow Pattern

1. analyze_page → understand the page, get element IDs
2. Act: click(element_id), type(element_id, text), press_key("Enter"), scroll("down")
3. If page changed → analyze_page again (old IDs are invalid)
4. Repeat until task is done

## Element Annotation Format

analyze_page returns elements inline in the markdown:

    [#5 button "Submit"]
    [#9 input:email placeholder="you@example.com" required]
    [#12 link "Settings" → /settings]
    [#8 ☑ "Remember me"]

Element IDs are real attributes (data-ac-id) on the DOM: click(5) resolves via querySelector('[data-ac-id="5"]').

## Browser Pool (scale mode)

Run hundreds or thousands of browsers as a single pool with round-robin dispatch and automatic cookie sync.

### Setup

Set the FLEET_TOKEN env var on the server. All browsers use that token as their API key. They auto-join the pool.

### Pool MCP Endpoint

    POST /mcp/pool
    Authorization: Bearer FLEET_TOKEN

Same tools as per-browser MCP. navigate and analyze_page advance the round-robin. click/type/screenshot stay pinned to the last-used browser so element IDs remain valid.

Additional tool: pool_status, which shows pool size and connected browsers.

### Cookie Sync

Cookies are scoped to the API key. Browsers sharing a key share cookies; different API keys are fully isolated from each other.

- On connect: browser sends its cookies, receives the jar for its API key
- On change: cookie changes are broadcast to other browsers with the same key
- Login once with key K → every browser using K gets the session; other keys are unaffected

### Pool REST API

| Method | Path | Description |
| --- | --- | --- |
| GET | /pool | Pool status (size + browser list) |
| POST | /pool/command | Round-robin command dispatch |
| GET | /pool/cookies | View shared cookie jar |
| DELETE | /pool/cookies | Clear shared cookie jar |
| POST | /fleet/provision?count=N | Batch-generate N API keys (admin only) |

### Pool MCP Config (for AI clients)

    {
      "mcpServers": {
        "oya-pool": {
          "url": "https://browser.oya.ai/mcp/pool",
          "transport": "streamable-http",
          "headers": {
            "Authorization": "Bearer FLEET_TOKEN"
          }
        }
      }
    }

## REST API

All endpoints require an Authorization: Bearer API_KEY header (except /health). API keys are created via the dashboard after sign-in (POST /auth/keys).
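A minimal sketch of attaching the Authorization header using only Python's standard library. The helper name `build_request` is hypothetical, and sending JSON bodies with a `Content-Type: application/json` header is an assumption rather than something this document specifies. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://browser.oya.ai"  # hosted server named in this document


def build_request(method, path, api_key=None, body=None):
    """Build (but do not send) a request. Hypothetical helper, not an Oya API."""
    headers = {}
    data = None
    if api_key is not None:
        # Every endpoint except /health expects a Bearer API key.
        headers["Authorization"] = f"Bearer {api_key}"
    if body is not None:
        # Assumption: JSON bodies are sent as application/json.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


health = build_request("GET", "/health")                      # no key needed
browsers = build_request("GET", "/browsers", "YOUR_API_KEY")  # key required
```

Passing either object to `urllib.request.urlopen` would perform the call; keeping construction separate makes the header logic easy to check without network access.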
Interactive API testing (Swagger UI): GET /swagger

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Server status + browser count |
| POST | /auth/signup | Create user account |
| POST | /auth/login | Sign in → access_token |
| POST | /auth/keys | Create API key (user JWT required) |
| GET | /browsers | List connected browsers (scoped to your key) |
| POST | /browsers/:id/command | Send command (body: { "action": "...", "params": {} }) |
| POST | /browsers/:id/chat | Chat with LLM (body: { "messages": [...] }) |
| GET | /live/:id?key=... | SSE live view frame stream |
| GET | /mcp/:id | MCP Streamable HTTP endpoint |
| POST | /mcp/:id | MCP Streamable HTTP endpoint |
| POST | /mcp/pool | Pool MCP endpoint (round-robin) |
| GET | /pool | Pool status |
| POST | /pool/command | Pool round-robin command |
| GET | /pool/cookies | Per-key cookie jar (admin sees all) |
| POST | /fleet/provision?count=N | Batch-generate API keys (admin) |
| GET | /config | Get server settings (admin) |
| POST | /config | Update server settings (admin) |

### Command API: POST /browsers/:id/command

Each action uses only specific params. Send { "action": "...", "params": { ... } }.

Navigation actions:

| Action | Params | Description |
| --- | --- | --- |
| navigate | url (required) | Navigate to a URL |
| open_tab | url (optional) | Open a new tab |
| switch_tab | tab_id (required) | Activate a tab by ID |
| close_tab | tab_id (optional) | Close a tab (defaults to active) |
| list_tabs | none | List all open tabs |

Page analysis actions:

| Action | Params | Description |
| --- | --- | --- |
| analyze | none | Full page as markdown + numbered elements |
| read_page | selector, limit (default 50) | Lightweight element listing |
| screenshot | none | Capture page as PNG |

Interaction actions:

| Action | Params | Description |
| --- | --- | --- |
| click | selector (e.g. [data-ac-id="3"]) | Click an element |
| type | selector + text | Type into an input |
| press_key | key (Enter, Tab, Escape, etc.) | Press a keyboard key |
| scroll | direction (up/down), amount (default 500) | Scroll the page |
| wait | selector, timeout (ms, default 10000) | Wait for element to appear |

Examples:

    { "action": "navigate", "params": { "url": "https://google.com" } }
    { "action": "analyze" }
    { "action": "click", "params": { "selector": "[data-ac-id=\"3\"]" } }
    { "action": "type", "params": { "selector": "[data-ac-id=\"9\"]", "text": "hello" } }
    { "action": "press_key", "params": { "key": "Enter" } }
    { "action": "scroll", "params": { "direction": "down", "amount": 500 } }
    { "action": "screenshot" }
    { "action": "list_tabs" }
    { "action": "open_tab", "params": { "url": "https://gmail.com" } }
    { "action": "switch_tab", "params": { "tab_id": 2 } }
    { "action": "close_tab", "params": { "tab_id": 3 } }
    { "action": "wait", "params": { "selector": ".results", "timeout": 10000 } }
    { "action": "read_page", "params": { "limit": 20 } }

## WebSocket Protocol

Browsers connect via WebSocket at wss://browser.oya.ai/ws

Auth:

    { "type": "auth", "api_key": "...", "browser_id": "...", "browser_name": "..." }

Response:

    { "type": "auth_ok", "browser_id": "..." }

Commands (server → browser):

    { "type": "cmd", "id": "uuid", "action": "analyze", "params": {} }

Results (browser → server):

    { "type": "cmd_result", "id": "uuid", "ok": true, "data": { ... } }

Ping/pong: Both sides send { "type": "ping" } every 15-20s and respond to a ping with { "type": "pong" }.

Live stream:

    { "type": "stream_start", "fps": 2 }
    { "type": "frame", "data": "data:image/jpeg;base64,..." }
    { "type": "stream_stop" }

## Key Scoping

Each API key only sees browsers connected with that key. Users cannot see or control other users' browsers. Admin keys (set via API_KEYS env var) can see all browsers.

## OpenAPI Schema

Machine-readable OpenAPI 3.0 spec: GET /openapi.json
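Tying the element annotation format to the Command API: the MCP tools take an element_id, while the REST click and type actions take a selector. The sketch below, under stated assumptions, pulls IDs out of analyze output and turns them into command bodies. All function names are hypothetical, and the regex assumes labels never contain a closing bracket, which this document does not guarantee.

```python
import json
import re

# Matches annotations such as [#5 button "Submit"].
# Assumption: the text inside an annotation never contains "]".
ANNOTATION = re.compile(r'\[#(\d+)\s+(\S+)([^\]]*)\]')


def find_elements(markdown):
    """Return (id, type, rest) tuples for each annotation found in the text."""
    return [
        (int(m.group(1)), m.group(2), m.group(3).strip())
        for m in ANNOTATION.finditer(markdown)
    ]


def selector_for(element_id):
    # Element IDs are data-ac-id attributes on the DOM.
    return f'[data-ac-id="{element_id}"]'


def click_command(element_id):
    """Body for POST /browsers/:id/command that clicks the given element."""
    return {"action": "click", "params": {"selector": selector_for(element_id)}}


def type_command(element_id, text):
    """Body for POST /browsers/:id/command that types into the given element."""
    return {
        "action": "type",
        "params": {"selector": selector_for(element_id), "text": text},
    }


sample = '[#5 button "Submit"] [#9 input:email placeholder="you@example.com" required]'
elements = find_elements(sample)
print(elements)
print(json.dumps(click_command(elements[0][0])))
```

Serializing with `json.dumps` handles the quote escaping inside the selector, so the result has the same shape as the click example in the Command API section.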