Documentation
Everything you need to connect your browser to AI agents via Oya Browser.
Quickstart
- Go to the dashboard and click Generate to create an API key
- Download Oya Browser for your OS
- Open the app, enter wss://browser.oya.ai/ws as the server URL, and paste your API key
- Your browser appears in the dashboard; you can now send commands or connect AI tools
Create API Key
Go to the dashboard. Click the Generate button next to the API key field. This creates a random key and registers it with the server.
Your key is scoped — you only see browsers connected with your key. Other users' browsers are invisible to you.
Download Browser
| Platform | Download |
|---|---|
| macOS (Intel + Apple Silicon) | Oya Browser.dmg |
| Linux (arm64) | Oya Browser.AppImage |
macOS: Open the .dmg, drag to Applications. On first launch, macOS may block the app because it's not notarized. Fix:
xattr -cr /Applications/Oya\ Browser.app
Or: right-click the app → Open → Open (bypasses Gatekeeper once).
Linux: chmod +x the AppImage and run it.
Running multiple instances
To open multiple browser windows (e.g. different accounts or different API keys):
# macOS — open another instance
open -n "/Applications/Oya Browser.app"
# With separate sessions (own cookies, own config)
open -n "/Applications/Oya Browser.app" --args --user-data-dir=/tmp/oya-2
open -n "/Applications/Oya Browser.app" --args --user-data-dir=/tmp/oya-3
# Linux
./Oya-Browser.AppImage --user-data-dir=/tmp/oya-2
Each --user-data-dir gets its own cookies, logins, and config: fully isolated sessions.
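The launch commands above can also be generated from a script when several isolated instances are needed. The following is a minimal sketch, not part of Oya Browser: the function name `launch_command` is hypothetical, and the AppImage path is assumed to be relative to the current directory as in the Linux example.

```python
import sys


def launch_command(profile_dir: str, platform: str = sys.platform) -> list[str]:
    """Build the argv for one isolated Oya Browser instance (hypothetical helper)."""
    flag = f"--user-data-dir={profile_dir}"
    if platform == "darwin":
        # `open -n` forces a new instance; `--args` forwards the flag to the app.
        return ["open", "-n", "/Applications/Oya Browser.app", "--args", flag]
    # Linux: assumes the AppImage sits in the current directory.
    return ["./Oya-Browser.AppImage", flag]


# One command per isolated session; each gets its own data directory.
commands = [launch_command(f"/tmp/oya-{n}", "darwin") for n in (2, 3)]
```

Each list can be passed to `subprocess.run(cmd, check=True)` to start the instance.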
Connect
Open Oya Browser. The setup screen appears on first launch.
| Field | Value |
|---|---|
| Server URL | wss://browser.oya.ai/ws |
| API Key | The key you generated in the dashboard |
| Browser Name | Optional — how it shows in the dashboard |
Click Connect. The green dot in the toolbar confirms the connection. Your browser now appears in the dashboard.
MCP Setup
Oya Browser exposes each connected browser as an MCP server at:
https://browser.oya.ai/mcp/{BROWSER_ID}
Get your browser's ID from the dashboard (shown under each browser name, or in the MCP Tools tab).
Cursor
Add to .cursor/mcp.json in your project:
{
  "mcpServers": {
    "oya-browser": {
      "url": "https://browser.oya.ai/mcp/YOUR_BROWSER_ID",
      "transport": "streamable-http",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}
Claude Desktop
Add to Claude Desktop's MCP config (Settings → Developer → Edit Config):
{
  "mcpServers": {
    "oya-browser": {
      "url": "https://browser.oya.ai/mcp/YOUR_BROWSER_ID",
      "transport": "streamable-http",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}
Claude Code
Same config — add to your project's .claude/mcp.json or use the /browse skill command included in the repo.
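Since all three clients take the same JSON, the config can be generated rather than edited by hand. This is a small sketch under that assumption; the helper name `mcp_config` is hypothetical, and the output mirrors the structure shown for Cursor and Claude Desktop.

```python
import json


def mcp_config(browser_id: str, api_key: str, name: str = "oya-browser") -> dict:
    """Return the MCP client config for one connected browser (hypothetical helper)."""
    return {
        "mcpServers": {
            name: {
                "url": f"https://browser.oya.ai/mcp/{browser_id}",
                "transport": "streamable-http",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


# Print the JSON so it can be pasted into the client's MCP config file.
print(json.dumps(mcp_config("YOUR_BROWSER_ID", "YOUR_API_KEY"), indent=2))
```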
analyze_page
Analyzes the current page. Returns the full page as structured markdown with every interactive element numbered.
// No parameters
analyze_page()
Returns:
- Page metadata: URL, title, viewport size, scroll position
- Full page content as markdown with inline element annotations like [#5 button "Submit"]
- Element index: all elements listed with IDs, types, labels, visibility flags
Always call analyze_page before using click or type. Element IDs only exist after analysis and reset on every call.
navigate
Navigate the browser to a URL.
navigate({ url: "https://example.com" })
After navigating, call analyze_page again; old element IDs are invalid on the new page.
click
Click an interactive element by its ID number from analyze_page.
click({ element_id: 13 })
The element was tagged with data-ac-id="13" during analysis, so the click resolves via a single querySelector.
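To illustrate how element IDs connect the analysis output to a click, the sketch below extracts annotations of the form `[#5 button "Submit"]` from markdown and maps an ID to its `data-ac-id` selector. The regular expression is an assumption based on that single example; the real annotation grammar may allow other shapes. The names `find_elements` and `selector_for` are hypothetical.

```python
import re

# Assumed annotation shape: [#<id> <type> "<label>"], as in [#5 button "Submit"].
ANNOTATION = re.compile(r'\[#(\d+) (\w+) "([^"]*)"\]')


def find_elements(markdown: str) -> list[dict]:
    """Collect the annotated interactive elements found in analyze_page markdown."""
    return [
        {"element_id": int(num), "type": kind, "label": label}
        for num, kind, label in ANNOTATION.findall(markdown)
    ]


def selector_for(element_id: int) -> str:
    """CSS selector matching the data-ac-id attribute set during analysis."""
    return f'[data-ac-id="{element_id}"]'


sample = 'Search [#9 input "Query"] then [#13 button "Submit"]'
elements = find_elements(sample)
```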
type
Type text into an input element. Clears existing content first, then types character by character with realistic key events.
type({ element_id: 9, text: "hello world" })
press_key
Press a keyboard key. Useful for submitting forms (Enter), dismissing dialogs (Escape), or navigating (Tab, arrows).
press_key({ key: "Enter" })
Supported keys: Enter, Escape, Tab, Backspace, ArrowDown, ArrowUp, or any character.
screenshot
Capture the visible tab as a base64 PNG image.
screenshot()
scroll
Scroll the page up or down.
scroll({ direction: "down", amount: 500 })
| Param | Type | Description |
|---|---|---|
| direction | "up" \| "down" | Scroll direction |
| amount | number (optional) | Pixels to scroll, default 500 |
Tab Management
list_tabs
List all open tabs with ID, title, URL, and which is active.
list_tabs()
open_tab
Open a new tab, optionally at a URL.
open_tab({ url: "https://gmail.com" })
switch_tab
Switch to a tab by ID (from list_tabs).
switch_tab({ tab_id: 2 })
close_tab
Close a tab. Closes the active tab if no ID specified.
close_tab({ tab_id: 3 })
wait
Wait for an element matching a CSS selector to appear on the page.
wait({ selector: ".results", timeout: 10000 })
Anonymity
Create and manage browser profiles with unique fingerprints, proxy routing, and isolated cookie stores. Each profile is a complete identity — different canvas hash, WebGL renderer, navigator properties, and session storage. Switch identities with a single MCP call.
Fingerprint Spoofing
Each profile generates a coherent set of browser fingerprints that are internally consistent per platform. A Win32 profile gets Windows GPU strings, Windows fonts, and matching screen resolutions.
- Canvas: deterministic pixel noise on toDataURL and toBlob
- WebGL: spoofed vendor/renderer strings from real GPU database
- AudioContext: noise on OfflineAudioContext.startRendering
- ClientRects: sub-pixel noise on getBoundingClientRect (bypassed internally for click accuracy)
- Navigator: platform, hardwareConcurrency, deviceMemory, languages, vendor
- Screen: width, height, colorDepth, devicePixelRatio
- WebRTC: ICE candidates stripped to prevent local IP leak
- Fonts: platform-consistent font sets
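"Deterministic" noise means the same profile produces the same perturbation on every read, so the fingerprint stays stable within a profile but differs between profiles. The sketch below shows one generic way to get that property by hashing a per-profile seed with the pixel index. It is an illustration of the idea only, not Oya Browser's implementation; the seed format and the function names are assumptions.

```python
import hashlib


def pixel_noise(profile_seed: str, index: int) -> int:
    """Return -1, 0, or +1 for a channel value, fixed for a given seed and index."""
    digest = hashlib.sha256(f"{profile_seed}:{index}".encode("utf-8")).digest()
    return digest[0] % 3 - 1


def apply_noise(channels: list[int], profile_seed: str) -> list[int]:
    """Perturb 8-bit channel values by at most 1, clamped to the 0..255 range."""
    return [
        min(255, max(0, value + pixel_noise(profile_seed, i)))
        for i, value in enumerate(channels)
    ]


# The same seed always yields the same output, so repeated reads agree.
noisy = apply_noise([0, 128, 255, 17, 200], "profile-a1b2c3")
```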
Proxy Support
Each profile can include a SOCKS5 or HTTP/HTTPS proxy. The proxy is applied at the Electron session level — all traffic routes through it, including DNS (for SOCKS5). Timezone and locale auto-match the proxy's geographic location via CDP Emulation.
create_profile({
  platform: "Win32",
  timezone: "America/New_York",
  proxy_type: "socks5",
  proxy_host: "1.2.3.4",
  proxy_port: 1080,
  proxy_username: "user",
  proxy_password: "pass"
})
Anti-Detection Stealth
Always active — no configuration needed. The stealth layer removes automation indicators that anti-bot systems check for:
- navigator.webdriver removed
- Electron globals (window.process, window.require) deleted
- window.chrome fixed to match real Chrome (app, runtime, csi, loadTimes)
- navigator.plugins populated with PDF viewers
- navigator.permissions.query patched
- Sec-CH-UA headers rewritten to hide Electron
- Google telemetry domains blocked at the network level
list_profiles
List all available anonymity profiles on the connected browser. Shows which profile is active.
list_profiles()
Returns each profile's ID, platform, timezone, and whether it has a proxy configured.
create_profile
Create a new anonymity profile with a randomized browser fingerprint. All values are generated to be internally consistent for the chosen platform.
create_profile({
  platform: "Win32",
  timezone: "Europe/London",
  locale: "en-GB"
})
| Param | Type | Description |
|---|---|---|
| platform | string | Win32, MacIntel, or Linux x86_64 |
| timezone | string | IANA timezone (e.g. America/New_York) |
| locale | string | Locale (e.g. en-US, en-GB) |
| proxy_type | string | http or socks5 |
| proxy_host | string | Proxy server hostname or IP |
| proxy_port | number | Proxy server port |
| proxy_username | string | Proxy auth username |
| proxy_password | string | Proxy auth password |
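A client can check arguments against the table above before calling the tool. The following is a sketch of such a check; `create_profile_args` is a hypothetical helper, and since the table does not say which parameters are required, every parameter is treated as optional here.

```python
# Parameter names and types taken from the create_profile table.
PARAM_TYPES = {
    "platform": str, "timezone": str, "locale": str,
    "proxy_type": str, "proxy_host": str, "proxy_port": int,
    "proxy_username": str, "proxy_password": str,
}
PLATFORMS = {"Win32", "MacIntel", "Linux x86_64"}
PROXY_TYPES = {"http", "socks5"}


def create_profile_args(**params) -> dict:
    """Validate create_profile arguments and return them unchanged."""
    for name, value in params.items():
        expected = PARAM_TYPES.get(name)
        if expected is None:
            raise ValueError(f"unknown parameter: {name}")
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    if "platform" in params and params["platform"] not in PLATFORMS:
        raise ValueError(f"platform must be one of {sorted(PLATFORMS)}")
    if "proxy_type" in params and params["proxy_type"] not in PROXY_TYPES:
        raise ValueError(f"proxy_type must be one of {sorted(PROXY_TYPES)}")
    return params


args = create_profile_args(platform="Win32", timezone="Europe/London", locale="en-GB")
```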
set_profile
Switch to a different anonymity profile. This closes all open tabs and reopens the browser with the new profile's fingerprint, proxy, timezone, and isolated cookie store.
set_profile({ profile_id: "profile-a1b2c3" })
| Param | Type | Description |
|---|---|---|
| profile_id | string (required) | ID of the profile to activate |
Dashboard
The dashboard at /dashboard is the control panel. It shows your connected browsers and lets you interact with them.
- Generate key — click Generate in the API key bar to create a new key
- Browser list — shows all browsers connected with your key
- Commands tab — quick buttons for analyze, screenshot, scroll + input fields for navigate, click, type
- MCP Tools tab — shows the MCP endpoint URL, copy-paste config for Cursor/Claude, and a tool runner
Chat
Control the browser with natural language — available in both the web dashboard and the desktop app's dev panel. Type "go to google and search for cats" and the AI navigates, types, clicks, and reports back.
- Formatted markdown responses with bold, code, lists, and headings
- Tool call badges showing which MCP tools the AI used (analyze_page, click, type, etc.)
- Copy button on hover to copy any response
- Automatic context trimming when conversations get long
- Conversation history preserved across messages
Chat uses the OPENAI_API_KEY env var on the server. This is optional; you don't need it for MCP tools.
Dev Panel (Desktop App)
The desktop app's dev panel ({} button in the toolbar) has four tabs:
- Chat — natural language browser control with formatted responses and tool badges
- Actions — quick-fire buttons and input fields for every command: analyze, screenshot, navigate, click by element #, type, press keys, hover, scroll, wait, tab management
- Network — live WebSocket traffic with IN/OUT badges, expandable payloads, filter by direction or type (All, In, Out, Commands, Results)
- Source — view the page as AI sees it: toggle between Markdown (analyzePage output) and HTML source, refresh on demand
Live View
The Commands tab shows a live view of the browser below the command buttons. Frames are streamed as JPEG via SSE at ~2fps.
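A client reading the live view needs to split the SSE stream into events. The sketch below handles only the standard SSE framing (`data:` lines terminated by a blank line). What each event's data contains on this stream is not specified here, so the parser returns the raw payload as text; the function name `sse_data` is hypothetical.

```python
from typing import Iterable, Iterator


def sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a stream of lines."""
    buffer: list[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored.


events = list(sse_data(["data: a", "", ": keepalive", "data: b", "data: c", ""]))
```

At roughly 2 fps, a consumer can expect a new event about every half second.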
Settings
Click the gear icon next to the API key bar. Configure:
- OpenAI API Key — for the Chat feature
- Chat Model: default gpt-4o-mini
- Base URL: override for compatible APIs (Azure OpenAI, local LLMs, etc.)
Settings are saved on the server and persist across restarts.
REST API
All endpoints except /health and /register-key require an Authorization: Bearer YOUR_API_KEY header. Interactive API testing is available at /swagger.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Server status + browser count |
| POST | /register-key | Register a new API key ({ "key": "..." }) |
| GET | /browsers | List your connected browsers |
| POST | /browsers/:id/command | Send command ({ "action": "...", "params": {} }) |
| POST | /browsers/:id/chat | Chat ({ "messages": [...] }) |
| GET | /live/:id?key=... | SSE live view frame stream |
| GET/POST | /mcp/:id | MCP Streamable HTTP endpoint |
| GET | /config | Get server settings |
| POST | /config | Update server settings |
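As an illustration of the command endpoint, the sketch below builds an authenticated request with Python's standard library. It assumes the REST API is served from https://browser.oya.ai (the same host as the MCP endpoint) and that the body is sent as JSON with a `Content-Type: application/json` header; neither is stated explicitly above. The helper name `command_request` is hypothetical, and the response shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "https://browser.oya.ai"  # assumption: same host as the MCP endpoint


def command_request(browser_id: str, api_key: str, action: str, params=None):
    """Build (but do not send) a POST /browsers/:id/command request."""
    body = {"action": action}
    if params:
        body["params"] = params
    return urllib.request.Request(
        f"{BASE_URL}/browsers/{browser_id}/command",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption
        },
        method="POST",
    )


req = command_request(
    "YOUR_BROWSER_ID", "YOUR_API_KEY", "navigate", {"url": "https://example.com"}
)
```

To send it, pass the request to `urllib.request.urlopen(req)` and read the response body.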
Command API Reference
Send commands via POST /browsers/:id/command. Each action uses only specific params — the rest are ignored.
Navigation Actions
| Action | Params | Description |
|---|---|---|
| navigate | url (required) | Navigate to a URL |
| open_tab | url (optional) | Open a new tab |
| switch_tab | tab_id (required) | Activate a tab by ID |
| close_tab | tab_id (optional, defaults to active) | Close a tab |
| list_tabs | none | List all open tabs |
Page Analysis Actions
| Action | Params | Description |
|---|---|---|
| analyze | none | Full page as markdown + numbered elements |
| read_page | selector (optional), limit (default 50) | Lightweight element listing |
| screenshot | none | Capture page as PNG |
Interaction Actions
| Action | Params | Description |
|---|---|---|
| click | selector (e.g. [data-ac-id="3"]) | Click an element |
| type | selector + text | Type into an input |
| press_key | key (e.g. Enter, Tab, Escape) | Press a keyboard key |
| scroll | direction (up/down), amount (px, default 500) | Scroll the page |
| wait | selector, timeout (ms, default 10000) | Wait for element to appear |
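The action tables can be encoded as data so that command bodies are built consistently. This is a sketch, with `build_command` as a hypothetical helper. The tables mark only some params as required or optional; for the rest, the split below (for example, treating `selector` as required for click) is an assumption. Unused params are dropped client-side, mirroring the server ignoring them.

```python
# action -> (required params, optional params); partly assumed, see lead-in.
ACTIONS = {
    "navigate": ({"url"}, set()),
    "open_tab": (set(), {"url"}),
    "switch_tab": ({"tab_id"}, set()),
    "close_tab": (set(), {"tab_id"}),
    "list_tabs": (set(), set()),
    "analyze": (set(), set()),
    "read_page": (set(), {"selector", "limit"}),
    "screenshot": (set(), set()),
    "click": ({"selector"}, set()),
    "type": ({"selector", "text"}, set()),
    "press_key": ({"key"}, set()),
    "scroll": ({"direction"}, {"amount"}),
    "wait": ({"selector"}, {"timeout"}),
}


def build_command(action: str, **params) -> dict:
    """Build the JSON body for POST /browsers/:id/command."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    required, optional = ACTIONS[action]
    missing = required - set(params)
    if missing:
        raise ValueError(f"{action} is missing params: {sorted(missing)}")
    # Keep only the params this action uses.
    used = {k: v for k, v in params.items() if k in required | optional}
    command = {"action": action}
    if used:
        command["params"] = used
    return command


click_3 = build_command("click", selector='[data-ac-id="3"]')
```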
Examples
// Navigate to a page
{ "action": "navigate", "params": { "url": "https://google.com" } }
// Analyze current page (no params needed)
{ "action": "analyze" }
// Click element #3 from analyze results
{ "action": "click", "params": { "selector": "[data-ac-id=\"3\"]" } }
// Type into element #9
{ "action": "type", "params": { "selector": "[data-ac-id=\"9\"]", "text": "hello world" } }
// Press Enter
{ "action": "press_key", "params": { "key": "Enter" } }
// Scroll down
{ "action": "scroll", "params": { "direction": "down", "amount": 500 } }
// Screenshot (no params needed)
{ "action": "screenshot" }
// List all tabs
{ "action": "list_tabs" }
// Open new tab
{ "action": "open_tab", "params": { "url": "https://gmail.com" } }
// Switch to tab
{ "action": "switch_tab", "params": { "tab_id": 2 } }
// Close tab (omit tab_id to close active tab)
{ "action": "close_tab", "params": { "tab_id": 3 } }
// Wait for element
{ "action": "wait", "params": { "selector": ".results", "timeout": 10000 } }
// Read page elements (lightweight)
{ "action": "read_page", "params": { "limit": 20 } }
Typical Workflow
1. navigate → go to the page
2. analyze → understand the page, get element IDs
3. click / type / press_key / scroll → interact
4. analyze → re-analyze after page changes (old IDs are invalid)
5. repeat until task is done
WebSocket Protocol
Browsers connect via WebSocket at wss://browser.oya.ai/ws.
Auth
First message from browser:
{ "type": "auth", "api_key": "...", "browser_id": "...", "browser_name": "..." }
Server responds:
{ "type": "auth_ok", "browser_id": "..." }
Commands
Server → Browser:
{ "type": "cmd", "id": "uuid", "action": "analyze", "params": {} }
Browser → Server:
{ "type": "cmd_result", "id": "uuid", "ok": true, "data": { ... } }
Ping/Pong
Each side sends { "type": "ping" } every 15-20 seconds, and the other side replies with { "type": "pong" }.
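The browser side of the auth, command, and ping/pong messages can be sketched as a pure message handler, independent of any WebSocket library. The function names are hypothetical, and the failure shape (`"ok": false` with an error string inside `data`) is an assumption, since only the success case is shown above.

```python
import json
from typing import Callable, Optional


def auth_message(api_key: str, browser_id: str, browser_name: str) -> str:
    """First message the browser sends after the socket opens."""
    return json.dumps({
        "type": "auth",
        "api_key": api_key,
        "browser_id": browser_id,
        "browser_name": browser_name,
    })


def handle_message(raw: str, run_action: Callable[[str, dict], dict]) -> Optional[str]:
    """Return the reply for one incoming message, or None if no reply is needed."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "ping":
        return json.dumps({"type": "pong"})
    if kind == "cmd":
        try:
            data = run_action(msg["action"], msg.get("params", {}))
            reply = {"type": "cmd_result", "id": msg["id"], "ok": True, "data": data}
        except Exception as exc:
            # Assumed failure shape; the source only documents the success case.
            reply = {"type": "cmd_result", "id": msg["id"], "ok": False,
                     "data": {"error": str(exc)}}
        return json.dumps(reply)
    return None  # e.g. auth_ok or pong: nothing to send back
```

The `id` from the command is echoed in the result so the server can match replies to requests.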
Live Stream
Server → Browser: { "type": "stream_start", "fps": 2 }
Browser → Server: { "type": "frame", "data": "data:image/jpeg;base64,..." }
Server → Browser: { "type": "stream_stop" }
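On the receiving end, a frame message carries the JPEG as a base64 data URL. The sketch below decodes one frame to raw bytes and derives the frame interval from `stream_start`; the function names are hypothetical, and the check assumes every frame uses the `data:image/jpeg;base64,` prefix shown above.

```python
import base64

PREFIX = "data:image/jpeg;base64,"


def decode_frame(message: dict) -> bytes:
    """Return the JPEG bytes carried by a frame message."""
    if message.get("type") != "frame":
        raise ValueError("not a frame message")
    data = message["data"]
    if not data.startswith(PREFIX):
        raise ValueError("unexpected data URL prefix")
    return base64.b64decode(data[len(PREFIX):])


def frame_interval(stream_start: dict) -> float:
    """Seconds between frames for a stream_start message, e.g. 0.5 at 2 fps."""
    return 1.0 / stream_start["fps"]
```

The decoded bytes can be written straight to a `.jpg` file or handed to an image library.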