---
name: playwright-browser
description: Use when automating browser interactions, taking screenshots, scraping authenticated pages, or running multi-step web workflows. Triggers on "automate browser", "take a screenshot of", "run this in the browser", "Playwright", "browser automation", or when a task requires interacting with a web UI.
---

# Playwright browser automation

You have two ways to control browsers with Playwright. Pick whichever fits the task.

## Option 1: Playwright MCP server (simpler tasks)

The Playwright MCP server is installed globally. It gives you tools like `browser_navigate`, `browser_click`, `browser_type`, `browser_take_screenshot`, etc. Good for quick tasks: navigate to a page, click something, take a screenshot.

The MCP tools handle the browser lifecycle for you. Just call them directly.

If the MCP server isn't responding, the user can reinstall it:

```bash
claude mcp add -s user playwright -- npx @playwright/mcp@latest
```

## Option 2: Playwright scripts via CDP (complex automation)

For multi-step workflows, loops, conditional logic, or working in a browser session where the user is already logged in, write a Node.js script and run it with Bash.

### Connecting to an existing Chrome session

This is the most common pattern: the user needs to be logged into something, so they log in by hand in a Chrome window you can attach to. You connect to it without disturbing their session.
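The steps below boil down to: start Chrome with a debugging port, wait until its `/json/version` endpoint answers, then call `connectOverCDP`. A script that launches Chrome itself can reach the last step before Chrome is listening, so here is a minimal polling sketch. It assumes Node 18+ (for the global `fetch`) and the port 9223 used throughout; `waitForCdpEndpoint` is a hypothetical helper, not a Playwright API.

```javascript
// Hypothetical helper (not part of Playwright): poll Chrome's DevTools HTTP
// endpoint until it answers, then return its /json/version payload.
// Assumes Node 18+ for the global fetch().
async function waitForCdpEndpoint(
  endpoint = 'http://localhost:9223',
  { timeoutMs = 30000, intervalMs = 500 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  let lastError;
  while (Date.now() < deadline) {
    try {
      const res = await fetch(`${endpoint}/json/version`);
      // Chrome's payload includes fields such as "Browser" and "webSocketDebuggerUrl"
      if (res.ok) return await res.json();
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // typically "connection refused" while Chrome is still starting
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(
    `CDP endpoint ${endpoint} not reachable after ${timeoutMs} ms: ${lastError && lastError.message}`
  );
}
```

Usage inside an async function: `await waitForCdpEndpoint('http://localhost:9223');` followed by `await chromium.connectOverCDP('http://localhost:9223');`. Throwing with the last error, rather than returning `false`, makes a Chrome that never started fail loudly.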
**Launch a separate Chrome instance** (so you don't kill their tabs). Chrome only opens a debugging port when started with `--remote-debugging-port`, and recent versions (136+) ignore that flag for the default profile, so the instance also needs its own `--user-data-dir`:

```bash
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9223 \
  --user-data-dir=/tmp/chrome-playwright \
  --no-first-run \
  "https://example.com" &

# Linux
google-chrome \
  --remote-debugging-port=9223 \
  --user-data-dir=/tmp/chrome-playwright \
  --no-first-run \
  "https://example.com" &
```

The user logs in manually, then you connect:

```javascript
const { chromium } = require('playwright');

(async () => {
  // Attach to the already-running Chrome; this does not launch a new browser
  const browser = await chromium.connectOverCDP('http://localhost:9223');
  // The default context holds the user's tabs (see Gotchas if several are open)
  const page = browser.contexts()[0].pages()[0];
  console.log('Connected to', page.url());
})();
```

**Verify the connection** before running scripts:

```bash
curl -s http://localhost:9223/json/version
```

### Install Playwright (if needed)

```bash
cd /tmp && npm init -y && npm install playwright
```

Save your scripts in `/tmp` (or wherever you ran the install) so `require('playwright')` can find the package.

## Writing automation scripts

### Typing into contenteditable fields

Many modern web apps use contenteditable divs instead of regular inputs. Playwright's `type()` (now deprecated in favor of `pressSequentially()`) can be unreliable with these. Use clipboard paste instead:

```javascript
// Paste text at the current focus via the clipboard.
// Needs clipboard-write permission for the page (see Gotchas).
async function typeText(page, text) {
  await page.evaluate(async (t) => {
    await navigator.clipboard.writeText(t);
  }, text);
  // Use 'Meta+v' on Mac, 'Control+v' on Linux/Windows.
  // process.platform describes the machine running this script, which is also where Chrome runs here.
  const modifier = process.platform === 'darwin' ? 'Meta' : 'Control';
  await page.keyboard.press(`${modifier}+v`);
}
```

### Waiting for dynamic content

When the page is loading or generating content, poll for a visual indicator (a spinner, a "Stop" button) rather than using fixed waits:

```javascript
async function waitForCompletion(page, opts = {}) {
  const {
    indicator,          // selector that's visible while loading (required)
    timeoutMs = 120000,
    settleMs = 3000,    // how long the indicator must stay gone before we trust it
  } = opts;
  if (!indicator) throw new Error('waitForCompletion: opts.indicator is required');

  // .first() keeps the check well-defined when several elements match the selector
  const loc = page.locator(indicator).first();
  const isShowing = () => loc.isVisible().catch(() => false);

  const startTime = Date.now();
  await page.waitForTimeout(3000); // initial grace period so the indicator can appear

  while (Date.now() - startTime < timeoutMs) {
    if (!(await isShowing())) {
      await page.waitForTimeout(settleMs);
      if (!(await isShowing())) return true; // still gone after settling
    }
    await page.waitForTimeout(2000);
  }
  return false; // timed out
}
```

### Scrolling screenshots

When content is longer than the viewport, capture it in chunks:

```javascript
async function scrollingScreenshots(page, container, baseName, dir) {
  const el = page.locator(container).first();
  if (!(await el.count())) {
    // No scrollable container found: fall back to a single viewport screenshot
    await page.screenshot({ path: `${dir}/${baseName}.png` });
    return;
  }

  // Scroll to bottom to force all (lazily rendered) content to render
  await el.evaluate(e => e.scrollTop = e.scrollHeight);
  await page.waitForTimeout(1000);

  const scrollHeight = await el.evaluate(e => e.scrollHeight);
  const clientHeight = await el.evaluate(e => e.clientHeight);

  // Short content -- just screenshot
  if (scrollHeight <= clientHeight * 1.5) {
    await page.screenshot({ path: `${dir}/${baseName}.png` });
    return;
  }

  // Scroll back to top
  await el.evaluate(e => e.scrollTop = 0);
  await page.waitForTimeout(500);

  const overlap = 80;                 // px repeated between consecutive shots
  const step = clientHeight - overlap;
  let position = 0;
  let idx = 1;
  const maxScreenshots = 5;

  while (idx <= maxScreenshots) {
    await el.evaluate((e, pos) => e.scrollTop = pos, position);
    await page.waitForTimeout(300);
    const suffix = idx === 1 ? '' : `-${idx}`;
    // page.screenshot() captures the viewport, so the container must be on screen
    await page.screenshot({ path: `${dir}/${baseName}${suffix}.png` });

    position += step;
    if (position >= scrollHeight - clientHeight) {
      // Next step would pass the end: take one last shot pinned to the bottom
      if (idx < maxScreenshots) {
        idx++;
        await el.evaluate(e => e.scrollTop = e.scrollHeight);
        await page.waitForTimeout(300);
        await page.screenshot({ path: `${dir}/${baseName}-${idx}.png` });
      }
      break;
    }
    idx++;
  }
}
```

### Copying text from the page

If the page has a "Copy" button, click it and read the result from the clipboard (this needs clipboard-read permission):

```javascript
// Returns the copied text, or null if anything fails
async function copyFromButton(page, buttonSelector) {
  try {
    const btn = page.locator(buttonSelector).last(); // last match in DOM order, often the newest
    await btn.waitFor({ state: 'visible', timeout: 5000 });
    await btn.click();
    await page.waitForTimeout(1000); // give the app time to write to the clipboard
    return await page.evaluate(async () => await navigator.clipboard.readText());
  } catch (e) {
    return null;
  }
}
```

### Null safety with component libraries

Playwright's `textContent()` and `getAttribute()` can both return `null` (`getAttribute()` does whenever the attribute is absent), and both throw if the element doesn't appear before the timeout. Component libraries (Fluent UI, Material UI, etc.) make this common, because labels often live in `aria-label` rather than in text. Always guard:

```javascript
const text = (await el.textContent().catch(() => '')) || '';
const aria = (await el.getAttribute('aria-label').catch(() => '')) || '';
```

### Navigation in SPAs

Single-page apps often keep loading resources indefinitely (polling, analytics, websockets), so `networkidle` may fire late or never. Use `domcontentloaded` instead:

```javascript
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
await page.waitForTimeout(5000); // let the SPA render
// If you know an element that marks "ready", waiting for it is faster and more reliable
// (hypothetical selector; adapt to the app):
// await page.locator('[role="main"]').waitFor({ state: 'visible', timeout: 30000 });
```

### Nested iframes

Some apps embed content in iframes. Access them with `frameLocator`:

```javascript
const frame = page.frameLocator('iframe[src*="target-domain"]');
await frame.locator('button').click();
```

Playwright can usually reach cross-origin iframes, because it drives the browser through CDP rather than in-page JavaScript (code inside `page.evaluate()` still can't read a cross-origin iframe's `contentDocument`). If `frameLocator` can't find elements, log `page.frames().map(f => f.url())` to check that the iframe selector matches what's actually loaded. If it still won't cooperate, work around it: screenshot the parent page, or have the user do that step manually.
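The chunk arithmetic inside `scrollingScreenshots` is easy to get wrong when tweaking `overlap` or the shot cap, so it can help to pull it into a pure function you can check without a browser. This is a sketch that mirrors that loop (80 px overlap, at most 5 shots, a single shot when content is under 1.5 viewports tall); `planScrollPositions` is a hypothetical name, not part of Playwright.

```javascript
// Hypothetical helper: compute the scrollTop values scrollingScreenshots visits,
// without touching a browser. Mirrors its loop, including the final
// "pin to bottom" shot and the maxShots cap.
function planScrollPositions(scrollHeight, clientHeight, { overlap = 80, maxShots = 5 } = {}) {
  // Short content: one screenshot at the top
  if (scrollHeight <= clientHeight * 1.5) return [0];

  const maxScroll = Math.max(0, scrollHeight - clientHeight); // where the browser clamps scrollTop
  const step = clientHeight - overlap;
  const positions = [];
  let position = 0;

  while (positions.length < maxShots) {
    positions.push(position);
    position += step;
    if (position >= maxScroll) {
      // Next step would overshoot: finish with one shot pinned to the bottom
      if (positions.length < maxShots) positions.push(maxScroll);
      break;
    }
  }
  return positions;
}
```

For example, `planScrollPositions(2000, 800)` plans shots at 0, 720, and 1200 px. When the content needs more than five viewports, the plan stops after five shots and never reaches the bottom, which is worth knowing before raising `overlap`.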
### Finding selectors in unfamiliar apps

When you don't know the selectors, dump what's on the page:

```javascript
// Run inside an async function with `page` already connected.
// List all visible buttons with their labels and vertical position.
const buttons = page.locator('button:visible');
const count = await buttons.count();
for (let i = 0; i < count; i++) {
  const btn = buttons.nth(i);
  const text = (await btn.textContent().catch(() => '')) || '';
  const aria = (await btn.getAttribute('aria-label').catch(() => '')) || '';
  const box = await btn.boundingBox(); // null if the element isn't rendered
  if (box) console.log(`Button: "${text.trim()}" aria="${aria}" y=${Math.round(box.y)}`);
}
```

For dropdown, menu, and tab items, swap the selector for `[role="option"]`, `[role="menuitem"]`, or `[role="tab"]`.

## Script template

Here's a starting point for a multi-step automation:

```javascript
const { chromium } = require('playwright');
const fs = require('fs');

// Paste the helpers you need (typeText, waitForCompletion, ...) from the sections above.

const PROMPTS = [
  { name: 'task-1', prompt: 'Your prompt here' },
  { name: 'task-2', prompt: 'Another prompt' },
];

const OUTPUT_DIR = '/tmp/playwright-output';

(async () => {
  const browser = await chromium.connectOverCDP('http://localhost:9223');
  const page = browser.contexts()[0].pages()[0];
  fs.mkdirSync(OUTPUT_DIR, { recursive: true });

  for (const { name, prompt } of PROMPTS) {
    console.log(`Running: ${name}`);

    // Type and send
    const textbox = page.locator('[role="textbox"]');
    await textbox.click();
    await typeText(page, prompt);
    await page.waitForTimeout(500);
    await page.keyboard.press('Enter');

    // Wait for response (customize the indicator selector)
    // await waitForCompletion(page, { indicator: 'button:has-text("Stop")' });

    // Screenshot
    await page.screenshot({ path: `${OUTPUT_DIR}/${name}.png` });
    await page.waitForTimeout(2000);
  }

  console.log('Done.');
  // The open CDP connection keeps Node running; exit explicitly (Chrome stays open)
  process.exit(0);
})().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

## Gotchas

- **Port 9222 vs 9223**: 9222 is the conventional default debugging port, so other tools (or another Chrome instance) may already be listening on it. Use 9223 or any other free port to avoid the collision.
- **Mac vs Linux keyboard**: `Meta+v` on Mac, `Control+v` on Linux/Windows. Use `process.platform` to detect. Recent Playwright versions (1.45+) also accept `ControlOrMeta+v`, which picks the right modifier for you.
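- **Timeouts**: `page.goto()` defaults to a 30-second timeout. For slow-loading apps, pass `timeout: 60000`, or raise the default for every navigation with `page.setDefaultNavigationTimeout(60000)`.
- **Multiple pages**: `browser.contexts()[0].pages()` gives you all open tabs. `pages()[0]` is not necessarily the tab the user is looking at, so if several tabs are open, pick one by checking `page.url()`.
- **Clipboard permissions**: CDP connections inherit the browser's permissions. If clipboard access fails, the user may need to grant permission to the site first (or try `context.grantPermissions(['clipboard-read', 'clipboard-write'])`). Clipboard calls can also fail when the page isn't focused; `page.bringToFront()` helps.
- **The Chrome instance persists**: The user-data-dir at `/tmp/chrome-playwright` keeps cookies and sessions until `/tmp` is cleared (usually at reboot). Good for staying logged in across script runs.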
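For the multiple-tabs gotcha, one option is to select the tab by URL instead of trusting `pages()[0]`. A minimal sketch: `pickPage` is a hypothetical helper that relies only on Playwright's `browser.contexts()`, `context.pages()`, and `page.url()`, and it fails with the list of open URLs so you can see what was actually there.

```javascript
// Hypothetical helper: choose a tab from a CDP-connected browser by URL substring.
// Falls back to the first tab when no urlPart is given.
function pickPage(browser, urlPart) {
  // Gather tabs from every context, not just contexts()[0]
  const pages = browser.contexts().flatMap((ctx) => ctx.pages());
  if (pages.length === 0) throw new Error('No open tabs found over CDP');
  if (!urlPart) return pages[0];

  const match = pages.find((p) => p.url().includes(urlPart));
  if (!match) {
    const urls = pages.map((p) => p.url()).join(', ');
    throw new Error(`No tab matching "${urlPart}". Open tabs: ${urls}`);
  }
  return match;
}
```

Usage after connecting: `const page = pickPage(browser, 'example.com');`. Matching on a substring keeps it tolerant of query strings and redirects; switch to a stricter check if two tabs share a domain.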