Complete Workflow Guide
This guide describes the end-to-end BrowseGenius workflow for AI-powered test automation using Evaluations (Evals) and Phases.
Terminology
- Eval (Evaluation): A complete testing session for a project containing multiple phases (see the data-model sketch after this list)
- Phase: A specific navigation section of your application (e.g., Dashboard, Settings, User Management)
- Each phase can have up to 10 test cases
- Evals organize and track testing across all phases
- Computer Use Model: Vision-based AI that sees and interacts with UI like a human (no DOM selectors needed)
- Home Screen: Central hub showing all test plans for the active project
- Plan Details: Detailed view of a single plan with execution controls
- Entry Point Path: Custom path appended to the project hostname (e.g., /admin, /login)
- Authenticated Workflow: Toggle to enable/disable the authentication flow and credential handling
- Computer Use Logger: Real-time monitoring tool for Computer Use API calls and actions
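The relationships above can be summarized in a small data-model sketch. This is illustrative only: the interface and field names below are assumptions inferred from this guide, not the extension's actual types.

```typescript
// Hypothetical shapes inferred from the terminology above (names assumed).
interface PhaseTestCase {
  id: string;
  phaseName: string;
  title: string;
  narrative: string;
  priority: 'P0' | 'P1' | 'P2';
  status: 'idle' | 'running' | 'passed' | 'failed' | 'blocked';
  steps: { action: string; expectation: string }[];
}

interface Phase {
  name: string;                 // e.g. "Dashboard", "Settings", "User Management"
  testCases: PhaseTestCase[];   // up to 10 per phase
}

interface Eval {
  id: string;                   // backend-issued UUID
  name: string;
  phases: Phase[];              // an eval organizes testing across all phases
  entryPointPath?: string;      // e.g. "/admin"
  authenticatedWorkflow: boolean;
}
```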
Workflow Overview
BrowseGenius uses hub-based navigation with two main flows:
Creation Flow
Home Screen → Wizard (Capture) → Wizard (Discover) → Wizard (Plan) → Save → Home Screen
Execution Flow
Home Screen → Plan Details → Execute → Reports → Home Screen
Navigation Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ BROWSEGENIUS WORKFLOW │
└─────────────────────────────────────────────────────────────────────┘
HOME SCREEN (Central Hub) PLAN DETAILS
┌────────────────────┐ ┌────────────────────┐
│ All Test Plans │ │ Single Plan │
│ for Project │ │ Details │
│ │ │ │
│ • Filter by status│◀─────────────│ • Entry point │
│ • Create new │ │ • Auth toggle │
│ • View details │─────────────>│ • Credentials │
│ • Search plans │ │ • Test cases │
│ │ │ • Execute button │
│ [+ New Plan] │ │ [← Back] [▶ Run] │
└────────────────────┘ └────────────────────┘
│ │
│ Create New │ Execute
▼ ▼
┌────────────────────┐ ┌────────────────────┐
│ WIZARD MODE │ │ WIZARD MODE │
│ (Creation) │ │ (Execution) │
│ │ │ │
│ Phase 1: CAPTURE │ │ Phase 4: EXECUTE │
┌────────────────────┐ ┌────────────────────┐
│ Take Screenshots │ │ AI Analyzes │
│ of Key Screens │──────────────>│ Application │
│ │ │ │
│ • Select Tab │ │ • App Overview │
│ • Click Capture │ │ • User Roles │
│ • AI Vision │ │ • Navigation │
│ • Screenshot PNG │ │ Phases │
│ • NO DOM needed │ │ • Priorities │
│ Max: 5 screens │ │ │
└────────────────────┘ │ Select phases ✓ │
│ └────────────────────┘
│ │
▼ │
┌────────────────────┐ ▼
│ SAVED EVALS │ ┌────────────────────┐
│ │ │ Phase 3: PLAN │
│ • View existing │ │ │
│ • Select eval │◀─────────────│ AI Generates │
│ • New eval │ │ Tests Per Phase │
│ • Sync from backend│ │ │
│ • Credentials │ │ • Up to 10 tests │
│ • Sitemap links │ │ per phase │
│ │ │ • Steps │
└────────────────────┘ │ • Expectations │
│ │ • Save eval │
│ Select Eval └────────────────────┘
└─────────────────────────> │
│
▼
┌────────────────────┐
│ Phase 4: EXECUTE │
│ │
│ Computer Use │
│ Vision-Based │
│ │
│ ┌──────────────┐ │
│ │ Screenshot │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ AI Vision │ │
│ │ + Credentials│ │
│ │ determines │ │
│ │ action │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Execute │ │
│ │ click(x,y) │ │
│ │ type(text) │ │
│ │ Auto-login │ │
│ └──────┬───────┘ │
│ │ │
│ │ Record │
│ │ actions │
│ │ (JSON) │
│ │ │
│ └──────┐ │
│ Repeat │ │
│ (max │ │
│ 40x) │ │
│ │ │
└────────────────┼───┘
│
▼
┌────────────────────┐
│ Phase 5: REPORTS │
│ │
│ View Results │
│ │
│ • Summary stats │
│ • Test details │
│ • Download │
│ - JSON │
│ - HTML │
│ • Submit to │
│ backend │
└────────────────────┘Data Flow Diagram
┌──────────────┐
│ User Browser │
└──────┬───────┘
│
│ 1. Capture screenshots
▼
┌──────────────────────────────┐
│ Chrome Extension │
│ │
│ ┌────────────────────────┐ │
│ │ FlowCaptureSection │ │
│ │ │ │
│ │ • captureActiveScreen()│ │
│ │ └─> CDP Screenshot │ │
│ │ └─> Vision analysis │ │
│ │ └─> NO DOM capture │ │
│ └────────────────────────┘ │
│ │ │
│ │ 2. AI Vision │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ OpenAI Vision API │──┼──> External API
│ │ (GPT-4o) │ │
│ │ describeScreenshot() │ │
│ │ Returns: UI analysis │ │
│ └────────────────────────┘ │
│ │ │
│ │ 3. Store │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ Zustand State │ │
│ │ │ │
│ │ ScreenCapture { │ │
│ │ imageDataUrl, │ │
│ │ imageDescription │ │
│ │ (domSnapshot: legacy)│ │
│ │ } │ │
│ └────────────────────────┘ │
│ │ │
│ │ 4. Discover │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ discoveryService │ │
│ │ │ │
│ │ GPT-4 analyzes: │──┼──> External API
│ │ App, Actors, Phases │ │
│ └────────────────────────┘ │
│ │ │
│ │ 5. Generate │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ flowDiscovery │ │
│ │ │ │
│ │ GPT-4 generates: │──┼──> External API
│ │ Test cases per phase │ │
│ └────────────────────────┘ │
│ │ │
│ │ 6. Save │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ evalsAPI.create() │──┼──┐
│ │ │ │ │
│ │ Backend creates eval │ │ │ 7. Backend API
│ │ Returns: UUID │ │ │
│ └────────────────────────┘ │ │
│ │ │ │
│ │ 8. Execute │ │
│ ▼ │ │
│ ┌────────────────────────┐ │ │
│ │ testOrchestratorCU │ │ │
│ │ │ │ │
│ │ For each test (1st): │ │ │
│ │ 1. Load credentials │ │ │
│ │ 2. Capture screenshot│ │ │
│ │ 3. Send to Computer │──┼──> OpenAI Computer Use
│ │ Use API + creds │ │ │ (Vision + Actions)
│ │ 4. Execute action │ │ │
│ │ (click, type, etc)│ │ │
│ │ 5. Record action │ │ │
│ │ (DOM selectors) │ │ │
│ │ 6. Auto-login if │ │ │
│ │ detected │ │ │
│ │ 7. Repeat │ │ │
│ │ │ │ │
│ │ For repeat runs: │ │ │
│ │ 1. Replay recorded │ │ │
│ │ actions (DOM) │ │ │
│ │ 2. Fallback to │ │ │
│ │ Computer Use │ │ │
│ └────────────────────────┘ │ │
│ │ │ │
│ │ 9. Report │ │
│ ▼ │ │
│ ┌────────────────────────┐ │ │
│ │ submitTestReport() │──┼──┘
│ │ │ │
│ │ Backend stores report │ │
│ │ Deducts credits │ │
│ └────────────────────────┘ │
└──────────────────────────────┘
│
│ 10. View results
▼
┌──────────────┐
│ Reports UI │
└──────────────┘Phase 1: Capture Key Screens
User Interface
╔════════════════════════════════════════════════════════════════╗
║ Phase 1: Capture Key Screens [+ New Eval] ║
╠════════════════════════════════════════════════════════════════╣
║ ║
║ Take full-page screenshots of up to 5 important screens. ║
║ ║
╠════════════════════════════════════════════════════════════════╣
║ SAVED EVALS 3 saved [Sync] ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────┐ ┌────────────────────────┐ ║
║ │ Dashboard Eval │ │ Admin Panel Eval │ ║
║ │ Test dashboard... │ │ Test admin features... │ ║
║ │ │ │ │ ║
║ │ 3 phases · 15 tests │ │ 2 phases · 12 tests │ ║
║ │ Last: Oct 14, 2025 │ │ Last: Oct 13, 2025 │ ║
║ │ │ │ │ ║
║ │ [Select Eval] [▶] [🗑] │ │ [Select Eval] [▶] [🗑] │ ║
║ └────────────────────────┘ └────────────────────────┘ ║
╠════════════════════════════════════════════════════════════════╣
║ KEY SCREENS ║
╠════════════════════════════════════════════════════════════════╣
║ [Tab ▼ example.com/login ] [📷 Capture] ║
║ ║
║ ┌─────────────────┐ ┌─────────────────┐ ║
║ │ Screen 1 │ │ Screen 2 │ ║
║ │ [Screenshot] │ │ [Screenshot] │ ║
║ │ │ │ │ ║
║ │ Login Page │ │ Dashboard │ ║
║ │ /login │ │ /dashboard │ ║
║ │ │ │ │ ║
║ │ [✏️ Notes] [🗑] │ │ [✏️ Notes] [🗑] │ ║
║ └─────────────────┘ └─────────────────┘ ║
╚════════════════════════════════════════════════════════════════╝
Technical Process
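The walkthrough below traces the capture pipeline step by step. As a rough companion, here is a minimal sketch of taking a CDP screenshot from a Chrome extension background context; it assumes the Manifest V3 promise-based chrome.debugger API, requires the "debugger" permission, and omits the full-page sizing logic and error handling.

```typescript
// Minimal sketch: attach the debugger, capture a screenshot, detach (assumptions noted above).
async function captureTabScreenshot(tabId: number): Promise<string> {
  const target = { tabId };
  await chrome.debugger.attach(target, '1.3');
  try {
    await chrome.debugger.sendCommand(target, 'Page.enable');
    // Full-page sizing via Page.getLayoutMetrics is omitted for brevity.
    const result = (await chrome.debugger.sendCommand(target, 'Page.captureScreenshot', {
      format: 'png',
    })) as { data: string };
    return `data:image/png;base64,${result.data}`; // imageDataUrl stored in state
  } finally {
    await chrome.debugger.detach(target);
  }
}
```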
User Action System Process
──────────── ──────────────
1. Select tab
[example.com ▼]
│
2. Click "Capture" │
▼
┌─────────────────┐
│ attachDebugger │
│ (tabId) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Page.enable() │
│ getLayoutMetrics│
└────────┬────────┘
│
▼
┌─────────────────┐
│ captureScreenshot│
│ • width: full │
│ • height: max │
│ 800px │
└────────┬────────┘
│
▼
┌─────────────────────┐
│ describeScreenshot │
│ (OpenAI Vision API) │
│ │
│ Analyzes: │
│ • UI elements │
│ • User flows │
│ • Forms, buttons │
│ • Navigation │
│ • Visual layout │
│ • Actionable items │
│ │
│ NO DOM parsing! │
│ Pure vision AI │
└────────┬────────────┘
│
▼
┌─────────────────┐
│ Store in Zustand│
│ │
│ ScreenCapture { │
│ image, │
│ description, │
│ metadata, │
│ url │
│ } │
└─────────────────┘
│
3. Screenshot saved │
✓ Screen 1 │
▼
┌─────────────────┐
│ detachDebugger │
└─────────────────┘Phase 2: Discover Eval Phases
AI Analysis Flow
Input: Screenshots Output: Phase Discovery
───────────────── ───────────────────────
┌─────────────────┐
│ Screen 1 │ ┌────────────────────────┐
│ • imageDataUrl │ │ App Overview │
│ • domSnapshot │ │ "E-commerce platform │
│ • description │ │ for buying products" │
└────────┬────────┘ └────────────────────────┘
│
┌────────▼────────┐ ┌────────────────────────┐
│ Screen 2 │ │ User Actors │
│ • imageDataUrl │ ──────> │ • Guest User │
│ • domSnapshot │ GPT-4 │ • Registered User │
│ • description │ │ • Admin │
└────────┬────────┘ └────────────────────────┘
│
┌────────▼────────┐ ┌────────────────────────┐
│ Screen 3 │ │ Navigation Phases │
│ • imageDataUrl │ │ │
│ • domSnapshot │ │ ✓ Dashboard (P0) │
│ • description │ │ ✓ Settings (P1) │
└─────────────────┘ │ User Mgmt (P0) │
│ ✓ Profile (P2) │
└────────────────────────┘
Selection Interface
╔════════════════════════════════════════════════════════════════╗
║ Phase 2: Discover Eval Phases ║
╠════════════════════════════════════════════════════════════════╣
║ AI analyzes your 3 captured screens to discover navigation ║
║ sections and create testable phases ║
╠════════════════════════════════════════════════════════════════╣
║ [🔍 Discover Phases] ║
╠════════════════════════════════════════════════════════════════╣
║ 💡 APPLICATION OVERVIEW ║
║ ║
║ E-commerce platform for buying products online with user ║
║ authentication and shopping cart functionality. ║
╠════════════════════════════════════════════════════════════════╣
║ 👥 USER ROLES ║
║ ║
║ [Guest User] [Registered User] [Admin] ║
╠════════════════════════════════════════════════════════════════╣
║ DISCOVERED PHASES (3 selected) [Select All] [Clear] ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────┐ ║
║ │ ☑ Dashboard [P0] ✓ │ ║
║ │ Main application dashboard and overview │ ║
║ │ Screens: Dashboard, Home │ ║
║ └────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────┐ ║
║ │ ☑ Settings [P1] ✓ │ ║
║ │ Application settings and configuration │ ║
║ │ Screens: Settings, Preferences │ ║
║ └────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────┐ ║
║ │ ☐ User Management [P0] │ ║
║ │ Manage users, roles, and permissions │ ║
║ │ Screens: Users, Roles │ ║
║ └────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────┐ ║
║ │ ☑ Profile [P2] ✓ │ ║
║ │ User profile viewing and editing │ ║
║ │ Screens: Profile, Account │ ║
║ └────────────────────────────────────────────────────┘ ║
╠════════════════════════════════════════════════════════════════╣
║ ⚠ Select at least one phase to continue ║
╚════════════════════════════════════════════════════════════════╝
Phase 3: Generate Test Cases Per Phase
Test Case Structure
Each phase can have up to 10 test cases:
Phase: "Dashboard"
├── PhaseTestCase 1
│ ├── id: "uuid-1234"
│ ├── phaseName: "Dashboard"
│ ├── title: "Verify dashboard loads correctly"
│ ├── narrative: "As a user, I want to see my dashboard..."
│ ├── priority: "P0"
│ ├── status: "idle"
│ └── steps: [
│ ┌──────────────────────────────────────────────┐
│ │ Step 1 │
│ │ action: "Navigate to dashboard" │
│ │ expectation: "Dashboard page loads" │
│ └──────────────────────────────────────────────┘
│ ┌──────────────────────────────────────────────┐
│ │ Step 2 │
│ │ action: "Verify widgets displayed" │
│ │ expectation: "All widgets render correctly" │
│ └──────────────────────────────────────────────┘
│ ]
├── PhaseTestCase 2 (up to 10 total per phase)
└── ...
Save Eval Workflow
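The sequence below shows the save path in detail. As a hedged sketch of the same idea (the endpoint path and response shape come from this section, but the store calls and constants are assumptions):

```typescript
// Hypothetical sketch: create the eval on the backend, then store it locally
// under the backend-issued UUID.
declare const API_BASE: string;   // backend origin (assumed)
declare const apiKey: string;     // assumed auth header value
declare const store: {
  addEval(e: { id: string; name: string; description: string }): void;
  activeProject: { evalIds: string[] };
};

async function saveEval(name: string, description: string): Promise<string> {
  const res = await fetch(`${API_BASE}/evals`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Api-Key': apiKey },
    body: JSON.stringify({ name, description }),
  });
  if (!res.ok) throw new Error(`Eval creation failed: ${res.status}`);
  const { id } = (await res.json()) as { id: string };   // backend UUID
  store.addEval({ id, name, description });              // store locally with the backend UUID
  store.activeProject.evalIds.push(id);                  // associate with the active project
  return id;                                             // future runs reuse the same UUID
}
```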
User Action Frontend Backend
─────────── ──────── ───────
1. Click "Save Eval"
│
2. Enter name/desc │
│
3. Click "Save" │
▼
┌─────────────┐
│ saveEval() │
└──────┬──────┘
│
│ POST /evals
▼
┌──────────────┐
│ Create eval │
│ evalId = │
│ UUID() │
└──────┬───────┘
│
◀─────────────────────────┘
│ Returns: { id: UUID }
│
┌──────▼──────┐
│ Store │
│ locally │
│ with backend│
│ UUID ✅ │
└─────────────┘
│
│ Associate with project
▼
┌─────────────┐
│ project. │
│ evalIds. │
│ push(UUID) │
└─────────────┘
│
4. ✓ Eval saved │
X tests across Y phases │
│
Future runs use │
same UUID ───────────────┘
Phase 4: Execute Tests
Vision-Based Automation with Computer Use
BrowseGenius uses OpenAI's Computer Use model for test execution, which provides several advantages over traditional DOM-based automation:
Benefits:
- Visual Understanding: AI sees the UI like a human, understanding visual context and layout
- Resilient to Changes: No brittle selectors - works even when DOM structure changes
- Natural Interactions: Coordinate-based clicking, typing, and scrolling
- Context-Aware: AI understands the current state from screenshots
- Fewer Selectors: No need to maintain CSS selectors, XPath, or data-testid attributes
How It Works (sketched in code after these steps):
- Capture screenshot of current browser state
- Send screenshot to OpenAI Computer Use API with task instructions
- Model analyzes visual UI and determines next action (click, type, scroll)
- Execute action using coordinates or keyboard input
- Wait for UI update, capture new screenshot
- Repeat until task completes or fails
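A condensed sketch of that loop is shown below; the same cycle is drawn as a diagram in the next section. The helper names (captureScreenshot, requestComputerAction, executeComputerAction) are placeholders for the extension's actual services, not confirmed APIs.

```typescript
// Hedged sketch of the vision-based execution loop (helper names assumed).
interface ModelAction { type: string; reason?: string }

declare function captureScreenshot(): Promise<string>;                                    // PNG data URL via CDP
declare function requestComputerAction(shot: string, task: string): Promise<ModelAction>; // Computer Use API call
declare function executeComputerAction(action: ModelAction): Promise<void>;               // click/type/scroll via CDP

async function runTestCase(instruction: string): Promise<'passed' | 'failed' | 'blocked'> {
  for (let i = 0; i < 40; i++) {                     // max 40 iterations
    const screenshot = await captureScreenshot();    // 1. capture current browser state
    const action = await requestComputerAction(screenshot, instruction); // 2-3. model picks the next action
    if (action.type === 'finish') return 'passed';   // model reports the task is done
    if (action.type === 'fail') return 'failed';     // model reports the task cannot be completed
    await executeComputerAction(action);             // 4. perform the action
    await new Promise((r) => setTimeout(r, 1500));   // 5. wait for the UI to settle
  }
  return 'blocked';                                  // exceeded the action budget
}
```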
AI Automation Loop (Computer Use Model)
┌──────────────────────────────────────────────────────────────┐
│ TEST EXECUTION LOOP │
│ (Computer Use - Vision-Based Actions) │
│ (Max 40 iterations) │
└──────────────────────────────────────────────────────────────┘
Iteration N Action Result
─────────── ────── ──────
│
│ 1. Capture screenshot
▼
┌───────────────┐
│ CDP: │ Screenshot (PNG/Base64)
│ captureScreen │───────> 1920x1080 viewport
└───────┬───────┘ Full browser viewport
│ with all UI elements
│ 2. Send to Computer Use API
▼
┌─────────────────┐
│ OpenAI Computer │ Context:
│ Use Model │ • Screenshot (vision)
│ (gpt-4o) │ • Test instructions
└───────┬─────────┘ • Previous actions
│ • Task goal
│
│ 3. AI Response (computer_call)
▼
┌───────────────────────────────────┐
│ { │
│ thought: "I see login button", │
│ action: "click", │
│ parsedAction: { │
│ type: "click", │
│ x: 640, │
│ y: 350, │
│ button: "left" │
│ } │
│ } │
└────────┬──────────────────────────┘
│
│ 4. Execute computer action
▼
┌────────────────────┐
│ executeComputer │───> Playwright/CDP:
│ Action() │ • click(x, y)
│ • click(x, y) │ • type(text)
│ • double_click │ • scroll(dx, dy)
│ • type(text) │ • keypress(keys)
│ • scroll(dx, dy) │ • move(x, y)
│ • keypress(keys) │ • wait(ms)
│ • finish │
│ • fail │
└────────┬───────────┘
│
│ 5. Wait for UI update (1.5s)
▼
Sleep(1500)
│
│ 6. Next iteration (capture new screenshot)
└─────────────────┐
│
┌─────────────────┘
│
▼
[Repeat until:]
• AI calls finish() → ✅ PASSED
• AI calls fail() → ❌ FAILED
• Max 40 actions → ⚠️ BLOCKED
• Error occurs → ❌ FAILED
Available Computer Actions
The Computer Use model can perform these actions (a dispatch sketch follows the table):
| Action | Parameters | Description |
|---|---|---|
| click | x, y, button | Click at coordinates (left, right, middle, back, forward) |
| double_click | x, y | Double-click at coordinates |
| type | text | Type text at current cursor position |
| keypress | keys | Press keyboard keys (Enter, Tab, Escape, etc.) |
| scroll | x, y, scroll_x, scroll_y | Scroll viewport by offset |
| move | x, y | Move mouse cursor to coordinates |
| wait | ms | Pause execution (default 1000ms) |
| finish | - | Mark test as PASSED |
| fail | reason | Mark test as FAILED with reason |
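A hedged sketch of how these actions might be dispatched to the browser. The action shapes mirror the table; the CDP wrapper (sendCommand) and the exact event parameters are assumptions, not the extension's real implementation.

```typescript
// Dispatch a parsed Computer Use action as CDP input events (sketch only).
type ParsedAction =
  | { type: 'click'; x: number; y: number; button?: 'left' | 'right' | 'middle' }
  | { type: 'double_click'; x: number; y: number }
  | { type: 'type'; text: string }
  | { type: 'keypress'; keys: string[] }
  | { type: 'scroll'; x: number; y: number; scroll_x: number; scroll_y: number }
  | { type: 'move'; x: number; y: number }
  | { type: 'wait'; ms?: number };

declare function sendCommand(method: string, params?: object): Promise<unknown>; // assumed CDP wrapper

async function executeComputerAction(action: ParsedAction): Promise<void> {
  switch (action.type) {
    case 'click':
    case 'double_click': {
      const clickCount = action.type === 'double_click' ? 2 : 1;
      const button = 'button' in action && action.button ? action.button : 'left';
      await sendCommand('Input.dispatchMouseEvent', { type: 'mousePressed', x: action.x, y: action.y, button, clickCount });
      await sendCommand('Input.dispatchMouseEvent', { type: 'mouseReleased', x: action.x, y: action.y, button, clickCount });
      break;
    }
    case 'type':
      await sendCommand('Input.insertText', { text: action.text });
      break;
    case 'keypress':
      for (const key of action.keys) {
        await sendCommand('Input.dispatchKeyEvent', { type: 'keyDown', key });
        await sendCommand('Input.dispatchKeyEvent', { type: 'keyUp', key });
      }
      break;
    case 'scroll':
      await sendCommand('Input.dispatchMouseEvent', { type: 'mouseWheel', x: action.x, y: action.y, deltaX: action.scroll_x, deltaY: action.scroll_y });
      break;
    case 'move':
      await sendCommand('Input.dispatchMouseEvent', { type: 'mouseMoved', x: action.x, y: action.y });
      break;
    case 'wait':
      await new Promise((r) => setTimeout(r, action.ms ?? 1000));
      break;
  }
}
```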
Example Response:
{
"thought": "I need to click the login button in the center of the screen",
"action": "click",
"parsedAction": {
"type": "click",
"x": 640,
"y": 450,
"button": "left"
}
}
Execution States
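Each test case carries one of a small set of statuses. A minimal type sketch follows (the status names come from the lifecycle diagram below; the helper function is an assumed illustration, not the real code):

```typescript
// Test-case lifecycle statuses as used throughout this guide.
type TestCaseStatus = 'idle' | 'running' | 'passed' | 'failed' | 'blocked';

// Hypothetical mapping from a loop outcome to a final status.
function statusFromOutcome(outcome: 'finish' | 'fail' | 'max_actions' | 'error'): TestCaseStatus {
  switch (outcome) {
    case 'finish':      return 'passed';   // AI called finish()
    case 'fail':        return 'failed';   // AI called fail()
    case 'max_actions': return 'blocked';  // exceeded 40 actions
    case 'error':       return 'failed';   // unexpected exception
  }
}
```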
Test Case Lifecycle
────────────────────
┌──────┐
│ IDLE │ Initial state
└───┬──┘
│ Click "Run Tests"
▼
┌─────────┐
│ RUNNING │ AI executing actions
└───┬─────┘
│
├─> AI: finish() ──> ┌────────┐
│ │ PASSED │
│ └────────┘
│
├─> AI: fail() ────> ┌────────┐
│ │ FAILED │
│ └────────┘
│
├─> Max actions ───> ┌─────────┐
│ │ BLOCKED │
│ └─────────┘
│
└─> Exception ─────> ┌────────┐
│ FAILED │
└────────┘Phase 5: View Reports
Report Structure
SuiteReport
├── id: "report-uuid"
├── startedAt: "2025-10-14T21:00:00Z"
├── completedAt: "2025-10-14T21:05:30Z"
├── status: "complete"
│
├── summary
│ ├── total: 10
│ ├── passed: 7
│ ├── failed: 2
│ ├── blocked: 1
│ └── skipped: 0
│
├── cases: [
│ {
│ caseId: "test-1",
│ status: "passed",
│ durationMs: 45000
│ },
│ {
│ caseId: "test-2",
│ status: "failed",
│ failureReason: "Login button not found",
│ durationMs: 12000
│ }
│ ]
│
└── artifacts: [
{
id: "artifact-1",
caseId: "test-1",
timestamp: "2025-10-14T21:00:15Z",
type: "log",
message: "Clicked login button"
},
{
id: "artifact-2",
caseId: "test-2",
timestamp: "2025-10-14T21:02:30Z",
type: "screenshot",
data: "base64..."
}
]
Reports UI
╔════════════════════════════════════════════════════════════════╗
║ Phase 5: Test Results ║
╠════════════════════════════════════════════════════════════════╣
║ ║
║ SUMMARY ║
║ ║
║ ┌─────────┐ ┌─────────┐ ┌─────────┐ ║
║ │ Total │ │ Passed │ │ Failed │ ║
║ │ 10 │ │ 7 │ │ 2 │ ║
║ └─────────┘ └─────────┘ └─────────┘ ║
║ ║
║ Started: Oct 14, 2025 9:00 PM ║
║ Duration: 5m 30s ║
║ ║
║ [Download Report ▼] ║
║ • JSON Only ║
║ • HTML Only ║
║ • Both (Bundle) ║
╠════════════════════════════════════════════════════════════════╣
║ TEST CASES ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ #1 [P0] [✓ PASSED] │ ║
║ │ Verify successful login with valid credentials │ ║
║ │ Duration: 45s │ ║
║ └────────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ #2 [P1] [✗ FAILED] │ ║
║ │ Search products by category │ ║
║ │ ⚠ Login button not found │ ║
║ │ Duration: 12s │ ║
║ └────────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ #3 [P0] [⚠ BLOCKED] │ ║
║ │ Complete checkout process │ ║
║ │ ⚠ Exceeded max actions (40) │ ║
║ │ Duration: 2m 15s │ ║
║ └────────────────────────────────────────────────────────┘ ║
╚════════════════════════════════════════════════════════════════╝
Backend Integration
API Architecture
┌──────────────────┐ ┌──────────────────┐
│ Chrome Extension │ │ Cloudflare Worker│
│ │ │ │
│ ┌────────────┐ │ │ ┌────────────┐ │
│ │ API Client │──┼───── HTTPS ───────>│ │ Hono Router│ │
│ └────────────┘ │ │ └─────┬──────┘ │
│ │ │ │ │
│ ┌────────────┐ │ │ ┌─────▼──────┐ │
│ │ State │ │ │ │ Auth │ │
│ │ (Zustand) │ │ │ │ Middleware │ │
│ └────────────┘ │ │ └─────┬──────┘ │
└──────────────────┘ │ │ │
│ ┌─────▼──────┐ │
│ │ Routes │ │
│ │ /projects │ │
│ │ /test-plans│ │
│ │ /users │ │
│ └─────┬──────┘ │
│ │ │
│ ┌─────▼──────┐ │
│ │ D1 Database│ │
│ │ │ │
│ │ • projects │ │
│ │ • plans │ │
│ │ • reports │ │
│ └────────────┘ │
└──────────────────┘
Request Flow Example
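The trace below walks through POST /test-plans/:id/report on the server side. As a client-side companion, here is a hedged sketch of submitting a report; the endpoint path, header, and error codes come from this section, while the base URL constant and response typing are assumptions.

```typescript
// Hypothetical client-side report submission (sketch only).
declare const API_BASE: string; // e.g. the Cloudflare Worker origin (assumed)
declare const apiKey: string;

async function submitTestReport(planId: string, report: object): Promise<{ reportId: string; creditsUsed: number }> {
  const res = await fetch(`${API_BASE}/test-plans/${planId}/report`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Api-Key': apiKey },
    body: JSON.stringify({ report }),
  });
  if (res.status === 404) throw new Error('Test plan not found or access denied');
  if (res.status === 402) throw new Error('Insufficient credits');
  if (!res.ok) throw new Error(`Report submission failed: ${res.status}`);
  return res.json() as Promise<{ reportId: string; creditsUsed: number }>; // 201: { reportId, creditsUsed }
}
```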
POST /test-plans/:id/report
────────────────────────────
Client Server
────── ──────
1. Build report object
{
summary: {...},
cases: [...],
artifacts: [...]
}
│
│ POST with X-Api-Key
▼
2. Auth middleware
verifies API key
│
▼
3. Get userId from key
│
▼
4. Verify plan exists
and belongs to user
│
├─> 404 if not found
│
▼
5. Calculate credits
cost = base + (cases * 1)
│
▼
6. Deduct credits
│
├─> 402 if insufficient
│
▼
7. Insert into test_reports
reportId = UUID()
│
▼
8. Update plan.last_run_at
│
▼
◀───────────────────────9. Return 201
{ reportId, creditsUsed }
Debugging Tools
API Logs Interface
╔════════════════════════════════════════════════════════════════╗
║ API Request Logs [Clear All] ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ ✓ [GET] [200] 145ms 21:05:30.123 │ ║
║ │ /api/v1/projects │ ║
║ │ [▼ Expand] │ ║
║ └────────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ ✗ [POST] [404] 946ms 21:11:20.456 │ ║
║ │ /api/v1/test-plans/.../report │ ║
║ │ ⚠ Test plan not found or access denied │ ║
║ │ [▲ Collapse]│ ║
║ │ ┌────────────────────────────────────────────────────┐ │ ║
║ │ │ Request Headers: │ │ ║
║ │ │ { │ │ ║
║ │ │ "Content-Type": "application/json", │ │ ║
║ │ │ "X-Api-Key": "***" │ │ ║
║ │ │ } │ │ ║
║ │ │ │ │ ║
║ │ │ Request Body: │ │ ║
║ │ │ { │ │ ║
║ │ │ "report": {...} │ │ ║
║ │ │ } │ │ ║
║ │ │ │ │ ║
║ │ │ Response Data: │ │ ║
║ │ │ { │ │ ║
║ │ │ "success": false, │ │ ║
║ │ │ "error": "Test plan not found" │ │ ║
║ │ │ } │ │ ║
║ │ └────────────────────────────────────────────────────┘ │ ║
║ └────────────────────────────────────────────────────────┘ ║
╚════════════════════════════════════════════════════════════════╝
Quick Reference
Key Components
| Component | Purpose | Location |
|---|---|---|
| App.tsx | Main navigation & screen routing | Common |
| HomeScreen.tsx | Central hub with plan list | Pages/Components |
| PlanDetailsScreen.tsx | Plan details & execution | Pages/Components |
| CapturePhase.tsx | Phase 1 UI (Wizard) | Pages/Components |
| DiscoveryPhase.tsx | Phase 2 UI (Wizard) | Pages/Components |
| PlanPhase.tsx | Phase 3 UI (Wizard) | Pages/Components |
| ExecutePhase.tsx | Phase 4 UI (Wizard) | Pages/Components |
| ReportsPhase.tsx | Phase 5 UI (Wizard) | Pages/Components |
| EvalNavigationSetup.tsx | Entry point & auth config | Pages/Components |
| ComputerUseLoggerModal.tsx | Computer Use monitoring | Pages/Components |
| DOMRecordingView.tsx | Action recording viewer | Pages/Components |
| flowDiscovery.ts | Capture & generation | Services |
| discoveryService.ts | Flow discovery | Services |
| testOrchestratorComputerUse.ts | Test execution (Computer Use) | Services |
| computerUse.ts | Computer Use API integration | Services |
| actionRecorder.ts | Action recording (DOM selectors) | Services |
| replayEngine.ts | Action replay with fallback | Services |
| computerUseLogger.ts | Computer Use logging store | State |
| test-plans-api.ts | Test plans API client | Services |
| api-client.ts | Backend API | Services |
State Flow
User Action → Component → State Action → API Call → Backend → Response → State Update → UI Update
Common Patterns
Capture Screen:
await captureActiveScreen(notes?, tabId, fullPage=true, generateDescription=true)
Generate Tests:
await generateTestPlanFromCaptures(customPrompt?, selectedPhaseNames)
Save Eval:
saveEval(name, description) // Creates in backend, stores locally with backend UUID
Run Tests:
await runGeneratedSuite({ stopOnFailure: false })
Submit Report:
await evalsAPI.submitReport(evalId, activeReport)
Computer Use vs DOM-Based Automation
Traditional DOM-Based Approach ❌
// Brittle selectors that break when HTML changes
const loginButton = await page.$('[data-testid="login-btn"]');
await loginButton.click();
// Requires maintaining selectors
const email = await page.$('#email-input');
await email.type('user@example.com');
Problems:
- Selectors break when developers change HTML
- Requires data-testid attributes or stable CSS classes
- Can't handle dynamic content well
- Needs constant maintenance
Computer Use Approach ✅
// AI sees the screen and understands visually
const screenshot = await captureScreenshot();
const action = await computerUseAPI.determineAction(screenshot, {
instruction: "Click the login button"
});
// Returns: { type: "click", x: 640, y: 450 }
await executeAction(action);
Advantages:
- No selectors needed - AI sees like a human
- Works even when HTML changes
- Resilient to UI updates
- Natural interaction with coordinates
- Context-aware decision making
New Features
Home Screen (Central Hub)
The Home Screen is the default view showing all test plans for the active project:
Features:
- Plan List: View all plans with status badges (New, Discovering, Testing, Completed)
- Status Filters: Filter plans by status
- Plan Cards: Show plan name, description, test case count, last run date
- Quick Actions: Create new plan, view details, execute
- Entry Point Indicators: See which plans have custom entry points
- Auth Badges: Identify authenticated workflows at a glance
Plan Details Screen
Detailed view of a single test plan with all configuration:
Displays:
- Entry Point URL: Full URL constructed from hostname + entry path
- Authentication Status: ON/OFF badge with credential visibility
- Credentials: Masked credentials (username, password, API key) with show/hide toggle
- Test Cases: List of all configured test cases
- Actions: Back to home, edit plan, execute tests
Entry Point Configuration
Configure custom entry points for tests that don't start at the root URL:
Usage:
Project Hostname: example.com
Entry Point Path: /admin
Result: https://example.com/admin (illustrated in code after the list below)
Common Use Cases:
- Admin portals: /admin, /dashboard
- Login pages: /login, /auth
- Specific features: /products, /checkout
- Subdirectories: /app, /portal
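A one-line way to build the resulting URL, assuming the project hostname is normalized to an https origin first:

```typescript
// Entry point path appended to the project hostname.
const entryUrl = new URL('/admin', 'https://example.com').toString();
// → "https://example.com/admin"
```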
Authenticated Workflow Toggle
Control whether tests require authentication:
When ON (default):
- Shows credential inputs in navigation setup
- Loads credentials from eval → project → none (a resolution sketch follows these lists)
- Auto-login detection enabled
- Computer Use receives credentials in context
- Login instructions included in prompts
When OFF:
- Hides all credential fields
- No credential loading
- No auto-login attempts
- Computer Use runs without credentials
- Perfect for public pages and unauthenticated flows
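A minimal sketch of the eval → project → none credential fallback described above; the type and function names are assumptions for illustration.

```typescript
// Hypothetical credential resolution (sketch only).
interface Credentials { username: string; password: string; apiKey?: string }

// Resolve credentials only when the authenticated-workflow toggle is ON.
function resolveCredentials(
  authEnabled: boolean,
  evalCreds?: Credentials,
  projectCreds?: Credentials,
): Credentials | undefined {
  if (!authEnabled) return undefined;             // toggle OFF: run without credentials
  return evalCreds ?? projectCreds ?? undefined;  // eval-level first, then project defaults
}
```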
Computer Use Logger
Real-time monitoring tool for debugging Computer Use interactions (a log-entry sketch follows the lists below):
Features:
- Action Log: Every Computer Use API call with timestamps
- Context Display: Instruction, screenshot (truncated), credentials presence
- Response Details: AI thought process, action type, coordinates
- Error Tracking: Failed actions with error messages
- Login Detection: State changes with confidence and reasoning
- Statistics: Total calls, success rate, error count
Access:
- Click Monitor icon in top bar
- Badge shows log count (red if errors)
- Expandable entries with full details
- Clear all functionality
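A rough sketch of what one entry in such a logging store might look like; the field names are assumptions based on the features listed above, not the extension's actual type.

```typescript
// Hypothetical Computer Use log entry (sketch only).
interface ComputerUseLogEntry {
  timestamp: string;               // ISO time of the API call
  instruction: string;             // task context sent with the screenshot
  screenshotPreview: string;       // truncated data URL
  credentialsPresent: boolean;
  thought?: string;                // model's reasoning
  action?: { type: string; x?: number; y?: number; text?: string };
  error?: string;                  // populated for failed actions
  loginDetection?: { detected: boolean; confidence: number; reasoning: string };
}
```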
Workflow Summary
The workflow emphasizes hub-based navigation, flexible entry points, and vision-based automation with intelligent replay:
Home Screen: Central hub for all test plans
- View all plans for active project
- Filter by status
- Create new or select existing plan
Plan Configuration (Creation Flow):
- Capture Phase: Create screenshots (vision only, no DOM)
- Discover Phase: AI identifies navigation phases
- Plan Phase:
  - Generate up to 10 tests per phase
  - Entry Point: Set custom starting URL path
  - Auth Toggle: Enable/disable authentication workflow
  - Credentials: Configure eval-level or use project defaults
  - Add sitemap URLs for navigation links
- Save plan → Return to Home
Plan Execution (Execution Flow):
- Select plan from Home → View Plan Details
- Review entry point, auth status, credentials
- Click Execute → Run tests
- First Run: Computer Use (vision) + action recording
- Repeat Runs: DOM replay with Computer Use fallback
- Auto-Login: Automatic when auth toggle ON
- View Reports → Return to Home
Key Benefits:
- ✅ Vision-Based Testing: No brittle selectors needed for first run
- ✅ Intelligent Replay: Fast DOM-based replay on subsequent runs
- ✅ Resilient: Computer Use fallback when DOM changes
- ✅ Auto-Login: Automatic credential handling and login detection
- ✅ Phase Organization: Clear structure and management
- ✅ Flexible Generation: Up to 10 tests per phase
- ✅ Better Scaling: Manage large test suites efficiently
- ✅ No DOM Required: Pure screenshot analysis for test generation
Next Steps
- Backend Integration - Learn about API endpoints
- Test Execution - Deep dive into AI automation
- Troubleshooting - Common issues and fixes