Complete Workflow Guide
This guide describes the end-to-end BrowseGenius workflow for AI-powered test automation using Evaluations (Evals) and Phases.
Terminology
- Eval (Evaluation): A complete testing session for a project containing multiple phases (see the data-model sketch after this list)
- Phase: A specific navigation section of your application (e.g., Dashboard, Settings, User Management)
- Each phase can have up to 10 test cases
- Evals organize and track testing across all phases
- Computer Use Model: Vision-based AI that sees and interacts with UI like a human (no DOM selectors needed)
- Home Screen: Central hub showing all test plans for the active project
- Plan Details: Detailed view of a single plan with execution controls
- Entry Point Path: Custom path appended to the project hostname (e.g., /admin, /login)
- Authenticated Workflow: Toggle to enable/disable the authentication flow and credential handling
- Computer Use Logger: Real-time monitoring tool for Computer Use API calls and actions
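The relationships above can be summarized in a small data-model sketch. This is illustrative only: the interface and field names below are assumptions inferred from this guide, not the extension's actual types.

```typescript
// Hypothetical shapes inferred from the terminology above (names assumed).
interface PhaseTestCase {
  id: string;
  phaseName: string;
  title: string;
  narrative: string;
  priority: 'P0' | 'P1' | 'P2';
  status: 'idle' | 'running' | 'passed' | 'failed' | 'blocked';
  steps: { action: string; expectation: string }[];
}

interface Phase {
  name: string;                 // e.g. "Dashboard", "Settings", "User Management"
  testCases: PhaseTestCase[];   // up to 10 per phase
}

interface Eval {
  id: string;                   // backend-issued UUID
  name: string;
  phases: Phase[];              // an eval organizes testing across all phases
  entryPointPath?: string;      // e.g. "/admin"
  authenticatedWorkflow: boolean;
}
```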
Workflow Overview
BrowseGenius uses hub-based navigation with two main flows:
Creation Flow
Home Screen → Wizard (Capture) → Wizard (Discover) → Wizard (Plan) → Save → Home Screen
Execution Flow
Home Screen → Plan Details → Execute → Reports → Home Screen
Navigation Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ BROWSEGENIUS WORKFLOW │
└─────────────────────────────────────────────────────────────────────┘
HOME SCREEN (Central Hub) PLAN DETAILS
┌────────────────────┐ ┌────────────────────┐
│ All Test Plans │ │ Single Plan │
│ for Project │ │ Details │
│ │ │ │
│ • Filter by status│◀─────────────│ • Entry point │
│ • Create new │ │ • Auth toggle │
│ • View details │─────────────>│ • Credentials │
│ • Search plans │ │ • Test cases │
│ │ │ • Execute button │
│ [+ New Plan] │ │ [← Back] [▶ Run] │
└────────────────────┘ └────────────────────┘
│ │
│ Create New │ Execute
▼ ▼
┌────────────────────┐ ┌────────────────────┐
│ WIZARD MODE │ │ WIZARD MODE │
│ (Creation) │ │ (Execution) │
│ │ │ │
│ Phase 1: CAPTURE │ │ Phase 4: EXECUTE │
┌────────────────────┐ ┌────────────────────┐
│ Take Screenshots │ │ AI Analyzes │
│ of Key Screens │──────────────>│ Application │
│ │ │ │
│ • Select Tab │ │ • App Overview │
│ • Click Capture │ │ • User Roles │
│ • AI Vision │ │ • Navigation │
│ • Screenshot PNG │ │ Phases │
│ • NO DOM needed │ │ • Priorities │
│ Max: 5 screens │ │ │
└────────────────────┘ │ Select phases ✓ │
│ └────────────────────┘
│ │
▼ │
┌────────────────────┐ ▼
│ SAVED EVALS │ ┌────────────────────┐
│ │ │ Phase 3: PLAN │
│ • View existing │ │ │
│ • Select eval │◀─────────────│ AI Generates │
│ • New eval │ │ Tests Per Phase │
│ • Sync from backend│ │ │
│ • Credentials │ │ • Up to 10 tests │
│ • Sitemap links │ │ per phase │
│ │ │ • Steps │
└────────────────────┘ │ • Expectations │
│ │ • Save eval │
│ Select Eval └────────────────────┘
└─────────────────────────> │
│
▼
┌────────────────────┐
│ Phase 4: EXECUTE │
│ │
│ Computer Use │
│ Vision-Based │
│ │
│ ┌──────────────┐ │
│ │ Screenshot │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ AI Vision │ │
│ │ + Credentials│ │
│ │ determines │ │
│ │ action │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Execute │ │
│ │ click(x,y) │ │
│ │ type(text) │ │
│ │ Auto-login │ │
│ └──────┬───────┘ │
│ │ │
│ │ Record │
│ │ actions │
│ │ (JSON) │
│ │ │
│ └──────┐ │
│ Repeat │ │
│ (max │ │
│ 40x) │ │
│ │ │
└────────────────┼───┘
│
▼
┌────────────────────┐
│ Phase 5: REPORTS │
│ │
│ View Results │
│ │
│ • Summary stats │
│ • Test details │
│ • Download │
│ - JSON │
│ - HTML │
│ • Submit to │
│ backend │
└────────────────────┘Data Flow Diagram
┌──────────────┐
│ User Browser │
└──────┬───────┘
│
│ 1. Capture screenshots
▼
┌──────────────────────────────┐
│ Chrome Extension │
│ │
│ ┌────────────────────────┐ │
│ │ FlowCaptureSection │ │
│ │ │ │
│ │ • captureActiveScreen()│ │
│ │ └─> CDP Screenshot │ │
│ │ └─> Vision analysis │ │
│ │ └─> NO DOM capture │ │
│ └────────────────────────┘ │
│ │ │
│ │ 2. AI Vision │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ OpenAI Vision API │──┼──> External API
│ │ (GPT-4o) │ │
│ │ describeScreenshot() │ │
│ │ Returns: UI analysis │ │
│ └────────────────────────┘ │
│ │ │
│ │ 3. Store │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ Zustand State │ │
│ │ │ │
│ │ ScreenCapture { │ │
│ │ imageDataUrl, │ │
│ │ imageDescription │ │
│ │ (domSnapshot: legacy)│ │
│ │ } │ │
│ └────────────────────────┘ │
│ │ │
│ │ 4. Discover │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ discoveryService │ │
│ │ │ │
│ │ GPT-4 analyzes: │──┼──> External API
│ │ App, Actors, Phases │ │
│ └────────────────────────┘ │
│ │ │
│ │ 5. Generate │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ flowDiscovery │ │
│ │ │ │
│ │ GPT-4 generates: │──┼──> External API
│ │ Test cases per phase │ │
│ └────────────────────────┘ │
│ │ │
│ │ 6. Save │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ evalsAPI.create() │──┼──┐
│ │ │ │ │
│ │ Backend creates eval │ │ │ 7. Backend API
│ │ Returns: UUID │ │ │
│ └────────────────────────┘ │ │
│ │ │ │
│ │ 8. Execute │ │
│ ▼ │ │
│ ┌────────────────────────┐ │ │
│ │ testOrchestratorCU │ │ │
│ │ │ │ │
│ │ For each test (1st): │ │ │
│ │ 1. Load credentials │ │ │
│ │ 2. Capture screenshot│ │ │
│ │ 3. Send to Computer │──┼──> OpenAI Computer Use
│ │ Use API + creds │ │ │ (Vision + Actions)
│ │ 4. Execute action │ │ │
│ │ (click, type, etc)│ │ │
│ │ 5. Record action │ │ │
│ │ (DOM selectors) │ │ │
│ │ 6. Auto-login if │ │ │
│ │ detected │ │ │
│ │ 7. Repeat │ │ │
│ │ │ │ │
│ │ For repeat runs: │ │ │
│ │ 1. Replay recorded │ │ │
│ │ actions (DOM) │ │ │
│ │ 2. Fallback to │ │ │
│ │ Computer Use │ │ │
│ └────────────────────────┘ │ │
│ │ │ │
│ │ 9. Report │ │
│ ▼ │ │
│ ┌────────────────────────┐ │ │
│ │ submitTestReport() │──┼──┘
│ │ │ │
│ │ Backend stores report │ │
│ │ Deducts credits │ │
│ └────────────────────────┘ │
└──────────────────────────────┘
│
│ 10. View results
▼
┌──────────────┐
│ Reports UI │
└──────────────┘Phase 1: Capture Key Screens
User Interface
╔════════════════════════════════════════════════════════════════╗
║ Phase 1: Capture Key Screens [+ New Eval] ║
╠════════════════════════════════════════════════════════════════╣
║ ║
║ Take full-page screenshots of up to 5 important screens. ║
║ ║
╠════════════════════════════════════════════════════════════════╣
║ SAVED EVALS 3 saved [Sync] ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────┐ ┌────────────────────────┐ ║
║ │ Dashboard Eval │ │ Admin Panel Eval │ ║
║ │ Test dashboard... │ │ Test admin features... │ ║
║ │ │ │ │ ║
║ │ 3 phases · 15 tests │ │ 2 phases · 12 tests │ ║
║ │ Last: Oct 14, 2025 │ │ Last: Oct 13, 2025 │ ║
║ │ │ │ │ ║
║ │ [Select Eval] [▶] [🗑] │ │ [Select Eval] [▶] [🗑] │ ║
║ └────────────────────────┘ └────────────────────────┘ ║
╠════════════════════════════════════════════════════════════════╣
║ KEY SCREENS ║
╠════════════════════════════════════════════════════════════════╣
║ [Tab ▼ example.com/login ] [📷 Capture] ║
║ ║
║ ┌─────────────────┐ ┌─────────────────┐ ║
║ │ Screen 1 │ │ Screen 2 │ ║
║ │ [Screenshot] │ │ [Screenshot] │ ║
║ │ │ │ │ ║
║ │ Login Page │ │ Dashboard │ ║
║ │ /login │ │ /dashboard │ ║
║ │ │ │ │ ║
║ │ [✏️ Notes] [🗑] │ │ [✏️ Notes] [🗑] │ ║
║ └─────────────────┘ └─────────────────┘ ║
╚════════════════════════════════════════════════════════════════╝
Technical Process
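The walkthrough below traces the capture pipeline step by step. As a rough companion, here is a minimal sketch of taking a CDP screenshot from a Chrome extension background context; it assumes the Manifest V3 promise-based chrome.debugger API, requires the "debugger" permission, and omits the full-page sizing logic and error handling.

```typescript
// Minimal sketch: attach the debugger, capture a screenshot, detach (assumptions noted above).
async function captureTabScreenshot(tabId: number): Promise<string> {
  const target = { tabId };
  await chrome.debugger.attach(target, '1.3');
  try {
    await chrome.debugger.sendCommand(target, 'Page.enable');
    // Full-page sizing via Page.getLayoutMetrics is omitted for brevity.
    const result = (await chrome.debugger.sendCommand(target, 'Page.captureScreenshot', {
      format: 'png',
    })) as { data: string };
    return `data:image/png;base64,${result.data}`; // imageDataUrl stored in state
  } finally {
    await chrome.debugger.detach(target);
  }
}
```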
User Action System Process
──────────── ──────────────
1. Select tab
[example.com ▼]
│
2. Click "Capture" │
▼
┌─────────────────┐
│ attachDebugger │
│ (tabId) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Page.enable() │
│ getLayoutMetrics│
└────────┬────────┘
│
▼
┌─────────────────┐
│ captureScreenshot│
│ • width: full │
│ • height: max │
│ 800px │
└────────┬────────┘
│
▼
┌─────────────────────┐
│ describeScreenshot │
│ (OpenAI Vision API) │
│ │
│ Analyzes: │
│ • UI elements │
│ • User flows │
│ • Forms, buttons │
│ • Navigation │
│ • Visual layout │
│ • Actionable items │
│ │
│ NO DOM parsing! │
│ Pure vision AI │
└────────┬────────────┘
│
▼
┌─────────────────┐
│ Store in Zustand│
│ │
│ ScreenCapture { │
│ image, │
│ description, │
│ metadata, │
│ url │
│ } │
└─────────────────┘
│
3. Screenshot saved │
✓ Screen 1 │
▼
┌─────────────────┐
│ detachDebugger │
└─────────────────┘Phase 2: Discover Eval Phases
AI Analysis Flow
Input: Screenshots Output: Phase Discovery
───────────────── ───────────────────────
┌─────────────────┐
│ Screen 1 │ ┌────────────────────────┐
│ • imageDataUrl │ │ App Overview │
│ • domSnapshot │ │ "E-commerce platform │
│ • description │ │ for buying products" │
└────────┬────────┘ └────────────────────────┘
│
┌────────▼────────┐ ┌────────────────────────┐
│ Screen 2 │ │ User Actors │
│ • imageDataUrl │ ──────> │ • Guest User │
│ • domSnapshot │ GPT-4 │ • Registered User │
│ • description │ │ • Admin │
└────────┬────────┘ └────────────────────────┘
│
┌────────▼────────┐ ┌────────────────────────┐
│ Screen 3 │ │ Navigation Phases │
│ • imageDataUrl │ │ │
│ • domSnapshot │ │ ✓ Dashboard (P0) │
│ • description │ │ ✓ Settings (P1) │
└─────────────────┘ │ User Mgmt (P0) │
│ ✓ Profile (P2) │
└────────────────────────┘
Selection Interface
╔════════════════════════════════════════════════════════════════╗
║ Phase 2: Discover Eval Phases ║
╠════════════════════════════════════════════════════════════════╣
║ AI analyzes your 3 captured screens to discover navigation ║
║ sections and create testable phases ║
╠════════════════════════════════════════════════════════════════╣
║ [🔍 Discover Phases] ║
╠════════════════════════════════════════════════════════════════╣
║ 💡 APPLICATION OVERVIEW ║
║ ║
║ E-commerce platform for buying products online with user ║
║ authentication and shopping cart functionality. ║
╠════════════════════════════════════════════════════════════════╣
║ 👥 USER ROLES ║
║ ║
║ [Guest User] [Registered User] [Admin] ║
╠════════════════════════════════════════════════════════════════╣
║ DISCOVERED PHASES (3 selected) [Select All] [Clear] ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────┐ ║
║ │ ☑ Dashboard [P0] ✓ │ ║
║ │ Main application dashboard and overview │ ║
║ │ Screens: Dashboard, Home │ ║
║ └────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────┐ ║
║ │ ☑ Settings [P1] ✓ │ ║
║ │ Application settings and configuration │ ║
║ │ Screens: Settings, Preferences │ ║
║ └────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────┐ ║
║ │ ☐ User Management [P0] │ ║
║ │ Manage users, roles, and permissions │ ║
║ │ Screens: Users, Roles │ ║
║ └────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────┐ ║
║ │ ☑ Profile [P2] ✓ │ ║
║ │ User profile viewing and editing │ ║
║ │ Screens: Profile, Account │ ║
║ └────────────────────────────────────────────────────┘ ║
╠════════════════════════════════════════════════════════════════╣
║ ⚠ Select at least one phase to continue ║
╚════════════════════════════════════════════════════════════════╝
Phase 3: Generate Test Cases Per Phase
Test Case Structure
Each phase can have up to 10 test cases:
Phase: "Dashboard"
├── PhaseTestCase 1
│ ├── id: "uuid-1234"
│ ├── phaseName: "Dashboard"
│ ├── title: "Verify dashboard loads correctly"
│ ├── narrative: "As a user, I want to see my dashboard..."
│ ├── priority: "P0"
│ ├── status: "idle"
│ └── steps: [
│ ┌──────────────────────────────────────────────┐
│ │ Step 1 │
│ │ action: "Navigate to dashboard" │
│ │ expectation: "Dashboard page loads" │
│ └──────────────────────────────────────────────┘
│ ┌──────────────────────────────────────────────┐
│ │ Step 2 │
│ │ action: "Verify widgets displayed" │
│ │ expectation: "All widgets render correctly" │
│ └──────────────────────────────────────────────┘
│ ]
├── PhaseTestCase 2 (up to 10 total per phase)
└── ...
Save Eval Workflow
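The sequence below shows the save path in detail. As a hedged sketch of the same idea (the endpoint path and response shape come from this section, but the store calls and constants are assumptions):

```typescript
// Hypothetical sketch: create the eval on the backend, then store it locally
// under the backend-issued UUID.
declare const API_BASE: string;   // backend origin (assumed)
declare const apiKey: string;     // assumed auth header value
declare const store: {
  addEval(e: { id: string; name: string; description: string }): void;
  activeProject: { evalIds: string[] };
};

async function saveEval(name: string, description: string): Promise<string> {
  const res = await fetch(`${API_BASE}/evals`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Api-Key': apiKey },
    body: JSON.stringify({ name, description }),
  });
  if (!res.ok) throw new Error(`Eval creation failed: ${res.status}`);
  const { id } = (await res.json()) as { id: string };   // backend UUID
  store.addEval({ id, name, description });              // store locally with the backend UUID
  store.activeProject.evalIds.push(id);                  // associate with the active project
  return id;                                             // future runs reuse the same UUID
}
```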
User Action Frontend Backend
─────────── ──────── ───────
1. Click "Save Eval"
│
2. Enter name/desc │
│
3. Click "Save" │
▼
┌─────────────┐
│ saveEval() │
└──────┬──────┘
│
│ POST /evals
▼
┌──────────────┐
│ Create eval │
│ evalId = │
│ UUID() │
└──────┬───────┘
│
◀─────────────────────────┘
│ Returns: { id: UUID }
│
┌──────▼──────┐
│ Store │
│ locally │
│ with backend│
│ UUID ✅ │
└─────────────┘
│
│ Associate with project
▼
┌─────────────┐
│ project. │
│ evalIds. │
│ push(UUID) │
└─────────────┘
│
4. ✓ Eval saved │
X tests across Y phases │
│
Future runs use │
same UUID ───────────────┘
Phase 4: Execute Tests
Vision-Based Automation with Computer Use
BrowseGenius uses OpenAI's Computer Use model for test execution, which provides several advantages over traditional DOM-based automation:
Benefits:
- Visual Understanding: AI sees the UI like a human, understanding visual context and layout
- Resilient to Changes: No brittle selectors - works even when DOM structure changes
- Natural Interactions: Coordinate-based clicking, typing, and scrolling
- Context-Aware: AI understands the current state from screenshots
- Fewer Selectors: No need to maintain CSS selectors, XPath, or data-testid attributes
How It Works (sketched in code after these steps):
- Capture screenshot of current browser state
- Send screenshot to OpenAI Computer Use API with task instructions
- Model analyzes visual UI and determines next action (click, type, scroll)
- Execute action using coordinates or keyboard input
- Wait for UI update, capture new screenshot
- Repeat until task completes or fails
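A condensed sketch of that loop is shown below; the same cycle is drawn as a diagram in the next section. The helper names (captureScreenshot, requestComputerAction, executeComputerAction) are placeholders for the extension's actual services, not confirmed APIs.

```typescript
// Hedged sketch of the vision-based execution loop (helper names assumed).
interface ModelAction { type: string; reason?: string }

declare function captureScreenshot(): Promise<string>;                                    // PNG data URL via CDP
declare function requestComputerAction(shot: string, task: string): Promise<ModelAction>; // Computer Use API call
declare function executeComputerAction(action: ModelAction): Promise<void>;               // click/type/scroll via CDP

async function runTestCase(instruction: string): Promise<'passed' | 'failed' | 'blocked'> {
  for (let i = 0; i < 40; i++) {                     // max 40 iterations
    const screenshot = await captureScreenshot();    // 1. capture current browser state
    const action = await requestComputerAction(screenshot, instruction); // 2-3. model picks the next action
    if (action.type === 'finish') return 'passed';   // model reports the task is done
    if (action.type === 'fail') return 'failed';     // model reports the task cannot be completed
    await executeComputerAction(action);             // 4. perform the action
    await new Promise((r) => setTimeout(r, 1500));   // 5. wait for the UI to settle
  }
  return 'blocked';                                  // exceeded the action budget
}
```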
AI Automation Loop (Computer Use Model)
┌──────────────────────────────────────────────────────────────┐
│ TEST EXECUTION LOOP │
│ (Computer Use - Vision-Based Actions) │
│ (Max 40 iterations) │
└──────────────────────────────────────────────────────────────┘
Iteration N Action Result
─────────── ────── ──────
│
│ 1. Capture screenshot
▼
┌───────────────┐
│ CDP: │ Screenshot (PNG/Base64)
│ captureScreen │───────> 1920x1080 viewport
└───────┬───────┘ Full browser viewport
│ with all UI elements
│ 2. Send to Computer Use API
▼
┌─────────────────┐
│ OpenAI Computer │ Context:
│ Use Model │ • Screenshot (vision)
│ (gpt-4o) │ • Test instructions
└───────┬─────────┘ • Previous actions
│ • Task goal
│
│ 3. AI Response (computer_call)
▼
┌───────────────────────────────────┐
│ { │
│ thought: "I see login button", │
│ action: "click", │
│ parsedAction: { │
│ type: "click", │
│ x: 640, │
│ y: 350, │
│ button: "left" │
│ } │
│ } │
└────────┬──────────────────────────┘
│
│ 4. Execute computer action
▼
┌────────────────────┐
│ executeComputer │───> Playwright/CDP:
│ Action() │ • click(x, y)
│ • click(x, y) │ • type(text)
│ • double_click │ • scroll(dx, dy)
│ • type(text) │ • keypress(keys)
│ • scroll(dx, dy) │ • move(x, y)
│ • keypress(keys) │ • wait(ms)
│ • finish │
│ • fail │
└────────┬───────────┘
│
│ 5. Wait for UI update (1.5s)
▼
Sleep(1500)
│
│ 6. Next iteration (capture new screenshot)
└─────────────────┐
│
┌─────────────────┘
│
▼
[Repeat until:]
• AI calls finish() → ✅ PASSED
• AI calls fail() → ❌ FAILED
• Max 40 actions → ⚠️ BLOCKED
• Error occurs → ❌ FAILED
Available Computer Actions
The Computer Use model can perform these actions (a dispatch sketch follows the table):
| Action | Parameters | Description |
|---|---|---|
| click | x, y, button | Click at coordinates (left, right, middle, back, forward) |
| double_click | x, y | Double-click at coordinates |
| type | text | Type text at current cursor position |
| keypress | keys | Press keyboard keys (Enter, Tab, Escape, etc.) |
| scroll | x, y, scroll_x, scroll_y | Scroll viewport by offset |
| move | x, y | Move mouse cursor to coordinates |
| wait | ms | Pause execution (default 1000ms) |
| finish | - | Mark test as PASSED |
| fail | reason | Mark test as FAILED with reason |
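A hedged sketch of how these actions might be dispatched to the browser. The action shapes mirror the table; the CDP wrapper (sendCommand) and the exact event parameters are assumptions, not the extension's real implementation.

```typescript
// Dispatch a parsed Computer Use action as CDP input events (sketch only).
type ParsedAction =
  | { type: 'click'; x: number; y: number; button?: 'left' | 'right' | 'middle' }
  | { type: 'double_click'; x: number; y: number }
  | { type: 'type'; text: string }
  | { type: 'keypress'; keys: string[] }
  | { type: 'scroll'; x: number; y: number; scroll_x: number; scroll_y: number }
  | { type: 'move'; x: number; y: number }
  | { type: 'wait'; ms?: number };

declare function sendCommand(method: string, params?: object): Promise<unknown>; // assumed CDP wrapper

async function executeComputerAction(action: ParsedAction): Promise<void> {
  switch (action.type) {
    case 'click':
    case 'double_click': {
      const clickCount = action.type === 'double_click' ? 2 : 1;
      const button = 'button' in action && action.button ? action.button : 'left';
      await sendCommand('Input.dispatchMouseEvent', { type: 'mousePressed', x: action.x, y: action.y, button, clickCount });
      await sendCommand('Input.dispatchMouseEvent', { type: 'mouseReleased', x: action.x, y: action.y, button, clickCount });
      break;
    }
    case 'type':
      await sendCommand('Input.insertText', { text: action.text });
      break;
    case 'keypress':
      for (const key of action.keys) {
        await sendCommand('Input.dispatchKeyEvent', { type: 'keyDown', key });
        await sendCommand('Input.dispatchKeyEvent', { type: 'keyUp', key });
      }
      break;
    case 'scroll':
      await sendCommand('Input.dispatchMouseEvent', { type: 'mouseWheel', x: action.x, y: action.y, deltaX: action.scroll_x, deltaY: action.scroll_y });
      break;
    case 'move':
      await sendCommand('Input.dispatchMouseEvent', { type: 'mouseMoved', x: action.x, y: action.y });
      break;
    case 'wait':
      await new Promise((r) => setTimeout(r, action.ms ?? 1000));
      break;
  }
}
```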
Example Response:
{
"thought": "I need to click the login button in the center of the screen",
"action": "click",
"parsedAction": {
"type": "click",
"x": 640,
"y": 450,
"button": "left"
}
}
Execution States
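Each test case carries one of a small set of statuses. A minimal type sketch follows (the status names come from the lifecycle diagram below; the helper function is an assumed illustration, not the real code):

```typescript
// Test-case lifecycle statuses as used throughout this guide.
type TestCaseStatus = 'idle' | 'running' | 'passed' | 'failed' | 'blocked';

// Hypothetical mapping from a loop outcome to a final status.
function statusFromOutcome(outcome: 'finish' | 'fail' | 'max_actions' | 'error'): TestCaseStatus {
  switch (outcome) {
    case 'finish':      return 'passed';   // AI called finish()
    case 'fail':        return 'failed';   // AI called fail()
    case 'max_actions': return 'blocked';  // exceeded 40 actions
    case 'error':       return 'failed';   // unexpected exception
  }
}
```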
Test Case Lifecycle
────────────────────
┌──────┐
│ IDLE │ Initial state
└───┬──┘
│ Click "Run Tests"
▼
┌─────────┐
│ RUNNING │ AI executing actions
└───┬─────┘
│
├─> AI: finish() ──> ┌────────┐
│ │ PASSED │
│ └────────┘
│
├─> AI: fail() ────> ┌────────┐
│ │ FAILED │
│ └────────┘
│
├─> Max actions ───> ┌─────────┐
│ │ BLOCKED │
│ └─────────┘
│
└─> Exception ─────> ┌────────┐
│ FAILED │
└────────┘Phase 5: View Reports
Report Structure
SuiteReport
├── id: "report-uuid"
├── startedAt: "2025-10-14T21:00:00Z"
├── completedAt: "2025-10-14T21:05:30Z"
├── status: "complete"
│
├── summary
│ ├── total: 10
│ ├── passed: 7
│ ├── failed: 2
│ ├── blocked: 1
│ └── skipped: 0
│
├── cases: [
│ {
│ caseId: "test-1",
│ status: "passed",
│ durationMs: 45000
│ },
│ {
│ caseId: "test-2",
│ status: "failed",
│ failureReason: "Login button not found",
│ durationMs: 12000
│ }
│ ]
│
└── artifacts: [
{
id: "artifact-1",
caseId: "test-1",
timestamp: "2025-10-14T21:00:15Z",
type: "log",
message: "Clicked login button"
},
{
id: "artifact-2",
caseId: "test-2",
timestamp: "2025-10-14T21:02:30Z",
type: "screenshot",
data: "base64..."
}
]
Reports UI
╔════════════════════════════════════════════════════════════════╗
║ Phase 5: Test Results ║
╠════════════════════════════════════════════════════════════════╣
║ ║
║ SUMMARY ║
║ ║
║ ┌─────────┐ ┌─────────┐ ┌─────────┐ ║
║ │ Total │ │ Passed │ │ Failed │ ║
║ │ 10 │ │ 7 │ │ 2 │ ║
║ └─────────┘ └─────────┘ └─────────┘ ║
║ ║
║ Started: Oct 14, 2025 9:00 PM ║
║ Duration: 5m 30s ║
║ ║
║ [Download Report ▼] ║
║ • JSON Only ║
║ • HTML Only ║
║ • Both (Bundle) ║
╠════════════════════════════════════════════════════════════════╣
║ TEST CASES ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ #1 [P0] [✓ PASSED] │ ║
║ │ Verify successful login with valid credentials │ ║
║ │ Duration: 45s │ ║
║ └────────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ #2 [P1] [✗ FAILED] │ ║
║ │ Search products by category │ ║
║ │ ⚠ Login button not found │ ║
║ │ Duration: 12s │ ║
║ └────────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ #3 [P0] [⚠ BLOCKED] │ ║
║ │ Complete checkout process │ ║
║ │ ⚠ Exceeded max actions (40) │ ║
║ │ Duration: 2m 15s │ ║
║ └────────────────────────────────────────────────────────┘ ║
╚════════════════════════════════════════════════════════════════╝
Backend Integration
API Architecture
┌──────────────────┐ ┌──────────────────┐
│ Chrome Extension │ │ Cloudflare Worker│
│ │ │ │
│ ┌────────────┐ │ │ ┌────────────┐ │
│ │ API Client │──┼───── HTTPS ───────>│ │ Hono Router│ │
│ └────────────┘ │ │ └─────┬──────┘ │
│ │ │ │ │
│ ┌────────────┐ │ │ ┌─────▼──────┐ │
│ │ State │ │ │ │ Auth │ │
│ │ (Zustand) │ │ │ │ Middleware │ │
│ └────────────┘ │ │ └─────┬──────┘ │
└──────────────────┘ │ │ │
│ ┌─────▼──────┐ │
│ │ Routes │ │
│ │ /projects │ │
│ │ /test-plans│ │
│ │ /users │ │
│ └─────┬──────┘ │
│ │ │
│ ┌─────▼──────┐ │
│ │ D1 Database│ │
│ │ │ │
│ │ • projects │ │
│ │ • plans │ │
│ │ • reports │ │
│ └────────────┘ │
└──────────────────┘
Request Flow Example
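The trace below walks through POST /test-plans/:id/report on the server side. As a client-side companion, here is a hedged sketch of submitting a report; the endpoint path, header, and error codes come from this section, while the base URL constant and response typing are assumptions.

```typescript
// Hypothetical client-side report submission (sketch only).
declare const API_BASE: string; // e.g. the Cloudflare Worker origin (assumed)
declare const apiKey: string;

async function submitTestReport(planId: string, report: object): Promise<{ reportId: string; creditsUsed: number }> {
  const res = await fetch(`${API_BASE}/test-plans/${planId}/report`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Api-Key': apiKey },
    body: JSON.stringify({ report }),
  });
  if (res.status === 404) throw new Error('Test plan not found or access denied');
  if (res.status === 402) throw new Error('Insufficient credits');
  if (!res.ok) throw new Error(`Report submission failed: ${res.status}`);
  return res.json() as Promise<{ reportId: string; creditsUsed: number }>; // 201: { reportId, creditsUsed }
}
```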
POST /test-plans/:id/report
────────────────────────────
Client Server
────── ──────
1. Build report object
{
summary: {...},
cases: [...],
artifacts: [...]
}
│
│ POST with X-Api-Key
▼
2. Auth middleware
verifies API key
│
▼
3. Get userId from key
│
▼
4. Verify plan exists
and belongs to user
│
├─> 404 if not found
│
▼
5. Calculate credits
cost = base + (cases * 1)
│
▼
6. Deduct credits
│
├─> 402 if insufficient
│
▼
7. Insert into test_reports
reportId = UUID()
│
▼
8. Update plan.last_run_at
│
▼
◀───────────────────────9. Return 201
{ reportId, creditsUsed }
Debugging Tools
API Logs Interface
╔════════════════════════════════════════════════════════════════╗
║ API Request Logs [Clear All] ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ ✓ [GET] [200] 145ms 21:05:30.123 │ ║
║ │ /api/v1/projects │ ║
║ │ [▼ Expand] │ ║
║ └────────────────────────────────────────────────────────┘ ║
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ ✗ [POST] [404] 946ms 21:11:20.456 │ ║
║ │ /api/v1/test-plans/.../report │ ║
║ │ ⚠ Test plan not found or access denied │ ║
║ │ [▲ Collapse]│ ║
║ │ ┌────────────────────────────────────────────────────┐ │ ║
║ │ │ Request Headers: │ │ ║
║ │ │ { │ │ ║
║ │ │ "Content-Type": "application/json", │ │ ║
║ │ │ "X-Api-Key": "***" │ │ ║
║ │ │ } │ │ ║
║ │ │ │ │ ║
║ │ │ Request Body: │ │ ║
║ │ │ { │ │ ║
║ │ │ "report": {...} │ │ ║
║ │ │ } │ │ ║
║ │ │ │ │ ║
║ │ │ Response Data: │ │ ║
║ │ │ { │ │ ║
║ │ │ "success": false, │ │ ║
║ │ │ "error": "Test plan not found" │ │ ║
║ │ │ } │ │ ║
║ │ └────────────────────────────────────────────────────┘ │ ║
║ └────────────────────────────────────────────────────────┘ ║
╚════════════════════════════════════════════════════════════════╝
Quick Reference
Key Components
| Component | Purpose | Location |
|---|---|---|
| App.tsx | Main navigation & screen routing | Common |
| HomeScreen.tsx | Central hub with plan list | Pages/Components |
| PlanDetailsScreen.tsx | Plan details & execution | Pages/Components |
| CapturePhase.tsx | Phase 1 UI (Wizard) | Pages/Components |
| DiscoveryPhase.tsx | Phase 2 UI (Wizard) | Pages/Components |
| PlanPhase.tsx | Phase 3 UI (Wizard) | Pages/Components |
| ExecutePhase.tsx | Phase 4 UI (Wizard) | Pages/Components |
| ReportsPhase.tsx | Phase 5 UI (Wizard) | Pages/Components |
| EvalNavigationSetup.tsx | Entry point & auth config | Pages/Components |
| ComputerUseLoggerModal.tsx | Computer Use monitoring | Pages/Components |
| DOMRecordingView.tsx | Action recording viewer | Pages/Components |
| flowDiscovery.ts | Capture & generation | Services |
| discoveryService.ts | Flow discovery | Services |
| testOrchestratorComputerUse.ts | Test execution (Computer Use) | Services |
| computerUse.ts | Computer Use API integration | Services |
| actionRecorder.ts | Action recording (DOM selectors) | Services |
| replayEngine.ts | Action replay with fallback | Services |
| computerUseLogger.ts | Computer Use logging store | State |
| test-plans-api.ts | Test plans API client | Services |
| api-client.ts | Backend API | Services |
State Flow
User Action → Component → State Action → API Call → Backend → Response → State Update → UI Update
Common Patterns
Capture Screen:
await captureActiveScreen(notes?, tabId, fullPage=true, generateDescription=true)
Generate Tests:
await generateTestPlanFromCaptures(customPrompt?, selectedPhaseNames)
Save Eval:
saveEval(name, description) // Creates in backend, stores locally with backend UUID
Run Tests:
await runGeneratedSuite({ stopOnFailure: false })
Submit Report:
await evalsAPI.submitReport(evalId, activeReport)
Computer Use vs DOM-Based Automation
Traditional DOM-Based Approach ❌
// Brittle selectors that break when HTML changes
const loginButton = await page.$('[data-testid="login-btn"]');
await loginButton.click();
// Requires maintaining selectors
const email = await page.$('#email-input');
await email.type('user@example.com');
Problems:
- Selectors break when developers change HTML
- Requires data-testid attributes or stable CSS classes
- Can't handle dynamic content well
- Needs constant maintenance
Computer Use Approach ✅
// AI sees the screen and understands visually
const screenshot = await captureScreenshot();
const action = await computerUseAPI.determineAction(screenshot, {
instruction: "Click the login button"
});
// Returns: { type: "click", x: 640, y: 450 }
await executeAction(action);
Advantages:
- No selectors needed - AI sees like a human
- Works even when HTML changes
- Resilient to UI updates
- Natural interaction with coordinates
- Context-aware decision making
New Features
Home Screen (Central Hub)
The Home Screen is the default view showing all test plans for the active project:
Features:
- Plan List: View all plans with status badges (New, Discovering, Testing, Completed)
- Status Filters: Filter plans by status
- Plan Cards: Show plan name, description, test case count, last run date
- Quick Actions: Create new plan, view details, execute
- Entry Point Indicators: See which plans have custom entry points
- Auth Badges: Identify authenticated workflows at a glance
Plan Details Screen
Detailed view of a single test plan with all configuration:
Displays:
- Entry Point URL: Full URL constructed from hostname + entry path
- Authentication Status: ON/OFF badge with credential visibility
- Credentials: Masked credentials (username, password, API key) with show/hide toggle
- Test Cases: List of all configured test cases
- Actions: Back to home, edit plan, execute tests
Entry Point Configuration
Configure custom entry points for tests that don't start at the root URL:
Usage:
Project Hostname: example.com
Entry Point Path: /admin
Result: https://example.com/admin (illustrated in code after the list below)
Common Use Cases:
- Admin portals: /admin, /dashboard
- Login pages: /login, /auth
- Specific features: /products, /checkout
- Subdirectories: /app, /portal
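A one-line way to build the resulting URL, assuming the project hostname is normalized to an https origin first:

```typescript
// Entry point path appended to the project hostname.
const entryUrl = new URL('/admin', 'https://example.com').toString();
// → "https://example.com/admin"
```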
Authenticated Workflow Toggle
Control whether tests require authentication:
When ON (default):
- Shows credential inputs in navigation setup
- Loads credentials from eval → project → none (a resolution sketch follows these lists)
- Auto-login detection enabled
- Computer Use receives credentials in context
- Login instructions included in prompts
When OFF:
- Hides all credential fields
- No credential loading
- No auto-login attempts
- Computer Use runs without credentials
- Perfect for public pages and unauthenticated flows
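A minimal sketch of the eval → project → none credential fallback described above; the type and function names are assumptions for illustration.

```typescript
// Hypothetical credential resolution (sketch only).
interface Credentials { username: string; password: string; apiKey?: string }

// Resolve credentials only when the authenticated-workflow toggle is ON.
function resolveCredentials(
  authEnabled: boolean,
  evalCreds?: Credentials,
  projectCreds?: Credentials,
): Credentials | undefined {
  if (!authEnabled) return undefined;             // toggle OFF: run without credentials
  return evalCreds ?? projectCreds ?? undefined;  // eval-level first, then project defaults
}
```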
Computer Use Logger
Real-time monitoring tool for debugging Computer Use interactions (a log-entry sketch follows the lists below):
Features:
- Action Log: Every Computer Use API call with timestamps
- Context Display: Instruction, screenshot (truncated), credentials presence
- Response Details: AI thought process, action type, coordinates
- Error Tracking: Failed actions with error messages
- Login Detection: State changes with confidence and reasoning
- Statistics: Total calls, success rate, error count
Access:
- Click Monitor icon in top bar
- Badge shows log count (red if errors)
- Expandable entries with full details
- Clear all functionality
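A rough sketch of what one entry in such a logging store might look like; the field names are assumptions based on the features listed above, not the extension's actual type.

```typescript
// Hypothetical Computer Use log entry (sketch only).
interface ComputerUseLogEntry {
  timestamp: string;               // ISO time of the API call
  instruction: string;             // task context sent with the screenshot
  screenshotPreview: string;       // truncated data URL
  credentialsPresent: boolean;
  thought?: string;                // model's reasoning
  action?: { type: string; x?: number; y?: number; text?: string };
  error?: string;                  // populated for failed actions
  loginDetection?: { detected: boolean; confidence: number; reasoning: string };
}
```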
Workflow Summary
The workflow emphasizes hub-based navigation, flexible entry points, and vision-based automation with intelligent replay:
Home Screen: Central hub for all test plans
- View all plans for active project
- Filter by status
- Create new or select existing plan
Plan Configuration (Creation Flow):
- Capture Phase: Create screenshots (vision only, no DOM)
- Discover Phase: AI identifies navigation phases
- Plan Phase:
  - Generate up to 10 tests per phase
  - Entry Point: Set custom starting URL path
  - Auth Toggle: Enable/disable authentication workflow
  - Credentials: Configure eval-level or use project defaults
  - Add sitemap URLs for navigation links
- Save plan → Return to Home
Plan Execution (Execution Flow):
- Select plan from Home → View Plan Details
- Review entry point, auth status, credentials
- Click Execute → Run tests
- First Run: Computer Use (vision) + action recording
- Repeat Runs: DOM replay with Computer Use fallback
- Auto-Login: Automatic when auth toggle ON
- View Reports → Return to Home
Key Benefits:
- ✅ Vision-Based Testing: No brittle selectors needed for first run
- ✅ Intelligent Replay: Fast DOM-based replay on subsequent runs
- ✅ Resilient: Computer Use fallback when DOM changes
- ✅ Auto-Login: Automatic credential handling and login detection
- ✅ Phase Organization: Clear structure and management
- ✅ Flexible Generation: Up to 10 tests per phase
- ✅ Better Scaling: Manage large test suites efficiently
- ✅ No DOM Required: Pure screenshot analysis for test generation
Next Steps
- Backend Integration - Learn about API endpoints
- Test Execution - Deep dive into AI automation
- Troubleshooting - Common issues and fixes