Complete Workflow Guide

This guide describes the end-to-end BrowseGenius workflow for AI-powered test automation using Evaluations (Evals) and Phases.

Terminology

  • Eval (Evaluation): A complete testing session for a project containing multiple phases; evals organize and track testing across all phases
  • Phase: A specific navigation section of your application (e.g., Dashboard, Settings, User Management); each phase can have up to 10 test cases
  • Computer Use Model: Vision-based AI that sees and interacts with UI like a human (no DOM selectors needed)
  • Home Screen: Central hub showing all test plans for the active project
  • Plan Details: Detailed view of a single plan with execution controls
  • Entry Point Path: Custom path appended to project hostname (e.g., /admin, /login)
  • Authenticated Workflow: Toggle to enable/disable authentication flow and credential handling
  • Computer Use Logger: Real-time monitoring tool for Computer Use API calls and actions
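
To make the Entry Point Path term concrete, here is a minimal sketch of how a custom path could be appended to a project hostname. The helper name `buildEntryUrl` and the `https` default are assumptions for illustration, not BrowseGenius's actual code.

```typescript
// Hypothetical helper: join a project hostname and an entry point path.
// Assumes https when the hostname carries no scheme.
function buildEntryUrl(hostname: string, entryPath: string = "/"): string {
  const withScheme = /^https?:\/\//.test(hostname) ? hostname : `https://${hostname}`;
  const base = withScheme.replace(/\/+$/, ""); // drop trailing slashes
  const path = entryPath.replace(/^\/+/, "");  // drop leading slashes
  return path ? `${base}/${path}` : `${base}/`;
}

console.log(buildEntryUrl("example.com", "/admin")); // https://example.com/admin
```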

Workflow Overview

BrowseGenius uses hub-based navigation with two main flows:

Creation Flow

Home Screen → Wizard (Capture) → Wizard (Discover) → Wizard (Plan) → Save → Home Screen

Execution Flow

Home Screen → Plan Details → Execute → Reports → Home Screen
┌─────────────────────────────────────────────────────────────────────┐
│                       BROWSEGENIUS WORKFLOW                         │
└─────────────────────────────────────────────────────────────────────┘

HOME SCREEN (Central Hub)           PLAN DETAILS
┌────────────────────┐              ┌────────────────────┐
│  All Test Plans    │              │   Single Plan      │
│  for Project       │              │   Details          │
│                    │              │                    │
│  • Filter by status│◀─────────────│  • Entry point     │
│  • Create new      │              │  • Auth toggle     │
│  • View details    │─────────────>│  • Credentials     │
│  • Search plans    │              │  • Test cases      │
│                    │              │  • Execute button  │
│  [+ New Plan]      │              │  [← Back] [▶ Run]  │
└────────────────────┘              └────────────────────┘
         │                                   │
         │ Create New                        │ Execute
         ▼                                   ▼
┌────────────────────┐              ┌────────────────────┐
│  WIZARD MODE       │              │  WIZARD MODE       │
│  (Creation)        │              │  (Execution)       │
│                    │              │                    │
│  Phase 1: CAPTURE  │              │  Phase 4: EXECUTE  │
└────────────────────┘              └────────────────────┘

┌────────────────────┐              ┌────────────────────┐
│ Phase 1: CAPTURE   │              │ Phase 2: DISCOVER  │
│                    │              │                    │
│  Take Screenshots  │              │   AI Analyzes      │
│  of Key Screens    │─────────────>│   Application      │
│                    │              │                    │
│  • Select Tab      │              │  • App Overview    │
│  • Click Capture   │              │  • User Roles      │
│  • AI Vision       │              │  • Navigation      │
│  • Screenshot PNG  │              │    Phases          │
│  • NO DOM needed   │              │  • Priorities      │
│  Max: 5 screens    │              │                    │
└────────────────────┘              │  Select phases ✓   │
         │                          └────────────────────┘
         │                                   │
         ▼                                   │
┌────────────────────┐                      ▼
│ SAVED EVALS        │              ┌────────────────────┐
│                    │              │ Phase 3: PLAN      │
│ • View existing    │              │                    │
│ • Select eval      │◀─────────────│  AI Generates      │
│ • New eval         │              │  Tests Per Phase   │
│ • Sync from backend│              │                    │
│ • Credentials      │              │  • Up to 10 tests  │
│ • Sitemap links    │              │    per phase       │
│                    │              │  • Steps           │
└────────────────────┘              │  • Expectations    │
         │                          │  • Save eval       │
         │ Select Eval              └────────────────────┘
         └─────────────────────────>            │


                                    ┌────────────────────┐
                                    │ Phase 4: EXECUTE   │
                                    │                    │
                                    │  Computer Use      │
                                    │  Vision-Based      │
                                    │                    │
                                    │  ┌──────────────┐  │
                                    │  │ Screenshot   │  │
                                    │  └──────┬───────┘  │
                                    │         ▼          │
                                    │  ┌──────────────┐  │
                                    │  │ AI Vision    │  │
                                    │  │ + Credentials│  │
                                    │  │ determines   │  │
                                    │  │ action       │  │
                                    │  └──────┬───────┘  │
                                    │         ▼          │
                                    │  ┌──────────────┐  │
                                    │  │ Execute      │  │
                                    │  │ click(x,y)   │  │
                                    │  │ type(text)   │  │
                                    │  │ Auto-login   │  │
                                    │  └──────┬───────┘  │
                                    │         │          │
                                    │         │ Record   │
                                    │         │ actions  │
                                    │         │ (JSON)   │
                                    │         │          │
                                    │         └──────┐   │
                                    │         Repeat │   │
                                    │         (max   │   │
                                    │          40x)  │   │
                                    │                │   │
                                    └────────────────┼───┘


                                            ┌────────────────────┐
                                            │ Phase 5: REPORTS   │
                                            │                    │
                                            │  View Results      │
                                            │                    │
                                            │  • Summary stats   │
                                            │  • Test details    │
                                            │  • Download        │
                                            │    - JSON          │
                                            │    - HTML          │
                                            │  • Submit to       │
                                            │    backend         │
                                            └────────────────────┘

Data Flow Diagram

┌──────────────┐
│ User Browser │
└──────┬───────┘

       │ 1. Capture screenshots

┌──────────────────────────────┐
│ Chrome Extension             │
│                              │
│  ┌────────────────────────┐  │
│  │ FlowCaptureSection     │  │
│  │                        │  │
│  │ • captureActiveScreen()│  │
│  │   └─> CDP Screenshot   │  │
│  │   └─> Vision analysis  │  │
│  │   └─> NO DOM capture   │  │
│  └────────────────────────┘  │
│           │                  │
│           │ 2. AI Vision     │
│           ▼                  │
│  ┌────────────────────────┐  │
│  │ OpenAI Vision API      │──┼──> External API
│  │ (GPT-4o)               │  │
│  │ describeScreenshot()   │  │
│  │   Returns: UI analysis │  │
│  └────────────────────────┘  │
│           │                  │
│           │ 3. Store         │
│           ▼                  │
│  ┌────────────────────────┐  │
│  │ Zustand State          │  │
│  │                        │  │
│  │ ScreenCapture {        │  │
│  │   imageDataUrl,        │  │
│  │   imageDescription     │  │
│  │   (domSnapshot: legacy)│  │
│  │ }                      │  │
│  └────────────────────────┘  │
│           │                  │
│           │ 4. Discover      │
│           ▼                  │
│  ┌────────────────────────┐  │
│  │ discoveryService       │  │
│  │                        │  │
│  │ GPT-4 analyzes:        │──┼──> External API
│  │   App, Actors, Phases  │  │
│  └────────────────────────┘  │
│           │                  │
│           │ 5. Generate      │
│           ▼                  │
│  ┌────────────────────────┐  │
│  │ flowDiscovery          │  │
│  │                        │  │
│  │ GPT-4 generates:       │──┼──> External API
│  │   Test cases per phase │  │
│  └────────────────────────┘  │
│           │                  │
│           │ 6. Save          │
│           ▼                  │
│  ┌────────────────────────┐  │
│  │ evalsAPI.create()      │──┼──┐
│  │                        │  │  │
│  │ Backend creates eval   │  │  │ 7. Backend API
│  │ Returns: UUID          │  │  │
│  └────────────────────────┘  │  │
│           │                  │  │
│           │ 8. Execute       │  │
│           ▼                  │  │
│  ┌────────────────────────┐  │  │
│  │ testOrchestratorCU     │  │  │
│  │                        │  │  │
│  │ For each test (1st):   │  │  │
│  │   1. Load credentials  │  │  │
│  │   2. Capture screenshot│  │  │
│  │   3. Send to Computer  │──┼──> OpenAI Computer Use
│  │      Use API + creds   │  │  │    (Vision + Actions)
│  │   4. Execute action    │  │  │
│  │      (click, type, etc)│  │  │
│  │   5. Record action     │  │  │
│  │      (DOM selectors)   │  │  │
│  │   6. Auto-login if     │  │  │
│  │      detected          │  │  │
│  │   7. Repeat            │  │  │
│  │                        │  │  │
│  │ For repeat runs:       │  │  │
│  │   1. Replay recorded   │  │  │
│  │      actions (DOM)     │  │  │
│  │   2. Fallback to       │  │  │
│  │      Computer Use      │  │  │
│  └────────────────────────┘  │  │
│           │                  │  │
│           │ 9. Report        │  │
│           ▼                  │  │
│  ┌────────────────────────┐  │  │
│  │ submitTestReport()     │──┼──┘
│  │                        │  │
│  │ Backend stores report  │  │
│  │ Deducts credits        │  │
│  └────────────────────────┘  │
└──────────────────────────────┘

       │ 10. View results

┌──────────────┐
│ Reports UI   │
└──────────────┘
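
The repeat-run branch in step 8 (replay recorded actions, fall back to Computer Use) can be sketched as below. This is an illustration under assumptions: `RecordedAction`, `runWithReplay`, and the two injected callbacks are hypothetical names, and the real `testOrchestratorCU` may decide when to fall back differently.

```typescript
// Hypothetical shape of an action recorded during the first run.
interface RecordedAction {
  selector: string;
  kind: "click" | "type";
  text?: string;
}

// Try the cheap DOM replay first; use the vision-based path when there is
// nothing recorded, when replay reports failure, or when replay throws.
async function runWithReplay(
  recorded: RecordedAction[] | undefined,
  replay: (actions: RecordedAction[]) => Promise<boolean>, // true if every step replayed
  runComputerUse: () => Promise<RecordedAction[]>,         // returns freshly recorded actions
): Promise<{ mode: "replay" | "computer-use"; actions: RecordedAction[] }> {
  if (recorded && recorded.length > 0) {
    try {
      if (await replay(recorded)) return { mode: "replay", actions: recorded };
    } catch {
      // fall through to the vision-based path
    }
  }
  return { mode: "computer-use", actions: await runComputerUse() };
}
```

Returning the fresh actions from the fallback lets the caller overwrite a stale recording, so the next run can attempt replay again.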

Phase 1: Capture Key Screens

User Interface

╔════════════════════════════════════════════════════════════════╗
║ Phase 1: Capture Key Screens                    [+ New Eval]  ║
╠════════════════════════════════════════════════════════════════╣
║                                                                 ║
║  Take full-page screenshots of up to 5 important screens.     ║
║                                                                 ║
╠════════════════════════════════════════════════════════════════╣
║ SAVED EVALS                         3 saved  [Sync]           ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────┐  ┌────────────────────────┐       ║
║ │ Dashboard Eval         │  │ Admin Panel Eval       │       ║
║ │ Test dashboard...      │  │ Test admin features... │       ║
║ │                        │  │                        │       ║
║ │ 3 phases · 15 tests    │  │ 2 phases · 12 tests    │       ║
║ │ Last: Oct 14, 2025     │  │ Last: Oct 13, 2025     │       ║
║ │                        │  │                        │       ║
║ │ [Select Eval] [▶] [🗑] │  │ [Select Eval] [▶] [🗑] │       ║
║ └────────────────────────┘  └────────────────────────┘       ║
╠════════════════════════════════════════════════════════════════╣
║ KEY SCREENS                                                    ║
╠════════════════════════════════════════════════════════════════╣
║ [Tab ▼ example.com/login          ]  [📷 Capture]            ║
║                                                                 ║
║ ┌─────────────────┐  ┌─────────────────┐                     ║
║ │ Screen 1        │  │ Screen 2        │                     ║
║ │ [Screenshot]    │  │ [Screenshot]    │                     ║
║ │                 │  │                 │                     ║
║ │ Login Page      │  │ Dashboard       │                     ║
║ │ /login          │  │ /dashboard      │                     ║
║ │                 │  │                 │                     ║
║ │ [✏️ Notes] [🗑] │  │ [✏️ Notes] [🗑] │                     ║
║ └─────────────────┘  └─────────────────┘                     ║
╚════════════════════════════════════════════════════════════════╝

Technical Process

User Action                    System Process
────────────                   ──────────────

1. Select tab
   [example.com ▼]

2. Click "Capture"            │

                        ┌─────────────────┐
                        │ attachDebugger  │
                        │ (tabId)         │
                        └────────┬────────┘


                        ┌─────────────────┐
                        │ Page.enable()   │
                        │ getLayoutMetrics│
                        └────────┬────────┘


                        ┌─────────────────┐
                        │ captureScreenshot│
                        │ • width: full   │
                        │ • height: max   │
                        │   800px         │
                        └────────┬────────┘


                        ┌─────────────────────┐
                        │ describeScreenshot  │
                        │ (OpenAI Vision API) │
                        │                     │
                        │ Analyzes:           │
                        │ • UI elements       │
                        │ • User flows        │
                        │ • Forms, buttons    │
                        │ • Navigation        │
                        │ • Visual layout     │
                        │ • Actionable items  │
                        │                     │
                        │ NO DOM parsing!     │
                        │ Pure vision AI      │
                        └────────┬────────────┘


                        ┌─────────────────┐
                        │ Store in Zustand│
                        │                 │
                        │ ScreenCapture { │
                        │   image,        │
                        │   description,  │
                        │   metadata,     │
                        │   url           │
                        │ }               │
                        └─────────────────┘

3. Screenshot saved             │
   ✓ Screen 1                   │

                        ┌─────────────────┐
                        │ detachDebugger  │
                        └─────────────────┘
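
The capture sequence above (attach, `Page.enable`, `getLayoutMetrics`, `captureScreenshot`, detach) could look roughly like the sketch below. The `DebuggerLike` interface and the `captureScreen` name are assumptions made so the example runs outside the browser. The CDP method names and the 800px height cap come from the diagram, and the `describeScreenshot` and Zustand steps are left out.

```typescript
// Minimal subset of a chrome.debugger-style API, declared as an interface
// so the sketch can be exercised with a fake outside the browser.
interface DebuggerLike {
  attach(target: { tabId: number }, version: string): Promise<void>;
  sendCommand(target: { tabId: number }, method: string, params?: object): Promise<unknown>;
  detach(target: { tabId: number }): Promise<void>;
}

const MAX_CAPTURE_HEIGHT = 800; // height cap shown in the diagram

// Attach, measure the page, capture a clipped PNG, and always detach.
async function captureScreen(dbg: DebuggerLike, tabId: number): Promise<string> {
  const target = { tabId };
  await dbg.attach(target, "1.3");
  try {
    await dbg.sendCommand(target, "Page.enable");
    const metrics = (await dbg.sendCommand(target, "Page.getLayoutMetrics")) as {
      cssContentSize: { width: number; height: number };
    };
    const { width, height } = metrics.cssContentSize;
    const shot = (await dbg.sendCommand(target, "Page.captureScreenshot", {
      format: "png",
      clip: { x: 0, y: 0, width, height: Math.min(height, MAX_CAPTURE_HEIGHT), scale: 1 },
    })) as { data: string };
    return `data:image/png;base64,${shot.data}`;
  } finally {
    await dbg.detach(target); // release the debugger even if capture fails
  }
}
```

In an extension, `chrome.debugger` itself should be usable as `dbg`, since its promise-based `attach`, `sendCommand`, and `detach` have this general shape under Manifest V3.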

Phase 2: Discover Eval Phases

AI Analysis Flow

Input: Screenshots            Output: Phase Discovery
─────────────────            ───────────────────────

┌─────────────────┐
│ Screen 1        │          ┌────────────────────────┐
│ • imageDataUrl  │          │ App Overview           │
│ • domSnapshot   │          │ "E-commerce platform   │
│ • description   │          │  for buying products"  │
└────────┬────────┘          └────────────────────────┘

┌────────▼────────┐          ┌────────────────────────┐
│ Screen 2        │          │ User Actors            │
│ • imageDataUrl  │  ──────> │ • Guest User           │
│ • domSnapshot   │  GPT-4   │ • Registered User      │
│ • description   │          │ • Admin                │
└────────┬────────┘          └────────────────────────┘

┌────────▼────────┐          ┌────────────────────────┐
│ Screen 3        │          │ Navigation Phases      │
│ • imageDataUrl  │          │                        │
│ • domSnapshot   │          │ ✓ Dashboard (P0)       │
│ • description   │          │ ✓ Settings (P1)        │
└─────────────────┘          │   User Mgmt (P0)       │
                             │ ✓ Profile (P2)         │
                             └────────────────────────┘

Selection Interface

╔════════════════════════════════════════════════════════════════╗
║ Phase 2: Discover Eval Phases                                 ║
╠════════════════════════════════════════════════════════════════╣
║ AI analyzes your 3 captured screens to discover navigation    ║
║ sections and create testable phases                           ║
╠════════════════════════════════════════════════════════════════╣
║                                    [🔍 Discover Phases]       ║
╠════════════════════════════════════════════════════════════════╣
║ 💡 APPLICATION OVERVIEW                                        ║
║                                                                 ║
║ E-commerce platform for buying products online with user       ║
║ authentication and shopping cart functionality.                ║
╠════════════════════════════════════════════════════════════════╣
║ 👥 USER ROLES                                                  ║
║                                                                 ║
║ [Guest User]  [Registered User]  [Admin]                      ║
╠════════════════════════════════════════════════════════════════╣
║ DISCOVERED PHASES (3 selected)           [Select All] [Clear]  ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────┐        ║
║ │ ☑ Dashboard                               [P0] ✓   │        ║
║ │   Main application dashboard and overview          │        ║
║ │   Screens: Dashboard, Home                         │        ║
║ └────────────────────────────────────────────────────┘        ║
║ ┌────────────────────────────────────────────────────┐        ║
║ │ ☑ Settings                                [P1] ✓   │        ║
║ │   Application settings and configuration           │        ║
║ │   Screens: Settings, Preferences                   │        ║
║ └────────────────────────────────────────────────────┘        ║
║ ┌────────────────────────────────────────────────────┐        ║
║ │ ☐ User Management                         [P0]     │        ║
║ │   Manage users, roles, and permissions             │        ║
║ │   Screens: Users, Roles                            │        ║
║ └────────────────────────────────────────────────────┘        ║
║ ┌────────────────────────────────────────────────────┐        ║
║ │ ☑ Profile                                 [P2] ✓   │        ║
║ │   User profile viewing and editing                 │        ║
║ │   Screens: Profile, Account                        │        ║
║ └────────────────────────────────────────────────────┘        ║
╠════════════════════════════════════════════════════════════════╣
║ ⚠ Select at least one phase to continue                       ║
╚════════════════════════════════════════════════════════════════╝

Phase 3: Generate Test Cases Per Phase

Test Case Structure

Each phase can have up to 10 test cases:

Phase: "Dashboard"
├── PhaseTestCase 1
│   ├── id: "uuid-1234"
│   ├── phaseName: "Dashboard"
│   ├── title: "Verify dashboard loads correctly"
│   ├── narrative: "As a user, I want to see my dashboard..."
│   ├── priority: "P0"
│   ├── status: "idle"
│   └── steps: [
│         ┌──────────────────────────────────────────────┐
│         │ Step 1                                       │
│         │ action: "Navigate to dashboard"             │
│         │ expectation: "Dashboard page loads"         │
│         └──────────────────────────────────────────────┘
│         ┌──────────────────────────────────────────────┐
│         │ Step 2                                       │
│         │ action: "Verify widgets displayed"          │
│         │ expectation: "All widgets render correctly" │
│         └──────────────────────────────────────────────┘
│       ]
├── PhaseTestCase 2 (up to 10 total per phase)
└── ...
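
The structure above maps naturally onto a small set of types. The sketch below mirrors the fields shown in the tree. The type names other than `PhaseTestCase`, the `capPerPhase` helper, and the choice to keep the highest-priority cases when trimming to 10 are assumptions for illustration.

```typescript
type Priority = "P0" | "P1" | "P2";
type CaseStatus = "idle" | "running" | "passed" | "failed" | "blocked";

interface TestStep {
  action: string;
  expectation: string;
}

interface PhaseTestCase {
  id: string;
  phaseName: string;
  title: string;
  narrative: string;
  priority: Priority;
  status: CaseStatus;
  steps: TestStep[];
}

const MAX_TESTS_PER_PHASE = 10;

// Group cases by phase, keeping at most 10 per phase.
// Assumption: when trimming, higher priority (P0 first) wins.
function capPerPhase(cases: PhaseTestCase[]): Record<string, PhaseTestCase[]> {
  const sorted = [...cases].sort((a, b) => a.priority.localeCompare(b.priority));
  const byPhase: Record<string, PhaseTestCase[]> = {};
  for (const c of sorted) {
    const list = byPhase[c.phaseName] ?? [];
    if (list.length < MAX_TESTS_PER_PHASE) list.push(c);
    byPhase[c.phaseName] = list;
  }
  return byPhase;
}
```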

Save Eval Workflow

User Action              Frontend                  Backend
───────────             ────────                  ───────

1. Click "Save Eval"

2. Enter name/desc       │

3. Click "Save"          │

                    ┌─────────────┐
                    │ saveEval()  │
                    └──────┬──────┘

                           │ POST /evals

                                              ┌──────────────┐
                                              │ Create eval  │
                                              │ evalId =     │
                                              │ UUID()       │
                                              └──────┬───────┘

                           ◀─────────────────────────┘
                           │ Returns: { id: UUID }

                    ┌──────▼──────┐
                    │ Store       │
                    │ locally     │
                    │ with backend│
                    │ UUID ✅     │
                    └─────────────┘

                           │ Associate with project

                    ┌─────────────┐
                    │ project.    │
                    │ evalIds.    │
                    │ push(UUID)  │
                    └─────────────┘

4. ✓ Eval saved            │
   X tests across Y phases │

   Future runs use         │
   same UUID ───────────────┘
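
A minimal sketch of the save flow above, assuming a hypothetical `post` transport and in-memory stores. Only `POST /evals`, the returned `{ id }`, and `project.evalIds.push(UUID)` come from the diagram; every other name is illustrative.

```typescript
interface EvalDraft {
  name: string;
  description: string;
  testCount: number;
  phaseCount: number;
}
interface SavedEval extends EvalDraft {
  id: string;
}
interface ProjectRecord {
  id: string;
  evalIds: string[];
}

// Hypothetical transport: POSTs a JSON body and resolves with the parsed response.
type PostJson = (path: string, body: unknown) => Promise<{ id: string }>;

async function saveEval(
  draft: EvalDraft,
  project: ProjectRecord,
  localEvals: Record<string, SavedEval>,
  post: PostJson,
): Promise<SavedEval> {
  const { id } = await post("/evals", draft); // backend generates the UUID
  const saved: SavedEval = { ...draft, id };  // store locally under the backend UUID
  localEvals[id] = saved;
  if (project.evalIds.indexOf(id) === -1) project.evalIds.push(id); // associate with project
  return saved;
}
```

Keying the local copy by the backend UUID is what lets later runs and report submissions refer to the same eval.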

Phase 4: Execute Tests

Vision-Based Automation with Computer Use

BrowseGenius uses OpenAI's Computer Use model for test execution, which provides several advantages over traditional DOM-based automation:

Benefits:

  • Visual Understanding: AI sees the UI like a human, understanding visual context and layout
  • Resilient to Changes: No brittle selectors - works even when DOM structure changes
  • Natural Interactions: Coordinate-based clicking, typing, and scrolling
  • Context-Aware: AI understands the current state from screenshots
  • Fewer Selectors: No need to maintain CSS selectors, XPath, or data-testid attributes

How It Works:

  1. Capture screenshot of current browser state
  2. Send screenshot to OpenAI Computer Use API with task instructions
  3. Model analyzes visual UI and determines next action (click, type, scroll)
  4. Execute action using coordinates or keyboard input
  5. Wait for UI update, capture new screenshot
  6. Repeat until task completes or fails

AI Automation Loop (Computer Use Model)

┌──────────────────────────────────────────────────────────────┐
│                     TEST EXECUTION LOOP                      │
│              (Computer Use - Vision-Based Actions)           │
│                      (Max 40 iterations)                     │
└──────────────────────────────────────────────────────────────┘

Iteration N              Action                    Result
───────────              ──────                    ──────


    │ 1. Capture screenshot

┌───────────────┐
│ CDP:          │        Screenshot (PNG/Base64)
│ captureScreen │───────> 1920x1080 viewport
└───────┬───────┘          Full browser viewport
        │                  with all UI elements
        │ 2. Send to Computer Use API

┌─────────────────┐
│ OpenAI Computer │        Context:
│ Use Model       │        • Screenshot (vision)
│ (gpt-4o)        │        • Test instructions
└───────┬─────────┘        • Previous actions
        │                  • Task goal

        │ 3. AI Response (computer_call)

┌───────────────────────────────────┐
│ {                                 │
│   thought: "I see login button",  │
│   action: "click",                │
│   parsedAction: {                 │
│     type: "click",                │
│     x: 640,                       │
│     y: 350,                       │
│     button: "left"                │
│   }                               │
│ }                                 │
└────────┬──────────────────────────┘

         │ 4. Execute computer action

┌────────────────────┐
│ executeComputer    │───> Playwright/CDP:
│ Action()           │      • click(x, y)
│ • click(x, y)      │      • type(text)
│ • double_click     │      • scroll(dx, dy)
│ • type(text)       │      • keypress(keys)
│ • scroll(dx, dy)   │      • move(x, y)
│ • keypress(keys)   │      • wait(ms)
│ • finish           │
│ • fail             │
└────────┬───────────┘

         │ 5. Wait for UI update (1.5s)

    Sleep(1500)

         │ 6. Next iteration (capture new screenshot)
         └─────────────────┐

         ┌─────────────────┘


    [Repeat until:]
    • AI calls finish() → ✅ PASSED
    • AI calls fail()   → ❌ FAILED
    • Max 40 actions    → ⚠️ BLOCKED
    • Error occurs      → ❌ FAILED

Available Computer Actions

The Computer Use model can perform these actions:

Action        Parameters                Description
──────        ──────────                ───────────
click         x, y, button              Click at coordinates (left, right, middle, back, forward)
double_click  x, y                      Double-click at coordinates
type          text                      Type text at current cursor position
keypress      keys                      Press keyboard keys (Enter, Tab, Escape, etc.)
scroll        x, y, scroll_x, scroll_y  Scroll viewport by offset
move          x, y                      Move mouse cursor to coordinates
wait          ms                        Pause execution (default 1000ms)
finish        -                         Mark test as PASSED
fail          reason                    Mark test as FAILED with reason

Example Response:

```json
{
  "thought": "I need to click the login button in the center of the screen",
  "action": "click",
  "parsedAction": {
    "type": "click",
    "x": 640,
    "y": 450,
    "button": "left"
  }
}
```
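
A response like the one above should be validated before anything is executed. The sketch below types the action list as a discriminated union and parses `parsedAction` defensively. The `parseAction` name, the `keys` array shape, and the `left` button default are assumptions. The 1000ms `wait` default comes from the action list.

```typescript
type MouseButton = "left" | "right" | "middle" | "back" | "forward";

type ParsedAction =
  | { type: "click"; x: number; y: number; button: MouseButton }
  | { type: "double_click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "keypress"; keys: string[] }
  | { type: "scroll"; x: number; y: number; scroll_x: number; scroll_y: number }
  | { type: "move"; x: number; y: number }
  | { type: "wait"; ms: number }
  | { type: "finish" }
  | { type: "fail"; reason: string };

const BUTTONS: string[] = ["left", "right", "middle", "back", "forward"];

function isNum(v: unknown): v is number {
  return typeof v === "number" && isFinite(v);
}
function isButton(v: unknown): v is MouseButton {
  return typeof v === "string" && BUTTONS.indexOf(v) !== -1;
}

// Validate the parsedAction field of a model response; throws on malformed input.
function parseAction(raw: string): ParsedAction {
  const parsed = JSON.parse(raw) as { parsedAction?: Record<string, unknown> };
  const a: Record<string, unknown> = parsed.parsedAction ?? {};
  const { type, x, y, button = "left", text, keys, scroll_x, scroll_y, ms = 1000, reason } = a;
  switch (type) {
    case "click":
      if (isNum(x) && isNum(y) && isButton(button)) return { type: "click", x, y, button };
      break;
    case "double_click":
      if (isNum(x) && isNum(y)) return { type: "double_click", x, y };
      break;
    case "move":
      if (isNum(x) && isNum(y)) return { type: "move", x, y };
      break;
    case "type":
      if (typeof text === "string") return { type: "type", text };
      break;
    case "keypress":
      if (Array.isArray(keys) && keys.every((k) => typeof k === "string")) {
        return { type: "keypress", keys: keys as string[] };
      }
      break;
    case "scroll":
      if (isNum(x) && isNum(y) && isNum(scroll_x) && isNum(scroll_y)) {
        return { type: "scroll", x, y, scroll_x, scroll_y };
      }
      break;
    case "wait":
      if (isNum(ms)) return { type: "wait", ms };
      break;
    case "finish":
      return { type: "finish" };
    case "fail":
      if (typeof reason === "string") return { type: "fail", reason };
      break;
  }
  throw new Error(`Unsupported or malformed action: ${String(type)}`);
}
```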

Execution States

Test Case Lifecycle
────────────────────

┌──────┐
│ IDLE │  Initial state
└───┬──┘
    │ Click "Run Tests"

┌─────────┐
│ RUNNING │  AI executing actions
└───┬─────┘

    ├─> AI: finish() ──> ┌────────┐
    │                    │ PASSED │
    │                    └────────┘

    ├─> AI: fail() ────> ┌────────┐
    │                    │ FAILED │
    │                    └────────┘

    ├─> Max actions ───> ┌─────────┐
    │                    │ BLOCKED │
    │                    └─────────┘

    └─> Exception ─────> ┌────────┐
                         │ FAILED │
                         └────────┘

Phase 5: View Reports

Report Structure

SuiteReport
├── id: "report-uuid"
├── startedAt: "2025-10-14T21:00:00Z"
├── completedAt: "2025-10-14T21:05:30Z"
├── status: "complete"

├── summary
│   ├── total: 10
│   ├── passed: 7
│   ├── failed: 2
│   ├── blocked: 1
│   └── skipped: 0

├── cases: [
│     {
│       caseId: "test-1",
│       status: "passed",
│       durationMs: 45000
│     },
│     {
│       caseId: "test-2",
│       status: "failed",
│       failureReason: "Login button not found",
│       durationMs: 12000
│     }
│   ]

└── artifacts: [
      {
        id: "artifact-1",
        caseId: "test-1",
        timestamp: "2025-10-14T21:00:15Z",
        type: "log",
        message: "Clicked login button"
      },
      {
        id: "artifact-2",
        caseId: "test-2",
        timestamp: "2025-10-14T21:02:30Z",
        type: "screenshot",
        data: "base64..."
      }
    ]
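
The `summary` block can be derived from `cases` rather than tracked separately. A minimal sketch, assuming the field names shown above; the type names `CaseResult` and `SuiteSummary` and the `summarize` helper are illustrative.

```typescript
type ResultStatus = "passed" | "failed" | "blocked" | "skipped";

interface CaseResult {
  caseId: string;
  status: ResultStatus;
  durationMs: number;
  failureReason?: string;
}

interface SuiteSummary {
  total: number;
  passed: number;
  failed: number;
  blocked: number;
  skipped: number;
}

// Count cases per terminal status; total is simply the number of cases.
function summarize(cases: CaseResult[]): SuiteSummary {
  const summary: SuiteSummary = { total: cases.length, passed: 0, failed: 0, blocked: 0, skipped: 0 };
  for (const c of cases) summary[c.status] += 1;
  return summary;
}
```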

Reports UI

╔════════════════════════════════════════════════════════════════╗
║ Phase 5: Test Results                                         ║
╠════════════════════════════════════════════════════════════════╣
║                                                                 ║
║ SUMMARY                                                        ║
║                                                                 ║
║ ┌─────────┐  ┌─────────┐  ┌─────────┐                        ║
║ │ Total   │  │ Passed  │  │ Failed  │                        ║
║ │   10    │  │    7    │  │    2    │                        ║
║ └─────────┘  └─────────┘  └─────────┘                        ║
║                                                                 ║
║ Started: Oct 14, 2025 9:00 PM                                 ║
║ Duration: 5m 30s                                              ║
║                                                                 ║
║                      [Download Report ▼]                       ║
║                      • JSON Only                              ║
║                      • HTML Only                              ║
║                      • Both (Bundle)                          ║
╠════════════════════════════════════════════════════════════════╣
║ TEST CASES                                                     ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────────┐   ║
║ │ #1 [P0] [✓ PASSED]                                     │   ║
║ │ Verify successful login with valid credentials         │   ║
║ │ Duration: 45s                                           │   ║
║ └────────────────────────────────────────────────────────┘   ║
║ ┌────────────────────────────────────────────────────────┐   ║
║ │ #2 [P1] [✗ FAILED]                                     │   ║
║ │ Search products by category                            │   ║
║ │ ⚠ Login button not found                               │   ║
║ │ Duration: 12s                                           │   ║
║ └────────────────────────────────────────────────────────┘   ║
║ ┌────────────────────────────────────────────────────────┐   ║
║ │ #3 [P0] [⚠ BLOCKED]                                    │   ║
║ │ Complete checkout process                              │   ║
║ │ ⚠ Exceeded max actions (40)                            │   ║
║ │ Duration: 2m 15s                                        │   ║
║ └────────────────────────────────────────────────────────┘   ║
╚════════════════════════════════════════════════════════════════╝

Backend Integration

API Architecture

┌──────────────────┐                    ┌──────────────────┐
│ Chrome Extension │                    │ Cloudflare Worker│
│                  │                    │                  │
│  ┌────────────┐  │                    │  ┌────────────┐  │
│  │ API Client │──┼───── HTTPS ───────>│  │ Hono Router│  │
│  └────────────┘  │                    │  └─────┬──────┘  │
│                  │                    │        │         │
│  ┌────────────┐  │                    │  ┌─────▼──────┐  │
│  │ State      │  │                    │  │ Auth       │  │
│  │ (Zustand)  │  │                    │  │ Middleware │  │
│  └────────────┘  │                    │  └─────┬──────┘  │
└──────────────────┘                    │        │         │
                                        │  ┌─────▼──────┐  │
                                        │  │ Routes     │  │
                                        │  │ /projects  │  │
                                        │  │ /test-plans│  │
                                        │  │ /users     │  │
                                        │  └─────┬──────┘  │
                                        │        │         │
                                        │  ┌─────▼──────┐  │
                                        │  │ D1 Database│  │
                                        │  │            │  │
                                        │  │ • projects │  │
                                        │  │ • plans    │  │
                                        │  │ • reports  │  │
                                        │  └────────────┘  │
                                        └──────────────────┘

Request Flow Example

POST /test-plans/:id/report
────────────────────────────

Client                          Server
──────                          ──────

1. Build report object
   {
     summary: {...},
     cases: [...],
     artifacts: [...]
   }
        │
        │ POST with X-Api-Key
        └──────────────────────▶2. Auth middleware
                                   verifies API key
                                        │
                                        ▼
                                3. Get userId from key
                                        │
                                        ▼
                                4. Verify plan exists
                                   and belongs to user
                                        │
                                        ├─> 404 if not found
                                        │
                                        ▼
                                5. Calculate credits
                                   cost = base + (cases * 1)
                                        │
                                        ▼
                                6. Deduct credits
                                        │
                                        ├─> 402 if insufficient
                                        │
                                        ▼
                                7. Insert into test_reports
                                   reportId = UUID()
                                        │
                                        ▼
                                8. Update plan.last_run_at
                                        │
                                        ▼
        ◀───────────────────────9. Return 201
                                   { reportId, creditsUsed }
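
The server-side steps above can be sketched as a single function over an in-memory store. This is a minimal illustration, not the actual backend: the `Store` shape, `submitReport`, `BASE_COST = 1`, the injected `newId` generator, and the 401 response for an unknown key are all assumptions made for the example. Only the 404, 402, and 201 outcomes and the `base + (cases * 1)` formula come from the flow itself.

```typescript
// Hypothetical shapes for illustration; none of these names come from the BrowseGenius codebase.
interface Report { summary: unknown; cases: unknown[]; artifacts: unknown[] }
interface Plan { id: string; userId: string; lastRunAt?: string }
interface Store {
  apiKeys: Map<string, string>;   // API key -> userId
  plans: Map<string, Plan>;       // planId -> plan
  credits: Map<string, number>;   // userId -> remaining credits
  reports: Map<string, { planId: string; report: Report }>;
}

type SubmitResult =
  | { status: 201; body: { reportId: string; creditsUsed: number } }
  | { status: 401 | 402 | 404; body: { success: false; error: string } };

const BASE_COST = 1; // assumed value; the guide only names it "base"

function submitReport(
  store: Store,
  apiKey: string,
  planId: string,
  report: Report,
  newId: () => string, // injected ID generator standing in for UUID()
): SubmitResult {
  const fail = (status: 401 | 402 | 404, error: string): SubmitResult =>
    ({ status, body: { success: false, error } });

  // Steps 2-3: verify the API key and resolve the user (401 is an assumption).
  const userId = store.apiKeys.get(apiKey);
  if (userId === undefined) return fail(401, "Invalid API key");

  // Step 4: the plan must exist and belong to the caller.
  const plan = store.plans.get(planId);
  if (!plan || plan.userId !== userId) return fail(404, "Test plan not found");

  // Step 5: cost = base + (cases * 1).
  const cost = BASE_COST + report.cases.length * 1;

  // Step 6: deduct credits, or reject when the balance is too low.
  const balance = store.credits.get(userId) ?? 0;
  if (balance < cost) return fail(402, "Insufficient credits");
  store.credits.set(userId, balance - cost);

  // Steps 7-8: persist the report and stamp the plan.
  const reportId = newId();
  store.reports.set(reportId, { planId, report });
  plan.lastRunAt = new Date().toISOString();

  // Step 9.
  return { status: 201, body: { reportId, creditsUsed: cost } };
}
```

Under these assumptions, a report with three cases costs 4 credits, so a user holding 5 credits can submit it once and receives a 402 on the second attempt.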

Debugging Tools

API Logs Interface

╔════════════════════════════════════════════════════════════════╗
║ API Request Logs                           [Clear All]         ║
╠════════════════════════════════════════════════════════════════╣
║ ┌────────────────────────────────────────────────────────┐   ║
║ │ ✓ [GET] [200] 145ms                       21:05:30.123 │   ║
║ │ /api/v1/projects                                        │   ║
║ │                                              [▼ Expand] │   ║
║ └────────────────────────────────────────────────────────┘   ║
║ ┌────────────────────────────────────────────────────────┐   ║
║ │ ✗ [POST] [404] 946ms                      21:11:20.456 │   ║
║ │ /api/v1/test-plans/.../report                          │   ║
║ │ ⚠ Test plan not found or access denied                 │   ║
║ │                                              [▲ Collapse]│   ║
║ │ ┌────────────────────────────────────────────────────┐ │   ║
║ │ │ Request Headers:                                   │ │   ║
║ │ │ {                                                  │ │   ║
║ │ │   "Content-Type": "application/json",             │ │   ║
║ │ │   "X-Api-Key": "***"                              │ │   ║
║ │ │ }                                                  │ │   ║
║ │ │                                                    │ │   ║
║ │ │ Request Body:                                      │ │   ║
║ │ │ {                                                  │ │   ║
║ │ │   "report": {...}                                 │ │   ║
║ │ │ }                                                  │ │   ║
║ │ │                                                    │ │   ║
║ │ │ Response Data:                                     │ │   ║
║ │ │ {                                                  │ │   ║
║ │ │   "success": false,                               │ │   ║
║ │ │   "error": "Test plan not found"                  │ │   ║
║ │ │ }                                                  │ │   ║
║ │ └────────────────────────────────────────────────────┘ │   ║
║ └────────────────────────────────────────────────────────┘   ║
╚════════════════════════════════════════════════════════════════╝

Quick Reference

Key Components

| Component | Purpose | Location |
| --- | --- | --- |
| App.tsx | Main navigation & screen routing | Common |
| HomeScreen.tsx | Central hub with plan list | Pages/Components |
| PlanDetailsScreen.tsx | Plan details & execution | Pages/Components |
| CapturePhase.tsx | Phase 1 UI (Wizard) | Pages/Components |
| DiscoveryPhase.tsx | Phase 2 UI (Wizard) | Pages/Components |
| PlanPhase.tsx | Phase 3 UI (Wizard) | Pages/Components |
| ExecutePhase.tsx | Phase 4 UI (Wizard) | Pages/Components |
| ReportsPhase.tsx | Phase 5 UI (Wizard) | Pages/Components |
| EvalNavigationSetup.tsx | Entry point & auth config | Pages/Components |
| ComputerUseLoggerModal.tsx | Computer Use monitoring | Pages/Components |
| DOMRecordingView.tsx | Action recording viewer | Pages/Components |
| flowDiscovery.ts | Capture & generation | Services |
| discoveryService.ts | Flow discovery | Services |
| testOrchestratorComputerUse.ts | Test execution (Computer Use) | Services |
| computerUse.ts | Computer Use API integration | Services |
| actionRecorder.ts | Action recording (DOM selectors) | Services |
| replayEngine.ts | Action replay with fallback | Services |
| computerUseLogger.ts | Computer Use logging store | State |
| test-plans-api.ts | Test plans API client | Services |
| api-client.ts | Backend API | Services |

State Flow

User Action → Component → State Action → API Call → Backend → Response → State Update → UI Update
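
The chain above can be sketched with a small hand-rolled store. This is an illustration only: the architecture diagram names Zustand for state, whereas `createStore`, `planStore`, `loadPlans`, and the `PlanState` fields below are hypothetical stand-ins written to keep the example dependency-free.

```typescript
// Hypothetical minimal store; a stand-in for the app's real state library.
type Listener<S> = (state: S) => void;

function createStore<S extends object>(initial: S) {
  let state = initial;
  const listeners = new Set<Listener<S>>();
  return {
    getState: () => state,
    setState(patch: Partial<S>) {
      state = { ...state, ...patch };        // State Update
      listeners.forEach((l) => l(state));    // UI Update: notify subscribed components
    },
    subscribe(listener: Listener<S>) {
      listeners.add(listener);
      return () => { listeners.delete(listener); };
    },
  };
}

interface PlanState { plans: string[]; loading: boolean; error: string | null }

const planStore = createStore<PlanState>({ plans: [], loading: false, error: null });

// State Action: wraps the API call and writes the response back into state.
// `fetchPlans` is injected so the sketch does not assume any real endpoint.
async function loadPlans(fetchPlans: () => Promise<string[]>): Promise<void> {
  planStore.setState({ loading: true, error: null });
  try {
    const plans = await fetchPlans();        // API Call → Backend → Response
    planStore.setState({ plans, loading: false });
  } catch (e) {
    planStore.setState({ loading: false, error: e instanceof Error ? e.message : String(e) });
  }
}
```

A component would call `planStore.subscribe(render)` once and trigger `loadPlans(...)` from a user action; each `setState` call then re-runs `render` with the latest state.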

Common Patterns

Capture Screen:

```typescript
// notes is optional; fullPage and generateDescription default to true
await captureActiveScreen(notes, tabId, true, true);
```

Generate Tests:

```typescript
// customPrompt is optional
await generateTestPlanFromCaptures(customPrompt, selectedPhaseNames);
```

Save Eval:

```typescript
saveEval(name, description); // Creates in backend, stores locally with backend UUID
```

Run Tests:

```typescript
await runGeneratedSuite({ stopOnFailure: false });
```

Submit Report:

```typescript
await evalsAPI.submitReport(evalId, activeReport);
```

Computer Use vs DOM-Based Automation

Traditional DOM-Based Approach ❌

```typescript
// Brittle selectors that break when HTML changes
const loginButton = await page.$('[data-testid="login-btn"]');
await loginButton.click();

// Requires maintaining selectors
const email = await page.$('#email-input');
await email.type('user@example.com');
```

Problems:

  • Selectors break when developers change the HTML
  • Requires data-testid attributes or stable CSS classes
  • Handles dynamically generated content poorly
  • Needs ongoing selector maintenance

Computer Use Approach ✅

```typescript
// AI sees the screen and understands visually
const screenshot = await captureScreenshot();
const action = await computerUseAPI.determineAction(screenshot, {
  instruction: "Click the login button"
});
// Returns: { type: "click", x: 640, y: 450 }

await executeAction(action);
```

Advantages:

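  • No selectors needed: the model works from the rendered screen, as a human tester would
  • Keeps working when the underlying HTML changes, as long as the visible UI stays recognizable
  • Resilient to routine UI updates
  • Interacts through screen coordinates rather than element handles
  • Context-aware decision making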
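
One way to picture the capture, decide, execute cycle is as a bounded loop. The sketch below is an assumption-laden illustration rather than the BrowseGenius implementation: only the click action shape appears in this guide, so the `type` and `done` actions, the `VisionDriver` interface, `runInstruction`, and the `maxSteps` limit are hypothetical.

```typescript
// Hypothetical action shapes; only { type: "click", x, y } is shown in the guide.
type Action =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "done" };

// Hypothetical driver interface so the loop can run against a fake in tests.
interface VisionDriver {
  captureScreenshot(): Promise<string>;
  determineAction(screenshot: string, ctx: { instruction: string }): Promise<Action>;
  executeAction(action: Action): Promise<void>;
}

// Repeats screenshot → decide → act until the model reports "done".
async function runInstruction(
  driver: VisionDriver,
  instruction: string,
  maxSteps = 10, // assumed safety limit so a confused model cannot loop forever
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = await driver.captureScreenshot();
    const action = await driver.determineAction(screenshot, { instruction });
    if (action.type === "done") return history;
    await driver.executeAction(action);
    history.push(action);
  }
  throw new Error(`Instruction not completed within ${maxSteps} steps`);
}
```

Because every decision is made from a fresh screenshot, nothing in the loop refers to a selector; the returned history is the kind of record a later replay step could reuse.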

New Features

Home Screen (Central Hub)

The Home Screen is the default view showing all test plans for the active project:

Features:

  • Plan List: View all plans with status badges (New, Discovering, Testing, Completed)
  • Status Filters: Filter plans by status
  • Plan Cards: Show plan name, description, test case count, last run date
  • Quick Actions: Create new plan, view details, execute
  • Entry Point Indicators: See which plans have custom entry points
  • Auth Badges: Identify authenticated workflows at a glance

Plan Details Screen

The Plan Details screen shows a single test plan together with its full configuration:

Displays:

  • Entry Point URL: Full URL constructed from hostname + entry path
  • Authentication Status: ON/OFF badge with credential visibility
  • Credentials: Masked credentials (username, password, API key) with show/hide toggle
  • Test Cases: List of all configured test cases
  • Actions: Back to home, edit plan, execute tests

Entry Point Configuration

Configure custom entry points for tests that don't start at the root URL:

Usage:

Project Hostname: example.com
Entry Point Path: /admin
Result: https://example.com/admin

Common Use Cases:

  • Admin portals: /admin, /dashboard
  • Login pages: /login, /auth
  • Specific features: /products, /checkout
  • Subdirectories: /app, /portal
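
The hostname-plus-path rule can be sketched as a small helper. `buildEntryUrl` is a hypothetical name, and the choice of `https` and the slash normalization are assumptions based on the single example above.

```typescript
// Hypothetical helper: joins a project hostname and an entry point path.
// Assumes https, matching the example result shown in this guide.
function buildEntryUrl(hostname: string, entryPath = ""): string {
  // Tolerate a pasted scheme or trailing slash on the hostname.
  const host = hostname.trim().replace(/^https?:\/\//, "").replace(/\/+$/, "");
  // Tolerate a missing or repeated leading slash on the path.
  const path = entryPath.trim().replace(/^\/+/, "");
  return path ? `https://${host}/${path}` : `https://${host}`;
}

const adminUrl = buildEntryUrl("example.com", "/admin"); // "https://example.com/admin"
const rootUrl = buildEntryUrl("example.com");            // "https://example.com"
```

Normalizing both sides means `/admin` and `admin` resolve to the same URL, which avoids a double slash when the path is typed by hand.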

Authenticated Workflow Toggle

Control whether tests require authentication:

When ON (default):

  • Shows credential inputs in navigation setup
  • Loads credentials in priority order: eval-level first, then project-level, otherwise none
  • Auto-login detection enabled
  • Computer Use receives credentials in context
  • Login instructions included in prompts

When OFF:

  • Hides all credential fields
  • No credential loading
  • No auto-login attempts
  • Computer Use runs without credentials
  • Suited to public pages and unauthenticated flows
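
The toggle and the credential fallback order can be sketched as one function. The `Credentials` fields mirror the username, password, and API key listed on the Plan Details screen; `resolveCredentials` and the `source` label are hypothetical names used for illustration.

```typescript
// Hypothetical types; field names follow the credentials listed in this guide.
interface Credentials { username: string; password: string; apiKey?: string }
type CredentialSource = "eval" | "project" | "none";

interface ResolvedCredentials {
  source: CredentialSource;
  credentials?: Credentials;
}

// Applies the toggle first, then the eval → project → none order.
function resolveCredentials(
  authEnabled: boolean,
  evalCreds?: Credentials,
  projectCreds?: Credentials,
): ResolvedCredentials {
  if (!authEnabled) return { source: "none" }; // toggle OFF: nothing is loaded
  if (evalCreds) return { source: "eval", credentials: evalCreds };
  if (projectCreds) return { source: "project", credentials: projectCreds };
  return { source: "none" };
}
```

Checking the toggle before anything else guarantees that stored credentials never reach the model's context when the workflow is marked unauthenticated.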

Computer Use Logger

Real-time monitoring tool for debugging Computer Use interactions:

Features:

  • Action Log: Every Computer Use API call with timestamps
  • Context Display: Instruction, screenshot (truncated), and whether credentials were present
  • Response Details: AI thought process, action type, coordinates
  • Error Tracking: Failed actions with error messages
  • Login Detection: State changes with confidence and reasoning
  • Statistics: Total calls, success rate, error count

Access:

  • Click the Monitor icon in the top bar
  • The badge shows the log count (red if there are errors)
  • Entries expand to show full details
  • A clear-all action empties the log
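
The statistics and badge behaviour can be sketched from a list of log entries. The `LogEntry` shape and `summarizeLogs` are hypothetical; the guide does not document how `computerUseLogger.ts` stores its entries.

```typescript
// Hypothetical log entry shape for illustration.
interface LogEntry {
  timestamp: number;
  instruction: string;
  ok: boolean;
  error?: string;
}

interface LogSummary {
  totalCalls: number;
  errorCount: number;
  successRate: number;            // fraction between 0 and 1
  badgeColor: "red" | "default";  // red when any entry failed
}

function summarizeLogs(entries: LogEntry[]): LogSummary {
  const totalCalls = entries.length;
  const errorCount = entries.filter((e) => !e.ok).length;
  // Avoid dividing by zero when the log is empty.
  const successRate = totalCalls === 0 ? 0 : (totalCalls - errorCount) / totalCalls;
  return { totalCalls, errorCount, successRate, badgeColor: errorCount > 0 ? "red" : "default" };
}
```

Deriving the summary from the entries on demand, rather than keeping separate counters, keeps the numbers consistent after the log is cleared.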

Workflow Summary

The workflow emphasizes hub-based navigation, flexible entry points, and vision-based automation with intelligent replay:

  1. Home Screen: Central hub for all test plans

    • View all plans for active project
    • Filter by status
    • Create new or select existing plan
  2. Plan Configuration (Creation Flow):

    • Capture Phase: Capture screenshots (vision only, no DOM)
    • Discover Phase: AI identifies navigation phases
    • Plan Phase:
      • Generate up to 10 tests per phase
      • Entry Point: Set custom starting URL path
      • Auth Toggle: Enable/disable authentication workflow
      • Credentials: Configure eval-level or use project defaults
      • Add sitemap URLs for navigation links
      • Save plan → Return to Home
  3. Plan Execution (Execution Flow):

    • Select plan from Home → View Plan Details
    • Review entry point, auth status, credentials
    • Click Execute → Run tests
    • First Run: Computer Use (vision) + action recording
    • Repeat Runs: DOM replay with Computer Use fallback
    • Auto-Login: Automatic when auth toggle ON
    • View Reports → Return to Home
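
The repeat-run behaviour (DOM replay first, Computer Use as fallback) can be sketched per recorded step. This is an illustration under stated assumptions, not the logic of `replayEngine.ts`: `RecordedStep`, `ReplayDeps`, and `replaySteps` are hypothetical, and a step is treated as failed when its selector no longer matches or the click throws.

```typescript
// Hypothetical recording shape: what the first (vision) run might have saved.
interface RecordedStep { instruction: string; selector: string }

// Hypothetical dependencies, injected so the sketch runs without a browser.
interface ReplayDeps {
  clickSelector(selector: string): Promise<boolean>; // false when nothing matches
  runWithVision(instruction: string): Promise<void>; // Computer Use fallback
}

type ReplayMode = "dom" | "vision-fallback";

// Tries the fast DOM path for each step and falls back to vision on failure.
async function replaySteps(steps: RecordedStep[], deps: ReplayDeps): Promise<ReplayMode[]> {
  const modes: ReplayMode[] = [];
  for (const step of steps) {
    // A rejected click is treated the same as a selector that did not match.
    const replayed = await deps.clickSelector(step.selector).catch(() => false);
    if (replayed) {
      modes.push("dom");
      continue;
    }
    await deps.runWithVision(step.instruction);
    modes.push("vision-fallback");
  }
  return modes;
}
```

Returning the mode used for each step makes it visible in a report which recorded selectors have gone stale and needed the slower vision path.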

Key Benefits:

  • Vision-Based Testing: No brittle selectors needed for first run
  • Intelligent Replay: Fast DOM-based replay on subsequent runs
  • Resilient: Computer Use fallback when DOM changes
  • Auto-Login: Automatic credential handling and login detection
  • Phase Organization: Tests are grouped by navigation phase
  • Flexible Generation: Up to 10 tests per phase
  • Better Scaling: Phase grouping keeps large test suites manageable
  • No DOM Required: Pure screenshot analysis for test generation

Released under the MIT License.