System Overview

BrowseGenius is a Chrome extension that transforms exploratory testing into an AI-assisted workflow. It builds on the original Taxy automation loop while adding capture, planning, execution, and evidence layers purpose-built for QA engineers.

Core pillars

AI-Powered Screen Analysis: Capture up to five pivotal screens as full-width screenshots capped at 800px in height. The OpenAI Vision API describes each screen's UI elements, flows, and functionality, and the extension combines these visual descriptions with DOM snapshots to build its understanding of your application.
Intelligent Test Flow Generation: GPT-4 analyzes the screen descriptions and generates 3-7 prioritized test flows covering authentication, data flows, critical paths, and edge cases. Each flow includes detailed steps with actions and expected outcomes.
Dual-Mode Test Execution:
- Vision-Based (Primary): Uses OpenAI's computer use model to visually locate and interact with elements
- DOM-Based (Fallback): Traditional selector-based automation for speed and reliability
Smart Test Recording: The first test run records every action with multiple selector strategies (ID, CSS, XPath, coordinates), together with network requests and screenshots. These JSON recordings enable fast, deterministic replays.
Network Request Monitoring: Chrome DevTools Protocol integration captures all HTTP traffic during test execution. Request/response data is linked to specific actions and validated on replay.
Evidence-Rich Reporting: Each run generates a report with action logs, network traces, screenshots, timing data, and the JSON recording used for replay. Reports export as JSON + HTML bundles.

Extension surfaces

Surface	Purpose
Popup	Capture key screens, configure models/keys, launch suites, download reports.
DevTools panel	Monitor live execution, view action logs, inspect diagnostics.
Background worker	Coordinates orchestration, LLM calls, and report assembly.
Content script	Serialises the DOM with interactive metadata and responds to RPC calls.
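
To make the content script's RPC role more concrete, here is a minimal sketch of a handler registry. The class name `RpcRegistry`, the message shape, and the response envelope are assumptions for illustration and are not taken from the BrowseGenius source.

```typescript
// Hypothetical RPC registry; method names and message shapes are assumptions.
type RpcHandler = (payload: unknown) => unknown | Promise<unknown>;

interface RpcRequest {
  method: string;
  payload?: unknown;
}

type RpcResponse =
  | { ok: true; result: unknown }
  | { ok: false; error: string };

class RpcRegistry {
  private handlers = new Map<string, RpcHandler>();

  // Registers (or replaces) the handler for a method name.
  register(method: string, handler: RpcHandler): void {
    this.handlers.set(method, handler);
  }

  // Routes a request to its handler. Errors are returned in the
  // response envelope instead of being thrown to the caller.
  async dispatch(request: RpcRequest): Promise<RpcResponse> {
    const handler = this.handlers.get(request.method);
    if (!handler) {
      return { ok: false, error: `Unknown RPC method: ${request.method}` };
    }
    try {
      return { ok: true, result: await handler(request.payload) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, error: message };
    }
  }
}
```

In a content script, a registry like this could be wired to `chrome.runtime.onMessage.addListener`, calling `dispatch` on each message and passing the result to `sendResponse`. The listener has to return `true` so the message channel stays open for the asynchronous reply.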

High-level flow

mermaid

flowchart TD
  A[Select Tab & Capture] --> B[Vision API Describes Screen]
  B --> C[Store Screenshot + Description + DOM]
  C --> D{Captured 1-5 screens?}
  D -->|No| A
  D -->|Yes| E[Generate Test Plan]
  E -->|GPT-4 analyzes descriptions| F[User Reviews Flows]
  F -->|Edit/Delete| F
  F -->|Run Suite| G[First Run: Record Mode]
  G --> H[Action Recorder + Network Monitor]
  H --> I[Execute Tests - Vision/DOM]
  I --> J[Save JSON Recording]
  J --> K[Generate Report]
  K --> L[Subsequent Runs: Replay Mode]
  L --> M[Load JSON Recording]
  M --> N[Replay Actions + Validate Network]
  N --> K

Phase 1: Screen Capture & Analysis

The user selects a tab and captures a screenshot (full width, 800px maximum height)
The OpenAI Vision API analyzes the screenshot and describes the UI
The screenshot, description, and DOM snapshot are stored locally
The user repeats this for up to 5 key application screens
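
The two limits in this phase (800px maximum height, at most 5 screens) can be sketched as follows. The `ScreenCapture` fields and the function names are hypothetical; only the two numeric limits come from the text above.

```typescript
const MAX_CAPTURES = 5;    // limit stated in Phase 1
const MAX_HEIGHT_PX = 800; // limit stated in Phase 1

// Assumed shape of a stored capture; field names are illustrative only.
interface ScreenCapture {
  id: string;
  url: string;
  screenshotDataUrl: string;
  description: string; // text returned by the vision model
  domSnapshot: string;
}

// Computes the screenshot region: full page width, height capped at 800px.
function clipRegion(pageWidth: number, pageHeight: number): { width: number; height: number } {
  return { width: pageWidth, height: Math.min(pageHeight, MAX_HEIGHT_PX) };
}

// Returns a new list with the capture appended, or throws once the cap is reached.
function addCapture(existing: readonly ScreenCapture[], next: ScreenCapture): ScreenCapture[] {
  if (existing.length >= MAX_CAPTURES) {
    throw new Error(`At most ${MAX_CAPTURES} screens can be captured`);
  }
  return [...existing, next];
}
```

Returning a new array instead of mutating the input keeps the helper easy to use from an immutable store update, though the actual store may handle this differently.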

Phase 2: Test Generation

FlowDiscoveryService sends screen descriptions to GPT-4
AI generates 3-7 prioritized test flows with steps
Flows populate the testPlanner store
User reviews, edits narratives, or deletes unwanted flows
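
As an illustration of what might happen between the model response and the testPlanner store, the sketch below validates the returned flows, orders them by priority, and enforces the 3-7 range. The `TestFlow` shape, the three priority labels, and `parseFlows` are assumptions; the real response format is not described here.

```typescript
// Assumed priority labels and flow shape; not taken from the project source.
const PRIORITIES = ["high", "medium", "low"] as const;
type Priority = (typeof PRIORITIES)[number];

interface FlowStep { action: string; expected: string; }
interface TestFlow { title: string; priority: Priority; steps: FlowStep[]; }

const MIN_FLOWS = 3; // range stated in Phase 2
const MAX_FLOWS = 7;

function isFlowStep(value: unknown): value is FlowStep {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.action === "string" && typeof v.expected === "string";
}

function isTestFlow(value: unknown): value is TestFlow {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.priority === "string" &&
    (PRIORITIES as readonly string[]).indexOf(v.priority) !== -1 &&
    Array.isArray(v.steps) &&
    v.steps.every(isFlowStep)
  );
}

// Parses the model's JSON output, drops malformed entries, sorts by
// priority (high first), and keeps at most MAX_FLOWS flows.
function parseFlows(rawJson: string): TestFlow[] {
  const parsed: unknown = JSON.parse(rawJson);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of flows");
  }
  const flows = parsed.filter(isTestFlow);
  if (flows.length < MIN_FLOWS) {
    throw new Error(`Expected at least ${MIN_FLOWS} valid flows, got ${flows.length}`);
  }
  const rank = (p: Priority): number => PRIORITIES.indexOf(p);
  return [...flows].sort((a, b) => rank(a.priority) - rank(b.priority)).slice(0, MAX_FLOWS);
}
```

Validating before storing means a malformed model response fails at the boundary instead of surfacing later as a broken flow in the review UI.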

Phase 3: First Execution (Recording)

TestOrchestrator starts ActionRecorder and NetworkMonitor
For each test case:
- The test runs in vision-based or DOM-based mode
- Every action is recorded with multiple selectors
- Network requests are captured and linked to actions
- Screenshots are taken before and after key actions
The JSON recording is saved to state and made available for download
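
One plausible way to link captured requests to actions is by timestamp: each request is attributed to the most recent action that started before it. The sketch below shows that idea. The type shapes and `linkRequests` are hypothetical, and the actual format is the one documented in RECORDING_FORMAT.md, which may differ.

```typescript
// Illustrative shapes only; see RECORDING_FORMAT.md for the real schema.
interface SelectorSet {
  id?: string;
  testId?: string; // data-testid, as used in the replay priority order
  css?: string;
  xpath?: string;
  coordinates?: { x: number; y: number };
}

interface RecordedRequest {
  method: string;
  url: string;
  status: number;
  timestamp: number; // ms since the run started (assumed unit)
}

interface RecordedAction {
  type: string; // e.g. "click", "type"
  selectors: SelectorSet;
  timestamp: number;
  requests: RecordedRequest[];
}

// Attributes each request to the latest action whose timestamp is not
// after the request. Requests that precede every action are left unlinked.
function linkRequests(actions: RecordedAction[], requests: RecordedRequest[]): RecordedAction[] {
  const ordered = [...actions]
    .sort((a, b) => a.timestamp - b.timestamp)
    .map((action) => ({ ...action, requests: [] as RecordedRequest[] }));

  for (const request of requests) {
    let owner: RecordedAction | undefined;
    for (const action of ordered) {
      if (action.timestamp <= request.timestamp) {
        owner = action;
      } else {
        break;
      }
    }
    if (owner) {
      owner.requests.push(request);
    }
  }
  return ordered;
}
```

Timestamp attribution is a heuristic: a slow request triggered by one action can land after the next action has started, so a production recorder might also use initiator information from the DevTools Protocol.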

Phase 4: Replay Execution (Subsequent Runs)

Load the stored JSON recording
For each action:
- Find the element using selector priority (ID → data-testid → CSS → XPath → coordinates)
- Perform the action
- Monitor network requests and validate that they match the recording
Generate a pass/fail report listing any differences
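
The selector fallback and the network comparison can be sketched as two small functions. The priority order is the one listed above; everything else (`ElementLookup`, `resolveElement`, `diffRequests`, and the matching of requests by method and URL) is an assumption made for illustration.

```typescript
interface SelectorSet {
  id?: string;
  testId?: string;
  css?: string;
  xpath?: string;
  coordinates?: { x: number; y: number };
}

type Strategy = "id" | "testId" | "css" | "xpath" | "coordinates";

// Priority order from Phase 4: ID, data-testid, CSS, XPath, coordinates.
const SELECTOR_PRIORITY: Strategy[] = ["id", "testId", "css", "xpath", "coordinates"];

// Hypothetical abstraction over the page so the logic is testable without a DOM.
interface ElementLookup<T> {
  byId(id: string): T | null;
  byTestId(testId: string): T | null;
  byCss(css: string): T | null;
  byXPath(xpath: string): T | null;
  atPoint(x: number, y: number): T | null;
}

// Tries each recorded selector in priority order and returns the first hit.
function resolveElement<T>(
  selectors: SelectorSet,
  lookup: ElementLookup<T>
): { element: T; strategy: Strategy } | null {
  for (const strategy of SELECTOR_PRIORITY) {
    let element: T | null = null;
    switch (strategy) {
      case "id":
        if (selectors.id) element = lookup.byId(selectors.id);
        break;
      case "testId":
        if (selectors.testId) element = lookup.byTestId(selectors.testId);
        break;
      case "css":
        if (selectors.css) element = lookup.byCss(selectors.css);
        break;
      case "xpath":
        if (selectors.xpath) element = lookup.byXPath(selectors.xpath);
        break;
      case "coordinates":
        if (selectors.coordinates) {
          element = lookup.atPoint(selectors.coordinates.x, selectors.coordinates.y);
        }
        break;
    }
    if (element !== null) return { element, strategy };
  }
  return null;
}

interface RequestSummary { method: string; url: string; status: number; }

// Lists recorded requests that are missing on replay or whose status changed.
function diffRequests(recorded: RequestSummary[], replayed: RequestSummary[]): string[] {
  const remaining = [...replayed];
  const differences: string[] = [];
  for (const expected of recorded) {
    const match = remaining.find((r) => r.method === expected.method && r.url === expected.url);
    if (!match) {
      differences.push(`missing: ${expected.method} ${expected.url}`);
      continue;
    }
    remaining.splice(remaining.indexOf(match), 1);
    if (match.status !== expected.status) {
      differences.push(
        `status changed: ${expected.method} ${expected.url} ${expected.status} -> ${match.status}`
      );
    }
  }
  return differences;
}
```

In a page context, the lookup methods would map naturally onto `document.getElementById`, `document.querySelector` (with a `[data-testid="..."]` selector for test IDs), `document.evaluate`, and `document.elementFromPoint`. An empty result from `diffRequests` would correspond to a passing network check in the report.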

Tech stack

Chrome Extension MV3 with Chrome DevTools Protocol
React 18 + Chakra UI for popup and devtools surfaces
Zustand (immer + persist) for state management and localStorage persistence
OpenAI GPT-4o / GPT-4.1 / o1 for test generation and execution
OpenAI Vision API for screenshot analysis
Chrome Debugger API for network monitoring and full-page screenshots
Webpack 5 build pipeline
VitePress for documentation

Repository layout

src/
  background/         # Service worker & orchestrator bootstrap
  content/            # DOM serializers, capture helpers, RPC registry
  services/           # Core services
    flowDiscovery.ts    # Screenshot capture + Vision API + test generation
    actionRecorder.ts   # Records actions to JSON format
    networkMonitor.ts   # Network request capture via DevTools Protocol
    testOrchestrator.ts # Test execution coordinator
    reportBuilder.ts    # Report generation and export
  types/
    testRecording.ts    # TypeScript types for JSON recording format
  pages/              # Popup, DevTools, Options, etc.
  state/              # Zustand slices (settings, UI, tasks, planner)
    testPlanner.ts      # Test cases, captures, recordings
docs/                 # VitePress documentation
example-recording.json  # Example JSON recording format
RECORDING_FORMAT.md     # JSON schema documentation

Continue to LLM Orchestration for a deeper look at how GPT-4 drives actions, Data & Storage to understand persistence, or Network Requests for a breakdown of every outbound call.
