System Overview
BrowseGenius is a Chrome extension that transforms exploratory testing into an AI-assisted workflow. It builds on the original Taxy automation loop while adding capture, planning, execution, and evidence layers purpose-built for QA engineers.
Core pillars
AI-Powered Screen Analysis
Capture up to five pivotal screens with full-width, height-restricted (800px) screenshots. OpenAI Vision API automatically describes each screen's UI elements, flows, and functionality. The extension combines visual descriptions with DOM snapshots to understand your application.
Intelligent Test Flow Generation
GPT-4 analyzes screen descriptions and generates 3-7 prioritized test flows covering authentication, data flows, critical paths, and edge cases. Each flow includes detailed steps with actions and expected outcomes.
Dual-Mode Test Execution
- Vision-Based (Primary): Uses OpenAI's computer use model to visually locate and interact with elements
- DOM-Based (Fallback): Traditional selector-based automation for speed and reliability
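The vision-first, DOM-fallback ordering described above can be sketched as a small wrapper around two interchangeable executors. This is a minimal illustration under stated assumptions, not the extension's actual code: `executeStep`, `StepExecutor`, and `StepOutcome` are hypothetical names, and the executors are injected so the control flow can be read without any OpenAI or DOM calls.

```typescript
// Hypothetical names throughout: none of these come from the BrowseGenius source.
type ExecutionMode = "vision" | "dom";

interface StepOutcome {
  mode: ExecutionMode; // which strategy produced the final result
  ok: boolean;
  note: string;
}

// An executor resolves to true when it completed the step.
type StepExecutor = (instruction: string) => Promise<boolean>;

async function executeStep(
  instruction: string,
  visionExecutor: StepExecutor,
  domExecutor: StepExecutor
): Promise<StepOutcome> {
  let visionNote = "vision executor reported failure";
  try {
    if (await visionExecutor(instruction)) {
      return { mode: "vision", ok: true, note: "completed visually" };
    }
  } catch (err) {
    // Keep the error text so a report can explain why the fallback ran.
    visionNote = `vision executor threw: ${String(err)}`;
  }
  // Fallback: selector-based execution.
  const ok = await domExecutor(instruction);
  return {
    mode: "dom",
    ok,
    note: ok ? `fallback succeeded (${visionNote})` : `both modes failed (${visionNote})`,
  };
}
```

Returning the mode alongside the result lets a report show how often the fallback was needed, which is one plausible way to spot screens where visual targeting is unreliable.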
Smart Test Recording
The first run of each test records every action with multiple selector strategies (ID, CSS, XPath, coordinates), along with network requests and screenshots. These JSON recordings enable fast, deterministic replays.
Network Request Monitoring
Chrome DevTools Protocol integration captures all HTTP traffic during test execution. Request/response data is linked to specific actions and validated on replay.
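To make "validated on replay" concrete, here is one way recorded and replayed traffic could be compared. It is a sketch only: the `RecordedRequest` fields and the `diffRequests` helper are assumptions for illustration, and the real recording format may carry more detail (headers, bodies, timing).

```typescript
// Hypothetical shape for a captured request, linked to the action that triggered it.
interface RecordedRequest {
  actionId: string;
  method: string;
  url: string;
  status: number;
}

interface NetworkDiff {
  missing: RecordedRequest[];    // in the recording, not seen on replay
  unexpected: RecordedRequest[]; // seen on replay, not in the recording
  statusChanged: Array<{ recorded: RecordedRequest; replayed: RecordedRequest }>;
}

// Requests are matched on action + method + URL. Repeated identical
// requests within one action collapse to a single entry in this sketch.
const keyOf = (r: RecordedRequest): string => `${r.actionId} ${r.method} ${r.url}`;

function diffRequests(recorded: RecordedRequest[], replayed: RecordedRequest[]): NetworkDiff {
  const replayedByKey = new Map<string, RecordedRequest>();
  for (const r of replayed) replayedByKey.set(keyOf(r), r);

  const diff: NetworkDiff = { missing: [], unexpected: [], statusChanged: [] };
  const recordedKeys = new Set<string>();

  for (const rec of recorded) {
    const key = keyOf(rec);
    recordedKeys.add(key);
    const match = replayedByKey.get(key);
    if (!match) {
      diff.missing.push(rec);
    } else if (match.status !== rec.status) {
      diff.statusChanged.push({ recorded: rec, replayed: match });
    }
  }
  for (const r of replayed) {
    if (!recordedKeys.has(keyOf(r))) diff.unexpected.push(r);
  }
  return diff;
}
```

A replay could then be treated as passing when all three lists are empty, though how strict to be about unexpected requests (analytics pings, for example) is a policy choice the source does not specify.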
Evidence-Rich Reporting
Each run generates comprehensive reports with action logs, network traces, screenshots, timing data, and JSON recordings for replay. Reports export as JSON + HTML bundles.
Extension surfaces
| Surface | Purpose |
|---|---|
| Popup | Capture key screens, configure models/keys, launch suites, download reports. |
| DevTools panel | Monitor live execution, view action logs, inspect diagnostics. |
| Background worker | Coordinates orchestration, LLM calls, and report assembly. |
| Content script | Serialises the DOM with interactive metadata and responds to RPC calls. |
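The content script's role of responding to RPC calls can be pictured as a registry that maps method names to handlers. The sketch below is an assumption about the general shape, not the project's implementation: `RpcRegistry`, `RpcRequest`, and the registered method names are hypothetical. In an extension, `dispatch` would typically be called from a `chrome.runtime.onMessage` listener; that wiring is omitted so the example runs on its own.

```typescript
// Hypothetical RPC plumbing; method names and message shape are illustrative.
type RpcHandler = (...args: any[]) => unknown;

interface RpcRequest {
  method: string;
  args: unknown[];
}

type RpcResponse =
  | { ok: true; result: unknown }
  | { ok: false; error: string };

class RpcRegistry {
  // A Map avoids accidentally resolving inherited names such as "constructor".
  private handlers = new Map<string, RpcHandler>();

  register(method: string, handler: RpcHandler): void {
    this.handlers.set(method, handler);
  }

  dispatch(request: RpcRequest): RpcResponse {
    const handler = this.handlers.get(request.method);
    if (!handler) {
      return { ok: false, error: `Unknown RPC method: ${request.method}` };
    }
    try {
      return { ok: true, result: handler(...request.args) };
    } catch (err) {
      // Errors are converted to data so they can cross the message boundary.
      return { ok: false, error: String(err) };
    }
  }
}
```

Returning a tagged `ok` result instead of throwing keeps the response serialisable, which matters because messages between extension contexts are passed as plain data.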
High-level flow
flowchart TD
A[Select Tab & Capture] --> B[Vision API Describes Screen]
B --> C[Store Screenshot + Description + DOM]
C --> D{Captured 1-5 screens?}
D -->|No| A
D -->|Yes| E[Generate Test Plan]
E -->|GPT-4 analyzes descriptions| F[User Reviews Flows]
F -->|Edit/Delete| F
F -->|Run Suite| G[First Run: Record Mode]
G --> H[Action Recorder + Network Monitor]
H --> I[Execute Tests - Vision/DOM]
I --> J[Save JSON Recording]
J --> K[Generate Report]
K --> L[Subsequent Runs: Replay Mode]
L --> M[Load JSON Recording]
M --> N[Replay Actions + Validate Network]
N --> K
Phase 1: Screen Capture & Analysis
- User selects a tab and captures a screenshot (full-width, 800px max height)
- OpenAI Vision API automatically analyzes and describes the UI
- Screenshot, description, and DOM snapshot are stored locally
- Repeat up to 5 times for key application screens
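The two numeric limits in Phase 1 (at most five captures, screenshots capped at 800px in height) can be expressed as a pair of small helpers. This is a sketch under assumptions: the `ScreenCapture` fields and the function names are hypothetical, and only the limits themselves come from the text above.

```typescript
const MAX_CAPTURES = 5;    // "Repeat up to 5 times"
const MAX_HEIGHT_PX = 800; // "800px max height"

// Hypothetical record for one captured screen.
interface ScreenCapture {
  url: string;
  screenshotDataUrl: string;
  description: string; // text returned by the vision model
  domSnapshot: string;
}

// Full width, height clipped to the cap.
function captureRegion(viewportWidth: number, pageHeight: number) {
  return {
    x: 0,
    y: 0,
    width: viewportWidth,
    height: Math.min(pageHeight, MAX_HEIGHT_PX),
  };
}

// Returns a new array rather than mutating, which suits store-style state.
function addCapture(existing: readonly ScreenCapture[], next: ScreenCapture): ScreenCapture[] {
  if (existing.length >= MAX_CAPTURES) {
    throw new Error(`Capture limit reached (${MAX_CAPTURES})`);
  }
  return [...existing, next];
}
```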
Phase 2: Test Generation
- `FlowDiscoveryService` sends screen descriptions to GPT-4
- AI generates 3-7 prioritized test flows with steps
- Flows populate the `testPlanner` store
- User reviews, edits narratives, or deletes unwanted flows
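A generated plan might be normalised before it reaches the planner store, roughly as follows. The `TestFlow` shape, the four priority labels, and both helper names are assumptions made for this sketch; the source only states that flows are prioritized, contain steps with actions and expected outcomes, and number between three and seven.

```typescript
// Hypothetical priority labels; the source does not name them.
type Priority = "critical" | "high" | "medium" | "low";

interface FlowStep {
  action: string;
  expected: string;
}

interface TestFlow {
  id: string;
  title: string;
  priority: Priority;
  steps: FlowStep[];
}

const PRIORITY_RANK: Record<Priority, number> = { critical: 0, high: 1, medium: 2, low: 3 };
const MIN_FLOWS = 3;
const MAX_FLOWS = 7;

// Drops flows without steps, orders by priority, and flags plans that fall
// outside the 3-7 range instead of rejecting them outright.
function normalizePlan(flows: TestFlow[]): { flows: TestFlow[]; withinExpectedRange: boolean } {
  const usable = flows.filter((f) => f.steps.length > 0);
  const sorted = [...usable].sort((a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority]);
  return {
    flows: sorted,
    withinExpectedRange: sorted.length >= MIN_FLOWS && sorted.length <= MAX_FLOWS,
  };
}

// Mirrors the "deletes unwanted flows" review action.
function deleteFlow(flows: TestFlow[], id: string): TestFlow[] {
  return flows.filter((f) => f.id !== id);
}
```

Flagging an out-of-range plan rather than throwing is a design choice for the sketch: it leaves room for the reviewer to delete or regenerate flows instead of losing the whole plan.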
Phase 3: First Execution (Recording)
- `TestOrchestrator` starts `ActionRecorder` and `NetworkMonitor`
- For each test case:
  - Vision-based or DOM-based execution
  - Every action recorded with multiple selectors
  - Network requests captured and linked to actions
  - Screenshots taken before/after key actions
- JSON recording saved to state and downloadable
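The recording steps above suggest a structure in which each action carries several selectors plus the IDs of the network requests it triggered. The sketch below illustrates that idea only: `RecorderSketch`, its field names, and the `version` key are hypothetical, and the project's actual schema is the one documented in `RECORDING_FORMAT.md`.

```typescript
// Hypothetical recording shape, for illustration only.
interface SelectorSet {
  id?: string;
  css?: string;
  xpath?: string;
  coordinates?: { x: number; y: number };
}

interface RecordedAction {
  index: number;
  type: "click" | "type" | "navigate";
  selectors: SelectorSet;
  value?: string;
  timestamp: number;
  requestIds: string[]; // network requests attributed to this action
}

class RecorderSketch {
  private actions: RecordedAction[] = [];

  // The clock is injectable so tests can produce deterministic output.
  constructor(private now: () => number = Date.now) {}

  // Appends an action and returns its index for later request linking.
  record(type: RecordedAction["type"], selectors: SelectorSet, value?: string): number {
    const index = this.actions.length;
    this.actions.push({ index, type, selectors, value, timestamp: this.now(), requestIds: [] });
    return index;
  }

  linkRequest(actionIndex: number, requestId: string): void {
    const action = this.actions[actionIndex];
    if (!action) throw new Error(`No action at index ${actionIndex}`);
    action.requestIds.push(requestId);
  }

  serialize(): string {
    return JSON.stringify({ version: 1, actions: this.actions }, null, 2);
  }
}
```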
Phase 4: Replay Execution (Subsequent Runs)
- Load stored JSON recording
- For each action:
  - Find element using selector priority (ID → data-testid → CSS → XPath → coordinates)
  - Perform action
  - Monitor network requests and validate that they match the recording
- Generate pass/fail report with differences
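The selector priority used during replay can be sketched as a loop that tries each recorded selector in the stated order and stops at the first hit. The `Selector` type and `resolveElement` function are hypothetical, and the lookup is passed in as a callback so the example runs without a browser; in a content script that callback would presumably wrap calls such as `document.getElementById`, `document.querySelector`, `document.evaluate`, and `document.elementFromPoint`.

```typescript
// The order comes from the text: ID → data-testid → CSS → XPath → coordinates.
type SelectorKind = "id" | "data-testid" | "css" | "xpath" | "coordinates";

interface Selector {
  kind: SelectorKind;
  value: string; // for "coordinates", assumed here to be encoded as "x,y"
}

const SELECTOR_PRIORITY: SelectorKind[] = ["id", "data-testid", "css", "xpath", "coordinates"];

// Tries selectors from most to least stable and reports which one matched.
function resolveElement<T>(
  selectors: Selector[],
  find: (selector: Selector) => T | null
): { element: T; used: SelectorKind } | null {
  const ordered = [...selectors].sort(
    (a, b) => SELECTOR_PRIORITY.indexOf(a.kind) - SELECTOR_PRIORITY.indexOf(b.kind)
  );
  for (const selector of ordered) {
    const element = find(selector);
    if (element !== null) return { element, used: selector.kind };
  }
  return null; // nothing matched; the replay step should be reported as failed
}
```

Recording which selector kind matched is useful evidence for the report: a replay that silently drops from an ID to raw coordinates still passes, but it signals that the page has drifted from the recording.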
Tech stack
- Chrome Extension MV3 with Chrome DevTools Protocol
- React 18 + Chakra UI for popup and devtools surfaces
- Zustand (immer + persist) for state management and localStorage persistence
- OpenAI GPT-4o / GPT-4.1 / o1 for test generation and execution
- OpenAI Vision API for screenshot analysis
- Chrome Debugger API for network monitoring and full-page screenshots
- Webpack 5 build pipeline
- VitePress for documentation
Repository layout
src/
  background/ # Service worker & orchestrator bootstrap
  content/ # DOM serializers, capture helpers, RPC registry
  services/ # Core services
    flowDiscovery.ts # Screenshot capture + Vision API + test generation
    actionRecorder.ts # Records actions to JSON format
    networkMonitor.ts # Network request capture via DevTools Protocol
    testOrchestrator.ts # Test execution coordinator
    reportBuilder.ts # Report generation and export
  types/
    testRecording.ts # TypeScript types for JSON recording format
  pages/ # Popup, DevTools, Options, etc.
  state/ # Zustand slices (settings, UI, tasks, planner)
    testPlanner.ts # Test cases, captures, recordings
docs/ # VitePress documentation
example-recording.json # Example JSON recording format
RECORDING_FORMAT.md # JSON schema documentation
Continue to LLM Orchestration for a deeper look at how GPT-4 drives actions, Data & Storage to understand persistence, or Network Requests for a breakdown of every outbound call.