Browser Audio Capture

by jarvis563 · GitHub ↗ · v1.1.0
cross-platform ⚠ suspicious
Downloads: 482 · Stars: 0 · Active Installs: 1 · Versions: 3
Install in OpenClaw
/install browser-audio-capture
Description
Capture audio from any browser tab — meetings, YouTube, podcasts, courses, webinars — and stream to any AI agent. Zero API keys, works with any framework.
README (SKILL.md)

Browser Audio Capture

Give any AI agent ears for the browser. One Chrome extension captures audio from any tab — meetings, YouTube, podcasts, webinars, courses, earnings calls — and streams it to your AI pipeline.

Why Use This

Your AI agent can't hear anything happening in your browser. This skill fixes that. Capture audio from any Chrome tab and stream it to your agent — no API keys, no OAuth, no per-platform integrations.

Use cases: meeting summaries, YouTube/podcast notes, competitive intel from earnings calls, auto-notes from online courses, customer call analysis — anything that plays audio in a browser tab.

Works with any AI agent — Claude, ChatGPT, OpenClaw, LangChain, CrewAI, or your own. If your agent can run shell commands or receive HTTP, it gets browser audio.

Prerequisites

Chrome with remote debugging:

# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --user-data-dir=$HOME/.chrome-debug-profile &

Python 3.9+ with aiohttp: pip install aiohttp
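
Before running the CLI, it can help to confirm that Chrome is really exposing the debugging port. The following is a minimal sketch, assuming Chrome's DevTools HTTP endpoint `/json` on port 9222 (the port used in the command above). `fetch_targets` and `page_tabs` are hypothetical helper names for illustration and are not part of this skill.

```python
import json
from urllib.request import urlopen


def fetch_targets(port: int = 9222, timeout: float = 2.0) -> list:
    """Ask Chrome's DevTools HTTP endpoint for its open targets."""
    with urlopen(f"http://127.0.0.1:{port}/json", timeout=timeout) as resp:
        return json.load(resp)


def page_tabs(targets: list) -> list:
    """Keep only ordinary tabs (type 'page') as (title, url) pairs."""
    return [
        (t.get("title", ""), t.get("url", ""))
        for t in targets
        if t.get("type") == "page"
    ]
```

Calling `page_tabs(fetch_targets())` should list the open tabs; a `URLError` usually means Chrome was started without the debugging flag.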

Quick Start

CLI (any agent that can exec)

# List tabs — meetings flagged with 🎙️
python3 -m browser_capture.cli tabs

# Auto-detect and capture meeting tab
python3 -m browser_capture.cli capture

# Continuous watch mode
python3 -m browser_capture.cli watch --interval 15

# Stop
python3 -m browser_capture.cli stop
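
For agents that drive the CLI from Python rather than a shell, the commands above could be wrapped roughly as follows. This is a sketch, not the skill's own API: `build_command` and `run_cli` are hypothetical names, and the wrapper only allows `--interval` with `watch` because that is the only combination shown above.

```python
import subprocess
from typing import List, Optional

CLI_MODULE = "browser_capture.cli"  # module name taken from the commands above


def build_command(action: str, interval: Optional[int] = None) -> List[str]:
    """Build the argv list for one of the documented CLI subcommands."""
    if action not in {"tabs", "capture", "watch", "stop"}:
        raise ValueError(f"unknown action: {action}")
    cmd = ["python3", "-m", CLI_MODULE, action]
    if interval is not None:
        if action != "watch":
            raise ValueError("--interval is only shown with watch")
        cmd += ["--interval", str(interval)]
    return cmd


def run_cli(action: str, interval: Optional[int] = None) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_command(action, interval),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Because `watch` runs continuously, `run_cli("watch", 15)` would block until the process exits; `subprocess.Popen(build_command("watch", 15))` is likely the better fit for that mode.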

Chrome Extension (one-click, persistent)

  1. chrome://extensions/ → Developer mode → Load unpacked → scripts/extension/
  2. Join a meeting → click Percept icon → Start Capturing
  3. Close popup — capture continues in background

Supported Platforms

Google Meet • Zoom (web) • Microsoft Teams • Webex • Whereby • Around • Cal.com • Riverside • StreamYard • Ping • Daily.co • Jitsi • Discord — plus any future platform that runs in a browser.
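
How the skill decides that a tab is a meeting is not documented here. One plausible approach is hostname matching, sketched below purely as an illustration. `MEETING_HOSTS` is an assumed, partial list and `is_meeting_url` is a hypothetical helper; neither is taken from the skill's code.

```python
from urllib.parse import urlparse

# Illustrative subset only; the skill's real detection rules are not documented.
MEETING_HOSTS = (
    "meet.google.com",
    "zoom.us",
    "teams.microsoft.com",
    "whereby.com",
    "meet.jit.si",
)


def is_meeting_url(url: str) -> bool:
    """True if the URL's host equals, or is a subdomain of, a known meeting host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in MEETING_HOSTS)
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives such as a look-alike domain that merely contains `zoom.us`.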

Audio Output

Streams to http://127.0.0.1:8900/audio/browser as JSON:

{
  "sessionId": "browser_1709234567890",
  "audio": "<base64 PCM16>",
  "sampleRate": 16000,
  "format": "pcm16",
  "tabUrl": "https://meet.google.com/...",
  "tabTitle": "Weekly Standup"
}
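
A consumer of this payload needs to turn the base64 string back into samples. The sketch below shows one way to do that with the standard library. It assumes little-endian, mono PCM16, which the payload description does not state, and `decode_chunk` and `chunk_seconds` are hypothetical helper names.

```python
import base64
import struct


def decode_chunk(payload: dict) -> list:
    """Return the chunk's samples as signed 16-bit integers."""
    if payload.get("format") != "pcm16":
        raise ValueError("expected pcm16 audio")
    raw = base64.b64decode(payload["audio"])
    count = len(raw) // 2  # two bytes per sample
    # "<" = little-endian, "h" = signed 16-bit; the byte order is an assumption.
    return list(struct.unpack(f"<{count}h", raw[: count * 2]))


def chunk_seconds(payload: dict) -> float:
    """Duration of the chunk, assuming a single (mono) channel."""
    return len(decode_chunk(payload)) / payload["sampleRate"]
```

At the documented 16000 Hz sample rate, one second of audio is 16000 samples, or 32000 bytes before base64 encoding.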

Configure endpoint in scripts/extension/offscreen.js (PERCEPT_URL). Point it at Whisper, Deepgram, NVIDIA Riva, or any transcription service.
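
For local testing without a transcription service, a stand-in receiver can simply accept the POSTs and keep them in memory. This is a minimal sketch using only the Python standard library rather than aiohttp; `AudioHandler`, `RECEIVED`, and `make_server` are hypothetical names, and a real receiver may also need CORS headers depending on how the extension issues its requests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # parsed payloads in arrival order (in-memory, for illustration)


class AudioHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the documented path is accepted.
        if self.path != "/audio/browser":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:
            self.send_error(400, "invalid JSON")
            return
        RECEIVED.append(payload)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def make_server(port: int = 8900) -> HTTPServer:
    """Bind to loopback only so captured audio stays on this machine."""
    return HTTPServer(("127.0.0.1", port), AudioHandler)
```

Running `make_server().serve_forever()` starts the receiver on the default endpoint; each captured chunk then shows up as a dict in `RECEIVED`.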

Troubleshooting

  • No tabs: Chrome needs --remote-debugging-port=9222
  • Button won't click: Remove + re-add extension (MV3 caches aggressively)
  • Audio not arriving: Check receiver on port 8900. Extension sends to /audio/browser
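
The first and third checks above both come down to whether something is listening on a local port. A small helper like the one below can confirm that quickly; `port_open` is a hypothetical name, and the sketch only tests TCP reachability, not whether the listener is actually Chrome or a working receiver.

```python
import socket


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

`port_open(9222)` covers the Chrome debugging port and `port_open(8900)` covers the audio receiver.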
Usage Guidance
This package is internally consistent for capturing browser tab audio, but it records sensitive audio and, by default, sends it to whatever PERCEPT_URL is configured. Before installing or enabling:

  1. Inspect PERCEPT_URL and, if necessary, change it so it points to a trusted local receiver.
  2. Load the unpacked extension yourself; don't trust a copy that someone else has already installed.
  3. Be aware that the extension uses tabCapture/offscreen and can continue recording after the popup closes; stop captures when finished.
  4. Use a dedicated Chrome profile when running with --remote-debugging-port.
  5. If you plan to let an AI agent invoke this skill autonomously, understand that it could start and stop captures; restrict autonomous permissions or review invocation policies.

For extra caution, run a local network monitor to confirm that traffic goes only to approved endpoints. For further assurance, verify the skill's author and source, or find a verified homepage, before trusting it with sensitive meetings.
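
The first check in the guidance above can be partly automated by testing whether a configured endpoint stays on the local machine. The following is a sketch under the assumption that "trusted local receiver" means a loopback address; `is_loopback_url` is a hypothetical helper, and reading the actual PERCEPT_URL value out of offscreen.js is left to the reader.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name other than localhost
```

Any other hostname is rejected rather than resolved, since a DNS name can be repointed at a remote machine later.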
Capability Analysis
Type: OpenClaw Skill
Name: browser-audio-capture
Version: 1.1.0

The skill captures audio, URLs, and titles from browser tabs, streaming this sensitive data to a local endpoint (`http://127.0.0.1:8900/audio/browser`). While the stated purpose is benign (transcription) and data is sent to localhost, the implementation leverages powerful and high-risk browser capabilities. The Python CLI uses Chrome DevTools Protocol (`cdp_client.py`, `audio_capture.py`) to inject arbitrary JavaScript (`evaluate_js`) into browser tabs, requiring the user to enable `--remote-debugging-port`. The Chrome extension uses the `tabCapture` API (`manifest.json`, `offscreen.js`) to capture audio. These mechanisms, while necessary for the skill's function, represent significant attack surfaces if the agent or skill's inputs were compromised, making the skill 'suspicious' due to its inherent capabilities rather than explicit malicious intent.
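
To make the JavaScript-injection risk concrete: over the Chrome DevTools Protocol, running script in a tab is a single JSON message using the `Runtime.evaluate` method. The sketch below builds such a message. It assumes the skill's `evaluate_js` wraps something similar, which this page does not confirm, and `evaluate_message` is a hypothetical name.

```python
import json
from itertools import count

_ids = count(1)  # each CDP request carries its own integer id


def evaluate_message(expression: str) -> str:
    """Serialize a Runtime.evaluate request for a CDP WebSocket connection."""
    return json.dumps({
        "id": next(_ids),
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    })
```

Anything that can reach the debugging port can send a message like this with any expression, which is why a dedicated Chrome profile is recommended.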
Capability Assessment
Purpose & Capability
Name/description match the implementation: Python CDP client + JS injection and a Chrome MV3 extension that capture tab audio and POST base64 PCM chunks. Required artifacts (Chrome with CDP, extension permissions) are consistent with the stated purpose. There are no unrelated env vars, binaries, or weird install steps.
Instruction Scope
SKILL.md and the code instruct the agent/user to start Chrome with remote-debugging or load the provided extension and then capture audio from tabs. The injected JS and extension send audio plus tab metadata (URL/title) to PERCEPT_URL (default 127.0.0.1:8900). This is consistent with the purpose but is high-sensitivity behavior (continuous recording, metadata included). The instructions allow 'watch' mode and auto-detection of meeting tabs, which grants the skill broad discretion to start/stop captures automatically.
Install Mechanism
No install spec is provided (instruction-only). That keeps install risk low: user manually loads the extension and runs Python. The extension is unpacked developer-install and requires user action to add. The only third-party dependency called out is aiohttp (pip), which is proportional.
Credentials
The skill requires no environment variables or external credentials. Browser permissions in the extension (tabCapture, activeTab, offscreen) are appropriate for capturing tab audio, but they are powerful — the extension can keep capturing after the popup closes. The code posts captured audio and tab metadata to a configurable endpoint (PERCEPT_URL); by default this is localhost, but changing it would cause exfiltration of sensitive audio, so endpoint trust is critical.
Persistence & Privilege
always:false (normal). The MV3 offscreen document and service worker let the extension persist captures after the popup closes, and the CLI includes a watch mode for continuous operation. Autonomous model invocation is permitted by default on the platform — combined with the skill's ability to start captures, this means an agent could trigger recording if allowed; this is expected behaviour but raises privacy considerations.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install browser-audio-capture
  3. After installation, invoke the skill by name or use /browser-audio-capture
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
  • Expanded audio capture support from just meetings to any browser tab, including YouTube, podcasts, webinars, and courses.
  • Updated description and documentation to highlight broader use cases beyond meetings.
  • Revised "Why Use This" and "Use cases" sections for clarity and to reflect new capabilities.
  • No code changes; documentation (SKILL.md) updated only.
v1.0.1
  • Updated the description for broader applicability: now highlights compatibility with any AI agent and framework.
  • Refreshed documentation with a stronger focus on zero API keys, no OAuth, and universal browser meeting support.
  • Improved Quick Start section with concise CLI and Chrome extension instructions.
  • Extended clarity on supported meeting platforms.
  • Audio output documentation now suggests compatibility with any local or cloud transcription service.
  • General content streamlined for readability and easier integration into diverse workflows.
v1.0.0
Initial release of browser-audio-capture.
  • Capture audio from browser meeting tabs (Zoom, Google Meet, Teams, etc.) via Chrome CDP.
  • Stream audio (PCM16) locally for transcription; no API keys or cloud involved.
  • Includes CLI tools for tab management, meeting detection, capture, and automation.
  • Bundled Chrome extension enables fully automated, persistent capture.
  • Supports major meeting platforms and outputs audio via JSON POST to a configurable local endpoint.
Metadata
Slug browser-audio-capture
Version 1.1.0
License
All-time Installs 1
Active Installs 1
Total Versions 3
Frequently Asked Questions

What is Browser Audio Capture?

Capture audio from any browser tab — meetings, YouTube, podcasts, courses, webinars — and stream to any AI agent. Zero API keys, works with any framework. It is an AI Agent Skill for Claude Code / OpenClaw, with 482 downloads so far.

How do I install Browser Audio Capture?

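Run "/install browser-audio-capture" in the OpenClaw or Claude Code chat to install the skill in one step. Capturing audio additionally requires the prerequisites listed in the README: Chrome started with --remote-debugging-port=9222 (or the unpacked extension loaded) and Python 3.9+ with aiohttp.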

Is Browser Audio Capture free?

Yes, Browser Audio Capture is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Browser Audio Capture support?

Browser Audio Capture is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available. It requires Chrome; the README gives a launch command for macOS only.

Who created Browser Audio Capture?

It is built and maintained by jarvis563 (@jarvis563); the current version is v1.1.0.
