
Browser Audio Capture

Name: Browser Audio Capture
Author: jarvis563

By jarvis563 · GitHub · v1.1.0

cross-platform ⚠ suspicious

Total downloads: 482

Install in OpenClaw

/install browser-audio-capture

Description

Capture audio from any browser tab — meetings, YouTube, podcasts, courses, webinars — and stream to any AI agent. Zero API keys, works with any framework.

Usage (SKILL.md)

Browser Audio Capture

Give any AI agent ears for the browser. One Chrome extension captures audio from any tab — meetings, YouTube, podcasts, webinars, courses, earnings calls — and streams it to your AI pipeline.

Why Use This

Your AI agent can't hear anything happening in your browser. This skill fixes that. Capture audio from any Chrome tab and stream it to your agent — no API keys, no OAuth, no per-platform integrations.

Use cases: meeting summaries, YouTube/podcast notes, competitive intel from earnings calls, auto-notes from online courses, customer call analysis — anything that plays audio in a browser tab.

Works with any AI agent — Claude, ChatGPT, OpenClaw, LangChain, CrewAI, or your own. If your agent can run shell commands or receive HTTP, it gets browser audio.

Prerequisites

Chrome with remote debugging:

# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --user-data-dir=$HOME/.chrome-debug-profile &

Python 3.9+ with aiohttp: pip install aiohttp
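
To confirm that Chrome really is listening on the debugging port before running the CLI, something like the following may help. It is a minimal sketch using only the standard library; it queries Chrome's standard `/json/list` HTTP endpoint, and the helper names `page_targets` and `fetch_targets` are hypothetical, not part of this skill.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

CDP_BASE = "http://127.0.0.1:9222"  # matches --remote-debugging-port=9222


def page_targets(raw_json: str) -> list:
    """Keep only ordinary tabs (type == "page") from a /json/list response."""
    return [
        {"title": t.get("title", ""), "url": t.get("url", "")}
        for t in json.loads(raw_json)
        if t.get("type") == "page"
    ]


def fetch_targets(base: str = CDP_BASE, timeout: float = 2.0) -> Optional[list]:
    """Return the open tabs, or None if the debugging port is not reachable."""
    try:
        with urllib.request.urlopen(f"{base}/json/list", timeout=timeout) as resp:
            return page_targets(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return None
```

If `fetch_targets()` returns `None`, Chrome was most likely started without `--remote-debugging-port=9222`, or a different Chrome instance is running.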

Quick Start

CLI (any agent that can exec)

# List tabs — meetings flagged with 🎙️
python3 -m browser_capture.cli tabs

# Auto-detect and capture meeting tab
python3 -m browser_capture.cli capture

# Continuous watch mode
python3 -m browser_capture.cli watch --interval 15

# Stop
python3 -m browser_capture.cli stop
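
For an agent that drives the CLI from Python rather than from a shell, the commands above could be wrapped roughly as follows. This is a sketch: the module name and the four subcommands come from the commands above, while `build_command` and `run_cli` are hypothetical helpers, and the output format of the CLI is not assumed.

```python
import subprocess
from typing import Optional

CLI_MODULE = "browser_capture.cli"  # module name taken from the Quick Start commands
ACTIONS = {"tabs", "capture", "watch", "stop"}


def build_command(action: str, interval: Optional[int] = None) -> list:
    """Build the argv for one of the four subcommands shown above."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    argv = ["python3", "-m", CLI_MODULE, action]
    if interval is not None:
        # --interval is only documented for watch mode.
        if action != "watch":
            raise ValueError("--interval is only shown for watch mode")
        argv += ["--interval", str(interval)]
    return argv


def run_cli(action: str, timeout: float = 30.0) -> str:
    """Run a one-shot subcommand and return its stdout.

    Raises CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        build_command(action),
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout
```

Watch mode runs continuously, so it is better started with `subprocess.Popen(build_command("watch", interval=15))` than with the blocking `run_cli`.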

Chrome Extension (one-click, persistent)

chrome://extensions/ → Developer mode → Load unpacked → scripts/extension/
Join a meeting → click Percept icon → Start Capturing
Close popup — capture continues in background

Supported Platforms

Google Meet • Zoom (web) • Microsoft Teams • Webex • Whereby • Around • Cal.com • Riverside • StreamYard • Ping • Daily.co • Jitsi • Discord — plus any future platform that runs in a browser.

Audio Output

Streams to http://127.0.0.1:8900/audio/browser as JSON:

{
  "sessionId": "browser_1709234567890",
  "audio": "<base64 PCM16>",
  "sampleRate": 16000,
  "format": "pcm16",
  "tabUrl": "https://meet.google.com/...",
  "tabTitle": "Weekly Standup"
}

Configure endpoint in scripts/extension/offscreen.js (PERCEPT_URL). Point it at Whisper, Deepgram, NVIDIA Riva, or any transcription service.
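
A receiver for this payload can be quite small. The sketch below uses only the standard library to accept POSTs on `/audio/browser` and decode each chunk. It assumes mono audio (the payload has no channel field), and `decode_chunk`, `AudioHandler`, and `serve` are hypothetical names rather than part of the skill.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_chunk(payload: dict) -> tuple:
    """Return (raw PCM16 bytes, duration in seconds) for one JSON payload."""
    pcm = base64.b64decode(payload["audio"])
    # PCM16 is 2 bytes per sample; mono is an assumption.
    seconds = len(pcm) / 2 / payload["sampleRate"]
    return pcm, seconds


class AudioHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/audio/browser":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        pcm, seconds = decode_chunk(payload)
        print(f"{payload.get('tabTitle', '?')}: {len(pcm)} bytes, {seconds:.2f}s")
        # Hand `pcm` to a transcription service here.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8900) -> None:
    """Listen on loopback only, so the receiver is not exposed on the network."""
    HTTPServer(("127.0.0.1", port), AudioHandler).serve_forever()
```

Calling `serve()` starts the receiver on the default port 8900. At 16000 samples per second, each second of audio is 32000 bytes of PCM16 before base64 encoding.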

Troubleshooting

No tabs: Chrome needs --remote-debugging-port=9222
Button won't click: Remove + re-add extension (MV3 caches aggressively)
Audio not arriving: Check receiver on port 8900. Extension sends to /audio/browser
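
The two port-related problems above can be checked in one go. This is a minimal sketch that only tests whether anything accepts TCP connections on the default ports; `port_is_open` and `diagnose` are hypothetical helpers.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose() -> list:
    """Map the two port checks to the troubleshooting hints above."""
    hints = []
    if not port_is_open("127.0.0.1", 9222):
        hints.append("No tabs: start Chrome with --remote-debugging-port=9222")
    if not port_is_open("127.0.0.1", 8900):
        hints.append("Audio not arriving: no receiver is listening on port 8900")
    return hints
```

An empty list from `diagnose()` means both ports answer. It does not prove that the process behind each port is the right one.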

Security recommendations

This package is internally consistent for capturing browser tab audio, but it records sensitive audio and sends it (by default) to whatever PERCEPT_URL is configured. Before installing or enabling:

1) Inspect PERCEPT_URL and, if necessary, change it so it points to a trusted local receiver.
2) Only load the unpacked extension yourself (don't accept someone else's already-installed extension).
3) Be aware that the extension uses tabCapture/offscreen and can continue recording after the popup closes; stop captures when finished.
4) Use a dedicated Chrome profile when running with --remote-debugging-port.
5) If you plan to let an AI agent invoke this skill autonomously, understand that it could start and stop captures; restrict autonomous permissions or review invocation policies.

For extra caution, run a local network monitor to confirm that traffic goes only to approved endpoints. If you need further assurance, verify the skill's author, source, or homepage before trusting it with sensitive meetings.
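
One way to act on the PERCEPT_URL advice is to check that the configured value points at the local machine. The sketch below takes the URL as a string; how PERCEPT_URL is declared inside `offscreen.js` is not shown here, so extracting it is left to the reader, and `is_loopback_url` is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name is treated as remote.
        return False
```

For example, `is_loopback_url("http://127.0.0.1:8900/audio/browser")` is true, while a cloud transcription endpoint returns false and deserves a deliberate decision before use.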

Functional analysis

Type: OpenClaw Skill
Name: browser-audio-capture
Version: 1.1.0

The skill captures audio, URLs, and titles from browser tabs, streaming this sensitive data to a local endpoint (`http://127.0.0.1:8900/audio/browser`). While the stated purpose is benign (transcription) and data is sent to localhost, the implementation leverages powerful and high-risk browser capabilities. The Python CLI uses Chrome DevTools Protocol (`cdp_client.py`, `audio_capture.py`) to inject arbitrary JavaScript (`evaluate_js`) into browser tabs, requiring the user to enable `--remote-debugging-port`. The Chrome extension uses the `tabCapture` API (`manifest.json`, `offscreen.js`) to capture audio. These mechanisms, while necessary for the skill's function, represent significant attack surfaces if the agent or skill's inputs were compromised, making the skill 'suspicious' due to its inherent capabilities rather than explicit malicious intent.
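
To make the JavaScript-injection risk concrete: the skill's own `evaluate_js` is not shown on this page, but a helper of that kind would plausibly send the standard CDP `Runtime.evaluate` command over the tab's WebSocket. The sketch below only builds that message; `runtime_evaluate` is a hypothetical name, and the transport is omitted.

```python
import itertools
import json

_ids = itertools.count(1)  # CDP pairs each response with its request by id


def runtime_evaluate(expression: str) -> str:
    """Build a CDP Runtime.evaluate request as a JSON string."""
    message = {
        "id": next(_ids),
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    }
    return json.dumps(message)
```

Any string passed as `expression` runs with the page's privileges, which is why anyone who can reach the debugging port can read or alter whatever the tab can.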

Capability assessment

✓ Purpose & Capability

Name/description match the implementation: Python CDP client + JS injection and a Chrome MV3 extension that capture tab audio and POST base64 PCM chunks. Required artifacts (Chrome with CDP, extension permissions) are consistent with the stated purpose. There are no unrelated env vars, binaries, or weird install steps.

ℹ Instruction Scope

SKILL.md and the code instruct the agent/user to start Chrome with remote-debugging or load the provided extension and then capture audio from tabs. The injected JS and extension send audio plus tab metadata (URL/title) to PERCEPT_URL (default 127.0.0.1:8900). This is consistent with the purpose but is high-sensitivity behavior (continuous recording, metadata included). The instructions allow 'watch' mode and auto-detection of meeting tabs, which grants the skill broad discretion to start/stop captures automatically.

✓ Install Mechanism

No install spec is provided (instruction-only). That keeps install risk low: user manually loads the extension and runs Python. The extension is unpacked developer-install and requires user action to add. The only third-party dependency called out is aiohttp (pip), which is proportional.

✓ Credentials

The skill requires no environment variables or external credentials. Browser permissions in the extension (tabCapture, activeTab, offscreen) are appropriate for capturing tab audio, but they are powerful — the extension can keep capturing after the popup closes. The code posts captured audio and tab metadata to a configurable endpoint (PERCEPT_URL); by default this is localhost, but changing it would cause exfiltration of sensitive audio, so endpoint trust is critical.

ℹ Persistence & Privilege

always:false (normal). The MV3 offscreen document and service worker let the extension persist captures after the popup closes, and the CLI includes a watch mode for continuous operation. Autonomous model invocation is permitted by default on the platform — combined with the skill's ability to start captures, this means an agent could trigger recording if allowed; this is expected behaviour but raises privacy considerations.

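How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install browser-audio-capture
Once installed, invoke the Skill by name or trigger it with /browser-audio-capture
Provide the inputs described in the Skill's parameter notes to get structured output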

Version history

v1.1.0

- Expanded audio capture support from just meetings to any browser tab, including YouTube, podcasts, webinars, and courses.
- Updated description and documentation to highlight broader use cases beyond meetings.
- Revised "Why Use This" and "Use cases" sections for clarity and to reflect new capabilities.
- No code changes; documentation (SKILL.md) updated only.

v1.0.1

- Updated the description for broader applicability: now highlights compatibility with any AI agent and framework.
- Refreshed documentation with a stronger focus on zero API keys, no OAuth, and universal browser meeting support.
- Improved Quick Start section with concise CLI and Chrome extension instructions.
- Extended clarity on supported meeting platforms.
- Audio output documentation now suggests compatibility with any local or cloud transcription service.
- General content streamlined for readability and easier integration into diverse workflows.

v1.0.0

Initial release of browser-audio-capture.
- Capture audio from browser meeting tabs (Zoom, Google Meet, Teams, etc.) via Chrome CDP.
- Stream audio (PCM16) locally for transcription, with no API keys or cloud involved.
- Includes CLI tools for tab management, meeting detection, capture, and automation.
- Bundled Chrome extension enables fully automated, persistent capture.
- Supports major meeting platforms and outputs audio via JSON POST to a configurable local endpoint.

Metadata

Slug: browser-audio-capture
Version: 1.1.0
License: none listed
Total installs: 1
Current installs: 1
Versions: 3

FAQ