
Augent

by AugentDevs · GitHub ↗ · v1.5.2 · MIT-0
Platforms: darwin, linux, win32 · Security: Clean
Downloads: 178 · Stars: 0 · Active Installs: 0 · Versions: 8
Install in OpenClaw
/install augent
Description
The audio & video layer for agents. 22 local MCP tools. No cloud, no API keys.
README (SKILL.md)

Augent — Audio & Video Intelligence for AI Agents

Augent is an MCP server that gives your agent 22 tools for audio and video intelligence. Download from 1000+ sites via yt-dlp and aria2c, transcribe in 99 languages via faster-whisper, search by keyword or meaning via sentence-transformers, take notes, identify speakers via pyannote-audio, detect chapters, separate audio via Demucs v4, export clips, extract visual frames, record X/Twitter Spaces (requires user-configured auth token in ~/.augent/auth.json), and generate speech via Kokoro TTS. All processing runs locally. Downloads are saved to ~/Downloads/, notes and clips to ~/Desktop/, transcription memory to ~/.augent/memory/.

Config

{
  "mcpServers": {
    "augent": {
      "command": "augent-mcp"
    }
  }
}

If augent-mcp is not in PATH, use python3 -m augent.mcp as the command instead.
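
The PATH fallback described above can be applied mechanically. The following is a minimal sketch, not part of Augent: `build_augent_entry` is a hypothetical helper, and splitting the fallback into `command` plus `args` assumes your MCP client accepts that common entry shape.

```python
import json
import shutil


def build_augent_entry(which=shutil.which):
    """Return the mcpServers entry, preferring the augent-mcp executable."""
    if which("augent-mcp"):
        return {"command": "augent-mcp"}
    # Fallback from the README: run the server as a Python module.
    # Splitting it into command + args is an assumption about the client.
    return {"command": "python3", "args": ["-m", "augent.mcp"]}


if __name__ == "__main__":
    config = {"mcpServers": {"augent": build_augent_entry()}}
    print(json.dumps(config, indent=2))
```

Running the script prints a config block you can paste into your client's MCP settings file.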

Install

Install via the ClawHub install button above, or use uv tool install augent for the base package or uv tool install "augent[all]" for all features. FFmpeg is required for audio processing.

Tools

Augent exposes 22 MCP tools:

Core

| Tool | Description |
|------|-------------|
| download_audio | Download audio from video URLs at maximum speed. Supports YouTube, Vimeo, TikTok, Twitter/X, SoundCloud, and 1000+ sites. Uses aria2c multi-connection + concurrent fragments. |
| transcribe_audio | Full transcription of any audio file with per-segment timestamps. Returns text, language, duration, and segments. Cached by file hash. |
| search_audio | Search audio for keywords. Returns timestamped matches with context snippets. Supports clip export. |
| deep_search | Semantic search: find moments by meaning, not just keywords. Uses sentence-transformers embeddings. |
| search_memory | Search across ALL stored transcriptions in one query. Keyword or semantic mode. |
| take_notes | All-in-one: download audio from URL, transcribe, and save formatted notes. Supports 5 styles: tldr, notes, highlight, eye-candy, quiz. |
| clip_export | Export a video clip from any URL for a specific time range. Downloads only the requested segment. |
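
transcribe_audio is described as "cached by file hash". As a rough illustration of that idea (not Augent's actual code), the sketch below keys a cache on the SHA-256 of the file contents, so a renamed or copied file reuses the earlier result. The hash algorithm, the helper names, and the in-memory dict cache are all assumptions.

```python
import hashlib
from pathlib import Path


def file_cache_key(path, chunk_size=1 << 20):
    """Hash a file's bytes in chunks so large media never loads fully into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transcribe_cached(path, transcribe, cache):
    """Call transcribe(path) only when this exact content has not been seen."""
    key = file_cache_key(path)
    if key not in cache:
        cache[key] = transcribe(path)
    return cache[key]
```

Because the key depends only on content, two paths with identical bytes trigger a single transcription.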

Analysis

| Tool | Description |
|------|-------------|
| chapters | Auto-detect topic chapters with timestamps using embedding similarity. |
| search_proximity | Find where two keywords appear near each other (e.g., "startup" within 30 words of "funding"). |
| identify_speakers | Speaker diarization: identify who speaks when. No API keys required. |
| separate_audio | Isolate vocals from music/noise using Meta's Demucs v4. Feed clean vocals into transcription. |
| batch_search | Search multiple audio files in parallel. Ideal for podcast libraries or interview collections. |
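
To make the search_proximity example concrete ("startup" within 30 words of "funding"), here is a small sketch of word-distance matching on plain text. It only illustrates the concept; the function name, tokenization, and return shape are assumptions and may differ from what the tool does.

```python
import re


def proximity_matches(text, first, second, max_distance=30):
    """Return (i, j) word-index pairs where first and second occur within max_distance words."""
    words = [w.lower() for w in re.findall(r"[\w']+", text)]
    first_hits = [i for i, w in enumerate(words) if w == first.lower()]
    second_hits = [j for j, w in enumerate(words) if w == second.lower()]
    return [(i, j) for i in first_hits for j in second_hits if abs(i - j) <= max_distance]
```

A real transcript search would map the word indices back to segment timestamps; that step is omitted here.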

Utilities

| Tool | Description |
|------|-------------|
| text_to_speech | Convert text to natural speech using Kokoro TTS. 54 voices, 9 languages. Runs in background. |
| list_files | List media files in a directory with size info. |
| list_memories | Browse all stored transcriptions by title, duration, and date. |
| memory_stats | View memory statistics (file count, total duration). |
| clear_memory | Clear the transcription memory to free disk space. |
| tag | Add, remove, or list tags on transcriptions. Broad topic categories for organizing memories. |
| highlights | Export the best moments from a transcription. Auto mode picks top moments; focused mode finds moments matching a topic. |
| visual | Extract visual context from video at moments that matter. Query, auto, manual, and assist modes. Frames saved to Obsidian vault. |
| rebuild_graph | Rebuild Obsidian graph view data for all transcriptions. Migrates files, computes wikilinks, generates MOC hubs. |
| spaces | Download or live-record X/Twitter Spaces. Start, check status, or stop recordings. |

Usage Examples

Take notes from a video

"Take notes from https://youtube.com/watch?v=xxx"

The agent calls take_notes which downloads, transcribes, and returns formatted notes. One tool call does everything.

Search a podcast for topics

"Search this podcast for every mention of AI regulation" — provide the file path or URL.

The agent uses search_audio for exact keyword matches, or deep_search for semantic matches (finds relevant discussion even without exact words).
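
The difference between keyword and semantic matching comes down to comparing embedding vectors instead of strings. The sketch below ranks transcript segments by cosine similarity to a query vector. It is illustrative only: the vectors here are supplied by the caller (in Augent they would come from sentence-transformers), and the segment fields `start`, `text`, and `vec` are assumed names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_segments(query_vec, segments, top_k=3):
    """Return (score, start, text) tuples for the top_k most similar segments."""
    scored = [(cosine(query_vec, s["vec"]), s) for s in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), s["start"], s["text"]) for score, s in scored[:top_k]]
```

A segment about "lawmakers debating rules for machine learning" can rank first for the query "AI regulation" even though it shares no keywords, provided its embedding lies close to the query's.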

Transcribe and identify speakers

"Transcribe this meeting recording and tell me who said what"

The agent calls transcribe_audio then identify_speakers to label each segment by speaker.
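
Combining the two results means matching each transcript segment to the speaker turn it overlaps most. The following sketch shows one way to do that; the dictionary fields (`start`, `end`, `text`, `speaker`) and the `SPEAKER_00`-style labels are assumptions about the output shape, not a documented Augent schema.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Attach to each transcript segment the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg["start"], seg["end"], t["start"], t["end"]),
            default=None,
        )
        speaker = None
        # Only accept the best turn if it actually overlaps the segment.
        if best and overlap(seg["start"], seg["end"], best["start"], best["end"]) > 0:
            speaker = best["speaker"]
        labeled.append({**seg, "speaker": speaker})
    return labeled
```

Segments that fall outside every speaker turn are labeled `None` rather than guessed.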

Search across all transcriptions

"Search everything I've ever transcribed for mentions of funding"

The agent uses search_memory to search across all stored transcriptions without needing a file path.

Export a clip

"Clip the part where they talk about pricing"

The agent uses search_audio or deep_search to find the moment, then clip_export to extract just that segment.

Separate vocals from noisy audio

"This recording has music in the background, clean it up and transcribe"

The agent calls separate_audio to isolate vocals, then transcribe_audio on the clean vocals track.

Generate speech from text

"Read these notes aloud"

The agent calls text_to_speech to generate an MP3 with natural speech. Supports multiple voices and languages.

Note Styles

When using take_notes, the style parameter controls formatting:

| Style | Description |
|-------|-------------|
| tldr | Shortest possible summary. One screen. Bold key terms. |
| notes | Clean sections with nested bullets (default). |
| highlight | Notes with callout blocks for key insights and blockquotes with timestamps. |
| eye-candy | Maximum visual formatting: callouts, tables, checklists, blockquotes. |
| quiz | Multiple-choice questions with answer key. |
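
Under the hood, an MCP client invokes a tool with a JSON-RPC `tools/call` message. The sketch below builds such a message for take_notes and rejects styles not listed above. The `style` parameter is named in this README; the `url` argument name and the helper itself are assumptions, so check the tool's published schema before relying on them.

```python
import json

NOTE_STYLES = ("tldr", "notes", "highlight", "eye-candy", "quiz")


def take_notes_request(url, style="notes", request_id=1):
    """Build an MCP tools/call JSON-RPC message for take_notes."""
    if style not in NOTE_STYLES:
        raise ValueError(f"unknown style {style!r}; expected one of {NOTE_STYLES}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "take_notes",
            # "url" is an assumed argument name; "style" is named in the README.
            "arguments": {"url": url, "style": style},
        },
    }


if __name__ == "__main__":
    print(json.dumps(take_notes_request("https://youtube.com/watch?v=xxx", "quiz"), indent=2))
```

In normal use the agent constructs this message for you; the sketch is only meant to show where the style value ends up.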

Model Sizes

tiny is the default and handles nearly everything. Only use larger models for heavy accents, poor audio quality, or maximum accuracy needs.

| Model | Speed | Accuracy |
|-------|-------|----------|
| tiny | Fastest | Excellent (default) |
| base | Fast | Excellent |
| small | Medium | Superior |
| medium | Slow | Outstanding |
| large | Slowest | Maximum |

File Paths

Augent reads and writes to these locations on your machine:

| Path | Purpose |
|------|---------|
| ~/Downloads/ | Default directory for downloaded audio files |
| ~/Desktop/ | Default directory for notes, clips, and TTS output |
| ~/.augent/memory/transcriptions.db | SQLite database for persistent transcription memory |
| ~/.augent/memory/transcriptions/ | Markdown files for each stored transcription |
| ~/.augent/config.yaml | User configuration (optional) |
| ~/.augent/auth.json | Twitter/X authentication cookies for Spaces recording (optional, user-created) |
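
Since the transcription memory is an ordinary SQLite file, you can inspect it with the Python standard library. This sketch opens the database read-only and lists its tables. The schema is not documented here, so the code deliberately queries only `sqlite_master`; `list_tables` is a hypothetical helper.

```python
import sqlite3
from pathlib import Path

DEFAULT_DB = Path.home() / ".augent" / "memory" / "transcriptions.db"


def list_tables(db_path=DEFAULT_DB):
    """Return table names from the database, opened read-only."""
    # mode=ro prevents accidental writes and fails if the file does not exist.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

If the database has not been created yet, `sqlite3.connect` raises `sqlite3.OperationalError` instead of creating an empty file.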

If Obsidian is installed, visual frames are saved to the Obsidian vault's External Files/visual/ directory. The vault path is auto-detected from Obsidian's config.

Network Access

Network access is used for two purposes only:

  1. Downloading media from user-provided URLs via yt-dlp and aria2c
  2. Downloading ML models on first use (Whisper, sentence-transformers, pyannote, Demucs, Kokoro) from Hugging Face

No telemetry. No background network activity. No data is uploaded.

ML Dependencies

The augent[all] install includes these local ML components:

| Component | Purpose | Size |
|-----------|---------|------|
| faster-whisper | Speech-to-text transcription | ~75MB (tiny model) |
| sentence-transformers | Semantic search, auto-tagging, chapter detection | ~90MB |
| pyannote-audio | Speaker diarization | ~29MB |
| Demucs v4 | Audio source separation (vocals from noise) | ~80MB |
| Kokoro | Text-to-speech (54 voices, 9 languages) | ~200MB |

All models run locally. None require API keys or cloud services.

Requirements

  • Python 3.10+
  • FFmpeg (audio processing)
  • yt-dlp + aria2c (for audio downloads)
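
Before installing, you can check these requirements with a short script. This is a minimal sketch using only the standard library; `missing_requirements` is a hypothetical helper, and it assumes the binaries are installed under their usual executable names (`ffmpeg`, `yt-dlp`, `aria2c`).

```python
import shutil
import sys

REQUIRED_BINARIES = ("ffmpeg", "yt-dlp", "aria2c")


def missing_requirements(which=shutil.which, version=sys.version_info):
    """Return a list of human-readable problems; empty means everything was found."""
    problems = []
    if tuple(version[:2]) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    for name in REQUIRED_BINARIES:
        if which(name) is None:
            problems.append(f"{name} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

An empty output means the interpreter version and all three binaries were found.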

Usage Guidance
This skill appears internally consistent and behaves like a local media processing tool. Before installing: (1) verify the package source (check the GitHub repo and release/tarball on PyPI) and the package version, (2) ensure you are comfortable with the tool writing data to ~/Downloads, ~/Desktop, and ~/.augent/memory (transcriptions can be sensitive), (3) if you use Spaces recording, the optional token file (~/.augent/auth.json) contains sensitive credentials—only provide it if you trust the package, and (4) confirm you want local tools installed via pip/uv and that required binaries (ffmpeg, yt-dlp, aria2c) will be installed from trusted package sources.
Capability Analysis
Type: OpenClaw Skill · Name: augent · Version: 1.5.2
The 'augent' skill bundle provides a comprehensive suite of 22 local tools for audio and video processing, including transcription, semantic search, and text-to-speech. While the tools require network access for downloading media (via yt-dlp/aria2c) and fetching ML models from Hugging Face, the documentation explicitly states that all processing is local and no data is exfiltrated. The use of a Twitter/X auth token is limited to a specific, user-configured feature for recording Spaces, and the file system access (Downloads, Desktop, .augent) is consistent with the stated functionality of a media processing agent.
Capability Assessment
Purpose & Capability
Name/description describe a local audio/video toolset. Required binaries (ffmpeg, yt-dlp, aria2c, and the augent MCP binary) are exactly what an audio/video downloader/transcriber suite would need. Optional Twitter/X auth for Spaces recording is consistent with the 'spaces' feature.
Instruction Scope
SKILL.md instructs the agent to run a local MCP server (augent-mcp) and to read/write media and transcription files in user directories (~/Downloads, ~/Desktop, ~/.augent/memory). This is within the skill's scope, but it does mean the skill will create and persist files on disk and (optionally) read an auth token file at ~/.augent/auth.json. SKILL.md also references saving frames to an Obsidian vault but does not declare an explicit vault path env var; that is a minor documentation/clarity issue rather than a security mismatch.
Install Mechanism
Install metadata lists 'uv' (ClawHub/uv tool) and SKILL.md also documents a pip install alternative (augent[all]). Both are common package install methods. This is reasonable, but pip installs pull packages from PyPI so users should verify the package source (GitHub repo) and version before installing.
Credentials
No required environment variables or credentials; the only optional credential is AUGENT_AUTH_TOKEN (path to an X/Twitter auth file) which is justified for the 'spaces' recording feature. The skill will persist transcriptions to disk (AUGENT_MEMORY_DIR) which can contain sensitive content — that persistence is proportional to the skill but worth noting.
Persistence & Privilege
The skill does not request 'always: true' and allows user invocation. It will create persistent files under user directories (~/Downloads, ~/Desktop, ~/.augent/memory) and can modify an Obsidian vault if configured — normal for this kind of tool, but users should be aware of local file writes and retained transcriptions.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install augent
  3. After installation, invoke the skill by name or use /augent
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.5.2
Force fresh VT scan resubmission.
v1.5.1
Declare auth token and file paths in metadata and skill.json config.
v1.5.0
Declare auth token and file paths in metadata and skill.json config.
v1.4.0
Declare all file paths, network access, ML dependencies, and Spaces auth for HIGH confidence scan.
v1.3.0
Add skill.json and README.md. Remove all bash code blocks from SKILL.md.
v1.2.0
Remove curl|bash entirely. Shorter description. uv and pip only.
v1.1.0
Fix security scan flags. 22 MCP tools (added Spaces). Declared network/filesystem scope. uv/pip as recommended install.
v1.0.0
Audio intelligence for agents. 21 MCP tools: transcribe, search, notes, highlights, speaker ID, visual context, TTS, and more.
Metadata
Slug augent
Version 1.5.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 8
Frequently Asked Questions

What is Augent?

The audio & video layer for agents. 22 local MCP tools. No cloud, no API keys. It is an AI Agent Skill for Claude Code / OpenClaw, with 178 downloads so far.

How do I install Augent?

Run "/install augent" in the OpenClaw or Claude Code chat to install it in one step. FFmpeg, yt-dlp, and aria2c must also be available on the machine (see Requirements).

Is Augent free?

Yes, Augent is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Augent support?

Augent is cross-platform and runs anywhere OpenClaw / Claude Code is available (darwin, linux, win32).

Who created Augent?

It is built and maintained by AugentDevs (@augentdevs); the current version is v1.5.2.
