
Speech to Text (Yandex SpeechKit)

Author: bzsega

by Sergey Mikhaylov · GitHub ↗ · v1.1.8

cross-platform ✓ Security Clean

Downloads: 708

Install in OpenClaw

/install sergei-mikhailov-stt

Description

Speech recognition from voice messages using Yandex SpeechKit (with an extensible architecture for other providers). Use when you need to convert a voice mes...

Usage Guidance

This skill appears to be what it says: a local Python/FFmpeg-based STT plugin that calls Yandex SpeechKit and expects your Yandex API key and folder ID. Before installing, ensure you trust the skill source (the repo owner), run the included bash check.sh and setup.sh in a safe environment, and prefer injecting credentials into ~/.openclaw/openclaw.json (as documented) rather than leaving them in plaintext in the skill folder. If you want extra caution, inspect setup.sh before running and run the skill inside an isolated environment (dedicated user or container).

Capability Analysis

Type: OpenClaw Skill

Name: sergei-mikhailov-stt

Version: 1.1.8

The skill provides legitimate Speech-to-Text functionality using Yandex SpeechKit and follows OpenClaw best practices. It includes well-structured scripts for setup (setup.sh), diagnostics (check.sh), and audio processing (audio_processor.py) using FFmpeg. Security instructions in SKILL.md are defensive, explicitly directing the AI agent to protect API keys and avoid unauthorized file modifications. No evidence of data exfiltration, malicious command execution, or prompt injection attacks was found; all network activity is directed to official Yandex Cloud endpoints.

Capability Assessment

✓ Purpose & Capability

Name and description match the implementation: the code converts audio with ffmpeg and calls Yandex SpeechKit. Required binaries (ffmpeg, python3) and env vars (YANDEX_API_KEY, YANDEX_FOLDER_ID) are expected and justified for this purpose.
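The convert-then-recognize flow described above can be sketched roughly as follows. This is not the skill's own code: the function names are hypothetical, and the endpoint, query parameters, and `Api-Key` header reflect the SpeechKit v1 synchronous REST API as commonly documented, which the skill may or may not use. Nothing is executed or sent; the functions only build the ffmpeg command and the HTTP request.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: SpeechKit v1 synchronous recognition (not taken from the skill).
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_ffmpeg_command(src_path: str, dst_path: str) -> list[str]:
    """Build an ffmpeg command that converts src_path to mono OGG/Opus."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it exists
        "-i", src_path,      # input file
        "-ac", "1",          # downmix to a single channel
        "-c:a", "libopus",   # encode audio with the Opus codec
        dst_path,
    ]


def build_stt_request(
    audio: bytes, api_key: str, folder_id: str, lang: str = "ru-RU"
) -> urllib.request.Request:
    """Build (but do not send) a recognition request for OGG/Opus audio."""
    query = urllib.parse.urlencode(
        {"folderId": folder_id, "lang": lang, "format": "oggopus"}
    )
    return urllib.request.Request(
        f"{STT_URL}?{query}",
        data=audio,  # raw audio bytes form the request body
        headers={"Authorization": f"Api-Key {api_key}"},
        method="POST",
    )
```

To run the conversion, the command list could be passed to `subprocess.run(cmd, check=True)`; to send the request, `urllib.request.urlopen(req)` should return a JSON body whose `result` field holds the recognized text, assuming the v1 API behaves as described here.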

✓ Instruction Scope

SKILL.md and scripts limit activity to validating/converting local audio files, loading config, and calling the Yandex API. The skill reads config from the skill folder and (for diagnostics) ~/.openclaw/openclaw.json to find injected env entries — this is appropriate for setup/validation and is documented. There are no instructions to read unrelated system files, shell history, or to transmit data to arbitrary endpoints beyond Yandex SpeechKit.
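A minimal sketch of the local validation step, assuming the formats (OGG, WAV, MP3) and the 1 MB request limit mentioned in the version history. The function name, the exact byte value used for "1 MB", and the error strings are illustrative assumptions, not the skill's actual behavior.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".ogg", ".wav", ".mp3"}  # formats named in the v1.0.0 notes
MAX_REQUEST_BYTES = 1024 * 1024              # assumed byte value of the "1 MB" limit


def validate_audio(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    p = Path(path)
    if not p.is_file():
        return [f"not a file: {path}"]
    problems = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_REQUEST_BYTES:
        problems.append("file exceeds the 1 MB request limit")
    return problems
```

Since the limit applies to the request body, a real implementation would probably check the size again after conversion, not only on the original file.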

ℹ Install Mechanism

No registry install spec was provided (instruction-only), but the package includes setup.sh which creates a local Python venv and installs dependencies from requirements.txt. This is normal for Python skills; there are no opaque remote downloads or extracted archives from untrusted URLs in the manifest.
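For readers who want to see what such a setup step amounts to before running setup.sh, here is a rough Python equivalent that only builds the two commands (create a venv, install from requirements.txt). The `.venv` directory name and the function name are assumptions; the real script may differ in paths and extra steps.

```python
import os
import sys
from pathlib import Path


def setup_commands(skill_dir: str, venv_name: str = ".venv") -> list[list[str]]:
    """Return the commands that create a venv and install requirements into it."""
    root = Path(skill_dir)
    venv_dir = root / venv_name
    # The venv layout differs between Windows ("Scripts") and POSIX ("bin").
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    venv_python = venv_dir / bin_dir / "python"
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [str(venv_python), "-m", "pip", "install", "-r", str(root / "requirements.txt")],
    ]
```

Each command could then be run with `subprocess.run(cmd, check=True)`. Printing the list first makes it easy to review what will be executed.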

✓ Credentials

Only YANDEX_API_KEY and YANDEX_FOLDER_ID (plus optional STT_* vars) are required. These are the expected credentials for calling Yandex SpeechKit. The SKILL.md and code avoid exposing keys and instruct users how to store them in OpenClaw config or a .env.
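As an illustration of how the two required variables might be checked without exposing them, consider the sketch below. The helper names are hypothetical; only the variable names YANDEX_API_KEY and YANDEX_FOLDER_ID come from the listing.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("YANDEX_API_KEY", "YANDEX_FOLDER_ID")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the required variables, failing loudly if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def mask(secret: str) -> str:
    """Hide all but the last four characters so a key never appears in full."""
    if len(secret) <= 4:
        return "*" * len(secret)
    return "*" * (len(secret) - 4) + secret[-4:]
```

Passing a mapping explicitly keeps the function testable; with no argument it falls back to the process environment, which is where injected OpenClaw config values or a loaded .env would presumably end up.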

✓ Persistence & Privilege

The skill does not request always:true or other elevated persistent privileges. It creates/uses a local venv and config files in its own skill directory and does not modify other skills or system-wide settings without explicit user action.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install sergei-mikhailov-stt
After installation, invoke the skill by name or use /sergei-mikhailov-stt
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.1.8

Update outdated README instructions (setup.sh, clawhub update); add Security section to SKILL.md

v1.1.7

Fix: eliminate cd && chaining to avoid approval prompts; add API connectivity check to diagnostics; fix temp_dir to use absolute path

v1.1.6

Fix: generate .env inline when env.example is missing from ClawHub package; add check.sh diagnostic script

v1.1.5

Fix: generate .env inline when env.example is missing from package

v1.1.4

Fix: generate .env inline when env.example is missing from package

v1.1.3

Add check.sh diagnostic script for setup verification

v1.1.2

- Updated README.md with clarified documentation and usage steps.
- Improved instructions for setting up environment and configuration.
- No code or functional changes; this is a documentation update only.

v1.1.1

- Added a setup.sh script for one-command installation and setup (creates virtual environment, installs dependencies, and copies config examples).
- Updated documentation for streamlined setup and first-run experience, including a new "Quick Start" section.
- Clarified file size limits and emphasized the Yandex SpeechKit 1 MB request limit.
- Minor corrections and formatting improvements in configuration and troubleshooting docs.

v1.1.0

- Added CLAUDE.md documentation file.
- Updated assets/config.example.json and scripts/stt_processor.py with unspecified changes.
- No user-facing feature changes detailed.

v1.0.4

- Broadened skill description from "Telegram voice messages" to "voice messages" for any messenger.
- Updated metadata structure for improved compatibility.
- Clarified that the skill works with all OpenClaw-connected messengers, not just Telegram.
- No changes to core functionality or requirements.

v1.0.3

- No code or configuration changes in this release.
- Documentation (SKILL.md) updated for clarity and troubleshooting.
- Usage instructions, configuration steps, and error handling information revised.

v1.0.2

- Added detailed installation instructions, including skill installation and Python virtual environment setup.
- Updated configuration guidance to prioritize using OpenClaw config (openclaw.json) for API keys.
- Expanded error handling section with user-friendly error messages and actionable next steps.
- Improved troubleshooting section for owners, specifying log checks and service account roles.
- Clarified usage of the `.env` file as an alternative configuration method.

v1.0.1

- Updated SKILL.md metadata section to YAML format for improved compatibility.
- No functional changes to skill logic or usage. Documentation only.

v1.0.0

sergei-mikhailov-stt version 1.0.0
- Initial release providing speech-to-text conversion for Telegram voice messages.
- Supports audio file validation and processing (OGG, WAV, MP3) using ffmpeg.
- Integrates Yandex SpeechKit as the default STT provider, with the option to add more providers.
- Handles file size, format verification, error reporting, and usage of environment-based API credentials.
- Provides structured results including recognized text, language, confidence, provider, and processing time information.
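The extensible-provider design and the structured result mentioned in the release notes might look something like the sketch below. All class, method, and field names are hypothetical, the default language `ru-RU` is an assumption, and `EchoProvider` is a stand-in that performs no real recognition.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class STTResult:
    """Structured result; field names are illustrative, not the skill's schema."""
    text: str
    language: str
    confidence: Optional[float]
    provider: str
    processing_time: float  # seconds spent in the provider call


class STTProvider(ABC):
    """Base class that each recognition backend would subclass."""
    name = "base"

    @abstractmethod
    def transcribe(self, audio: bytes, language: str) -> tuple[str, Optional[float]]:
        """Return (recognized text, confidence or None)."""

    def recognize(self, audio: bytes, language: str = "ru-RU") -> STTResult:
        # Time the backend call and wrap its output in the common result type.
        started = time.perf_counter()
        text, confidence = self.transcribe(audio, language)
        elapsed = time.perf_counter() - started
        return STTResult(text, language, confidence, self.name, elapsed)


class EchoProvider(STTProvider):
    """Stand-in provider for demonstration; it decodes bytes instead of recognizing speech."""
    name = "echo"

    def transcribe(self, audio: bytes, language: str) -> tuple[str, Optional[float]]:
        return audio.decode("utf-8"), None
```

Adding another backend under this design would mean writing one more subclass with its own `transcribe`, while callers keep consuming the same `STTResult` shape.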

Metadata

Slug sergei-mikhailov-stt

Version 1.1.8

License —

All-time Installs 0

Active Installs 0

Total Versions 14

Frequently Asked Questions

What is Speech to Text (Yandex SpeechKit)?

Speech recognition from voice messages using Yandex SpeechKit (with an extensible architecture for other providers). Use when you need to convert a voice mes... It is an AI Agent Skill for Claude Code / OpenClaw, with 708 downloads so far.

How do I install Speech to Text (Yandex SpeechKit)?

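Run "/install sergei-mikhailov-stt" in the OpenClaw or Claude Code chat to install it. The skill then needs its own setup: run the bundled setup.sh and provide YANDEX_API_KEY and YANDEX_FOLDER_ID, as described under Usage Guidance.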

Is Speech to Text (Yandex SpeechKit) free?

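The skill itself is free to download and install, though the listing does not state a license. Recognition requests go to Yandex SpeechKit under your own Yandex Cloud API key, so any charges for that service are separate from the skill.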

Which platforms does Speech to Text (Yandex SpeechKit) support?

Speech to Text (Yandex SpeechKit) is listed as cross-platform and runs wherever OpenClaw / Claude Code is available, provided ffmpeg and python3 are installed.

Who created Speech to Text (Yandex SpeechKit)?

It is built and maintained by Sergey Mikhaylov (@bzsega); the current version is v1.1.8.
