claw-text-and-pics

Name: claw-text-and-pics
Author: photon78

Author: photon78 · GitHub · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install claw-text-and-pics

Description

Extract text and embedded images from scanned documents, PDFs, and photos via Mistral OCR API. Use when reading receipts, invoices, contracts, handwritten no...

Instructions (SKILL.md)

claw-text-and-pics

Extract text and images from documents via Mistral OCR

Give your OpenClaw agent the ability to read scanned documents, PDFs, and images — extracting clean Markdown text and cropping out embedded images. Powered by Mistral's OCR API.

When to use

Extract text from scanned documents, invoices, receipts, contracts
Pull embedded images from PDFs or scans
Convert handwritten notes or photos to searchable text
Send extracted images directly to Telegram

Usage

# Extract text only
python3 ocr.py --input scan.jpg

# Extract text from PDF (3 pages)
python3 ocr.py --input document.pdf --pages 3

# Extract embedded images
python3 ocr.py --input scan.jpg --extract-images --output-dir ./images/

# Extract images and send to Telegram
python3 ocr.py --input scan.jpg --extract-images --send --target 123456789

# Works with URLs too
python3 ocr.py --input https://example.com/document.pdf

Output

stdout: Extracted text as Markdown
Files: Cropped images saved to --output-dir (only with --extract-images)

Configuration

Set in ~/.openclaw/.env or as environment variables:

Variable	Required	Description
`MISTRAL_API_KEY`	Yes	Your Mistral API key
`TELEGRAM_BOT_TOKEN`	Only for `--send`	Your Telegram bot token
`TELEGRAM_CHAT_ID`	Optional	Default chat ID (overridable with `--target`)

Environment Variables

MISTRAL_API_KEY=required        # Mistral API key — get one at console.mistral.ai
TELEGRAM_BOT_TOKEN=optional     # Required only when using --send
TELEGRAM_CHAT_ID=optional       # Default target chat ID (overridable with --target)

This skill reads ~/.openclaw/.env as a fallback for credentials. Ensure the file has restricted permissions: chmod 600 ~/.openclaw/.env
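The security review further down this page reports that the bundled ocr.py reads only environment variables and does not load this file itself. If you rely on the file, something has to copy its values into the environment first. Below is a minimal sketch of such a fallback loader. It assumes a plain KEY=VALUE file with `#` comments. The name `load_env_fallback` is hypothetical and is not part of the skill.

```python
import os
from pathlib import Path


def load_env_fallback(path, environ=None):
    """Copy KEY=VALUE pairs from `path` into `environ`, keeping existing keys.

    Hypothetical helper: the file format (one KEY=VALUE per line, `#` comments)
    is an assumption, not something the skill documents.
    """
    environ = os.environ if environ is None else environ
    env_file = Path(path).expanduser()
    if not env_file.is_file():
        return environ  # nothing to load; leave the environment untouched
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Naive inline-comment handling: a value containing " #" is truncated.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        # setdefault means real environment variables always win over the file.
        environ.setdefault(key.strip(), value)
    return environ
```

Calling `load_env_fallback("~/.openclaw/.env")` before the script starts would make the file act as a true fallback: variables already set in the environment are never overwritten.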

Requirements

Python 3.11+
Mistral API key (console.mistral.ai)
Optional (only for --extract-images): pip install pillow

Parameters

Parameter	Required	Description
`--input`	Yes	Local path or URL to image/PDF
`--extract-images`	No	Crop and save embedded images
`--output-dir`	No	Output directory (default: `./extracted-images`)
`--send`	No	Send extracted images via Telegram
`--target`	No	Telegram chat ID (or `TELEGRAM_CHAT_ID` env var)
`--pages`	No	Number of PDF pages to process
`--debug`	No	Print raw API response
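The parameter table above maps naturally onto an `argparse` definition. The sketch below mirrors the documented flags and defaults. It is an illustration of the interface, not the parser actually used in ocr.py, and `build_parser` is a hypothetical name.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="ocr.py",
        description="Extract text and images from documents via Mistral OCR",
    )
    parser.add_argument("--input", required=True,
                        help="Local path or URL to image/PDF")
    parser.add_argument("--extract-images", action="store_true",
                        help="Crop and save embedded images")
    parser.add_argument("--output-dir", default="./extracted-images",
                        help="Output directory for cropped images")
    parser.add_argument("--send", action="store_true",
                        help="Send extracted images via Telegram")
    # Assumption: the env var acts as the default when --target is omitted.
    parser.add_argument("--target", default=os.environ.get("TELEGRAM_CHAT_ID"),
                        help="Telegram chat ID")
    parser.add_argument("--pages", type=int, default=None,
                        help="Number of PDF pages to process")
    parser.add_argument("--debug", action="store_true",
                        help="Print raw API response")
    return parser
```

For example, `build_parser().parse_args(["--input", "scan.jpg", "--extract-images"])` yields `extract_images=True` and `output_dir="./extracted-images"`.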

Security recommendations

This skill appears to do what it says (send image/PDF content to Mistral OCR and optionally post cropped images to Telegram), but note these points before installing:

- The registry metadata omitted required credentials, but the skill actually requires MISTRAL_API_KEY (and TELEGRAM_BOT_TOKEN only if you use --send). Provide the Mistral key via environment variables; otherwise the script exits.
- SKILL.md says it reads ~/.openclaw/.env as a fallback, but the included Python script does not load that file; it reads only environment variables. If you rely on a .env file, ensure your environment loader populates os.environ or modify the script.
- Using this skill sends document data to Mistral's API. Do not run it on highly sensitive documents unless you trust the Mistral service and your API key policy. Consider processing sensitive files in an isolated environment or checking your Mistral account data-retention policy.
- If you use --send, the skill will upload images to Telegram using the provided bot token and chat ID. Ensure your TELEGRAM_BOT_TOKEN is limited to the bot you expect and keep it secret.
- The repository imports subprocess but does not use it; no arbitrary shell execution is performed by the script. Still, review network endpoints (api.mistral.ai and api.telegram.org) and confirm you are comfortable with external network calls.

If you want to proceed: set MISTRAL_API_KEY in the agent environment, audit that environment for other secrets, and run the script in an environment where accidental exfiltration risk is controlled. If you need stronger assurance, request the publisher correct the registry metadata and/or add explicit code to load ~/.openclaw/.env (or remove the misleading note).

Functional analysis

Type: OpenClaw Skill
Name: claw-text-and-pics
Version: 1.0.1

The skill bundle provides legitimate OCR functionality by integrating with the Mistral API and offering optional image extraction and Telegram delivery. The `ocr.py` script uses standard Python libraries (`urllib`, `base64`, `pathlib`) and the Pillow library to process documents and communicate with official endpoints (`api.mistral.ai` and `api.telegram.org`). The instructions in `SKILL.md` and `README.md` are consistent with the code's behavior, and no evidence of malicious intent, unauthorized data exfiltration, or prompt injection was found.

Capability tags

requires-sensitive-credentials

Capability assessment

⚠ Purpose & Capability

The code and SKILL.md implement a Mistral OCR client that needs a MISTRAL_API_KEY and optionally TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID for sending images. However, the registry metadata at the top claims "Required env vars: none" and "Primary credential: none", which is incorrect. The required environment variable (MISTRAL_API_KEY) is proportionate to the stated purpose, but the registry listing's failure to declare it is an inconsistency that could mislead users.

ℹ Instruction Scope

SKILL.md instructs the agent to read ~/.openclaw/.env as a fallback for credentials, but the included ocr.py only reads environment variables via os.environ and does not implement loading that file. Aside from that mismatch, the runtime behavior described (send document to Mistral, print Markdown, optionally crop images locally with Pillow, optionally send images to Telegram) matches the code. The skill transmits document data to api.mistral.ai (expected) and to api.telegram.org only when --send is used (also expected).
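To make the "send document to Mistral" step concrete, here is a sketch of how a request body could be assembled for either a URL or a local file. The request shape (`model`, `document`, `image_url` / `document_url`), the model name `mistral-ocr-latest`, and the use of base64 data URIs for local files are assumptions based on Mistral's public OCR documentation and are not confirmed by this listing. `build_ocr_payload` is a hypothetical helper, and no network call is made.

```python
import base64
import mimetypes
from pathlib import Path

OCR_MODEL = "mistral-ocr-latest"  # assumed model name, not taken from ocr.py


def build_ocr_payload(source):
    """Return a JSON-serializable request body for a URL or a local file.

    Hypothetical helper: field names follow the assumed Mistral OCR schema.
    """
    is_url = source.startswith(("http://", "https://"))
    mime = mimetypes.guess_type(source)[0] or "application/octet-stream"
    if is_url:
        ref = source  # remote documents are passed by reference
    else:
        # Local files are embedded as a base64 data URI.
        data = base64.b64encode(Path(source).read_bytes()).decode("ascii")
        ref = f"data:{mime};base64,{data}"
    if mime.startswith("image/"):
        document = {"type": "image_url", "image_url": ref}
    else:
        document = {"type": "document_url", "document_url": ref}
    return {"model": OCR_MODEL, "document": document}
```

Under the same assumptions, the payload would be serialized with `json.dumps` and sent as a POST request to the Mistral OCR endpoint, with the API key in an `Authorization: Bearer` header.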

✓ Install Mechanism

No install spec / external downloads are present; the skill is instruction+Python code only. Optional dependency is Pillow (pip). Nothing is downloaded from arbitrary URLs and no installers create unexpected binaries, so install risk is low.

ℹ Credentials

The code requires MISTRAL_API_KEY (sensitive) and optionally TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID for sending images. Those credentials are proportional to the functionality. The concern is that the registry metadata does not declare the required env var(s), so users may not realize they must provide a sensitive API key. The SKILL.md does document the env vars correctly, and the code enforces MISTRAL_API_KEY at runtime.
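A runtime credential check of the kind described here could look like the sketch below. It reports which variables are missing for a given invocation instead of exiting, so a wrapper can decide how to react. The function name and the exact rules (the chat ID is only needed when `--send` is used without `--target`) are assumptions drawn from the configuration table, not the script's actual logic.

```python
import os


def missing_credentials(send=False, target=None, environ=None):
    """List required environment variables that are unset or empty.

    Hypothetical helper mirroring the documented requirements.
    """
    environ = os.environ if environ is None else environ
    required = ["MISTRAL_API_KEY"]  # always needed for the OCR call
    if send:
        required.append("TELEGRAM_BOT_TOKEN")  # only needed with --send
        if not target:
            # Without --target, a default chat ID must come from the environment.
            required.append("TELEGRAM_CHAT_ID")
    return [name for name in required if not environ.get(name)]
```

An empty list means the invocation has everything it needs. Otherwise the caller can print the missing names and stop before any document data leaves the machine.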

✓ Persistence & Privilege

The skill does not request permanent/global presence (always:false) and it does not modify other skills or system-wide settings. Autonomous invocation is allowed by default but is not combined with other high-privilege requests, so no additional persistence concerns are present.

How to use

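Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install claw-text-and-pics
Once installed, invoke the skill by name or trigger it with /claw-text-and-pics.
Provide the required inputs as described in the skill's parameter list to get structured output.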

Version history

v1.0.1

- Removed "config_file" and "version" fields from skill metadata.
- No functional changes; documentation and usage remain the same.

v1.0.0

Initial release of claw-text-and-pics

- Extracts text and embedded images from scanned documents, PDFs, and photos using the Mistral OCR API.
- Supports both local files and URLs as input.
- Outputs clean Markdown text and saves cropped images.
- Optional: Sends extracted images directly to Telegram.
- Requires Python 3.11+, Mistral API key, and optionally Pillow for image extraction.
- Environment variables and config file supported for credentials.

Metadata

Slug claw-text-and-pics

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Number of versions 2
