claw-text-and-pics

by photon78 · GitHub ↗ · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install claw-text-and-pics

Description

Extract text and embedded images from scanned documents, PDFs, and photos via Mistral OCR API. Use when reading receipts, invoices, contracts, handwritten no...

README (SKILL.md)

claw-text-and-pics

Extract text and images from documents via Mistral OCR

Give your OpenClaw agent the ability to read scanned documents, PDFs, and images — extracting clean Markdown text and cropping out embedded images. Powered by Mistral's OCR API.

When to use

Extract text from scanned documents, invoices, receipts, contracts
Pull embedded images from PDFs or scans
Convert handwritten notes or photos to searchable text
Send extracted images directly to Telegram

Usage

# Extract text only
python3 ocr.py --input scan.jpg

# Extract text from PDF (3 pages)
python3 ocr.py --input document.pdf --pages 3

# Extract embedded images
python3 ocr.py --input scan.jpg --extract-images --output-dir ./images/

# Extract images and send to Telegram
python3 ocr.py --input scan.jpg --extract-images --send --target 123456789

# Works with URLs too
python3 ocr.py --input https://example.com/document.pdf

Output

stdout: Extracted text as Markdown
Files: Cropped images saved to --output-dir (only with --extract-images)

Configuration

Set in ~/.openclaw/.env or as environment variables:

Variable	Required	Description
`MISTRAL_API_KEY`	Yes	Your Mistral API key
`TELEGRAM_BOT_TOKEN`	Only for `--send`	Your Telegram bot token
`TELEGRAM_CHAT_ID`	Optional	Default chat ID (overridable with `--target`)

Environment Variables

MISTRAL_API_KEY=required        # Mistral API key — get one at console.mistral.ai
TELEGRAM_BOT_TOKEN=optional     # Required only when using --send
TELEGRAM_CHAT_ID=optional       # Default target chat ID (overridable with --target)

This skill reads ~/.openclaw/.env as a fallback for credentials. Ensure the file has restricted permissions: chmod 600 ~/.openclaw/.env
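
The Usage Guidance section reports that the bundled script reads only environment variables, so the `.env` fallback may need to be supplied by the caller. A minimal sketch of such a loader follows. The function names `load_env_file` and `is_private` are hypothetical and are not part of the skill. The parser handles plain `KEY=VALUE` lines only and does not strip inline comments.

```python
import os
import stat
from pathlib import Path


def load_env_file(path: Path, environ=os.environ) -> list[str]:
    """Copy KEY=VALUE pairs from a .env file into environ.

    Existing keys are never overridden, so real environment variables win.
    Returns the names that were newly set.
    """
    if not path.is_file():
        return []
    loaded = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, full-line comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        if key and key not in environ:
            environ[key] = value
            loaded.append(key)
    return loaded


def is_private(path: Path) -> bool:
    """True when group and others have no permission bits (e.g. mode 600)."""
    return (path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Calling `load_env_file(Path.home() / ".openclaw" / ".env")` before starting the script would populate `os.environ` in the current process. `is_private` gives a quick way to confirm the `chmod 600` recommendation on POSIX systems.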

Requirements

Python 3.11+
Mistral API key (console.mistral.ai)
Optional (only for --extract-images): pip install pillow

Parameters

Parameter	Required	Description
`--input`	Yes	Local path or URL to image/PDF
`--extract-images`	No	Crop and save embedded images
`--output-dir`	No	Output directory (default: `./extracted-images`)
`--send`	No	Send extracted images via Telegram
`--target`	No	Telegram chat ID (or `TELEGRAM_CHAT_ID` env var)
`--pages`	No	Number of PDF pages to process
`--debug`	No	Print raw API response

Usage Guidance

This skill appears to do what it says (send image/PDF content to Mistral OCR and optionally post cropped images to Telegram), but note these points before installing:

- The registry metadata omitted required credentials, but the skill actually requires MISTRAL_API_KEY (and TELEGRAM_BOT_TOKEN only if you use --send). Provide the Mistral key via environment variables; otherwise the script exits.
- SKILL.md says it reads ~/.openclaw/.env as a fallback, but the included Python script does not load that file; it reads only environment variables. If you rely on a .env file, ensure your environment loader populates os.environ or modify the script.
- Using this skill sends document data to Mistral's API. Do not run it on highly sensitive documents unless you trust the Mistral service and your API key policy. Consider processing sensitive files in an isolated environment or checking your Mistral account data-retention policy.
- If you use --send, the skill will upload images to Telegram using the provided bot token and chat ID. Ensure your TELEGRAM_BOT_TOKEN is limited to the bot you expect and keep it secret.
- The repository imports subprocess but does not use it; no arbitrary shell execution is performed by the script. Still, review network endpoints (api.mistral.ai and api.telegram.org) and confirm you are comfortable with external network calls.

If you want to proceed: set MISTRAL_API_KEY in the agent environment, audit that environment for other secrets, and run the script in an environment where accidental exfiltration risk is controlled. If you need stronger assurance, request the publisher correct the registry metadata and/or add explicit code to load ~/.openclaw/.env (or remove the misleading note).

Capability Analysis

Type: OpenClaw Skill
Name: claw-text-and-pics
Version: 1.0.1

The skill bundle provides legitimate OCR functionality by integrating with the Mistral API and offering optional image extraction and Telegram delivery. The `ocr.py` script uses standard Python libraries (`urllib`, `base64`, `pathlib`) and the Pillow library to process documents and communicate with official endpoints (`api.mistral.ai` and `api.telegram.org`). The instructions in `SKILL.md` and `README.md` are consistent with the code's behavior, and no evidence of malicious intent, unauthorized data exfiltration, or prompt injection was found.

Capability Tags

requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The code and SKILL.md implement a Mistral OCR client that needs a MISTRAL_API_KEY and optionally TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID for sending images. However, the registry metadata at the top claims "Required env vars: none" and "Primary credential: none", which is incorrect. The required environment variable (MISTRAL_API_KEY) is proportionate to the stated purpose, but the registry listing's failure to declare it is an inconsistency that could mislead users.

ℹ Instruction Scope

SKILL.md instructs the agent to read ~/.openclaw/.env as a fallback for credentials, but the included ocr.py only reads environment variables via os.environ and does not implement loading that file. Aside from that mismatch, the runtime behavior described (send document to Mistral, print Markdown, optionally crop images locally with Pillow, optionally send images to Telegram) matches the code. The skill transmits document data to api.mistral.ai (expected) and to api.telegram.org only when --send is used (also expected).
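
To make the data flow concrete, here is a rough sketch of how a request body for the OCR call could be assembled from a local file or a URL. It builds the payload only and performs no network call. The endpoint path, the model name, and the payload field names are assumptions based on Mistral's public OCR documentation, and `build_ocr_payload` is a hypothetical helper, not a function taken from `ocr.py`.

```python
import base64
import mimetypes
from pathlib import Path

OCR_ENDPOINT = "https://api.mistral.ai/v1/ocr"  # assumed endpoint path
OCR_MODEL = "mistral-ocr-latest"                # assumed model name


def build_ocr_payload(source: str, include_images: bool = False) -> dict:
    """Return a JSON-serialisable request body for a file path or URL."""
    if source.startswith(("http://", "https://")):
        url = source
        # Guess the document kind from the URL path, ignoring any query string.
        is_pdf = source.lower().split("?", 1)[0].endswith(".pdf")
    else:
        path = Path(source)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        # Local files are embedded as a base64 data URI.
        url = f"data:{mime};base64,{encoded}"
        is_pdf = mime == "application/pdf"

    if is_pdf:
        document = {"type": "document_url", "document_url": url}
    else:
        document = {"type": "image_url", "image_url": url}

    return {
        "model": OCR_MODEL,
        "document": document,
        "include_image_base64": include_images,
    }
```

The point for a reviewer is that a local file's full contents leave the machine inside the request body, which is why the guidance about sensitive documents applies to local paths as much as to URLs.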

✓ Install Mechanism

No install spec / external downloads are present; the skill is instruction+Python code only. Optional dependency is Pillow (pip). Nothing is downloaded from arbitrary URLs and no installers create unexpected binaries, so install risk is low.

ℹ Credentials

The code requires MISTRAL_API_KEY (sensitive) and optionally TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID for sending images. Those credentials are proportional to the functionality. The concern is that the registry metadata does not declare the required env var, so users may miss that they must provide a sensitive API key. SKILL.md documents the env vars correctly, and the code enforces MISTRAL_API_KEY at runtime.

✓ Persistence & Privilege

The skill does not request permanent/global presence (always:false) and it does not modify other skills or system-wide settings. Autonomous invocation is allowed by default but is not combined with other high-privilege requests, so no additional persistence concerns are present.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install claw-text-and-pics
After installation, invoke the skill by name or use /claw-text-and-pics
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

- Removed "config_file" and "version" fields from skill metadata.
- No functional changes; documentation and usage remain the same.

v1.0.0

Initial release of claw-text-and-pics

- Extracts text and embedded images from scanned documents, PDFs, and photos using the Mistral OCR API.
- Supports both local files and URLs as input.
- Outputs clean Markdown text and saves cropped images.
- Optional: Sends extracted images directly to Telegram.
- Requires Python 3.11+, Mistral API key, and optionally Pillow for image extraction.
- Environment variables and config file supported for credentials.

Metadata

Slug claw-text-and-pics

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is claw-text-and-pics?

Extract text and embedded images from scanned documents, PDFs, and photos via Mistral OCR API. Use when reading receipts, invoices, contracts, handwritten no... It is an AI Agent Skill for Claude Code / OpenClaw, with 97 downloads so far.

How do I install claw-text-and-pics?

Run "/install claw-text-and-pics" in the OpenClaw or Claude Code chat to install it in one step. Before the skill can run, MISTRAL_API_KEY must be set in the environment (and TELEGRAM_BOT_TOKEN if you use --send).

Is claw-text-and-pics free?

Yes, claw-text-and-pics is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does claw-text-and-pics support?

claw-text-and-pics is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created claw-text-and-pics?

It is built and maintained by photon78 (@photon78); the current version is v1.0.1.
