photon78

claw-text-and-pics

by photon78 · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Downloads: 97
Stars: 0
Active Installs: 0
Versions: 2
Install in OpenClaw
/install claw-text-and-pics
Description
Extract text and embedded images from scanned documents, PDFs, and photos via Mistral OCR API. Use when reading receipts, invoices, contracts, handwritten no...
README (SKILL.md)

claw-text-and-pics

Extract text and images from documents via Mistral OCR

Give your OpenClaw agent the ability to read scanned documents, PDFs, and images — extracting clean Markdown text and cropping out embedded images. Powered by Mistral's OCR API.

When to use

  • Extract text from scanned documents, invoices, receipts, contracts
  • Pull embedded images from PDFs or scans
  • Convert handwritten notes or photos to searchable text
  • Send extracted images directly to Telegram

Usage

# Extract text only
python3 ocr.py --input scan.jpg

# Extract text from PDF (3 pages)
python3 ocr.py --input document.pdf --pages 3

# Extract embedded images
python3 ocr.py --input scan.jpg --extract-images --output-dir ./images/

# Extract images and send to Telegram
python3 ocr.py --input scan.jpg --extract-images --send --target 123456789

# Works with URLs too
python3 ocr.py --input https://example.com/document.pdf
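
The commands above accept either a local file or a URL. As a rough illustration of how such an input could be turned into an OCR request body, here is a minimal sketch. It is not the skill's actual code: `build_ocr_payload` is a hypothetical helper, and the field names (`document`, `image_url`, `document_url`, `include_image_base64`) and the model name `mistral-ocr-latest` are assumptions about Mistral's OCR API that should be checked against the current API reference.

```python
import base64
import mimetypes
from pathlib import Path


def build_ocr_payload(source: str, model: str = "mistral-ocr-latest") -> dict:
    """Build a JSON-serialisable OCR request body (hypothetical helper)."""
    if source.startswith(("http://", "https://")):
        # Remote input: pass the URL through unchanged.
        url = source
    else:
        # Local input: inline the file contents as a base64 data URL.
        path = Path(source)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        url = f"data:{mime};base64,{encoded}"

    # Assumed convention: PDFs use a "document_url" chunk, images an "image_url" chunk.
    is_pdf = source.lower().split("?")[0].endswith(".pdf")
    kind = "document_url" if is_pdf else "image_url"
    return {
        "model": model,
        "document": {"type": kind, kind: url},
        "include_image_base64": True,  # assumed flag for returning embedded images
    }
```

The function only builds the request body; sending it (with the API key in an Authorization header) is left out so the sketch stays free of network calls.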

Output

  • stdout: Extracted text as Markdown
  • Files: Cropped images saved to --output-dir (only with --extract-images)
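
To make the cropping step more concrete, the sketch below computes crop rectangles from one page of an OCR response. It is an illustration under stated assumptions, not the skill's implementation: `crop_boxes` is a hypothetical helper, and the field names (`dimensions`, `images`, `top_left_x`, `top_left_y`, `bottom_right_x`, `bottom_right_y`, `id`) are assumed to match what the OCR API returns.

```python
def crop_boxes(page: dict) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Return (image_id, (left, top, right, bottom)) pairs clamped to the page size."""
    width = page["dimensions"]["width"]
    height = page["dimensions"]["height"]
    boxes = []
    for img in page.get("images", []):
        # Clamp each coordinate so the rectangle never leaves the page.
        left = max(0, int(img["top_left_x"]))
        top = max(0, int(img["top_left_y"]))
        right = min(width, int(img["bottom_right_x"]))
        bottom = min(height, int(img["bottom_right_y"]))
        if right > left and bottom > top:  # skip empty or inverted regions
            boxes.append((img["id"], (left, top, right, bottom)))
    return boxes
```

Each returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it could be passed to that method directly when Pillow is installed.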

Configuration

Set in ~/.openclaw/.env or as environment variables:

Variable            Required         Description
MISTRAL_API_KEY     Yes              Your Mistral API key
TELEGRAM_BOT_TOKEN  Only for --send  Your Telegram bot token
TELEGRAM_CHAT_ID    Optional         Default chat ID (overridable with --target)

Environment Variables

MISTRAL_API_KEY=required        # Mistral API key — get one at console.mistral.ai
TELEGRAM_BOT_TOKEN=optional     # Required only when using --send
TELEGRAM_CHAT_ID=optional       # Default target chat ID (overridable with --target)

This skill reads ~/.openclaw/.env as a fallback for credentials. Ensure the file has restricted permissions: chmod 600 ~/.openclaw/.env
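
The review further down this page reports that the shipped ocr.py reads only environment variables. For readers who depend on the file, here is a minimal sketch of what a fallback loader could look like. `load_env_fallback` is a hypothetical helper, not part of the skill, and it assumes a simple KEY=VALUE file without inline comments.

```python
import os
from pathlib import Path


def load_env_fallback(path: Path, environ=os.environ) -> list[str]:
    """Copy KEY=VALUE pairs from a .env file into environ without overriding existing keys.

    Returns the names of the keys that were actually loaded from the file.
    """
    loaded = []
    if not path.is_file():
        return loaded
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, full-line comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes, if any
        # Real environment variables win; the file is only a fallback.
        if key and key not in environ:
            environ[key] = value
            loaded.append(key)
    return loaded
```

It could be called once at startup, for example `load_env_fallback(Path.home() / ".openclaw" / ".env")`, before the script checks for MISTRAL_API_KEY.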

Requirements

  • Python 3.11+
  • Mistral API key (console.mistral.ai)
  • Optional (only for --extract-images): pip install pillow

Parameters

Parameter         Required  Description
--input           Yes       Local path or URL to image/PDF
--extract-images  No        Crop and save embedded images
--output-dir      No        Output directory (default: ./extracted-images)
--send            No        Send extracted images via Telegram
--target          No        Telegram chat ID (or TELEGRAM_CHAT_ID env var)
--pages           No        Number of PDF pages to process
--debug           No        Print raw API response
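
The parameter table maps naturally onto a standard-library argparse definition. The sketch below mirrors the table and is an assumption about how such a CLI could be declared, not a copy of ocr.py; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags listed in the parameter table (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="ocr.py",
        description="Extract text and images from documents via an OCR API.",
    )
    parser.add_argument("--input", required=True, help="Local path or URL to image/PDF")
    parser.add_argument("--extract-images", action="store_true", help="Crop and save embedded images")
    parser.add_argument("--output-dir", default="./extracted-images", help="Output directory")
    parser.add_argument("--send", action="store_true", help="Send extracted images via Telegram")
    parser.add_argument("--target", default=None, help="Telegram chat ID (or TELEGRAM_CHAT_ID env var)")
    parser.add_argument("--pages", type=int, default=None, help="Number of PDF pages to process")
    parser.add_argument("--debug", action="store_true", help="Print raw API response")
    return parser
```

For example, `build_parser().parse_args(["--input", "scan.jpg", "--extract-images"])` yields a namespace with `extract_images` set to True and `output_dir` left at its default.
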
Usage Guidance
This skill appears to do what it says (send image/PDF content to Mistral OCR and optionally post cropped images to Telegram), but note these points before installing:

  • The registry metadata omitted required credentials, but the skill actually requires MISTRAL_API_KEY (and TELEGRAM_BOT_TOKEN only if you use --send). Provide the Mistral key via environment variables; otherwise the script exits.
  • SKILL.md says it reads ~/.openclaw/.env as a fallback, but the included Python script does not load that file; it reads only environment variables. If you rely on a .env file, ensure your environment loader populates os.environ or modify the script.
  • Using this skill sends document data to Mistral's API. Do not run it on highly sensitive documents unless you trust the Mistral service and your API key policy. Consider processing sensitive files in an isolated environment or checking your Mistral account data-retention policy.
  • If you use --send, the skill will upload images to Telegram using the provided bot token and chat ID. Ensure your TELEGRAM_BOT_TOKEN is limited to the bot you expect and keep it secret.
  • The repository imports subprocess but does not use it; no arbitrary shell execution is performed by the script. Still, review network endpoints (api.mistral.ai and api.telegram.org) and confirm you are comfortable with external network calls.

If you want to proceed: set MISTRAL_API_KEY in the agent environment, audit that environment for other secrets, and run the script in an environment where accidental exfiltration risk is controlled. If you need stronger assurance, request that the publisher correct the registry metadata and/or add explicit code to load ~/.openclaw/.env (or remove the misleading note).
Capability Analysis
Type: OpenClaw Skill
Name: claw-text-and-pics
Version: 1.0.1

The skill bundle provides legitimate OCR functionality by integrating with the Mistral API and offering optional image extraction and Telegram delivery. The `ocr.py` script uses standard Python libraries (`urllib`, `base64`, `pathlib`) and the Pillow library to process documents and communicate with official endpoints (`api.mistral.ai` and `api.telegram.org`). The instructions in `SKILL.md` and `README.md` are largely consistent with the code's behavior (the exception is the ~/.openclaw/.env fallback noted under Instruction Scope), and no evidence of malicious intent, unauthorized data exfiltration, or prompt injection was found.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The code and SKILL.md implement a Mistral OCR client that needs a MISTRAL_API_KEY and optionally TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID for sending images. However, the registry metadata at the top claims "Required env vars: none" and "Primary credential: none", which is incorrect. The required environment variable (MISTRAL_API_KEY) is proportionate to the stated purpose, but the registry listing's failure to declare it is an inconsistency that could mislead users.
Instruction Scope
SKILL.md instructs the agent to read ~/.openclaw/.env as a fallback for credentials, but the included ocr.py only reads environment variables via os.environ and does not implement loading that file. Aside from that mismatch, the runtime behavior described (send document to Mistral, print Markdown, optionally crop images locally with Pillow, optionally send images to Telegram) matches the code. The skill transmits document data to api.mistral.ai (expected) and to api.telegram.org only when --send is used (also expected).
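
For readers auditing the Telegram path, the sketch below shows what an upload to the Bot API's sendPhoto method can look like when built with only the standard library. It is an illustration, not the code in ocr.py: `build_send_photo_request` is a hypothetical helper, and the request is only constructed, never sent.

```python
import uuid


def build_send_photo_request(token: str, chat_id: str, filename: str, image: bytes):
    """Build (url, headers, body) for a multipart sendPhoto upload (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    marker = b"--" + boundary.encode("ascii")
    parts = [
        marker,
        b'Content-Disposition: form-data; name="chat_id"',
        b"",
        str(chat_id).encode("ascii"),
        marker,
        f'Content-Disposition: form-data; name="photo"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        image,
        marker + b"--",  # closing boundary
        b"",
    ]
    body = b"\r\n".join(parts)
    url = f"https://api.telegram.org/bot{token}/sendPhoto"
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return url, headers, body
```

The result could be handed to `urllib.request.Request(url, data=body, headers=headers)`. Note that the bot token is part of the URL, which is one reason to keep it out of logs and debug output.
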
Install Mechanism
No install spec / external downloads are present; the skill is instruction+Python code only. Optional dependency is Pillow (pip). Nothing is downloaded from arbitrary URLs and no installers create unexpected binaries, so install risk is low.
Credentials
The code requires MISTRAL_API_KEY (sensitive) and optionally TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID for sending images. Those credentials are proportional to the functionality. The concern is that the registry metadata does not declare the required environment variable, so users may not realize they must provide a sensitive API key. SKILL.md documents the env vars correctly, and the code enforces MISTRAL_API_KEY at runtime.
Persistence & Privilege
The skill does not request permanent/global presence (always:false) and it does not modify other skills or system-wide settings. Autonomous invocation is allowed by default but is not combined with other high-privilege requests, so no additional persistence concerns are present.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install claw-text-and-pics
  3. After installation, invoke the skill by name or use /claw-text-and-pics
  4. Provide --input (a local path or URL) and any optional flags from the Parameters table; the extracted text is printed to stdout as Markdown
Version History
v1.0.1
  • Removed "config_file" and "version" fields from skill metadata.
  • No functional changes; documentation and usage remain the same.
v1.0.0
Initial release of claw-text-and-pics.
  • Extracts text and embedded images from scanned documents, PDFs, and photos using the Mistral OCR API.
  • Supports both local files and URLs as input.
  • Outputs clean Markdown text and saves cropped images.
  • Optional: Sends extracted images directly to Telegram.
  • Requires Python 3.11+, Mistral API key, and optionally Pillow for image extraction.
  • Environment variables and config file supported for credentials.
Metadata
Slug claw-text-and-pics
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is claw-text-and-pics?

Extract text and embedded images from scanned documents, PDFs, and photos via Mistral OCR API. Use when reading receipts, invoices, contracts, handwritten no... It is an AI Agent Skill for Claude Code / OpenClaw, with 97 downloads so far.

How do I install claw-text-and-pics?

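Run "/install claw-text-and-pics" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set MISTRAL_API_KEY in the agent environment (plus TELEGRAM_BOT_TOKEN if you use --send), and install Pillow if you need --extract-images.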

Is claw-text-and-pics free?

Yes, the skill itself is free and licensed under MIT-0, so you can download, install, and use it at no cost. It does require your own Mistral API key.

Which platforms does claw-text-and-pics support?

claw-text-and-pics is cross-platform and runs anywhere OpenClaw / Claude Code is available, provided Python 3.11+ is installed.

Who created claw-text-and-pics?

It is built and maintained by photon78 (@photon78); the current version is v1.0.1.
