cinience

Aliyun Qwen Ocr

Author: cinience · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 104
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install aliyun-qwen-ocr
Description
Use when OCR-specialized extraction is needed with Alibaba Cloud Model Studio Qwen OCR models (`qwen-vl-ocr`, `qwen-vl-ocr-latest`, and snapshots), including...
Usage instructions (SKILL.md)

Category: provider

Model Studio Qwen OCR

Validation

mkdir -p output/aliyun-qwen-ocr
python -m py_compile skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py && echo "py_compile_ok" > output/aliyun-qwen-ocr/validate.txt

Pass criteria: command exits 0 and output/aliyun-qwen-ocr/validate.txt is generated.
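
The same check can also be run from Python instead of the shell. This is a minimal sketch that uses the standard-library py_compile module; `validate_script` is a hypothetical helper name and is not part of the skill.

```python
import py_compile
from pathlib import Path


def validate_script(script: Path, out_dir: Path) -> bool:
    """Byte-compile `script`; write validate.txt only when compilation succeeds."""
    out_dir.mkdir(parents=True, exist_ok=True)
    try:
        # doraise=True turns syntax errors into PyCompileError instead of a printed message.
        py_compile.compile(str(script), doraise=True)
    except (py_compile.PyCompileError, OSError):
        return False
    (out_dir / "validate.txt").write_text("py_compile_ok\n")
    return True
```

Calling it with the script path and output/aliyun-qwen-ocr from the commands above should leave the same validate.txt marker that the pass criteria look for.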

Output And Evidence

  • Save request payloads, selected OCR task name, and normalized output expectations under output/aliyun-qwen-ocr/.
  • Keep the exact model, image source, and task configuration with each saved run.
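
One way to keep that evidence is to write a small JSON record next to each run. The sketch below is illustrative only: `save_run_record`, the `run-<timestamp>.json` file name, and the `saved_at` field are assumptions, not something the skill defines.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_run_record(out_dir, model, image, task=None, task_config=None):
    """Write one JSON file per run with the settings needed to reproduce it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "model": model,            # exact model string, e.g. a pinned snapshot
        "image": image,            # image source exactly as passed to the request
        "task": task,
        "task_config": task_config,
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical naming scheme: strip characters that are awkward in file names.
    stamp = record["saved_at"].replace(":", "").replace("+", "_")
    path = out_dir / f"run-{stamp}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```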

Use Qwen OCR when the task is primarily text extraction or document structure parsing rather than broad visual reasoning.

Critical model names

Use one of these exact model strings:

  • qwen-vl-ocr
  • qwen-vl-ocr-latest
  • qwen-vl-ocr-2025-11-20
  • qwen-vl-ocr-2025-08-28
  • qwen-vl-ocr-2025-04-13
  • qwen-vl-ocr-2024-10-28

Selection guidance:

  • Use qwen-vl-ocr for the stable channel.
  • Use qwen-vl-ocr-latest only when you explicitly want the newest OCR behavior.
  • Pin qwen-vl-ocr-2025-11-20 when you need reproducible document parsing based on the Qwen3-VL OCR upgrade.
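
Because the model strings must match exactly, a caller may want to reject anything outside the list before building a request. A minimal sketch, assuming the list above is complete; `check_model` is a hypothetical helper, and the default mirrors the `qwen-vl-ocr` default of the normalized interface.

```python
ALLOWED_MODELS = frozenset({
    "qwen-vl-ocr",
    "qwen-vl-ocr-latest",
    "qwen-vl-ocr-2025-11-20",
    "qwen-vl-ocr-2025-08-28",
    "qwen-vl-ocr-2025-04-13",
    "qwen-vl-ocr-2024-10-28",
})


def check_model(model: str = "qwen-vl-ocr") -> str:
    """Return `model` unchanged if it is one of the exact strings, else raise."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model string: {model!r}")
    return model
```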

Prerequisites

  • Install dependencies (recommended in a venv):
python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests
  • Set DASHSCOPE_API_KEY in environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
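
The two key locations could be resolved as sketched below. Several details here are assumptions rather than documented behavior: that the credentials file is INI-formatted, that the key sits under a `[default]` section, and that the environment variable takes precedence. `resolve_api_key` is a hypothetical helper; the bundled script does not need a key to build a payload.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(credentials_path=None, env=None):
    """Return the DashScope API key, or None if neither source provides one."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key  # ASSUMPTION: the environment variable wins over the file
    path = Path(credentials_path or Path.home() / ".alibabacloud" / "credentials")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is skipped and leaves the parser empty
    # ASSUMPTION: INI layout with the key stored under a [default] section.
    return parser.get("default", "dashscope_api_key", fallback=None)
```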

Normalized interface (ocr.extract)

Request

  • image (string, required): HTTPS URL, local path, or data: URL.
  • model (string, optional): default qwen-vl-ocr.
  • prompt (string, optional): use when you want custom extraction instructions.
  • task (string, optional): built-in OCR task.
  • task_config (object, optional): configuration for built-in task such as extraction fields.
  • enable_rotate (bool, optional): default false.
  • min_pixels (int, optional)
  • max_pixels (int, optional)
  • max_tokens (int, optional)
  • temperature (float, optional): recommended to keep near default/low values.

Response

  • text (string): extracted text or structured markdown/html-style output.
  • model (string)
  • usage (object, optional)
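
The request side of this interface can be mirrored as a small dataclass. This is a sketch of the normalized `ocr.extract` shape only, not of the Model Studio wire format; the class name `OcrExtractRequest` and the choice to drop unset fields are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OcrExtractRequest:
    image: str                          # HTTPS URL, local path, or data: URL
    model: str = "qwen-vl-ocr"
    prompt: Optional[str] = None
    task: Optional[str] = None
    task_config: Optional[dict] = None
    enable_rotate: bool = False
    min_pixels: Optional[int] = None
    max_pixels: Optional[int] = None
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None

    def to_dict(self) -> dict:
        """Drop unset optional fields so a saved request stays minimal."""
        if not self.image:
            raise ValueError("image is required")
        return {k: v for k, v in asdict(self).items() if v is not None}
```

For example, `OcrExtractRequest(image="https://example.com/table.png", task="table_parsing").to_dict()` keeps only `image`, `model`, `task`, and `enable_rotate`.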

Built-in OCR tasks

Use one of these values in task:

  • text_recognition
  • key_information_extraction
  • document_parsing
  • table_parsing
  • formula_recognition
  • multi_lan
  • advanced_recognition
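
Task names are easy to mistype (note that `multi_lan` has no trailing "g"), so a caller might check them up front and suggest the nearest valid value. A minimal sketch using the standard-library difflib module; `check_task` is a hypothetical helper.

```python
import difflib

BUILTIN_TASKS = (
    "text_recognition",
    "key_information_extraction",
    "document_parsing",
    "table_parsing",
    "formula_recognition",
    "multi_lan",
    "advanced_recognition",
)


def check_task(task: str) -> str:
    """Return `task` if it is a built-in OCR task, else raise with a suggestion."""
    if task in BUILTIN_TASKS:
        return task
    hint = difflib.get_close_matches(task, BUILTIN_TASKS, n=1)
    suffix = f"; did you mean {hint[0]!r}?" if hint else ""
    raise ValueError(f"unknown OCR task {task!r}{suffix}")
```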

Quick start

Custom prompt:

python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/invoice.png" \
  --prompt "Extract seller name, invoice date, amount, and tax number in JSON."

Built-in task:

python skills/ai/multimodal/aliyun-qwen-ocr/scripts/prepare_ocr_request.py \
  --image "https://example.com/table.png" \
  --task table_parsing \
  --model qwen-vl-ocr-2025-11-20

Operational guidance

  • Prefer built-in OCR tasks for standard parsing jobs because they use official task prompts.
  • For critical business fields, add downstream validation rules after OCR.
  • qwen-vl-ocr and older snapshots default to 4096 max output tokens unless higher limits are approved by Alibaba Cloud; qwen-vl-ocr-2025-11-20 follows the model maximum.
  • Increase max_pixels only when small text is missed; this raises token cost.
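
As an illustration of downstream validation, the sketch below checks the invoice fields requested in the custom-prompt quick start. The JSON key names, the `YYYY-MM-DD` date format, and the plain-decimal amount rule are all assumptions chosen for the example; real rules depend on the prompt and the business requirements.

```python
import json
import re
from datetime import datetime

REQUIRED_FIELDS = ("seller_name", "invoice_date", "amount", "tax_number")  # hypothetical keys


def validate_invoice(ocr_text: str) -> list:
    """Return a list of problems found in the OCR output; empty means it passed."""
    try:
        data = json.loads(ocr_text)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(data.get(field) or "").strip():
            problems.append(f"missing field: {field}")
    date = data.get("invoice_date")
    if date:
        try:
            datetime.strptime(str(date), "%Y-%m-%d")
        except ValueError:
            problems.append("invoice_date is not YYYY-MM-DD")
    amount = data.get("amount")
    if amount and not re.fullmatch(r"\d+(\.\d{1,2})?", str(amount)):
        problems.append("amount is not a plain decimal")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every failed rule for a document in a single pass.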

Output location

  • Default output: output/aliyun-qwen-ocr/request.json
  • Override base dir with OUTPUT_DIR.
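
The text does not say exactly how OUTPUT_DIR is applied. The sketch below assumes it replaces the top-level `output` directory while the `aliyun-qwen-ocr/request.json` suffix stays fixed; check the script before relying on this. `request_path` is a hypothetical helper.

```python
import os
from pathlib import Path


def request_path(env=None) -> Path:
    """Resolve where request.json would be written under the assumption above."""
    env = os.environ if env is None else env
    # ASSUMPTION: OUTPUT_DIR replaces only the leading "output" component.
    base = Path(env.get("OUTPUT_DIR") or "output")
    return base / "aliyun-qwen-ocr" / "request.json"
```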

References

  • references/api_reference.md
  • references/sources.md
Security recommendations
This skill is basically a small helper that builds Qwen OCR request JSON; it does not itself send requests. Before installing, ask the publisher to fix two things: (1) declare DASHSCOPE_API_KEY (or equivalent) in the skill metadata if the skill expects an API key, and (2) remove or justify the 'requests' dependency note (the included script does not use requests). If you do provide an API key, treat it like any cloud credential: limit its permissions, store it securely (not in shared shells), and audit usage. If you don't trust the publisher, inspect or run the prepare_ocr_request.py locally in a sandbox and avoid giving the API key until metadata is corrected.
Functional analysis
Type: OpenClaw Skill
Name: aliyun-qwen-ocr
Version: 1.0.0

The skill is a legitimate integration for Alibaba Cloud's Qwen OCR service. The primary script, `scripts/prepare_ocr_request.py`, safely constructs JSON request payloads from command-line arguments without any network calls, sensitive data access, or dangerous execution patterns. The documentation in `SKILL.md` and `references/` correctly reflects the official Alibaba Cloud Model Studio API usage and does not contain any malicious instructions or prompt-injection attempts.
Capability assessment
Purpose & Capability
The name/description (Qwen OCR helper) matches the included artifacts: SKILL.md, API reference, and a small Python script that prepares OCR request payloads. Requiring an Alibaba Cloud Dashscope API key is consistent with calling Model Studio endpoints. However, the skill metadata declares no required environment variables while the SKILL.md explicitly asks the user to set DASHSCOPE_API_KEY or add dashscope_api_key to ~/.alibabacloud/credentials — an incoherence in declared requirements.
Instruction Scope
Runtime instructions are narrowly scoped: validate that the Python file compiles, generate and save a normalized request payload to output/aliyun-qwen-ocr/request.json, and keep run metadata. The SKILL.md tells the agent how to format requests and which models/tasks to use. It does not instruct the agent to read unrelated system files or exfiltrate data. The only notable instruction beyond payload prep is to supply a DASHSCOPE_API_KEY (see Credentials).
Install Mechanism
There is no install spec and the skill is instruction-only plus a small helper script — nothing is downloaded or written during install. This is low-risk from an install perspective.
Credentials
The SKILL.md requires DASHSCOPE_API_KEY (or a dashscope_api_key entry in ~/.alibabacloud/credentials) to call Alibaba endpoints, which is reasonable for the stated purpose. However, the registry metadata lists no required environment variables or primary credential — this mismatch is concerning because a user may not realize an API key is needed or that the skill expects it. Additionally, the SKILL.md recommends installing the 'requests' package, but the included Python script does not import or use requests (the script only constructs JSON payloads). These inconsistencies should be resolved so users understand what secrets and dependencies are actually required.
Persistence & Privilege
The skill does not request persistent or elevated platform privileges. always is false and disable-model-invocation is false (normal). The skill writes only to its own output directory and does not modify other skills or system-wide configs.
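How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install aliyun-qwen-ocr
  3. Once installed, invoke the Skill by name or trigger it with /aliyun-qwen-ocr
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.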
Version history
v1.0.0
Initial release of aliyun-qwen-ocr.
  • Provides OCR extraction using Alibaba Cloud Model Studio Qwen OCR models.
  • Supports document parsing, table parsing, multilingual OCR, formula recognition, and key information extraction.
  • Standardized `ocr.extract` interface with flexible image input, tasks, and prompt customization.
  • Includes clear model selection guidance and built-in OCR task descriptions.
  • Output, validation, and configuration instructions included for fast integration.
Metadata
Slug aliyun-qwen-ocr
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
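Frequently asked questions

What is Aliyun Qwen Ocr?

Use when OCR-specialized extraction is needed with Alibaba Cloud Model Studio Qwen OCR models (`qwen-vl-ocr`, `qwen-vl-ocr-latest`, and snapshots), including... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 104 total downloads to date.

How do I install Aliyun Qwen Ocr?

Run the command "/install aliyun-qwen-ocr" in the OpenClaw or Claude Code chat box for a one-step install; the install itself needs no additional configuration.

Is Aliyun Qwen Ocr free?

Yes. Aliyun Qwen Ocr is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Aliyun Qwen Ocr support?

Aliyun Qwen Ocr is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Aliyun Qwen Ocr?

It is developed and maintained by cinience (@cinience); the current version is v1.0.0.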
