Description

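Intelligent image scanning and processing tool for Baidu Netdisk. Supports handwriting removal | watermark removal | shadow removal | screen-pattern (moiré) removal | clarity enhancement | ID and receipt enhancement | black-and-white processing | detection and correction | scan enhancement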

README (SKILL.md)

Read before use (30 seconds)

Name: 百度文档扫描官方Skill
Author: baidunetdiskaibot

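[!WARNING] ⚠️ Important notice on privacy and data flow

Internal service interaction: this skill sends the images you provide to Baidu Netdisk's internal scanning service for processing.

Server-side processing: the service receives and processes the image content; the server does not store it permanently.

Local file storage: the processed image returned by the service is saved to the system temporary directory (/tmp), and these files remain there until you remove them manually.

API key security: keep BDPAN_API_KEY safe; if it leaks, contact the administrator promptly to have it revoked.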
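Since processed images stay in the temporary directory until removed by hand, a small cleanup helper may be convenient. The sketch below is an illustration only: `remove_scan_outputs` is a hypothetical name, and the `scan_*.png` pattern is assumed from the `/tmp/scan_xxxxxxxx.png` path this document gives for the output file. It deletes every matching file in the directory, so point it only at a directory where that is acceptable.

```python
import glob
import os
import tempfile


def remove_scan_outputs(directory=None):
    """Delete scan_*.png files in `directory` and return the removed paths."""
    # Assumption: outputs follow the scan_xxxxxxxx.png naming described in this document.
    directory = directory or tempfile.gettempdir()
    pattern = os.path.join(glob.escape(directory), "scan_*.png")
    removed = []
    for path in glob.glob(pattern):
        os.remove(path)
        removed.append(path)
    return sorted(removed)
```

Calling `remove_scan_outputs("/tmp")` after the results have been copied elsewhere would clear the leftovers in one step.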

Configure the environment variable (available immediately in the current session):

export BDPAN_API_KEY="your_api_key_here"

How do I get a key?

Visit https://aiconvert.baidu.com/simple/embed/scanSkill to obtain a BDPAN_API_KEY.

Constraints

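Single-intent principle: each request executes exactly one intent type; the first intent matched is the one executed
Never construct command arguments yourself, and never forge or splice together internal configuration
No hallucination: do not fabricate requests or responses, and do not carry over the scenario or parameters of a previous request as assumptions
Follow the fixed format specified in this guide exactly; do not modify the commands yourself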

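Skill execution guide (mandatory)

Step 1: Environment variable check

Check whether BDPAN_API_KEY is configured. If it is not, return the following immediately (the message tells the user that BDPAN_API_KEY is not configured and to contact the administrator for a key, then run the export command):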

{
  "code": "BP0100",
  "message": "BDPAN_API_KEY未配置，请联系管理员获取后执行：export BDPAN_API_KEY=\"your_key\" ",
  "data": null
}
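The check in Step 1 could be sketched roughly as follows. This is not the code in scan_filter.py; `check_api_key` and `render` are hypothetical helpers, and only the variable name and the BP0100 payload are taken from this guide.

```python
import json
import os


def check_api_key(environ=os.environ):
    """Return None when BDPAN_API_KEY is set, otherwise the BP0100 payload."""
    if environ.get("BDPAN_API_KEY"):
        return None
    return {
        "code": "BP0100",
        "message": (
            "BDPAN_API_KEY未配置，请联系管理员获取后执行："
            'export BDPAN_API_KEY="your_key" '
        ),
        "data": None,
    }


def render(payload):
    """Serialize a payload without escaping the non-ASCII message text."""
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

An empty string is treated the same as a missing variable, which seems the safer reading of "not configured".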

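Step 2: Input handling

Identify the image passed in by the user. It must be one of the following two kinds:

Local file path: a path to a file on the local disk (for example /tmp/photo.jpg)
Image BASE64: base64-encoded image data

If no valid image is provided, return the following directly (the message says that the image input is missing and asks for a file path or BASE64 data):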

{
  "code": "BP0201",
  "message": "缺少图片输入，请提供文件路径或 BASE64 数据。",
  "data": null
}
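One way to tell the two input kinds apart is sketched below. The heuristic (an existing file wins, otherwise strict base64 decoding is attempted) is an assumption and not something this guide prescribes; `classify_image_input` is a hypothetical name, and only the BP0201 payload comes from the text above.

```python
import base64
import binascii
import os

MISSING_IMAGE = {
    "code": "BP0201",
    "message": "缺少图片输入，请提供文件路径或 BASE64 数据。",
    "data": None,
}


def classify_image_input(value):
    """Return ("path", str), ("base64", bytes) or ("error", payload)."""
    if not value or not value.strip():
        return "error", MISSING_IMAGE
    value = value.strip()
    if os.path.isfile(value):
        return "path", value
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # so a path such as /tmp/photo.jpg is not mistaken for image data.
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return "error", MISSING_IMAGE
    if not decoded:
        return "error", MISSING_IMAGE
    return "base64", decoded
```

A short string made only of base64 characters will still be accepted as image data here; the format check that scan_filter.py performs later would be what catches such a case.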

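Step 3: Intent matching

Match the user's description against the intent list below in order from top to bottom, and stop at the first hit. After a hit, determine only the method identifier of that intent and use it as the script argument.

Step 4: Run the Python script (safe argument passing)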

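The script reads the binary image from stdin and is invoked through subprocess with an argument list, which is intended to avoid shell-injection risk. Note that the list below still runs sh -c on a command string, so this holds only when the substituted path contains no quotes or shell metacharacters: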

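import base64
import os
import subprocess
import tempfile

# Local file path input.
# IMAGE_FILE_PATH is substituted into a shell command string, so it must not
# contain quotes or shell metacharacters.
subprocess.run([
    "sh", "-c",
    "cat 'IMAGE_FILE_PATH' | python3 scripts/scan_filter.py --method METHOD_VALUE"
])

# BASE64 input (decode to binary first, then pipe it in)
img_bytes = base64.b64decode("IMAGE_BASE64_DATA")
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(img_bytes)
    tmp_path = f.name
try:
    subprocess.run([
        "sh", "-c",
        f"cat '{tmp_path}' | python3 scripts/scan_filter.py --method METHOD_VALUE"
    ])
finally:
    # Remove the temporary input file even if the call fails
    os.unlink(tmp_path)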

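Security notes:

scan_filter.py validates the image format (magic-number detection) and size (≤5MB) internally
The child process that calls do_scan.py is started with list-style arguments, so there is no shell-injection risk at that step
All authentication information is read from environment variables and never appears in command-line arguments
METHOD_VALUE may only be one of the integers given in the list below; no other values are allowed

Step 5: Result passthrough

When execution finishes, return the result exactly as it is: do not modify, translate, embellish, or summarize it. Pass through both successes and failures directly, and do not retry.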
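For comparison, the same stdin-based call can be made without any shell by handing the open file to the child process directly. The sketch below is not the fixed format this guide mandates; `ALLOWED_METHODS`, `build_command`, and `run_scan` are hypothetical names, and only the script path and the `--method` flag are taken from the guide.

```python
import subprocess

# Assumption: the allowed set is exactly the nine integers in the intent list.
ALLOWED_METHODS = {1, 2, 3, 4, 5, 6, 7, 8, 9}


def build_command(method):
    """Build the argv list for scan_filter.py; no shell is involved."""
    is_int = isinstance(method, int) and not isinstance(method, bool)
    if not is_int or method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    return ["python3", "scripts/scan_filter.py", "--method", str(method)]


def run_scan(image_path, method):
    """Stream the image file to the script's stdin and capture its output."""
    with open(image_path, "rb") as image:
        return subprocess.run(
            build_command(method),
            stdin=image,
            capture_output=True,
            text=True,
            check=False,
        )
```

Because the path is only ever passed to `open`, quotes or metacharacters in a file name cannot change the command that runs.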

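Intent and handling list (sorted by match priority)

Handwriting removal
- Trigger intent: use when the user wants to erase filled-in handwriting, annotations, answers, doodles and similar marks from a document image while fully preserving the original printed content, restoring a clean blank document for reuse or printing.
- method identifier: 3
- Example instructions:
  - "Remove the handwritten answers from this exam paper and restore it to a blank paper"
  - "Clear the handwritten annotations from the document and keep only the printed text"
  - "Help me remove the note marks from this book"
Watermark removal
- Trigger intent: use when the user wants to precisely erase watermarks, logos, timestamps, corner marks, dates and other added marks from an image without damaging the background or overall composition, to obtain a clean, watermark-free image.
- method identifier: 8
- Example instructions:
  - "Help me remove the watermark in the lower-right corner of the image"
  - "Erase the timestamp on the photo cleanly"
Shadow removal
- Trigger intent: use when the user reports that a document or image has shadows, dark corners, or light and dark patches caused by shooting angle, hand occlusion, or uneven lighting, and wants the shadows removed and the brightness evened out.
- method identifier: 5
- Example instructions:
  - "This sheet of paper came out with a large shadow, please remove it"
  - "Remove the black shadow on the left side of the document caused by my hand"
  - "Eliminate the uneven lighting from the shot so the whole image is evenly bright"
Screen-pattern removal
- Trigger intent: use when the user reports that the image was taken by photographing a screen, monitor, or projection and has moiré, screen ripples, colored stripes, glare or similar interference, and wants these patterns removed so that the text is clearly readable.
- method identifier: 9
- Example instructions:
  - "This photo taken of a computer screen has a lot of ripples, help me remove them"
  - "Remove the moiré and glare from photographing the screen"
  - "Fix the colored stripe interference in this phone photo of a screen"
Clarity enhancement
- Trigger intent: use when the user explicitly reports that the image itself has quality defects such as "blurry / low resolution / dim / lost detail / hard to read" and wants deblurring, super-resolution, or brightness and contrast enhancement. Generic intents such as "scan it / process it / optimize it / scan it into a digital copy" do not match this item and go to the fallback.
- method identifier: 4
- Example instructions:
  - "This image is too blurry, help me make it clearer"
  - "Fix this low-resolution image so the details are clearer"
ID and receipt enhancement
- Trigger intent: use when the user explicitly says the photo is of an ID document (ID card, passport, driver's license, vehicle license, bank card, etc.) or a receipt-type document (invoice, receipt, contract, business card, etc.) and wants the quality optimized so that text and key information are clearer.
- method identifier: 6
- Example instructions:
  - "This ID card photo is a bit blurry, please optimize it"
  - "Help me enhance the amount and date on this invoice so they are clear"
  - "The passport photo has heavy glare, process it so I can read the information"
Black-and-white processing (background removal)
- Trigger intent: use when the user wants to convert a document screenshot or photo with a colored background, red letterhead, gray background, or complex background into a clearly readable version with a pure white background and black text.
- method identifier: 7
- Example instructions:
  - "Remove the red background from this red-letterhead document and make it black text on white"
  - "This screenshot has a gray background, please remove the background in one step and make it pure white"
Detection and correction
- Trigger intent: use when the user reports that the photo is crooked, tilted, or distorted by perspective, or wants the document straightened and the surplus background edges cropped automatically to get a neat rectangular document. No quality enhancement is performed.
- method identifier: 1
- Example instructions:
  - "This photo is crooked, help me straighten the document and crop the extra background"
  - "Automatically correct the perspective distortion and turn this tilted contract into a standard rectangle"
  - "Crop the messy edges around the image and keep only the document in the middle"
Scan enhancement (default fallback)
- Trigger intent: use when the user matches none of the specific scenarios above and only expresses a generic document-processing intent such as "scan / scanned copy / digitize / optimize the document overall / process this image". Any generic request containing the word "scan" goes to this item first, not to clarity enhancement.
- method identifier: 2
- Example instructions:
  - "Help me scan and process this image"
  - "Optimize this document image so it looks more professional"
  - "Scan this document into a digital copy"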
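The priority rule of the list above (first hit wins, scan enhancement as the fallback) can be written down as an ordered table. How an intent is recognized from the user's wording is left to the agent and is not modeled here; the intent labels and `pick_method` are hypothetical, while the integers are the method identifiers from the list.

```python
# Ordered by match priority; labels are illustrative, integers come from the list above.
INTENT_METHODS = [
    ("remove_handwriting", 3),
    ("remove_watermark", 8),
    ("remove_shadow", 5),
    ("remove_screen_pattern", 9),
    ("enhance_clarity", 4),
    ("enhance_id_or_receipt", 6),
    ("black_and_white", 7),
    ("detect_and_correct", 1),
    ("scan_enhance", 2),  # default fallback
]


def pick_method(matched_intents):
    """Return the method of the highest-priority matched intent, else the fallback."""
    for label, method in INTENT_METHODS:
        if label in matched_intents:
            return method
    return INTENT_METHODS[-1][1]
```

For example, a request that seems to fit both shadow removal and clarity enhancement resolves to shadow removal, because it appears earlier in the list.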

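Client-side result enhancement: when the script call succeeds (errno == 0), scan_filter.py automatically decodes the returned base64 image and saves it as a local file. The data field in the final JSON output is replaced with {"path": "/tmp/scan_xxxxxxxx.png"}, which gives a directly usable local path.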
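The result enhancement step might look roughly like this. It is a sketch under assumptions, not the code in scan_filter.py or file_saver.py: the response is assumed to be a dict whose `data` field holds the base64 string directly, the eight-character suffix is assumed to be random hex, and `enhance_result` is a hypothetical name.

```python
import base64
import os
import tempfile
import uuid


def enhance_result(response, out_dir=None):
    """On success, save the base64 image and replace data with its local path."""
    if response.get("errno") != 0:
        return response  # failures are passed through untouched
    out_dir = out_dir or tempfile.gettempdir()
    # Assumption: the file name is scan_ plus eight random hex characters.
    path = os.path.join(out_dir, f"scan_{uuid.uuid4().hex[:8]}.png")
    with open(path, "wb") as handle:
        handle.write(base64.b64decode(response["data"]))
    return {**response, "data": {"path": path}}
```

The original response dict is left unmodified; a new dict with the replaced `data` field is returned instead.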

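When Not to Use

Unsupported scenario	Reason	Suggested alternative
Video processing	Only single still images are supported	Extract video frames first, then process them one by one
Batch processing	Each call handles exactly one image	For batches, call the skill in a loop
Oversized images (>5MB)	API limit	Compress or crop before processing
Non-image formats	Only jpg/png/gif/bmp/webp are supported	Convert to a supported image format first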

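Important notes

Do not modify the fixed format; only the method value and the image placeholder may be replaced
Image size limit: no more than 5MB; supported formats are jpg/png/gif/bmp/webp
METHOD_VALUE may only be one of the integers listed in the intent list; do not construct other values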
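The size and format limits above can be checked before anything is sent. The sketch below uses the well-known file signatures of the five supported formats; it is not the validation code in scan_filter.py, the function names are hypothetical, and reading 5MB as 5 × 1024 × 1024 bytes is an assumption.

```python
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5MB" means 5 MiB


def detect_format(data):
    """Identify the image format from its leading magic bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data.startswith(b"BM"):
        return "bmp"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def validate_image(data):
    """Return (True, format) if the image is acceptable, else (False, reason)."""
    if len(data) > MAX_BYTES:
        return False, "image exceeds 5MB"
    fmt = detect_format(data)
    if fmt is None:
        return False, "unsupported image format"
    return True, fmt
```

Checking the bytes rather than the file extension means a renamed PDF is rejected while a PNG saved with a .jpg suffix is still recognized as PNG.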

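File structure

SKILL.md: this document (intent analysis + general conventions)
scripts/scan_filter.py: main entry point (environment check → image validation → receive method → child-process call → result enhancement)
scripts/do_scan.py: downstream execution; assembles the request body and calls the Baidu Netdisk scan API
scripts/config.py: configuration; reads authentication and default parameters from environment variables
scripts/file_saver.py: utility; saves base64 images to local /tmp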

Usage Guidance

Review before installing. Use only with images you are comfortable sending to Baidu, protect BDPAN_API_KEY, delete temporary scan output files after use, and avoid processing untrusted or oddly named local file paths until the shell-based invocation is fixed.

Capability Analysis

Type: OpenClaw Skill

Name: baidu-drive-scan

Version: 1.0.2

The skill bundle provides a legitimate interface for Baidu's image scanning and enhancement services. The code includes several security best practices, such as image magic number (header) validation, file size limits (5MB), and passing large base64 payloads via stdin to avoid shell argument length limits. While the SKILL.md instructions suggest a command pattern involving 'sh -c' which could be vulnerable to shell injection if the agent handles filenames poorly, the actual implementation in 'scan_filter.py' and 'do_scan.py' is well-structured, lacks obfuscation, and contains no evidence of malicious intent or unauthorized data exfiltration.

Capability Tags

requires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

The image scan/enhancement purpose is coherent with the included code: it validates a single image, uploads it to a Baidu scan endpoint, and saves the processed result locally. Users should still treat the selected images as sensitive because they leave the machine.

⚠ Instruction Scope

SKILL.md presents a mandatory local-file workflow that invokes sh -c with a substituted file path while claiming shell-injection safety. A path containing shell metacharacters or quotes could change the command that runs.

ℹ Install Mechanism

There is no install spec, but the code imports the Python requests package while registry requirements only list binaries. This is mainly a reliability/provenance note, not evidence of malicious behavior.

ℹ Credentials

BDPAN_API_KEY and network access are expected for this provider-backed image service and are disclosed, but the API key and uploaded document images are sensitive.

ℹ Persistence & Privilege

No background agent, privilege escalation, or self-persistence is shown. Processed images are saved under /tmp and persist until the user deletes them.

Version History

v1.0.2

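- Added the skill documentation (SKILL.md), which specifies the operating workflow, intent recognition, execution steps, and security requirements in detail
- Explicitly listed the supported intelligent image processing types (such as handwriting removal, watermark removal, shadow removal, screen-pattern removal, clarity enhancement, ID enhancement, and black-and-white processing) and their method identifiers, matched by priority
- Specified strict formats and handling rules for each step (environment variable check, input validation, intent matching, safe execution, result return)
- Refined the structured feedback for input and output errors, making failure messages more accurate
- Added security measures: only authorized method parameters are allowed, image format and size are limited, environment-variable authentication is strictly enforced, and command forgery or out-of-scope behavior is not permitted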

v1.0.1

Initial release of baidu-drive-scan.

- Provides intelligent image processing features for Baidu Netdisk, including: remove handwriting, watermark, shadow, screen textures, enhance clarity, document/ID enhancement, black & white processing, perspective correction, and general scan enhancement.
- Strict input/output and security guidelines, including environment variable checks and safe subprocess execution.
- Only one intent handled per request, with fixed method identifiers and robust error handling.
- Supports single image input (file path or BASE64), with local results saved to /tmp.
- Applicable for images ≤5MB, formats: jpg/png/gif/bmp/webp. Batch, video, and unsupported formats are not handled.
- Detailed documentation and command usage instructions included in SKILL.md.

v1.0.0

Initial release of baidu-drive-scan.

- Provides intelligent image processing for Baidu Drive: handwriting removal, watermark removal, shadow removal, moiré removal, image enhancement, document correction, black-and-white conversion, and receipt/document optimization.
- Strict parameter validation and safe environment variable usage.
- Enforces single-intent-per-request workflow.
- Results are saved to a local temporary directory for user access.
- Clear error messages for missing API keys or images.

Metadata

Slug baidu-drive-scan

Version 1.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is 百度文档扫描官方Skill?

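An intelligent image scanning and processing tool for Baidu Netdisk. It supports handwriting removal, watermark removal, shadow removal, screen-pattern removal, clarity enhancement, ID and receipt enhancement, black-and-white processing, detection and correction, and scan enhancement. It is an AI Agent Skill for Claude Code / OpenClaw, with 144 downloads so far.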

How do I install 百度文档扫描官方Skill?

Run "/install baidu-drive-scan" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is 百度文档扫描官方Skill free?

Yes, 百度文档扫描官方Skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does 百度文档扫描官方Skill support?

百度文档扫描官方Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created 百度文档扫描官方Skill?

It is built and maintained by BaiduNetdiskAIBot (@baidunetdiskaibot); the current version is v1.0.2.
