volcengine-skills

Byted Las Document Parse

Author: volcengine-skills · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 138 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install byted-las-document-parse
Description
CRITICAL: EXCELLENT at parsing BOTH PDF documents and IMAGES (including LONG SCREENSHOTS, scanned documents, and standard images). Extract structured Markdow...
Usage (SKILL.md)

LAS-AI Document Parsing

Powerful Parser for PDF & Images: Extract structured text, tables, and data from PDF documents and IMAGES (especially extremely long screenshots and scanned files), converting them seamlessly to structured Markdown format. Powered by Bytedance LAS-AI.

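Design patterns

This skill mainly uses:

  • Pipeline: a multi-step workflow (confirm mode → submit → poll → result)
  • Tool Wrapper: wraps the LAS API and provides knowledge on demand
  • Inversion: collects the user's preference first at key decision points (the parse mode)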

Gotchas

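  • The LAS_API_KEY environment variable must be set (see "how to obtain")
  • Uploading local files requires TOS_ACCESS_KEY and TOS_SECRET_KEY (see "how to obtain")
  • env.sh is loaded automatically (no manual source needed): the script discovers env.sh in the skill directory / current directory and only fills in missing variables; a file passed explicitly with --env-file forcibly overrides existing values
  • If the user reports an authentication failure (401), first suggest adding the --env-file argument to make sure the latest credentials are used
  • Long images (width-to-height ratio < 0.334) take longer to process and are paginated automatically
  • The API is rate-limited to 1 QPM (queries per minute); calling it too quickly may trigger throttling
  • Resuming after an interruption: if the session is interrupted and the user comes back later with a task_id, run check-and-notify --task-id {task_id} --output /tmp/las_parse_{task_id} --poll directly to resume the whole local image download and ZIP packaging flow.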
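
The env.sh precedence rule above (auto-discovered files only fill in missing variables, while an explicit --env-file overrides existing values) can be sketched as follows. This is a minimal illustration, not the skill's actual loader: `parse_env_file` and `merge_env` are hypothetical names, and the parser assumes simple `KEY=value` or `export KEY=value` lines.

```python
def parse_env_file(text):
    """Parse simple `KEY=value` / `export KEY=value` lines (assumed format)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def merge_env(current, discovered=None, explicit=None):
    """Apply the documented precedence to a copy of the current environment."""
    merged = dict(current)
    # Auto-discovered env.sh: only supplies variables that are missing.
    for key, value in (discovered or {}).items():
        merged.setdefault(key, value)
    # Explicit --env-file: overrides whatever is already set.
    merged.update(explicit or {})
    return merged


current = {"LAS_API_KEY": "stale-key"}
file_values = parse_env_file('export LAS_API_KEY="fresh-key"\nTOS_ACCESS_KEY=ak\n')
auto = merge_env(current, discovered=file_values)
forced = merge_env(current, explicit=file_values)
```

Here `auto` keeps the stale key and only gains TOS_ACCESS_KEY, while `forced` picks up the fresh key, which is why --env-file is the suggested first remedy for a 401.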

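Workflow

Copy this checklist and track progress:

Parse progress:
- [ ] Step 0: Confirm the parse mode
- [ ] Step 1: Confirm the virtual environment is ready
- [ ] Step 2: Submit the task
- [ ] Step 3: Reply to the user immediately
- [ ] Step 4: Poll asynchronously in the background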
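Step 0: Confirm the parse mode (must run first)

Before submitting the task, have the user confirm the parse mode. Show the user the following information and ask:

Please choose a PDF parse mode:

| Mode | Description | Price |
|------|------|------|
| normal | Default mode; no deep thinking, faster, suitable for the vast majority of documents | 0.02 CNY/page |
| detail | Deep-analysis mode; higher accuracy but longer processing time | 0.04 CNY/page |

Recommendation: choose normal if the document has a simple structure and is mostly text; choose detail if it contains complex tables or charts, or needs high-accuracy recognition.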
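
To make the price difference concrete when asking the user, the per-page prices above can be turned into a rough estimate. This sketch assumes billing is simply pages × unit price (the page does not describe how paginated long images are counted); `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-page prices in CNY, taken from the mode table.
PRICE_PER_PAGE = {"normal": Decimal("0.02"), "detail": Decimal("0.04")}


def estimate_cost(pages, mode="normal"):
    """Return an estimated cost in CNY, assuming cost = pages * unit price."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode not in PRICE_PER_PAGE:
        raise ValueError(f"unknown parse mode: {mode!r}")
    return PRICE_PER_PAGE[mode] * pages


normal_cost = estimate_cost(50, "normal")  # Decimal("1.00")
detail_cost = estimate_cost(50, "detail")  # Decimal("2.00")
```

`Decimal` is used instead of `float` so that small currency amounts do not pick up binary rounding noise.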

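Behavior constraints

  • If the user explicitly specified a mode, use that mode directly
  • If the user did not explicitly specify a mode, default to normal
  • Once the mode is confirmed, continue to Step 1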

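Step 1: Confirm the virtual environment is ready (first run only)

Before running a command for the first time, chain the environment initialization together with submit so that both complete in a single tool call:

cd {skill_directory} && (test -d .venv || (python3 -m venv .venv && .venv/bin/pip install -r requirements.txt)) && .venv/bin/python3 scripts/skill.py [--env-file /path/to/env.sh] submit --url "{file path or URL}" --parse-mode {normal|detail}

If the command above succeeds (no error), the virtual environment is ready. Later commands do not need to repeat the environment check; use .venv/bin/python3 directly.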
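
When an agent fills in the placeholders of the chain command, paths containing spaces or shell metacharacters need quoting. The sketch below assembles the same command with `shlex`; `build_submit_command` is a hypothetical helper, and only the flags shown in the command above are used.

```python
import shlex

# Creates the virtualenv and installs dependencies only if .venv is missing.
VENV_BOOTSTRAP = (
    "(test -d .venv || (python3 -m venv .venv && "
    ".venv/bin/pip install -r requirements.txt))"
)


def build_submit_command(skill_dir, target, mode="normal", env_file=None):
    """Build the first-run chain: cd, bootstrap the venv, then submit."""
    if mode not in ("normal", "detail"):
        raise ValueError(f"unknown parse mode: {mode!r}")
    args = [".venv/bin/python3", "scripts/skill.py"]
    if env_file:
        args += ["--env-file", env_file]  # optional, overrides existing values
    args += ["submit", "--url", target, "--parse-mode", mode]
    return f"cd {shlex.quote(skill_dir)} && {VENV_BOOTSTRAP} && {shlex.join(args)}"


cmd = build_submit_command("/opt/skill", "/tmp/my report.pdf", "detail")
```

`shlex.join` quotes each argument only when needed, so the file path with a space is passed to `--url` as a single argument.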

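Step 2: Submit the task

On the first run, use the chain command above to complete the environment check and submit together.

  • The input type is detected automatically (local PDF / local image / HTTP URL / TOS path)
  • stdout prints a single line of JSON: {"task_id": "xxx", "eta": "40秒~2分钟", "input_type": "local_image", "pages": 4}
    • If tos_bucket / tos_prefix is returned, it must be passed through to check-and-notify in Step 4
  • Suggested output directory: /tmp/las_parse_{task_id}/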
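
A caller has to pull the task_id and any TOS fields out of that single JSON line. The following sketch does so under one assumption not stated in the source: if stdout contains more than one line, the JSON is taken from the last non-empty line. `parse_submit_output` is a hypothetical name.

```python
import json


def parse_submit_output(stdout):
    """Extract the submit result and derive the suggested output directory."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("submit produced no output")
    info = json.loads(lines[-1])  # assumption: JSON is the last non-empty line
    if "task_id" not in info:
        raise ValueError("submit output has no task_id")
    info["output_dir"] = f"/tmp/las_parse_{info['task_id']}"
    return info


sample = '{"task_id": "xxx", "eta": "40s~2min", "input_type": "local_image", "pages": 4}\n'
info = parse_submit_output(sample)
# tos_bucket is only present for some inputs; pass it on to Step 4 when it is.
tos_bucket = info.get("tos_bucket")
```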

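Step 3: Reply to the user immediately (must be done first once submit succeeds)

After the submit command returns, reply to the user before doing anything else (including starting the poll). The user is waiting for feedback; if you skip this step and go straight into polling, the user will see no response for a long time.

Parse the stdout JSON and reply right away:

✅ PDF parse task submitted
📋 Task ID: {task_id}
⏳ Expected to finish in {eta}. You will be notified automatically when it completes, and you can carry on with other work in the meantime.

After replying, continue with Step 4 in the same response.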
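
Filling the reply from the submit JSON is a plain string substitution. A minimal sketch, using an English rendering of the reply template and a hypothetical `format_submit_reply` helper:

```python
REPLY_TEMPLATE = (
    "✅ PDF parse task submitted\n"
    "📋 Task ID: {task_id}\n"
    "⏳ Expected to finish in {eta}. You will be notified automatically when it completes."
)


def format_submit_reply(info):
    """Render the immediate reply from the parsed submit JSON."""
    return REPLY_TEMPLATE.format(task_id=info["task_id"], eta=info["eta"])


reply = format_submit_reply({"task_id": "abc123", "eta": "40s~2min", "pages": 4})
```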

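Step 4: Poll asynchronously in the background

In the same response as the reply to the user, pick the first matching option below based on the current runtime environment:

Environment detection

  • Background subagents are supported (a subagent can be spawned) → Option A (background subagent polling)
  • Background subagents are not supported → Option B (fallback: poll in the current conversation)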

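Option A: Background subagent polling (preferred)

Use a background subagent to run the poll. The subagent runs in its own context, does not block the main conversation, and its result is returned to the main session automatically when it finishes.

Invoke the background subagent with the following parameters:

  • description: PDF parse result polling ({task_id})
  • prompt:
You are a background polling agent. Your task is to check the status of a PDF parse task and return the result.

Run the following command (a single call is enough; the script performs adaptive backoff polling internally, so you do not need to loop manually):
cd {skill_directory} && .venv/bin/python3 scripts/skill.py check-and-notify --task-id {task_id} --poll [--tos-bucket {tos_bucket}]

The command blocks until the task completes, fails, or times out, then prints a single line of JSON to stdout.

Report to the user based on the exit code:
- exit 0 (success): read the stdout JSON and report to the user in the "result reply template" format
- exit 1 (failure): read error_msg from the stdout JSON and report the error
- exit 3 (timeout): report that the task is taking a long time and provide the task_id for a manual check

Result reply template (on exit 0, success):

  ✅ PDF parse complete
  📄 Pages: {page_count} | Tables: {table_count} | Images: {image_count} ({image_downloaded} downloaded)
  📝 Content preview: {preview}
  📂 Local path: {output_dir}/
     ├── result.md (images replaced with local paths)
     ├── result.full.json (full API response)
     └── images/ ({image_downloaded} images)
  ☁️ Download package: {download_link_markdown}

  - Show the download line only when `has_download_url` is true
  - If `has_download_url` is false, show the reasons from `download_url_missing_reasons` in place of the download line
  - Output the value of the `download_link_markdown` field from the JSON verbatim; do not assemble or truncate it yourself

Behavior constraints (main agent side):

  • After invoking the background subagent, do not wait for it to finish; keep responding to the user
  • The subagent's result is sent back automatically when it finishes; do not actively poll the subagent's status
  • If the background subagent is unavailable or the invocation fails, fall back to Option B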

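Option B: Poll in the current conversation (when background subagents are unavailable)

Run the command with --poll directly in the current conversation; the script performs adaptive backoff polling internally:

cd {skill_directory} && .venv/bin/python3 scripts/skill.py check-and-notify --task-id {task_id} --poll [--tos-bucket {tos_bucket}]

The command blocks until there is a result (success / failure / timeout). A single call is enough; do not loop manually.

| Exit Code | Meaning | Handling |
|------|------|------|
| 0 | Success | Read the stdout JSON and report the result in the "result reply template" format below |
| 1 | Failure | Read error_msg from the stdout JSON and report the error |
| 3 | Timeout | Report that the task is taking a long time and provide the task_id for a manual check |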
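
The exit-code handling above maps naturally onto a small dispatch function. This is a sketch only: `interpret_result` is a hypothetical name, and the fallback for exit codes other than 0, 1 and 3 is an assumption, since the page documents only those three.

```python
import json


def interpret_result(returncode, stdout):
    """Map a check-and-notify exit code and stdout to (status, detail)."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    try:
        payload = json.loads(lines[-1]) if lines else {}
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    if returncode == 0:
        return "success", payload  # render with the result reply template
    if returncode == 1:
        return "failed", payload.get("error_msg", "unknown error")
    if returncode == 3:
        return "timeout", "task is taking a long time; keep the task_id for a manual check"
    return "unexpected", f"unhandled exit code {returncode}"  # assumption: not documented


status, detail = interpret_result(1, '{"error_msg": "unsupported file type"}\n')
```

In practice the inputs would come from something like `subprocess.run(cmd, shell=True, capture_output=True, text=True)`, passing its `returncode` and `stdout`.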

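Result reply template (on exit 0, success)

✅ PDF parse complete
📄 Pages: {page_count} | Tables: {table_count} | Images: {image_count} ({image_downloaded} downloaded)
📝 Content preview: {preview}
📂 Local path: {output_dir}/
   ├── result.md (images replaced with local paths)
   ├── result.full.json (full API response)
   └── images/ ({image_downloaded} images)
☁️ Download package: {download_link_markdown}
  • Show the download line only when has_download_url is true
  • If has_download_url is false, show the reasons from download_url_missing_reasons in place of the download line
  • Output the value of the download_link_markdown field from the JSON verbatim; do not assemble or truncate it yourself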
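
The three rules for the download line can be expressed as a small renderer. This sketch uses an English rendering of the template and the field names quoted in it; it assumes `download_url_missing_reasons` is a list of strings (a single string is also tolerated), which the page does not specify. `render_result` is a hypothetical helper.

```python
def render_result(result):
    """Render the success reply; the download line depends on has_download_url."""
    lines = [
        "✅ PDF parse complete",
        f"📄 Pages: {result['page_count']} | Tables: {result['table_count']} | "
        f"Images: {result['image_count']} ({result['image_downloaded']} downloaded)",
        f"📝 Content preview: {result['preview']}",
        f"📂 Local path: {result['output_dir']}/",
    ]
    if result.get("has_download_url"):
        # Insert the prebuilt Markdown link verbatim; never rebuild or shorten it.
        lines.append(f"☁️ Download package: {result['download_link_markdown']}")
    else:
        reasons = result.get("download_url_missing_reasons") or []
        if isinstance(reasons, str):
            reasons = [reasons]
        lines.append("☁️ Download package unavailable: " + "; ".join(reasons))
    return "\n".join(lines)


base = {
    "page_count": 4, "table_count": 1, "image_count": 2, "image_downloaded": 2,
    "preview": "Quarterly report", "output_dir": "/tmp/las_parse_t1",
}
with_link = render_result(dict(base, has_download_url=True,
                               download_link_markdown="[zip](https://example.com/r.zip?sig=abc)"))
without_link = render_result(dict(base, has_download_url=False,
                                  download_url_missing_reasons=["no TOS bucket configured"]))
```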

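Displaying the download link (important)

The returned JSON contains a download_link_markdown field whose value is a complete, ready-made Markdown link (for example [点击下载ZIP(24小时有效)](https://...), whose label reads "Click to download ZIP (valid for 24 hours)").

Output the value of download_link_markdown verbatim; do not assemble or rewrite it yourself. The presigned URL carries signature parameters (?X-Tos-Algorithm=...&X-Tos-Signature=...), and any truncation makes the download fail with 403.

As a fallback, the download link is also written to {output_dir}/download_url.txt. If the link is truncated in the output, direct the user to get the full link from that file:

cat {output_dir}/download_url.txt

Do not use the tos_internal_path field: it is an internal tos:// path and cannot be used as an HTTP link.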
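
One way to notice a truncated link before showing it is to check that the Markdown link is still closed and that the two signature parameters named above are present. This is a heuristic sketch, not part of the skill: `extract_url` and `looks_complete` are hypothetical helpers, and a link that passes the check can still have a damaged signature value.

```python
import re
from urllib.parse import parse_qs, urlsplit


def extract_url(link_markdown):
    """Return the URL of a `[label](url)` link, or None if it is not closed."""
    match = re.search(r"\]\((\S+)\)\s*$", link_markdown.strip())
    return match.group(1) if match else None


def looks_complete(link_markdown):
    """Heuristic: link is closed and still carries both signature parameters."""
    url = extract_url(link_markdown)
    if url is None:
        return False
    query = parse_qs(urlsplit(url).query)
    return "X-Tos-Algorithm" in query and "X-Tos-Signature" in query


full = "[download](https://example.com/r.zip?X-Tos-Algorithm=alg&X-Tos-Signature=abc)"
cut = "[download](https://example.com/r.zip?X-Tos-Algorithm=alg&X-Tos-Sig"
```

When `looks_complete` returns False, the fallback is the one described above: point the user at download_url.txt.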

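Output directory structure

/tmp/las_parse_{task_id}/
├── result.md              # Markdown body (image URLs replaced with local relative paths)
├── result.full.json       # Full API response (includes detail / text_blocks / position info)
├── images/                # All images downloaded locally
├── images.json            # Image manifest (original URL, local path, download status)
└── download_url.txt       # Full download link (fallback against truncated LLM output)

When --tos-bucket is configured, the result is packaged as a zip, uploaded to TOS, and a presigned download URL is generated (valid for 24 hours by default).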
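
Before reporting success, an agent can verify that the expected files from the layout above are actually present. In this sketch, `missing_outputs` is a hypothetical helper, and download_url.txt is treated as optional on the assumption that it only exists when a download link was generated.

```python
from pathlib import Path

# Entries expected after every successful parse, per the layout above.
EXPECTED = ["result.md", "result.full.json", "images", "images.json"]


def missing_outputs(output_dir):
    """Return the expected entries that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```

For a resumed task, `missing_outputs(f"/tmp/las_parse_{task_id}")` returning an empty list suggests the local download step finished.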

Further references

Security recommendations
This skill generally appears to do what it says: call a LAS API to parse PDFs/images and optionally upload/download results via TOS. Before installing, consider the following:

  • Secrets required: you must provide LAS_API_KEY and (for local-file uploads) TOS_ACCESS_KEY / TOS_SECRET_KEY and a TOS_BUCKET. Only supply credentials you trust, and prefer creating a dedicated, least-privilege LAS/TOS keypair and bucket for this skill.
  • env.sh auto-load: the skill auto-loads env.sh from the skill directory and from the current working directory (and a user-specified --env-file will forcibly overwrite env vars). Make sure there is no sensitive env.sh in the working directory that could be picked up accidentally (e.g., CI keys, cloud credentials). Prefer a minimal env.sh containing only the keys needed for this skill.
  • Automatic triggers: SKILL.md says to trigger whenever the user shares a local path/URL/tos:// path. If you want explicit consent before parsing local files, disable automatic triggers in your agent or only invoke the skill manually.
  • Data flow: the skill downloads remote images, saves files to /tmp, creates ZIP archives, and may upload archives to your TOS bucket and generate presigned download URLs. Review who has access to the TOS bucket and the lifetime of presigned URLs.
  • Review permissions: give the TOS credentials only the permissions required to write the expected result key/prefix, and rotate keys if you stop using the skill.

If you want to increase confidence further, you can: (1) inspect the scripts locally (they are included) to verify behavior, (2) run the skill in an isolated environment/container, and (3) create dedicated minimal-permission API/TOS credentials for it.
Functional analysis
Type: OpenClaw Skill
Name: byted-las-document-parse
Version: 1.0.0

The skill is a legitimate document parsing tool utilizing ByteDance's LAS-AI (Volcengine) services. It handles PDF and image parsing, including specialized logic for long screenshots via LLM-assisted cropping. Security best practices are observed, such as SSRF protection in `skill.py` (validating URLs against private IP ranges) and transparent logging of environment variable loading. While it handles sensitive credentials (TOS and LAS keys) and performs file uploads/downloads, these actions are strictly aligned with its stated purpose of document OCR and structured data extraction. No evidence of malicious intent, data exfiltration to unauthorized endpoints, or unauthorized persistence was found.
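
For readers unfamiliar with the SSRF check mentioned above, validating a URL against private IP ranges can look roughly like this. It is an illustrative sketch and not the code in `skill.py`: `is_blocked_host` is a hypothetical name, and the sketch only inspects literal IP addresses and `localhost`, whereas a complete check would also resolve hostnames and test every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_host(url):
    """Return True if the URL targets a private, loopback, or link-local address."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no host at all: refuse rather than guess
    if host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch lets it through; real code should resolve it.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


blocked = is_blocked_host("http://169.254.169.254/latest/meta-data")
allowed = not is_blocked_host("https://8.8.8.8/file.pdf")
```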
Capability assessment
Purpose & Capability
The requested credentials (LAS_API_KEY as primary, and TOS_ACCESS_KEY / TOS_SECRET_KEY / TOS_BUCKET for uploading local files) match the stated purpose of calling a Bytedance LAS parsing API and uploading/downloading artifacts to TOS. The code calls LAS endpoints and uses a TOS SDK for uploads, which is coherent with the description. Minor mismatch: documentation says TOS_BUCKET can be optional (overridable via --tos-bucket), but the registry metadata lists it as required; this is a small config inconsistency but not malicious.
Instruction Scope
The SKILL.md + scripts instruct the agent to automatically trigger on any shared local file path, URL, or tos:// path and to auto-load env.sh files (skill dir and current working directory). Auto-loading env.sh from the current working directory means the skill may read environment values from a project/workspace that contain unrelated secrets. The skill also allows a user-provided --env-file that will force-overwrite environment variables. These behaviors are functional for local file uploads but increase the chance of the skill reading/using environment variables beyond the minimal set needed for a single parse operation.
Install Mechanism
There is no remote download/install spec embedded in the skill bundle; dependencies are declared in requirements.txt and the SKILL.md instructs creation of a virtualenv and pip install -r requirements.txt — a standard, low-risk installation pattern. No suspicious external URLs or archive extract operations were used in install instructions.
Credentials
Requested environment variables (LAS_API_KEY, TOS_ACCESS_KEY, TOS_SECRET_KEY, TOS_BUCKET) are reasonable for a parser that uploads local files to TOS and calls LAS. However, the skill's auto-detection of env.sh files in cwd and the ability for an explicit --env-file to forcibly overwrite existing environment variables increase risk of unintentional use of unrelated secrets. Also, registry metadata marking TOS_BUCKET as required while docs treat it as optional is a minor inconsistency to be aware of.
Persistence & Privilege
The skill is not marked always:true and does not request modification of other skills or system-wide configuration. It runs subprocesses and writes results to /tmp (and optionally uploads a ZIP to TOS), which is expected for this functionality. Autonomous invocation is allowed (platform default) but not combined with an elevated 'always' privilege.
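How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install byted-las-document-parse
  3. Once installed, call the skill by name or trigger it with /byted-las-document-parse
  4. Provide the required input according to the skill's parameter description to get structured output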
Version history
v1.0.0
byted-las-document-parse 1.0.0

  • Initial release of the skill for parsing PDF documents and images (including long screenshots and scanned documents).
  • Extracts structured content (text, tables, images) and outputs it in Markdown format.
  • Supports both local files and URLs, with automatic mode selection (normal/detail) and process workflow.
  • Handles asynchronous task submission, user feedback, and result retrieval (including background and fallback polling).
  • Environment setup and resume capability for interrupted sessions included.
  • Guides the user with a detailed workflow and troubleshooting tips in the documentation.
Metadata
Slug: byted-las-document-parse
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
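FAQ

What is Byted Las Document Parse?

CRITICAL: EXCELLENT at parsing BOTH PDF documents and IMAGES (including LONG SCREENSHOTS, scanned documents, and standard images). Extract structured Markdow... It is an AI agent skill plugin for Claude Code / OpenClaw, with 138 total downloads so far.

How do I install Byted Las Document Parse?

Run the command "/install byted-las-document-parse" in the OpenClaw or Claude Code chat box to install it in one step. The installation itself needs no extra configuration, but the skill requires LAS_API_KEY (and TOS credentials for local uploads) before it can be used.

Is Byted Las Document Parse free?

The skill itself is free: it is released under the MIT-0 license and can be freely downloaded, installed, and used. The LAS parsing API it calls is billed per page (0.02 CNY/page in normal mode, 0.04 CNY/page in detail mode).

Which platforms does Byted Las Document Parse support?

Byted Las Document Parse is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Byted Las Document Parse?

It is developed and maintained by volcengine-skills (@volcengine-skills). The current version is v1.0.0.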
