Description

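Alibaba's intelligent document parsing tool: converts PDFs and images into structured HTML. Supports complex layouts, formula recognition, chemical structures, code blocks, flowcharts, sheet music, and more.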

Usage Guide (SKILL.md)

Logics-Parsing Document Parsing Tool

Name: Logics-Parsing (Alibaba Document Parsing)
Author: smseow001

Alibaba intelligent document parsing v1/v2 | GitHub 1.3k ⭐
Document images → structured HTML | Complex layouts | Formula recognition | Chemical structures | Code blocks

1. Core Concept

This skill integrates Alibaba's Logics-Parsing document parsing tool. Its core idea:

End-to-End Document Parsing
Produce structured results directly from document images, without a complex multi-stage pipeline.

2. Version Comparison

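Dimension	v1	v2 (recommended)
Release	2025-09	2026-02
Performance	SOTA (baseline)	Leads across the board
LogicsDocBench	Baseline	82.16
OmniDocBench	Baseline	93.23
Parsing-2.0	❌ Not supported	✅ Supported
Structured content	Formulas/chemistry	+ Flowcharts/sheet music/code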

3. Core Capabilities

3.1 Supported Content Types

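Type	Output format	Notes
Text paragraphs	HTML `<p>`	Automatically identifies headings, page headers and footers
Tables	HTML table	Merges tables that span pages
Scientific formulas	LaTeX / MathML	Accurate recognition of complex formulas
Chemical structures	SMILES	Standardized molecular representation
Flowcharts	Mermaid syntax	New in v2
Sheet music	ABC Notation	New in v2
Code blocks	Syntax-highlighted code	New in v2
Handwriting	Labeled separately	Distinguishes printed from handwritten text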

3.2 Output Structure

<div class="content-block" category="formula" bbox="[x1,y1,x2,y2]">
  <!-- formula content + coordinates + OCR text -->
</div>

Each element contains:

category: element type (paragraph/table/formula/figure, etc.)
bbox: bounding-box coordinates
text: OCR-recognized text
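The attributes listed above can be read back programmatically. Below is a minimal sketch using only Python's standard `html.parser`, assuming the output follows the `<div class="content-block" category="..." bbox="[x1,y1,x2,y2]">` shape shown here with numeric coordinates; `ContentBlockParser` and `parse_blocks` are illustrative names, not part of Logics-Parsing.

```python
import json
from html.parser import HTMLParser


class ContentBlockParser(HTMLParser):
    """Collect <div class="content-block"> elements as dicts with category, bbox and text."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None  # block currently being read, if any
        self._depth = 0  # <div> nesting depth inside the current block

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            self._depth += 1  # a nested <div> inside the block
            return
        attrs = dict(attrs)
        if "content-block" in (attrs.get("class") or "").split():
            raw = (attrs.get("bbox") or "").strip("[]")
            bbox = [int(v) for v in raw.split(",") if v.strip()]
            self._current = {"category": attrs.get("category"), "bbox": bbox, "text": []}
            self._depth = 1

    def handle_endtag(self, tag):
        if tag != "div" or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:  # closing tag of the block itself
            self._current["text"] = " ".join(self._current["text"])
            self.blocks.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Text inside any child element (<p>, <span>, <td>, ...) belongs to the block.
        if self._current is not None and data.strip():
            self._current["text"].append(data.strip())


def parse_blocks(html):
    """Return the content blocks of one parsed page, in document order."""
    parser = ContentBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    sample = (
        '<div class="content-block" category="formula" bbox="[200,450,400,520]">'
        '<span class="latex">E = mc^2</span></div>'
    )
    print(json.dumps(parse_blocks(sample), indent=2))
```

Because text is gathered from every child element, a table block yields its cell contents joined by spaces; keep the raw HTML if you need the cell structure.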

4. Benchmark Performance

4.1 LogicsDocBench (in-house benchmark)

Model	Overall score
Logics-Parsing-v2	82.16 ✅
GPT-5	46.0
Gemini 2.5 pro	26.0
Qwen2.5VL-72B	34.9
SmolDocling	92.7

4.2 OmniDocBench-v1.5 (public benchmark)

Model	Overall score
Logics-Parsing-v2	93.23 ✅
GPT-5	46.0
Gemini 2.5 pro	46.0
Qwen2VL-72B	35.9
Doubao-1.6	31.7

5. Installation

5.1 Standard installation (v2 recommended)

# 1. Clone the repository
git clone https://github.com/alibaba/Logics-Parsing.git
cd Logics-Parsing

# 2. Create the environment (Python 3.10)
conda create -n logics-parsing python=3.10
conda activate logics-parsing

# 3. Install dependencies
pip install -r requirements.txt

# 4. Download the model (ModelScope)
pip install modelscope
python download_model_v2.py -t modelscope

# Or from HuggingFace
pip install huggingface_hub
python download_model_v2.py -t huggingface

5.2 Quick installation (v1 only)

# Run inside the cloned Logics-Parsing directory
conda create -n logics-parsing python=3.10
conda activate logics-parsing
pip install -r requirements.txt

# Download the model
python download_model.py -t modelscope

6. Quick Start

6.1 v2 inference command

python3 inference_v2.py \
  --image_path PATH_TO_INPUT_IMG \
  --output_path PATH_TO_OUTPUT \
  --model_path PATH_TO_MODEL

6.2 v1 inference command

python3 inference.py \
  --image_path PATH_TO_INPUT_IMG \
  --output_path PATH_TO_OUTPUT \
  --model_path PATH_TO_MODEL
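Both inference commands take the same three flags, so a directory of page images can be processed with a small wrapper. This is a sketch, not part of the repository: `build_command` and `run_batch` are hypothetical helpers, the accepted image suffixes are an assumption, it must be run from the cloned repository root, and it assumes `--output_path` accepts a per-image file path (check the script's own argument handling if it expects a directory).

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed input formats; adjust as needed


def build_command(image_path, output_path, model_path, version="v2"):
    """Assemble the documented CLI call for one image (v2 by default)."""
    script = "inference_v2.py" if version == "v2" else "inference.py"
    return [
        sys.executable, script,  # the currently active (conda) interpreter
        "--image_path", str(image_path),
        "--output_path", str(output_path),
        "--model_path", str(model_path),
    ]


def run_batch(input_dir, output_dir, model_path, version="v2"):
    """Run inference once per image in input_dir; call from the repository root."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(input_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Assumes --output_path accepts a file path; pass out_dir instead if the script wants a directory.
        target = out_dir / f"{image.stem}.html"
        subprocess.run(build_command(image, target, model_path, version), check=True)
```

For example, `run_batch("pages", "parsed", "PATH_TO_MODEL")` processes every page image in `pages/`. `check=True` stops the batch at the first failing image, which makes a misconfigured model path obvious early.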

6.3 Python API

from logics_parsing import LogicsParser

# Initialize
parser = LogicsParser(model_path="path/to/model")

# Parse a document
result = parser.parse("document.jpg")

# Output HTML
print(result.html)

# Output structured JSON
print(result.to_json())

7. Use Cases

7.1 Academic documents

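Scenario	Capability
Paper PDF parsing	Extracts formulas, tables and references
Chemistry papers	Molecular structures in SMILES format
Math lecture notes	Accurate LaTeX formula extraction
Textbooks	Handles complex layouts (multi-column, cross-page)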

7.2 Business documents

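Scenario	Capability
Contract parsing	Structures clause tables
Financial statements	Extracts numeric tables
Invoice recognition	Extracts form fields
Newspaper clippings	Handles complex layouts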

7.3 Parsing-2.0 scenarios (new in v2)

Scenario	Output format
Flowcharts	Mermaid code
Sheet music	ABC Notation
Code blocks	Syntax-highlighted code
Pseudocode	Structured pseudocode

8. Output Example

8.1 Input

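[Image of an academic paper with a complex layout, containing multi-column text, a cross-page table and chemical structure formulas]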

8.2 Structured output (HTML)

<div class="content-block" category="paragraph" bbox="[120,340,580,420]">
  <p>We introduce a new document parsing model...</p>
</div>

<div class="content-block" category="formula" bbox="[200,450,400,520]">
  <span class="latex">E = mc^2</span>
</div>

<div class="content-block" category="chemistry" bbox="[100,550,300,700]">
  <span class="smiles">CC(=O)OC(=O)C</span>
</div>

<div class="content-block" category="table" bbox="[50,750,600,900]">
  <table>
    <tr><td>Method</td><td>Score</td></tr>
    <tr><td>Logics-Parsing</td><td>82.16</td></tr>
  </table>
</div>

9. Related Skills

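This skill	Related skill	Relationship
Logics-Parsing	`ai-research-tools`	Paper parsing + research automation
Logics-Parsing	`browser-use`	Web content scraping + parsing
Logics-Parsing	`obsidian-handbook`	Store parsing results in Obsidian
Logics-Parsing	`math-theory-notes`	Mathematical formula recognition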

10. Troubleshooting

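Problem	Solution
Slow model download	Use ModelScope (recommended within mainland China)
Out of GPU memory	Reduce the `image_size` parameter
Garbled OCR output	Check the font configuration
Inaccurate table recognition	Switch to v2, which performs better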
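Whether the inference scripts actually expose an `image_size` option is not confirmed here. Another way to cut GPU memory use is to downscale page images before inference. The hypothetical helper below only computes a target size that preserves the aspect ratio; `max_side=1024` is an arbitrary starting value, not a documented Logics-Parsing setting.

```python
def scaled_size(width, height, max_side=1024):
    """Shrink (width, height) so the longer side is at most max_side, keeping aspect ratio.

    max_side=1024 is an arbitrary starting point, not a documented Logics-Parsing setting.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow installed, `img.resize(scaled_size(*img.size))` applies the new size. Lower `max_side` further if out-of-memory errors persist, at the cost of accuracy on small print such as subscripts and footnotes.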

11. Notes

⚠️ Notes:
- Python 3.10+ required
- Requires a GPU (8GB+ VRAM recommended)
- Model files are large (~2GB) and must be downloaded over the network
- Some features require additional fonts
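These requirements can be checked before cloning or downloading anything. Below is a rough preflight sketch using only the standard library; the `preflight` function and its thresholds (including `min_free_gb=5.0`) are assumptions, and finding `nvidia-smi` on `PATH` only hints that an NVIDIA driver is installed. It does not measure available VRAM.

```python
import shutil
import sys


def preflight(min_python=(3, 10), min_free_gb=5.0, path="."):
    """Return a list of human-readable problems; an empty list means the basic checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {sys.version.split()[0]}"
        )
    # nvidia-smi on PATH is only a rough hint that an NVIDIA GPU and driver are present.
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; a GPU (8GB+ VRAM recommended) may be unavailable")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free; model files alone are about 2GB")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Run it inside the activated conda environment so the Python version check reflects the interpreter that will run the inference scripts.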

12. How to Use

Trigger scenarios

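User says "parse this PDF" → call Logics-Parsing v2
User says "extract the formulas from this paper" → call Logics-Parsing
User says "identify the chemical structures" → output in SMILES format
User says "convert this PDF to HTML" → structured HTML output
User says "parse this sheet music" → v2 Parsing-2.0 feature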
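An agent could implement the trigger mapping above as a simple keyword router. This sketch is hypothetical: the `ROUTES` table and `route` function are not part of the skill, the English keywords stand in for whatever phrasing users actually use, and v2 is assumed wherever the list does not name a version, since v2 is the recommended release. Rules are checked in order, so specific requests such as sheet music win over the generic PDF/HTML rule.

```python
# Hypothetical routing table distilled from the trigger scenarios; order matters.
ROUTES = [
    (("sheet music",), ("v2", "ABC Notation")),
    (("chemical structure",), ("v2", "SMILES")),
    (("formula",), ("v2", "LaTeX")),
    (("pdf", "html"), ("v2", "structured HTML")),
]


def route(request):
    """Return (model version, expected output format) for a user request."""
    text = request.lower()
    for keywords, target in ROUTES:
        if any(keyword in text for keyword in keywords):
            return target
    return ("v2", "structured HTML")  # default: v2 is the recommended version
```

Substring matching keeps the sketch short; a production agent would more likely let the language model classify the request and use a table like this only as a fallback.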

Combined usage

User: "Extract the key formulas and tables from this paper for me"
→ Parse with Logics-Parsing v2
→ Extract formulas (LaTeX) + tables (HTML)
→ Save to Obsidian or a knowledge base
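The extract-and-save steps of this workflow can be sketched with the standard library. Assumptions: formulas arrive as `<span class="latex">` and tables as plain `<table>`/`<tr>`/`<td>` markup, as in the output example in Section 8; `FormulaTableExtractor` and `to_markdown` are hypothetical names. The result uses `$$` math blocks and pipe tables, both of which Obsidian renders.

```python
from html.parser import HTMLParser


class FormulaTableExtractor(HTMLParser):
    """Collect LaTeX formulas and table cells from Logics-Parsing-style HTML."""

    def __init__(self):
        super().__init__()
        self.formulas = []  # one LaTeX string per <span class="latex">
        self.tables = []  # each table: list of rows; each row: list of cell strings
        self._in_latex = False
        self._cell = None  # text fragments of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "latex" in classes:
            self._in_latex = True
            self.formulas.append("")
        elif tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "span" and self._in_latex:
            self._in_latex = False
            self.formulas[-1] = self.formulas[-1].strip()
        elif tag in ("td", "th") and self._cell is not None:
            # Escape pipes so cell text cannot break the Markdown table.
            self.tables[-1][-1].append(" ".join(self._cell).replace("|", "\\|"))
            self._cell = None

    def handle_data(self, data):
        if self._in_latex:
            self.formulas[-1] += data
        elif self._cell is not None and data.strip():
            self._cell.append(data.strip())


def to_markdown(html):
    """Render formulas as $$ blocks and tables as pipe tables (first row used as header)."""
    extractor = FormulaTableExtractor()
    extractor.feed(html)
    extractor.close()
    lines = []
    for formula in extractor.formulas:
        lines += ["$$", formula, "$$", ""]
    for table in extractor.tables:
        rows = [row for row in table if row]  # skip empty <tr>
        if not rows:
            continue
        width = max(len(row) for row in rows)
        rows = [row + [""] * (width - len(row)) for row in rows]  # pad ragged rows
        lines.append("| " + " | ".join(rows[0]) + " |")
        lines.append("|" + " --- |" * width)
        lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
        lines.append("")
    return "\n".join(lines)
```

The sketch emits all formulas before all tables rather than in page order, and it ignores `rowspan`/`colspan`, so merged cells from cross-page tables need manual review before saving the note.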

This skill packages a complete installation and usage guide for Alibaba's Logics-Parsing document parsing tool.

Security Recommendations

This skill is a guide for installing and using Logics-Parsing and appears internally consistent. Before you follow the installation steps:
1) inspect the referenced GitHub repository (https://github.com/alibaba/Logics-Parsing.git), its requirements.txt and its install scripts for any unexpected post-install actions;
2) run the installation in an isolated environment (conda/virtualenv), since pip may install arbitrary packages;
3) be prepared for large model downloads and the GPU and disk-space requirements;
4) only provide ModelScope/HuggingFace credentials if you intend to fetch private models and you trust the source;
5) avoid running repository scripts with elevated privileges.
If you want extra assurance, ask for the repository contents or the requirements.txt before proceeding.

Functional Analysis

Type: OpenClaw Skill
Name: logics-parsing
Version: 1.0.0

The skill bundle describes a document parsing tool ('Logics-Parsing') and provides instructions in SKILL.md for the agent to clone a GitHub repository, install dependencies, and execute Python scripts (e.g., download_model_v2.py). The content is highly suspicious as it references non-existent or future-dated entities, such as GPT-5, Gemini 2.5, and a repository (github.com/alibaba/Logics-Parsing.git) that does not currently exist. While no explicitly malicious code is contained within the provided files, the use of fabricated benchmarks and future dates (2026) to encourage the execution of external, unverified code constitutes a significant security risk.

Capability Assessment

✓ Purpose & Capability

The name/description (document parsing -> structured HTML) aligns with the SKILL.md content. Steps reference cloning an Alibaba GitHub repo, installing Python deps, and downloading models from Modelscope/HuggingFace — all appropriate for a large on-device/offline parsing model.

ℹ Instruction Scope

SKILL.md gives concrete install/run commands (git clone, pip install, python inference scripts). It does not instruct reading unrelated system files or exfiltrating data. Note: it directs downloads from external hubs (Modelscope/HuggingFace) and running repo scripts — expected, but those scripts should be inspected before execution.

ℹ Install Mechanism

The skill itself has no install spec (instruction-only), which is low-risk. The instructions recommend cloning a GitHub repo and downloading ~2GB models via Modelscope/HuggingFace — standard for ML models but involves large external downloads and running third-party code from the repo.

✓ Credentials

No environment variables, credentials, or config paths are declared or required by the skill metadata. The SKILL.md mentions Modelscope and huggingface_hub (which may optionally use tokens for private models) but does not demand any unrelated secrets.

✓ Persistence & Privilege

always is false and the skill does not request persistent or system-level privileges or modify other skills. As an instruction-only guide it does not gain autonomous runtime capabilities by itself.

Version History

v1.0.0

- Initial release of the logics-parsing skill.
- Integrates the Alibaba Logics-Parsing tool for converting PDFs/images into structured HTML, supporting complex layouts, formulas, chemical structures, code blocks, flowcharts, sheet music, and more.
- Includes detailed feature and benchmark comparisons of v1 and v2, with v2 recommended for improved performance and broader content support.
- Provides comprehensive installation and usage instructions for both versions, including command-line and Python API examples.
- Lists supported content types, application scenarios, sample outputs, troubleshooting tips, and integration with related tools and workflows.

Metadata

Slug: logics-parsing

Version: 1.0.0

License: MIT-0

Total installs: 0

Current installs: 0

Number of versions: 1

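FAQ

What is Logics-Parsing (Alibaba Document Parsing)?

Alibaba's intelligent document parsing tool: converts PDFs and images into structured HTML, supporting complex layouts, formula recognition, chemical structures, code blocks, flowcharts, sheet music, and more. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 51 times to date.

How do I install Logics-Parsing (Alibaba Document Parsing)?

Run the command "/install logics-parsing" in an OpenClaw or Claude Code conversation to install it in one step; no additional configuration is required.

Is Logics-Parsing (Alibaba Document Parsing) free?

Yes. Logics-Parsing (Alibaba Document Parsing) is completely free under the MIT-0 license and can be freely downloaded, installed and used.

Which platforms does Logics-Parsing (Alibaba Document Parsing) support?

It is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Logics-Parsing (Alibaba Document Parsing)?

It is developed and maintained by SMS (@smseow001); the current version is v1.0.0.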
