LlamaParse
Parse documents (PDFs, images, spreadsheets, presentations — 130+ formats) into LLM-ready text, markdown, and structured data using the LlamaParse API.
Prerequisites
- Python package: llama-cloud>=1.0 (pip install llama-cloud)
- API key: Set the LLAMA_CLOUD_API_KEY environment variable. Get one at https://cloud.llamaindex.ai
Verify setup:
pip install "llama-cloud>=1.0"
export LLAMA_CLOUD_API_KEY=llx-...
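The commands above install the package and export the key, but they do not confirm that either step took effect. A minimal sketch of such a check, using only the standard library, might look like the following. The helper name check_setup is hypothetical, and the llx- prefix test is an assumption based on the example key shown above.

```python
import importlib.util
import os


def check_setup(env=None):
    """Return a list of human-readable problems; an empty list means the setup looks fine."""
    env = os.environ if env is None else env
    problems = []

    # The llama-cloud package is imported as `llama_cloud` (see Quick Start).
    if importlib.util.find_spec("llama_cloud") is None:
        problems.append("llama-cloud is not installed (pip install 'llama-cloud>=1.0')")

    key = env.get("LLAMA_CLOUD_API_KEY", "")
    if not key:
        problems.append("LLAMA_CLOUD_API_KEY is not set")
    elif not key.startswith("llx-"):
        # Assumption: keys start with "llx-", as in the example above.
        problems.append("LLAMA_CLOUD_API_KEY does not start with 'llx-'")

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

Passing a plain dict instead of the real environment (for example check_setup({"LLAMA_CLOUD_API_KEY": "llx-test"})) makes the check easy to exercise without touching the shell.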
Quick Start
from llama_cloud import AsyncLlamaCloud
import asyncio

async def parse_document(file_path: str):
    client = AsyncLlamaCloud()  # Uses LLAMA_CLOUD_API_KEY env var
    file = await client.files.create(file=file_path, purpose="parse")
    result = await client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        expand=["markdown", "text"],
    )
    return result

result = asyncio.run(parse_document("document.pdf"))
print(result.markdown.pages[0].markdown)
Core Concepts
Tiers (required — choose one)
| Tier | Use Case | Cost |
|---|---|---|
| agentic_plus | Maximum accuracy, complex layouts, charts | Highest |
| agentic | Advanced parsing with intelligent agents | Medium-high |
| cost_effective | Balanced performance and cost | Medium |
| fast | Fastest, basic parsing | Lowest |
Always specify both tier and version. Use version="latest" for dev, or a date string like "2026-01-08" for production reproducibility.
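The tier and version rules above can be enforced before any request is sent. The following is a sketch only: the helper check_tier_and_version and its production flag are hypothetical, and it assumes pinned versions are always ISO dates in the form shown above.

```python
import re
from datetime import date

TIERS = ("agentic_plus", "agentic", "cost_effective", "fast")  # from the table above


def check_tier_and_version(tier: str, version: str, production: bool = False) -> None:
    """Raise ValueError if the tier/version pair does not follow the guidance above."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {TIERS}")

    if version == "latest":
        if production:
            raise ValueError("pin a dated version in production instead of 'latest'")
        return

    # Assumption: pinned versions are ISO dates such as "2026-01-08".
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        raise ValueError(f"version must be 'latest' or YYYY-MM-DD, got {version!r}")
    date.fromisoformat(version)  # raises ValueError for impossible dates


check_tier_and_version("agentic", "latest")                      # fine for development
check_tier_and_version("fast", "2026-01-08", production=True)    # pinned for production
```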
Output Views (expand parameter)
Request one or more in the expand list:
- markdown: Structured markdown with headings, lists, tables. Best for RAG/LLM pipelines.
- text: Clean flattened text per page. Good for search/retrieval.
- items: Structured tree of page elements (headers, paragraphs, tables, figures) with bounding boxes. Use for layout-aware processing.
- metadata: Document metadata.
- images_content_metadata: Image/screenshot metadata with presigned URLs.
Access results: result.markdown.pages[i].markdown, result.text.pages[i].text, result.items.pages[i].items
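Because each view is exposed page by page, a common next step is to stitch the pages back into one string. The sketch below does that for the markdown view. The helper join_markdown and the separator are hypothetical, and the SimpleNamespace object only imitates the attribute paths listed above so the example runs without an API call.

```python
from types import SimpleNamespace


def join_markdown(result, separator: str = "\n\n---\n\n") -> str:
    """Concatenate per-page markdown into one string, in page order."""
    return separator.join(page.markdown for page in result.markdown.pages)


# Stand-in with the same attribute shape as result.markdown.pages[i].markdown;
# a real result would come from client.parsing.parse(..., expand=["markdown"]).
fake_result = SimpleNamespace(
    markdown=SimpleNamespace(
        pages=[
            SimpleNamespace(markdown="# Page 1"),
            SimpleNamespace(markdown="Page 2 text"),
        ]
    )
)

print(join_markdown(fake_result))
```

The same pattern should apply to result.text.pages with page.text, assuming that view was requested in expand.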
Output Options
Control markdown rendering:
output_options={
    "markdown": {
        "tables": {
            "output_tables_as_markdown": True,  # or False for HTML tables
        },
    },
    "images_to_save": ["screenshot"],  # Save page screenshots
}
Processing Options
processing_options={
    "ignore": {"ignore_diagonal_text": True},
    "ocr_parameters": {"languages": ["en"]},  # OCR language hints
    "specialized_chart_parsing": "agentic_plus",  # Extract charts as structured data
}
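Both option dictionaries are passed to the same parse call as tier, version, and expand. One way to keep call sites tidy is to assemble the keyword arguments in a single place, as in this sketch. The helper build_parse_kwargs and its html_tables and ocr_languages switches are hypothetical; only the dictionary keys come from the examples above.

```python
def build_parse_kwargs(file_id, tier="agentic", version="latest",
                       expand=("markdown",), html_tables=False, ocr_languages=None):
    """Assemble keyword arguments for client.parsing.parse from a few simple switches."""
    kwargs = {
        "file_id": file_id,
        "tier": tier,
        "version": version,
        "expand": list(expand),
    }
    if html_tables:
        # output_tables_as_markdown=False requests HTML tables (see Output Options).
        kwargs["output_options"] = {
            "markdown": {"tables": {"output_tables_as_markdown": False}}
        }
    if ocr_languages:
        # OCR language hints (see Processing Options).
        kwargs["processing_options"] = {
            "ocr_parameters": {"languages": list(ocr_languages)}
        }
    return kwargs


print(build_parse_kwargs("file-123", html_tables=True, ocr_languages=["en", "de"]))
```

Inside an async function the result could then be unpacked into the call, for example result = await client.parsing.parse(**build_parse_kwargs(file.id)).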
Custom Prompts (Agentic Parsing Instructions)
Guide the parser like an LLM — useful for extracting specific data or transforming output:
from llama_cloud.types.parsing_create_params import (
    ProcessingOptions,
    ProcessingOptionsAutoModeConfiguration,
    ProcessingOptionsAutoModeConfigurationParsingConf,
)

result = await client.parsing.parse(
    file_id=file.id,
    tier="agentic",
    version="latest",
    expand=["markdown"],
    processing_options=ProcessingOptions(
        auto_mode_configuration=[ProcessingOptionsAutoModeConfiguration(
            parsing_conf=ProcessingOptionsAutoModeConfigurationParsingConf(
                custom_prompt="Extract only prices and totals from this receipt."
            )
        )]
    ),
)
Common Workflows
Parse a single document
Use scripts/parse_document.py:
python scripts/parse_document.py document.pdf --tier agentic --output markdown,text
Batch parse a folder
Use scripts/batch_parse.py:
python scripts/batch_parse.py ./documents/ --tier agentic --max-concurrent 5
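The contents of scripts/batch_parse.py are not shown here, so the following is not that script. It is a generic sketch of what a --max-concurrent limit usually means: an asyncio.Semaphore that caps how many parse jobs are in flight at once. The names parse_all and fake_parse are hypothetical, and fake_parse stands in for the upload and parse calls from Quick Start.

```python
import asyncio


async def parse_all(paths, parse_one, max_concurrent: int = 5):
    """Run parse_one(path) for every path, with at most max_concurrent in flight."""
    paths = list(paths)
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path):
        async with semaphore:
            return await parse_one(path)

    # return_exceptions=True returns failures as exception objects
    # instead of raising on the first one.
    results = await asyncio.gather(*(guarded(p) for p in paths), return_exceptions=True)
    return dict(zip(paths, results))


async def fake_parse(path):
    # Stand-in for the upload + client.parsing.parse call from Quick Start.
    await asyncio.sleep(0.01)
    return f"parsed:{path}"


if __name__ == "__main__":
    out = asyncio.run(parse_all(["a.pdf", "b.pdf", "c.pdf"], fake_parse, max_concurrent=2))
    print(out)
```

Returning a path-to-result mapping lets the caller retry only the entries whose value is an exception.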
Extract tables from a document
Request items in expand, then filter for table items:
for page in result.items.pages:
    for item in page.items:
        if hasattr(item, 'rows'):  # Table item
            print(f"Table on page {page.page_number}: {len(item.rows)} rows")
            # item.csv, item.html, item.md available
Extract chart data
Enable specialized chart parsing, then pull table rows from the chart page:
result = await client.parsing.parse(
    file_id=file.id,
    tier="agentic_plus",
    version="latest",
    processing_options={"specialized_chart_parsing": "agentic_plus"},
    expand=["items"],
)
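The call above enables chart parsing but stops short of the second step, pulling the table rows from the chart page. A sketch of that step follows. It assumes chart data comes back as table items exposing rows, the same heuristic the table example uses. The helper table_rows_on_page and the sample values are hypothetical, and the SimpleNamespace object only imitates the items view.

```python
from types import SimpleNamespace


def table_rows_on_page(result, page_number):
    """Return the rows of every table-like item on one page of an items view."""
    rows = []
    for page in result.items.pages:
        if page.page_number != page_number:
            continue
        for item in page.items:
            # Table items are recognised by their `rows` attribute.
            if hasattr(item, "rows"):
                rows.extend(item.rows)
    return rows


# Stand-in for a parsed result; real data comes from expand=["items"].
fake_result = SimpleNamespace(
    items=SimpleNamespace(
        pages=[
            SimpleNamespace(page_number=1, items=[SimpleNamespace(text="Intro")]),
            SimpleNamespace(
                page_number=2,
                items=[SimpleNamespace(rows=[["Quarter", "Revenue"], ["Q1", "10"]])],
            ),
        ]
    )
)

print(table_rows_on_page(fake_result, 2))
```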
Download page screenshots
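import re

import httpx

result = await client.parsing.parse(
    file_id=file.id, tier="agentic", version="latest",
    output_options={"images_to_save": ["screenshot"]},
    expand=["images_content_metadata"],
)
# Reuse one HTTP client for all downloads instead of opening one per image.
async with httpx.AsyncClient() as http:
    for img in result.images_content_metadata.images:
        # Keep only full-page screenshots (page_<n>.jpg) that have a download URL.
        if img.presigned_url and re.match(r"^page_\d+\.jpg$", img.filename):
            resp = await http.get(img.presigned_url)
            resp.raise_for_status()  # Fail loudly rather than saving an error body
            with open(img.filename, "wb") as f:
                f.write(resp.content)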
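The filename filter in the loop can be pulled out and tested on its own, which also makes it easy to process screenshots in page order. This sketch reuses the same regular expression with a capture group. The helper page_screenshots and the non-matching sample name img_p1_1.png are hypothetical.

```python
import re

# Same pattern as the download loop, with the page number captured.
PAGE_SCREENSHOT = re.compile(r"^page_(\d+)\.jpg$")


def page_screenshots(filenames):
    """Return only full-page screenshot names, ordered by numeric page index."""
    matched = []
    for name in filenames:
        m = PAGE_SCREENSHOT.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]


print(page_screenshots(["page_10.jpg", "img_p1_1.png", "page_2.jpg", "page_1.jpg"]))
# prints ['page_1.jpg', 'page_2.jpg', 'page_10.jpg']
```

Sorting on the integer page number avoids the lexicographic trap where page_10.jpg would sort before page_2.jpg.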
API Reference
For complete API details, see references/api-reference.md.
External Service & Security
This skill uses the LlamaParse API (https://cloud.llamaindex.ai), a cloud document parsing service by LlamaIndex.
- API key required: You must set the LLAMA_CLOUD_API_KEY environment variable. Get a key at https://cloud.llamaindex.ai.
- Data sent externally: Documents are uploaded to the LlamaParse API for server-side parsing. Parsed results are returned to your local machine.
- No other network calls: The scripts only communicate with api.cloud.llamaindex.ai. Screenshot downloads use presigned URLs from the same service.
- Scripts are reference utilities: scripts/parse_document.py and scripts/batch_parse.py are helper scripts meant to be run manually by the user. They are not executed automatically by the skill.
Tips
- Request only the expand views you need: each additional view increases response size and latency.
- Use the agentic_plus tier with specialized_chart_parsing for documents with charts/graphs.
- For production, pin a specific version date instead of "latest".
- Use semaphore-based concurrency for batch parsing to respect rate limits.
- The items view provides bounding boxes (b_box) for each element, which is useful for spatial analysis.
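- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install llamaparse
- Once installation finishes, invoke the Skill by name or trigger it with /llamaparse.
- Supply the inputs described in the Skill's parameter documentation to get structured output.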
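What is Llamaparse?
Parse, extract, and analyze documents using the LlamaParse API (LlamaCloud). Use when the user asks to parse PDFs, images, spreadsheets, or other documents i... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 245 downloads to date.
How do I install Llamaparse?
Run the command "/install llamaparse" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is Llamaparse free?
Yes. Llamaparse is completely free, released under the MIT-0 license, and may be freely downloaded, installed, and used.
Which platforms does Llamaparse support?
Llamaparse is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Llamaparse?
It is developed and maintained by zli484 (@zli484); the current version is v1.0.1.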