Web to Markdown
/install web-to-md
Deterministic, console-first extraction workflow for user-provided URLs. Enforces a fixed fallback chain to maximize content quality without open-ended browsing.
When to Use
- The user provides one or more specific URLs.
- The task requires reading, extracting, summarizing, or analyzing those URLs.
- A deterministic fallback order is preferred over open-ended browsing.
Do not use for open-ended web discovery unless the user explicitly asks for discovery first.
Fallback Chain
For each URL, attempt the steps in order and stop at the first sufficient result.
1. markdown.new (AI mode)
```shell
curl -s "https://markdown.new/{URL}?method=ai"
```
2. markdown.new (Auto mode)
Only if step 1 is insufficient or timed out:
```shell
curl -s "https://markdown.new/{URL}?method=auto"
```
3. r.jina.ai (Browser engine)
Only if steps 1–2 are insufficient or timed out:
```shell
curl -s "https://r.jina.ai/{URL}" -H "X-Engine: browser"
```
4. Agent tools (last resort)
If all three prefixes fail, report the failure and fall back to the agent's own extraction tools. This step is outside the skill's chain; acknowledge it explicitly as a fallback.
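The chain above can be expressed as a small POSIX shell function. This is a minimal sketch, not the skill's own implementation: the names `fetch_step` and `extract_with_fallback`, the 30-second `--max-time` limit, and the `is_sufficient` placeholder (a bare 1,200-character threshold standing in for the full Quality Gate) are all assumptions. The fetcher is passed in by name so it can be replaced with a stub when testing.

```shell
#!/bin/sh
# Minimal sketch of the fixed fallback chain (hypothetical helper names).

# Fetch one step of the chain. $1 = step (ai|auto|jina), $2 = target URL.
# The URL is always expanded inside double quotes, never re-parsed by the shell.
fetch_step() {
  case "$1" in
    ai)   curl -s --max-time 30 "https://markdown.new/$2?method=ai" ;;
    auto) curl -s --max-time 30 "https://markdown.new/$2?method=auto" ;;
    jina) curl -s --max-time 30 "https://r.jina.ai/$2" -H "X-Engine: browser" ;;
    *)    return 2 ;;
  esac
}

# Placeholder gate (assumption): at least 1,200 non-whitespace characters.
is_sufficient() {
  local chars
  chars=$(printf '%s' "$1" | tr -d '[:space:]' | wc -c | tr -d '[:space:]')
  [ "$chars" -ge 1200 ]
}

# Try each step in order and stop at the first sufficient result.
# Sets EXTRACT_LABEL (provenance label) and EXTRACT_CONTENT; returns 1 if all fail.
extract_with_fallback() {
  local url="$1" fetcher="${2:-fetch_step}" step content
  for step in ai auto jina; do
    content=$("$fetcher" "$step" "$url") || content=""
    if is_sufficient "$content"; then
      case "$step" in
        ai)   EXTRACT_LABEL="markdown.new:ai" ;;
        auto) EXTRACT_LABEL="markdown.new:auto" ;;
        jina) EXTRACT_LABEL="r.jina.ai" ;;
      esac
      EXTRACT_CONTENT=$content
      return 0
    fi
    # State why the transition happened (stderr keeps stdout clean).
    echo "fallback: step '$step' insufficient for $url" >&2
  done
  EXTRACT_LABEL="agent-tools"
  EXTRACT_CONTENT=""
  echo "all three prefixes failed for $url; use agent tools as last resort" >&2
  return 1
}
```

For example, `extract_with_fallback "https://example.com/post"` sets `EXTRACT_LABEL` to the label of the first sufficient step and `EXTRACT_CONTENT` to its markdown, or sets the label to `agent-tools` and returns 1 when every step fails.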
Quality Gate
After each step, treat the content as insufficient if any of the following is true:
- Main article or body text is missing
- Content is clearly truncated
- Output is mostly navigation, boilerplate, placeholders, or login walls
- Useful text is too short for the task
- Important sections requested by the user are absent
Rule of thumb: fewer than ~1,200 useful characters on an article page almost certainly indicates truncation. Naturally short pages (announcements, status updates) may be legitimately brief; use judgment.
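Part of this gate can be approximated in shell. This is a heuristic sketch under stated assumptions: `quality_gate` is a hypothetical name, the count of non-whitespace characters stands in for "useful characters", and the login-wall phrases are illustrative examples only. It does not detect truncation or check for user-requested sections; those still need judgment.

```shell
#!/bin/sh
# Hypothetical heuristic gate. Prints the first failing reason and returns 1
# when content looks insufficient; returns 0 otherwise.
# $1 = extracted markdown, $2 = minimum useful characters (default 1200).
quality_gate() {
  local text="$1" min="${2:-1200}" chars
  # Non-whitespace characters as a rough proxy for "useful" text.
  chars=$(printf '%s' "$text" | tr -d '[:space:]' | wc -c | tr -d '[:space:]')
  if [ "$chars" -eq 0 ]; then
    echo "missing body"; return 1
  fi
  # Assumed login-wall / placeholder phrases; extend for your targets.
  if printf '%s' "$text" | grep -Eqi 'sign in to continue|log in to (read|continue)|enable javascript'; then
    echo "login wall or placeholder"; return 1
  fi
  if [ "$chars" -lt "$min" ]; then
    echo "too short ($chars < $min chars)"; return 1
  fi
  return 0
}
```

Passing a lower minimum, e.g. `quality_gate "$text" 300`, is one way to accept a naturally short page; the value 300 is arbitrary.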
URL Handling
- Preserve the protocol when present.
- Ensure the URL is shell-safe and quoted in all curl commands.
- Process each URL independently when multiple are provided.
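A minimal sketch of these rules, assuming a hypothetical `request_urls` helper: each target URL is held in a variable and only ever expanded inside double quotes, so characters such as `&` or `;` are not interpreted by the shell, and its original `http://` or `https://` prefix is passed through unchanged.

```shell
#!/bin/sh
# Print the three request URLs for one target (hypothetical helper).
# The target keeps its protocol; it is expanded only inside double quotes.
request_urls() {
  local url="$1"
  printf '%s\n' "https://markdown.new/${url}?method=ai"
  printf '%s\n' "https://markdown.new/${url}?method=auto"
  printf '%s\n' "https://r.jina.ai/${url}"
}

# Each URL is handled independently, one loop iteration per target.
for target in "https://example.com/a" "http://example.org/b"; do
  request_urls "$target"
done
```

By contrast, an unquoted `curl https://markdown.new/https://example.com/a?x=1&y=2` would make the shell treat `&` as a background operator and cut the URL short.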
Provenance Reporting
Report exactly one final source label per extracted URL in your response:
| Label | When |
|---|---|
| `markdown.new:ai` | `method=ai` was sufficient |
| `markdown.new:auto` | `method=auto` was sufficient (`ai` failed) |
| `r.jina.ai` | r.jina.ai was sufficient (both markdown.new attempts failed) |
| `agent-tools` | All three prefixes failed; the agent used its own tools |
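To make the "exactly one label per URL" rule checkable, a report can be emitted as one `URL<TAB>label` line per URL and validated before it is returned. The line format and the `check_provenance` helper are assumptions for illustration; only the four labels come from the table above.

```shell
#!/bin/sh
# Hypothetical validator: reads "URL<TAB>label" lines on stdin, prints any
# problems, and exits non-zero on an unknown label or a repeated URL.
check_provenance() {
  awk -v FS='\t' '
    $2 !~ /^(markdown\.new:ai|markdown\.new:auto|r\.jina\.ai|agent-tools)$/ {
      print "bad label for " $1 ": " $2; bad = 1
    }
    seen[$1]++ == 1 { print "duplicate label for " $1; bad = 1 }
    END { exit bad }
  '
}
```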
Workflow
- Scope gate — Only process URLs explicitly provided by the user. If discovery is needed, use web search first and confirm candidate URLs before extraction.
- Normalize — Quote URLs, preserve protocol.
- Extract — Run the fallback chain per URL.
- Quality gate — Check each result against the insufficiency conditions.
- Continue — Use the richest sufficient source for the task.
- Report — Include provenance labels in the final response.
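The per-URL independence of this workflow can be sketched as a driver loop. Assumptions: `process_urls` is a hypothetical name, the extractor passed as `$1` prints the final provenance label for one URL, and an extractor that exits non-zero is recorded as `agent-tools`.

```shell
#!/bin/sh
# Hypothetical driver: runs the extractor once per URL, independently, and
# records exactly one "URL<TAB>label" provenance line for each.
process_urls() {
  local extractor="$1" url label
  shift
  for url in "$@"; do
    # A failure on one URL never affects the next one.
    label=$("$extractor" "$url") || label="agent-tools"
    printf '%s\t%s\n' "$url" "$label"
  done
}
```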
Best Practices
- Keep extraction deterministic: make every fallback transition explicit and state why it happened.
- Prefer reproducible commands with quoted URLs.
- Handle timeouts conservatively: when a step is blocked or times out, move immediately to the next fallback.
- Preserve source traceability via provenance labels.
- Avoid tool-specific assumptions beyond curl and standard HTTP endpoints.
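A sketch of bounded, logged requests. Assumptions: the 20-second `--max-time` value and the helper names `fetch_bounded` and `log_transition` are illustrative; exit code 28 is curl's code for an operation timeout.

```shell
#!/bin/sh
# Log why a fallback transition happened; stderr keeps stdout pure markdown.
log_transition() {
  printf 'fallback: %s -> next step (%s)\n' "$1" "$2" >&2
}

# $1 = step name for logging, $2 = full request URL.
# Prints the body on success; logs and returns 1 on timeout or other error.
fetch_bounded() {
  local out rc
  out=$(curl -s --max-time 20 "$2")
  rc=$?
  if [ "$rc" -eq 28 ]; then
    log_transition "$1" "timed out"
    return 1
  elif [ "$rc" -ne 0 ]; then
    log_transition "$1" "curl exit $rc"
    return 1
  fi
  printf '%s\n' "$out"
}
```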
Edge Cases
- Page blocks automated access: Skip to next fallback immediately.
- Multiple URLs: Apply the same sequence to each independently.
- Naturally short pages: Accept shorter content when it satisfies the request.
- All prefixes fail: Report failure clearly, then use agent tools as last resort.
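A sketch of the blocked-access check, assuming a hypothetical `classify_status` mapping in which 401, 403 and 429 count as blocked. Because markdown.new and r.jina.ai sit between the client and the origin, the status they return may not mirror the origin's; treat it as one signal alongside the Quality Gate.

```shell
#!/bin/sh
# Fetch a request URL into a file and print only the HTTP status code.
# -o writes the body to $2; -w '%{http_code}' prints the status afterwards.
fetch_status() {
  curl -s --max-time 20 -o "$2" -w '%{http_code}' "$1"
}

# Map a status code to an action (the mapping is an assumption).
classify_status() {
  case "$1" in
    401|403|429) echo blocked ;;  # skip to the next fallback immediately
    2??)         echo ok ;;       # still run the quality gate on the body
    *)           echo error ;;    # includes 000 when no response arrived
  esac
}
```

Usage might look like `code=$(fetch_status "https://r.jina.ai/$url" page.md)` followed by a branch on `classify_status "$code"`.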
Common Pitfalls
- Output format must be markdown. If any level returns raw HTML or another format, it breaks the contract. Test each level independently.
- Don't skip testing lower fallback levels just because the top level works. A chain is only as reliable as its weakest link.
- Quality is subjective — the 1,200-char heuristic is a guideline, not a hard rule. Apply judgment for short-form content.
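A sketch of testing each level independently against the markdown contract. Assumptions: `looks_like_html` flags output containing a doctype, `<html>` or `<body>` tag, which is a crude heuristic, and `check_each_level` takes a fetcher function by name (for example one wrapping the three curl commands) so every level's raw output is inspected even when an earlier level already works.

```shell
#!/bin/sh
# Crude heuristic (assumption): raw HTML usually contains one of these tags.
looks_like_html() {
  printf '%s' "$1" | grep -Eqi '<!doctype html|<html[ >]|<body[ >]'
}

# Run every level, not just the first that works, and report each result.
# $1 = target URL, $2 = name of a fetcher taking (step, url).
check_each_level() {
  local url="$1" fetcher="$2" step status=0
  for step in ai auto jina; do
    if looks_like_html "$("$fetcher" "$step" "$url")"; then
      echo "$step: returned HTML, not markdown"
      status=1
    else
      echo "$step: ok"
    fi
  done
  return "$status"
}
```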
Verification Checklist
- curl is installed (`which curl`)
- Extraction starts with `markdown.new?method=ai`
- `method=auto` is tried only after `ai` fails
- `r.jina.ai` is tried only after both markdown.new attempts fail
- All three prefixes failing → report the failure and fall back to agent tools
- Quality checks include: missing body, truncation, boilerplate, too-short content
- Final response includes provenance label per URL
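The first checklist item can be scripted as a preflight. `command -v` is the POSIX counterpart of `which`; the `require_curl` name and its message are hypothetical.

```shell
#!/bin/sh
# Hypothetical preflight: confirm curl is available before starting the chain.
require_curl() {
  if command -v curl >/dev/null 2>&1; then
    return 0
  fi
  echo "curl not found; install it before running the fallback chain" >&2
  return 1
}
```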
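Installation and Usage
- Make sure OpenClaw is installed (locally or via Docker).
- In the chat box, enter the install command: `/install web-to-md`
- After installation, invoke the skill by name or trigger it with `/web-to-md`.
- Provide the required inputs as described in the skill's parameter documentation to get structured output.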
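What is Web to Markdown?
Extracts readable markdown from user-provided URLs via a deterministic fallback chain (markdown.new → r.jina.ai). Use when the user supplies specific URLs an... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 56 downloads to date.
How do I install Web to Markdown?
Run the command `/install web-to-md` in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.
Is Web to Markdown free?
Yes. Web to Markdown is completely free under the MIT-0 license and may be freely downloaded, installed, and used.
Which platforms does Web to Markdown support?
Web to Markdown is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Web to Markdown?
It is developed and maintained by Christian de la Cruz (@chdlc); the current version is v1.0.0.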