← Back to Skills Marketplace
nengnengz

Baoyu Url To Markdown

by nengnengZ · GitHub ↗ · v0.1.0 · MIT-0
cross-platform ⚠ suspicious
246
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install baoyu-url-to-markdown-2
Description
Fetch any URL and convert to markdown using Chrome CDP. Saves the rendered HTML snapshot alongside the markdown, uses an upgraded Defuddle pipeline with bett...
README (SKILL.md)

URL to Markdown

Fetches any URL via Chrome CDP, saves the rendered HTML snapshot, and converts it to clean markdown.

Script Directory

Important: All scripts are located in the scripts/ subdirectory of this skill.

Agent Execution Instructions:

  1. Determine this SKILL.md file's directory path as {baseDir}
  2. Script path = {baseDir}/scripts/\x3Cscript-name>.ts
  3. Resolve ${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun
  4. Replace all {baseDir} and ${BUN_X} in this document with actual values

Script Reference:

Script Purpose
scripts/main.ts CLI entry point for URL fetching
scripts/html-to-markdown.ts Markdown conversion entry point and converter selection
scripts/defuddle-converter.ts Defuddle-based conversion
scripts/legacy-converter.ts Pre-Defuddle legacy extraction and markdown conversion
scripts/markdown-conversion-shared.ts Shared metadata parsing and markdown document helpers

Preferences (EXTEND.md)

Check EXTEND.md existence (priority order):

# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-url-to-markdown/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "user" }

┌────────────────────────────────────────────────────────┬───────────────────┐ │ Path │ Location │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ .baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ Project directory │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ $HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ User home │ └────────────────────────────────────────────────────────┴───────────────────┘

┌───────────┬───────────────────────────────────────────────────────────────────────────┐ │ Result │ Action │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Found │ Read, parse, apply settings │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Not found │ MUST run first-time setup (see below) — do NOT silently create defaults │ └───────────┴───────────────────────────────────────────────────────────────────────────┘

EXTEND.md Supports: Download media by default | Default output directory | Default capture mode | Timeout settings

First-Time Setup (BLOCKING)

CRITICAL: When EXTEND.md is not found, you MUST use AskUserQuestion to ask the user for their preferences before creating EXTEND.md. NEVER create EXTEND.md with defaults without asking. This is a BLOCKING operation — do NOT proceed with any conversion until setup is complete.

Use AskUserQuestion with ALL questions in ONE call:

Question 1 — header: "Media", question: "How to handle images and videos in pages?"

  • "Ask each time (Recommended)" — After saving markdown, ask whether to download media
  • "Always download" — Always download media to local imgs/ and videos/ directories
  • "Never download" — Keep original remote URLs in markdown

Question 2 — header: "Output", question: "Default output directory?"

  • "url-to-markdown (Recommended)" — Save to ./url-to-markdown/{domain}/{slug}.md
  • (User may choose "Other" to type a custom path)

Question 3 — header: "Save", question: "Where to save preferences?"

  • "User (Recommended)" — ~/.baoyu-skills/ (all projects)
  • "Project" — .baoyu-skills/ (this project only)

After user answers, create EXTEND.md at the chosen location, confirm "Preferences saved to [path]", then continue.

Full reference: references/config/first-time-setup.md

Supported Keys

Key Default Values Description
download_media ask ask / 1 / 0 ask = prompt each time, 1 = always download, 0 = never
default_output_dir empty path or empty Default output directory (empty = ./url-to-markdown/)

EXTEND.md → CLI mapping:

EXTEND.md key CLI argument Notes
download_media: 1 --download-media
default_output_dir: ./posts/ --output-dir ./posts/ Directory path. Do NOT pass to -o (which expects a file path)

Value priority:

  1. CLI arguments (--download-media, -o, --output-dir)
  2. EXTEND.md
  3. Skill defaults

Features

  • Chrome CDP for full JavaScript rendering
  • Two capture modes: auto or wait-for-user
  • Save rendered HTML as a sibling -captured.html file
  • Clean markdown output with metadata
  • Upgraded Defuddle-first markdown conversion with automatic fallback to the pre-Defuddle extractor from git history
  • Materializes shadow DOM content before conversion so web-component pages survive serialization better
  • YouTube pages can include transcript/caption text in the markdown when YouTube exposes a caption track
  • If local browser capture fails completely, can fall back to defuddle.md/\x3Curl> and still save markdown
  • Handles login-required pages via wait mode
  • Download images and videos to local directories

Usage

# Auto mode (default) - capture when page loads
${BUN_X} {baseDir}/scripts/main.ts \x3Curl>

# Wait mode - wait for user signal before capture
${BUN_X} {baseDir}/scripts/main.ts \x3Curl> --wait

# Save to specific file
${BUN_X} {baseDir}/scripts/main.ts \x3Curl> -o output.md

# Save to a custom output directory (auto-generates filename)
${BUN_X} {baseDir}/scripts/main.ts \x3Curl> --output-dir ./posts/

# Download images and videos to local directories
${BUN_X} {baseDir}/scripts/main.ts \x3Curl> --download-media

Options

Option Description
\x3Curl> URL to fetch
-o \x3Cpath> Output file path — must be a file path, not directory (default: auto-generated)
--output-dir \x3Cdir> Base output directory — auto-generates {dir}/{domain}/{slug}.md (default: ./url-to-markdown/)
--wait Wait for user signal before capturing
--timeout \x3Cms> Page load timeout (default: 30000)
--download-media Download image/video assets to local imgs/ and videos/, and rewrite markdown links to local relative paths

Capture Modes

Mode Behavior Use When
Auto (default) Capture on network idle Public pages, static content
Wait (--wait) User signals when ready Login-required, lazy loading, paywalls

Wait mode workflow:

  1. Run with --wait → script outputs "Press Enter when ready"
  2. Ask user to confirm page is ready
  3. Send newline to stdin to trigger capture

Output Format

Each run saves two files side by side:

  • Markdown: YAML front matter with url, title, description, author, published, optional coverImage, and captured_at, followed by converted markdown content
  • HTML snapshot: *-captured.html, containing the rendered page HTML captured from Chrome

When Defuddle or page metadata provides a language hint, the markdown front matter also includes language.

The HTML snapshot is saved before any markdown media localization, so it stays a faithful capture of the page DOM used for conversion. If the hosted defuddle.md API fallback is used, markdown is still saved, but there is no local -captured.html snapshot for that run.

Output Directory

Default: url-to-markdown/\x3Cdomain>/\x3Cslug>.md With --output-dir ./posts/: ./posts/\x3Cdomain>/\x3Cslug>.md

HTML snapshot path uses the same basename:

  • url-to-markdown/\x3Cdomain>/\x3Cslug>-captured.html

  • ./posts/\x3Cdomain>/\x3Cslug>-captured.html

  • \x3Cslug>: From page title or URL path (kebab-case, 2-6 words)

  • Conflict resolution: Append timestamp \x3Cslug>-YYYYMMDD-HHMMSS.md

When --download-media is enabled:

  • Images are saved to imgs/ next to the markdown file
  • Videos are saved to videos/ next to the markdown file
  • Markdown media links are rewritten to local relative paths

Conversion Fallback

Conversion order:

  1. Try Defuddle first
  2. For rich pages such as YouTube, prefer Defuddle's extractor-specific output (including transcripts when available) instead of replacing it with the legacy pipeline
  3. If Defuddle throws, cannot load, returns obviously incomplete markdown, or captures lower-quality content than the legacy pipeline, automatically fall back to the pre-Defuddle extractor
  4. If the entire local browser capture flow fails before markdown can be produced, try the hosted https://defuddle.md/\x3Curl> API and save its markdown output directly
  5. The legacy fallback path uses the older Readability/selector/Next.js-data based HTML-to-Markdown implementation recovered from git history

CLI output will show:

  • Converter: defuddle when Defuddle succeeds
  • Converter: legacy:... plus Fallback used: ... when fallback was needed
  • Converter: defuddle-api when local browser capture failed and the hosted API was used instead

Media Download Workflow

Based on download_media setting in EXTEND.md:

Setting Behavior
1 (always) Run script with --download-media flag
0 (never) Run script without --download-media flag
ask (default) Follow the ask-each-time flow below

Ask-Each-Time Flow

  1. Run script without --download-media → markdown saved
  2. Check saved markdown for remote media URLs (https:// in image/video links)
  3. If no remote media found → done, no prompt needed
  4. If remote media found → use AskUserQuestion:
    • header: "Media", question: "Download N images/videos to local files?"
    • "Yes" — Download to local directories
    • "No" — Keep remote URLs
  5. If user confirms → run script again with --download-media (overwrites markdown with localized links)

Environment Variables

Variable Description
URL_CHROME_PATH Custom Chrome executable path
URL_DATA_DIR Custom data directory
URL_CHROME_PROFILE_DIR Custom Chrome profile directory

Troubleshooting: Chrome not found → set URL_CHROME_PATH. Timeout → increase --timeout. Complex pages → try --wait mode. If markdown quality is poor, inspect the saved -captured.html and check whether the run logged a legacy fallback.

YouTube Notes

  • The upgraded Defuddle path uses async extractors, so YouTube pages can include transcript text directly in the markdown body.
  • Transcript availability depends on YouTube exposing a caption track. Videos with captions disabled, restricted playback, or blocked regional access may still produce description-only output.
  • If the page needs time to finish loading descriptions, chapters, or player metadata, prefer --wait and capture after the watch page is fully hydrated.

Hosted API Fallback

  • The hosted fallback endpoint is https://defuddle.md/\x3Curl>. In shell form: curl https://defuddle.md/stephango.com
  • Use it only when the local Chrome/CDP capture path fails outright. The local path still has higher fidelity because it can save the captured HTML and handle authenticated pages.
  • The hosted API already returns Markdown with YAML frontmatter, so save that response as-is and then apply the normal media-localization step if requested.

Extension Support

Custom configurations via EXTEND.md. See Preferences section for paths and supported options.

Usage Guidance
This skill will execute the included TypeScript code locally, launch or attach to a Chrome instance, fetch arbitrary URLs, optionally download linked media, and write markdown and HTML snapshot files to your filesystem (project dirs or ~/.baoyu-skills). It will fall back to the public defuddle.md service if local capture fails, which exposes the requested URL to that service. Before installing or running: (1) review the included scripts if you don't trust them; (2) be aware it will create EXTEND.md in your project or home and will download media into imgs/ or videos/ subfolders; (3) run it in a sandbox or with a dedicated Chrome profile if you want to avoid attaching to your personal browser; (4) ensure you want URLs you request to potentially be sent to defuddle.md on failure. Note: SKILL.md requires the agent to prompt the user (AskUserQuestion) before creating EXTEND.md — that flow depends on the agent following SKILL.md (the code does not itself enforce the interactive block).
Capability Analysis
Type: OpenClaw Skill Name: baoyu-url-to-markdown-2 Version: 0.1.0 The skill bundle provides a robust utility for converting webpages to Markdown using Chrome CDP for rendering and the Defuddle library for content extraction. It features sophisticated handling for Shadow DOM, YouTube transcripts, and local media downloading. While it employs high-risk capabilities such as browser automation, local file system access, and an external API fallback (defuddle.md), these behaviors are well-documented in SKILL.md and are strictly aligned with the tool's stated purpose. The code (main.ts, cdp.ts) and agent instructions are transparent and do not exhibit signs of malicious intent or unauthorized data exfiltration.
Capability Assessment
Purpose & Capability
Name/description (URL → markdown via Chrome CDP) aligns with the included TypeScript scripts. Required runtime helpers (bun or npx) are plausible for running the scripts. The code legitimately needs to launch/attach to Chrome, read/write EXTEND.md and output files, fetch external resources, and optionally call defuddle.md — all coherent with an archiver/conversion tool.
Instruction Scope
SKILL.md correctly instructs the agent to locate the scripts, prefer bun/npx, and to prompt the user for preferences before creating EXTEND.md. The code itself performs file I/O (reading/writing EXTEND.md, writing markdown and captured HTML, downloading media) and network I/O (fetching pages/media and defuddle.md). These actions are within the stated purpose. One mismatch: SKILL.md mandates a blocking AskUserQuestion flow on first run, but the provided main.ts does not itself enforce that flow — it relies on the agent to follow SKILL.md. Agents must therefore obey the SKILL.md requirement to avoid silently creating EXTEND.md.
Install Mechanism
There is no install spec (instruction-only install), which means nothing will be automatically downloaded by the registry. The package includes source files and an internal vendor folder for baoyu-chrome-cdp; no external arbitrary URLs or archive downloads are present in the manifest. Expect runtime dependency resolution (bun/npx) when executing the scripts.
Credentials
The skill declares no required env vars but the code tolerates and uses common environment variables (HOME, XDG_CONFIG_HOME) and optionally URL_CHROME_PATH to override the Chrome path. Those are reasonable for locating profile/config and Chrome. The skill will read/write EXTEND.md in project or user config locations and will create output directories — behavior consistent with its purpose but worth noting because it writes into the user's filesystem. No unrelated credentials or secrets are requested.
Persistence & Privilege
always is false and the skill is user-invocable; it does not request permanent inclusion or modify other skills. It writes its own config (EXTEND.md) and output files, and may launch or kill Chrome instances — this is appropriate for a browser-capture utility. Autonomous invocation is allowed (platform default) but does not combine with other red flags here.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install baoyu-url-to-markdown-2
  3. After installation, invoke the skill by name or use /baoyu-url-to-markdown-2
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Initial release of baoyu-url-to-markdown-2 – fetches any URL, renders with Chrome CDP, and converts to markdown with advanced pipeline and robust fallback. - Saves rendered HTML snapshots alongside markdown for archival. - Uses upgraded Defuddle pipeline for better web-component handling and YouTube transcript extraction; automatically falls back to legacy conversion if needed. - Supports both auto-capture and user-wait modes for handling logins and dynamic content. - Requires explicit user setup for preferences (media downloading, output location) via a blocking first-time setup using EXTEND.md. - Allows local browser capture with fallback to hosted API if needed; can download images and videos. - Detailed CLI and configuration instructions available in SKILL.md.
Metadata
Slug baoyu-url-to-markdown-2
Version 0.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Baoyu Url To Markdown?

Fetch any URL and convert to markdown using Chrome CDP. Saves the rendered HTML snapshot alongside the markdown, uses an upgraded Defuddle pipeline with bett... It is an AI Agent Skill for Claude Code / OpenClaw, with 246 downloads so far.

How do I install Baoyu Url To Markdown?

Run "/install baoyu-url-to-markdown-2" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Baoyu Url To Markdown free?

Yes, Baoyu Url To Markdown is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Baoyu Url To Markdown support?

Baoyu Url To Markdown is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Baoyu Url To Markdown?

It is built and maintained by nengnengZ (@nengnengz); the current version is v0.1.0.

💬 Comments