Description

Extract and fill PDF AcroForm fields with a multi-backend fallback chain. Reads field schemas (text inputs, checkboxes, radio buttons, dropdowns, multi-line...

README (SKILL.md)

pdf-filler

Name: Pdf Filler
Author: qubit999

Operate on PDF AcroForms: list every field with its type and current value, then fill the PDF with values supplied as JSON. The skill calls a small Python package (oc-pdf-filler) that wraps a fallback chain of PDF libraries, so a single recalcitrant PDF doesn't block the workflow.

Workspace rules (read this first)

Many agent hosts run inside a sandbox that only allows reads/writes inside a specific workspace folder. Files written outside that folder show up as "Unavailable / Outside allowed folders" and the user can't download them.

The CLI enforces this for you:

The "workspace" is resolved from the first set environment variable in this list, falling back to the current working directory: OC_PDF_FILLER_WORKSPACE, OPENCLAW_WORKSPACE, CLAWHUB_WORKSPACE, AGENT_WORKSPACE, SKILL_WORKSPACE, WORKSPACE.
All output paths (schema JSON and filled PDF) are resolved relative to the workspace.
If you pass an absolute --output that points outside the workspace, the CLI rewrites it to the same basename inside the workspace (and prints a warning to stderr).
You can override the workspace explicitly with --workspace DIR.

In practice: pass relative paths (e.g. -o form_done.pdf), or omit --output entirely. The default is <input-stem>_done.pdf inside the workspace.
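The resolution and confinement rules above can be sketched in Python. This is a minimal illustration, not the package's actual code: resolve_workspace and confine_output are hypothetical names, and treating a relative path that escapes through .. like an outside absolute path is an assumption, since the rules above only describe absolute paths. It needs Python 3.9 or later for Path.is_relative_to.

```python
import os
import sys
from pathlib import Path

# Environment variables checked in order, as listed in the workspace rules.
WORKSPACE_ENV_VARS = (
    "OC_PDF_FILLER_WORKSPACE",
    "OPENCLAW_WORKSPACE",
    "CLAWHUB_WORKSPACE",
    "AGENT_WORKSPACE",
    "SKILL_WORKSPACE",
    "WORKSPACE",
)


def resolve_workspace(explicit=None, env=None):
    """Hypothetical helper: --workspace wins, then the first set variable, then the cwd."""
    env = os.environ if env is None else env
    if explicit:
        return Path(explicit).resolve()
    for name in WORKSPACE_ENV_VARS:
        if env.get(name):
            return Path(env[name]).resolve()
    return Path.cwd().resolve()


def confine_output(output, workspace):
    """Hypothetical helper: return an output path that is inside the workspace."""
    workspace = Path(workspace).resolve()
    candidate = Path(output)
    if not candidate.is_absolute():
        candidate = workspace / candidate
    candidate = candidate.resolve()
    if candidate.is_relative_to(workspace):
        return candidate
    # Outside the workspace: keep only the basename and warn on stderr.
    print(
        f"warning: {output} is outside the workspace; writing {candidate.name} there instead",
        file=sys.stderr,
    )
    return workspace / candidate.name
```

With this sketch, confine_output("/tmp/form_done.pdf", workspace) yields form_done.pdf inside the workspace, which mirrors the rewrite the CLI is described as performing.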

When to use

Trigger this skill when the user:

Asks to inspect, list, or extract the form fields of a PDF
Wants to fill out / populate / complete a PDF form programmatically
Mentions AcroForm, checkbox, radio button, or dropdown handling in a PDF
Has a batch of PDFs to fill from structured data (JSON)

Setup (once per workspace)

The skill scripts call the oc-pdf-filler Python package. Install it first:

pip install "oc-pdf-filler[all]"
# or, if working from the source repo: pip install -e ".[all]"

The [all] extra pulls in pdfrw and PyMuPDF for the full fallback chain. Install pdftk from your package manager for the last-resort backend (optional, but useful for stubborn PDFs).

Verify which backends are active:

python scripts/list_backends.py

Step 1: Extract the field schema

Always extract first so you know the exact field names and types before constructing the JSON values file. Use a workspace-relative path for the output (the CLI confines it to the workspace automatically).

python scripts/extract.py /path/to/form.pdf --output schema.json --include-values

Each entry in the resulting JSON has:

name: the AcroForm field name (use this verbatim as the key when filling)
type: one of text, checkbox, radio, choice, signature, pushbutton, unknown
options: for radios and checkboxes, the accepted export values; for choices, the dropdown options
value: current value if the form is partially filled (only when --include-values is set)
max_length, multiline, required, read_only: hints for validation

See references/FIELD_TYPES.md for the value contract per field type.
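Before building the values file, it can help to group the extracted fields by type so that every checkbox and radio is visible at a glance. The sketch below assumes schema.json holds a top-level JSON list of entries with the keys described above; that layout is an assumption, and load_schema and group_fields are hypothetical helper names, not part of oc-pdf-filler.

```python
import json
from collections import defaultdict


def load_schema(path):
    """Hypothetical helper: read the extracted schema from disk."""
    # Assumes the file is a top-level JSON list of field entries.
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def group_fields(schema):
    """Hypothetical helper: map each field type to the field names of that type."""
    groups = defaultdict(list)
    for field in schema:
        groups[field.get("type", "unknown")].append(field["name"])
    return dict(groups)
```

Calling group_fields(load_schema("schema.json")) returns a dict such as one with "checkbox" and "radio" keys, each listing the exact field names to use verbatim when filling.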

Step 2: Build a values JSON file

The fill input is a flat JSON object { "FieldName": value }. Example:

{
  "Name Verantwortlicher": "ACME GmbH",
  "Postleitzahl Verantwortlicher": "10115",
  "Beschäftigte": true,
  "Verarbeitungstyp": "Automatisiert"
}

A starter template is included at assets/values.example.json.

Critical: include every checkbox and radio explicitly

LLMs tend to omit fields they're unsure about, which silently leaves checkboxes unchecked in the output PDF. Don't do that. For every field in the schema:

Checkbox (type: checkbox): set true or false. If the user didn't mention it, default to false rather than omitting the key.
Radio (type: radio): set the exact export string from options. If the user didn't pick one, leave it out only if it's truly optional; otherwise ask the user or pick the most plausible value.
Text / choice / signature: omit only if the field is genuinely blank.

If you are unsure about a checkbox, choose false rather than omitting it. The CLI's unset_checkboxes and unset_radios summary fields tell you which fields were left out so you can self-correct on the next pass. As safety nets you can pass:

--default-unset-checkboxes off to force every untouched checkbox to false in one go.
--default-unset-radios first to pick the first available option for every untouched radio group.
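The same rule can be applied before calling the CLI at all. The sketch below is a hypothetical pre-flight check, not part of oc-pdf-filler: it adds an explicit false for every checkbox the values omit and reports radio groups that are missing or set to a string not found in options. It assumes the schema is a list of entries with name, type, and options keys.

```python
def complete_values(schema, values):
    """Hypothetical helper: make checkboxes explicit and report radio problems."""
    completed = dict(values)
    unset_radios = []
    invalid_radios = []
    for field in schema:
        name = field["name"]
        kind = field.get("type")
        if kind == "checkbox":
            # Unmentioned checkboxes become an explicit False instead of a missing key.
            completed.setdefault(name, False)
        elif kind == "radio":
            if name not in completed:
                unset_radios.append(name)
            elif completed[name] not in field.get("options", []):
                invalid_radios.append(name)
    return completed, unset_radios, invalid_radios
```

Radios are only reported, never defaulted, because choosing one is a decision for the user or for --default-unset-radios.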

Step 3: Fill the PDF

Omit --output to get the recommended default <input-stem>_done.pdf inside the workspace, or pass a workspace-relative filename. Absolute paths outside the workspace are automatically rewritten into it (the host's sandbox would reject them otherwise).

# preferred: keep the original name with _done suffix, in the workspace
python scripts/fill.py /path/to/form.pdf values.json
# or pass an explicit relative path
python scripts/fill.py /path/to/form.pdf values.json --output form_done.pdf

By default the orchestrator uses --backend auto, walking the chain pypdf -> pdfrw -> PyMuPDF -> pdftk and stopping at the first backend that fills every field.

Useful flags:

--backend pymupdf -- force a specific backend (e.g. when the auto winner produces a PDF that doesn't render correctly in your viewer)
--best-effort -- chain backends so partial fills accumulate (use when no single backend handles every field)
--flatten -- bake values into the PDF so they can't be edited (best support: PyMuPDF, pdftk)
--strict -- exit non-zero if any requested field is missing or unfillable
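To make the difference between the default chain and --best-effort concrete, here is a simplified model of the two strategies. It is a sketch under stated assumptions, not the orchestrator's real code: each backend is a hypothetical callable that receives the values still to be filled and returns the set of field names it managed to fill, and a backend that raises is simply skipped.

```python
def run_chain(backends, values, best_effort=False):
    """Hypothetical model of the chain.

    backends: ordered list of (name, fill_fn) pairs.
    Returns (winning_backend or None, set of filled names, per-attempt details).
    """
    remaining = dict(values)
    filled = set()
    attempts = []
    for name, fill_fn in backends:
        try:
            done = set(fill_fn(remaining)) & set(remaining)
        except Exception as exc:
            # A failing backend must not stop the chain.
            attempts.append({"backend": name, "error": str(exc)})
            continue
        attempts.append({"backend": name, "filled": sorted(done)})
        if best_effort:
            # Partial fills accumulate; later backends only see what is left.
            filled |= done
            remaining = {k: v for k, v in remaining.items() if k not in done}
            if not remaining:
                return name, filled, attempts
        elif done == set(values):
            # Default mode: the first backend that fills every field wins.
            return name, done, attempts
    return None, filled, attempts
```

In this model the default mode discards partial results, which is why a form that no single backend handles completely only succeeds with best_effort=True.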

The script prints a JSON summary including winning_backend, workspace, output_path (absolute path of the resulting PDF, always inside the workspace), filled, missing, failed, unset_checkboxes, unset_radios, and per-attempt details. If filling fails, see references/BACKENDS.md for backend-specific troubleshooting tips.
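A caller can decide from the summary whether another pass is needed. The sketch below parses the JSON text printed by the fill script and collects the non-empty problem fields. review_summary is a hypothetical helper, and it assumes only that the keys named above exist and are empty (or absent) when there is nothing to report; the exact value types are not specified here.

```python
import json

# Summary keys that indicate something needs attention when non-empty.
PROBLEM_KEYS = ("missing", "failed", "unset_checkboxes", "unset_radios")


def review_summary(summary_text):
    """Hypothetical helper: return (output_path, non-empty problem fields)."""
    summary = json.loads(summary_text)
    problems = {key: summary[key] for key in PROBLEM_KEYS if summary.get(key)}
    return summary.get("output_path"), problems
```

An empty problems dict suggests the fill is ready for verification and delivery; anything else points at the fields to correct in the values file.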

Delivering the result to the user

Send the file back automatically as soon as the fill succeeds. Do not wait for the user to ask. Users on chat platforms (Telegram, Slack, Discord, Teams, the ClawHub web client, etc.) expect the finished PDF to arrive in the conversation as an attachment immediately after you announce success.

Procedure:

Read output_path from the fill summary. It is guaranteed to be inside the workspace, so the host can attach it.
Use whatever attachment / file-return mechanism the host provides — Telegram bot sendDocument, Slack files.upload, the agent runtime's own attach_file / send_file tool, or simply emit it as a workspace artifact. Do not ask the user "do you want me to send it?"; just send it.
In the same turn, briefly confirm what was sent (filename + filled-field count). The user shouldn't have to ask twice.
If you genuinely cannot find an attachment channel, surface the workspace-relative path so the user can fetch it manually — but treat that as a last resort.

Never write the PDF outside the workspace (e.g. /tmp, /var, your home directory). Sandboxed hosts will mark it as "Unavailable / Outside allowed folders".

End-to-end example

python scripts/extract.py form.pdf -o schema.json
# ... agent inspects schema.json, builds values.json based on user input ...
python scripts/fill.py form.pdf values.json
# writes <workspace>/form_done.pdf

After filling, re-run extract.py --include-values form_done.pdf and confirm the values stuck before delivering the PDF to the user.
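That check can be automated by comparing the requested values with the re-extracted schema. The sketch below is a hypothetical helper, and it assumes the extracted value field uses the same representation as the values you supplied (for example true/false for checkboxes). If the extractor reports export names instead, normalize both sides first; references/FIELD_TYPES.md describes the value contract.

```python
def find_mismatches(requested, extracted_schema):
    """Hypothetical helper: map field name -> (requested, actual) for values that differ."""
    actual = {field["name"]: field.get("value") for field in extracted_schema}
    return {
        name: (wanted, actual.get(name))
        for name, wanted in requested.items()
        if actual.get(name) != wanted
    }
```

An empty result means every requested value was read back unchanged; a field missing from the re-extracted schema shows up with None as its actual value.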

Notes and edge cases

Field names may contain spaces, German umlauts, or punctuation. Always copy them verbatim from extract.py output.
For radio groups, set the value to the export name of the chosen option (one of the strings in options), not a boolean.
Signature fields (type: signature) are reported but not auto-filled.
Encrypted PDFs are out of scope; the tool will surface the underlying library error.
Some PDF viewers cache appearance streams; if a viewer shows blank fields after filling, try opening with a different viewer or use --flatten.

Usage Guidance

Before installing, verify you trust the `oc-pdf-filler` package and its optional dependencies. When using the skill, process only PDFs you intend to share with the agent, treat extracted schemas as potentially sensitive if they include current values, and review the filled PDF carefully before submitting or signing it.

Capability Analysis

Type: OpenClaw Skill
Name: pdf-filler
Version: 0.1.5

The pdf-filler skill is a utility for extracting and filling PDF AcroForms using a fallback chain of standard Python libraries (pypdf, PyMuPDF, etc.). The scripts (extract.py, fill.py, list_backends.py) are simple wrappers for the oc_pdf_filler package and do not contain any suspicious logic. The SKILL.md instructions include proactive security measures, such as enforcing workspace boundaries to prevent path traversal and ensuring the agent only operates within sandboxed directories.

Capability Assessment

✓ Purpose & Capability

The artifacts consistently describe extracting and filling PDF AcroForm fields, and the included scripts only dispatch extract, fill, and backend-listing commands to the declared PDF filler package.

ℹ Instruction Scope

The workflow is user-directed, but the instructions allow defaulting or inferring missing checkbox/radio values, so generated forms should be reviewed before use.

ℹ Install Mechanism

There is no install spec, and setup asks the user/agent to install an unpinned external Python package with optional system dependencies; this is disclosed and central to the purpose but not fully reviewable from the supplied artifacts.

ℹ Credentials

The skill reads user-specified PDFs and JSON and writes schema/filled-PDF outputs, with documented workspace output confinement; extracted schemas may include existing form values when requested.

✓ Persistence & Privilege

No credentials, privileged account access, background persistence, or autonomous long-running behavior is shown in the artifacts.

Version History

v0.1.5

pdf-filler 0.1.5

- Updated version metadata and documentation to 0.1.5 in SKILL.md.
- Documentation improvements: clarified handling for unset checkboxes and radios in values JSON, including new summary fields and CLI options.
- Added references to new CLI options: --default-unset-radios (pick default radio selections) in addition to --default-unset-checkboxes.
- No functional code changes; doc/metadata update only.

v0.1.4

Version 0.1.4

- Updated documentation to clarify that all checkbox and radio fields should be explicitly included in the input JSON, defaulting missing checkboxes to false.
- Added instructions to immediately send the filled PDF file to the user as an attachment after a successful fill, without waiting for further confirmation.
- Expanded workflow details explaining how unset checkboxes are reported and how to use relevant CLI flags for default behaviors.
- Documentation now points out correct handling for workspace and attachment logic across platforms and agent hosts.
- Bumped metadata version to 0.1.4.

v0.1.3

- Enforces workspace-relative paths for all output files, preventing files from being written outside the allowed workspace directory.
- Adds a new section describing workspace rules, including environment variable overrides and automatic path rewriting.
- Updates instructions and examples to use workspace-relative paths.
- Output file paths outside the workspace are now automatically rewritten to be inside the workspace, with warnings.
- Fill summary now explicitly includes the resolved workspace and ensures output_path is always inside it.

v0.1.2

- Bumped version to 0.1.2.
- Updated docs: If no `--output` is given, the fill script now writes to `./<input-stem>_done.pdf` by default, preserving original filenames for easier identification.
- Adjusted end-to-end example and instructions to reflect new default output path behavior.
- Clarified workspace output best practices and preferred file naming for host compatibility and user convenience.
- No changes to code or features beyond documentation.

v0.1.1

- Updated author and homepage metadata.
- Clarified instructions to always write output files (schema, values, filled PDF) inside the conversation working directory (e.g., ./), not /tmp or other external directories.
- Emphasized the importance of placing output PDFs where the chat host can access and attach them for the user.
- Added detailed guidance for reliably delivering output PDFs as chat attachments, including usage of the output_path field.
- Minor documentation updates for clarity and host compatibility.

v0.1.0

- Initial release of pdf-filler: a tool to extract and fill AcroForm PDF fields using a robust multi-backend fallback approach.
- Supports inspection and extraction of field schemas (types, values, options) from any AcroForm PDF.
- Allows programmatic filling of form fields via user-provided JSON, supporting text, checkbox, radio, dropdown, and multi-line text fields.
- Integrates several PDF-handling libraries (pypdf, pdfrw, PyMuPDF, pdftk) to maximize compatibility with stubborn PDFs.
- Provides command-line scripts for schema extraction and batch filling, with options for backend selection, best-effort filling, flattening, and strict validation.
- Includes comprehensive setup instructions and usage examples for PDF inspection and filling workflows.

Metadata

Slug pdf-filler

Version 0.1.5

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 6

Frequently Asked Questions

What is Pdf Filler?

Extract and fill PDF AcroForm fields with a multi-backend fallback chain. Reads field schemas (text inputs, checkboxes, radio buttons, dropdowns, multi-line... It is an AI Agent Skill for Claude Code / OpenClaw, with 55 downloads so far.

How do I install Pdf Filler?

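Run "/install pdf-filler" in the OpenClaw or Claude Code chat to install the skill. The skill's scripts also need the oc-pdf-filler Python package, which is installed separately as described under "Setup (once per workspace)".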

Is Pdf Filler free?

Yes, Pdf Filler is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Pdf Filler support?

Pdf Filler is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Pdf Filler?

It is built and maintained by qubit999 (@qubit999); the current version is v0.1.5.
