Description

Generate and edit images from the CLI using picture-it. Use this skill whenever the user asks to create, edit, or manipulate images — blog headers, social ca...

Usage instructions (SKILL.md)

picture-it

Name: Picture it!
Author: geongeorge

Photoshop for AI agents. Composable image operations from the CLI.

Source: https://github.com/geongeorge/picture-it | npm: https://www.npmjs.com/package/picture-it

Prerequisites

picture-it must be installed and configured. Requires Node.js 18+.

# Install (pick one)
npm install -g picture-it
pnpm add -g picture-it
bun install -g picture-it

# Setup
picture-it download-fonts

Credentials

The FAL API key is required for AI operations (generate, edit, remove-bg, upscale). Set it via environment variable or the CLI:

# Option 1: Environment variable (preferred — use platform-managed secrets)
export FAL_KEY=your-key-here

# Option 2: CLI config (stored in ~/.picture-it/config.json with 0600 permissions)
picture-it auth --fal <fal-api-key>

NEVER paste API keys into chat. Always use environment variables or the CLI auth command. Get a FAL key from https://fal.ai.

Note: User images are uploaded to fal.ai for AI processing when using generate, edit, remove-bg, or upscale commands. Local-only commands (crop, grade, grain, vignette, text, compose, template, info) do not transmit data.
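A pre-flight check along these lines can stop an AI-backed command before it fails for lack of credentials. This is a minimal sketch, not part of picture-it: `have_fal_credentials` and the `PICTURE_IT_CONFIG` override are hypothetical names, and the check only looks for the `FAL_KEY` variable or the presence of the config file named above. It does not validate the key.

```shell
# Hypothetical helper (not part of picture-it): succeed if FAL credentials
# appear to be available, either via FAL_KEY or via the CLI config file.
have_fal_credentials() {
  # Option 1: the environment variable is set and non-empty.
  [ -n "${FAL_KEY:-}" ] && return 0
  # Option 2: the config file written by `picture-it auth` exists.
  # PICTURE_IT_CONFIG is an assumed override used only by this sketch;
  # the default path is the one given in the Credentials section.
  [ -f "${PICTURE_IT_CONFIG:-$HOME/.picture-it/config.json}" ] && return 0
  echo "No FAL credentials found: export FAL_KEY or run picture-it auth" >&2
  return 1
}

# Example use before an AI-backed command:
# have_fal_credentials && picture-it generate --prompt "dark cosmic background" -o bg.png
```

Local-only commands (crop, grade, text, and so on) do not need this check.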

Core Concept

Every command takes an image in and outputs an image. Chain them to build anything. The agent calling picture-it IS the planner — there is no AI planner inside the tool.

Before You Generate Anything — Think First

Image generation costs real money: roughly $0.003 to $0.15 per FAL call, depending on the model (see Model Selection). A workflow with four FAL passes can easily cost $0.10 or more. Don't burn budget on a vague idea; plan before running any commands.

Step 1: Understand the purpose

Before touching picture-it, get full clarity on what the user wants. Ask yourself:

What is this image for? (blog header, Instagram ad, YouTube thumbnail, product comparison, poster)
Who is the audience? (developers, consumers, enterprise buyers)
What should someone FEEL when they see it? (excitement, trust, urgency, curiosity)
What's the one message? Every good image communicates exactly one thing.
Where will it be displayed? This determines size, text sizing, and composition rules.

If any of these are unclear, ask the user before proceeding. A 30-second question saves $0.15 in wasted generation.

Step 2: Plan the composition

Think through at least 3 different approaches before picking one. Consider:

Can this be done without FAL? Templates and Satori compose are free. A solid gradient + good typography is often enough.
What's the minimum number of FAL calls? Each call costs money. Plan the fewest passes that achieve the goal.
Which technique fits? Text-behind-subject for thumbnails, remove-bg + compose for product photos, multi-pass for cinematic scenes.

Present your top 2-3 ideas to the user briefly — one sentence each — and let them pick before generating. Example:

"Here are a few directions:

Dramatic product shot — generate a dark stage, edit to place your logo as a glowing 3D object ($0.07)

Clean comparison — remove-bg from both products, compose on gradient with text ($0.01)

Text-behind-subject — generate an action scene, edit to weave the title behind the subject ($0.07)

Which direction, or a mix?"

Step 3: Plan the pipeline

Before running the first command, write out the full pipeline:

1. generate (flux-dev $0.03) — dark stage scene
2. edit (seedream $0.04) — place logo into scene
3. compose (free) — add text overlay
4. grade + vignette (free) — post-process
Total: ~$0.07

This avoids discovering mid-way that you need a different approach and wasting the earlier calls.
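The total in a written-out plan can be checked with a few lines of shell. The sketch below assumes the per-call prices quoted in this document; `pipeline_cost` is a hypothetical helper, not a picture-it command.

```shell
# Hypothetical helper: sum per-step prices (in USD) passed as arguments.
# Pass 0 for free local steps such as compose, grade, or vignette.
pipeline_cost() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.3f\n", total }'
}

# The four-step plan above: flux-dev generate, seedream edit,
# compose, and grade + vignette.
pipeline_cost 0.03 0.04 0 0   # prints 0.070
```

Three decimal places are kept so that a flux-schnell call ($0.003) still shows up in the total.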

Commands Quick Reference

Command	What it does	Needs FAL?
`generate`	Create image from text prompt	Yes
`edit`	Edit image(s) with AI	Yes
`remove-bg`	Remove background	Yes
`replace-bg`	Remove bg + generate new one	Yes
`crop`	Resize/crop to exact dimensions	No
`grade`	Apply color grading	No
`grain`	Add film grain	No
`vignette`	Add edge darkening	No
`text`	Render text onto image (Satori)	No
`compose`	Overlay images/text/shapes from JSON	No
`template`	Built-in templates (no AI)	No
`info`	Analyze image dimensions/colors	No

Model Selection

Choose the right model for the job — don't overspend.

Generation (no input images):

flux-schnell ($0.003) — Default. Fast, good quality. Use for backgrounds and base scenes.
flux-dev ($0.03) — Better quality. Use for hero images, portraits, detailed scenes where quality matters.

Editing (with input images):

seedream ($0.04) — Default. Good for compositing multiple images, placing objects in scenes, adding text. Handles up to 10 inputs.
banana2 ($0.08) — Better image preservation. Use when you need the input image to stay more faithful, or >10 inputs.
banana-pro ($0.15) — Best quality, best text rendering. Use for premium work, complex edits, character consistency.
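The edit-model guidance above can be written as a small decision function. This is only a sketch of one reasonable reading: `pick_edit_model` is a hypothetical helper, and sending every job with more than 10 inputs to banana2 is an assumption based on the list, since it is the only model described as handling that case.

```shell
# Hypothetical helper: suggest an edit model.
#   $1 = number of input images
#   $2 = "yes" for premium work (complex edits, text rendering); default "no"
pick_edit_model() {
  inputs=$1
  premium=${2:-no}
  if [ "$inputs" -gt 10 ]; then
    echo "banana2"       # the only model listed for more than 10 inputs ($0.08)
  elif [ "$premium" = "yes" ]; then
    echo "banana-pro"    # best quality and text rendering ($0.15)
  else
    echo "seedream"      # default, up to 10 inputs ($0.04)
  fi
}

# Example: picture-it edit -i a.png -i b.png --model "$(pick_edit_model 2)" ...
```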

Background removal:

bria (default) — Best edge quality, clean cutouts
birefnet — Good general purpose
pixelcut — Alternative
rembg — Cheapest

How to Write Good Prompts

This is the difference between mediocre and professional output. Read references/prompt-library.md for a full library of tested prompts you can copy and adapt. Key rules:

For generation: Be specific about lighting ("dramatic side lighting from upper right"), camera ("shot on Canon R5 70-200mm f2.8"), and atmosphere ("dust particles visible in the light beam"). Vague prompts produce generic results.

For text-behind-subject: The key phrase is: "Add '[TEXT]' in large bold [color] letters BEHIND the [subject] — the [subject's] body overlaps and partially covers the letters." Without "BEHIND" and the occlusion instruction, the text floats on top.

For edits: Always end with "Keep everything else exactly the same" and list what to preserve. Without this, the AI changes things you didn't want changed.

For background replacement: Use realistic, specific locations ("modern upscale mall entrance during daytime, natural warm daylight"). Over-dramatic backgrounds ("city at night with neon reflections") look obviously fake.
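The text-behind-subject and edit rules above can be combined into a reusable prompt builder. The sketch below is illustrative: `behind_text_prompt` is a hypothetical helper, and its wording is adapted from the key phrases in this section rather than taken from the prompt library.

```shell
# Hypothetical helper: build a text-behind-subject edit prompt.
#   $1 = text to render, $2 = color, $3 = subject
behind_text_prompt() {
  text=$1
  color=$2
  subject=$3
  # Includes the BEHIND keyword, the occlusion instruction, and the
  # "keep everything else" clause recommended for edits.
  printf "Add '%s' in large bold %s letters BEHIND the %s. The %s's body overlaps and partially covers the letters. Keep everything else exactly the same.\n" \
    "$text" "$color" "$subject" "$subject"
}

# Example:
# picture-it edit -i runner.png --prompt "$(behind_text_prompt 'RUN FASTER' black runner)" -o thumbnail.png
```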

Typography

For big titles and hero text: Use the FAL model via edit — it handles large text well and integrates it into the scene naturally. No font size math needed, just say "very large bold" in the prompt.

For precise small text (credits, URLs, badges, coverlines): Use compose or text with Satori. This is where font sizing matters — images display much smaller on phones. Quick rule: on a 1080px Instagram image, nothing under 36px is readable. Run picture-it download-fonts first if fonts aren't installed.

Hierarchy: Max 3 text sizes per image. Brand name should be larger than tagline.

Font pairing: Serif + sans-serif works best. For FAL model text, just describe the style in the prompt. For Satori, 3 fonts are bundled; drop more .ttf files into ~/.picture-it/fonts/ to add others. See references/composition-guide.md for pairing suggestions.
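The quick rule for small text (36px minimum on a 1080px image) can be scaled to other widths. The sketch below assumes the threshold scales linearly with image width, which is an extrapolation from the single data point given here; `min_font_px` is a hypothetical helper.

```shell
# Hypothetical helper: smallest readable font size for a given image width,
# assuming the 36px-at-1080px rule scales linearly. Rounds up.
min_font_px() {
  width=$1
  echo $(( (width * 36 + 1079) / 1080 ))
}

min_font_px 1080   # prints 36
min_font_px 1200   # prints 40
```

Treat the result as a floor for --font-size on credits, URLs, and badges, not as a recommended size for titles.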

Composition Techniques

Read references/composition-guide.md for detailed multi-pass workflows, product photography, magazine covers, and overlay composition.

Common Workflows

Simple: Generate an image

picture-it generate --prompt "dark cosmic background with nebula" --size 1200x630 -o bg.png

Simple: Add text to an image

picture-it text -i bg.png --title "Hello World" --font "Space Grotesk" --color white --font-size 64 -o hero.png

Medium: Blog header with AI background + text

picture-it generate --prompt "abstract dark tech background" --size 1200x630 -o bg.png
picture-it text -i bg.png --title "My Blog Post" --font "DM Serif Display" --font-size 72 -o header.png
picture-it grade -i header.png --name cinematic -o header-graded.png

Medium: Edit a photo background

picture-it edit -i photo.jpg --prompt "replace background with modern hotel entrance, keep subject identical" --model banana-pro -o edited.jpg

Advanced: Text behind subject (YouTube thumbnail style)

# 1. Generate a scene
picture-it generate --prompt "runner on mountain trail at golden hour" --model flux-dev --size 1280x720 -o runner.png

# 2. Use FAL edit to add text BEHIND the subject
picture-it edit -i runner.png --prompt "Add 'RUN FASTER' in large bold black letters BEHIND the runner — the runner's body overlaps the text" --model seedream -o thumbnail.png

Advanced: Product comparison with real photos

# 1. Remove backgrounds from product photos
picture-it remove-bg -i product-a.png --model bria -o a-cutout.png
picture-it remove-bg -i product-b.png --model bria -o b-cutout.png

# 2. Generate a background
picture-it generate --prompt "split gradient, blue left to orange right" --size 1200x630 -o bg.png

# 3. Compose cutouts onto background with text
picture-it compose -i bg.png --overlays overlays.json -o comparison.png

Advanced: Multi-pass cinematic composition

# 1. Generate base scene
picture-it generate --prompt "dark stage with green spotlight" --model flux-dev --size 2048x1080 -o stage.png

# 2. Edit scene to place objects
picture-it edit -i stage.png -i logo.png --prompt "Place Figure 2 as glowing 3D cube in the spotlight" --model seedream -o composed.png

# 3. Post-process
picture-it crop -i composed.png --size 1200x630 --position attention -o cropped.png
picture-it grade -i cropped.png --name cinematic -o graded.png
picture-it vignette -i graded.png --opacity 0.3 -o final.png

Platform Presets

Use --platform <name> with generate or crop:

Preset	Size
`blog-featured`	1200x630
`og-image`	1200x630
`youtube-thumbnail`	1280x720
`instagram-square`	1080x1080
`instagram-story`	1080x1920
`twitter-header`	1500x500
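When a script needs the numeric size itself, for example to pass --size explicitly, a lookup that mirrors the table above may help. `preset_size` is a hypothetical helper written for this sketch; picture-it resolves the presets on its own when --platform is used.

```shell
# Hypothetical helper: map a platform preset name to its WIDTHxHEIGHT size,
# mirroring the preset table in this document.
preset_size() {
  case "$1" in
    blog-featured|og-image) echo "1200x630" ;;
    youtube-thumbnail)      echo "1280x720" ;;
    instagram-square)       echo "1080x1080" ;;
    instagram-story)        echo "1080x1920" ;;
    twitter-header)         echo "1500x500" ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Example:
# picture-it crop -i composed.png --size "$(preset_size youtube-thumbnail)" -o thumb.png
```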

Output Behavior

stdout: only the output file path
stderr: progress logs
Exit 0 on success, Exit 1 on failure

Read stdout to get the file path. This is how you chain commands.
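Because the path arrives on stdout and failures are signalled by the exit code, each step can be wrapped so a chain stops at the first error. The sketch below is one way to do that: `run_step` is a hypothetical wrapper, not part of picture-it, and it relies only on the stdout and exit-code behavior described above.

```shell
# Hypothetical wrapper: run a command that prints its output path on stdout.
# Echo that path only if the command succeeded and the file exists.
run_step() {
  out=$("$@") || { echo "step failed: $*" >&2; return 1; }
  [ -f "$out" ] || { echo "missing output file: $out" >&2; return 1; }
  printf '%s\n' "$out"
}

# Example chain (each step feeds its output path to the next):
# bg=$(run_step picture-it generate --prompt "abstract dark tech background" --size 1200x630 -o bg.png) &&
# hero=$(run_step picture-it text -i "$bg" --title "My Blog Post" --font-size 72 -o header.png) &&
# run_step picture-it grade -i "$hero" --name cinematic -o header-graded.png
```

Progress logs still reach the terminal because they go to stderr, which the command substitution does not capture.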

Gotchas

Always pass --model bria explicitly for remove-bg rather than relying on the default; birefnet leaves rectangular artifacts that cause ugly glow/shadow halos when compositing.
The glow effect in compose mode blurs the entire rectangular buffer, not the shape. Avoid using glow on cutout images — use the background color/lighting to create the glow effect instead.
The shadow effect has the same rectangular artifact issue. For cutout images on clean backgrounds, skip shadows entirely.
When editing with FAL, the model may alter product details (logos, text, design elements). For product images where accuracy matters, use remove-bg + compose instead of edit to preserve the original exactly.
seedream takes ~60 seconds per generation. Don't assume it failed if it's slow.
For edit with banana-pro, don't pass resolution or limit_generations params — it auto-detects.
Always crop to exact dimensions after FAL generation — FAL models output approximate sizes.
Use flux-dev ($0.03) not flux-schnell ($0.003) when image quality matters (hero images, portraits). The quality difference is significant.
Satori does NOT support: display:grid, transforms, animations, box-shadow, filters. Use flexbox only.
When adding text behind a subject with edit, be very explicit in the prompt: "the text is BEHIND the subject — the subject's body overlaps and partially covers the letters."

Security recommendations

This skill appears to do what it says: it runs the picture-it CLI and uses your FAL API key to call fal.ai for generation/editing. Before installing or using it:

1) Confirm the npm package and GitHub repo (https://github.com/geongeorge/picture-it and npm package name) to ensure the code is what you expect; review the package source if you plan to install globally.
2) Understand that generate/edit/remove-bg/upscale will upload user images and prompts to fal.ai and will incur costs; read fal.ai's privacy/retention policy and consider using an API key with limited scope or an expendable key.
3) Prefer storing FAL_KEY in your platform's secret manager rather than pasting into chat; if using CLI auth, the config file is stored at ~/.picture-it/config.json (SKILL.md recommends 0600).
4) Because SKILL.md suggests npm install -g, be aware npm installs can run install scripts; consider installing in an isolated environment or reviewing package scripts first.
5) If you need stricter guarantees about data residency or non-exfiltration, do not use the FAL-backed commands; local-only commands (crop, grade, compose, text) run offline.

Overall: coherent and expected behavior, but verify package origin and accept that image uploads and billing are part of its operation.

Functional analysis

Type: OpenClaw Skill
Name: picture-it
Version: 1.0.5

The 'picture-it' skill bundle is a well-documented tool for image generation and editing using the fal.ai API. It provides clear instructions for the AI agent, including explicit security warnings against sharing API keys in chat and guidance on cost management. The bundle focuses entirely on its stated purpose, with no evidence of malicious intent, data exfiltration, or unauthorized command execution.

Capability tags

crypto
can-make-purchases

Capability assessment

✓ Purpose & Capability

Name/description match the declared requirements: picture-it CLI and Node are required and FAL_KEY is needed for AI-backed operations. The declared config path (~/.picture-it/config.json) and FAL network usage align with the stated purpose.

ℹ Instruction Scope

SKILL.md stays on-task and gives detailed, prescriptive CLI workflows. It explicitly documents which commands send images to fal.ai (generate, edit, remove-bg, upscale) and which are local-only. This is appropriate, but important: user images and prompts will be uploaded to fal.ai by those commands and cost money. No instructions appear to request unrelated system files or credentials.

ℹ Install Mechanism

The skill bundle itself is instruction-only (no install spec in registry), which is low-risk. The SKILL.md recommends installing picture-it via npm (public registry). That is a normal install path, but npm packages execute code on install — users should verify the package and GitHub source before installing globally. Minor inconsistency: registry metadata listed 'No install spec' while SKILL.md includes an 'openclaw.install' block recommending npm.

✓ Credentials

Only one credential is requested (FAL_KEY), which is proportionate for a tool that calls fal.ai. The skill documents using either environment variable or CLI config (~/.picture-it/config.json with 0600). There are no unrelated secrets requested.

✓ Persistence & Privilege

always:false and standard agent invocation settings are used. The skill does not request system-wide modifications or other skills' credentials. Storing auth in the tool's own config file is expected behavior.

Version history

v1.0.5

- Documentation updated with strong guidance to think, clarify user intent, and minimize costs before generating images.
- Added new "Before You Generate Anything — Think First" section to help users plan and select the most efficient workflow.
- Expanded recommendations for asking clarifying questions and providing the user 2–3 creative direction options before proceeding.
- Emphasized pipeline planning and FAL call cost awareness to reduce waste.
- No code or feature changes; documentation improvements only.

v1.0.4

- Added prompt library: Included references/prompt-library.md with a collection of tested image generation and editing prompts.
- Documentation update: SKILL.md now references the prompt library and provides concise rules for writing effective prompts.
- Enhanced prompt guidance: Added new section in SKILL.md on prompt-writing for better results in generation, background replacement, text-behind-subject, and editing tasks.
- No code or command changes; documentation and prompt resources only.

v1.0.3

- Updated skill metadata to version 0.2.1.
- Added detailed OpenClaw configuration for environment variables, binaries, and install instructions.
- Removed legacy version field and replaced with updated metadata structure.
- No other visible functional or documentation changes.

v1.0.2

- Compatibility updated: Now requires Node.js 18+ and the picture-it CLI, instead of Bun.
- `bun` removed as a required dependency; replaced with `node`.
- Install instructions now default to `npm`, with `bun` as an alternative.
- Compatibility and prerequisites sections clarified for Node.js usage.
- No changes to features, commands, or workflows.

v1.0.1

picture-it 1.0.1 Changelog

- Added compatibility and privacy information, including required dependencies (Bun 1.3+, picture-it CLI, FAL_KEY) and what data is sent to fal.ai.
- Updated installation instructions to include Bun, pnpm, and npm options.
- Clarified credential management: recommend environment variable for FAL key; warn not to paste API keys into chat.
- Included license, author, and homepage/source links in the metadata.
- Specified which commands transmit user images to fal.ai and which operate locally.
- No functional or CLI command changes; documentation improvements only.

v1.0.0

First version

Metadata

Slug picture-it

Version 1.0.5

License MIT-0

Total installs 0

Current installs 0

Number of versions 6

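FAQ

What is Picture it!?

Generate and edit images from the CLI using picture-it. Use this skill whenever the user asks to create, edit, or manipulate images — blog headers, social ca... It is an AI agent skill plugin for Claude Code / OpenClaw, with 163 total downloads to date.

How do I install Picture it!?

Run the command "/install picture-it" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Picture it! free?

Yes. Picture it! is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Picture it! support?

Picture it! is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Picture it!?

It is developed and maintained by Geon George (@geongeorge); the current version is v1.0.5.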
