Description

使用 Cloudflare Workers AI 生成图片或语音。触发条件： - 文生图："生成图片"、"文生图"、"text-to-image"、"AI 作图"、"帮我画" - TTS："文字转语音"、"TTS"、"读出来"、"语音合成"、"text-to-speech

README (SKILL.md)

Cloudflare Workers AI — 图片 & 语音生成

Name: cloudflare-media
Author: n0nsense11

凭证配置

优先从 skills/cloudflare-media/config.json 或 MEMORY.md 读取 Account ID 和 API Token，缺失则询问用户。

第一部分：文生图（Text-to-Image）

可选模型一览（共10个）

#	模型	模型 ID	简介	价格	传输方式
1	FLUX.2 klein 4B	`@cf/black-forest-labs/flux-2-klein-4b`	高速蒸馏版，4B参数，实时预览	$0.000059/tile	multipart
2	FLUX.2 klein 9B	`@cf/black-forest-labs/flux-2-klein-9b`	增强质量版，9B参数	$0.015/first MP	multipart
3	FLUX.2 dev	`@cf/black-forest-labs/flux-2-dev`	最高质量，开放权重	$0.00021/tile/step	multipart
4	FLUX.1 schnell	`@cf/black-forest-labs/flux-1-schnell`	12B参数，最快4步生成，适合批量	$0.000053/tile	JSON body
5	SDXL-Lightning	`@cf/bytedance/stable-diffusion-xl-lightning`	极快文生图，几步完成，Beta	$0.00/step	JSON body
6	DreamShaper 8 LCM	`@cf/lykon/dreamshaper-8-lcm`	强逼真写实风格，不牺牲创意范围	免费	JSON body
7	Leonardo Lucid Origin	`@cf/leonardo/lucid-origin`	强提示跟随，支持文字渲染	$0.007/tile	JSON body
8	Leonardo Phoenix 1.0	`@cf/leonardo/phoenix-1.0`	最佳文字生成，提示 adherence 最强	$0.0058/tile	JSON body

1. FLUX.2 klein 4B / 9B / dev（Black Forest Labs）

特点： 高速/高质量/最高质量三档，multipart/form-data 传输

参数	必填	默认	说明
prompt	✅	—	图片描述文本
width	❌	1024	宽度 256~1024（64倍数）
height	❌	1024	高度 256~1024（64倍数）
steps	❌	—	步数（参考值25）

curl -X POST "https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/black-forest-labs/flux-2-klein-4b" \
  -H "Authorization: Bearer {TOKEN}" \
  -F "prompt=a sunset over the ocean" \
  -F "width=1024" -F "height=1024"

返回： {"result":{"image":"base64..."}} → 保存为 .png

2. FLUX.1 schnell（Black Forest Labs）

特点： 12B 参数，极快（默认4步），适合批量生成，JSON body

参数	必填	默认	说明
prompt	✅	—	图片描述（最长2048字符）
steps	❌	4	步数（1~8，越高越慢）

curl -X POST "https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/black-forest-labs/flux-1-schnell" \
  -H "Authorization: Bearer {TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a cyberpunk cat","steps":4}'

返回： {"image":"base64..."} → 保存为 .jpg

3. SDXL-Lightning（ByteDance）Beta

特点： 极快几步生成，支持 img2img，输出为原始 JPEG 二进制流

参数	必填	默认	说明
prompt	✅	—	图片描述
negative_prompt	❌	—	反向提示词
width	❌	1024	宽度 256~2048
height	❌	1024	高度 256~2048
num_steps	❌	20	步数（1~20）
guidance	❌	7.5	提示跟随度
strength	❌	1	img2img 强度（0~1）
seed	❌	—	随机种子
image / image_b64	❌	—	img2img 输入图（数组或base64）
mask / mask_b64	❌	—	inpainting mask

curl -X POST "https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/bytedance/stable-diffusion-xl-lightning" \
  -H "Authorization: Bearer {TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a cyberpunk cat","num_steps":10}'

返回： 原始 JPEG 二进制流 → 保存为 .jpg

4. DreamShaper 8 LCM（lykon）

特点： 强逼真写实风格，LCM 加速，支持 img2img + inpainting，参数同 SDXL-Lightning

参数	必填	默认	说明
prompt	✅	—	图片描述
negative_prompt	❌	—	反向提示词
width	❌	1024	宽度 256~2048
height	❌	1024	高度 256~2048
num_steps	❌	20	步数（1~20）
guidance	❌	7.5	提示跟随度
strength	❌	1	img2img 强度（0~1）
seed	❌	—	随机种子
image / image_b64	❌	—	img2img 输入图
mask / mask_b64	❌	—	inpainting mask

curl -X POST "https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/lykon/dreamshaper-8-lcm" \
  -H "Authorization: Bearer {TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a realistic photo of a cat","num_steps":8}'

返回： 原始 JPEG 二进制流 → 保存为 .jpg

5. Leonardo Lucid Origin

参数	必填	默认	说明
prompt	✅	—	图片描述
width	❌	1120	宽度 0~2500
height	❌	1120	高度 0~2500
guidance	❌	4.5	提示跟随度（0~10）
num_steps	❌	—	步数（1~40）
seed	❌	—	随机种子

返回： {"result":{"image":"base64..."}} → 保存为 .png

6. Leonardo Phoenix 1.0

参数	必填	默认	说明
prompt	✅	—	图片描述
width	❌	1024	宽度 0~2048
height	❌	1024	高度 0~2048
guidance	❌	2	提示跟随度（2~10）
num_steps	❌	25	步数（1~50）
negative_prompt	❌	—	反向提示词
seed	❌	—	随机种子

返回： 原始 JPEG 二进制流 → 保存为 .jpg

第二部分：TTS（Text-to-Speech）

可选模型一览（共4个）

#	模型	模型 ID	简介	价格
1	Deepgram Aura-2 英语	`@cf/deepgram/aura-2-en`	40个声音，上下文感知，自然停顿表达	$0.03/1k字符
2	Deepgram Aura-2 西班牙语	`@cf/deepgram/aura-2-es`	同上，专为西班牙语优化	$0.03/1k字符
3	Deepgram Aura-1	`@cf/deepgram/aura-1`	12个声音，Aura-2 低配版，半价	$0.015/1k字符
4	MyShell MeloTTS	`@cf/myshell-ai/melotts`	多语言（en/es/fr/zh/ja/ko），费用最低	$0.0002/分钟

Deepgram Aura 声音列表

Aura-2（40个）： amalthea, andromeda, apollo, arcas, aries, asteria, athena, atlas, aurora, callista, cora, cordelia, delia, draco, electra, harmonia, helena, hera, hermes, hyperion, iris, janus, juno, jupiter, luna, mars, minerva, neptune, odysseus, ophelia, orion, orpheus, pandora, phoebe, pluto, saturn, thalia, theia, vesta, zeus

默认声音：luna（女声，温暖）

Aura-1（12个）： angus, asteria, arcas, orion, orpheus, athena, luna, zeus, perseus, helios, hera, stella

默认声音：angus（男声）

Deepgram Aura 参数

参数	必填	默认	说明
text	✅	—	要转语音的文本
speaker	❌	luna/angus	声音名称
encoding	❌	mp3	编码：linear16/flac/mulaw/alaw/mp3/opus/aac
sample_rate	❌	—	采样率（Hz）
bit_rate	❌	—	比特率（bps）

curl -X POST "https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/deepgram/aura-2-en" \
  -H "Authorization: Bearer {TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world","speaker":"luna"}'

返回： 原始 MP3 二进制流 → 保存为 .mp3

MyShell MeloTTS 参数

参数	必填	默认	说明
prompt	✅	—	要转语音的文本
lang	❌	en	语言：en/es/fr/zh/ja/ko

curl -X POST "https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/myshell-ai/melotts" \
  -H "Authorization: Bearer {TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Hello world","lang":"en"}'

返回： {"result":{"audio":"base64..."}}（实测为 WAV 16bit 44100Hz PCM）→ 保存为 .wav

第三部分：交互流程

步骤一：检查凭证

读取 skills/cloudflare-media/config.json 或 MEMORY.md，缺失则询问用户。

步骤二：展示模型选项

用户提出请求后立即展示所有可选模型：

文生图（8个模型）：

🎨 文生图 — 可选模型（共8个）

1️⃣ FLUX.2 klein 4B（推荐快速预览）
   高速蒸馏版，4B参数，实时交互

2️⃣ FLUX.2 klein 9B
   增强质量版，9B参数，更细腻

3️⃣ FLUX.2 dev
   最高输出质量，开放权重

4️⃣ FLUX.1 schnell
   12B参数，极快（4步），适合批量

5️⃣ SDXL-Lightning（Beta）
   极快几步生成，ByteDance出品

6️⃣ DreamShaper 8 LCM
   强逼真写实风格

7️⃣ Leonardo Lucid Origin
   强提示跟随，支持文字渲染

8️⃣ Leonardo Phoenix 1.0
   最佳文字生成

请提供：模型编号（默认1）+ 图片描述 + 尺寸（可选）

TTS（4个模型）：

🔊 TTS — 可选模型（共4个）

1️⃣ Deepgram Aura-2 英语（推荐）
   40个声音，上下文感知，自然表达
   默认声音：luna

2️⃣ Deepgram Aura-2 西班牙语
   同上，专为西班牙语优化

3️⃣ Deepgram Aura-1
   12个声音，半价

4️⃣ MyShell MeloTTS
   多语言（en/es/fr/zh/ja/ko），最便宜

请提供：模型编号（默认1）+ 要朗读的文本 + 声音（可选）

步骤三：生成并发送

webchat：图片用 image 展示，音频用 tts 工具发送
其他 channel：走对应平台接口

第四部分：注意事项

免费额度：SDXL-Lightning / DreamShaper 均为 $0.00（免费），FLUX/Leonardo 按 tile 计费
输出格式实测：
- Deepgram Aura → 原始 MP3 二进制
- MeloTTS → base64 WAV（不是 MP3！）
- Phoenix 1.0 / SDXL-Lightning / DreamShaper → 原始 JPEG 二进制流
img2img/inpainting：SDXL-Lightning 和 DreamShaper 支持图生图，需要额外提供 image/mask 数据，skill 当前版本暂不支持图片输入作为参数传递

Usage Guidance

This skill appears to implement Cloudflare Workers AI calls for images and TTS, but its instructions expect your Cloudflare Account ID and API Token to be read from skills/cloudflare-media/config.json or MEMORY.md even though the registry declares no credentials. Before installing: - Do not store global or high-privilege secrets in MEMORY.md; inspect that file's contents first. Prefer providing a scoped API token at runtime rather than persisting it in shared memory. - Create a Cloudflare API token with the minimal scopes required (limit to AI/model usage or the least-privilege equivalent) and avoid using your full account key. - Review the skill's config file (skills/cloudflare-media/config.json) before allowing it to be read; keep credentials in a dedicated secret store if possible. - Be aware the skill is allowed to Exec and perform curl calls; avoid enabling autonomous/always-on behavior if you don't want it to invoke without each explicit approval. If you want a stronger assurance, ask the skill author to (1) declare required credentials in registry metadata, (2) avoid reading MEMORY.md, and (3) document exact token scopes needed. If the author cannot provide that, treat the skill as higher-risk and only use with ephemeral, scoped tokens.

Capability Assessment

⚠ Purpose & Capability

Name/description claim Cloudflare Workers AI for image and TTS generation and the SKILL.md contains curl examples against Cloudflare AI endpoints — that's coherent. However, the skill's registry metadata lists no required env vars or primary credential, while the runtime instructions explicitly expect an Account ID and API Token (from skills/cloudflare-media/config.json or MEMORY.md, or by prompting the user). That mismatch (declared no credentials vs instructions needing credentials) is an inconsistency.

⚠ Instruction Scope

The SKILL.md instructs the agent to read credentials from skills/cloudflare-media/config.json and MEMORY.md (local files) and to run curl/exec commands. Asking to read MEMORY.md is notable because memory may contain unrelated secrets or context; instructions therefore expand scope beyond the stated API usage and grant the skill broad discretion to access local agent state.

✓ Install Mechanism

This is an instruction-only skill with no install spec and no code files, so it doesn't write or download artifacts during installation — low install-surface risk.

⚠ Credentials

The skill needs a Cloudflare Account ID and API Token according to SKILL.md, but none are declared in the registry metadata (requires.env/primary credential). It also prefers reading local files for credentials (skills/.../config.json and MEMORY.md). Requiring an account-level token is reasonable for Cloudflare API calls, but the lack of declaration plus the instruction to read MEMORY.md is disproportionate and could expose other secrets if MEMORY.md contains them.

ℹ Persistence & Privilege

always:false (good). The skill requests permissions to Read/Write/Edit/Exec in allowed-tools, which is consistent with running curl and saving outputs, but Exec + Read access means it can run commands and read local files when invoked. That capability is expected for a runtime that shells out, but it raises risk if you permit the skill broad autonomous use or store sensitive data in the referenced files.

Version History

v1.0.0

Cloudflare-media 2.2.0 大幅升级，支持多模型文生图与 TTS 语音生成： - 全面整理和扩展图片生成（文生图）与文本转语音（TTS）模型清单，详细参数、用法、费用说明。 - 支持 8 个主流文生图模型（包括 FLUX、SDXL-Lightning、DreamShaper、Leonardo Phoenix/Lucid Origin）。 - 支持 4 个 TTS 模型（Deepgram Aura-2/Aura-1、MyShell MeloTTS），含多语种与声音选择。 - 丰富用户交互指引：请求后自动展示模型、参数和补充说明。 - 明确凭证读取逻辑（config.json 或 MEMORY.md），缺失时主动提醒用户补充。 - 输出格式、适用渠道等使用细节详细说明。

Metadata

Slug cloudflare-media

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is cloudflare-media?

使用 Cloudflare Workers AI 生成图片或语音。触发条件： - 文生图："生成图片"、"文生图"、"text-to-image"、"AI 作图"、"帮我画" - TTS："文字转语音"、"TTS"、"读出来"、"语音合成"、"text-to-speech. It is an AI Agent Skill for Claude Code / OpenClaw, with 95 downloads so far.

How do I install cloudflare-media?

Run "/install cloudflare-media" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is cloudflare-media free?

Yes, cloudflare-media is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does cloudflare-media support?

cloudflare-media is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created cloudflare-media?

It is built and maintained by n0nsense (@n0nsense11); the current version is v1.0.0.

More Skills

cloudflare-media

Cloudflare Workers AI — 图片 & 语音生成

凭证配置

第一部分：文生图（Text-to-Image）

可选模型一览（共10个）

1. FLUX.2 klein 4B / 9B / dev（Black Forest Labs）

2. FLUX.1 schnell（Black Forest Labs）

3. SDXL-Lightning（ByteDance）Beta

4. DreamShaper 8 LCM（lykon）

5. Leonardo Lucid Origin

6. Leonardo Phoenix 1.0

第二部分：TTS（Text-to-Speech）

可选模型一览（共4个）

Deepgram Aura 声音列表

Deepgram Aura 参数

MyShell MeloTTS 参数

第三部分：交互流程

步骤一：检查凭证

步骤二：展示模型选项

步骤三：生成并发送

第四部分：注意事项

What is cloudflare-media?

How do I install cloudflare-media?

Is cloudflare-media free?

Which platforms does cloudflare-media support?

Who created cloudflare-media?

💬 Comments