
Multimodal Content Creator

By terrycarter1985 · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 33 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install multimodal-content-creator
Description
Multimodal content creation workflow — receive WhatsApp messages (text or voice), transcribe audio via Whisper, generate images with DALL-E 3, and reply auto...
Usage instructions (SKILL.md)

Multimodal Content Creator

A WhatsApp-powered content creation workflow that lets customers send text or voice messages and receive AI-generated images in return.

How It Works

  1. Receive a WhatsApp message (text or voice note)
  2. Transcribe voice notes using OpenAI Whisper
  3. Generate an image from the prompt using DALL-E 3
  4. Reply with the generated image back to the customer
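
The four steps above can be sketched as a single function. This is a minimal illustration, not the contents of scripts/workflow.py: the `IncomingMessage` type and the three injected callables (`transcribe`, `generate_image`, `send_image`) are hypothetical names standing in for the Whisper, DALL-E 3, and WhatsApp calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical shape of one WhatsApp message; not taken from wacli.py."""
    sender: str
    text: Optional[str] = None        # set for text messages
    audio_path: Optional[str] = None  # set for voice notes


def handle_message(
    msg: IncomingMessage,
    transcribe: Callable[[str], str],
    generate_image: Callable[[str], str],
    send_image: Callable[[str, str], None],
) -> str:
    """Run steps 1-4 for one message and return the prompt that was used."""
    # Step 2: voice notes are transcribed; text messages are used directly.
    if msg.audio_path is not None:
        prompt = transcribe(msg.audio_path)
    else:
        prompt = msg.text or ""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("message contains no usable prompt")
    # Step 3: turn the prompt into an image reference (a path or URL).
    image_ref = generate_image(prompt)
    # Step 4: reply to the original sender.
    send_image(msg.sender, image_ref)
    return prompt
```

Passing the external services in as callables keeps the control flow testable without network access; stubs can replace all three during development.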

Prerequisites

  • OpenAI API key set as OPENAI_API_KEY environment variable
  • WhatsApp CLI authentication (python wacli.py login <token>)
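
A small pre-flight check can catch a missing API key before any message is processed. The sketch below only checks OPENAI_API_KEY, since the listing does not say where wacli.py stores its login token; the function name `missing_prerequisites` is an assumption for illustration.

```python
import os
from typing import List, Mapping


def missing_prerequisites(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    # An unset, empty, or whitespace-only key is treated as missing.
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Calling `missing_prerequisites()` at the start of a run and stopping when the list is non-empty avoids failing halfway through a batch of customer messages.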

Usage

# Process all unread WhatsApp messages
python scripts/workflow.py process-all

# Generate a single image
python scripts/generate_images.py "a cat riding a skateboard"

# Batch generate from prompts file
python scripts/generate_images.py prompts.txt

# Transcribe an audio file
python scripts/transcribe.py recording.mp3
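
The usage above shows generate_images.py accepting either a literal prompt or a prompts file. How the script tells them apart is not shown; one plausible approach, assumed here, is to treat the argument as a file when it names an existing file and to read one prompt per line. `resolve_prompts` is a hypothetical helper, not a function from the package.

```python
from pathlib import Path
from typing import List


def resolve_prompts(arg: str) -> List[str]:
    """Return prompts read from a file if `arg` names one, else `[arg]`."""
    try:
        is_file = Path(arg).is_file()
    except (OSError, ValueError):
        # Very long or unusual prompts can be invalid as paths; treat as text.
        is_file = False
    if is_file:
        lines = Path(arg).read_text(encoding="utf-8").splitlines()
        # Assumed file format: one prompt per line, blank lines ignored.
        return [line.strip() for line in lines if line.strip()]
    return [arg]
```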

Files

  • scripts/workflow.py — Main orchestration script
  • scripts/generate_images.py — DALL-E 3 image generation
  • scripts/transcribe.py — Whisper audio transcription (with chunking for large files)
  • scripts/wacli.py — WhatsApp CLI client
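
The listing says transcribe.py chunks large files but does not show how. OpenAI's transcription endpoint caps upload size (25 MB at the time of writing), so long recordings have to be split before upload. The sketch below plans time windows only; `plan_chunks`, the 600-second default, and the 2-second overlap are assumptions for illustration, and the actual audio slicing is left to whatever tool the script uses.

```python
from typing import List, Tuple


def plan_chunks(
    duration_s: float, chunk_s: float = 600.0, overlap_s: float = 2.0
) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into (start, end) windows of at most chunk_s seconds."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        # Step back slightly so words cut at a boundary appear in both chunks.
        start = end - overlap_s
    return windows
```

A small overlap reduces the chance of losing a word at a cut point, at the cost of some duplicated text that has to be reconciled when the per-chunk transcripts are joined.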
Safe usage recommendations
Review the package before installing. If you only want the WhatsApp/OpenAI content workflow, remove the unrelated AGENTS.md/SOUL.md/USER.md/HEARTBEAT.md files and the nested agent-browser skill, use dedicated OpenAI and WhatsApp credentials, and add a dry-run or approval step before sending real customer replies.
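
The dry-run or approval step recommended above could look like the following wrapper around whatever function actually sends the reply. All names here (`send_reply`, `send`, `approve`) are hypothetical; the package is not known to expose such a hook.

```python
from typing import Callable, Optional


def send_reply(
    recipient: str,
    image_ref: str,
    send: Callable[[str, str], None],
    dry_run: bool = True,
    approve: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Send an image reply only when not in dry-run mode and approval is given.

    Returns True if the reply was actually sent.
    """
    if dry_run:
        # Default is the safe path: report what would happen and send nothing.
        print(f"[dry-run] would send {image_ref} to {recipient}")
        return False
    if approve is not None and not approve(recipient, image_ref):
        return False
    send(recipient, image_ref)
    return True
```

Defaulting `dry_run` to True means a real customer message goes out only after an explicit opt-in.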
Functional analysis
Type: OpenClaw Skill
Name: multimodal-content-creator
Version: 1.0.0
The skill bundle provides a legitimate multimodal workflow for processing WhatsApp messages, transcribing audio via OpenAI Whisper, and generating images via DALL-E 3. The Python scripts (workflow.py, generate_images.py, transcribe.py) implement standard API integrations with basic input sanitization and error handling. The extensive documentation (AGENTS.md, SOUL.md, etc.) establishes a functional persona and safety boundaries for the OpenClaw agent, explicitly prohibiting data exfiltration and unauthorized external actions. No evidence of malicious intent, data theft, or harmful prompt injection was found.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
The core scripts match the stated WhatsApp → Whisper → DALL-E → reply workflow, but the package also includes broad agent persona, memory, heartbeat, and nested browser-automation skill files that are not explained by the content-creation purpose.
Instruction Scope
AGENTS.md and related workspace files instruct the agent to treat the folder as its home, maintain memory, be proactive, and commit/push changes, which is much broader than a user-invoked content workflow.
Install Mechanism
No automatic install script is shown, but the package includes Python requirements plus an unexpected nested agent-browser skill with separate global npm/Chromium install instructions.
Credentials
OpenAI and WhatsApp access are expected for the workflow, but customer-facing automatic replies and bundled unrelated agent-control instructions increase the impact beyond a simple generation utility.
Persistence & Privilege
The artifacts include local WhatsApp token storage and, separately, instructions for persistent MEMORY.md/daily memory files and heartbeat-based background activity.
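
Before installing, the unpacked package can be scanned for the files this assessment flags. The sketch below is an assumption-laden helper: the file names come from the review text, while the directory name `agent-browser` is a guess at how the nested skill is stored, and `find_flagged` is a hypothetical function.

```python
from pathlib import Path
from typing import List

# File names mentioned in the review; adjust after inspecting the real package.
FLAGGED_FILES = {"AGENTS.md", "SOUL.md", "USER.md", "HEARTBEAT.md", "MEMORY.md"}
# Assumed directory name for the nested browser-automation skill.
FLAGGED_DIRS = {"agent-browser"}


def find_flagged(root: str) -> List[str]:
    """List flagged files and directories under root, as POSIX-style relative paths."""
    base = Path(root)
    hits = []
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base).as_posix()
        if path.is_file() and path.name in FLAGGED_FILES:
            hits.append(rel)
        elif path.is_dir() and path.name in FLAGGED_DIRS:
            hits.append(rel + "/")
    return hits
```

An empty result only means none of these names were found; it does not replace reading the package contents.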
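How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install multimodal-content-creator
  3. Once installed, invoke the Skill by name or trigger it with /multimodal-content-creator
  4. Supply the inputs described in the Skill's parameter documentation to get structured output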
Version history
v1.0.0
Initial release: WhatsApp → Whisper → DALL-E 3 → Reply workflow
Metadata
Slug: multimodal-content-creator
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
Frequently asked questions

What is Multimodal Content Creator?

It is an AI Agent Skill plugin for Claude Code / OpenClaw that receives WhatsApp messages (text or voice), transcribes audio via Whisper, generates images with DALL-E 3, and replies to the sender. It has been downloaded 33 times in total.

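How do I install Multimodal Content Creator?

Run "/install multimodal-content-creator" in the OpenClaw or Claude Code chat box to install it in one step. Running the workflow afterwards still requires the OPENAI_API_KEY environment variable and the WhatsApp CLI login listed under Prerequisites.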

Is Multimodal Content Creator free?

Yes. Multimodal Content Creator is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Multimodal Content Creator support?

Multimodal Content Creator is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed Multimodal Content Creator?

It is developed and maintained by terrycarter1985 (@terrycarter1985). The current version is v1.0.0.
