Multimodal Content Creator

Name: Multimodal Content Creator
Author: terrycarter1985

By terrycarter1985 · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install multimodal-content-creator

Description

Multimodal content creation workflow — receive WhatsApp messages (text or voice), transcribe audio via Whisper, generate images with DALL-E 3, and reply auto...

Usage Instructions (SKILL.md)

Multimodal Content Creator

A WhatsApp-powered content creation workflow that lets customers send text or voice messages and receive AI-generated images in return.

How It Works

Receive a WhatsApp message (text or voice note)
Transcribe voice notes using OpenAI Whisper
Generate an image from the prompt using DALL-E 3
Reply with the generated image back to the customer
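The four steps above can be sketched as a single function that takes each stage as a callable. This is a minimal illustration, not the package's actual `workflow.py`: the `IncomingMessage` shape and the `transcribe`, `generate_image`, and `send_image` parameters are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical message shape; the real wacli.py format is not shown."""
    sender: str
    text: Optional[str] = None
    audio_path: Optional[str] = None


def handle_message(
    msg: IncomingMessage,
    transcribe: Callable[[str], str],
    generate_image: Callable[[str], str],
    send_image: Callable[[str, str], None],
) -> Optional[str]:
    """Run one message through transcribe -> generate -> reply.

    Returns the prompt that was used, or None if the message was skipped.
    """
    # Step 2: voice notes are transcribed; text messages are used directly.
    if msg.audio_path:
        prompt = transcribe(msg.audio_path)
    else:
        prompt = msg.text or ""
    prompt = prompt.strip()
    if not prompt:
        return None  # nothing usable to turn into an image

    # Step 3: turn the prompt into an image reference (URL or file path).
    image_ref = generate_image(prompt)

    # Step 4: send the image back to the original sender.
    send_image(msg.sender, image_ref)
    return prompt
```

Passing the stages in as callables keeps the sketch testable without network access; real implementations would wrap the Whisper, DALL-E 3, and WhatsApp clients.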

Prerequisites

OpenAI API key set as OPENAI_API_KEY environment variable
WhatsApp CLI authentication (python wacli.py login <token>)
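A script can fail fast when a prerequisite is missing rather than erroring midway through a batch. The helper below is a small sketch of such a check; `require_env` is a hypothetical name and is not taken from the package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the scripts")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at startup surfaces a missing key before any WhatsApp messages are read.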

Usage

# Process all unread WhatsApp messages
python scripts/workflow.py process-all

# Generate a single image
python scripts/generate_images.py "a cat riding a skateboard"

# Batch generate from prompts file
python scripts/generate_images.py prompts.txt

# Transcribe an audio file
python scripts/transcribe.py recording.mp3
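The usage above passes either a literal prompt or a prompts file to the same script. One way such an argument could be resolved is sketched below. The function name `load_prompts` and the one-prompt-per-line file format are assumptions, since the actual behaviour of `generate_images.py` is not shown.

```python
from pathlib import Path
from typing import List


def load_prompts(arg: str) -> List[str]:
    """Treat arg as a prompts file if it exists, otherwise as a single prompt.

    Assumes one prompt per line; blank lines are skipped.
    """
    try:
        is_file = Path(arg).is_file()
    except OSError:
        # Very long prompts can exceed filesystem name limits; treat as text.
        is_file = False
    if is_file:
        lines = Path(arg).read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return [arg]
```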

Files

scripts/workflow.py — Main orchestration script
scripts/generate_images.py — DALL-E 3 image generation
scripts/transcribe.py — Whisper audio transcription (with chunking for large files)
scripts/wacli.py — WhatsApp CLI client
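The chunking mentioned for `transcribe.py` is not described further. One common approach, shown here only as an assumption, is to split a long recording into fixed-length time spans with a small overlap so that words at the boundaries are not cut off. `plan_chunks` and its default values are hypothetical.

```python
from typing import List, Tuple


def plan_chunks(
    duration_s: float,
    chunk_s: float = 600.0,
    overlap_s: float = 2.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) spans in seconds that cover the whole recording."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    spans: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        # Step back slightly so consecutive chunks share a short overlap.
        start = end - overlap_s
    return spans
```

Each span would then be cut from the source audio and transcribed separately, with the resulting texts joined in order.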

Security Recommendations

Review the package before installing. If you only want the WhatsApp/OpenAI content workflow, remove the unrelated AGENTS.md/SOUL.md/USER.md/HEARTBEAT.md files and the nested agent-browser skill, use dedicated OpenAI and WhatsApp credentials, and add a dry-run or approval step before sending real customer replies.
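The dry-run or approval step recommended above could take the form of a thin wrapper around whatever function sends the reply. The sketch below is an illustration only: `send_with_approval`, its parameters, and the returned status strings are hypothetical and not part of the package.

```python
from typing import Callable


def _deny_all(recipient: str, image_ref: str) -> bool:
    """Default approver: reject everything until a real one is supplied."""
    return False


def send_with_approval(
    recipient: str,
    image_ref: str,
    send: Callable[[str, str], None],
    dry_run: bool = True,
    approve: Callable[[str, str], bool] = _deny_all,
) -> str:
    """Send a reply only when dry_run is off and the approver agrees."""
    if dry_run:
        # Report what would happen without contacting the customer.
        return "dry-run"
    if not approve(recipient, image_ref):
        return "rejected"
    send(recipient, image_ref)
    return "sent"
```

Defaulting to `dry_run=True` and a deny-all approver means a misconfigured run sends nothing, which is the safer failure mode for customer-facing replies.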

Functional Analysis

Type: OpenClaw Skill
Name: multimodal-content-creator
Version: 1.0.0

The skill bundle provides a legitimate multimodal workflow for processing WhatsApp messages, transcribing audio via OpenAI Whisper, and generating images via DALL-E 3. The Python scripts (workflow.py, generate_images.py, transcribe.py) implement standard API integrations with basic input sanitization and error handling. The extensive documentation (AGENTS.md, SOUL.md, etc.) establishes a functional persona and safety boundaries for the OpenClaw agent, explicitly prohibiting data exfiltration and unauthorized external actions. No evidence of malicious intent, data theft, or harmful prompt injection was found.

Capability Tags

requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The core scripts match the stated WhatsApp → Whisper → DALL-E → reply workflow, but the package also includes broad agent persona, memory, heartbeat, and nested browser-automation skill files that are not explained by the content-creation purpose.

⚠ Instruction Scope

AGENTS.md and related workspace files instruct the agent to treat the folder as its home, maintain memory, be proactive, and commit/push changes, which is much broader than a user-invoked content workflow.

⚠ Install Mechanism

No automatic install script is shown, but the package includes Python requirements plus an unexpected nested agent-browser skill with separate global npm/Chromium install instructions.

⚠ Credentials

OpenAI and WhatsApp access are expected for the workflow, but customer-facing automatic replies and bundled unrelated agent-control instructions increase the impact beyond a simple generation utility.

⚠ Persistence & Privilege

The artifacts include local WhatsApp token storage and, separately, instructions for persistent MEMORY.md/daily memory files and heartbeat-based background activity.

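How to Use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install multimodal-content-creator
Once installed, invoke the Skill by name or trigger it with /multimodal-content-creator
Provide the inputs described in the Skill's parameter documentation to get structured output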

Version History

v1.0.0

Initial release: WhatsApp → Whisper → DALL-E 3 → Reply workflow

Metadata

Slug multimodal-content-creator

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

FAQ