Multimodal Content Creator

by terrycarter1985 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
33 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install multimodal-content-creator
Description
Multimodal content creation workflow — receive WhatsApp messages (text or voice), transcribe audio via Whisper, generate images with DALL-E 3, and reply auto...
README (SKILL.md)

Multimodal Content Creator

A WhatsApp-powered content creation workflow that lets customers send text or voice messages and receive AI-generated images in return.

How It Works

  1. Receive a WhatsApp message (text or voice note)
  2. Transcribe voice notes using OpenAI Whisper
  3. Generate an image from the prompt using DALL-E 3
  4. Reply with the generated image back to the customer
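The four steps above can be sketched as a single function. This is a minimal illustration, not the code in scripts/workflow.py: the `IncomingMessage` type and the `transcribe`, `generate_image`, and `send_image` callables are hypothetical stand-ins for the Whisper, DALL-E 3, and WhatsApp calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical shape of one received WhatsApp message."""
    sender: str
    text: Optional[str] = None        # set for text messages
    audio_path: Optional[str] = None  # set for voice notes


def handle_message(
    msg: IncomingMessage,
    transcribe: Callable[[str], str],        # assumed wrapper around Whisper
    generate_image: Callable[[str], str],    # assumed wrapper around DALL-E 3
    send_image: Callable[[str, str], None],  # assumed wrapper around the WhatsApp client
) -> str:
    """Turn one message into an image reply and return the image reference."""
    # Step 2: voice notes are transcribed; text messages are used as-is.
    if msg.audio_path is not None:
        prompt = transcribe(msg.audio_path)
    else:
        prompt = msg.text or ""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("message has no usable prompt")
    # Step 3: generate the image from the prompt.
    image_ref = generate_image(prompt)
    # Step 4: reply to the original sender.
    send_image(msg.sender, image_ref)
    return image_ref
```

Passing the three services in as callables keeps the control flow testable without network access or real credentials.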

Prerequisites

  • OpenAI API key set as OPENAI_API_KEY environment variable
  • WhatsApp CLI authentication (python wacli.py login <token>)
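A quick pre-flight check for the first prerequisite might look like the sketch below. The function name `missing_prerequisites` is hypothetical and not part of the package. Only the environment variable is checked, because the listing does not say where wacli.py stores the WhatsApp token.

```python
import os
from typing import Mapping, Optional


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the names of required environment variables that are unset or blank."""
    env = os.environ if env is None else env
    missing = []
    # OPENAI_API_KEY is the only variable the README names.
    if not env.get("OPENAI_API_KEY", "").strip():
        missing.append("OPENAI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        raise SystemExit("Missing prerequisites: " + ", ".join(problems))
    print("Environment looks ready.")
```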

Usage

# Process all unread WhatsApp messages
python scripts/workflow.py process-all

# Generate a single image
python scripts/generate_images.py "a cat riding a skateboard"

# Batch generate from prompts file
python scripts/generate_images.py prompts.txt

# Transcribe an audio file
python scripts/transcribe.py recording.mp3
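The commands above show generate_images.py accepting either a literal prompt or a prompts file. The listing does not say how the script tells the two apart, so the sketch below assumes one plausible rule: if the argument names an existing file, read one prompt per non-empty line; otherwise treat the argument itself as the prompt. `load_prompts` is a hypothetical helper.

```python
from pathlib import Path


def load_prompts(arg: str) -> list:
    """Resolve a CLI argument into a list of prompts (assumed behaviour)."""
    path = Path(arg)
    if path.is_file():
        # Batch mode: one prompt per line, blank lines ignored.
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    # Single mode: the argument is the prompt.
    return [arg]
```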

Files

  • scripts/workflow.py — Main orchestration script
  • scripts/generate_images.py — DALL-E 3 image generation
  • scripts/transcribe.py — Whisper audio transcription (with chunking for large files)
  • scripts/wacli.py — WhatsApp CLI client
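The listing notes that transcribe.py chunks large files but does not describe how. One common approach, shown here purely as an assumption, is to split the recording into fixed-length time windows, transcribe each window, and join the text. The hypothetical `plan_chunks` helper computes only the windows; cutting the audio and calling Whisper are left out.

```python
def plan_chunks(duration_s: float, chunk_s: float = 600.0) -> list:
    """Split [0, duration_s) into consecutive (start, end) windows of at most chunk_s seconds."""
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    windows = []
    start = 0.0
    while start < duration_s:
        # The final window is shortened so it never runs past the end.
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        start = end
    return windows
```

For a 25-minute recording with the default 10-minute window, this yields two full windows and one 5-minute remainder.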
Usage Guidance
Review the package before installing. If you only want the WhatsApp/OpenAI content workflow, remove the unrelated AGENTS.md/SOUL.md/USER.md/HEARTBEAT.md files and the nested agent-browser skill, use dedicated OpenAI and WhatsApp credentials, and add a dry-run or approval step before sending real customer replies.
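The dry-run or approval step recommended above could take roughly the following shape. This is a sketch under stated assumptions: `send_reply`, `send`, and `approve` are hypothetical names, and the package is not known to ship such a gate.

```python
from typing import Callable, Optional


def send_reply(
    recipient: str,
    image_ref: str,
    send: Callable[[str, str], None],  # assumed wrapper around the WhatsApp client
    dry_run: bool = True,
    approve: Optional[Callable[[str, str], bool]] = None,
) -> str:
    """Send a reply only when dry-run is off and the optional approver agrees."""
    if dry_run:
        # Default to the safe path: nothing leaves the machine.
        return "dry-run"
    if approve is not None and not approve(recipient, image_ref):
        return "rejected"
    send(recipient, image_ref)
    return "sent"
```

Defaulting `dry_run` to True means a real customer message is sent only when the caller opts in explicitly.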
Capability Analysis
Type: OpenClaw Skill
Name: multimodal-content-creator
Version: 1.0.0

The skill bundle provides a legitimate multimodal workflow for processing WhatsApp messages, transcribing audio via OpenAI Whisper, and generating images via DALL-E 3. The Python scripts (workflow.py, generate_images.py, transcribe.py) implement standard API integrations with basic input sanitization and error handling. The extensive documentation (AGENTS.md, SOUL.md, etc.) establishes a functional persona and safety boundaries for the OpenClaw agent, explicitly prohibiting data exfiltration and unauthorized external actions. No evidence of malicious intent, data theft, or harmful prompt injection was found.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The core scripts match the stated WhatsApp → Whisper → DALL-E → reply workflow, but the package also includes broad agent persona, memory, heartbeat, and nested browser-automation skill files that are not explained by the content-creation purpose.
Instruction Scope
AGENTS.md and related workspace files instruct the agent to treat the folder as its home, maintain memory, be proactive, and commit/push changes, which is much broader than a user-invoked content workflow.
Install Mechanism
No automatic install script is shown, but the package includes Python requirements plus an unexpected nested agent-browser skill with separate global npm/Chromium install instructions.
Credentials
OpenAI and WhatsApp access are expected for the workflow, but customer-facing automatic replies and bundled unrelated agent-control instructions increase the impact beyond a simple generation utility.
Persistence & Privilege
The artifacts include local WhatsApp token storage and, separately, instructions for persistent MEMORY.md/daily memory files and heartbeat-based background activity.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install multimodal-content-creator
  3. After installation, invoke the skill by name or use /multimodal-content-creator
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: WhatsApp → Whisper → DALL-E 3 → Reply workflow
Metadata
Slug multimodal-content-creator
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Multimodal Content Creator?

Multimodal content creation workflow — receive WhatsApp messages (text or voice), transcribe audio via Whisper, generate images with DALL-E 3, and reply auto... It is an AI Agent Skill for Claude Code / OpenClaw, with 33 downloads so far.

How do I install Multimodal Content Creator?

Run "/install multimodal-content-creator" in the OpenClaw or Claude Code chat. Before running the workflow, set the OPENAI_API_KEY environment variable and complete the WhatsApp CLI login listed under Prerequisites.

Is Multimodal Content Creator free?

Yes. Multimodal Content Creator is licensed under MIT-0, so you can download, install, and use it at no cost.

Which platforms does Multimodal Content Creator support?

Multimodal Content Creator is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Multimodal Content Creator?

It is built and maintained by terrycarter1985 (@terrycarter1985); the current version is v1.0.0.
