Multimodal Content Creator

by terrycarter1985 · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install multimodal-content-creator

Description

Multimodal content creation workflow — receive WhatsApp messages (text or voice), transcribe audio via Whisper, generate images with DALL-E 3, and reply auto...

README (SKILL.md)

Multimodal Content Creator

A WhatsApp-powered content creation workflow that lets customers send text or voice messages and receive AI-generated images in return.

How It Works

Receive a WhatsApp message (text or voice note)
Transcribe voice notes using OpenAI Whisper
Generate an image from the prompt using DALL-E 3
Reply with the generated image back to the customer
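The four steps above can be sketched as a single function. This is a minimal illustration, not the code in scripts/workflow.py: the IncomingMessage type and the three injected callables (transcribe, generate_image, send_image) are hypothetical names chosen so the flow can be read and tested without real API calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical shape of one WhatsApp message; not the skill's real type."""
    sender: str
    text: Optional[str] = None        # set for text messages
    audio_path: Optional[str] = None  # set for voice notes


def handle_message(
    msg: IncomingMessage,
    transcribe: Callable[[str], str],        # audio path -> transcript (e.g. Whisper)
    generate_image: Callable[[str], str],    # prompt -> image reference (e.g. DALL-E 3)
    send_image: Callable[[str, str], None],  # (recipient, image reference) -> None
) -> str:
    """Run one message through the workflow and return the prompt that was used."""
    # Voice notes are transcribed first; text messages are used as-is.
    if msg.audio_path is not None:
        prompt = transcribe(msg.audio_path)
    else:
        prompt = msg.text or ""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("message contains no usable prompt")
    # Generate an image from the prompt, then reply to the original sender.
    image = generate_image(prompt)
    send_image(msg.sender, image)
    return prompt
```

Passing the three operations in as arguments keeps the control flow separate from the OpenAI and WhatsApp clients, so each stage can be replaced with a stub while testing.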

Prerequisites

OpenAI API key set as OPENAI_API_KEY environment variable
WhatsApp CLI authentication (python wacli.py login <token>)
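A quick pre-flight check can catch a missing API key before any message is processed. The sketch below is an assumption about how such a check might look; the helper name missing_env_vars is hypothetical, and only OPENAI_API_KEY comes from the prerequisites above.

```python
from typing import Mapping, Sequence


def missing_env_vars(
    env: Mapping[str, str],
    required: Sequence[str] = ("OPENAI_API_KEY",),
) -> list[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling missing_env_vars(os.environ) at startup returns an empty list when the key is present. The WhatsApp login is not covered here because the listing does not say where the token is stored.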

Usage

# Process all unread WhatsApp messages
python scripts/workflow.py process-all

# Generate a single image
python scripts/generate_images.py "a cat riding a skateboard"

# Batch generate from prompts file
python scripts/generate_images.py prompts.txt

# Transcribe an audio file
python scripts/transcribe.py recording.mp3
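The two generate_images.py invocations above take either a literal prompt or a prompts file as the same positional argument. One way a script could tell them apart is sketched below. This is an assumption, not the script's confirmed behavior: the load_prompts name, the one-prompt-per-line format, and the '#' comment convention are all hypothetical.

```python
from pathlib import Path


def load_prompts(arg: str) -> list[str]:
    """Return prompts from a file if `arg` names one, else treat `arg` as a prompt."""
    try:
        is_file = Path(arg).is_file()
    except (OSError, ValueError):
        is_file = False  # e.g. a prompt too long to be a valid path
    if not is_file:
        return [arg]
    lines = Path(arg).read_text(encoding="utf-8").splitlines()
    # Assumed format: one prompt per line; blank lines and '#' comments are skipped.
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]
```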

Files

scripts/workflow.py — Main orchestration script
scripts/generate_images.py — DALL-E 3 image generation
scripts/transcribe.py — Whisper audio transcription (with chunking for large files)
scripts/wacli.py — WhatsApp CLI client
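The listing notes that transcribe.py chunks large files but does not show how. A common approach is to split a recording into fixed-length time windows, transcribe each window, and join the results. The sketch below covers only the planning step; the plan_chunks name and the 600-second default are assumptions, not values taken from the script.

```python
def plan_chunks(duration_s: float, max_chunk_s: float = 600.0) -> list[tuple[float, float]]:
    """Split [0, duration_s] into consecutive (start, end) windows of at most max_chunk_s."""
    if duration_s < 0 or max_chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and max_chunk_s must be > 0")
    chunks: list[tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        # The final window is shortened so it never runs past the end of the audio.
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

A 25-minute recording with the default window yields three chunks, the last one five minutes long. Cutting on fixed boundaries can split a word in half, so a real implementation may prefer to cut at silences or overlap adjacent windows.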

Usage Guidance

Review the package before installing. If you only want the WhatsApp/OpenAI content workflow, remove the unrelated AGENTS.md/SOUL.md/USER.md/HEARTBEAT.md files and the nested agent-browser skill, use dedicated OpenAI and WhatsApp credentials, and add a dry-run or approval step before sending real customer replies.
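The dry-run or approval step recommended above could take the form of a small gate in front of the send call. This is a sketch under stated assumptions: send_reply, its parameters, and the returned status strings are hypothetical and do not exist in the skill as published.

```python
from typing import Callable


def send_reply(
    recipient: str,
    image: str,
    send: Callable[[str, str], None],
    dry_run: bool = True,
    approve: Callable[[str, str], bool] = lambda recipient, image: False,
) -> str:
    """Send `image` to `recipient` only when not in dry-run mode and approved.

    Returns "dry-run", "rejected", or "sent" so callers can log the outcome.
    """
    if dry_run:
        # Nothing leaves the machine; just report what would have happened.
        print(f"[dry-run] would send {image} to {recipient}")
        return "dry-run"
    if not approve(recipient, image):
        return "rejected"
    send(recipient, image)
    return "sent"
```

Both defaults are deliberately conservative: dry-run is on, and the default approver rejects everything, so a real customer reply requires two explicit opt-ins.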

Capability Analysis

Type: OpenClaw Skill
Name: multimodal-content-creator
Version: 1.0.0

The skill bundle provides a legitimate multimodal workflow for processing WhatsApp messages, transcribing audio via OpenAI Whisper, and generating images via DALL-E 3. The Python scripts (workflow.py, generate_images.py, transcribe.py) implement standard API integrations with basic input sanitization and error handling. The extensive documentation (AGENTS.md, SOUL.md, etc.) establishes a functional persona and safety boundaries for the OpenClaw agent, explicitly prohibiting data exfiltration and unauthorized external actions. No evidence of malicious intent, data theft, or harmful prompt injection was found.

Capability Tags

requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The core scripts match the stated WhatsApp → Whisper → DALL-E → reply workflow, but the package also includes broad agent persona, memory, heartbeat, and nested browser-automation skill files that are not explained by the content-creation purpose.

⚠ Instruction Scope

AGENTS.md and related workspace files instruct the agent to treat the folder as its home, maintain memory, be proactive, and commit/push changes, which is much broader than a user-invoked content workflow.

⚠ Install Mechanism

No automatic install script is shown, but the package includes Python requirements plus an unexpected nested agent-browser skill with separate global npm/Chromium install instructions.

⚠ Credentials

OpenAI and WhatsApp access are expected for the workflow, but customer-facing automatic replies and bundled unrelated agent-control instructions increase the impact beyond a simple generation utility.

⚠ Persistence & Privilege

The artifacts include local WhatsApp token storage and, separately, instructions for persistent MEMORY.md/daily memory files and heartbeat-based background activity.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install multimodal-content-creator
After installation, invoke the skill by name or use /multimodal-content-creator
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release: WhatsApp → Whisper → DALL-E 3 → Reply workflow

Metadata

Slug multimodal-content-creator

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Multimodal Content Creator?

Multimodal content creation workflow — receive WhatsApp messages (text or voice), transcribe audio via Whisper, generate images with DALL-E 3, and reply auto... It is an AI Agent Skill for Claude Code / OpenClaw, with 33 downloads so far.

How do I install Multimodal Content Creator?

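Run "/install multimodal-content-creator" in the OpenClaw or Claude Code chat to install it in one step. Before running the workflow, set the OPENAI_API_KEY environment variable and authenticate the WhatsApp CLI, as listed under Prerequisites.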

Is Multimodal Content Creator free?

Yes, Multimodal Content Creator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Multimodal Content Creator support?

Multimodal Content Creator is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Multimodal Content Creator?

It is built and maintained by terrycarter1985 (@terrycarter1985); the current version is v1.0.0.
