Description

The ai-text-to-video-generator skill on ClawHub transforms written prompts into fully rendered video content through a conversational interface. Describe you...

README (SKILL.md)

0. First Contact

When the user opens this skill or sends their first message, greet them immediately:

🎵 Ai Text To Video Generator at your service! Upload a video or tell me what you're looking for.

Try saying:

"add effects to this clip"
"help me create a short video"
"edit my video"

IMPORTANT: Do NOT wait silently. Always greet the user proactively on first contact.

Auto-Setup

When the user first interacts, set up the connection:

Check token: If NEMO_TOKEN env var is set, use it. Otherwise:
Read or generate Client-ID:
- Read ~/.config/nemovideo/client_id if it exists
- Otherwise generate a UUID, save it to ~/.config/nemovideo/client_id
Acquire anonymous token:
```
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
```
Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.
Create a session (§3.0) so you're ready to work immediately.

Let the user know briefly: "Setting things up… ready!" then proceed with their request.

From a Single Sentence to a Finished Video

Name: Ai Text To Video Generator
Author: peand-rover

Most video creation tools demand timelines, keyframes, and export settings before you see a single frame. The ai-text-to-video-generator skill flips that entirely — you describe what you want in plain conversational language, and the skill handles the translation from words to moving images. Want a cinematic product reveal with warm lighting and slow motion? Just say so.

Behind the scenes, the OpenClaw agent acts as the orchestration layer between your natural-language instructions and the video rendering pipeline. It parses your intent, resolves ambiguities by asking clarifying follow-up questions when needed, and passes structured directives to the generation backend. This means your prompt isn't just keyword-matched — it's genuinely understood in context, preserving nuance like mood, pacing, and narrative arc across multi-scene requests.

The skill is particularly well-suited for iterative workflows. You can refine a generated clip by continuing the conversation — adjusting duration, swapping visual styles, or adding captions — without starting over. Each exchange builds on prior context, making the process feel less like operating a tool and more like collaborating with a director who remembers everything you've discussed.

Environment Variables

Variable	Required	Default
`NEMO_TOKEN`	No	Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens)
`NEMO_API_URL`	No	`https://mega-api-prod.nemovideo.ai`
`NEMO_WEB_URL`	No	`https://nemovideo.com`
`NEMO_CLIENT_ID`	No	Auto-generated UUID, persisted to `~/.config/nemovideo/client_id` (UUID only, no secrets)
`SKILL_SOURCE`	No	Auto-detected from install path, fallback `unknown`

If NEMO_TOKEN is not set, get one (requires X-Client-Id header):

# Resolve the API base URL (same default as in §3)
API="${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}"
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}

Save token as NEMO_TOKEN, CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
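
As a rough sketch of the "save token" step, the helper below pulls the token field out of a response body shaped like the example above. The name extract_token is hypothetical, and the sed pattern assumes the token value contains no escaped quotes; a real JSON parser would be more robust.

```shell
# Hypothetical helper: print the value of the "token" field from an
# anonymous-token response body. Prints nothing when the field is absent.
# Assumption: the token value contains no escaped double quotes.
extract_token() {
  printf '%s' "$1" | sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (network call, left commented out):
# RESPONSE=$(curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID")
# NEMO_TOKEN=$(extract_token "$RESPONSE"); export NEMO_TOKEN
```

An empty result is a reasonable signal to inspect the response's code field before continuing.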

Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.

2. Routing Incoming Requests to the Correct Endpoint

Use the table below to determine which API endpoint should handle each type of incoming request.

User says...	Action	Skip SSE?
"export" / "导出" / "download" / "send me the video"	→ §3.5 Export	✅
"credits" / "积分" / "balance" / "余额"	→ §3.3 Credits	✅
"status" / "状态" / "show tracks"	→ §3.4 State	✅
"upload" / "上传" / user sends file	→ §3.2 Upload	✅
Everything else (generate, edit, add BGM…)	→ §3.1 SSE	❌
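
The routing table can be approximated with a plain case statement. This is only a sketch: route_request and its output labels are hypothetical names, matching is by simple substring, and the "user sends file" row has to be detected outside this function because it is not a text keyword.

```shell
# Hypothetical router: print which section should handle a user message.
# Keywords come from the routing table; ASCII text is lowercased first.
route_request() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $msg in
    *export*|*导出*|*download*|*"send me the video"*) echo "3.5-export" ;;
    *credits*|*积分*|*balance*|*余额*)                 echo "3.3-credits" ;;
    *status*|*状态*|*"show tracks"*)                   echo "3.4-state" ;;
    *upload*|*上传*)                                   echo "3.2-upload" ;;
    *)                                                 echo "3.1-sse" ;;
  esac
}
```

Only the last branch goes through SSE; the first four are direct API calls.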

3. Primary Operation Flows

$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.

Skill attribution variables — do NOT hardcode:

$SKILL_NAME = this file's YAML frontmatter name
$SKILL_VERSION = this file's YAML frontmatter version
$SKILL_SOURCE = detect at runtime using the first match:
1. Env SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
3. Fallback: unknown
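
One way to express the first-match detection order above is sketched here. The function name detect_skill_source is hypothetical, and only the two example directories named in the list are recognized; other platform directories would need their own patterns.

```shell
# Hypothetical helper: resolve the skill source using the first-match order.
# $1 is the path of the skill file.
detect_skill_source() {
  # 1. An explicit SKILL_SOURCE env var wins.
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return
  fi
  # 2. Known platform directory in the path, else 3. fallback.
  case $1 in
    */.clawhub/*)       echo "clawhub" ;;
    */.cursor/skills/*) echo "cursor" ;;
    *)                  echo "unknown" ;;
  esac
}
```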

CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.

X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE

3.0 Establishing a New Session

curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
  -d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}

Before any other operations can proceed, a session must be initialized. The returned session_id anchors all subsequent requests within the same working context.

Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo: $WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE

3.1 Delivering Messages Over SSE

curl -s -X POST "$API/run_sse" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
  -d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'

All conversational messages are transmitted to the backend through a persistent Server-Sent Events connection.

SSE Handling

Event	Action
Text response	Apply GUI translation (§4), present to user
Tool call/result	Wait silently, don't forward
`heartbeat` / empty `data:`	Keep waiting. Every 2 min: "⏳ Still working..."
Stream closes	Process final response

Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.

Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).

Ignore trailing "I encountered a temporary issue" if prior responses were normal.
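
The handling rules above could be applied line by line while reading the stream. The sketch below is an assumption-heavy illustration: the exact wire format is not documented here, so it assumes standard SSE framing (`data:` and `event:` fields), and classify_sse_line with its three labels is a hypothetical name.

```shell
# Hypothetical classifier for one line of the SSE stream.
# Prints "wait" for heartbeats and empty data fields, "payload" for
# non-empty data fields, and "ignore" for everything else.
classify_sse_line() {
  line=$(printf '%s' "$1" | tr -d '\r')   # tolerate CRLF line endings
  case $line in
    "data:"|"data: ")     echo "wait" ;;
    "event:"*heartbeat*)  echo "wait" ;;
    "data:"*)             echo "payload" ;;
    *)                    echo "ignore" ;;
  esac
}

# Usage (network call, left commented out; -N disables curl's output buffering):
# curl -sN ... "$API/run_sse" | while IFS= read -r line; do
#   [ "$(classify_sse_line "$line")" = "payload" ] && printf '%s\n' "$line"
# done
```

Whether a payload is text or a tool call still has to be decided from the JSON inside the data field, which this sketch does not attempt.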

Silent Response Fallback (CRITICAL)

Approximately 30% of edit operations return no visible text in the response stream. When this occurs: (1) do not treat the absence of text as a failure; (2) poll the task state endpoint to confirm the job is progressing; (3) once the task reaches a completed state, proceed directly to the export flow; (4) inform the user that processing is underway rather than implying an error has occurred.

Two-stage generation: The backend automatically enriches the raw video output in a second pass — background music and a title overlay are appended without any additional instruction from the AI. Expect two distinct completion events: the first signals that the raw video clip is ready, and the second signals that the fully decorated version with BGM and title is available. Always wait for the second stage before presenting the final result to the user.

3.2 Handling File Uploads

File upload: curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -F "files=@/path/to/file"

URL upload: curl -s -X POST "$API/api/upload-video/nemo_agent/me/<sid>" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"urls":["<url>"],"source_type":"url"}'

Use me in the path; backend resolves user from token.

Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

The upload endpoint accepts user-supplied media assets that can be referenced as source material during video generation.

3.3 Checking Available Credits

curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}

Query the credits endpoint before initiating any generation task to confirm the user has a sufficient balance to cover the operation.
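
A pre-flight balance check might look like the sketch below. The helper name has_credits is hypothetical, the minimum is supplied by the caller because the per-operation cost is not documented here, and the sed pattern assumes `available` is a non-negative integer.

```shell
# Hypothetical helper: succeed when the "available" balance in a
# credits response ($1) is at least the caller-supplied minimum ($2).
has_credits() {
  available=$(printf '%s' "$1" \
    | sed -n 's/.*"available"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  [ -n "$available" ] && [ "$available" -ge "$2" ]
}

# Usage (network call, left commented out):
# BALANCE=$(curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" ...)
# has_credits "$BALANCE" 1 || echo "Not enough credits"
```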

3.4 Polling Current Task State

curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"

Use me for user in path; backend resolves from token. Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Draft ready for export when draft.t exists with at least one track with non-empty sg.
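
The readiness rule can be roughly approximated with grep, as sketched below. This is a text heuristic rather than real JSON parsing: draft_has_segments is a hypothetical name, and the check only looks for a `t` array plus at least one `sg` array that is not empty. A JSON-aware tool would be the safer choice in practice.

```shell
# Hypothetical heuristic: succeed when the draft JSON in $1 has a "t"
# array and at least one "sg" array with something inside it.
draft_has_segments() {
  flat=$(printf '%s' "$1" | tr '\n' ' ')   # grep is line-based, so flatten first
  printf '%s' "$flat" | grep -Eq '"t"[[:space:]]*:[[:space:]]*\[' &&
    printf '%s' "$flat" | grep -Eq '"sg"[[:space:]]*:[[:space:]]*\[[[:space:]]*[^][:space:]]'
}
```

The second pattern requires a character other than `]` or whitespace right after the opening bracket, which is what separates `"sg":[{...}]` from `"sg":[]`.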

Track summary format:

Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)

3.5 Exporting and Delivering the Final Video

Export does NOT cost credits. Only generation/editing consumes credits.

To deliver the finished video: call the export endpoint for the completed task, await the signed download URL in the response, verify that the URL resolves to a playable file, then present the URL or an embedded player to the user together with the video title and any relevant metadata. The concrete calls follow.

b) Submit: curl -s -X POST "$API/api/render/proxy/lambda" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" -d '{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}'

Note: sessionId is camelCase (exception). On failure → new id, retry once.

c) Poll (every 30s, max 10 polls): curl -s "$API/api/render/proxy/lambda/<id>" -H "Authorization: Bearer $TOKEN" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"

Status at top-level status: pending → processing → completed / failed. Download URL at output.url.

d) Download from output.url → send to user. Fallback: $API/api/render/proxy/<id>/download.

e) When delivering the video, always also give the task detail link: $WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE

Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
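
The polling step (every 30s, at most 10 polls) could be wrapped in a small loop like the sketch below. The function poll_render, its arguments, and its exit codes are hypothetical conventions, and the sed pattern is a heuristic that assumes the body contains a single `"status"` field. The fetch step is passed in as a command name so the loop can be exercised without the network.

```shell
# Hypothetical poll loop.
#   $1 = name of a command that prints the render status body
#   $2 = seconds between polls (default 30)
#   $3 = maximum number of polls (default 10)
# Returns 0 and prints the body on "completed", 1 on "failed",
# 2 when the poll budget is exhausted.
poll_render() {
  fetch_cmd=$1
  interval=${2:-30}
  max_polls=${3:-10}
  i=0
  while [ "$i" -lt "$max_polls" ]; do
    body=$($fetch_cmd)
    status=$(printf '%s' "$body" \
      | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    case $status in
      completed) printf '%s\n' "$body"; return 0 ;;
      failed)    return 1 ;;
    esac
    i=$((i + 1))
    # Sleep only when another poll remains.
    if [ "$i" -lt "$max_polls" ]; then sleep "$interval"; fi
  done
  return 2
}

# Usage (network call, left commented out):
# fetch_status() { curl -s "$API/api/render/proxy/lambda/$RENDER_ID" -H "Authorization: Bearer $TOKEN" ...; }
# poll_render fetch_status 30 10
```

A return code of 1 maps to the "new id, retry once" rule above.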

3.6 Recovering from an SSE Disconnection

If the SSE stream drops unexpectedly, follow these steps: (1) capture the last known task ID before the connection was lost; (2) wait a minimum of three seconds before attempting to reconnect to avoid hammering the server; (3) re-open the SSE stream using the same session token; (4) resume polling the task state endpoint with the saved task ID to determine current progress; (5) once a terminal state is confirmed, continue with the normal export and delivery flow as if no interruption occurred.

4. Translating Backend GUI References

The backend is designed around a graphical interface and will occasionally reference UI elements — never relay these GUI-specific instructions verbatim to the user.

Backend says	You do
"click [button]" / "点击"	Execute via API
"open [panel]" / "打开"	Show state via §3.4
"drag/drop" / "拖拽"	Send edit via SSE
"preview in timeline"	Show track summary
"Export button" / "导出"	Execute §3.5
"check account/billing"	Check §3.3

Keep content descriptions. Strip GUI actions.
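
For illustration, the translation table might be encoded as the lookup below. This is a hedged sketch: gui_action and its labels are hypothetical, matching is a plain case-sensitive substring test (so it can misfire on ordinary content such as "opening scene"), and more specific phrases are checked before generic ones so that "click the Export button" resolves to the export flow.

```shell
# Hypothetical lookup: map a GUI-style backend phrase to the action
# from the table. Prints "pass-through" when nothing matches.
gui_action() {
  case $1 in
    *"Export button"*|*导出*)  echo "export-3.5" ;;
    *"preview in timeline"*)   echo "show-track-summary" ;;
    *account*|*billing*)       echo "check-credits-3.3" ;;
    *drag*|*drop*|*拖拽*)      echo "send-edit-via-sse" ;;
    *open*|*打开*)             echo "show-state-3.4" ;;
    *click*|*点击*)            echo "execute-via-api" ;;
    *)                         echo "pass-through" ;;
  esac
}
```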

5. Recommended Interaction Patterns

• Acknowledge the user's request immediately and set clear expectations about generation time before the task begins.
• Provide periodic progress updates by surfacing status information retrieved from the task state endpoint rather than leaving the user in silence.
• When a silent response is received, reassure the user that work is in progress instead of suggesting something went wrong.
• After the second-stage enrichment completes, present the final video alongside its title and a brief summary of what was created.
• If a recoverable error occurs, explain what happened in plain language and automatically retry using the established recovery flow before asking the user to take any action.

6. Known Limitations

• Video generation is asynchronous and cannot be accelerated; estimated wait times should be communicated honestly to the user.
• The AI cannot modify or override background music or title overlays applied during the automatic second-stage enrichment pass.
• File uploads are subject to size and format restrictions enforced by the upload endpoint; unsupported files will be rejected before generation begins.
• Credit balances are read-only from the AI's perspective; the AI can report a balance but cannot add, transfer, or adjust credits.
• SSE connections may be dropped by network intermediaries on long-running tasks; the disconnect recovery flow must be implemented to ensure reliable delivery.

7. Error Identification and Handling

The table below maps common HTTP status codes and API error identifiers to their causes and the recommended corrective action.

Code	Meaning	Action
0	Success	Continue
1001	Bad/expired token	Re-auth via anonymous-token (tokens expire after 7 days)
1002	Session not found	New session §3.0
2001	No credits	Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up at nemovideo.ai"
4001	Unsupported file	Show supported formats
4002	File too large	Suggest compress/trim
400	Missing X-Client-Id	Generate Client-Id and retry (see Environment Variables)
402	Free plan export blocked	Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export."
429	Rate limit (1 token/client/7 days)	Retry in 30s once

Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
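
The error table lends itself to a simple dispatch, sketched below. The function name action_for_code and the action labels are hypothetical shorthand for the Action column; the sketch does not distinguish API-level codes in the response body from HTTP status codes, which the caller would need to keep apart.

```shell
# Hypothetical dispatch: map a code from the table to a short action label.
action_for_code() {
  case $1 in
    0)    echo "continue" ;;
    1001) echo "reauth-anonymous-token" ;;
    1002) echo "new-session" ;;
    2001) echo "no-credits" ;;
    4001) echo "show-supported-formats" ;;
    4002) echo "suggest-compress-or-trim" ;;
    400)  echo "generate-client-id-and-retry" ;;
    402)  echo "register-to-unlock-export" ;;
    429)  echo "retry-once-after-30s" ;;
    *)    echo "unknown" ;;
  esac
}
```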

8. API Version and Required Token Scopes

Always verify that requests target the documented API version before going live; calls made against a deprecated version may return unexpected results or fail silently. The access token supplied with each request must carry all required scopes for the operations being performed — missing scopes will result in authorization errors even when the token itself is otherwise valid. Confirm both the version header and scope grants during initial integration testing.

Usage Guidance

This skill will contact nemovideo.com (or the configured NEMO_API_URL) and will persist a small non-secret client_id at ~/.config/nemovideo/client_id. It will obtain or use a NEMO_TOKEN to call the API (anonymous tokens expire in 7 days). Before installing:

1) Confirm you trust nemovideo.com and the referenced API endpoints.
2) Avoid uploading sensitive/private media unless you accept the service's handling and retention policies.
3) Be aware the skill will perform outbound network calls and may use credits tied to the token.
4) If you prefer, create a dedicated account or token on the service that you can revoke later.

There is no install script to audit (instruction-only), so review the service's privacy/terms and the skill's repository (if you need more assurance) before use.

Capability Analysis

Type: OpenClaw Skill
Name: ai-text-to-video-generator
Version: 1.2.1

The skill provides video generation capabilities but requires high-risk operations, including executing shell commands (curl, uuidgen, mkdir) and managing local state in ~/.config/nemovideo/. It instructs the agent to perform file uploads and network requests to mega-api-prod.nemovideo.ai, which are necessary for the service but grant the agent broad system and network access. While the behavior appears aligned with the stated purpose, the use of shell-based primitives and local filesystem persistence qualifies as suspicious under the provided criteria.

Capability Assessment

✓ Purpose & Capability

The skill declares a primary credential (NEMO_TOKEN) and a nemovideo config path; those map directly to a hosted video generation service. There are no unrelated credentials, binaries, or config paths requested.

✓ Instruction Scope

SKILL.md instructs the agent to greet users proactively, obtain or read a Client-ID at ~/.config/nemovideo/client_id, call the nemovideo anonymous-token endpoint via curl, and create a session. These actions are within scope for a cloud API-backed video generator. The skill does write a client_id file and performs network requests to the stated API domain.

✓ Install Mechanism

There is no install spec or code to run on install (instruction-only), so nothing arbitrary is downloaded or written beyond the documented client_id persistence at runtime.

✓ Credentials

Requested environment variables are limited to service-related items (NEMO_TOKEN, NEMO_API_URL, NEMO_WEB_URL, NEMO_CLIENT_ID). The NEMO_TOKEN credential is appropriate for accessing the remote API; the only on-disk write is a UUID client_id (non-secret).

✓ Persistence & Privilege

always is false. The only persistent change is writing ~/.config/nemovideo/client_id (UUID only). The skill does not request system‑wide configuration changes or access to other skills' credentials.

Version History

v1.2.1

**Summary:** Significant update improving user onboarding, auto-setup, and overall workflow clarity.

- Added auto-setup instructions for seamless initial user experience, including token management and session creation.
- Revised and clarified environment variable usage and fallbacks.
- Provided explicit user greeting and proactive first-contact behavior.
- Overhauled skill description for conciseness and focus on conversational, context-aware video generation.
- Included tables and workflow guides for routing requests and API integration.
- Updated metadata, homepage, and repository links.

v1.2.0

- Updated skill description, display name, and feature summary for improved clarity and expanded keyword coverage.
- Rewrote use case examples to cover a broader range of scenarios and emphasize scale and professional applications.
- Added detailed explanations of how AI interprets and transforms text scripts, briefs, and documents into professional videos.
- Refreshed instructions and API usage to clarify workflow and configuration steps.
- Enhanced formatting and organization for easier reading and faster reference.

v1.1.0

Summary: Major update refocusing the skill on turning any text into a complete, AI-produced video with no editing required.

- Skill description and documentation overhauled for clarity, emphasizing entire text-to-video automation.
- Now supports input of any kind of text: scripts, blog posts, meeting notes, articles, etc.
- Explains feature set: AI-matched visuals, natural voiceover, animated text overlays, music, transitions, and subtitles, all fully automated.
- Provides common use cases and detailed parameter guidance for customizing video style and format.
- Step-by-step instructions, sample API calls, and output examples added for easier onboarding and experimentation.

v1.0.4

**Improved onboarding and first-contact experience.**

- First-contact greeting is now always proactive, with clear setup status for the user.
- Setup instructions and language streamlined for easier understanding.
- Auto-setup guidance clarified: setup progress is now briefly shown to users ("Setting things up… ready!").
- Minor corrections to session claim URL (token included).
- Improved markdown formatting and fixed typos.

v1.0.3

ai-text-to-video-generator 1.0.3

- Updated documentation in SKILL.md with clarifications and minor edits.
- No changes to code or runtime behavior.

v1.0.2

Version 1.0.2

- Updated documentation in SKILL.md for improved clarity and formatting.
- Corrected minor typos and formatting in markdown tables and code blocks.
- Adjusted endpoint routing table for enhanced readability.
- No changes to code, features, or user-facing functionality.

v1.0.1

ai-text-to-video-generator 1.0.1

- Adds a mandatory initial user greeting and sample suggestions upon first contact.
- Implements silent auto-setup of authentication and session creation before responding.
- Instructs never to mention authentication, tokens, or setup to the user for a smoother onboarding.
- No changes to API routing or operational flows; all video generation and editing functions remain.

v1.0.0

ai-text-to-video-generator 1.0.0

- Initial release: transform plain-language descriptions into finished video clips.
- Supports multiple video formats: mp4, mov, avi, webm, and mkv.
- Conversational workflow: refine, edit, or adjust previous renders iteratively.
- Automatic setup and token management; stores non-sensitive client ID locally.
- Built-in routing for status, uploads, credits, video export, and editing requests.
- No technical expertise required; ideal for content creators, marketers, and educators seeking effortless video generation.

Metadata

Slug ai-text-to-video-generator

Version 1.2.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 8

Frequently Asked Questions

What is Ai Text To Video Generator?

The ai-text-to-video-generator skill on ClawHub transforms written prompts into fully rendered video content through a conversational interface. Describe you... It is an AI Agent Skill for Claude Code / OpenClaw, with 199 downloads so far.

How do I install Ai Text To Video Generator?

Run "/install ai-text-to-video-generator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Ai Text To Video Generator free?

Yes, Ai Text To Video Generator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Ai Text To Video Generator support?

Ai Text To Video Generator is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Ai Text To Video Generator?

It is built and maintained by peandrover adam (@peand-rover); the current version is v1.2.1.
