Description

Generate AI images and videos for free through Google Gemini and Google Flow using browser automation, with no API fees. Use it when you need to create visual media (cover images, thumbnails...

README (SKILL.md)

Google Free Media Generator

Name: Google Free Media Skill
Author: pbseiya

A skill for generating AI images and videos for free through Google Gemini and Google Flow using browser automation.

🎯 When to use this skill

Use it when the user wants to:

Create AI images for covers, thumbnails, and banners
Create videos from text or images (text-to-video, image-to-video)
Save on API costs (0 baht vs. 1-3 baht per image through a regular API)
Create media in bulk without worrying about cost

⚠️ Limitations to know

Limited free quota: Gemini ~100 images/day, Flow ~50 credits/day (subject to change)
Slower than the API: a browser must be opened and the UI must load (5-10x slower)
Risk of UI changes: Google moves buttons and layouts often, so the skill may need updating
Terms of Service: automation may violate the ToS of Google's free tier

📋 Workflow

1. Check quota before starting

node scripts/quota_manager.mjs check

Shows how much quota remains
Warns when the quota is nearly exhausted
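
The check described above can be sketched as a small pure function. This is only an illustration, not the code in quota_manager.mjs: the name `quotaStatus` and the 10% warning threshold are assumptions.

```javascript
// Hypothetical helper: classify how much of a daily quota is left.
// The 10% warning threshold is an assumption, not a value taken from the skill.
function quotaStatus(used, limit, warnRatio = 0.1) {
  const left = Math.max(0, limit - used);
  if (left === 0) return { left, level: "exhausted" };
  if (left <= limit * warnRatio) return { left, level: "low" };
  return { left, level: "ok" };
}

// 95 of 100 images used: 5 left, which is within the 10% warning band.
console.log(quotaStatus(95, 100));
```

Returning a level instead of printing directly keeps the decision separate from how the warning is shown.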

2. Generate an image (Gemini)

node scripts/generate_image.mjs --prompt "image description" --output /path/to/output.jpg

How it works:

Open a browser at gemini.google.com
Log in (if not already logged in)
Click the image generation button (Image generation)
Send the enhanced prompt
Wait for generation and fetch the full-resolution image (=s0 trick)
Save it to a file

3. Generate a video (Google Flow)

node scripts/generate_video.mjs --prompt "video description" --output /path/to/output.mp4

How it works:

Open a browser at labs.google/flow
Choose a mode (Text-to-Video or Image-to-Video)
Send the prompt or upload an image
Wait for generation
Download the video

🔧 Scripts

generate_image.mjs

Generates images through Google Gemini

Arguments:

--prompt: image description (required)
--output: output file path (required)
--style: image style (optional: realistic, artistic, minimalist)
--enhance: let the AI enhance the prompt automatically (default: true)
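
As a rough sketch of how these flags might be read, the snippet below parses `--key value` pairs and applies the rules listed above. The helper names `parseFlags` and `imageOptions` are hypothetical, and the actual script may parse its arguments differently.

```javascript
// Hypothetical parser: turns ["--prompt", "a cat", "--style", "artistic"]
// into { prompt: "a cat", style: "artistic" }.
function parseFlags(argv) {
  const opts = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith("--")) throw new Error(`Unexpected argument: ${arg}`);
    const key = arg.slice(2);
    const next = argv[i + 1];
    if (next === undefined || next.startsWith("--")) {
      opts[key] = true; // bare flag with no value
    } else {
      opts[key] = next;
      i++; // skip the value we just consumed
    }
  }
  return opts;
}

const IMAGE_STYLES = ["realistic", "artistic", "minimalist"];

// Apply the documented rules: required flags, allowed styles, enhance on by default.
function imageOptions(argv) {
  const opts = parseFlags(argv);
  for (const required of ["prompt", "output"]) {
    if (typeof opts[required] !== "string") throw new Error(`--${required} is required`);
  }
  if (opts.style !== undefined && !IMAGE_STYLES.includes(opts.style)) {
    throw new Error(`--style must be one of: ${IMAGE_STYLES.join(", ")}`);
  }
  return {
    prompt: opts.prompt,
    output: opts.output,
    style: opts.style ?? null,
    enhance: opts.enhance !== "false", // only an explicit "false" turns it off
  };
}

// Usage inside a script: imageOptions(process.argv.slice(2))
```

Node's built-in `util.parseArgs` is an alternative to the hand-written loop if a stricter parser is preferred.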

generate_video.mjs

Generates videos through Google Flow (Veo 3.1)

Arguments:

--prompt: video description (required)
--output: output file path (required)
--mode: generation mode (text-to-video, image-to-video)
--image: path to the source image (for image-to-video)
--duration: video length (5-10 seconds)
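
The constraints between these flags could be checked up front, before a browser is opened. The validator below is a sketch under stated assumptions: the name `videoOptions` is hypothetical, and inferring the mode from the presence of `--image` when `--mode` is omitted is an assumption, since the source does not state a default.

```javascript
const VIDEO_MODES = ["text-to-video", "image-to-video"];

// Hypothetical validator for already-parsed flags, e.g. { prompt, output, duration: "8" }.
function videoOptions(opts) {
  if (!opts.prompt) throw new Error("--prompt is required");
  if (!opts.output) throw new Error("--output is required");

  // Assumption: with no --mode, pick image-to-video only when --image is given.
  const mode = opts.mode ?? (opts.image ? "image-to-video" : "text-to-video");
  if (!VIDEO_MODES.includes(mode)) {
    throw new Error(`--mode must be one of: ${VIDEO_MODES.join(", ")}`);
  }
  if (mode === "image-to-video" && !opts.image) {
    throw new Error("--image is required for image-to-video");
  }

  // --duration is optional; when present it must fall in the documented 5-10 s range.
  let duration = null;
  if (opts.duration !== undefined) {
    duration = Number(opts.duration);
    if (!Number.isFinite(duration) || duration < 5 || duration > 10) {
      throw new Error("--duration must be between 5 and 10 seconds");
    }
  }

  return { prompt: opts.prompt, output: opts.output, mode, image: opts.image ?? null, duration };
}
```

Failing early on a bad flag avoids spending a slow browser session on a request that cannot succeed.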

quota_manager.mjs

Manages and tracks usage quota

Commands:

check: check the remaining quota
reset: reset the counters (start a new day)
log: view the usage log

Config File: configs/quota.json

{
  "dailyLimits": {
    "images": 100,
    "videoCredits": 50
  },
  "currentUsage": {
    "images": 0,
    "videoCredits": 0
  },
  "lastReset": "2026-03-02T00:00:00+07:00"
}
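
One way the daily reset could work on this file's shape is sketched below. It is an illustration only: the function names are hypothetical, and treating "a new day" as a new calendar day in UTC+07:00 is an assumption based on the offset in `lastReset`.

```javascript
// Assumption: the quota day rolls over at 00:00 in UTC+07:00,
// matching the offset used in "lastReset" above.
const RESET_OFFSET_MS = 7 * 60 * 60 * 1000;

// Calendar day (YYYY-MM-DD) of an instant as seen in UTC+07:00.
function quotaDay(date) {
  return new Date(date.getTime() + RESET_OFFSET_MS).toISOString().slice(0, 10);
}

// Return the quota unchanged on the same day, or a zeroed copy on a new day.
// The input object is never mutated.
function rollOverIfNewDay(quota, now = new Date()) {
  if (quotaDay(now) === quotaDay(new Date(quota.lastReset))) return quota;
  return {
    ...quota,
    currentUsage: { images: 0, videoCredits: 0 },
    lastReset: now.toISOString(),
  };
}
```

A script would read configs/quota.json, pass the parsed object through `rollOverIfNewDay`, and write the result back before counting a new generation.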

💡 Key techniques

1. Fetch the full-resolution image

Gemini displays images at 1024px, but the full-resolution version (1408x768) can be fetched by changing the URL:

From: https://.../image=s1024
To: https://.../image=s0
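
The URL change above amounts to a one-line string rewrite. The sketch below assumes the size suffix is the last thing in the URL, as in the example; real image URLs may carry extra suffixes that this pattern does not handle. The function name and the example.com URL are placeholders.

```javascript
// Replace a trailing size suffix such as "=s1024" with "=s0".
// Assumption: the size parameter ends the URL; anything else is returned unchanged.
function toFullResolution(url) {
  return url.replace(/=s\d+$/, "=s0");
}

// Placeholder URL for illustration only.
console.log(toFullResolution("https://example.com/image=s1024"));
// prints: https://example.com/image=s0
```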

2. Session Persistence

Log in once and keep the cookies for later runs
No need to log in again every time an image is generated
Uses Puppeteer/Playwright session storage
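
With Puppeteer, one common way to keep a login between runs is to point every launch at the same browser profile directory. The sketch below only builds the launch options: `launchOptions` and the profile path are hypothetical, while `headless` and `userDataDir` are standard Puppeteer launch options. Whether the skill's scripts use this approach is not stated in the source.

```javascript
// Build launch options that reuse one profile directory across runs,
// so cookies from the first manual login are still present next time.
function launchOptions(profileDir) {
  return {
    headless: false,         // visible window, so the first login can be done by hand
    userDataDir: profileDir, // Chromium stores cookies and local storage here
  };
}

// Usage with Puppeteer (requires `npm install puppeteer`):
//   import puppeteer from "puppeteer";
//   const browser = await puppeteer.launch(launchOptions("./.browser-profile"));
```

The profile directory holds live session cookies, so it should stay out of version control and off shared machines.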

3. Prompt Enhancement

Before sending a prompt to Gemini, enhance it to include:

Lighting (soft lighting, dramatic lighting, golden hour)
Composition (rule of thirds, centered, wide angle)
Style (photorealistic, cinematic, minimalist, vibrant)
Quality keywords (4K, ultra detailed, professional)

Example:

Input: "รูปแมวใส่แว่น" (Thai for "a picture of a cat wearing glasses")
Enhanced: "A photorealistic portrait of a cute cat wearing round glasses, 
soft studio lighting, centered composition, professional photography, 
4K ultra detailed, warm tones"
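
A very simple form of this enhancement is to append one keyword from each category to the subject. The sketch below does only that; `enhancePrompt` and its default keywords are assumptions, and it does not translate or rephrase the input the way the example above does.

```javascript
// Hypothetical defaults, one per category listed above.
const PROMPT_DEFAULTS = {
  lighting: "soft lighting",
  composition: "centered",
  style: "photorealistic",
  quality: "4K, ultra detailed, professional",
};

// Append lighting, composition, style, and quality keywords to a subject.
// Pass overrides to swap any category, or an empty string to drop it.
function enhancePrompt(subject, overrides = {}) {
  const parts = { ...PROMPT_DEFAULTS, ...overrides };
  return [subject.trim(), parts.lighting, parts.composition, parts.style, parts.quality]
    .filter(Boolean)
    .join(", ");
}

console.log(enhancePrompt("a cute cat wearing round glasses", { style: "cinematic" }));
// prints: a cute cat wearing round glasses, soft lighting, centered, cinematic, 4K, ultra detailed, professional
```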

📁 Storage Organization

Generated files are stored at:

/mnt/storage/ada_projects/ai_media/
├── images/YYYY-MM/
├── videos/YYYY-MM/
└── metadata.json
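
A path inside this layout can be derived from the media type and the current month. The helper below is a sketch: `outputPath` is a hypothetical name, and using the machine's local time zone for the YYYY-MM folder is an assumption.

```javascript
// Build "<root>/<kind>/YYYY-MM/<fileName>" following the layout above.
// Assumption: the month folder is taken from the local time zone.
function outputPath(kind, fileName, date = new Date(), root = "/mnt/storage/ada_projects/ai_media") {
  if (kind !== "images" && kind !== "videos") {
    throw new Error('kind must be "images" or "videos"');
  }
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0"); // getMonth() is zero-based
  return `${root}/${kind}/${yyyy}-${mm}/${fileName}`;
}

console.log(outputPath("images", "cover_001.jpg", new Date(2026, 2, 15)));
// prints: /mnt/storage/ada_projects/ai_media/images/2026-03/cover_001.jpg
```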

🔄 Fallback Strategy

If Google is unavailable, these alternatives can be used:

Bing Image Creator (free)
Leonardo.ai (free tier)
Stable Diffusion Online

🚨 Troubleshooting

Cannot log in

Make sure the browser is not running in headless mode
On a VPS, set up Xvfb as a virtual display
Try clearing cookies and logging in again
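
A pre-flight check can catch the missing-display case before the browser fails to start. The sketch below assumes that on Linux a display, real or provided by Xvfb, is signalled by the `DISPLAY` (or `WAYLAND_DISPLAY`) environment variable; `hasDisplay` is a hypothetical helper.

```javascript
// Can a non-headless browser open a window in this environment?
// Assumption: on Linux, DISPLAY or WAYLAND_DISPLAY is set whenever a display exists.
function hasDisplay(platform, env) {
  if (platform !== "linux") return true; // other desktop platforms are assumed to have one
  return Boolean(env.DISPLAY || env.WAYLAND_DISPLAY);
}

if (!hasDisplay(process.platform, process.env)) {
  console.warn("No display found. On a VPS, run the script under Xvfb, for example via xvfb-run -a.");
}
```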

UI changed / button missing

Update the selectors in the scripts
Check whether Google has moved the feature

Quota exhausted

Wait until the next day (resets at 00:00)
Use the fallback services instead

📝 Usage examples

# Generate a post cover image
node scripts/generate_image.mjs \
  --prompt "AI workflow diagram, futuristic style, blue and purple gradient" \
  --output /mnt/storage/ada_projects/ai_media/images/2026-03/cover_001.jpg \
  --style artistic

# Generate a video from text
node scripts/generate_video.mjs \
  --prompt "Ocean waves at sunset, cinematic slow motion" \
  --output /mnt/storage/ada_projects/ai_media/videos/2026-03/sunset.mp4 \
  --duration 8

# Check quota
node scripts/quota_manager.mjs check

Usage Guidance

This skill is coherent with what it says: it's a local browser-automation skeleton for using Google web UIs to produce images/videos. Before installing/running:

1) Be aware this method may violate Google's Terms of Service; use at your own risk.
2) Inspect and understand the Puppeteer/Playwright code you will add; the included scripts are placeholders/skeletons and expect you to implement selectors and login handling.
3) Do not store or expose Google session cookies on shared or public machines; prefer a dedicated account with limited permissions.
4) Install Puppeteer from the official npm registry and review its version/lockfile; avoid running unreviewed install scripts as root.
5) Run in an isolated environment (VM/container) if you are concerned about account/session exposure.
6) If you want the agent to run this autonomously, explicitly weigh the risk that the agent could open a logged-in browser session and act without further prompts.

If any of the above is unacceptable, do not run the scripts, or ask the author for clarification first.

Capability Analysis

Type: OpenClaw Skill
Name: google-free-media-skill
Version: 1.0.0

The skill bundle provides a framework for automating image and video generation via Google Gemini and Google Flow to avoid API fees. The provided JavaScript files (generate_image.mjs, generate_video.mjs) are currently 'skeleton' implementations that simulate the workflow and manage local usage quotas via quota_manager.mjs. There is no evidence of data exfiltration, credential theft, or malicious execution; the scripts use standard file system operations and hardcoded shell checks for browser availability.

Capability Assessment

✓ Purpose & Capability

The name/description (use browser automation to generate images/videos via Google Gemini/Flow) matches the included files and SKILL.md. The repository contains image/video generator scripts and a local quota manager consistent with the stated purpose. No unrelated credentials, cloud services, or unrelated binaries are requested.

ℹ Instruction Scope

SKILL.md and the scripts instruct the agent/user to open a browser, log into Google, and drive the web UI via Puppeteer/Playwright (skeleton pseudo-code present). The scripts read/write only local paths under the skill (outputs, configs, logs). They reference session persistence (cookies/session storage) which is logical for this use case but is a potential privacy/security consideration (storing session cookies could expose account access if handled insecurely). The instructions do not attempt to read unrelated system files or environment variables.

ℹ Install Mechanism

There is no install spec; this is instruction-plus-scripts. README instructs 'npm install puppeteer' which is expected for browser automation, but the skill does not automate or pin dependency installation. This is lower-risk than remote download/extract, but the user must manually install and trust the npm package used (puppeteer).

✓ Credentials

The skill requests no environment variables, no credentials, and no config paths outside its own repo. The main sensitive operation is requiring you to use your Google account via a browser login; that is proportional to the stated purpose but raises the usual account/session safety considerations (don't use high-privilege accounts; be careful where session cookies are stored).

✓ Persistence & Privilege

always:false and no modifications to other skills or global agent config. The skill stores per-repo quota/config/log files under configs/ and writes outputs under the specified output path. Autonomous invocation is allowed by default (not unusual) but does not introduce extra privileges here.

Version History

v1.0.0

Google Free Media Generator skill v1.0.0

Initial release: generate AI images (via Gemini) and videos (via Google Flow) for free using browser automation, with no API fees required.
Manage daily quota usage automatically for both image and video generation.
Supports full-resolution image downloads and session persistence for faster logins.
Includes fallback options to Bing, Leonardo, and Stable Diffusion if Google is unavailable.
Comprehensive CLI scripts provided for image, video, and quota management.

Metadata

Slug google-free-media-skill

Version 1.0.0

License —

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Google Free Media Skill?

Generate AI images and videos for free through Google Gemini and Google Flow using browser automation, with no API fees. Use it when you need to create visual media (cover images, thumbnails... It is an AI Agent Skill for Claude Code / OpenClaw, with 264 downloads so far.

How do I install Google Free Media Skill?

Run "/install google-free-media-skill" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Google Free Media Skill free?

Yes, Google Free Media Skill is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Google Free Media Skill support?

Google Free Media Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Google Free Media Skill?

It is built and maintained by pbseiya (@pbseiya); the current version is v1.0.0.
