/install imsg-media
imsg-media
Full iMessage multimedia pipeline:
- 🎙️ Voice memo → text via Silicon Flow ASR (SenseVoiceSmall, cloud, no local model)
- 🖼️ Image → description/OCR via agent's built-in vision model
Requirements
macOS permissions
- Full Disk Access must be granted to the process running OpenClaw
- Settings → Privacy & Security → Full Disk Access → add your terminal/app
- Without this, `imsg` cannot read `~/Library/Messages/chat.db` and will return `permissionDenied`
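A pre-flight check can surface the missing permission before the skill runs. The following is a minimal sketch, assuming that a missing Full Disk Access grant shows up as an `OSError` when the file is opened; the helper name `can_read_chat_db` is hypothetical and not part of `imsg` or OpenClaw.

```python
from pathlib import Path

# Location of the Messages database, as named in the requirement above.
CHAT_DB = Path.home() / "Library" / "Messages" / "chat.db"


def can_read_chat_db(path: Path = CHAT_DB) -> bool:
    """Return True if the file exists and the current process can open it.

    Hypothetical helper: it only opens the file and reads a single byte.
    """
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return True
    except OSError:
        # Covers PermissionError (access denied) and FileNotFoundError.
        return False
```

If `can_read_chat_db()` returns `False` on a Mac that uses Messages, granting Full Disk Access to the terminal or app is the first thing to check.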
API key (audio only)
- Silicon Flow API key — sign up free at https://siliconflow.cn
- Long-term use: add to `~/.openclaw/.env`: `SILICON_FLOW_KEY=sk-...`
- Quick test / override: pass `--api-key sk-...` directly to the script
- Image analysis does not require this key
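The two ways of supplying the key imply a lookup order. The sketch below illustrates one plausible order, with `--api-key` taking priority over the `.env` entry as described above. Checking the process environment between the two is an assumption, and `parse_env_text` and `resolve_api_key` are hypothetical names, not the script's actual functions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Path named in the requirements above.
ENV_FILE = Path.home() / ".openclaw" / ".env"


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(
    cli_key: Optional[str],
    env_text: str = "",
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Assumed precedence: --api-key, then process environment, then .env text."""
    environ = os.environ if environ is None else environ
    if cli_key:
        return cli_key
    if environ.get("SILICON_FLOW_KEY"):
        return environ["SILICON_FLOW_KEY"]
    return parse_env_text(env_text).get("SILICON_FLOW_KEY")
```

A caller would typically pass `ENV_FILE.read_text()` as `env_text` when the file exists, and treat a `None` result as the `SILICON_FLOW_KEY not set` case.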
CLI dependency
- `imsg` CLI: `npm install -g imsg`
Trigger conditions
Activate this skill when:
- Incoming message text contains the attachment placeholder
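- User says "语音转文字" (speech to text), "转写" (transcribe), "识别语音" (recognize speech), "transcribe"
- User says "看图" (look at the picture), "识别图片" (recognize the image), "读图" (read the image), "OCR", "截图里写的什么" (what does the screenshot say)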
- User references a photo/audio/file they just sent via iMessage
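The keyword-based triggers can be sketched as a plain substring match. This is an illustrative sketch only: `detect_trigger` is a hypothetical name, the attachment placeholder is passed in as a parameter because its exact text is not shown here, and the last condition (the user referring to something they just sent) needs conversational judgment that a keyword match does not capture.

```python
# Trigger phrases copied from the list above; "OCR" is matched case-insensitively.
AUDIO_PHRASES = ("语音转文字", "转写", "识别语音", "transcribe")
IMAGE_PHRASES = ("看图", "识别图片", "读图", "ocr", "截图里写的什么")


def detect_trigger(text: str, placeholder: str = ""):
    """Return "attachment", "audio", "image", or None for a message text."""
    if placeholder and placeholder in text:
        return "attachment"
    lowered = text.lower()
    if any(phrase in lowered for phrase in AUDIO_PHRASES):
        return "audio"
    if any(phrase in lowered for phrase in IMAGE_PHRASES):
        return "image"
    return None
```

Substring matching is deliberately simple and can misfire (for example, "ocr" also occurs inside "mediocre"), so a real trigger would likely combine it with the attachment check.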
Decision flow
Attachment detected?
├── Audio (.m4a / .caf / .wav / .mp3) → transcribe via Silicon Flow ASR
├── Image (.jpg / .png / .heic / .gif) → read with vision model
└── Unknown / not downloaded → increase --limit or ask user to resend
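The decision flow above amounts to a lookup on the file extension. A minimal sketch, assuming the extension alone is enough to classify a file; `classify_attachment` is a hypothetical helper, and `.jpeg` and `.heif` are included because they appear under Supported formats.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".m4a", ".caf", ".wav", ".mp3"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic", ".heif", ".gif"}


def classify_attachment(path):
    """Map an attachment path to "audio", "image", or "unknown"."""
    if not path:
        # Nothing was downloaded: the caller should raise --limit or ask for a resend.
        return "unknown"
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return "unknown"
```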
Workflow
Step 1 — Get the sender identifier
Always read from the message envelope:
- `[iMessage [email protected] ...]` → use `[email protected]`
- `[SMS +1234567890 ...]` → use `+1234567890`
- Never hardcode an address
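Extracting the identifier from the envelope can be sketched with a small regular expression. This assumes the envelope always starts with `[iMessage ` or `[SMS ` followed by the identifier as the first whitespace-delimited token; `extract_identifier` is a hypothetical helper, and the address in the usage note is a made-up example.

```python
import re
from typing import Optional

# Channel name, whitespace, then the identifier up to the next space or "]".
ENVELOPE_RE = re.compile(r"\[(?:iMessage|SMS)\s+([^\s\]]+)")


def extract_identifier(envelope: str) -> Optional[str]:
    """Return the sender identifier from a message envelope, or None."""
    match = ENVELOPE_RE.search(envelope)
    return match.group(1) if match else None
```

For example, `extract_identifier("[SMS +1234567890 ...] hi")` returns `"+1234567890"`, and an envelope such as `[iMessage alice@example.com ...]` would yield `"alice@example.com"`.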
Step 2 — Fetch the attachment
# Run from the skill directory
cd ~/.openclaw/skills/imsg-voice-transcribe
python3 scripts/imsg_voice_transcribe.py fetch \
  --identifier "[email protected]" \
  --limit 50
Returns JSON with file, type (audio or image), and metadata.
If nothing found, try --limit 100.
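The "retry with a larger limit" step can be wrapped in a small helper. The sketch below only uses the `fetch` subcommand and flags shown above. It assumes the script prints nothing, or something without a `file` field, when no attachment is found, which is not confirmed here. `fetch_with_retry` and its `run` parameter are hypothetical.

```python
import json
import subprocess
from typing import Optional, Sequence

SCRIPT = "scripts/imsg_voice_transcribe.py"  # relative to the skill directory


def build_fetch_command(identifier: str, limit: int) -> list:
    """Build the argv for the fetch subcommand shown above."""
    return ["python3", SCRIPT, "fetch", "--identifier", identifier, "--limit", str(limit)]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stdout, or "" if it exits non-zero."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


def fetch_with_retry(identifier: str, limits=(50, 100), run=run_command) -> Optional[dict]:
    """Try each limit in turn; return the first JSON result that names a file."""
    for limit in limits:
        output = run(build_fetch_command(identifier, limit))
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            continue  # assumed "nothing found" case: empty or non-JSON output
        if isinstance(data, dict) and data.get("file"):
            return data
    return None
```

Passing `run` as a parameter keeps the retry logic testable without touching the Messages database; in normal use the default `run_command` is called from the skill directory.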
Step 3a — Audio: transcribe
# One-liner (fetch + transcribe)
python3 scripts/imsg_voice_transcribe.py auto \
  --identifier "[email protected]" \
  --limit 50 --raw

# Or transcribe a specific file
python3 scripts/imsg_voice_transcribe.py transcribe \
  --file /path/to/audio.m4a --raw

# Quick test with explicit API key (no env setup needed)
python3 scripts/imsg_voice_transcribe.py transcribe \
  --file /path/to/audio.m4a --api-key sk-... --raw
Step 3b — Image: analyze
After fetch returns an image path (e.g. {"file": "/path/to/photo.jpg", "type": "image"}):
# Example: fetch image from a sender
python3 scripts/imsg_voice_transcribe.py fetch \
  --identifier "[email protected]" --type image --limit 50
# → {"file": "/Users/.../Messages/Attachments/photo.jpg", "type": "image", ...}
Then in the agent:
- If HEIC/HEIF: convert first → `sips -s format png "input.heic" --out "output.png"`
- Open with the `read` tool → agent vision model processes it
- Respond with: what it is, main subject, any text/OCR, notable details
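The HEIC/HEIF conversion step can be made conditional on the file extension. A minimal sketch that builds the same `sips` command as above; `heic_convert_command` is a hypothetical helper, and writing the PNG next to the source file is an assumption (a temporary directory may be preferable in practice).

```python
from pathlib import Path
from typing import Optional


def heic_convert_command(path: str) -> Optional[list]:
    """Return the sips argv that converts a HEIC/HEIF file to PNG, else None."""
    source = Path(path)
    if source.suffix.lower() not in {".heic", ".heif"}:
        return None  # other image formats are opened directly
    target = source.with_suffix(".png")
    return ["sips", "-s", "format", "png", str(source), "--out", str(target)]
```

On macOS the returned list can be passed to `subprocess.run(cmd, check=True)`, after which the PNG path (the last element) is the file to open.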
Default image response format:
- What it is: photo / screenshot / document
- Main subject: 1–2 sentences
- Text (OCR): quote key text, or "无明显文字" ("no obvious text")
- Details: 3–5 bullets
- Follow-up: ask if they want OCR / table extraction / comparison / etc.
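The default response format can be expressed as a small template. This is only a sketch of how the five fields might be assembled; `format_image_response` and its parameters are hypothetical, and the field values themselves come from the vision model.

```python
def format_image_response(kind, subject, ocr_text, details, follow_up):
    """Assemble the default image response from its five fields."""
    has_text = bool(ocr_text and ocr_text.strip())
    # Fall back to the fixed "no obvious text" phrase when nothing was read.
    text_line = ocr_text.strip() if has_text else "无明显文字"
    lines = [
        f"What it is: {kind}",
        f"Main subject: {subject}",
        f"Text (OCR): {text_line}",
        "Details:",
    ]
    lines += [f"- {item}" for item in details]
    lines.append(f"Follow-up: {follow_up}")
    return "\n".join(lines)
```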
Supported formats
| Format | Type | Notes |
|---|---|---|
| `.m4a` | Audio | Standard iMessage voice memo |
| `.caf` | Audio | Older iOS voice memo (AAC in CAF) |
| `.wav` `.mp3` | Audio | Other sources |
| `.jpg` `.jpeg` `.png` | Image | Standard photos |
| `.heic` `.heif` | Image | iPhone default; convert to PNG first |
| `.gif` | Image | Animated or static |
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `permissionDenied` | No Full Disk Access | Grant FDA in System Settings |
| `SILICON_FLOW_KEY not set` | Missing API key | Add to `~/.openclaw/.env` |
| `No attachments found` | Low limit or iCloud not synced | Increase `--limit`; ask user to resend |
| Request timed out | Network or large file | Retry; check file < 25MB |
| HEIC not displaying | Format not supported by `read` | Convert with `sips` first |
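The file-size check from the timeout row can be done before uploading. A minimal sketch, reading "25MB" as 25 × 1024 × 1024 bytes, which is an assumption about how the limit is counted; `is_under_size_limit` is a hypothetical helper.

```python
import os

# "25MB" from the table above, interpreted here as 25 MiB (assumption).
MAX_BYTES = 25 * 1024 * 1024


def is_under_size_limit(path: str, max_bytes: int = MAX_BYTES) -> bool:
    """Return True if the file at path is smaller than max_bytes."""
    return os.path.getsize(path) < max_bytes
```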
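- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: `/install imsg-media`
- Once installed, invoke the skill by name or trigger it with `/imsg-media`
- Provide the required inputs according to the skill's parameter descriptions to get structured output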
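What is Imsg Media?
Fetch iMessage/Messages.app attachments (voice memos and images) and process them: transcribe audio via Silicon Flow ASR (SenseVoiceSmall), and analyze imag... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 424 cumulative downloads to date.
How do I install Imsg Media?
Run the command "/install imsg-media" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.
Is Imsg Media free?
Yes. Imsg Media is completely free (free and open source) and can be downloaded, installed, and used freely.
Which platforms does Imsg Media support?
Imsg Media runs cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Imsg Media?
It is developed and maintained by tankyhsu (@tankyhsu); the current version is v1.0.1.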