
dmxapi-image-recognition

Name: dmxapi-image-recognition
Author: onee-io

by cryptonee.eth · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 125

Install in OpenClaw

/install dmxapi-image-recognition

Description

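Image recognition and understanding through the DMXAPI platform. Supports multimodal vision models such as Gemini. Handles image description, OCR text recognition, chart data analysis, object detection, scene understanding, and similar tasks. Use this skill when the user needs to identify what is in an image, extract text from an image, analyze a chart, or otherwise understand an image.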

README (SKILL.md)

DMXAPI Image Recognition / Understanding

Call a range of AI vision models for image recognition and understanding through the unified DMXAPI CLI.

Prerequisites

Install the CLI tool (requires Node.js 20+):
```
npm install -g dmxapi-cli
```
Configure your API key (obtain it from the DMXAPI console):
```
dmxapi config set apiKey sk-your-api-key
```
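Because `npm install -g dmxapi-cli` needs Node.js 20 or newer, it can help to check the installed version first. The following is a minimal sketch; the function names `node_version_ok` and `check_node` are hypothetical helpers, not part of dmxapi-cli, and the check assumes `node --version` prints a string like `v20.11.1`.

```shell
# Hypothetical helper (not part of dmxapi-cli): succeeds if a Node.js
# version string such as "v20.11.1" has a major version of 20 or higher.
node_version_ok() {
  ver=${1#v}          # strip the leading "v"
  major=${ver%%.*}    # keep everything before the first "."
  case $major in
    ''|*[!0-9]*) return 1 ;;   # not a number: treat as unsupported
  esac
  [ "$major" -ge 20 ]
}

# Hypothetical pre-install check: report whether the local Node.js
# is new enough before running `npm install -g dmxapi-cli`.
check_node() {
  if ! command -v node >/dev/null 2>&1; then
    echo "Node.js is not installed" >&2
    return 1
  fi
  if node_version_ok "$(node --version)"; then
    echo "Node.js $(node --version) is OK"
  else
    echo "Node.js 20+ is required, found $(node --version)" >&2
    return 1
  fi
}
```

Run `check_node && npm install -g dmxapi-cli` to install only when the version check passes.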

Command format

```
dmxapi chat -m <model> "prompt" --image <path>
```

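Options

| Option | Description | Example |
|---|---|---|
| `-m, --model <model>` | Vision model name (default: `gpt-5-mini`) | `-m gemini-3-flash-preview` |
| `--image <path>` | Image path (local file or URL) | `--image ./photo.png` |
| `-s, --system <message>` | System message (defines the recognition task) | `-s "You are an OCR expert"` |
| `-t, --temperature <number>` | Sampling temperature, 0-2 | `-t 0.3` |
| `--max-tokens <number>` | Maximum number of output tokens | `--max-tokens 2000` |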
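The table gives `-t` a range of 0-2. If a script passes user-supplied values, a pre-check like the one below can reject out-of-range input before calling the API. This is a sketch under assumptions: `temperature_ok` is a hypothetical helper, and it does not reflect how dmxapi-cli itself validates the flag.

```shell
# Hypothetical pre-check (not a dmxapi-cli feature): succeeds if the
# argument is a plain number within the documented 0-2 range for -t.
temperature_ok() {
  case $1 in
    ''|*[!0-9.]*|*.*.*|.) return 1 ;;  # reject empty, signs, letters, two dots, lone "."
  esac
  # awk does the floating-point comparison; exit 0 means "in range".
  awk -v t="$1" 'BEGIN { exit !((t + 0) >= 0 && (t + 0) <= 2) }'
}

# Example:
#   t=0.3
#   temperature_ok "$t" && dmxapi chat -t "$t" "Describe this image" --image ./photo.png
```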

Supported image formats

PNG (.png)
JPEG (.jpg, .jpeg)
WebP (.webp)
GIF (.gif)
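To skip unsupported files before calling the API, a script can check the extension against the list above. The sketch below is a hypothetical helper that looks only at the file name (case-insensitive, ignoring a URL query string); it does not inspect file contents, and dmxapi-cli may apply its own checks.

```shell
# Hypothetical helper: succeeds if a path or URL ends in one of the
# supported extensions (.png, .jpg, .jpeg, .webp, .gif), ignoring case.
is_supported_image() {
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  name=${name%%\?*}   # drop a URL query string such as "?v=2"
  case $name in
    *.png|*.jpg|*.jpeg|*.webp|*.gif) return 0 ;;
    *) return 1 ;;
  esac
}
```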

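Image input methods

Local file path: automatically converted to a base64 data URL

```
dmxapi chat "Describe this image" --image ./photo.jpg
```

Remote URL: the web image is used directly

```
dmxapi chat "Analyze this image" --image https://example.com/image.png
```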
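dmxapi-cli performs the local-file conversion itself, so there is no need to do it by hand. The sketch below only illustrates what the two input methods amount to: remote URLs pass through unchanged, while a local file becomes a `data:<mime>;base64,<data>` string. The function names and the extension-to-MIME mapping are assumptions, not taken from dmxapi-cli's source.

```shell
# Hypothetical extension -> MIME type mapping for the supported formats.
mime_for() {
  case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
    *.png) echo image/png ;;
    *.jpg|*.jpeg) echo image/jpeg ;;
    *.webp) echo image/webp ;;
    *.gif) echo image/gif ;;
    *) return 1 ;;
  esac
}

# Print the value an --image argument amounts to: remote URLs unchanged,
# local files as a base64 data URL.
image_payload() {
  case $1 in
    http://*|https://*) printf '%s\n' "$1"; return 0 ;;
  esac
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  mime=$(mime_for "$1") || { echo "unsupported format: $1" >&2; return 1; }
  # base64 wraps long output on some systems, so remove the newlines.
  printf 'data:%s;base64,%s\n' "$mime" "$(base64 < "$1" | tr -d '\n')"
}
```

This also makes the privacy implication concrete: for a local file, the entire image content is embedded in the request sent to the API.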

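Models

| Model | Characteristics | Use cases |
|---|---|---|
| `gpt-5-mini` | Default model; fast and low cost | General image recognition |
| `gemini-3-flash-preview` | Google's latest vision model | Complex image analysis, scene understanding |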

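Workflow

1. Determine what kind of image recognition the user needs (description, OCR, analysis, etc.).
2. Choose a suitable vision model.
3. Write a precise prompt for that task type.
4. Build and run the `dmxapi chat` command.
5. Return the recognition result to the user.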
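Steps 1-4 can be sketched as a single shell function that maps a task type to a model and prompt and then runs the command. The task names, the prompts, and the model-per-task choice are illustrative assumptions drawn from the model table and examples in this README (gemini-3-flash-preview for chart and scene analysis, the default gpt-5-mini otherwise); `recognize` is a hypothetical name.

```shell
# Hypothetical sketch of workflow steps 1-4: pick a model and prompt
# for the task type, then run dmxapi chat on the given image.
recognize() {
  task=$1 image=$2
  case $task in
    describe) model=gpt-5-mini
              prompt="Describe the content of this image in detail" ;;
    ocr)      model=gpt-5-mini
              prompt="Recognize all text in the image and output it in the original layout" ;;
    chart)    model=gemini-3-flash-preview
              prompt="Analyze this chart, extract the key data points, and summarize the trends" ;;
    scene)    model=gemini-3-flash-preview
              prompt="Analyze the scene in this image and describe the environment, atmosphere, and likely purpose" ;;
    *) echo "unknown task: $task" >&2; return 2 ;;
  esac
  dmxapi chat -m "$model" "$prompt" --image "$image"
}
```

For example, `recognize chart ./chart.png` runs the chart prompt with gemini-3-flash-preview. Keeping the mapping in one `case` statement makes it easy to change the model for a task type in one place.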

Examples

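Image description

```
# Basic description
dmxapi chat "Describe the content of this image in detail" --image ./landscape.jpg

# Short description
dmxapi chat "Describe this image in one sentence" --image ./photo.png
```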

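OCR text recognition

```
# General OCR
dmxapi chat "Recognize all text in the image and output it in the original layout" --image ./document.png

# Handwriting recognition
dmxapi chat "Recognize the handwritten text in the image" --image ./handwriting.jpg

# Table recognition
dmxapi chat "Recognize the table in the image and output it as a Markdown table" --image ./table.png
```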
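To run OCR over many scanned pages, the single-image command can be wrapped in a loop. This is a sketch under assumptions: `batch_ocr` is a hypothetical function, and it reuses the OCR system message shown later in this README and writes each result to `<name>.txt` next to the image.

```shell
# Hypothetical batch loop (not a dmxapi-cli feature): OCR every PNG/JPEG
# in a directory and save each result next to the image as <name>.txt.
batch_ocr() {
  dir=${1:-.}
  for img in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg; do
    [ -f "$img" ] || continue          # skip glob patterns that matched nothing
    out="${img%.*}.txt"
    if dmxapi chat -s "You are a professional OCR assistant. Output only the recognized text, without any explanation" \
         "Recognize the text" --image "$img" > "$out"; then
      echo "ok: $img -> $out"
    else
      echo "failed: $img" >&2
      rm -f "$out"                     # do not leave a partial result behind
    fi
  done
}
```

Each image is a separate API call, so a large directory is sent image by image to the external service.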

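Chart data analysis

```
# Chart interpretation
dmxapi chat "Analyze this chart, extract the key data points, and summarize the trends" --image ./chart.png

# Data extraction
dmxapi chat "Extract all values from the bar chart and output them as JSON" --image ./bar-chart.jpg
```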
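When asking for JSON output, a model may wrap its answer in a Markdown code fence, which a JSON parser will reject. The filter below is a minimal sketch, assuming the fence markers sit on their own lines; `strip_code_fence` is a hypothetical name, and it does not validate the remaining JSON.

```shell
# Hypothetical filter: delete any line that starts with three backticks
# (optionally indented), leaving only the content inside the fence.
strip_code_fence() {
  sed '/^[[:space:]]*`\{3\}/d'
}

# Example:
#   dmxapi chat "Extract all values from the bar chart and output them as JSON" \
#     --image ./bar-chart.jpg | strip_code_fence > bars.json
```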

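Object detection and recognition

```
# Object detection
dmxapi chat "Identify all objects in the image and list their names and positions" --image ./room.jpg

# Plant and animal identification
dmxapi chat "Identify the plant species in the image" --image ./flower.png
```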

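Scene understanding

```
# Scene analysis
dmxapi chat "Analyze the scene in this image and describe the environment, atmosphere, and likely purpose" --image ./scene.jpg

# Safety inspection
dmxapi chat "Check this image for safety hazards" --image ./workplace.png
```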

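Document understanding

```
# Document summary
dmxapi chat "Summarize the main content of this document image" --image ./contract.png

# Information extraction
dmxapi chat "Extract the name and ID number from this ID card image" --image ./id-card.jpg
```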

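Code / screenshot recognition

```
# Code recognition
dmxapi chat "Recognize the code in the image and output it as copyable text" --image ./code-screenshot.png

# UI analysis
dmxapi chat "Analyze the design elements and layout of this UI" --image ./ui-screenshot.jpg
```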

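Using a system message to improve results

Use the -s option to set a system message that keeps the model focused on a specific task:

```
# OCR expert mode
dmxapi chat -s "You are a professional OCR assistant. Output only the recognized text, without any explanation" "Recognize the text" --image ./doc.png

# Data analysis expert mode
dmxapi chat -s "You are a data analysis expert skilled at extracting data from charts" "Analyze the chart" --image ./chart.png

# Multilingual recognition
dmxapi chat -s "Recognize the text in the image; if it is in English, translate it into Chinese" "Recognize and translate" --image ./english-doc.png
```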
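If the same system messages are used repeatedly, they can be kept as named presets. This is a sketch under assumptions: the preset names `ocr`, `chart`, and `translate` and the functions `system_prompt_for` and `dmx_preset` are hypothetical conveniences, not dmxapi-cli options; they only reuse the `-s` messages from the examples above.

```shell
# Hypothetical presets that reuse the system messages from the examples.
system_prompt_for() {
  case $1 in
    ocr)       echo "You are a professional OCR assistant. Output only the recognized text, without any explanation" ;;
    chart)     echo "You are a data analysis expert skilled at extracting data from charts" ;;
    translate) echo "Recognize the text in the image; if it is in English, translate it into Chinese" ;;
    *) return 1 ;;
  esac
}

# Run dmxapi chat with a preset system message.
# Usage: dmx_preset PRESET PROMPT IMAGE
dmx_preset() {
  sys=$(system_prompt_for "$1") || { echo "unknown preset: $1" >&2; return 1; }
  dmxapi chat -s "$sys" "$2" --image "$3"
}
```

For example, `dmx_preset chart "Analyze the chart" ./chart.png` is equivalent to the data-analysis command above.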

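Notes

Local image files are automatically converted to base64 data URLs and uploaded.
Remote image URLs are passed directly to the API.
For complex recognition tasks, gemini-3-flash-preview is recommended.
If the result is unsatisfactory, adjust the prompt or lower the temperature (-t) to get more deterministic output.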

Usage Guidance

This skill looks like an instruction-only wrapper for the third-party 'dmxapi-cli', but its registry metadata omits important requirements. Before installing or using it:

1. Verify the dmxapi-cli package on npm (author, downloads, repository, install scripts) and confirm the DMXAPI service (https://www.dmxapi.cn/) is legitimate.
2. Do not upload sensitive images (ID cards, passports, medical records) until you trust the provider; the skill will send local images (base64) to an external API.
3. Prefer supplying a scoped API key with minimal privileges and remove it from local config when no longer needed; be aware `dmxapi config set` will persist the key locally.
4. If you need stronger assurance, request a version of the skill that declares its required env vars and install steps in registry metadata, or one that uses an official, audited SDK/source repository.
5. If you cannot verify the npm package or service, avoid running `npm install -g` globally; consider running it inside an isolated VM/container for testing.

Capability Analysis

Type: OpenClaw Skill
Name: dmxapi-image-recognition
Version: 1.0.0

The skill bundle instructs the AI agent to perform high-risk operations, including the global installation of an external NPM package (dmxapi-cli) and the execution of shell commands to configure API keys and interact with a third-party service (dmxapi.cn). While these actions are consistent with the stated goal of image recognition, the reliance on unverified external binaries and shell-based secret handling poses significant supply chain and execution risks. Additionally, the mention of non-existent models like 'gpt-5-mini' and 'gemini-3-flash-preview' in SKILL.md suggests an unreliable or deceptive service provider.

Capability Assessment

⚠ Purpose & Capability

The SKILL.md describes a CLI wrapper for DMXAPI (image description, OCR, chart analysis, etc.), which is coherent with the stated purpose. However, the registry metadata claims no required binaries or env vars, while the instructions explicitly require Node.js 20+, installing dmxapi-cli, and setting an API key. This mismatch is unexpected and incoherent.

⚠ Instruction Scope

Instructions tell the agent/user to convert local images to base64 and upload them (or pass remote URLs) to DMXAPI. That is expected for an image-recognition skill, but it also means local files (including PII like ID cards) will be transmitted off-machine. The SKILL.md does not ask to read other unrelated files or secrets, but it does instruct persistent CLI configuration of an API key.

⚠ Install Mechanism

There is no formal install spec in the registry, yet SKILL.md tells users to run `npm install -g dmxapi-cli`. A global npm install executes unvetted package install scripts and grants the package filesystem/exec capabilities on the host. Because the package and its origin are not validated in the metadata (no homepage/source provided), this raises installation risk.

⚠ Credentials

The registry lists no required environment variables or primary credential, but the runtime instructions require configuring an API key (`dmxapi config set apiKey sk-your-api-key`). That mismatch is problematic: the skill will store and use a service credential but does not declare it in metadata, preventing automated permission review. Requesting a single API key is reasonable for the described functionality, but it must be declared and verified.

⚠ Persistence & Privilege

The skill is not marked always:true and does not request elevated platform privileges. However, the CLI step `dmxapi config set apiKey ...` will persist the API key in the user's dmxapi CLI config (local persistence) and a global npm install will write files system-wide. These behaviors are normal for a CLI tool but were not declared in the registry metadata.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install dmxapi-image-recognition
After installation, invoke the skill by name or use /dmxapi-image-recognition
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

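- Initial release of the dmxapi-image-recognition skill, supporting a range of image recognition and understanding tasks.
- Supports image description, OCR text recognition, chart analysis, object detection, scene understanding, and other task types.
- Accepts multiple image formats (PNG, JPEG, WebP, GIF), from both local files and remote URLs.
- Uses dmxapi-cli on the command line, with a flexible choice of model and parameters for better results on vision tasks.
- Provides extensive usage examples covering common image recognition and data extraction scenarios.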

Metadata

Slug dmxapi-image-recognition

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is dmxapi-image-recognition?

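Image recognition and understanding through the DMXAPI platform. Supports multimodal vision models such as Gemini. Handles image description, OCR text recognition, chart data analysis, object detection, scene understanding, and similar tasks. Use this skill when the user needs to identify what is in an image, extract text from an image, analyze a chart, or otherwise understand an image. It is an AI Agent Skill for Claude Code / OpenClaw, with 125 downloads so far.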

How do I install dmxapi-image-recognition?

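Run "/install dmxapi-image-recognition" in the OpenClaw or Claude Code chat to install the skill. Before it can process images, you also need Node.js 20+, the dmxapi-cli package (`npm install -g dmxapi-cli`), and a DMXAPI API key configured with `dmxapi config set apiKey`.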

Is dmxapi-image-recognition free?

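The skill itself is free: it is licensed under MIT-0, and you can download, install, and use it at no cost. Image recognition runs through the DMXAPI service with your own API key, so usage of that service may be billed separately.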

Which platforms does dmxapi-image-recognition support?

dmxapi-image-recognition is cross-platform and runs anywhere OpenClaw / Claude Code is available, provided Node.js 20+ and dmxapi-cli can be installed.

Who created dmxapi-image-recognition?

It is built and maintained by cryptonee.eth (@onee-io); the current version is v1.0.0.
