18072937735

Large Model Visual Question Answering Skill | 大模型视觉问答技能

by smyx-skills · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 74 · Stars: 0 · Active Installs: 1 · Versions: 1
Install in OpenClaw
/install smyx-visual-qa-analysis
Description
Conducts open-ended Q&A on image content based on computer vision and large language models, supporting any questions to receive natural language responses....
Usage Guidance
This skill uploads images and metadata to external API endpoints (by default, the code points to lifeemergence/open-api hosts). It also reads and writes config files and may create a local SQLite database under the workspace. Before installing or running:
  1. Confirm the remote API host and its privacy policy. Do not upload sensitive images unless you trust the service.
  2. Inspect or sandbox the skill (run it in an isolated environment or container). It can create files under OPENCLAW_WORKSPACE and save attachments.
  3. Review workspace config files for secrets that the skill might read; avoid placing sensitive API keys or tokens in shared configs.
  4. If you need strictly local-only VQA, do not use this skill.
If anything is unclear, ask the publisher which endpoints receive images and what data is stored remotely versus locally.
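Step 3 above (reviewing workspace config files for secrets) can be partly automated. The following is a minimal sketch, not part of the skill: the key-name patterns, the file suffix list, and the function name `find_secret_like_entries` are all assumptions chosen for illustration, and a pattern match only means a line deserves a manual look.

```python
import re
from pathlib import Path

# Assumed key-name patterns; extend these for your own conventions.
SECRET_KEY_PATTERN = re.compile(
    r"""^\s*["']?([\w.-]*(?:api[-_]?key|token|secret|password|open[-_]?id)[\w.-]*)["']?\s*[:=]\s*\S+""",
    re.IGNORECASE,
)
# Assumed set of config file types worth scanning.
CONFIG_SUFFIXES = {".yaml", ".yml", ".json", ".env", ".ini", ".toml"}


def find_secret_like_entries(root):
    """Return (path, line_number, key_name) for each secret-looking config line."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # A file named exactly ".env" has an empty suffix, so check the name too.
        if path.suffix.lower() not in CONFIG_SUFFIXES and path.name != ".env":
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            match = SECRET_KEY_PATTERN.match(line)
            if match:
                # Report only the key name, never the value.
                findings.append((str(path), number, match.group(1)))
    return findings
```

Calling `find_secret_like_entries` on the directory that OPENCLAW_WORKSPACE points to lists candidate lines to move out of shared configs before the skill is run.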
Capability Analysis
Type: OpenClaw Skill
Name: smyx-visual-qa-analysis
Version: 1.0.0
The skill implements a complex integration with a third-party cloud service (lifeemergence.com) and contains highly controlling prompt instructions in SKILL.md that explicitly forbid the AI agent from accessing its own local memory or LanceDB, forcing it to rely solely on the remote API for history.
The common library (smyx_common) includes logic in util.py and dao.py to automatically register/login users using identifiers like phone numbers, subsequently storing session tokens in a local SQLite database (smyx-common-claw.db). Additionally, AgentSkill in smyx_common/scripts/skill.py uses subprocess.run to recursively invoke the 'openclaw' agent, which is a high-privilege capability.
While these features support the stated VQA and health analysis purposes, the aggressive override of agent memory and the automated credential management/persistence are high-risk behaviors.
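The subprocess.run finding above is the kind of thing a reviewer can confirm with a static scan before running any code. The following is a rough sketch using Python's standard `ast` module; the function name `find_subprocess_calls` is hypothetical, and the scan only catches direct calls through an imported `subprocess` name, so dynamic or obfuscated invocations would be missed.

```python
import ast


def find_subprocess_calls(source, filename="<source>"):
    """Return sorted (line, call_name) pairs for calls made via the subprocess module."""
    tree = ast.parse(source, filename=filename)

    # Collect local names bound to the subprocess module or to its functions.
    module_aliases = set()
    function_aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "subprocess":
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == "subprocess":
            for alias in node.names:
                function_aliases[alias.asname or alias.name] = alias.name

    calls = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id in module_aliases
        ):
            # e.g. subprocess.run(...) or sp.Popen(...)
            calls.append((node.lineno, f"subprocess.{func.attr}"))
        elif isinstance(func, ast.Name) and func.id in function_aliases:
            # e.g. run(...) after "from subprocess import run"
            calls.append((node.lineno, f"subprocess.{function_aliases[func.id]}"))
    return sorted(calls)
```

Because the source is parsed rather than imported, nothing in the skill is executed. Reading each file's text and passing it to `find_subprocess_calls` gives the line numbers to inspect by hand, for example in smyx_common/scripts/skill.py.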
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The skill's name/description claim a Visual Question Answering feature, which matches the scripts that call an external VQA API. However the repository includes substantial unrelated functionality (face_analysis, pet-health references, TCM face-diagnosis code and a large common library). That broad code surface and references to multiple analysis endpoints are disproportionate to a single VQA skill and increase the attack/abuse surface (e.g., extra API endpoints, DB code).
Instruction Scope
SKILL.md explicitly forbids reading local memory and requires retrieving an open-id from config files; the runtime scripts do read and write local files (validate and read image files, save attachments, and BaseEnum will create/read config.yaml). The scripts call remote APIs and will upload image data. The 'do not use local memory' requirement in documentation is not strongly enforced by the code, producing a mismatch between instructions and actual behavior.
Install Mechanism
There is no install spec (instruction-only), which is lower-risk from an installer standpoint. However the bundle includes a large requirements.txt in smyx_common and face_analysis, implying many external Python packages would be needed to run; there is no automated, vetted install path provided.
Credentials
Registry metadata declares no required environment variables or credentials, but the code reads environment variables (OPENCLAW_SENDER_OPEN_ID, OPENCLAW_WORKSPACE, FEISHU_OPEN_ID) and expects to fetch an api-key/open-id from local config files under the skill or workspace. SKILL.md requires an 'open-id' and instructs the agent to check skill/workspace config files for an api-key, yet none of these env/config accessors are declared in metadata. This mismatch creates the potential for unexpected access to workspace config or secrets.
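To see what the undeclared environment reads could expose on a given machine, it may help to check which of the three variables named above are set, and to build a scrubbed environment for a sandboxed run. This is a minimal sketch: only the variable names come from the assessment, while `env_exposure` and `scrubbed_env` are hypothetical helpers.

```python
import os

# Variable names taken from the assessment above; nothing else is assumed about them.
ENV_VARS_READ_BY_CODE = (
    "OPENCLAW_SENDER_OPEN_ID",
    "OPENCLAW_WORKSPACE",
    "FEISHU_OPEN_ID",
)


def env_exposure(environ=None, names=ENV_VARS_READ_BY_CODE):
    """Map each variable name to True if it is set to a non-empty value."""
    environ = os.environ if environ is None else environ
    # Only presence is reported; values are never returned or printed.
    return {name: bool(environ.get(name)) for name in names}


def scrubbed_env(environ=None, names=ENV_VARS_READ_BY_CODE):
    """Return a copy of the environment with the listed variables removed."""
    environ = os.environ if environ is None else environ
    return {key: value for key, value in environ.items() if key not in names}
```

The dictionary from `scrubbed_env()` can be passed as the `env` argument of `subprocess.run` when launching a test process. Instead of dropping OPENCLAW_WORKSPACE entirely, one option is to repoint it at a throwaway directory.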
Persistence & Privilege
The skill saves uploaded attachments to a local attachments directory, and the shared common modules include a DAO that reads and writes a local SQLite database under a workspace 'data' directory. The skill does not set always:true, but it will create or modify files in the workspace if executed.
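One way to observe these file writes is to hash every file in the workspace before and after a trial run and compare the two snapshots. The sketch below uses only the standard library; `snapshot` and `diff_snapshots` are hypothetical names, and the approach reads every file fully into memory, so it suits small test workspaces rather than large ones.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    state = {}
    for path in root.rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            state[path.relative_to(root).as_posix()] = digest
    return state


def diff_snapshots(before, after):
    """Return which relative paths were created, modified, or deleted."""
    before_paths, after_paths = set(before), set(after)
    return {
        "created": sorted(after_paths - before_paths),
        "modified": sorted(
            p for p in before_paths & after_paths if before[p] != after[p]
        ),
        "deleted": sorted(before_paths - after_paths),
    }
```

Taking `snapshot(workspace)` before invoking the skill and again afterwards, then passing both to `diff_snapshots`, should surface new entries such as an attachments directory or a database file under 'data' if the behavior described above occurs.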
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install smyx-visual-qa-analysis
  3. After installation, invoke the skill by name or use /smyx-visual-qa-analysis
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of Visual Q&A Analysis skill:
  - Enables open-ended natural language Q&A for images using computer vision and large language models.
  - Supports image understanding, scene description, detail identification, and knowledge reasoning from user questions and images.
  - Strictly enforces cloud-based history retrieval; never reads or summarizes from local memory or long-term storage.
  - Requires secure open-id acquisition via config file or user prompt before any operation.
  - Provides clear operational instructions, output formatting, and usage constraints for reliability and privacy.
Metadata
Slug smyx-visual-qa-analysis
Version 1.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Large Model Visual Question Answering Skill | 大模型视觉问答技能?

Conducts open-ended Q&A on image content based on computer vision and large language models, supporting any questions to receive natural language responses.... It is an AI Agent Skill for Claude Code / OpenClaw, with 74 downloads so far.

How do I install Large Model Visual Question Answering Skill | 大模型视觉问答技能?

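Run "/install smyx-visual-qa-analysis" in the OpenClaw or Claude Code chat to install it. Note that the skill requires an open-id (from a config file or a user prompt) before any operation, and the bundle provides no automated install path for its Python dependencies.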

Is Large Model Visual Question Answering Skill | 大模型视觉问答技能 free?

Yes, Large Model Visual Question Answering Skill | 大模型视觉问答技能 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Large Model Visual Question Answering Skill | 大模型视觉问答技能 support?

Large Model Visual Question Answering Skill | 大模型视觉问答技能 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Large Model Visual Question Answering Skill | 大模型视觉问答技能?

It is built and maintained by smyx-skills (@18072937735); the current version is v1.0.0.
