Downloads: 231
Stars: 0
Active Installs: 0
Versions: 1
Install in OpenClaw
/install sjht-data-annotation
Description
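A general-purpose data annotation tool. Use this skill when the user mentions needing data annotation, having an annotation task, data processing, dataset generation, or viewing/editing annotations. It supports multiple data types such as images, video, and text, calls models to understand and annotate content, generates structured annotation data, and provides a web interface for viewing and editing. Trigger phrases: "标注" (annotate), "annotation", "数据集" (dataset), "label", "tag da...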
Usage Guidance
What to consider before installing/using this skill:
- Functional fit: The skill matches a data-annotation workflow and includes a viewer and a small HTTP API. If you only need a local, ad-hoc annotator, it is a plausible choice.
- Security issues to fix or review before use:
  - Restrict the API to the intended data directory: modify scripts/annotation-api.py so that both listing (dir param) and save (results_file) are validated to reside under the configured DATA_DIR (use realpath checks). Right now GET allows listing arbitrary directories and POST can write to arbitrary paths.
  - Do NOT chmod /root to 755. Instead, place annotation data under a dedicated directory you control (e.g., /var/lib/annotation or a user home subdirectory) and run the service under a dedicated unprivileged user.
  - Avoid exposing the API publicly. If you proxy via nginx, require authentication, limit allowed origins, and avoid setting Access-Control-Allow-Origin: * for the API. Consider binding to a unix socket or 127.0.0.1 only and restricting nginx to authenticated locations.
  - Sanitize inputs in the POST save path and ensure files are not overwritten unintentionally. Prefer appending JSONL within DATA_DIR and disallow absolute paths in client-supplied file parameters.
  - Run the API and viewer in an isolated environment (container or dedicated VM) to limit impact if abused.
- Operational cautions:
  - The SKILL.md suggests restarting nginx and modifying system config; perform these steps only if you understand the server and have backups. Prefer adding locations in an existing site config without changing root permissions.
  - The skill references external model APIs and examples that require API keys. Ensure you supply keys securely (environment variables, secret store) and do not embed them in client-side code.
- If you want to proceed safely: review and patch annotation-api.py to enforce DATA_DIR bounds for both listing and saving; change default DATA_DIR to a non-root path; remove or rework instructions that require changing /root permissions; and run behind authentication/firewall. If you cannot audit/patch the code, treat the skill as risky and avoid exposing the service to other users or the internet.
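The realpath check recommended above could look roughly like the following. This is a minimal sketch, not code from annotation-api.py: `resolve_under` is a hypothetical helper name, and it assumes the client sends a path relative to the configured DATA_DIR.

```python
import os


def resolve_under(data_dir: str, user_path: str) -> str:
    """Return the real path of user_path if it stays inside data_dir, else raise.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if os.path.isabs(user_path):
        raise ValueError("absolute paths are not accepted from clients")
    base = os.path.realpath(data_dir)
    # realpath collapses ".." segments and follows symlinks before we compare.
    target = os.path.realpath(os.path.join(base, user_path))
    # commonpath equals base only when target is base itself or lies below it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the data directory")
    return target
```

Calling the same helper for both the listing parameter and the save target keeps the two code paths from drifting apart; a request such as `../etc/passwd` is rejected before any file is opened.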
Capability Analysis
Type: OpenClaw Skill
Name: sjht-data-annotation
Version: 1.0.0
The skill bundle provides a data annotation workflow with a web-based viewer, but it contains significant security vulnerabilities and risky instructions. The `scripts/annotation-api.py` script allows writing data to arbitrary file paths via the `file` parameter in POST requests, lacking the path-traversal protections found in its file-reading logic. Additionally, `SKILL.md` instructs the agent to perform high-risk system modifications, such as changing the `/root` directory permissions to `755` and altering the default Nginx configuration. While these actions are contextually related to deploying the annotation viewer, they introduce severe security flaws without evidence of direct malicious intent.
Capability Assessment
Purpose & Capability
Name/description and included files (annotation API, viewer template, SKILL.md) align with a data-annotation tool. However, some operational recommendations (symlinking into /root, chmod 755 /root) and the nginx/systemd instructions are more intrusive than a simple annotation viewer needs and are not justified by the core purpose.
Instruction Scope
SKILL.md instructs reading user-provided document paths and data directories (expected), but also instructs privileged system operations: changing /root permissions to 755, restarting nginx, and adding permissive CORS headers. The shipped API accepts a 'dir' query parameter for listing which is not constrained to the configured DATA_DIR (information disclosure risk), and the POST save endpoint lets callers specify an arbitrary results file path and will write to it (arbitrary file write). These behaviors permit listing and partial modification of files outside the intended data directory when proxied/exposed.
Install Mechanism
No install spec or external downloads — the skill is instruction-only plus bundled code. This lowers supply-chain risk; nothing is fetched from remote URLs during install.
Credentials
The skill declares no environment variables or credentials (appropriate), but README and SKILL.md examples show calling external model APIs with Authorization: Bearer <API_KEY>. The skill does not declare or manage those credentials; the agent/user will need to supply them. Lack of explicit guidance about where to store API keys is a minor omission but not necessarily malicious.
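One way to keep the key out of files and client-side code is to read it from the server's environment when the request is built. The sketch below assumes a hypothetical variable name, `ANNOTATION_MODEL_API_KEY`; the skill itself does not define one.

```python
import os


def auth_headers(env_var: str = "ANNOTATION_MODEL_API_KEY") -> dict:
    """Build request headers from a key held in the server environment.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early instead of sending an empty Bearer token.
        raise RuntimeError(f"set {env_var} in the server environment")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```

Because the header is assembled server-side, the viewer page never needs to see the key.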
Persistence & Privilege
The skill's metadata does not request persistent or elevated privileges, but the runtime instructions encourage system-wide changes (modifying /root permissions, adding nginx location blocks, restarting nginx) that require root. This combination (code that can write arbitrary paths + guidance to make data available under /root and to restart system services) increases the blast radius if the API is exposed or misused.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install sjht-data-annotation
- After installation, invoke the skill by name or use /sjht-data-annotation
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
# Changelog — data-annotation skill
All dates use the YYYY-MM-DD format.
## [1.1.0] - 2026-03-19
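### Changed
- **Plan-driven workflow**: added a plan.json annotation-plan mechanism; before processing, draw up a plan that lists all data items, then process them one by one and update progress
- **Item-by-item processing to prevent timeouts**: switched from batch processing to handling one item at a time, saving to JSONL immediately after each item so that a timeout does not lose progress
- **Progress reporting**: report progress after every few items (processed X/Y, elapsed N seconds); pause and report when a timeout is approaching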
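The plan-driven, one-item-at-a-time loop described in these entries might be sketched as follows. The plan.json layout used here (an `items` list with `path` and `status` fields), the `run_plan` name, and the time budget parameter are all assumptions for illustration; the changelog does not specify the actual schema.

```python
import json
import time
from pathlib import Path
from typing import Callable


def run_plan(plan_path: Path, results_path: Path,
             annotate: Callable[[str], dict], budget_s: float = 60.0) -> dict:
    """Process pending plan items one by one, persisting after each item."""
    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    start = time.monotonic()
    for item in plan["items"]:
        if item["status"] == "done":
            continue  # finished in an earlier run
        if time.monotonic() - start > budget_s:
            break  # pause before the timeout; the plan records where to resume
        record = annotate(item["path"])
        # Append one JSON line immediately so a later failure loses nothing.
        with results_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"file": item["path"], **record}, ensure_ascii=False) + "\n")
        item["status"] = "done"
        plan_path.write_text(json.dumps(plan, ensure_ascii=False, indent=2), encoding="utf-8")
    done = sum(1 for i in plan["items"] if i["status"] == "done")
    return {"done": done, "total": len(plan["items"]),
            "elapsed_s": time.monotonic() - start}
```

The returned dictionary carries the "processed X/Y, elapsed N seconds" figures for the progress report, and rerunning the function picks up at the first item that is not yet marked done.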
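### Fixed (lessons from real-world use)
- **Denser video frame sampling**: at least 2 frames per second; at least 15 frames for short videos, 20 for medium videos, and 30 for long videos
- **nginx configuration lessons**:
  - Do not create a standalone server block listening on port 80 (it will conflict); add a location to an existing site instead
  - Use the `^~` prefix match so that regex locations do not hijack mp4/jpg requests
  - The `/root` directory must have 755 permissions, otherwise nginx cannot access it
  - An nginx reload may not be enough; restart when necessary
- **Web page fixes**:
  - apiBase must use the nginx reverse-proxy path (`/annotation-api/`) rather than hard-coding localhost:8888
  - All text fields are contentEditable, and tags can be added and removed
  - Leaving the page with unsaved changes must trigger a beforeunload warning
  - Video files are served statically by nginx, not through the API
- **docx reading**: added pandoc as a fallback (used when python-docx fails)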
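The frame-density rule in the first entry of this list can be expressed as a small calculation. The changelog does not say where "short", "medium", and "long" begin, so the 10-second and 60-second cut-offs below are assumptions, as is the function name.

```python
import math


def frames_to_sample(duration_s: float,
                     short_max_s: float = 10.0,
                     medium_max_s: float = 60.0) -> int:
    """Frames to extract: at least 2 per second, with a per-category floor.

    The category cut-offs are hypothetical; only the floors (15/20/30)
    and the 2 frames-per-second rate come from the changelog.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s <= short_max_s:
        floor = 15
    elif duration_s <= medium_max_s:
        floor = 20
    else:
        floor = 30
    return max(floor, math.ceil(2 * duration_s))
```

With these particular cut-offs the floor only matters for clips shorter than about 7.5 seconds; for anything longer the 2 frames-per-second rate already exceeds it.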
## [1.0.0] - 2026-03-19
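### Added
- **Complete workflow**: requirements confirmation → data reading → model processing → annotation generation → result storage → web viewing/editing → Nginx deployment
- **SKILL.md**: description of the 7-step workflow, including model selection strategy, output formats, and the deployment process
- **annotation-viewer.html**: template for the web annotation viewing/editing page
- **annotation-api.py**: lightweight HTTP API service (file listing/reading/saving)
- **annotation-api.service**: systemd service template
- **output-formats.md**: reference for common annotation output formats
- **skill.json**: skill metadata configuration
### Design decisions
- Results are stored in a `results/` subdirectory next to the data
- JSONL is the default output format
- The web page is pure static HTML; saving is handled by the Python API service
- The API service binds to 127.0.0.1 and is exposed externally through an nginx reverse proxy
- Models are chosen by data type: VL models for images/video, LLMs for text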
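The loopback-only binding in the design decisions can be illustrated with the standard library alone. This is a sketch rather than the contents of annotation-api.py: the `Health` handler and `make_server` helper are hypothetical, and port 8888 is taken from the localhost:8888 address mentioned in the changelog.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Health(BaseHTTPRequestHandler):
    """Placeholder handler; a real API would route list/read/save here."""

    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8888) -> ThreadingHTTPServer:
    # Binding to 127.0.0.1 (not 0.0.0.0) means only local processes,
    # such as the nginx reverse proxy, can connect directly.
    return ThreadingHTTPServer(("127.0.0.1", port), Health)
```

Starting it is a matter of calling `make_server().serve_forever()`; access from other machines then depends entirely on what the nginx location allows, which is where authentication belongs.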
Frequently Asked Questions
What is sjht-data-annotation?
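A general-purpose data annotation tool. Use this skill when the user mentions needing data annotation, having an annotation task, data processing, dataset generation, or viewing/editing annotations. It supports multiple data types such as images, video, and text, calls models to understand and annotate content, generates structured annotation data, and provides a web interface for viewing and editing. Trigger phrases: "标注" (annotate), "annotation", "数据集" (dataset), "label", "tag da... It is an AI Agent Skill for Claude Code / OpenClaw, with 231 downloads so far.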
How do I install sjht-data-annotation?
Run "/install sjht-data-annotation" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is sjht-data-annotation free?
Yes, sjht-data-annotation is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does sjht-data-annotation support?
sjht-data-annotation is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created sjht-data-annotation?
It is built and maintained by Aowind (@aowind); the current version is v1.0.0.