aowind

sjht-data-annotation

Author: Aowind · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 231
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install sjht-data-annotation
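Description
General-purpose data annotation tool. Use this skill when the user mentions needing data annotation, having an annotation task, data processing, dataset generation, or viewing/editing annotations. It supports multiple data types such as images, video, and text; calls models for content understanding and annotation; generates structured annotation data; and provides a Web interface for viewing and editing. Trigger phrases: 「标注」「annotation」「数据集」「label」「tag da...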
Security recommendations
What to consider before installing/using this skill:
- Functional fit: The skill matches a data-annotation workflow and includes a viewer and a small HTTP API. If you only need a local ad-hoc annotator, it is plausible to use.
- Security issues to fix or review before use:
  - Restrict the API to the intended data directory: modify scripts/annotation-api.py so both listing (dir param) and save (results_file) are validated to reside under the configured DATA_DIR (use realpath checks). Right now GET allows listing arbitrary dirs and POST can write to arbitrary paths.
  - Do NOT chmod /root to 755. Instead place annotation data under a dedicated directory you control (e.g., /var/lib/annotation or a user home subdirectory) and run the service under a dedicated unprivileged user.
  - Avoid exposing the API publicly. If you proxy via nginx, require authentication, limit allowed origins, and avoid setting Access-Control-Allow-Origin: * for the API. Consider binding to a unix socket or 127.0.0.1 only and restricting nginx to authenticated locations.
  - Sanitize inputs in the POST save path and ensure files are not overwritten unintentionally. Prefer appending JSONL within DATA_DIR and disallow absolute paths in client-supplied file parameters.
  - Run the API and viewer in an isolated environment (container or dedicated VM) to limit impact if abused.
- Operational cautions:
  - The SKILL.md suggests restarting nginx and modifying system config; perform these steps only if you understand the server and have backups. Prefer adding locations in an existing site config without changing root permissions.
  - The skill references external model APIs and examples that require API keys. Ensure you supply keys securely (environment variables, secret store) and do not embed them in client-side code.
- If you want to proceed safely: review and patch annotation-api.py to enforce DATA_DIR bounds for both listing and saving; change default DATA_DIR to a non-root path; remove or rework instructions that require changing /root permissions; and run behind authentication/firewall. If you cannot audit/patch the code, treat the skill as risky and avoid exposing the service to other users or the internet.
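The realpath check recommended above could look roughly like the following. This is a minimal sketch, not the code shipped in annotation-api.py: the helper name `resolve_under` and the `DATA_DIR` value are assumptions chosen for illustration, and the same helper would need to be called on both the listing and the save paths.

```python
import os

# Hypothetical default; the review suggests a dedicated non-root directory.
DATA_DIR = "/var/lib/annotation"


def resolve_under(base_dir: str, client_path: str) -> str:
    """Return the real path of client_path inside base_dir, or raise ValueError."""
    if os.path.isabs(client_path):
        raise ValueError("absolute paths are not allowed")
    base_real = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks before comparing.
    target_real = os.path.realpath(os.path.join(base_real, client_path))
    # commonpath compares whole components, so "/data-evil" never passes for "/data".
    if os.path.commonpath([base_real, target_real]) != base_real:
        raise ValueError("path escapes the data directory")
    return target_real
```

A request handler would call `resolve_under(DATA_DIR, requested_path)` and answer with an error status when `ValueError` is raised, instead of touching the filesystem.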
Functional analysis
Type: OpenClaw Skill
Name: sjht-data-annotation
Version: 1.0.0
The skill bundle provides a data annotation workflow with a web-based viewer, but it contains significant security vulnerabilities and risky instructions. The `scripts/annotation-api.py` script allows writing data to arbitrary file paths via the `file` parameter in POST requests, lacking the path-traversal protections found in its file-reading logic. Additionally, `SKILL.md` instructs the agent to perform high-risk system modifications, such as changing the `/root` directory permissions to `755` and altering the default Nginx configuration. While these actions are contextually related to deploying the annotation viewer, they introduce severe security flaws without evidence of direct malicious intent.
Capability assessment
Purpose & Capability
Name/description and included files (annotation API, viewer template, SKILL.md) align with a data-annotation tool. However some operational recommendations (symlinking into /root, chmod 755 /root) and nginx/systemd instructions are more intrusive than needed for a simple annotation viewer and are not justified by the core purpose.
Instruction Scope
SKILL.md instructs reading user-provided document paths and data directories (expected), but also instructs privileged system operations: changing /root permissions to 755, restarting nginx, and adding permissive CORS headers. The shipped API accepts a 'dir' query parameter for listing which is not constrained to the configured DATA_DIR (information disclosure risk), and the POST save endpoint lets callers specify an arbitrary results file path and will write to it (arbitrary file write). These behaviors permit listing and partial modification of files outside the intended data directory when proxied/exposed.
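One way to avoid the permissive CORS header mentioned above is to echo the request origin only when it appears on an explicit allowlist. The sketch below is an assumption about how that could be done; the origin value and the function name `cors_headers` are hypothetical and do not come from the skill.

```python
from typing import Dict, Optional

# Hypothetical allowlist; replace with the origin(s) that actually serve the viewer.
ALLOWED_ORIGINS = {"https://annotation.example.internal"}


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Return CORS headers for an allowlisted origin, or no headers at all."""
    if origin in ALLOWED_ORIGINS:
        # "Vary: Origin" keeps caches from reusing one origin's response for another.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}
```

Browsers then block cross-origin reads from any page that is not on the list, because the response carries no Access-Control-Allow-Origin header for it.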
Install Mechanism
No install spec or external downloads — the skill is instruction-only plus bundled code. This lowers supply-chain risk; nothing is fetched from remote URLs during install.
Credentials
The skill declares no environment variables or credentials (appropriate), but README and SKILL.md examples show calling external model APIs with Authorization: Bearer <API_KEY>. The skill does not declare or manage those credentials; the agent/user will need to supply them. Lack of explicit guidance about where to store API keys is a minor omission but not necessarily malicious.
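Since the skill gives no guidance on where to keep the key, a reasonable default is to read it from an environment variable on the server side at call time. The following is a sketch under that assumption; the variable name `ANNOTATION_MODEL_API_KEY` and the helper `build_auth_headers` are hypothetical.

```python
import os
from typing import Dict


def build_auth_headers(env_var: str = "ANNOTATION_MODEL_API_KEY") -> Dict[str, str]:
    """Build request headers from a key held in the environment, never in client code."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError(env_var + " is not set")
    return {
        "Authorization": "Bearer " + key,
        "Content-Type": "application/json",
    }
```

The key stays out of the viewer HTML and out of version control; only the server-side process that calls the model API ever sees it.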
Persistence & Privilege
The skill's metadata does not request persistent or elevated privileges, but the runtime instructions encourage system-wide changes (modifying /root permissions, adding nginx location blocks, restarting nginx) that require root. This combination (code that can write arbitrary paths + guidance to make data available under /root and to restart system services) increases the blast radius if the API is exposed or misused.
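How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install sjht-data-annotation
  3. Once installation completes, invoke the Skill by name or trigger it with /sjht-data-annotation
  4. Provide the required inputs according to the Skill's parameter description to get structured output
Version history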
v1.0.0
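# Changelog - data-annotation skill

All dates use the format YYYY-MM-DD.

## [1.1.0] - 2026-03-19

### Changed
- **Plan-driven workflow**: added a plan.json annotation plan mechanism; before processing, draw up a plan listing all data items, then process them one by one and update progress
- **Item-by-item processing to avoid timeouts**: switched from batch processing to handling one item at a time, saving to JSONL immediately after each item so that a timeout does not lose progress
- **Progress reporting**: report progress every few items (processed X/Y, N seconds elapsed); pause and report when close to timing out

### Fixed (lessons from practice)
- **Denser video frame sampling**: at least 2 frames per second; at least 15 frames for short videos, 20 for medium videos, and 30 for long videos
- **nginx configuration lessons**:
  - Do not create a standalone server block listening on port 80 (it will conflict); add a location to an existing site instead
  - Use `^~` prefix matching so that regex locations do not hijack mp4/jpg requests
  - The `/root` directory must have 755 permissions, otherwise nginx cannot access it
  - nginx reload may not be enough; restart if necessary
- **Web page fixes**:
  - apiBase must use the nginx reverse-proxy path (`/annotation-api/`) rather than hardcoding localhost:8888
  - All text fields are contentEditable; tags can be added and removed
  - Leaving the page with unsaved changes must trigger a beforeunload warning
  - Video files are served statically by nginx, not through the API
- **docx reading**: added pandoc as a fallback (when python-docx fails)

## [1.0.0] - 2026-03-19

### Added
- **Complete workflow**: requirements confirmation → data reading → model processing → annotation generation → result storage → Web viewing/editing → Nginx deployment
- **SKILL.md**: description of the 7-step workflow, including model selection strategy, output formats, and the deployment process
- **annotation-viewer.html**: Web page template for viewing/editing annotations
- **annotation-api.py**: lightweight HTTP API service (file listing/reading/saving)
- **annotation-api.service**: systemd service template
- **output-formats.md**: reference for common annotation output formats
- **skill.json**: skill metadata configuration

### Design decisions
- Results are stored in a `results/` subdirectory alongside the data
- JSONL is the default output format
- The Web page is pure static HTML; saving is handled by the Python API service
- The API service binds to 127.0.0.1 and is exposed through an nginx reverse proxy
- Models are chosen by data type: VL models for images/video, LLMs for text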
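
The plan-driven, one-item-at-a-time workflow described in the 1.1.0 entry could be sketched as below. The changelog does not show the plan.json schema, so the `items`, `path`, and `status` fields, the `process_plan` function, and its parameters are all assumptions made for illustration; `annotate` stands in for whatever model call produces a record.

```python
import json
import time
from pathlib import Path


def process_plan(plan_path, results_path, annotate, report_every=3, time_budget_s=60.0):
    """Process pending plan items one at a time, saving after every item."""
    plan = json.loads(Path(plan_path).read_text(encoding="utf-8"))
    items = plan["items"]  # assumed schema: [{"path": ..., "status": "pending"|"done"}]
    total = len(items)
    done = sum(1 for item in items if item["status"] == "done")
    start = time.monotonic()
    for item in items:
        if item["status"] == "done":
            continue  # resume support: skip what an earlier run finished
        if time.monotonic() - start > time_budget_s:
            print(f"pausing near timeout: processed {done}/{total}")
            break
        record = annotate(item["path"])
        # Append one JSON line immediately so a timeout cannot lose this result.
        with open(results_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
        item["status"] = "done"
        Path(plan_path).write_text(
            json.dumps(plan, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        done += 1
        if done % report_every == 0 or done == total:
            elapsed = time.monotonic() - start
            print(f"processed {done}/{total}, elapsed {elapsed:.0f}s")
    return done
```

Because the plan file is rewritten after every item, rerunning the function after an interruption picks up at the first item still marked pending.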
Metadata
Slug sjht-data-annotation
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Number of versions 1
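FAQ

What is sjht-data-annotation?

General-purpose data annotation tool. Use this skill when the user mentions needing data annotation, having an annotation task, data processing, dataset generation, or viewing/editing annotations. It supports multiple data types such as images, video, and text; calls models for content understanding and annotation; generates structured annotation data; and provides a Web interface for viewing and editing. Trigger phrases: 「标注」「annotation」「数据集」「label」「tag da... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 231 total downloads to date.

How do I install sjht-data-annotation?

Run the command "/install sjht-data-annotation" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is sjht-data-annotation free?

Yes. sjht-data-annotation is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does sjht-data-annotation support?

sjht-data-annotation is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed sjht-data-annotation?

It is developed and maintained by Aowind (@aowind); the current version is v1.0.0.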
