Description

Use when the user asks to batch-search candidates, verify public web evidence, dedupe results, and organize them into Feishu/Lark docs. Use especially for re...

README (SKILL.md)


brightdata-research

Name: brightdata-research
Author: 16miku

GitHub: https://github.com/16Miku/brightdata-research-skill
ClawHub: https://clawhub.ai/16miku/brightdata-research

Turns "batch search + web scraping + candidate verification + structured organization + appending to Feishu" into a stable, reusable research pipeline.

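## Execution modes

This skill has two execution modes, selected automatically based on the state of the environment.

### Mode A: direct execution

- Precondition: search, scraping, and Feishu write capabilities are all ready.
- Behavior: skip environment setup and go straight to Step 0 to start the research workflow.

### Mode B: environment setup + execution

- Precondition: first use, or preflight finds that a key capability is missing.
- Behavior: check and fix each item in `references/environment-checklist.md`, then proceed as in Mode A.

For the automatic repair order during environment setup, see `references/lark-cli-install-and-auth.md` and `references/brightdata-mcp-setup.md`.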

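## Core principles

- Search and scraping can run in parallel.
- Final deduplication, risk tiering, and Feishu writes must be done serially by the main agent.
- Aggregate first, then write. Do not write to Feishu while searching.
- Keep evidence. Each candidate should retain links to public evidence wherever possible.
- Degrade when the environment is incomplete. If search, scraping, Feishu, subagent, or git/worktree prerequisites are missing, say so explicitly and switch to the fallback.
- Do not rely on fragile multi-line shell concatenation. When writing to Feishu, prefer building one stable, complete Markdown document.
- Reuse context. If the current conversation already contains a historical candidate pool or target document information, reuse it instead of asking the user again.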

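## Standard workflow

### Step 0. Define the goal for this round

Extract from the user's request or the conversation history:

- Research topic
- Target count
- Scope / country / language / model range
- Existing candidate pool or target Feishu document
- Whether to "continue appending" or "create a new document"
- Whether subagents are allowed

Context reuse rule: if the target document URL/ID, a historical candidate list, or the research topic has already appeared in the current conversation, reuse it directly. Do not ask the user again to "please provide the document ID".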

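### Step 1. Preflight environment check

Check against `references/environment-checklist.md`:

| Capability | How to check | Behavior when missing |
|------|----------|------------|
| Search | Check whether the BrightData MCP tools or CLI are available | Cannot expand the candidate pool; can only verify a user-supplied list |
| Scraping | Check whether the BrightData scrape tool or CLI is available | Output low-confidence leads only |
| Feishu write | Check whether lark-cli / the lark-doc skill is available | Output Markdown first and tell the user nothing was written to Feishu |
| Target document | Check whether the context contains a doc_id / URL | Ask the user: create new or append |
| Historical dedup | Try to read the existing document content | Dedupe within this round only and state that historical dedup cannot be guaranteed |
| subagent | Check that a git repository exists and HEAD can be resolved | Fall back to serial execution by the main agent |

If a missing item can be fixed automatically (for example, lark-cli is not installed), fix it per Mode B and continue.
If a missing item cannot be fixed automatically (for example, the user has not provided an API token), tell the user explicitly and degrade.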
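The preflight table and the Mode A / Mode B rule can be read as a small lookup. The sketch below is only an illustration under assumptions: the capability keys (`search`, `scrape`, `feishu_write`, and so on) and both function names are hypothetical, and how each capability is actually probed is left to `references/environment-checklist.md`.

```python
# Fallback notes taken from the preflight table; the dictionary keys are
# hypothetical names chosen for this sketch, not identifiers from the skill.
FALLBACKS = {
    "search": "Cannot expand the candidate pool; verify the user-supplied list only",
    "scrape": "Output low-confidence leads only",
    "feishu_write": "Output Markdown first and tell the user nothing was written to Feishu",
    "target_doc": "Ask the user: create new or append",
    "history_dedup": "Dedupe within this round only; historical dedup not guaranteed",
    "subagent": "Fall back to serial execution by the main agent",
}

# Mode A requires these three capabilities to be ready.
KEY_CAPABILITIES = ("search", "scrape", "feishu_write")


def choose_mode(available: dict[str, bool]) -> str:
    """Return "A" when every key capability is ready, otherwise "B"."""
    if all(available.get(name, False) for name in KEY_CAPABILITIES):
        return "A"
    return "B"


def plan_degradations(available: dict[str, bool]) -> list[str]:
    """Return one fallback note per capability that preflight found missing."""
    return [
        f"{name}: {note}"
        for name, note in FALLBACKS.items()
        if not available.get(name, False)
    ]
```

A caller would fill `available` from its own checks, run Mode B repairs when `choose_mode` returns "B", and report whatever `plan_degradations` still lists afterwards.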

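### Step 2. Plan search batches

Split the task into several independent batches:

- Different query variants
- Keywords in different languages
- Different source entry points (official site, docs, pricing, help, faq, terms, privacy)
- Keywords for different platform categories (gateway, aggregator, relay, OpenAI-compatible API, etc.)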
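One way to picture the batching is to pair each query variant with each source entry point, so that every variant becomes its own independent batch. This is a minimal sketch, not the skill's actual planner; `build_batches` and the sample queries are hypothetical.

```python
def build_batches(variants: list[str], entry_points: list[str]) -> list[list[str]]:
    """Build one batch per query variant.

    Each batch pairs that variant with every entry-point keyword, so batches
    share no state and can be searched in parallel.
    """
    return [
        [f"{variant} {entry}" for entry in entry_points]
        for variant in variants
    ]


# Variants may already mix languages and platform-category keywords.
batches = build_batches(
    ["LLM API gateway", "OpenAI-compatible API relay"],
    ["docs", "pricing", "terms"],
)
```

Keeping the batches as plain lists of strings makes it easy to hand each one to a separate search call and to record which query produced which result.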

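### Step 3. Parallel search and initial screening

Prefer the BrightData search and scraping tools:

- Search for candidate platforms
- Collect public entry points such as the official site, docs page, pricing page, and terms page
- Record the title, URL, snippet, and source query

During initial screening, keep highly relevant candidates and drop clearly irrelevant pages, mirror pages, and pure advertising pages.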

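### Step 4. Deduplication

Deduplication has two phases.

Phase A: dedup within this round:

- Domain normalization: strip www, http(s), and trailing slashes; lowercase everything
- Brand alias recognition: one platform may have several domains or brand names (for example, openrouter.ai and OpenRouter) and should be treated as a single candidate
- Keep the entry with more complete evidence and a stronger claim to being the official site

Phase B: historical dedup (if the historical document can be read):

- Read the existing Feishu document content
- Extract the historical candidate list (name + domain)
- Cross-check it against this round's candidates
- Do not rewrite candidates that already appear in the historical document, but list them in the dedup notes

If the historical document cannot be read, do Phase A only and say so explicitly.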
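The two dedup phases can be sketched as follows. Several things here are assumptions made for illustration: the `ALIASES` table, the candidate dictionary shape (`name`, `url`, `evidence`), the choice to compare by host only (dropping the path), and using the number of evidence links as a stand-in for "more complete evidence".

```python
from urllib.parse import urlsplit

# Hypothetical alias table: lowercase brand name -> canonical domain.
ALIASES = {"openrouter": "openrouter.ai"}


def normalize_domain(url: str) -> str:
    """Lowercase the URL and reduce it to its host without a leading www."""
    url = url.strip().lower()
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat a bare domain as the host
    host = urlsplit(url).netloc
    return host[4:] if host.startswith("www.") else host


def candidate_key(name: str, url: str) -> str:
    """Prefer a known brand alias, otherwise fall back to the normalized domain."""
    return ALIASES.get(name.strip().lower(), normalize_domain(url))


def dedupe(candidates: list[dict], history: set[str]) -> tuple[list[dict], list[str]]:
    """Phase A merges entries by key; Phase B skips keys already in history."""
    best: dict[str, dict] = {}
    for cand in candidates:
        key = candidate_key(cand["name"], cand["url"])
        # Keep whichever entry carries more evidence links (an assumed proxy).
        if key not in best or len(cand["evidence"]) > len(best[key]["evidence"]):
            best[key] = cand
    fresh = [cand for key, cand in best.items() if key not in history]
    skipped = sorted(key for key in best if key in history)
    return fresh, skipped
```

`skipped` is returned rather than discarded so that the dedup notes can list which platforms were already present in earlier rounds.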

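### Step 5. Structured field extraction

Recommended default fields:

- Name
- Official site
- Docs/API page
- Pricing page or pricing clues
- Evidence of supported models
- Evidence of OpenAI-compatible / unified API compatibility
- Preliminary risk level
- Notes

If the user has custom fields, satisfy the user's field schema first.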

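### Step 6. Risk tiering

Use checklist-style scoring:

| Dimension | Present = 1 point | Absent = 0 points |
|------|--------|--------|
| Reachable official site | 1 | 0 |
| Public API docs | 1 | 0 |
| Pricing page or explicit price information | 1 | 0 |
| Terms of Service / Privacy Policy | 1 | 0 |
| Verifiable company/team entity | 1 | 0 |
| Evidence of OpenAI-compatible or unified API compatibility | 1 | 0 |

Tiering rules:

- A / lower risk (5-6 points): public information is complete, with ample documentation and capability evidence
- B / medium risk (3-4 points): some public evidence exists, but some dimensions need further verification
- C / high risk / pending verification (0-2 points): relies mainly on search snippets; not yet suitable for high-confidence inclusion

Attach a one-sentence risk reason to each candidate.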
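The checklist scoring reduces to counting satisfied dimensions and mapping the total to a tier. A minimal sketch, assuming hypothetical key names for the six dimensions:

```python
# One hypothetical key per checklist dimension, in table order.
CHECKLIST = (
    "official_site",
    "api_docs",
    "pricing",
    "terms_or_privacy",
    "company_entity",
    "openai_compatible",
)


def risk_tier(evidence: dict[str, bool]) -> tuple[int, str]:
    """Score one point per satisfied dimension and map the total to A, B, or C."""
    score = sum(1 for item in CHECKLIST if evidence.get(item, False))
    if score >= 5:
        tier = "A"  # 5-6 points: lower risk
    elif score >= 3:
        tier = "B"  # 3-4 points: medium risk
    else:
        tier = "C"  # 0-2 points: high risk / pending verification
    return score, tier
```

For example, a candidate with a reachable site, public docs, and a pricing page but nothing else scores 3 and lands in tier B; the one-sentence risk reason would then name the missing dimensions.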

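### Step 7. Main agent consolidation

The main agent is responsible for:

- Aggregating all candidates
- Final deduplication
- Unifying field formats
- Unifying the risk criteria
- Deciding which entries count as "new, non-duplicate candidates"
- Generating the final Markdown to be written to Feishu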

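### Step 8. Serial write to Feishu

If the user asks to write to a Feishu document:

- First follow the authentication and security rules of lark-shared and lark-doc
- Reuse the existing document ID; if there is none, create a new document according to the user's intent or confirm first
- Use --as user by default to access the user's own documents
- Generate one complete round of Markdown from the unified template
- Have the main agent append it in one go, or sequentially in series

Do not let subagents write directly to the same Feishu document.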

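## Output format

By default, report to the user in the structure below, and write to the Feishu document in the same structure where possible:

```markdown
## Round X new candidates (source notes)

### 1. Platform name
- Official site:
- Docs:
- Pricing:
- Evidence of supported models:
- OpenAI compatibility evidence:
- Preliminary risk: A/B/C (score X/6, reason: ...)
- Notes:

## Candidates from this round pending further verification
...

## Dedup notes for this round
- Dedup within this round: which entries were merged
- Historical dedup: which platforms already appeared in earlier rounds and are therefore not rewritten

## Interim conclusions for this round
- New higher-confidence candidates this round:
- New candidates pending verification this round:
- Suggested next steps:
```

Even if the user has not asked to write to Feishu, it is still recommended to output to the conversation using this template first.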
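Building the whole round as one string before any write is what keeps line breaks stable and avoids multi-line shell concatenation. The sketch below renders only the first section of the template; the function names, the candidate dictionary keys, and the English field labels are assumptions made for illustration.

```python
def render_candidate(index: int, cand: dict) -> str:
    """Render one candidate as a Markdown block in the template's field order."""
    lines = [
        f"### {index}. {cand['name']}",
        f"- Official site: {cand.get('site', '')}",
        f"- Docs: {cand.get('docs', '')}",
        f"- Pricing: {cand.get('pricing', '')}",
        f"- Evidence of supported models: {cand.get('models', '')}",
        f"- OpenAI compatibility evidence: {cand.get('compat', '')}",
        f"- Preliminary risk: {cand['tier']} "
        f"(score {cand['score']}/6, reason: {cand['reason']})",
        f"- Notes: {cand.get('notes', '')}",
    ]
    return "\n".join(lines)


def render_round(round_no: int, source: str, candidates: list[dict]) -> str:
    """Assemble the heading and all candidate blocks into one Markdown string."""
    header = f"## Round {round_no} new candidates ({source})"
    blocks = [render_candidate(i, c) for i, c in enumerate(candidates, start=1)]
    return "\n\n".join([header, *blocks]) + "\n"
```

The resulting string can be shown in the conversation as is, or passed once to whatever write path the main agent uses for Feishu.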
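
## When to call subagents

Scenarios suited to subagents:
- Multiple groups of search queries need to be expanded
- Several categories or countries / languages need to be searched in parallel
- Several candidates each need public web verification
- 5-10 new candidates need to be pulled in quickly to form a candidate pool

### Subagents may handle
- Search query expansion
- Fetching search results
- Preliminary verification of a single platform's public information
- Preliminary structured field organization

### Subagents should not handle
- Final historical dedup decisions
- Finalizing risk tiers
- Writing to Feishu documents
- The final main conclusions presented to the user

If the environment does not meet the subagent/worktree prerequisites, fall back to serial execution by the main agent. See `references/subagent-git-prerequisites.md` for details.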
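The git side of the subagent prerequisite (a repository exists and HEAD resolves) could be probed roughly as below. This is a sketch under assumptions: the skill's own checks live in `references/subagent-git-prerequisites.md` and may differ, and `run_git` and `subagent_ready` are hypothetical helper names. The two `git rev-parse` invocations are standard git commands.

```python
import subprocess
from typing import Callable, Optional


def run_git(args: list[str], cwd: str, run: Callable = subprocess.run) -> Optional[str]:
    """Return trimmed stdout if the git command succeeds, otherwise None."""
    try:
        result = run(["git", *args], cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:
        return None  # git is not installed, or cwd does not exist
    return result.stdout.strip() if result.returncode == 0 else None


def subagent_ready(cwd: str = ".", run: Callable = subprocess.run) -> bool:
    """True when cwd is inside a git work tree and HEAD resolves to a commit."""
    inside = run_git(["rev-parse", "--is-inside-work-tree"], cwd, run) == "true"
    head = run_git(["rev-parse", "--verify", "HEAD"], cwd, run)
    return inside and bool(head)
```

When `subagent_ready()` returns False, the main agent would run the batches serially instead of dispatching subagents. The `run` parameter exists only so the check can be exercised without a real repository.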
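
## Sub-workflow: document dedup cleanup

When the user asks to "check whether the Feishu document has duplicates" or to "dedupe", run the following sub-workflow:

1. Read the full text of the target Feishu document
2. Extract the names and domains of all candidate platforms
3. Normalize domains and recognize brand aliases
4. Find fully duplicated entries (the same platform written again as a new candidate in different rounds)
5. Distinguish "duplicates" from "supplementary verification" (a later round adding new evidence for an existing platform is not a duplicate)
6. Report the duplicates found to the user, and delete them only after the user confirms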
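Steps 4 and 5 hinge on telling a repeated "new candidate" entry apart from a later supplementary note. A minimal sketch, assuming each parsed entry already carries a normalized `key`, a `round` number, and a `kind` of either `"new"` or `"supplement"`; all three field names are hypothetical, and how `kind` is inferred from the document text is not covered here.

```python
def find_duplicates(entries: list[dict]) -> list[dict]:
    """Return entries that re-add an already listed platform as a new candidate.

    Entries are assumed to be in document order. Supplementary verification
    entries are never reported, and nothing is deleted here: the result is
    only a report for the user to confirm.
    """
    seen: set[str] = set()
    duplicates: list[dict] = []
    for entry in entries:
        if entry["kind"] != "new":
            continue  # added evidence for an existing platform is not a duplicate
        if entry["key"] in seen:
            duplicates.append(entry)
        else:
            seen.add(entry["key"])
    return duplicates
```

The function reports only the later occurrences, so the first write of each platform is the one that survives if the user approves the cleanup.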
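
## Boundaries and prohibitions

- Do not treat search result snippets as verified facts; add the official site or docs page whenever possible.
- Do not treat marketing copy as proof of capability; prefer public pages such as docs, pricing, terms, privacy, and company.
- Do not neglect dedup in pursuit of quantity; "new, non-duplicate candidates" matter more than "looking like a lot".
- Do not skip the unified consolidation step before writing to Feishu.
- Do not let multiple agents write to the same document concurrently.
- Do not output conclusions whose sources cannot be traced.
- Do not pretend the environment is complete; when prerequisites are missing, say so explicitly and degrade.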
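
## Success criteria

A round can be considered successful when the following conditions are broadly met:
- New, non-duplicate candidates were found this round
- Candidates have at least the core structured fields
- "Higher confidence" and "pending further verification" are clearly distinguished
- Risk ratings are based on checklist scores rather than purely subjective judgment
- The final Feishu write has stable formatting, correct line breaks, and no obvious duplicates
- In a new environment, preflight still runs first, and its result decides between full and degraded execution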
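
## Reference document index

| Document | Purpose |
|------|------|
| `references/environment-checklist.md` | Preflight checklist, separating auto-fixable items from those that need user intervention |
| `references/brightdata-mcp-setup.md` | Installation, authentication, and verification of the BrightData MCP and CLI |
| `references/lark-cli-install-and-auth.md` | Complete steps for installing, configuring, and authenticating lark-cli |
| `references/feishu-setup.md` | Feishu document write rules and identity selection |
| `references/known-failures-and-fallbacks.md` | Common failure scenarios and degradation strategies |
| `references/subagent-git-prerequisites.md` | Prerequisites and degradation rules for subagent/worktree |
| `references/smoke-tests.md` | Minimal verification command for each capability |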

Usage Guidance

This skill appears to do what it claims: it uses BrightData for parallel search/scraping, dedupes and risk-scores results, and can append structured Markdown to Feishu/Lark docs via lark-cli. Before installing or running it, consider: (1) It will ask you to provide BrightData API tokens and to authorize lark-cli (Feishu) — only provide credentials you trust and prefer least-privilege tokens. (2) The skill's instructions include running system commands (npm install -g, npx skills add, git init/commit, claude mcp add) which will modify the environment; review/approve these commands before allowing execution. (3) If you need to avoid global installs or git changes, run the skill in an isolated environment or a disposable container/session. (4) Because the manifest is instruction-only, the skill does not itself store credentials in code, but the agent executing these instructions will need your tokens to read/write Feishu documents — confirm how your agent handles secrets. (5) If you are concerned, test on a non-sensitive Feishu doc or use read-only trials first. Overall the skill is internally coherent with its purpose, but exercise normal caution around credentials and system-level installs.

Capability Analysis

Type: OpenClaw Skill
Name: brightdata-research
Version: 1.0.0

The brightdata-research skill bundle provides a legitimate workflow for batch web research, data extraction, and structured reporting to Feishu/Lark documents. It utilizes standard tools like BrightData (for search/scraping) and lark-cli (for document management). The instructions in SKILL.md and the reference documents emphasize security best practices, such as requiring manual user authentication for API tokens, using the '--as user' flag for document access, and providing clear fallback mechanisms when environment dependencies are missing. No evidence of data exfiltration, malicious execution, or prompt injection for harmful purposes was found; the automated installation of CLI tools (npm install -g) is explicitly documented as part of the environment setup process.

Capability Tags

requires-oauth-token

Capability Assessment

✓ Purpose & Capability

The skill name/description describe batch web search, scraping, dedupe, risk-scoring, and writing to Feishu/Lark; the SKILL.md and reference docs consistently require BrightData for search/scrape and lark-cli for Feishu writes. No unrelated services or secrets are requested. Note: the manifest lists no declared required env vars, yet the runtime instructions require the user to provide BrightData API keys and Feishu auth (this is expected for an instruction-only skill but is a documentation gap in metadata rather than a mismatch of purpose).

ℹ Instruction Scope

The instructions legitimately instruct the agent to check environment state, call MCP tools or CLI, read Feishu docs, and optionally run subagents. These actions are within the skill's stated remit (preflight checks, reading/writing target Feishu docs, git/worktree checks). Important: the skill tells the agent to run system commands (npm installs, git init/commit, lark-cli calls, brightdata CLI calls) and to read remote/local document contents — all of which are powerful actions but consistent with the purpose.

ℹ Install Mechanism

There is no install spec in the manifest (instruction-only), but the docs explicitly instruct installing packages via npm (e.g., @brightdata/cli, @larksuite/cli) and adding skills via npx. These are standard package sources (npm, GitHub). No untrusted direct-download URLs or obfuscated installers are present; automatic global installs and npx commands require elevated privileges and network access, so the user should expect those side effects.

ℹ Credentials

The skill requires sensitive credentials at runtime (BrightData API token / MCP config and Feishu/Lark authentication) and requires Node/npm and possibly git. Those credentials are proportionate to the skill's functionality. However, the manifest does not explicitly declare required env vars or a primary credential — the references and SKILL.md do instruct how to provide them (including BRIGHTDATA_API_KEY examples). This is a documentation gap: the skill will need those secrets to operate but they are not enumerated in the registry metadata.

ℹ Persistence & Privilege

The skill does not request always:true and does not modify other skills. It does instruct commands that modify the system (npm global installs, git init/commit, adding MCP entries) and will read/write user Feishu documents when authorized. Those are expected for the workflow but have real side effects and require user consent/credentials.

Version History

v1.0.0

Initial release: a research pipeline for batch search + scraping + dedup + risk tiering + Feishu writes

Metadata

Slug brightdata-research

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is brightdata-research?

Use when the user asks to batch-search candidates, verify public web evidence, dedupe results, and organize them into Feishu/Lark docs. Use especially for re... It is an AI Agent Skill for Claude Code / OpenClaw, with 97 downloads so far.

How do I install brightdata-research?

Run "/install brightdata-research" in the OpenClaw or Claude Code chat to install it in one step. Running the skill still requires a BrightData API token and lark-cli (Feishu) authorization, as noted under Usage Guidance.

Is brightdata-research free?

Yes, brightdata-research is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does brightdata-research support?

brightdata-research is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created brightdata-research?

It is built and maintained by 16Miku (@16miku); the current version is v1.0.0.
