hugging-face-api
/install hugging-face-api
Hugging Face Agent Skill
A playbook for agents that use the Hugging Face MCP server. Follow these steps in order. Discover for free first; run billed inference only against confirmed-supported models.
1. Name
Hugging Face — open-source model and dataset discovery plus OpenAI-compatible inference (chat and embeddings) across inference providers, via 7 MCP tools.
2. Purpose
Use this skill to find open-source models and datasets on the Hugging Face Hub, confirm which models are runnable through the Inference router, and run chat completions and embeddings — while controlling cost, respecting licenses, and keeping the access token secret.
3. When to use Hugging Face
Use it when the task involves:
- Open-source models (Llama, Qwen, Mistral, BGE, sentence-transformers, etc.).
- Model or dataset discovery — search/inspect the Hub catalog.
- OpenAI-compatible inference across providers — one interface, many providers.
- Embeddings — vectors for semantic search, RAG, clustering.
4. When NOT to use it
- If you need a specific closed/proprietary model (e.g. a vendor's flagship), call that vendor's provider directly.
- If the task needs no model at all (pure local computation), skip inference.
- If a cheaper or already-integrated tool already solves the task, use it.
5. Environment
Set one secret:
| Variable | Required | Notes |
|---|---|---|
HF_TOKEN |
Yes | hf_.... Get it at https://huggingface.co/settings/tokens. Never expose it. |
Optional: HF_HUB_BASE_URL, HF_ROUTER_BASE_URL, HF_TIMEOUT_MS, HF_MAX_RETRIES, LOG_LEVEL.
6. Operations (the 7 tools)
| Tool | Use it to | Cost |
|---|---|---|
hf_search_models |
Search Hub models | Free |
hf_model_info |
Inspect one model (license, task) | Free |
hf_search_datasets |
Search Hub datasets | Free |
hf_list_inference_models |
List models runnable via router | Free |
hf_chat |
OpenAI-style chat completion | Billed |
hf_embeddings |
Embedding vectors | Billed |
hf_request |
Reach any other Hub/router endpoint | Depends |
7. Discovery workflow (FREE)
Do this first; it costs nothing.
hf_search_models— find candidates by task/author/popularity.hf_model_info— checkpipeline_tagandcardData.license.hf_search_datasets— find data if needed.hf_list_inference_models— confirm the chosen model is actually runnable.
8. Inference workflow (BILLED)
- Choose a model that appears in
hf_list_inference_models. - For chat: call
hf_chatwith OpenAI-stylemessagesand a boundedmax_tokens. - For vectors: call
hf_embeddingswith a batch ofinputs(default modelsentence-transformers/all-MiniLM-L6-v2). - Report the model id and the returned
usage.
9. Cost control
- Hub discovery is free — use it liberally.
- Inference is billed per provider — always:
- Set
max_tokensonhf_chat. - Prefer smaller models when quality allows.
- Batch embeddings (array
inputs) instead of per-item calls. - Cache embeddings and deterministic completions.
- Set
10. Error handling
| Error | Reaction |
|---|---|
model_not_supported (402/403) |
Call hf_list_inference_models, pick a listed model, retry. |
401 invalid token |
Stop. Fix HF_TOKEN. Do not retry blindly. |
402 credits |
Stop. Add credits or use a cheaper/free model. |
429 rate limit |
Back off (server retries); slow down, batch, cache. |
11. Security
- Never print, log, or echo the
hf_token. The server redacts it; do not undo that. - Use a least-privilege token (read for discovery; inference only where needed).
- Use placeholders (
your_hf_token) in any shared config.
12. Reproducibility / model pinning
- Use exact model ids (and a revision/commit if available) so runs are repeatable.
- Use the same embedding model for indexing and querying in RAG.
13. Licensing
- Before downstream use, check the model card's license (
hf_model_info→cardData.license). - Respect usage restrictions (commercial use, redistribution, gated access).
14. Agent checklist
- Confirmed Hugging Face is the right tool (open-source / discovery / embeddings).
- Discovered model via
hf_search_models/hf_model_info(free). - Confirmed it is runnable via
hf_list_inference_models. - Checked the license.
- Set
max_tokens(chat) / batched inputs (embeddings). - Did not expose the token.
- Cited the exact model id and reported
usage.
15. Example workflows
- Find a model → run chat:
hf_search_models→hf_model_info→hf_list_inference_models→hf_chat. Seerecipes/find-and-run-model.md. - Build embeddings for RAG:
hf_embeddings(batch) → store → query. Seerecipes/build-embeddings.md. - Dataset lookup:
hf_search_datasets→hf_requestfor details. Seerecipes/dataset-discovery.md.
16. Common mistakes
- Calling
hf_chatbefore confirming the model is supported (causesmodel_not_supported). - One embedding call per item instead of a batch (slow and costly).
- Skipping the license check.
- Exposing the token in logs or output.
- Omitting
max_tokens, leading to runaway generation cost.
17. Maintenance
- The runnable model list changes — re-run
hf_list_inference_modelsrather than hardcoding ids. - Re-check licenses when adopting a new model.
- Rotate
HF_TOKENperiodically. - Confirm endpoint/provider details against https://huggingface.co/docs when behavior changes.
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install hugging-face-api - 安装完成后,直接呼叫该 Skill 的名称或使用
/hugging-face-api触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
hugging-face-api 是什么?
Search and discover Hugging Face open-source models and datasets, then run OpenAI-compatible chat or embedding inference securely with cost control. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 52 次。
如何安装 hugging-face-api?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install hugging-face-api」即可一键安装,无需额外配置。
hugging-face-api 是免费的吗?
是的,hugging-face-api 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
hugging-face-api 支持哪些平台?
hugging-face-api 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 hugging-face-api?
由 Simon-Pierrre Boucher(@simonpierreboucher02)开发并维护,当前版本 v1.0.0。