bendusy

mlx-local-inference

Author: bendusy · GitHub · v2.2.1 · MIT-0
darwin ✓ Security check passed
772 total downloads · 2 favorites · 1 current install · 7 versions
Install in OpenClaw
/install mlx-local-inference
Description
Use when calling local AI on this Mac — text generation, embeddings, speech-to-text, OCR, or image understanding. LLM/VLM via oMLX gateway at localhost:8000/...
Usage instructions (SKILL.md)

MLX Local Inference Stack

Local AI inference on Apple Silicon. oMLX handles LLM/VLM with continuous batching. Python libraries handle Embedding/ASR/OCR directly via uv.

Architecture

┌─────────────────────────────────────┐
│  oMLX (localhost:8000/v1)           │
│  - LLM (Qwen3.5-35B, etc.)          │
│  - VLM (vision-language models)     │
│  - Continuous batching + SSD cache  │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│  Python Libraries (via uv run)      │
│  - mlx-lm: Embedding                │
│  - mlx-vlm: OCR (PaddleOCR-VL)      │
│  - mlx-audio: ASR (Qwen3-ASR)       │
└─────────────────────────────────────┘

Models

Capability   Implementation    Model                            Size
💬 LLM       oMLX API          Qwen3.5-35B-A3B-4bit             ~20 GB
👁️ VLM       oMLX API          Any mlx-vlm model                varies
📐 Embed     mlx-lm (uv)       Qwen3-Embedding-0.6B-4bit-DWQ    ~1 GB
🎤 ASR       mlx-audio (uv)    Qwen3-ASR-1.7B-8bit              ~1.5 GB
👁️ OCR       mlx-vlm (uv)      PaddleOCR-VL-1.5-6bit            ~3.3 GB

Usage

LLM / Vision-Language (via oMLX API)

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

# Text generation
resp = client.chat.completions.create(
    model="Qwen3.5-35B-A3B-4bit",
    messages=[{"role": "user", "content": "Hello"}]
)
print(resp.choices[0].message.content)
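
The heading above also covers vision-language models, but only a text request is shown. The following is a minimal sketch of an image request, assuming oMLX accepts OpenAI-style `image_url` content parts carrying a base64 data URL (this document does not confirm that). The helpers `build_vision_messages` and `ask_about_image` are hypothetical names introduced for illustration.

```python
import base64
import mimetypes
from pathlib import Path


def build_vision_messages(image_path: str, question: str) -> list:
    """Build an OpenAI-style chat payload with one text part and one image part."""
    data = Path(image_path).read_bytes()
    # Fall back to JPEG when the extension is not recognised
    mime = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    b64 = base64.b64encode(data).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }]


def ask_about_image(client, model: str, image_path: str, question: str) -> str:
    """Send the payload through an existing OpenAI client (assumed oMLX behaviour)."""
    resp = client.chat.completions.create(
        model=model,
        messages=build_vision_messages(image_path, question),
    )
    return resp.choices[0].message.content
```

With the `client` from the example above, a call could look like `ask_about_image(client, "<your VLM model name>", "photo.jpg", "What is in this image?")`, where the model name is whichever mlx-vlm model oMLX is serving.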

Embeddings (via mlx-lm + uv)

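uv run --with mlx-lm python -c "
import os
import mlx.core as mx
from mlx_lm import load

# Expand the home directory explicitly; it is not expanded inside this quoted script
model, tokenizer = load(os.path.expanduser('~/models/Qwen3-Embedding-0.6B-4bit-DWQ'))
text = 'text to embed'
# Encode to token ids and wrap them in a batch of one
input_ids = mx.array([tokenizer.encode(text)])
# Assumption: the inner transformer is exposed as model.model and returns hidden states
hidden = model.model(input_ids)
embeddings = hidden.mean(axis=1)  # mean-pool over the sequence dimension
print(embeddings.shape)
"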
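
Embeddings are typically compared with cosine similarity. The sketch below shows that comparison on plain Python lists, so it runs without MLX; `cosine_similarity` is an illustrative helper and the vectors are made up, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two embedding rows
query = [0.2, 0.1, 0.7]
doc = [0.25, 0.05, 0.65]
score = cosine_similarity(query, doc)
```

An MLX array can be converted with `.tolist()` before being passed in, for example the first row of the `embeddings` array printed above.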

ASR — Speech-to-Text (via mlx-audio + uv)

Important: Must run with --python 3.11 to avoid OpenMP threading issues (SIGSEGV).

uv run --python 3.11 --with mlx-audio python -m mlx_audio.stt.generate \
  --model ~/models/Qwen3-ASR-1.7B-8bit \
  --audio "audio.wav" \
  --output-path /tmp/asr_result \
  --format txt \
  --language zh \
  --verbose
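
When the transcription has to be triggered from Python rather than a shell, the same command can be assembled as an argument list. This is a sketch only: `build_asr_command` and `transcribe` are hypothetical helpers, and the flags are copied from the command above rather than checked against the mlx-audio CLI.

```python
import os
import subprocess


def build_asr_command(audio_path,
                      output_path="/tmp/asr_result",
                      language="zh",
                      model_dir="~/models/Qwen3-ASR-1.7B-8bit"):
    """Return the argv list for the ASR command shown above."""
    return [
        "uv", "run", "--python", "3.11", "--with", "mlx-audio",
        "python", "-m", "mlx_audio.stt.generate",
        # No shell is involved, so expand the home directory here
        "--model", os.path.expanduser(model_dir),
        "--audio", audio_path,
        "--output-path", output_path,
        "--format", "txt",
        "--language", language,
    ]


def transcribe(audio_path, **kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_asr_command(audio_path, **kwargs), check=True)
```

Passing a list instead of a single string avoids shell quoting problems with audio file names that contain spaces.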

OCR (via mlx-vlm + uv)

Important: The generate function parameter order must be (model, processor, prompt, image).

cat << 'PY_EOF' > run_ocr.py
import os
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model_path = os.path.expanduser("~/models/PaddleOCR-VL-1.5-6bit")
model, processor = load(model_path)
prompt = apply_chat_template(processor, config=model.config, prompt="OCR:", num_images=1)

output = generate(model, processor, prompt, "document.jpg", max_tokens=512, temp=0.0)
print(output.text)
PY_EOF

uv run --python 3.11 --with mlx-vlm python run_ocr.py

Service Management (oMLX only)

# Check running models
curl http://localhost:8000/v1/models

# Restart oMLX
launchctl kickstart -k gui/$(id -u)/com.omlx-server
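
The same health check can be done from Python before sending requests. The sketch below assumes that `/v1/models` returns an OpenAI-style listing of the form `{"data": [{"id": ...}, ...]}`, which this document does not confirm; `parse_model_ids` and `list_models` are hypothetical helper names.

```python
import json
import urllib.request


def parse_model_ids(payload):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def list_models(base_url="http://localhost:8000/v1", timeout=5.0):
    """Return the served model ids, or None if the server cannot be queried."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers non-JSON replies
        return None
```

A `None` result suggests that oMLX is not running or not reachable, in which case the restart command above is the next thing to try.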

Model Storage Strategy

All models are stored in ~/models/ using an oMLX-compatible structure:

~/models/
├── Qwen3-Embedding-0.6B-4bit-DWQ/
├── Qwen3-ASR-1.7B-8bit/
├── PaddleOCR-VL-1.5-6bit/
└── Qwen3.5-35B-A3B-4bit/
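
A quick way to confirm that the layout above is in place is to check for each directory. This is a minimal sketch: the directory names are taken from the tree above, and `missing_models` is a hypothetical helper.

```python
from pathlib import Path

# Directory names as listed in the tree above
EXPECTED_MODELS = [
    "Qwen3-Embedding-0.6B-4bit-DWQ",
    "Qwen3-ASR-1.7B-8bit",
    "PaddleOCR-VL-1.5-6bit",
    "Qwen3.5-35B-A3B-4bit",
]


def missing_models(models_root="~/models", expected=EXPECTED_MODELS):
    """Return the expected model directories that are absent under models_root."""
    root = Path(models_root).expanduser()
    return [name for name in expected if not (root / name).is_dir()]
```

Calling `missing_models()` returns an empty list when all four directories exist. It checks only for the presence of each directory, not whether the weights inside are complete.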

Requirements

  • Apple Silicon Mac (M1/M2/M3/M4)
  • uv installed (curl -LsSf https://astral.sh/uv/install.sh | sh)
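
Both requirements can be checked programmatically before any model is loaded. The sketch below uses only the standard library; `check_requirements` is a hypothetical helper, and a Python interpreter running under Rosetta may report `x86_64` even on Apple Silicon hardware.

```python
import platform
import shutil


def check_requirements():
    """Report whether this machine appears to meet the two requirements above."""
    return {
        # macOS on an arm64 CPU, as seen by the running interpreter
        "apple_silicon": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        # uv is discoverable on PATH
        "uv_installed": shutil.which("uv") is not None,
    }


status = check_requirements()
unmet = [name for name, ok in status.items() if not ok]
```

An empty `unmet` list means both checks passed; otherwise it names the requirement to fix.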
Security recommendations
This skill appears to do what it says: operate local ML models via a local oMLX gateway and the 'uv' runner. Before installing or following the SKILL.md:
  1. Verify that the 'uv' installer URL (https://astral.sh) is one you trust. Avoid running an arbitrary curl | sh unless you trust the source; prefer a package manager if one is available.
  2. Confirm that you have enough disk space and have placed the large model files under ~/models as described.
  3. The skill talks to localhost:8000 only; ensure that oMLX is running intentionally and is not exposed to untrusted networks.
  4. Restarting via launchctl affects only your user service; it requires appropriate user permissions but is not a system-wide change.
If you need higher assurance, ask the skill author for a formal install spec or an auditable package (Homebrew formula or repository) and for cryptographic checksums for any model or installer downloads.
Functional analysis
Type: OpenClaw Skill · Name: mlx-local-inference · Version: 2.2.1
The skill bundle provides legitimate instructions and code snippets for running local AI inference on Apple Silicon using the MLX framework and the oMLX gateway. It uses the 'uv' package manager to handle dependencies (mlx-lm, mlx-vlm, mlx-audio) and interacts with a local API at localhost:8000, with no evidence of data exfiltration, malicious execution, or unauthorized persistence mechanisms.
Capability assessment
Purpose & Capability
Name/description (local inference via oMLX and uv) align with the runtime instructions: calls to localhost:8000, uv run invocations, and references to ~/models are consistent with running models locally on macOS.
Instruction Scope
SKILL.md only instructs the agent to call a local HTTP API, run uv to invoke Python model libraries, read model files under ~/models, and use launchctl for the local oMLX service. It does not attempt to read unrelated system files, request unrelated environment variables, or exfiltrate data to remote endpoints.
Install Mechanism
There is no formal install spec in the registry; the SKILL.md recommends installing uv via 'curl -LsSf https://astral.sh/uv/install.sh | sh'. Download-and-exec installer instructions are common for CLIs but are higher risk than package manager installs — verify the source before running. No other installers or remote code downloads are required by the skill itself.
Credentials
The skill requests no environment variables or credentials and only requires the 'uv' binary and an Apple Silicon macOS environment, which is proportionate to local model execution. Model files are referenced under the user's home (~), which is expected.
Persistence & Privilege
The skill is not always-on, does not request elevated platform privileges, and does not modify other skills or system-wide configs beyond invoking a user launchctl command to restart the local oMLX service (which affects only the user's launchd job).
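How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install mlx-local-inference
  3. Once installation completes, invoke the Skill by name or trigger it with /mlx-local-inference
  4. Provide the required inputs according to the Skill's parameter description to get structured output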
Version history
v2.2.1
Sync updates from GitHub: Qwen3.5-35B model update
v2.2.0
One-line agent install, keep-alive/on-demand loading, multi-language mix support, model upgrade guide
v2.1.0
Rewrite READMEs: unified hear/see/read/speak/think narrative, model selection rationale, agent integration architecture
v2.0.0
Clean release: LLM, ASR, Embedding, OCR, TTS, Transcribe — all local on Apple Silicon via MLX. No private data.
v1.1.0
Security fix: remove all private IPs and hardcoded user paths. Rewrite README with clean formatting.
v1.0.1
Add README.md with full documentation, architecture diagram, quickstart guide, and MIT license
v1.0.0
Initial release: LLM, ASR, Embedding, OCR, TTS, Transcribe daemon — all local on Apple Silicon via MLX
Metadata
Slug mlx-local-inference
Version 2.2.1
License MIT-0
Total installs 1
Current installs 1
Version count 7
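FAQ

What is mlx-local-inference?

mlx-local-inference is an AI Agent Skill plugin for Claude Code / OpenClaw, for use when calling local AI on this Mac: text generation, embeddings, speech-to-text, OCR, or image understanding, with LLM/VLM served via the oMLX gateway at localhost:8000/... It has 772 total downloads so far.

How do I install mlx-local-inference?

Run the command "/install mlx-local-inference" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is mlx-local-inference free?

Yes. mlx-local-inference is completely free and released under the MIT-0 license; you can download, install, and use it freely.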

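Which platforms does mlx-local-inference support?

mlx-local-inference runs in environments where OpenClaw / Claude Code is deployed; the listed platform is darwin (macOS), and the skill itself requires an Apple Silicon Mac.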

Who develops mlx-local-inference?

It is developed and maintained by bendusy (@bendusy); the current version is v2.2.1.
