Description

Search, download, upload, and manage public and private Internet Archive (archive.org) items and their metadata, with support for advanced queries and file operations.

README (SKILL.md)

Internet Archive Skill

Name: Internet Archive Skill
Author: grill-glitch

Provides the ability to interact with the Internet Archive (archive.org): search, download, upload, and manage archived content.

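Description

This skill lets you:

Search the Internet Archive catalog (advanced query syntax supported)
Download files from any public item
Upload files to your archive account (authentication required)
Manage the metadata of items you have uploaded
List the files in an item
Install and configure the ia CLI tool automatically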

Triggers

"Search the Internet Archive"
"Download from archive.org"
"Upload to the Internet Archive"
"Manage archive metadata"
"Check the Internet Archive tool"
"Install the ia command-line tool"
"archive-related", "archive.org"

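Input

A user request for an operation related to the Internet Archive.

Output

Performs the requested archive.org operation and returns search results, download status, upload confirmation, or similar information.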

Implementation

Main script: internet-archive.py
Reference docs: references/

Usage

Call the Python script directly:

python3 skills/internet-archive/internet-archive.py <intent> [arguments]

Supported Intents

`check` - Check tool status

Checks whether the ia CLI is installed and configured.

python3 skills/internet-archive/internet-archive.py check
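
As a rough illustration of what such a check involves, the sketch below looks for the ia executable on PATH and for a config file at the path this document mentions. The function name `check_ia` and the shape of the returned dictionary are assumptions made for this example; the actual internet-archive.py may perform the check differently.

```python
import shutil
from pathlib import Path


def check_ia(command="ia", config_path="~/.config/ia.ini"):
    """Report whether the CLI is on PATH and whether a config file exists.

    Hypothetical helper for illustration; not the script's real implementation.
    """
    executable = shutil.which(command)
    config = Path(config_path).expanduser()
    return {
        "installed": executable is not None,
        "path": executable,
        "configured": config.is_file(),
    }


print(check_ia())
```

A missing config file does not necessarily mean the tool is unconfigured, since credentials may also come from environment variables.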

`install` - Install the ia CLI

Automatically installs the internetarchive package (using uv, pipx, or pip).

python3 skills/internet-archive/internet-archive.py install

`search` - Search the archive

Searches the Internet Archive catalog.

# Basic search
python3 skills/internet-archive/internet-archive.py search "collection:nasa mediatype:image"

# Return only the list of identifiers
python3 skills/internet-archive/internet-archive.py search "public domain books" --itemlist

# Full-text search
python3 skills/internet-archive/internet-archive.py search "climate change" --fts

# Sorting and field filtering
python3 skills/internet-archive/internet-archive.py search "mediatype:movies" --sort="downloads desc" --field=title --field=identifier

Common parameters:

--itemlist - Output identifiers only, one per line
--fts - Full-text search (searches text content, not just metadata)
--sort="field asc|desc" - Sort order (e.g. downloads desc, date asc)
--field=<name> - Return only the specified field (repeatable)
--parameters="rows=20&page=1" - Raw query parameters
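
When calling the script from other Python code, the options above can be assembled into an argument list instead of a shell string, which avoids quoting problems with queries that contain spaces. The following is a minimal sketch; `build_search_args` is a hypothetical helper, and it uses only the flags documented in this section.

```python
SCRIPT = "skills/internet-archive/internet-archive.py"


def build_search_args(query, itemlist=False, fts=False, sort=None,
                      fields=(), parameters=None):
    """Assemble the argument list for the search intent (illustrative only)."""
    args = ["search", query]
    if itemlist:
        args.append("--itemlist")
    if fts:
        args.append("--fts")
    if sort:
        args.append(f"--sort={sort}")
    # --field may be repeated, once per requested field.
    args.extend(f"--field={name}" for name in fields)
    if parameters:
        args.append(f"--parameters={parameters}")
    return args


# The full command, ready to pass to subprocess.run() as a list.
command = ["python3", SCRIPT] + build_search_args(
    "mediatype:movies", sort="downloads desc", fields=["title", "identifier"]
)
print(command)
```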

`download` - Download item files

Downloads the files of an Internet Archive item.

# Download the whole item
python3 skills/internet-archive/internet-archive.py download <identifier>

# Download only files with a given extension
python3 skills/internet-archive/internet-archive.py download <identifier> --glob="*.pdf"

# Exclude certain files
python3 skills/internet-archive/internet-archive.py download <identifier> --exclude="*low*"

# Download to a specific directory
python3 skills/internet-archive/internet-archive.py download <identifier> --destdir=./downloads

# Preview (without actually downloading)
python3 skills/internet-archive/internet-archive.py download <identifier> --dry-run

# Download only original files, skipping derivatives
python3 skills/internet-archive/internet-archive.py download <identifier> --source=original
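
To reason about which files a given --glob and --exclude combination would select, the patterns can be tried against a list of file names locally. The sketch below uses Python's `fnmatch` module as an approximation of shell-style matching; whether the ia CLI matches identically in every edge case is an assumption, and the file names are made up.

```python
from fnmatch import fnmatchcase


def select_files(names, glob=None, exclude=None):
    """Keep names that match `glob` (if given) and do not match `exclude` (if given)."""
    selected = []
    for name in names:
        if glob is not None and not fnmatchcase(name, glob):
            continue
        if exclude is not None and fnmatchcase(name, exclude):
            continue
        selected.append(name)
    return selected


# Made-up file names for illustration.
files = ["report.pdf", "report_low.pdf", "cover.jpg", "scan_low_res.jpg"]
print(select_files(files, glob="*.pdf"))
print(select_files(files, exclude="*low*"))
```

For a real item, --dry-run remains the more reliable preview, since it reflects the item's actual file list.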

`upload` - Upload files

Uploads files to your Internet Archive account (authentication required).

python3 skills/internet-archive/internet-archive.py upload <identifier> file1.pdf file2.jpg \
  --metadata="mediatype:texts" \
  --metadata="title:My Document" \
  --metadata="creator:Your Name"

# Upload and verify
python3 skills/internet-archive/internet-archive.py upload <id> document.pdf \
  --metadata="mediatype:texts" --checksum --verify

# Skip derivative generation (faster upload)
python3 skills/internet-archive/internet-archive.py upload <id> video.mp4 \
  --metadata="mediatype:movies" --no-derive

Required parameters:

<identifier> - Item identifier (unique; ASCII letters, digits, underscores, hyphens, and periods only)
files - List of file paths to upload
--metadata="key:value" - Metadata (at least mediatype is required)

Optional parameters:

--checksum - Use checksums to skip files that are already uploaded
--verify - Verify data integrity after upload
--no-derive - Skip automatic derivative generation
--retries=N - Number of retries (default 0)
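
Before starting an upload, it can help to check the metadata pairs locally. The sketch below parses "key:value" strings into a dictionary and confirms that mediatype is present. `parse_metadata` and `require_mediatype` are hypothetical helpers; splitting on the first colon and collecting repeated keys into a list are choices made for this example, not confirmed behavior of the script.

```python
def parse_metadata(pairs):
    """Turn ["key:value", ...] into a dict; repeated keys collect into a list."""
    metadata = {}
    for pair in pairs:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {pair!r}")
        if key not in metadata:
            metadata[key] = value
        elif isinstance(metadata[key], list):
            metadata[key].append(value)
        else:
            metadata[key] = [metadata[key], value]
    return metadata


def require_mediatype(metadata):
    """Raise if the one field this document marks as required is missing."""
    if not metadata.get("mediatype"):
        raise ValueError("metadata must include mediatype")
    return metadata


meta = require_mediatype(parse_metadata(
    ["mediatype:texts", "title:My Document: Part 2", "subject:a", "subject:b"]
))
print(meta)
```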

`metadata` - View or modify metadata

Views or modifies item metadata.

# View the full metadata (JSON format)
python3 skills/internet-archive/internet-archive.py metadata <identifier>

# List the file formats the item contains
python3 skills/internet-archive/internet-archive.py metadata <identifier> --formats

# Modify a field
python3 skills/internet-archive/internet-archive.py metadata <identifier> --modify="title:New Title"

# Append a value to a list field (such as subject)
python3 skills/internet-archive/internet-archive.py metadata <identifier> --append-list="subject:new topic"

# Remove a field
python3 skills/internet-archive/internet-archive.py metadata <identifier> --modify="oldfield:REMOVE_TAG"
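
The JSON printed by the metadata intent can be post-processed in Python. The sketch below pulls out the title and the distinct file formats. The top-level `metadata` and `files` keys follow the shape of the Internet Archive Metadata API response; that the script prints this JSON unchanged is an assumption, and the sample document is invented for illustration.

```python
import json


def summarize_item(raw_json):
    """Extract the identifier, title, and distinct file formats from item JSON."""
    item = json.loads(raw_json)
    metadata = item.get("metadata", {})
    formats = sorted({f["format"] for f in item.get("files", []) if "format" in f})
    return {
        "identifier": metadata.get("identifier"),
        "title": metadata.get("title"),
        "formats": formats,
    }


# Invented sample, trimmed to the fields used above.
SAMPLE = json.dumps({
    "metadata": {"identifier": "example-item", "title": "Example", "mediatype": "texts"},
    "files": [
        {"name": "example.pdf", "format": "Text PDF"},
        {"name": "example_meta.xml", "format": "Metadata"},
        {"name": "example_files.xml", "format": "Metadata"},
    ],
})

print(summarize_item(SAMPLE))
```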

`list` - List item files

Lists all files in an Internet Archive item.

# List all files
python3 skills/internet-archive/internet-archive.py list <identifier>

# Show only names and sizes
python3 skills/internet-archive/internet-archive.py list <identifier> --columns=name,size

# Show full download URLs
python3 skills/internet-archive/internet-archive.py list <identifier> --location

# Show all file information (including format, checksums, and so on)
python3 skills/internet-archive/internet-archive.py list <identifier> --all --verbose

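Configuration and Authentication

Uploading and modifying metadata require authentication. Searching and downloading public items do not.

Setup steps

Create an Internet Archive account: https://archive.org/account/signup
Get S3-compatible API keys: https://archive.org/account/s3.php
- Click "Create new keys" or use existing keys
- Copy the Access Key ID and Secret Access Key
Run the configure command:
```
ia configure
```
Follow the prompts to enter your credentials.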

Alternatively, set environment variables:

export IA_ACCESS_KEY_ID="your-access-key"
export IA_SECRET_ACCESS_KEY="your-secret-key"
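
A wrapper can confirm that both variables are present before attempting an upload, and fail with a clear message otherwise. The sketch below is one way to do that; `missing_credentials` is a hypothetical helper, and it only checks that the variables are set, not that the keys are valid.

```python
import os

REQUIRED_VARS = ("IA_ACCESS_KEY_ID", "IA_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("Both credential variables are set.")
```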

The configuration file is saved at ~/.config/ia.ini.

Verify the configuration:

ia configure --whoami  # should print your username

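User-Agent Requirements

All requests must include an explicit User-Agent. By default, the ia CLI includes your access key in it. For automated tools, setting a custom suffix is recommended: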

ia --user-agent-suffix "OpenClaw/1.0"

Or in ~/.config/ia.ini:

[general]
user_agent_suffix = OpenClaw/1.0

This helps the Internet Archive identify the source of requests and maintain service quality.
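
To confirm that the suffix is actually present in a config file, the INI text can be read with Python's standard `configparser`. This is a minimal sketch that parses an in-memory string matching the snippet above; `read_user_agent_suffix` is a hypothetical helper, and reading the real file from disk is left out.

```python
import configparser

SAMPLE_INI = """\
[general]
user_agent_suffix = OpenClaw/1.0
"""


def read_user_agent_suffix(ini_text):
    """Return the configured suffix, or None if the section or key is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.get("general", "user_agent_suffix", fallback=None)


print(read_user_agent_suffix(SAMPLE_INI))
```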

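Automatic Installation

If the ia command is not found, this skill prompts you to install it. Supported installation methods (tried in order):

uv tool install internetarchive (recommended)
pipx install internetarchive
pip install internetarchive

You need Python 3.9+ and one of pip, uv, or pipx.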
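
The ordering above can be sketched as a lookup that returns the command for the first installer found on PATH. This is an illustration under stated assumptions: `pick_install_command` is a hypothetical name, and the real script may instead run each installer in turn and fall back on failure.

```python
import shutil

# Installers in the order this document lists them.
INSTALLERS = [
    ("uv", ["uv", "tool", "install", "internetarchive"]),
    ("pipx", ["pipx", "install", "internetarchive"]),
    ("pip", ["pip", "install", "internetarchive"]),
]


def pick_install_command(which=shutil.which):
    """Return the command for the first installer found on PATH, or None."""
    for executable, command in INSTALLERS:
        if which(executable):
            return command
    return None


print(pick_install_command())
```

The `which` parameter exists so the selection logic can be exercised without touching the real PATH.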

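Key Concepts

Item

The basic unit of archive.org. An item is a logical collection of related files (a book, an album, a dataset, and so on). Each item has a unique identifier.

An item contains:

Original uploaded files
Derivatives (converted versions generated automatically by the system)
<identifier>_meta.xml - item-level metadata
<identifier>_files.xml - file-level metadata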

Collection

Every item must belong to a collection. Only IA staff can create new collections (usually requiring at least 50 items). Public upload collections include:

opensource_movies, opensource_audio, opensource_media
community_texts, community_video, community_audio

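Derivatives

After upload, IA automatically generates derivatives (converted versions in different formats and resolutions):

Video → h.264, Ogg, multiple bitrates
Audio → MP3, Ogg Vorbis, FLAC
Text/books → OCR, searchable PDF, EPUB, DjVu
Images → thumbnails, JPEG 2000

Use --no-derive to skip derivative generation (faster upload, but the content is less accessible).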

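Metadata Schema

Required fields: identifier, mediatype

Recommended fields: title, description, creator, date, subject, collection, language

Identifier requirements:

ASCII letters and digits, underscores, hyphens, and periods only
Starts with a letter or digit
5-100 characters (5-80 recommended)
Cannot be changed once set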
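
Because an identifier cannot be changed later, validating it locally before upload is cheap insurance. The sketch below encodes the rules listed above; `validate_identifier` is a hypothetical helper, and it does not check uniqueness, which only archive.org can confirm.

```python
import re


def validate_identifier(identifier):
    """Return a list of problems; an empty list means the identifier passes."""
    problems = []
    if not 5 <= len(identifier) <= 100:
        problems.append("length must be 5-100 characters")
    if not re.fullmatch(r"[A-Za-z0-9_.-]*", identifier):
        problems.append("only ASCII letters, digits, '_', '-', '.' are allowed")
    if identifier and not re.match(r"[A-Za-z0-9]", identifier):
        problems.append("must start with a letter or digit")
    return problems


for candidate in ["my-test-item-2024", "abc", "-leading-hyphen", "has space here"]:
    print(candidate, validate_identifier(candidate))
```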

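Best Practices

Test before uploading - use test_collection to validate uploads (items are deleted automatically after 30 days)
Use meaningful identifiers - lowercase, hyphen-separated, descriptive
Complete metadata - include at least title, creator, and description
Check for identifier conflicts - run ia metadata <id> to see whether the identifier already exists
Bundle large file sets - pack large sets of files into a ZIP before uploading
Use checksums - --checksum skips files that are already uploaded, so an interrupted upload can resume
Rate limiting - add delays to batch operations (e.g. GNU Parallel's --delay 1)
Preview operations - use --dry-run to preview downloads and uploads
Protect sensitive logs - avoid logging user search terms, song IDs, lyrics, and similar content

About log protection:

Guard all log output with BuildConfig.DEBUG
Do not record private user data in any log
Use Log.d() for development logs, which are silenced automatically in production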

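Common Errors and Solutions

Error	Solution
"not configured"	Run `ia configure` or set the environment variables
"identifier exists"	Choose a different identifier (identifiers cannot be changed)
"permission denied"	Check your keys at https://archive.org/account/s3.php
"network error"	Retry the operation and check your network connection
"429 Too Many Requests"	Rate limited; wait for the time given in Retry-After
Item does not appear in search	Usually appears within a few minutes, at most 24 hours; check `ia tasks <identifier>`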
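
For the 429 case, the wait time can be derived from the response headers. The sketch below handles only the delay-in-seconds form of Retry-After and falls back to a default otherwise; `retry_delay`, its default of 60 seconds, and its 600-second cap are assumptions chosen for this example.

```python
def retry_delay(headers, default=60.0, maximum=600.0):
    """Seconds to wait before retrying, based on a Retry-After header.

    Only the delay-in-seconds form is parsed. The HTTP-date form, a missing
    header, or a malformed value all fall back to `default`.
    """
    value = ""
    for key, candidate in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if key.lower() == "retry-after":
            value = str(candidate)
            break
    try:
        seconds = int(value)
    except ValueError:
        return float(default)
    if seconds < 0:
        return float(default)
    return float(min(seconds, maximum))


print(retry_delay({"Retry-After": "120"}))
print(retry_delay({}))
```

A caller would pass the result to `time.sleep()` before repeating the request.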

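Notes

This skill depends on the external ia command; make sure the internetarchive Python package is installed
Uploading and modifying metadata require a valid Internet Archive account and API keys
Item identifiers cannot be changed once created, so choose them carefully
Large uploads can take a long time; --checksum is recommended so that an interrupted upload can resume
Follow the Internet Archive's terms of use and service policies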

Usage Guidance

This skill appears coherent for interacting with archive.org, but before installing: (1) only provide IA S3 credentials if you trust the skill and understand uploads will occur under that account; (2) prefer manually running the documented install command (pipx/pip/uv) rather than allowing automatic installs; (3) review the included python file if you have concerns—its subprocess calls invoke the local `ia` CLI (not arbitrary remote endpoints); (4) run in an environment where accidental uploads/downloads won't leak sensitive data (or test with the `test_collection` and non-production keys). If you don't need upload/modify features, you can omit providing credentials.

Capability Analysis

Type: OpenClaw Skill
Name: internet-archive
Version: 1.0.0

The skill is a legitimate wrapper for the official Internet Archive (archive.org) command-line tool. The Python script `internet-archive.py` uses safe subprocess calls to execute 'ia' commands and includes a standard installation routine for its dependency via pip/uv. No evidence of data exfiltration, malicious execution, or prompt injection was found; the documentation and code are consistent with the stated purpose of managing archival content.

Capability Assessment

✓ Purpose & Capability

Name, description, SKILL.md, README, and the Python implementation align: searching, downloading, uploading, listing, and metadata operations are implemented by invoking the `ia` CLI. Required capabilities (upload needs IA access keys) match the described purpose.

✓ Instruction Scope

Runtime instructions and the script limit actions to invoking the `ia` CLI and installing it if requested. The SKILL.md directs configuring credentials (~/.config/ia.ini or IA_* env vars) only for upload/modify operations, which is appropriate for this functionality. The skill does not instruct reading unrelated system files or exfiltrating data to third-party endpoints.

✓ Install Mechanism

This is instruction-only with a contained Python script. The provided install paths (uv/pipx/pip) are standard ways to install the `internetarchive` package; there are no direct downloads from untrusted URLs or archives extracted from arbitrary hosts.

✓ Credentials

No registry-level required env vars are declared, and the SKILL.md only documents optional IA_ACCESS_KEY_ID / IA_SECRET_ACCESS_KEY for uploads—these are the minimal, expected credentials for writing to an Internet Archive account. The skill does not request unrelated secrets or broad cloud credentials.

✓ Persistence & Privilege

The skill sets always:false and contains no code that attempts to modify other skills or global agent settings. It runs on demand and does not request persistent or elevated system privileges.

Version History

v1.0.0

Initial release of Internet Archive skill.
- Search, download, upload, and manage items on archive.org, including advanced query support and file filtering.
- Supports editing metadata, listing files, and handling derivatives.
- Integrates with the `ia` CLI tool; offers automatic installation and configuration guidance.
- Requires authentication for uploads and metadata changes; public search and download available to all users.
- Provides comprehensive usage instructions, best practices, error troubleshooting, and links to official documentation.

Metadata

Slug internet-archive

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Internet Archive Skill?

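It searches, downloads, uploads, and manages public and private Internet Archive (archive.org) items and their metadata, with support for advanced queries and file operations. It is an AI Agent Skill for Claude Code / OpenClaw, with 112 downloads so far.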

How do I install Internet Archive Skill?

Run "/install internet-archive" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Internet Archive Skill free?

Yes, Internet Archive Skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Internet Archive Skill support?

Internet Archive Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Internet Archive Skill?

It is built and maintained by grill-glitch (@grill-glitch); the current version is v1.0.0.
