artminding

Chaoxing Download

by Yi,Li (李祎) · GitHub ↗ · v1.3.0 · MIT-0
cross-platform ⚠ suspicious
118 Downloads · 1 Star · 0 Active Installs · 4 Versions
Install in OpenClaw
/install chaoxing-download
Description
Download PDF documents from Chaoxing (超星) contest/platform viewer URLs and convert to TXT. Use when user wants to download files from contestyd.chaoxing.com,...
README (SKILL.md)

Chaoxing Document Downloader (超星文档下载)

Download PDFs from Chaoxing WPS viewer URLs using the getYunFiles API.

Core Principle

Every Chaoxing viewer URL contains an objectid (32-char hex). Call the getYunFiles API to get the direct PDF link — no cookies or auth tokens needed.

Arguments

$ARGUMENTS contains the user's download request — typically one or more entries with page count, name, and viewer URL. Parse them to extract the data.

Download Method

Step 1: Extract objectid from each URL

Find the objectid=([a-f0-9]{32}) parameter in each viewer URL.
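
As a minimal sketch of this step, the pattern above can be applied with Python's `re` module. The function name `extract_objectid` and the example URL are hypothetical and only serve as illustration; the real viewer URL path may look different.

```python
import re
from typing import Optional

# Pattern from the step above: "objectid=" followed by 32 lowercase hex characters.
OBJECTID_RE = re.compile(r"objectid=([a-f0-9]{32})")


def extract_objectid(viewer_url: str) -> Optional[str]:
    """Return the 32-char objectid found in a viewer URL, or None if there is none."""
    match = OBJECTID_RE.search(viewer_url)
    return match.group(1) if match else None


# Hypothetical viewer URL, for illustration only.
example_url = (
    "https://contestyd.chaoxing.com/view"
    "?objectid=0123456789abcdef0123456789abcdef&other=1"
)
print(extract_objectid(example_url))
```

Returning `None` instead of raising lets the caller report which entry had no usable objectid and continue with the rest.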

Step 2: Call getYunFiles API

For each objectid, call:

https://contestyd.chaoxing.com/app/files/{objectid}/getYunFiles?key=allData

Response JSON contains:

  • data.pdf — direct PDF URL on s3.cldisk.com or s3.ananas.chaoxing.com (preferred)
  • data.download — alternative download URL with auth tokens (fallback)
  • data.filename — original filename
  • data.pagenum — page count
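
The call and the field selection could be sketched as follows, using only the standard library. This assumes the four fields are nested under a top-level `data` key, as the dotted names above suggest; the function names `fetch_file_info` and `pick_download_url` are hypothetical, and the sample response is made up for illustration.

```python
import json
import urllib.request
from typing import Optional

API_TEMPLATE = (
    "https://contestyd.chaoxing.com/app/files/{objectid}/getYunFiles?key=allData"
)


def fetch_file_info(objectid: str, timeout: float = 30.0) -> dict:
    """Call getYunFiles for one objectid and return the decoded JSON response."""
    url = API_TEMPLATE.format(objectid=objectid)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def pick_download_url(info: dict) -> Optional[str]:
    """Prefer data.pdf; fall back to data.download when data.pdf is missing or empty."""
    data = info.get("data") or {}  # assumption: fields live under "data"
    return data.get("pdf") or data.get("download") or None


# Made-up response with the documented field names, for illustration only.
sample_info = {
    "data": {
        "pdf": "",
        "download": "https://example.invalid/fallback",
        "filename": "sample.pdf",
        "pagenum": 22,
    }
}
print(pick_download_url(sample_info))
```

Keeping the URL selection separate from the network call makes the fallback rule easy to check without contacting the server.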

Step 3: Download the PDF

Use the data.pdf URL to download directly. No authentication headers needed.

Save to: ~/Downloads/chaoxing_pdfs/{user-provided name}.pdf
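
A possible way to build the target path and perform the download is sketched below. The helper names `target_path` and `download_pdf` are hypothetical, and replacing unsafe filename characters with `_` is an added precaution assumed here, not something the skill itself specifies.

```python
import re
import shutil
import urllib.request
from pathlib import Path

DOWNLOAD_DIR = Path.home() / "Downloads" / "chaoxing_pdfs"


def target_path(name: str, base_dir: Path = DOWNLOAD_DIR) -> Path:
    """Build base_dir/<name>.pdf, replacing characters that are unsafe in file names."""
    safe = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", name).strip(" .")
    if not safe:
        raise ValueError("name is empty after sanitization")
    return base_dir / f"{safe}.pdf"


def download_pdf(pdf_url: str, dest: Path, overwrite: bool = False) -> bool:
    """Download pdf_url to dest; return False if dest exists and overwrite is off."""
    if dest.exists() and not overwrite:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(pdf_url, timeout=60) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk without loading it all in memory
    return True


print(target_path("report 2024"))
```

Because path separators are replaced before the path is joined, a name such as `../../x` stays inside the download directory.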

Step 4: Validate page count

Compare data.pagenum with the user's expected page count. Report any mismatch.
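
The comparison itself is simple; a sketch is shown below. The function name `check_page_count` and the message wording are hypothetical, and the code assumes `data.pagenum` arrives as an integer or a numeric string.

```python
from typing import Optional, Union


def check_page_count(
    reported: Optional[Union[int, str]], expected: Optional[int]
) -> Optional[str]:
    """Return a mismatch message, or None if the counts agree or cannot be compared."""
    if reported is None or expected is None:
        return None  # nothing to compare against
    if int(reported) != int(expected):
        return (
            f"page count mismatch: API reports {int(reported)}, "
            f"expected {int(expected)}"
        )
    return None


print(check_page_count(20, 22))
```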

Step 5: Convert PDF to TXT (with OCR fallback)

After downloading each PDF, automatically extract text to a plain text file. Use a two-stage approach: native text extraction first, then OCR fallback for image-based pages.

Prerequisites:

pip install pymupdf rapidocr-onnxruntime

Conversion method (Python):

import os
import sys

import fitz  # PyMuPDF
from rapidocr_onnxruntime import RapidOCR

if sys.platform == "win32":
    # Avoid UnicodeEncodeError when printing non-ASCII text on Windows consoles
    sys.stdout.reconfigure(encoding="utf-8")

ocr = RapidOCR()
# "~" is not expanded automatically; replace {name} with the actual file name
pdf_path = os.path.expanduser("~/Downloads/chaoxing_pdfs/{name}.pdf")
doc = fitz.open(pdf_path)
all_text = []
native_pages = ocr_pages = 0

for i, page in enumerate(doc):
    # Stage 1: Try native text extraction
    native = page.get_text().strip()
    if len(native) > 50:
        all_text.append(f"--- Page {i+1} ---\n{native}")
        native_pages += 1
        continue
    # Stage 2: OCR fallback for image-based pages
    pix = page.get_pixmap(dpi=200)
    img_bytes = pix.tobytes("png")
    result, _ = ocr(img_bytes)
    # Each OCR result item is [box, text, score]; keep only the text
    ocr_text = "\n".join(item[1] for item in result) if result else ""
    label = "OCR" if ocr_text else "empty"
    if ocr_text:
        ocr_pages += 1
    all_text.append(f"--- Page {i+1} [{label}] ---\n{ocr_text}")

doc.close()
full_text = "\n".join(all_text)

# Write the text next to the PDF, swapping only the file extension
with open(os.path.splitext(pdf_path)[0] + ".txt", "w", encoding="utf-8") as f:
    f.write(full_text)

# Summary
print(f"Native: {native_pages}p, OCR: {ocr_pages}p, Total: {len(full_text)} chars")

Output files per download:

  • {name}.pdf — original PDF
  • {name}.txt — plain text extraction (native + OCR pages marked with [OCR])

How it works:

  1. Each page is first checked for native text (text layer PDF)
  2. If the native text is 50 characters or fewer, the page is rendered to an image at 200 DPI and processed by RapidOCR
  3. OCR pages are labeled [OCR] in the output for easy identification
  4. Empty pages (no text and OCR fails) are labeled [empty]

CLI Tool (Alternative)

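A CLI tool, chaoxing_dl.py, is also available. The original listing gives its location as C:/Users/Cameron/Downloads/chaoxing_dl.py, a path on the author's machine; the examples below assume the script is at ~/Downloads/chaoxing_dl.py: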

# Single download
python ~/Downloads/chaoxing_dl.py "VIEWER_URL" -n "filename"

# Batch from JSON file
python ~/Downloads/chaoxing_dl.py --batch tasks.json

# With page validation
python ~/Downloads/chaoxing_dl.py "URL" -n "name" --json

# Force overwrite
python ~/Downloads/chaoxing_dl.py "URL" -n "name" -f

Batch JSON format:

[
  {"name": "filename", "url": "viewer_url_or_objectid", "pages": 22},
  ...
]
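
For readers who want to process such a file without the CLI, the following sketch loads the batch format and resolves each `url` value to an objectid. The function name `parse_tasks` is hypothetical, and treating `pages` as optional is an assumption; the CLI's own handling is not documented here.

```python
import json
import re
from typing import Dict, List

OID_ONLY = re.compile(r"[a-f0-9]{32}")
OID_PARAM = re.compile(r"objectid=([a-f0-9]{32})")


def parse_tasks(text: str) -> List[Dict]:
    """Parse batch JSON into dicts with the keys name, objectid, and pages."""
    raw = json.loads(text)
    if not isinstance(raw, list):
        raise ValueError("batch file must contain a JSON array")
    tasks = []
    for idx, entry in enumerate(raw):
        url = entry["url"]
        if OID_ONLY.fullmatch(url):
            oid = url  # the entry already holds a bare objectid
        else:
            match = OID_PARAM.search(url)
            if match is None:
                raise ValueError(f"entry {idx}: no objectid found in {url!r}")
            oid = match.group(1)
        # Assumption: "pages" may be omitted, in which case validation is skipped.
        tasks.append(
            {"name": entry["name"], "objectid": oid, "pages": entry.get("pages")}
        )
    return tasks


# Made-up batch content, for illustration only.
sample = """[
  {"name": "doc-a", "url": "0123456789abcdef0123456789abcdef", "pages": 22},
  {"name": "doc-b", "url": "https://contestyd.chaoxing.com/x?objectid=ffffffffffffffffffffffffffffffff"}
]"""
print(parse_tasks(sample))
```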

Batch Processing (Without CLI Tool)

For multiple downloads without the CLI, use a bash loop:

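# Make sure the target directory exists before downloading
mkdir -p "$HOME/Downloads/chaoxing_pdfs"
for oid_name in "OBJECTID1:name1" "OBJECTID2:name2"; do
  # Split each "objectid:name" pair at the colon
  oid="${oid_name%%:*}"; name="${oid_name##*:}"
  info=$(curl -s -L "https://contestyd.chaoxing.com/app/files/$oid/getYunFiles?key=allData")
  pagenum=$(echo "$info" | grep -o '"pagenum":[0-9]*' | cut -d: -f2)
  pdf_url=$(echo "$info" | grep -o '"pdf":"[^"]*"' | head -1 | tr -d '"' | sed 's/^pdf://')
  echo "$name: ${pagenum}p"
  # Quote the output path so names containing spaces are not split into several arguments
  curl -s -L -o "$HOME/Downloads/chaoxing_pdfs/${name}.pdf" "$pdf_url"
done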

Key Notes

  • Only objectid is needed — no resid, tk, addPointInfo, or cookies
  • Always validate page count against user expectation
  • The PDF URLs on s3.cldisk.com are direct links, publicly accessible
  • If data.pdf is empty, fall back to data.download
  • Skip files that already exist unless user specifies overwrite
Usage Guidance
This skill appears to do what it claims: call Chaoxing's public getYunFiles endpoint, download PDFs, and convert them to text. Before using it:
  1. Confirm you are allowed to download the documents (copyright/terms).
  2. Run pip installs inside a virtualenv (pymupdf and rapidocr-onnxruntime) to limit system impact.
  3. Be aware rapidocr-onnxruntime may download OCR models or perform additional network requests; check its docs.
  4. The skill writes files to ~/Downloads/chaoxing_pdfs; validate filenames to avoid path-traversal or overwrites and use the force/overwrite flags deliberately.
  5. If you need to audit network calls, monitor requests to contestyd.chaoxing.com and any s3 host used for PDFs.
If any of these behaviors are unacceptable or you cannot verify model/package provenance, do not run the installs or run them in an isolated environment.
Capability Analysis
Type: OpenClaw Skill
Name: chaoxing-download
Version: 1.3.0
The skill contains a shell injection vulnerability in the provided bash loop example within SKILL.md, where the user-supplied 'name' variable is used directly in a curl command without sanitization. Additionally, the Python conversion logic lacks path validation, potentially allowing path traversal if a malicious filename is provided. The documentation also references a hardcoded local file path (C:/Users/Cameron/Downloads/chaoxing_dl.py), which is a common indicator of poorly sanitized or environment-specific code.
Capability Assessment
Purpose & Capability
Name/description (download Chaoxing viewer PDFs and convert to text) directly match the SKILL.md: it extracts objectid, calls the documented getYunFiles endpoint, downloads data.pdf, validates page count, and converts to TXT. There are no unrelated credentials, binaries, or config paths requested.
Instruction Scope
Instructions stay within the stated purpose (parsing URLs/objectid, calling contestyd.chaoxing.com, downloading PDFs, extracting text and OCR). They instruct writing files to ~/Downloads/chaoxing_pdfs and installing Python packages. The SKILL.md also includes a Windows example path (C:/Users/Cameron/...) which appears to be an author/example artifact and not required. Note: the OCR package may download additional models at runtime, which causes extra network activity beyond the described API calls.
Install Mechanism
The skill is instruction-only (no install spec), so nothing is installed by the platform. However the runtime instructions tell users to run pip install pymupdf rapidocr-onnxruntime. Installing third-party Python packages is within scope but carries the usual risks (supply-chain issues, additional runtime model or asset downloads). No arbitrary download URLs or archives are present in the skill itself.
Credentials
No environment variables, credentials, or protected config paths are requested. The skill writes output to the user's Downloads directory (expected for a downloader). The Windows path shown is just an example and not a declared requirement.
Persistence & Privilege
No persistent/always-on flag set; the skill is user-invocable and does not request elevated or cross-skill configuration changes. It does write files to the user's Downloads directory as part of normal operation.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install chaoxing-download
  3. After installation, invoke the skill by name or use /chaoxing-download
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
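v1.3.0
Added automatic RapidOCR fallback: image-based and scanned PDFs are recognized by OCR automatically, with no need for Tesseract
v1.2.0
Simplified: removed DOCX conversion, keeping only PDF-to-TXT
v1.1.0
Added PDF-to-TXT and DOCX conversion; files in three formats are generated automatically after download
v1.0.0
Initial release: Chaoxing (Xuexitong) PDF document download, with single and batch modes and page-count validation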
Metadata
Slug chaoxing-download
Version 1.3.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 4
Frequently Asked Questions

What is Chaoxing Download?

Download PDF documents from Chaoxing (超星) contest/platform viewer URLs and convert to TXT. Use when user wants to download files from contestyd.chaoxing.com,... It is an AI Agent Skill for Claude Code / OpenClaw, with 118 downloads so far.

How do I install Chaoxing Download?

Run "/install chaoxing-download" in the OpenClaw or Claude Code chat to install it in one step. The PDF-to-TXT conversion additionally requires the Python packages pymupdf and rapidocr-onnxruntime.

Is Chaoxing Download free?

Yes, Chaoxing Download is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Chaoxing Download support?

Chaoxing Download is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Chaoxing Download?

It is built and maintained by Yi,Li (李祎) (@artminding); the current version is v1.3.0.
