← Back to Skills Marketplace
dataify-server

Dataify Github Repository By Repo Url

by dataify-server · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
40
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install dataify-github-repository-by-repo-url
Description
Prepare Dataify builder requests for the github.com scraper family rooted at github_repository_by-repo-url. Use when needs to work with the successful Dataif...
README (SKILL.md)

Dataify Builder Skill

Use this skill to prepare Dataify builder requests for the scraper family rooted at github_repository_by-repo-url on github.com.

Workflow

  1. Check whether DATAIFY_API_TOKEN exists in the environment.
  2. If the token is missing, stop and tell the user to sign in at Dataify Dashboard](https://dataify.com/dashboard) to obtain it.
  3. Ask the user to choose exactly one tool from the following Chinese list:
  • 通过仓库URL采集 (github_repository_by-repo-url)
  • 通过搜索URL采集 (github_repository_by-search-url)
  • 通过URL采集 (github_repository_by-url)
  1. Read references/tool-params.json and find the chosen tool by tool_sign or Chinese tool name.
  2. For each parameter in the chosen tool:
    • If input_mode is user_input, ask the user for the value.
    • If input_mode is select, present the saved options to the user.
  3. Use scripts/build-dataify-request.py as the default cross-platform helper.
  4. Use scripts/build-dataify-request.ps1 as the Windows PowerShell helper when needed.
  5. When a selectable parameter has a human-readable Chinese label, keep that label in spider_parameters. Do not replace it with a code such as HK unless the user explicitly asks for the coded value.
  6. Build spider_parameters as a JSON array.
  7. If every parameter has only one final value, build one object such as [{"searchurl":"...","country":"Hong Kong"}].
  8. If one or more parameters have multiple aligned values, zip them by index and build one object per row. Example: [{"search_url":"url1","page_turning":"1","max_num":"15"},{"search_url":"url2","page_turning":"1","max_num":"15"}].
  9. If a parameter has one value while another parameter has multiple values, reuse the single value across every generated row.
  10. Set spider_name to github.com.
  11. Set spider_id to the selected tool's tool_sign.
  12. Always include spider_errors=true and file_name={{TasksID}}.
  13. Return a curl command for https://scraperapi.dataify.com/builder.

Set DATAIFY_API_TOKEN

Prefer a permanent environment-variable setup instead of setting the token only for the current terminal session.

Windows PowerShell, permanent for the current user:

[Environment]::SetEnvironmentVariable("DATAIFY_API_TOKEN", "your_token_here", "User")

Then reopen PowerShell. If the current session also needs the token immediately, run:

$env:DATAIFY_API_TOKEN = "your_token_here"

macOS or Linux, permanent for bash:

echo 'export DATAIFY_API_TOKEN="your_token_here"' >> ~/.bashrc
source ~/.bashrc

macOS or Linux, permanent for zsh:

echo 'export DATAIFY_API_TOKEN="your_token_here"' >> ~/.zshrc
source ~/.zshrc

Script usage

Python:

python scripts/build-dataify-request.py --tool-sign \x3Cselected_tool_sign> --values-file values.json

PowerShell:

& ".\scripts\build-dataify-request.ps1" -ToolSign "\x3Cselected_tool_sign>" -ValuesFile ".\values.json"

The values.json file should contain either one object or an array of objects. Example:

[{"searchurl":"https://www.airbnb.com/s/Greece/homes?...","country":"Hong Kong"}]

Required output shape

Generate a curl command in this form:

curl -X POST 'https://scraperapi.dataify.com/builder' \
  -H "Authorization: Bearer $DATAIFY_API_TOKEN" \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'spider_name=github.com' \
  -d 'spider_id=\x3Cselected_tool_sign>' \
  -d 'spider_parameters=[{"param":"value"}]' \
  -d 'spider_errors=true' \
  -d 'file_name={{TasksID}}'

Reference usage

  • references/tool-params.json stores the full saved parameter catalog for every available tool in this scraper family.
  • scripts/build-dataify-request.py is the portable implementation and should be preferred.
  • scripts/build-dataify-request.ps1 mirrors the same behavior for Windows users.
  • If a parameter has no options, the user must provide the value.
  • If a parameter has options, present those options back to the user before building the final request.
  • Do not assume spider_parameters always contains exactly one object. Multi-value tools may require multiple objects zipped by index.
  • Use the saved url_example only as a reference example. Do not assume the user wants the example values unless they explicitly confirm them.
Usage Guidance
Review before installing. Use a scoped Dataify token, avoid putting it in dotfiles or shared shell profiles, and do not run or paste generated curl commands that contain the raw token in logs, tickets, chats, or CI output. Treat spider_parameters as data that will be sent to Dataify and avoid including secrets or private repository information unless that is intended.
Capability Assessment
Purpose & Capability
The stated purpose matches the main behavior: prepare a Dataify builder curl request for GitHub scraper tools using DATAIFY_API_TOKEN and user-supplied parameters. The outbound Dataify endpoint is part of the described workflow, not hidden exfiltration.
Instruction Scope
The instructions are narrowly about generating Dataify requests, but they recommend persistent token setup and the Python helper prints a curl command containing the raw bearer token, which is broader credential exposure than necessary for the purpose.
Install Mechanism
The package contains markdown instructions, one Python helper, and a small agent YAML file with no dependency installation. However, it references files that are not present in the artifact, including references/tool-params.json and a PowerShell helper, so functionality may be incomplete.
Credentials
Reading DATAIFY_API_TOKEN is expected, but recommending shell profile or user environment persistence without security guidance is disproportionate for a request-building helper.
Persistence & Privilege
There is no background worker, privilege escalation, or automatic execution, but the documented setup persists an API token in user-level environment or shell startup files.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install dataify-github-repository-by-repo-url
  3. After installation, invoke the skill by name or use /dataify-github-repository-by-repo-url
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of Dataify builder skill for GitHub repository scraping: - Guides users through preparing Dataify builder requests for github.com scraper tools by repository URL. - Supports environment-based API token checks and detailed setup instructions for Windows, macOS, and Linux. - Allows users to select specific scraping tools and enter or select required tool parameters with Chinese labels. - Dynamically builds JSON arrays of spider parameters, supporting single and multi-value workflows. - Outputs ready-to-use curl commands targeting https://scraperapi.dataify.com/builder with correct parameters. - Provides script usage examples and references for tool parameter configuration and request generation.
Metadata
Slug dataify-github-repository-by-repo-url
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Dataify Github Repository By Repo Url?

Prepare Dataify builder requests for the github.com scraper family rooted at github_repository_by-repo-url. Use when needs to work with the successful Dataif... It is an AI Agent Skill for Claude Code / OpenClaw, with 40 downloads so far.

How do I install Dataify Github Repository By Repo Url?

Run "/install dataify-github-repository-by-repo-url" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Dataify Github Repository By Repo Url free?

Yes, Dataify Github Repository By Repo Url is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Dataify Github Repository By Repo Url support?

Dataify Github Repository By Repo Url is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Dataify Github Repository By Repo Url?

It is built and maintained by dataify-server (@dataify-server); the current version is v1.0.0.

💬 Comments