Data Source Verification
/install data-source-verification
Data Source Verification
A systematic workflow for verifying that every data point in a research dataset can be traced back to its original source paper, figure, table, or text passage.
When to Use
- Building datasets from literature (CSV, JSON, tables)
- Populating tables or plots with values from multiple papers
- Reviewing existing datasets for data integrity
- Before submitting any paper that includes compiled data
Core Rule
Every numerical value must be traceable to a specific location in the original paper. If you cannot find the value in the cited source, it is unverified and must be flagged — never included as confirmed data.
Data Provenance Chain
Source PDF → CITATION.md (extracted values) → CSV/data table → LaTeX manuscript
Every link in this chain must be auditable. If someone asks "where did this number come from?", the answer should be: paper X, Table Y, column Z — and we have the PDF to prove it.
Citation Source Management
Project Setup (init)
Create a Citation_Sources/ directory for the project:
Citation_Sources/
AuthorLastName_Year_Journal_ShortTitle/
Author_Year_Topic.pdf ← original paper
Author_Year_Topic_SI.pdf ← supplementary info (if any)
CITATION.md ← structured metadata + data provenance
CITATION.md Template
Every cited paper gets a CITATION.md file:
# Author et al. Year — Short Description
**Title**: Full title
**Authors**: Author list
**Journal**: Journal Vol, Pages (Year)
**DOI**: 10.xxxx/xxxxx
**Data used**: [exact values extracted, with table/figure reference]
**PDF**: ✅ Confirmed | ❌ NOT DOWNLOADED — [reason]
**Status**: CONFIRMED | ⚠️ NEEDS CONFIRM — [reason]
**Notes**: [any caveats, discrepancies, proxy assumptions]
Adding a Source (add)
When adding a new citation:
- Create the folder:
Citation_Sources/AuthorLastName_Year_Journal_ShortTitle/ - Download the original PDF — always try to get the actual paper, not just the abstract
- Download supplementary information if it contains data
- Create CITATION.md from the template
- Extract the specific values you need, recording exact table/figure/page locations
- Mark the PDF status and verification status
Verification Workflow
Step 1: Collect with Provenance
When extracting data from a paper, record ALL of the following for each value:
Value: 0.65 W/m·K
Paper: Cheng et al. 2021
DOI: 10.1002/smll.202101693
Location: Table 2, row 3
Method: TDTR (time-domain thermoreflectance)
Data type: Experimental
Verified: YES — value confirmed in Table 2
Never record a value without filling in the Location, Data type, and Verified fields.
Step 2: Verify Against Original
For each data point:
- Always download the original PDF — don't trust web scraping, abstracts, or secondary sources
- Find the exact value in a table, figure, or text passage
- Record where you found it — table number, figure number, page, equation
- Note the measurement method — experimental technique, simulation, estimate
- Check units — convert if needed, note the original units
- Track the data type: DFT-calculated, experimentally measured, or derived (note assumptions)
If the paper is behind a paywall and you cannot verify:
- Mark as
⚠️ NEEDS CONFIRM — paywall - Note this limitation in CITATION.md
Step 3: Cross-Check the Full Chain
Verify consistency at every step:
Value in PDF → Value in CITATION.md → Value in data table/CSV → Value in manuscript
Any mismatch at any step is a flag.
Step 4: Flag Problems
Mark any value with one of these status levels:
| Status | Meaning | Action |
|---|---|---|
VERIFIED |
Found exact value in cited paper at stated location | Include in dataset |
APPROXIMATE |
Value is close but not exact (e.g., read from figure) | Include with note |
UNVERIFIED |
Cannot find value in cited paper | Flag — do not use without user approval |
MISATTRIBUTED |
Cited paper does not contain this data at all | Remove from dataset, alert user immediately |
ESTIMATED |
Value was calculated or estimated, not directly measured | Include with clear label |
⚠️ NEEDS CONFIRM |
PDF not available (paywall) or value needs double-check | Flag for manual verification |
Step 5: Flag Discrepancies
When multiple sources report different values for the same quantity:
- Record both values with their sources
- Note the discrepancy explicitly (e.g., "B = 45 GPa (Author A, Table 2) vs B = 86 GPa (Author B, Fig. 3)")
- Check if the difference is due to measurement method, sample preparation, or temperature
- Let the user decide which value to use — do not silently pick one
Dataset Format
When building compiled datasets, always include provenance columns:
CSV format:
Material,Property,Value,Unit,Source_Paper,DOI,Source_Location,Method,Data_Type,Verified,Notes
Li6PS5Cl,kappa,0.69,W/m·K,Cheng 2021,10.1002/smll.202101693,Table 2,TDTR,experimental,YES,
Li3InCl6,v_longitudinal,2800,m/s,Asano 2018,10.1002/adma.201803075,NOT FOUND,Unknown,unknown,MISATTRIBUTED,Paper contains no Li3InCl6 sound velocity data
JSON format:
{
"material": "Li6PS5Cl",
"property": "thermal_conductivity",
"value": 0.69,
"unit": "W/m·K",
"source": {
"paper": "Cheng et al. 2021",
"doi": "10.1002/smll.202101693",
"location": "Table 2, row 5",
"method": "TDTR",
"dataType": "experimental",
"verified": true
}
}
Audit Workflow (audit)
Scan all CITATION.md files and generate a report:
- List all unique sources in Citation_Sources/
- For each source, check:
- PDF downloaded? (✅ or ❌)
- CITATION.md complete? (all fields filled)
- Values confirmed against PDF?
- Generate audit summary:
## Audit Report — [Project Name]
Date: [timestamp]
### Summary
- Total sources: [N]
- PDFs confirmed: [N] / [N]
- Values verified: [N] / [N]
- Needs confirmation: [N]
- Missing PDFs: [N]
### Source Details
| Paper | PDF | Values | Verified | Status |
|---|---|---|---|---|
| Cheng 2021 | ✅ | 3 | 3/3 | CONFIRMED |
| Asano 2018 | ✅ | 2 | 1/2 | ⚠️ 1 MISATTRIBUTED |
| Wang 2014 | ❌ | 4 | 0/4 | ⚠️ NEEDS CONFIRM |
### Flagged Values
- Li3InCl6 v_longitudinal: MISATTRIBUTED to Asano 2018 — paper contains no LIC data
- LGPS density: conflicting values (2.0 vs 1.9 g/cm³) between Wang 2014 and Kamaya 2011
- Report findings — list verified, flagged, and misattributed values
- Recommend action for each flagged value
Export (export)
Generate a summary table of all data values and their provenance:
## Data Provenance Summary — [Project Name]
| Material | Property | Value | Unit | Source | Location | Data Type | Status |
|---|---|---|---|---|---|---|---|
| LLZTO | κ | 0.42 | W/m·K | Muy 2019 | Table 1 | experimental | VERIFIED |
| LAGP | v_avg | 4700 | m/s | Rohde 2021 | Table S2 | experimental | VERIFIED |
| Li3InCl6 | v_avg | 1849 | m/s | Qiu 2025 | Table 1 | DFT | VERIFIED |
Red Flags
Watch for these indicators of unreliable data:
- Value attributed to a paper but no specific table/figure cited
- "Estimated from family properties" without a clear methodology
- Values that appear in reviews but cannot be traced to original measurements
- Round numbers that suggest estimation rather than measurement (e.g., 2800 m/s vs 2837 m/s)
- Same value appearing in multiple papers without independent measurement
- DFT values presented as experimental without noting the distinction
- Discrepancies between different sources for the same quantity left unaddressed
Rules
- Never assume a citation is correct — always verify against the original paper
- Always download the PDF — don't trust abstracts, web scraping, or secondary sources
- Secondary sources are not verification — a review paper citing a value does not confirm it
- Flag immediately when a value cannot be found in its cited source
- Track data type — distinguish DFT-calculated, experimentally measured, and derived values
- Flag discrepancies — when two sources disagree, note both values and let the user decide
- Prefer measured over estimated — clearly label the difference
- Document everything — future researchers need the audit trail
- When in doubt, exclude — a smaller verified dataset beats a larger unverified one
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install data-source-verification - 安装完成后,直接呼叫该 Skill 的名称或使用
/data-source-verification触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Data Source Verification 是什么?
Verify numerical data against original papers and maintain traceable provenance for every value in datasets, tables, and plots. Includes citation source mana... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 311 次。
如何安装 Data Source Verification?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install data-source-verification」即可一键安装,无需额外配置。
Data Source Verification 是免费的吗?
是的,Data Source Verification 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Data Source Verification 支持哪些平台?
Data Source Verification 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Data Source Verification?
由 Larry-of-cosmotim(@larry-of-cosmotim)开发并维护,当前版本 v1.0.0。