Description

Formal verification using Lean 4 + Leanstral (labs-leanstral-2603) model. Use when: you need mathematical proof of code correctness, protocol verification, a...

README (SKILL.md)

Leanstral Formal Verification

Name: Leanstral Formal Verification
Author: lykeion-dev

A skill for formal verification using Lean 4 + Mathlib + the Leanstral model (labs-leanstral-2603) from Mistral AI to mathematically prove code properties.

🔑 Requires a Mistral API key. Set via MISTRAL_API_KEY environment variable. The Leanstral model is available for free via Mistral's API as of 2026-05-17. Get a key at: https://console.mistral.ai/api-keys
export MISTRAL_API_KEY="your-key-here"
⚠️ When using this skill:

Theorem statements and code are sent to Mistral's API. Do not include confidential code, secrets, or proprietary protocol details in prompts.

Generated .lean files are untrusted code — verify.sh rejects proof holes (sorry/admit) and compiles in an isolated project directory.

Always review the generated theorem statements. Lean verifies that the proof is valid, not that the theorem matches your intent.

About the Model

Leanstral is Mistral AI's first open-source code agent for Lean 4 (released March 2026). It is specialized for theorem proving — it generates proofs, and Lean 4 verifies them mechanically. When a proof passes, correctness is a mathematical fact, not a probability.

Spec	Value
Architecture	Mixture of Experts (128 experts, 4 active per token)
Total parameters	119B
Active parameters	6.5B per token
Context length	256K tokens
License	Apache 2.0 (open weights)
API model ID	`labs-leanstral-2603`
API base URL	`https://api.mistral.ai/v1`

Why Leanstral beats general-purpose LLMs at proofs

General-purpose LLMs (GPT, Claude, etc.) write code and hope it works — they test a few inputs but miss edge cases. Leanstral writes code and proves it correct for all inputs. When Lean 4 accepts the proof, correctness is guaranteed mathematically.

FLTEval Benchmarks

Model	Cost ($)	Score
Claude Haiku	184	23.0
Claude Sonnet	549	23.7
Claude Opus 4.6	1,650	39.6
Leanstral pass@1	18	21.9
Leanstral pass@2	36	26.3
Leanstral pass@4	72	29.3
Leanstral pass@16	290	31.9

Pass@16 beats Sonnet at 1/2 the cost, approaches Opus at 1/6 the cost.

Quick Start — Direct API Call

The primary way to use this skill: call the Mistral API directly.

curl -X POST "https://api.mistral.ai/v1/chat/completions" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "labs-leanstral-2603",
    "temperature": 1.0,
    "max_tokens": 32000,
    "messages": [{
      "role": "user",
      "content": "Prove the following theorem in Lean 4:\
\
theorem add_comm (a b : Nat) : a + b = b + a := by\
  sorry"
    }]
  }'

Or with Python:

from openai import OpenAI

client = OpenAI(
    api_key="***",
    base_url="https://api.mistral.ai/v1"
)

response = client.chat.completions.create(
    model="labs-leanstral-2603",
    temperature=1.0,
    max_tokens=32000,
    messages=[{"role": "user", "content": "Prove that..."}],
)
print(response.choices[0].message.content)

Recommended API Parameters

Parameter	Recommended	Why
`temperature`	1.0	Diverse proof strategies. Lower values produce repetitive attempts.
`max_tokens`	32000	Proofs are verbose. Generous output budget avoids truncation.
`reasoning_effort`	"high" (Mistral-specific)	Required for non-trivial proofs. Drop to `"medium"` only for simple boolean logic.

Pass@N Strategy

Leanstral improves significantly with multiple attempts. If a proof fails on the first try, call the API again — the model explores different proof strategies each time. pass@2 adds +4.4 points to the score. For automated workflows, loop with retry logic:

for i in 1 2 3; do
  curl -s -X POST "https://api.mistral.ai/v1/chat/completions" \
    -H "Authorization: Bearer $MISTRAL_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"labs-leanstral-2603\",\"temperature\":1.0,\"max_tokens\":32000,\"messages\":[{\"role\":\"user\",\"content\":\"$(cat proof_request.txt | jq -Rs .)\"}]}" \
    | jq -r '.choices[0].message.content' > proof_attempt_$i.lean
  # Verify with Lean 4
  bash verify.sh proof_attempt_$i.lean && break
done

Prerequisites

Before using this skill, set up a Lean 4 project on your machine:

Install elan (Lean version manager)
Create a Lean project: lake new formal-verification
Add Mathlib as a dependency in lakefile.lean
Run lake build once to build the Mathlib cache (~500MB, one-time cost)
Create a verify.sh script in the project root (see below)

verify.sh template

#!/bin/bash
# verify.sh — compile and check a Lean 4 proof file
# Usage: bash verify.sh /path/to/proof.lean
set -e
export PATH="$HOME/.elan/bin:$PATH"
PROJECT_DIR="\x3Cyour-lean-project-dir>"
cp "$1" "$PROJECT_DIR/Proof.lean"
cd "$PROJECT_DIR"
lake build

When to use

Use cases:

Proving that a code fix satisfies the intended properties
Formal verification of security properties
Proving the correctness of algorithms
Verifying the safety of protocols
Proving that multiple conditional branches are exhaustive
Providing formal assurance that "this change does not break existing behavior"

Do not use:

When simple unit tests are sufficient
When you want to check runtime behavior (test by actually running it)
For unprovable subjective properties (UX, design, etc.)
For I/O-heavy code (file systems, network, databases — Lean 4 is for pure functions)
For entire large codebases — focus on the critical core, not 100% coverage

Workflow Patterns

Pattern 1: Verify an existing code fix

1. Write the code + specification in Lean 4
2. Call Leanstral API: "Prove this implementation satisfies the specification"
3. Compile the returned proof with verify.sh
4. Pass → correctness guaranteed. Fail → failure point is the bug location.

Pattern 2: Spec-driven development

1. Write the specification in Lean 4 first (what "correct" means)
2. Call Leanstral to generate both implementation AND proof
3. Compile — when the proof passes, development is complete. No tests needed.

Pattern 3: Root-cause a bug

1. Observe the bug's symptoms
2. Write the "correct behavior" specification in Lean 4
3. Call Leanstral to prove the current implementation satisfies it
4. The point where the proof fails = the bug's root cause
5. Fix, then re-prove to confirm

Pattern 4: Safe refactoring

1. Write the pre- and post-refactoring code
2. Define equivalence as a specification
3. Call Leanstral to prove they are equivalent
4. Proof passes → refactoring is mathematically safe

Pattern 5: State machine / concurrency verification

1. Model the state machine as an inductive proposition in Lean 4
2. Formulate properties: "no deadlocks", "no unreachable states", etc.
3. Call Leanstral to prove them
→ Catches concurrency bugs exhaustively, not probabilistically

Verification Flow

Step 1: Identify the properties to be verified

Clearly state the properties to be proven in natural language. For example, when verifying a code change that adds a new boolean guard condition:

"When the user cancels, the side effect does not fire."
"During system-generated timeouts, the side effect fires as before (unchanged)."
"The guard flag only becomes true on user-initiated operations."

Step 2: Build a formal model

Convert the natural language properties into Lean 4 types and theorems:

-- Model the relevant boolean conditions as variables
variable (timedOut compactDuring timeExecuting userAborted : Bool)

-- Condition BEFORE the change
def triggerOld : Bool := timedOut && !compactDuring && !timeExecuting

-- Condition AFTER the change (adds userAborted guard)
def triggerNew : Bool := timedOut && !compactDuring && !timeExecuting && !userAborted

Step 3: Call Leanstral to generate proofs

Send the model + theorems to the Leanstral API with a prompt like:

Prove the following theorems in Lean 4 using Mathlib.
Use only the tactics: simp, tauto, rw, cases, rfl, constructor.

[paste your model + theorems here]

Step 4: Save output and compile

# Save Leanstral's response to a .lean file
echo "$LEANSTRAL_OUTPUT" > FormalVerification.lean

# Compile and verify
bash \x3Clean-project-dir>/verify.sh FormalVerification.lean

Success: Build completed successfully → Proof completed
Failure: Read the error message → Send the error back to Leanstral → Retry

Step 5: Report the results

List of proven theorems
Compilation output (success message)
Which real-world properties each theorem corresponds to

Prompt Strategy

When instructing Leanstral via the API, always provide:

[Context]
- Lean 4 version in use
- Dependencies (Mathlib, specific modules)
- The code/specification to verify

[Instruction]
- What property to prove
- Optional: suggested tactics or approach
- Any constraints (e.g., "only use simp and tauto")

Concrete Example: Algorithm Correctness

-- Define the specification FIRST
def sorted_correct (arr : Array Nat) : Prop :=
  ∀ i j, i \x3C j → j \x3C arr.size → arr[i]! ≤ arr[j]!

-- Send to Leanstral: "Prove that bubbleSort satisfies sorted_correct"
def bubbleSort (arr : Array Nat) : Array Nat := by
  sorry -- implementation

theorem bubbleSort_correct (arr : Array Nat) :
  sorted_correct (bubbleSort arr) := by
  sorry -- Leanstral generates this proof

Using as an OpenClaw Sub-Agent (Optional)

If you have already configured a Leanstral sub-agent in your OpenClaw gateway (this is separate setup — see the gateway's agent configuration), you can delegate proof generation to it:

sessions_spawn:
  agentId: \x3Cyour-leanstral-agent-id>
  task: |
    You are a Lean 4 formal verification expert.

    ## Context
    [Description of the verification target and properties to prove]

    ## Environment
    Lean 4 is available on the HOST. To compile:
    ```bash
    export PATH="$HOME/.elan/bin:$PATH"
    bash \x3Clean-project-dir>/verify.sh /workspace/your-file.lean
    ```

    ## Task
    1. Identify the properties to be verified
    2. Build a formal model in Lean 4
    3. Write and prove theorems
    4. Verify compilation using verify.sh (mandatory)
    5. If it fails, read the error, fix the proof, and recompile
    6. Report the final compilation output

    Save to: /workspace/FormalVerification.lean

Recommended sub-agent parameters (if configuring one):

Setting	Recommended
Model	`labs-leanstral-2603` (Mistral)
Temperature	1.0
Max tokens	32000
Thinking / reasoning	high
Timeout	600–1200 seconds
Fallback models	Any capable reasoning models

Best Practices

✅ DO

Write the specification first. Define "correct" before implementing.
Prove in small steps. Break large theorems into lemmas.
Set reasoning_effort="high". Essential for non-trivial proofs.
Use pass@N. If the first API call fails, retry — each attempt explores different strategies.
Feed Lean errors back to Leanstral. The error message tells exactly what failed.
Leverage Mathlib. Use existing proven lemmas — don't re-prove everything from scratch.
Keep proofs small and focused. One property per verification task.

❌ DON'T

Use for I/O-heavy code. Lean 4 is for pure functions. File I/O, networking, databases are impractical to formalize.
Formalize an entire large codebase. Pick the critical 5% — the core logic that must never be wrong.
Confuse "proof passes" with "spec is correct." Lean verifies that the implementation matches the spec. The spec's correctness is a human judgment.
Use Leanstral for general tasks. It's specialized for Lean 4 proofs — not for Python web apps, general chat, or code generation in other languages.
Trust proofs blindly. Lean's verification is trustworthy, but rare hallucinations (non-existent theorems/tactics) are caught by the compiler.

Important Notes

Lean on the host

Lean 4 with Mathlib requires ~500MB of cached dependencies. Run it directly on your machine — containerizing it is impractical. The verify.sh script compiles .lean files using the host's Lean installation.

Timeout management

The Leanstral model can take minutes to generate proofs. Countermeasures:

Set generous API timeouts (600–1200s)
Use pass@N: retry with fresh API calls if the first proof fails
Divide complex proofs into smaller lemmas

Applications in Other Fields

Field	Example
Algorithms	Correctness of sorting/searching, proof of computational complexity
Security	Safety of authentication protocols, access control properties
Business logic	Consistency of fee calculations, exhaustiveness of discount rules
Data integrity	Satisfaction of DB constraints, safety of schema migrations
Protocols	Deadlock-free state transitions, message ordering guarantees
Mathematics	Correctness of statistical calculations, equivalence of formula transformations
State machines	No unreachable states, all transitions defined, invariant preservation
Refactoring	Pre/post equivalence proofs, behavioral preservation guarantees

Application procedure

Define the property in natural language — "If X, then Y must always hold"
Build a formal model — Variables, types, functions that model the system
Write theorems — Express properties using theorem
Call Leanstral API — Send the model + theorems and request proofs
Compile and verify — lake build checks the proofs

Basic proof tactics

Tactic	Use
`simp`	Expand definitions and simplify
`tauto`	Automatic proof of propositional logic
`rw [h]`	Rewrite using assumption `h`
`cases`	Case analysis
`intro h`	Introduce implication
`rfl`	Trivial equality
`constructor`	Split a conjunction

Limitations

Limitation	Detail
Lean 4 only	Does not support Coq, Isabelle, Agda, or other proof assistants
I/O-poor	File systems, networking, databases cannot be formalized practically
Learning curve	Requires understanding of Lean 4's tactic language
Not for full codebases	Formalizing an entire project is impractical — focus on critical core
Proof length	Complex proofs can be very long and may exceed context
Hallucinations	Rarely generates non-existent theorems/tactics — Lean catches these

Error Handling

Compilation errors

error: unknown identifier 'foo'

→ The definition does not exist. Define it with def or add an import.

error: type mismatch

→ Type mismatch. Check the variable type declaration.

error: tactic 'simp' failed

→ simp cannot prove it. Use more specific tactics (cases, rw).

Pass@N retry loop

# Basic retry loop: try up to 3 times, stop on first success
for i in 1 2 3; do
  echo "Attempt $i..."
  curl -s ... | python3 -c "import sys,json; print(json.load(sys.stdin)['choices'][0]['message']['content'])" > proof.lean
  if bash verify.sh proof.lean 2>/dev/null; then
    echo "✅ Proof verified on attempt $i"
    break
  fi
  echo "❌ Attempt $i failed, retrying..."
done

Reference Links

Leanstral blog: https://mistral.ai/news/leanstral
Model (HuggingFace): https://huggingface.co/mistralai/Leanstral-2603
API docs: https://docs.mistral.ai/models/leanstral-26-03
Lean 4: https://github.com/leanprover/lean4
Mathlib: https://github.com/leanprover-community/mathlib4
Lean AI leaderboard: https://lean-lang.org/eval/

Usage Guidance

Install only if you are comfortable sending selected theorem statements and code to Mistral. Before running generated Lean files, add real proof-hole checks, verify theorem statements yourself, and run builds in a sandboxed disposable project rather than a valuable local workspace.

Capability Tags

requires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

The skill purpose is coherent: it uses Lean 4, Lake, and Mistral's Leanstral model for formal verification. It clearly says theorem statements and code are sent to Mistral and requires a Mistral API key.

⚠ Instruction Scope

The visible instructions claim generated Lean files are checked safely, but the provided verify.sh template only copies the file and runs `lake build`; it does not visibly reject `sorry` or `admit` or provide real isolation.

✓ Install Mechanism

There is no install spec or bundled code. The setup is instruction-only and depends on user-directed Lean/Lake setup.

⚠ Credentials

Building AI-generated `.lean` files with `lake build` is purpose-aligned, but the artifact under-specifies containment for untrusted generated code and may affect the local Lean project.

ℹ Persistence & Privilege

The template writes generated proof attempts and copies a proof into `Proof.lean` in the configured project directory. This is user-directed and scoped, but users should avoid overwriting important files.

Version History

v1.2.0

ClawScan fixes v1.2.0: verify.sh now rejects proof holes (sorry/admit) before building. Added MISTRAL_API_KEY to metadata env requirements. Added security advisory to intro section. Improved verify.sh template in docs.

v1.1.0

Restructured to API-first approach. Direct curl + Python API examples as primary usage. Sub-agent delegation moved to optional appendix. Added pass@N bash retry loop. Added metadata.requires.bins: [lake].

v1.0.0

Initial release: Formal verification using Lean 4 + Leanstral (labs-leanstral-2603, Mistral, 119B MoE). Free tier available. Features: 5 workflow patterns (verify, spec-driven, root-cause, refactoring, state machines), FLTEval benchmarks (pass@16 beats Sonnet), pass@N strategy, comprehensive best practices, verify.sh setup script, worked example with 13 proven theorems.

Metadata

Slug leanstral-formal-verification

Version 1.2.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Leanstral Formal Verification?

Formal verification using Lean 4 + Leanstral (labs-leanstral-2603) model. Use when: you need mathematical proof of code correctness, protocol verification, a... It is an AI Agent Skill for Claude Code / OpenClaw, with 136 downloads so far.

How do I install Leanstral Formal Verification?

Run "/install leanstral-formal-verification" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Leanstral Formal Verification free?

Yes, Leanstral Formal Verification is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Leanstral Formal Verification support?

Leanstral Formal Verification is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Leanstral Formal Verification?

It is built and maintained by 大村愛弥 (@lykeion-dev); the current version is v1.2.0.

More Skills

Leanstral Formal Verification

Leanstral Formal Verification

About the Model

Why Leanstral beats general-purpose LLMs at proofs

FLTEval Benchmarks

Quick Start — Direct API Call

Recommended API Parameters

Pass@N Strategy

Prerequisites

verify.sh template

When to use

Workflow Patterns

Pattern 1: Verify an existing code fix

Pattern 2: Spec-driven development

Pattern 3: Root-cause a bug

Pattern 4: Safe refactoring

Pattern 5: State machine / concurrency verification

Verification Flow

Step 1: Identify the properties to be verified

Step 2: Build a formal model

Step 3: Call Leanstral to generate proofs

Step 4: Save output and compile

Step 5: Report the results

Prompt Strategy

Concrete Example: Algorithm Correctness

Using as an OpenClaw Sub-Agent (Optional)

Best Practices

✅ DO

❌ DON'T

Important Notes

Lean on the host

Timeout management

Applications in Other Fields

Application procedure

Basic proof tactics

Limitations

Error Handling

Compilation errors

Pass@N retry loop

Reference Links

What is Leanstral Formal Verification?

How do I install Leanstral Formal Verification?

Is Leanstral Formal Verification free?

Which platforms does Leanstral Formal Verification support?

Who created Leanstral Formal Verification?

💬 Comments