/install leanstral-formal-verification
Leanstral Formal Verification
A skill for formal verification using Lean 4 + Mathlib + the Leanstral model
(labs-leanstral-2603) from Mistral AI to mathematically prove code properties.
🔑 Requires a Mistral API key. Set via
MISTRAL_API_KEYenvironment variable. The Leanstral model is available for free via Mistral's API as of 2026-05-17. Get a key at: https://console.mistral.ai/api-keysexport MISTRAL_API_KEY="your-key-here"⚠️ When using this skill:
- Theorem statements and code are sent to Mistral's API. Do not include confidential code, secrets, or proprietary protocol details in prompts.
- Generated
.leanfiles are untrusted code —verify.shrejects proof holes (sorry/admit) and compiles in an isolated project directory.- Always review the generated theorem statements. Lean verifies that the proof is valid, not that the theorem matches your intent.
About the Model
Leanstral is Mistral AI's first open-source code agent for Lean 4 (released March 2026). It is specialized for theorem proving — it generates proofs, and Lean 4 verifies them mechanically. When a proof passes, correctness is a mathematical fact, not a probability.
| Spec | Value |
|---|---|
| Architecture | Mixture of Experts (128 experts, 4 active per token) |
| Total parameters | 119B |
| Active parameters | 6.5B per token |
| Context length | 256K tokens |
| License | Apache 2.0 (open weights) |
| API model ID | labs-leanstral-2603 |
| API base URL | https://api.mistral.ai/v1 |
Why Leanstral beats general-purpose LLMs at proofs
General-purpose LLMs (GPT, Claude, etc.) write code and hope it works — they test a few inputs but miss edge cases. Leanstral writes code and proves it correct for all inputs. When Lean 4 accepts the proof, correctness is guaranteed mathematically.
FLTEval Benchmarks
| Model | Cost ($) | Score |
|---|---|---|
| Claude Haiku | 184 | 23.0 |
| Claude Sonnet | 549 | 23.7 |
| Claude Opus 4.6 | 1,650 | 39.6 |
| Leanstral pass@1 | 18 | 21.9 |
| Leanstral pass@2 | 36 | 26.3 |
| Leanstral pass@4 | 72 | 29.3 |
| Leanstral pass@16 | 290 | 31.9 |
Pass@16 beats Sonnet at 1/2 the cost, approaches Opus at 1/6 the cost.
Quick Start — Direct API Call
The primary way to use this skill: call the Mistral API directly.
curl -X POST "https://api.mistral.ai/v1/chat/completions" \
-H "Authorization: Bearer $MISTRAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "labs-leanstral-2603",
"temperature": 1.0,
"max_tokens": 32000,
"messages": [{
"role": "user",
"content": "Prove the following theorem in Lean 4:\
\
theorem add_comm (a b : Nat) : a + b = b + a := by\
sorry"
}]
}'
Or with Python:
from openai import OpenAI
client = OpenAI(
api_key="***",
base_url="https://api.mistral.ai/v1"
)
response = client.chat.completions.create(
model="labs-leanstral-2603",
temperature=1.0,
max_tokens=32000,
messages=[{"role": "user", "content": "Prove that..."}],
)
print(response.choices[0].message.content)
Recommended API Parameters
| Parameter | Recommended | Why |
|---|---|---|
temperature |
1.0 | Diverse proof strategies. Lower values produce repetitive attempts. |
max_tokens |
32000 | Proofs are verbose. Generous output budget avoids truncation. |
reasoning_effort |
"high" (Mistral-specific) | Required for non-trivial proofs. Drop to "medium" only for simple boolean logic. |
Pass@N Strategy
Leanstral improves significantly with multiple attempts. If a proof fails on the first try, call the API again — the model explores different proof strategies each time. pass@2 adds +4.4 points to the score. For automated workflows, loop with retry logic:
for i in 1 2 3; do
curl -s -X POST "https://api.mistral.ai/v1/chat/completions" \
-H "Authorization: Bearer $MISTRAL_API_KEY" \
-H "Content-Type: application/json" \
-d "{\"model\":\"labs-leanstral-2603\",\"temperature\":1.0,\"max_tokens\":32000,\"messages\":[{\"role\":\"user\",\"content\":\"$(cat proof_request.txt | jq -Rs .)\"}]}" \
| jq -r '.choices[0].message.content' > proof_attempt_$i.lean
# Verify with Lean 4
bash verify.sh proof_attempt_$i.lean && break
done
Prerequisites
Before using this skill, set up a Lean 4 project on your machine:
- Install elan (Lean version manager)
- Create a Lean project:
lake new formal-verification - Add Mathlib as a dependency in
lakefile.lean - Run
lake buildonce to build the Mathlib cache (~500MB, one-time cost) - Create a
verify.shscript in the project root (see below)
verify.sh template
#!/bin/bash
# verify.sh — compile and check a Lean 4 proof file
# Usage: bash verify.sh /path/to/proof.lean
set -e
export PATH="$HOME/.elan/bin:$PATH"
PROJECT_DIR="\x3Cyour-lean-project-dir>"
cp "$1" "$PROJECT_DIR/Proof.lean"
cd "$PROJECT_DIR"
lake build
When to use
Use cases:
- Proving that a code fix satisfies the intended properties
- Formal verification of security properties
- Proving the correctness of algorithms
- Verifying the safety of protocols
- Proving that multiple conditional branches are exhaustive
- Providing formal assurance that "this change does not break existing behavior"
Do not use:
- When simple unit tests are sufficient
- When you want to check runtime behavior (test by actually running it)
- For unprovable subjective properties (UX, design, etc.)
- For I/O-heavy code (file systems, network, databases — Lean 4 is for pure functions)
- For entire large codebases — focus on the critical core, not 100% coverage
Workflow Patterns
Pattern 1: Verify an existing code fix
1. Write the code + specification in Lean 4
2. Call Leanstral API: "Prove this implementation satisfies the specification"
3. Compile the returned proof with verify.sh
4. Pass → correctness guaranteed. Fail → failure point is the bug location.
Pattern 2: Spec-driven development
1. Write the specification in Lean 4 first (what "correct" means)
2. Call Leanstral to generate both implementation AND proof
3. Compile — when the proof passes, development is complete. No tests needed.
Pattern 3: Root-cause a bug
1. Observe the bug's symptoms
2. Write the "correct behavior" specification in Lean 4
3. Call Leanstral to prove the current implementation satisfies it
4. The point where the proof fails = the bug's root cause
5. Fix, then re-prove to confirm
Pattern 4: Safe refactoring
1. Write the pre- and post-refactoring code
2. Define equivalence as a specification
3. Call Leanstral to prove they are equivalent
4. Proof passes → refactoring is mathematically safe
Pattern 5: State machine / concurrency verification
1. Model the state machine as an inductive proposition in Lean 4
2. Formulate properties: "no deadlocks", "no unreachable states", etc.
3. Call Leanstral to prove them
→ Catches concurrency bugs exhaustively, not probabilistically
Verification Flow
Step 1: Identify the properties to be verified
Clearly state the properties to be proven in natural language. For example, when verifying a code change that adds a new boolean guard condition:
- "When the user cancels, the side effect does not fire."
- "During system-generated timeouts, the side effect fires as before (unchanged)."
- "The guard flag only becomes true on user-initiated operations."
Step 2: Build a formal model
Convert the natural language properties into Lean 4 types and theorems:
-- Model the relevant boolean conditions as variables
variable (timedOut compactDuring timeExecuting userAborted : Bool)
-- Condition BEFORE the change
def triggerOld : Bool := timedOut && !compactDuring && !timeExecuting
-- Condition AFTER the change (adds userAborted guard)
def triggerNew : Bool := timedOut && !compactDuring && !timeExecuting && !userAborted
Step 3: Call Leanstral to generate proofs
Send the model + theorems to the Leanstral API with a prompt like:
Prove the following theorems in Lean 4 using Mathlib.
Use only the tactics: simp, tauto, rw, cases, rfl, constructor.
[paste your model + theorems here]
Step 4: Save output and compile
# Save Leanstral's response to a .lean file
echo "$LEANSTRAL_OUTPUT" > FormalVerification.lean
# Compile and verify
bash \x3Clean-project-dir>/verify.sh FormalVerification.lean
- Success:
Build completed successfully→ Proof completed - Failure: Read the error message → Send the error back to Leanstral → Retry
Step 5: Report the results
- List of proven theorems
- Compilation output (success message)
- Which real-world properties each theorem corresponds to
Prompt Strategy
When instructing Leanstral via the API, always provide:
[Context]
- Lean 4 version in use
- Dependencies (Mathlib, specific modules)
- The code/specification to verify
[Instruction]
- What property to prove
- Optional: suggested tactics or approach
- Any constraints (e.g., "only use simp and tauto")
Concrete Example: Algorithm Correctness
-- Define the specification FIRST
def sorted_correct (arr : Array Nat) : Prop :=
∀ i j, i \x3C j → j \x3C arr.size → arr[i]! ≤ arr[j]!
-- Send to Leanstral: "Prove that bubbleSort satisfies sorted_correct"
def bubbleSort (arr : Array Nat) : Array Nat := by
sorry -- implementation
theorem bubbleSort_correct (arr : Array Nat) :
sorted_correct (bubbleSort arr) := by
sorry -- Leanstral generates this proof
Using as an OpenClaw Sub-Agent (Optional)
If you have already configured a Leanstral sub-agent in your OpenClaw gateway (this is separate setup — see the gateway's agent configuration), you can delegate proof generation to it:
sessions_spawn:
agentId: \x3Cyour-leanstral-agent-id>
task: |
You are a Lean 4 formal verification expert.
## Context
[Description of the verification target and properties to prove]
## Environment
Lean 4 is available on the HOST. To compile:
```bash
export PATH="$HOME/.elan/bin:$PATH"
bash \x3Clean-project-dir>/verify.sh /workspace/your-file.lean
```
## Task
1. Identify the properties to be verified
2. Build a formal model in Lean 4
3. Write and prove theorems
4. Verify compilation using verify.sh (mandatory)
5. If it fails, read the error, fix the proof, and recompile
6. Report the final compilation output
Save to: /workspace/FormalVerification.lean
Recommended sub-agent parameters (if configuring one):
| Setting | Recommended |
|---|---|
| Model | labs-leanstral-2603 (Mistral) |
| Temperature | 1.0 |
| Max tokens | 32000 |
| Thinking / reasoning | high |
| Timeout | 600–1200 seconds |
| Fallback models | Any capable reasoning models |
Best Practices
✅ DO
- Write the specification first. Define "correct" before implementing.
- Prove in small steps. Break large theorems into lemmas.
- Set reasoning_effort="high". Essential for non-trivial proofs.
- Use pass@N. If the first API call fails, retry — each attempt explores different strategies.
- Feed Lean errors back to Leanstral. The error message tells exactly what failed.
- Leverage Mathlib. Use existing proven lemmas — don't re-prove everything from scratch.
- Keep proofs small and focused. One property per verification task.
❌ DON'T
- Use for I/O-heavy code. Lean 4 is for pure functions. File I/O, networking, databases are impractical to formalize.
- Formalize an entire large codebase. Pick the critical 5% — the core logic that must never be wrong.
- Confuse "proof passes" with "spec is correct." Lean verifies that the implementation matches the spec. The spec's correctness is a human judgment.
- Use Leanstral for general tasks. It's specialized for Lean 4 proofs — not for Python web apps, general chat, or code generation in other languages.
- Trust proofs blindly. Lean's verification is trustworthy, but rare hallucinations (non-existent theorems/tactics) are caught by the compiler.
Important Notes
Lean on the host
Lean 4 with Mathlib requires ~500MB of cached dependencies. Run it directly on
your machine — containerizing it is impractical. The verify.sh script compiles
.lean files using the host's Lean installation.
Timeout management
The Leanstral model can take minutes to generate proofs. Countermeasures:
- Set generous API timeouts (600–1200s)
- Use pass@N: retry with fresh API calls if the first proof fails
- Divide complex proofs into smaller lemmas
Applications in Other Fields
| Field | Example |
|---|---|
| Algorithms | Correctness of sorting/searching, proof of computational complexity |
| Security | Safety of authentication protocols, access control properties |
| Business logic | Consistency of fee calculations, exhaustiveness of discount rules |
| Data integrity | Satisfaction of DB constraints, safety of schema migrations |
| Protocols | Deadlock-free state transitions, message ordering guarantees |
| Mathematics | Correctness of statistical calculations, equivalence of formula transformations |
| State machines | No unreachable states, all transitions defined, invariant preservation |
| Refactoring | Pre/post equivalence proofs, behavioral preservation guarantees |
Application procedure
- Define the property in natural language — "If X, then Y must always hold"
- Build a formal model — Variables, types, functions that model the system
- Write theorems — Express properties using
theorem - Call Leanstral API — Send the model + theorems and request proofs
- Compile and verify —
lake buildchecks the proofs
Basic proof tactics
| Tactic | Use |
|---|---|
simp |
Expand definitions and simplify |
tauto |
Automatic proof of propositional logic |
rw [h] |
Rewrite using assumption h |
cases |
Case analysis |
intro h |
Introduce implication |
rfl |
Trivial equality |
constructor |
Split a conjunction |
Limitations
| Limitation | Detail |
|---|---|
| Lean 4 only | Does not support Coq, Isabelle, Agda, or other proof assistants |
| I/O-poor | File systems, networking, databases cannot be formalized practically |
| Learning curve | Requires understanding of Lean 4's tactic language |
| Not for full codebases | Formalizing an entire project is impractical — focus on critical core |
| Proof length | Complex proofs can be very long and may exceed context |
| Hallucinations | Rarely generates non-existent theorems/tactics — Lean catches these |
Error Handling
Compilation errors
error: unknown identifier 'foo'
→ The definition does not exist. Define it with def or add an import.
error: type mismatch
→ Type mismatch. Check the variable type declaration.
error: tactic 'simp' failed
→ simp cannot prove it. Use more specific tactics (cases, rw).
Pass@N retry loop
# Basic retry loop: try up to 3 times, stop on first success
for i in 1 2 3; do
echo "Attempt $i..."
curl -s ... | python3 -c "import sys,json; print(json.load(sys.stdin)['choices'][0]['message']['content'])" > proof.lean
if bash verify.sh proof.lean 2>/dev/null; then
echo "✅ Proof verified on attempt $i"
break
fi
echo "❌ Attempt $i failed, retrying..."
done
Reference Links
- Leanstral blog: https://mistral.ai/news/leanstral
- Model (HuggingFace): https://huggingface.co/mistralai/Leanstral-2603
- API docs: https://docs.mistral.ai/models/leanstral-26-03
- Lean 4: https://github.com/leanprover/lean4
- Mathlib: https://github.com/leanprover-community/mathlib4
- Lean AI leaderboard: https://lean-lang.org/eval/
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install leanstral-formal-verification - After installation, invoke the skill by name or use
/leanstral-formal-verification - Provide required inputs per the skill's parameter spec and get structured output
What is Leanstral Formal Verification?
Formal verification using Lean 4 + Leanstral (labs-leanstral-2603) model. Use when: you need mathematical proof of code correctness, protocol verification, a... It is an AI Agent Skill for Claude Code / OpenClaw, with 136 downloads so far.
How do I install Leanstral Formal Verification?
Run "/install leanstral-formal-verification" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Leanstral Formal Verification free?
Yes, Leanstral Formal Verification is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Leanstral Formal Verification support?
Leanstral Formal Verification is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Leanstral Formal Verification?
It is built and maintained by 大村愛弥 (@lykeion-dev); the current version is v1.2.0.