Description

Extract recipes from Instagram reels. Use when a user sends an Instagram reel link and wants to get the recipe from the caption. Parses ingredients, instructions, and macros into a clean format.

README (SKILL.md)

Instagram Recipe Extractor

Name: ClawdBites
Author: kylelol

Extract recipes from Instagram reels using a multi-layered approach:

Caption parsing — Instant, check description first
Audio transcription — Whisper (local, no API key)
Frame analysis — Vision model for on-screen text

No Instagram login required. Works on public reels.

When to Use

User sends an Instagram reel link
User mentions "recipe from Instagram" or "save this reel"
User wants to extract recipe details from a video post

How It Works (MANDATORY FLOW)

ALWAYS follow this complete flow — do not stop after caption if instructions are missing:

1. User sends Instagram reel URL
2. Extract metadata using yt-dlp (--dump-json)
3. Parse the caption for recipe details
4. Check completeness: Does caption have BOTH ingredients AND instructions?
   - ✅ YES: Present the recipe
   - ❌ NO (missing instructions or incomplete): Automatically proceed to audio transcription; do NOT stop or ask the user
5. If audio transcription needed:
   - Download video: yt-dlp -o "/tmp/reel.mp4" "URL"
   - Extract audio: ffmpeg -y -i /tmp/reel.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/reel.wav
   - Transcribe: whisper /tmp/reel.wav --model base --output_format txt --output_dir /tmp
   - Merge caption ingredients with audio instructions
6. Present clean, formatted recipe (combining caption + audio as needed)
7. User decides what to do (save to notes, add to wishlist, etc.)

Completeness check heuristics:

Has ingredients = contains 3+ quantity+item patterns (e.g., "1 cup flour", "2 lbs chicken")
Has instructions = contains action verbs (blend, cook, bake, mix, pour, add) + sequence OR numbered steps
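
The heuristics above could be sketched in Python roughly as follows. The unit list, the sequence words, and the function names are assumptions made for illustration; the skill itself only describes the rules in prose.

```python
import re

# Illustrative unit list (an assumption); extend it for the captions you see.
UNITS = r"(?:cups?|tbsp|tsp|oz|lbs?|g|kg|ml|cloves?|cans?|slices?)"
QUANTITY_ITEM = re.compile(
    rf"\b\d+(?:[./]\d+)?\s*{UNITS}\b\.?\s+[a-zA-Z]", re.IGNORECASE
)
# Action verbs come from the heuristic above; sequence words are assumed.
ACTION_VERB = re.compile(r"\b(blend|cook|bake|mix|pour|add)\b", re.IGNORECASE)
SEQUENCE_WORD = re.compile(r"\b(first|then|next|after|finally)\b", re.IGNORECASE)
NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def has_ingredients(caption: str) -> bool:
    """True when the caption contains 3+ quantity+item patterns."""
    return len(QUANTITY_ITEM.findall(caption)) >= 3


def has_instructions(caption: str) -> bool:
    """True for numbered steps, or an action verb plus a sequence word."""
    if NUMBERED_STEP.search(caption):
        return True
    return bool(ACTION_VERB.search(caption) and SEQUENCE_WORD.search(caption))


def needs_audio(caption: str) -> bool:
    """Audio is needed unless BOTH ingredients and instructions are present."""
    return not (has_ingredients(caption) and has_instructions(caption))
```

A caption that lists ingredients but no steps makes `needs_audio` return `True`, which corresponds to the "NO" branch of the completeness check.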

Extraction Command

yt-dlp --dump-json "https://www.instagram.com/reel/SHORTCODE/" 2>/dev/null

Key fields from JSON output:

description — The caption containing the recipe
uploader — Creator's name
channel — Creator's handle
webpage_url — Original URL
like_count — Popularity indicator
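
As a rough sketch, the extraction command and the field selection could be wrapped in Python like this. The function names `fetch_metadata` and `pick_fields` are hypothetical, and `fetch_metadata` needs yt-dlp on the PATH plus network access.

```python
import json
import subprocess

# The fields listed above.
FIELDS = ("description", "uploader", "channel", "webpage_url", "like_count")


def fetch_metadata(url: str) -> dict:
    """Run `yt-dlp --dump-json URL` and return the parsed JSON object."""
    result = subprocess.run(
        ["yt-dlp", "--dump-json", url],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if yt-dlp fails
    )
    return json.loads(result.stdout)


def pick_fields(info: dict) -> dict:
    """Keep only the fields of interest; missing keys become None."""
    return {key: info.get(key) for key in FIELDS}
```

Keeping `pick_fields` separate from the subprocess call makes the selection logic easy to check against a saved JSON dump without touching Instagram.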

Recipe Parsing

Look for these patterns in the caption:

Macros:

"X Calories | Xg P | Xg C | Xg F"
"Macros per serving"
"Cal/Protein/Carbs/Fat"

Ingredients:

Lines starting with quantities (1 cup, 2 tbsp, 24oz)
Lines with measurement units
Emoji bullet points (🥩 🌽 🧀 etc.)
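
A minimal sketch of the macro and ingredient patterns above, assuming the pipe-separated macro form and lines that begin with a number. Both regular expressions and both function names are illustrative; real captions vary a lot and will need more patterns.

```python
import re

# Matches the "X Calories | Xg P | Xg C | Xg F" form only.
MACRO_LINE = re.compile(
    r"(?P<calories>\d+)\s*(?:calories|cal)\b\s*\|\s*"
    r"(?P<protein>\d+)\s*g\s*P\b\s*\|\s*"
    r"(?P<carbs>\d+)\s*g\s*C\b\s*\|\s*"
    r"(?P<fat>\d+)\s*g\s*F\b",
    re.IGNORECASE,
)
# A line that starts with a quantity, e.g. "1 cup", "2 tbsp", "24oz".
QUANTITY_START = re.compile(r"^\s*\d+(?:[./]\d+)?\s*[A-Za-z]")


def parse_macros(caption: str):
    """Return the macros as a dict of ints, or None if the pattern is absent."""
    match = MACRO_LINE.search(caption)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def ingredient_lines(caption: str) -> list:
    """Return lines that start with a quantity, skipping the macro line."""
    return [
        line.strip()
        for line in caption.splitlines()
        if QUANTITY_START.match(line) and not MACRO_LINE.search(line)
    ]
```

Numbered steps such as "1. Mix" are not picked up as ingredients because the pattern requires a letter right after the number and optional space.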

Sections:

"For the [component]:"
"Ingredients:"
"Instructions:"
"Directions:"

Output Format

Present extracted recipe cleanly:

## [Recipe Name]
*From @[handle]*

**Macros (per serving):** X cal | Xg P | Xg C | Xg F

### Ingredients
- [ingredient 1]
- [ingredient 2]
...

### Instructions
1. [step 1]
2. [step 2]
...

---
Source: [original URL]

User Actions After Extraction

Let the user decide what to do:

"Save to my recipes" → Save to Apple Notes (if meal-planner skill available)
"Add to wishlist" → Save to memory/recipe-wishlist.json
"Just show me" → Display only, no save
"Plan this for next week" → Hand off to meal-planner skill

Wishlist Storage

Optional storage for recipes user wants to try later:

memory/recipe-wishlist.json:

{
  "recipes": [
    {
      "name": "Recipe Name",
      "source": "instagram",
      "sourceUrl": "https://instagram.com/reel/...",
      "handle": "@creator",
      "addedDate": "2026-01-26",
      "tried": false,
      "macros": {
        "calories": 585,
        "protein": 56,
        "carbs": 25,
        "fat": 28,
        "servings": 3
      },
      "ingredients": [...],
      "instructions": [...]
    }
  ]
}
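
One way the wishlist file might be updated is sketched below. The helper name `add_to_wishlist` and its default values are assumptions; only the file path and the JSON shape come from the example above.

```python
import json
from datetime import date
from pathlib import Path

WISHLIST_PATH = Path("memory/recipe-wishlist.json")


def add_to_wishlist(recipe: dict, path: Path = WISHLIST_PATH) -> dict:
    """Append a recipe to the wishlist file, creating the file if needed."""
    if path.exists():
        wishlist = json.loads(path.read_text(encoding="utf-8"))
    else:
        wishlist = {"recipes": []}
    # Defaults mirror the sample entry; caller-supplied fields override them.
    entry = {
        "source": "instagram",
        "addedDate": date.today().isoformat(),
        "tried": False,
    }
    entry.update(recipe)
    wishlist["recipes"].append(entry)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(wishlist, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return entry
```

The file is read and rewritten as a whole, which is fine for a small personal list but is not safe against concurrent writers.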

Error Handling

If yt-dlp fails:

Check if URL is valid Instagram reel format
May be a private account — inform user
Suggest user paste caption text manually as fallback
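
The URL check could look something like the sketch below. The accepted URL shapes (`/reel/<shortcode>/` or `/reels/<shortcode>/`, with an optional query string) are an assumption about what counts as a valid reel link, and `reel_shortcode` is a hypothetical helper name.

```python
import re

# Assumed shapes: https://www.instagram.com/reel/SHORTCODE/ and close variants.
REEL_URL = re.compile(
    r"^https?://(?:www\.)?instagram\.com/reels?/"
    r"(?P<shortcode>[A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def reel_shortcode(url: str):
    """Return the reel shortcode, or None if the URL is not a reel link."""
    match = REEL_URL.match(url.strip())
    return match.group("shortcode") if match else None
```

Running this before yt-dlp lets the skill tell a malformed link apart from a private or removed reel.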

If no recipe found in caption (IMPORTANT):

After extracting, scan the caption for recipe indicators:

Ingredient quantities (numbers + units like oz, cups, tbsp, lbs)
Recipe sections ("For the...", "Ingredients:", "Instructions:")
Cooking verbs (bake, cook, sauté, mix, combine)
Macro information (calories, protein, carbs, fat)

If none found, tell the user clearly:

"I pulled the caption but it doesn't look like the recipe is there — it might just be a teaser or the recipe is only shown in the video itself. Here's what the caption says:

[show caption]

A few options:

Check the comments — sometimes creators post recipes there

Check their bio link — might lead to the full recipe

Describe what you saw in the video and I can help find a similar recipe"

Recipe detection heuristics:

HAS_RECIPE if caption contains:
- 3+ ingredient-like patterns (quantity + food item)
- OR "recipe" + ingredient list
- OR macro breakdown + ingredients
- OR numbered/bulleted instructions

NO_RECIPE if caption is:
- Mostly hashtags
- Just a description/teaser
- Under 100 characters
- No quantities or measurements

Integration with meal-planner

The meal-planner skill can reference this skill:

When planning meals, check wishlist for untried recipes
Suggest wishlist recipes that match pantry items
Mark recipes as "tried" after they're used in a meal plan

Audio Transcription (V2) — MANDATORY FALLBACK

When caption is missing instructions, ALWAYS transcribe the audio automatically. Do not stop and ask the user — just do it. This is the most common case since creators often put ingredients in captions but speak the instructions.

Step 1: Download video

yt-dlp -o "/tmp/reel.mp4" "https://instagram.com/reel/XXX"

Step 2: Extract audio

ffmpeg -y -i /tmp/reel.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/reel.wav

Step 3: Transcribe with Whisper

whisper /tmp/reel.wav --model base --output_format txt --output_dir /tmp

Step 4: Parse transcript for recipe

Look for cooking instructions and ingredients mentioned verbally.
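
Steps 1 to 3 can be chained from Python, as in this sketch. The function names are hypothetical, and the transcript filename is an assumption: recent openai-whisper releases write `<input name without extension>.txt` into the output directory, but older releases may name the file differently.

```python
import subprocess
from pathlib import Path


def build_commands(url: str, workdir: str = "/tmp") -> list[list[str]]:
    """Return the download, audio-extraction, and transcription commands."""
    video = str(Path(workdir) / "reel.mp4")
    audio = str(Path(workdir) / "reel.wav")
    return [
        ["yt-dlp", "-o", video, url],
        ["ffmpeg", "-y", "-i", video, "-vn", "-acodec", "pcm_s16le",
         "-ar", "16000", "-ac", "1", audio],
        ["whisper", audio, "--model", "base",
         "--output_format", "txt", "--output_dir", workdir],
    ]


def transcribe(url: str, workdir: str = "/tmp") -> str:
    """Run all three commands in order and return the transcript text."""
    for command in build_commands(url, workdir):
        subprocess.run(command, check=True)  # stop at the first failure
    # Assumed output name; see the note on whisper versions above.
    return (Path(workdir) / "reel.txt").read_text(encoding="utf-8")
```

Passing each command as a list avoids shell quoting problems when the reel URL contains characters such as `?` or `&`.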

Inference for Missing Measurements

ALWAYS infer quantities when not provided. Never present a recipe without amounts — estimate based on context and standard package sizes.

Vague Language → Specific Amounts

What they say	Infer
"some chicken"	~1 lb
"a bit of garlic"	2-3 cloves
"handful of spinach"	~2 cups
"drizzle of oil"	1-2 tbsp
"season to taste"	½ tsp salt, ¼ tsp pepper
"splash of soy sauce"	1-2 tbsp
"a few tablespoons"	2-3 tbsp
"some rice"	1 cup dry
"cheese on top"	½ - 1 cup shredded
"diced onion"	1 medium onion
"bell peppers"	2 peppers

Standard Package Sizes (when item mentioned without amount)

Ingredient	Standard Package	Infer
Puff pastry	17oz sheet	1 sheet
Ground beef/turkey	1 lb pack	1 lb
Chicken breast	~1.5 lb pack	1.5 lbs
Sausage links	14oz / 4-5 links	1 package
Bacon	12oz / 12 slices	½ package (6 slices)
Shredded cheese	8oz bag	1-2 cups
Tortillas	8-10 count	1 package
Canned beans	15oz can	1 can
Broth/stock	32oz carton	1-2 cups
Pasta	16oz box	8oz (half box)
Rice	2 lb bag	1-2 cups dry
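
The two tables above amount to simple lookups. The sketch below copies a subset of their rows into dictionaries; the helper names and the exact-match lookup are assumptions, and a real implementation would need fuzzier matching.

```python
# Subset of the "Vague Language" table above.
VAGUE_AMOUNTS = {
    "some chicken": "~1 lb",
    "a bit of garlic": "2-3 cloves",
    "handful of spinach": "~2 cups",
    "drizzle of oil": "1-2 tbsp",
    "splash of soy sauce": "1-2 tbsp",
    "some rice": "1 cup dry",
}
# Subset of the "Standard Package Sizes" table above.
PACKAGE_DEFAULTS = {
    "puff pastry": "1 sheet",
    "ground beef": "1 lb",
    "ground turkey": "1 lb",
    "chicken breast": "1.5 lbs",
    "canned beans": "1 can",
    "pasta": "8oz (half box)",
}


def infer_amount(mention: str):
    """Return an inferred amount for an exact-match mention, else None."""
    key = mention.strip().lower()
    return VAGUE_AMOUNTS.get(key) or PACKAGE_DEFAULTS.get(key)


def ingredient_line(item: str, mention: str) -> str:
    """Format one ingredient bullet, marking inferred amounts as estimated."""
    amount = infer_amount(mention)
    if amount is None:
        return f"- {item}"
    return f"- {amount} {item} *(estimated)*"
```

For example, `ingredient_line("ground turkey", "ground turkey")` yields `- 1 lb ground turkey *(estimated)*`.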

Context-Aware Scaling

By recipe type:

Stir fry for 2 → 1 lb protein, 4 cups veggies
Soup/stew → 1.5-2 lbs protein, 4 cups broth
Sheet pan meal → 1.5 lbs protein, 3-4 cups veggies
Appetizers → smaller portions, estimate ~12-15 pieces per batch

By servings mentioned:

"Serves 4" → Scale standard amounts for 4
"Meal prep for the week" → Assume 5-8 servings
No servings mentioned → Default to 4 servings

By protein target (if user has macro goals):

40-50g protein per serving → ~6-8oz cooked meat per portion
Scale recipe protein accordingly
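
The servings rules above could be sketched as below. Only the "Serves N" pattern and the default of 4 come from the text; the function names and the plain linear scaling are assumptions.

```python
import re

DEFAULT_SERVINGS = 4  # "No servings mentioned" rule above
SERVES = re.compile(r"\bserves\s+(\d+)\b", re.IGNORECASE)


def parse_servings(text: str) -> int:
    """Return N from "Serves N", or the default when nothing is stated."""
    match = SERVES.search(text)
    return int(match.group(1)) if match else DEFAULT_SERVINGS


def scale_quantities(quantities: dict, from_servings: int, to_servings: int) -> dict:
    """Scale numeric amounts linearly from one serving count to another."""
    if from_servings <= 0 or to_servings <= 0:
        raise ValueError("serving counts must be positive")
    factor = to_servings / from_servings
    return {name: amount * factor for name, amount in quantities.items()}
```

Scaling the "stir fry for 2" baseline (1 lb protein, 4 cups veggies) to 4 servings gives 2 lb protein and 8 cups veggies.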

Output Format

Always present inferred amounts clearly:

### Ingredients
- 1 lb ground turkey *(estimated)*
- 1 medium onion, diced *(estimated)*
- 2 cups broth *(estimated based on typical soup)*

Mark inferred quantities with (estimated) so user knows what came from the source vs inference.

Combined Extraction Flow

1. TRY CAPTION (instant)
   └── yt-dlp --dump-json → parse description
   └── Recipe found? → DONE ✅
   └── Check for "pinned" / "in comments" / "check comments" → FLAG
   
2. IF FLAGGED: CHECK FOR CREATOR COMMENT
   └── Look through comments for creator's username
   └── If creator comment found with recipe → DONE ✅
   └── If not found → continue + notify user

3. TRY AUDIO (30-60 sec)
   └── Download video
   └── Extract audio with ffmpeg
   └── Transcribe with Whisper (base model)
   └── Parse transcript for recipe
   └── Infer missing measurements
   └── Recipe found? → DONE ✅

4. PRESENT RESULTS + PROMPT IF NEEDED
   └── Show what was extracted from audio
   └── If "pinned" was flagged, tell user:
       "The creator mentioned the full recipe is pinned in the comments.
        I extracted what I could from the audio, but if you want the 
        exact measurements, paste the pinned comment here and I'll 
        merge it with what I found."
   
5. TRY FRAME ANALYSIS (if audio incomplete)
   └── Extract 5-8 key frames with ffmpeg
   └── Send to Claude vision
   └── Ask: "Extract any recipe text, ingredients, or measurements shown"
   └── Merge findings with audio transcript
   
6. FALLBACK (nothing found)
   └── Inform user: "Recipe wasn't in caption or audio/video"
   └── Offer: search for similar recipe based on video title/description
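
A simplified sketch of the fallback order in the flow above. The three extraction steps are passed in as callables, so the function names and the return shape are hypothetical. The creator-comment check (step 2) and the merging of audio with frame findings are left out for brevity.

```python
from typing import Callable, Optional

# Phrases from step 1's flag check.
PINNED_HINTS = ("pinned", "in comments", "check comments")


def run_flow(
    caption: str,
    parse_caption: Callable[[str], Optional[dict]],
    try_audio: Callable[[], Optional[dict]],
    try_frames: Callable[[], Optional[dict]],
) -> dict:
    """Try caption, then audio, then frames; report which source worked."""
    flagged = any(hint in caption.lower() for hint in PINNED_HINTS)
    recipe = parse_caption(caption)
    source = "caption"
    if recipe is None:
        recipe, source = try_audio(), "audio"
    if recipe is None:
        recipe, source = try_frames(), "frames"
    if recipe is None:
        source = "none"  # step 6: nothing found, fall back to a search offer
    return {"recipe": recipe, "source": source, "pinned_flag": flagged}
```

Because later steps are only called when earlier ones return `None`, the slow download and transcription are skipped whenever the caption already holds the recipe.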

Frame Analysis

Extract key frames and analyze with vision model.

Extract frames:

# Extract 1 frame every 5 seconds
ffmpeg -i /tmp/reel.mp4 -vf "fps=1/5" /tmp/frame_%02d.jpg

# Or keep every 30th frame (roughly one frame per second for a 30 fps video)
ffmpeg -i /tmp/reel.mp4 -vf "select='not(mod(n,30))'" -vsync vfr /tmp/frame_%02d.jpg
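
To get a fixed number of frames spread across the whole reel, the sampling rate can be derived from the video duration. The sketch below assumes ffprobe (shipped with ffmpeg) is available. The function names and the default of 6 frames are illustrative, and the `fps` filter may emit one frame more or fewer than requested because of rounding.

```python
import subprocess


def probe_duration(video: str) -> float:
    """Return the container duration in seconds, as reported by ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", video],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def frame_command(
    duration_seconds: float,
    n_frames: int = 6,
    video: str = "/tmp/reel.mp4",
    pattern: str = "/tmp/frame_%02d.jpg",
) -> list[str]:
    """Build an ffmpeg command that samples about n_frames evenly."""
    if duration_seconds <= 0 or n_frames <= 0:
        raise ValueError("duration and frame count must be positive")
    fps = n_frames / duration_seconds  # frames per second to sample
    return ["ffmpeg", "-y", "-i", video, "-vf", f"fps={fps:.4f}", pattern]
```

For a 30-second reel and 6 frames this produces `fps=0.2000`, the same rate as the "1 frame every 5 seconds" command above.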

Send to vision model: Use Claude's image analysis to read each frame:

Recipe cards / title screens
Ingredient lists shown on screen
Measurements in text overlays
Step-by-step instructions displayed

Vision prompt:

Analyze this frame from a cooking video. Extract any:
- Recipe name or title
- Ingredients with quantities
- Cooking instructions
- Nutritional information / macros
- Any other recipe-related text shown

If no recipe text is visible, respond with "No recipe text found."

Merge strategy:

Audio transcript = primary source (spoken instructions)
Frame analysis = supplement (exact measurements, recipe cards)
Combine both, prefer specific measurements from visual over inferred from audio
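
The merge rule could be expressed as in this sketch. The data shape (ingredient name mapped to an amount plus an `estimated` flag) and the function name are assumptions made for the example.

```python
def merge_ingredients(audio: dict, frames: dict) -> dict:
    """Merge frame findings into audio findings.

    audio:  name -> {"amount": str, "estimated": bool}
    frames: name -> {"amount": str}  (text read from the screen)
    """
    merged = {name: dict(entry) for name, entry in audio.items()}
    for name, entry in frames.items():
        current = merged.get(name)
        # Audio stays primary unless its amount was only inferred.
        if current is None or current["estimated"]:
            merged[name] = {"amount": entry["amount"], "estimated": False}
    return merged
```

An explicitly spoken amount is kept even when a frame disagrees, since the audio transcript is treated as the primary source.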

Pinned Comment Detection

Scan caption for these phrases (case-insensitive):

"recipe pinned"
"pinned in comments"
"check comments"
"in the comments"
"comment below"
"recipe below"
"full recipe in comments"

If detected, flag and notify user after extraction:

"Heads up — the creator said the recipe is pinned in the comments. I got what I could from the audio, but yt-dlp can't access pinned comments without login. If you want the exact recipe, copy the pinned comment and send it to me — I'll format it properly."

Requirements

yt-dlp — brew install yt-dlp
ffmpeg — brew install ffmpeg
whisper — pip3 install openai-whisper (runs locally, no API key)
No Instagram login required for public reels

Usage Guidance

This skill appears coherent with its purpose, but check these before installing:

(1) It will download the Instagram reel and transcribe audio locally without asking if the caption is incomplete; consider the user's privacy expectations when sharing links.
(2) The SKILL.md references frame-analysis/vision OCR but doesn't declare OCR dependencies or commands; expect that feature may be incomplete or rely on platform vision capabilities.
(3) Whisper is installed via pip (network install from PyPI) according to SKILL.md, but the registry install list omits it; decide whether you want that pip install to run.
(4) Temporary media files are written to /tmp, and optional wishlist writes go to memory/recipe-wishlist.json; verify you are comfortable with local file writes.
(5) Homebrew installs for yt-dlp and ffmpeg are standard and expected.

If any of these concerns matter, ask the skill author for clarification (explicit OCR/install steps and an opt-in prompt before downloading/transcribing) or only run the skill in a controlled environment.

Capability Analysis

Type: OpenClaw Skill
Name: clawdbites
Version: 0.1.0

The skill is classified as suspicious due to its reliance on powerful shell commands (`yt-dlp`, `ffmpeg`, `whisper`) and external API calls (implied Claude Vision API for frame analysis), which are inherently high-risk capabilities. While these actions are plausibly needed for the stated purpose of extracting recipes from Instagram reels, and there's no clear evidence of malicious intent like data exfiltration or unauthorized remote control, the broad access to shell execution and external network communication without stringent input sanitization or sandboxing raises concerns. The `SKILL.md` also contains strong directives to the AI agent (e.g., 'ALWAYS follow this complete flow', 'Automatically proceed to audio transcription'), which, while currently aligned with the skill's purpose, demonstrate a potential for prompt injection if the instructions were to be subtly altered for malicious ends.

Capability Assessment

✓ Purpose & Capability

Name/description (extract recipes from Instagram reels) matches the required binaries (yt-dlp, ffmpeg, Whisper) and the runtime flow (dump caption, download media, transcribe audio). These tools are appropriate and expected for the stated purpose.

ℹ Instruction Scope

The SKILL.md mandates an automatic flow that will download the reel and transcribe audio without asking the user if the caption is incomplete — this is coherent with the goal but is a behavioral choice with privacy implications. The doc also references 'frame analysis' / a vision model for on-screen text but provides no commands, dependencies, or heuristics for that step; that's an incomplete/underspecified part of the instructions.

ℹ Install Mechanism

Registry install specs list Homebrew formulas for yt-dlp and ffmpeg (reasonable). The SKILL.md metadata recommends installing Whisper with `pip3 install --user openai-whisper`; that pip install is present in SKILL.md metadata but not in the top-level registry install list—an inconsistency to be aware of. Installing Whisper via pip is a standard approach but has the usual network/install risks of pulling a package from PyPI.

✓ Credentials

No environment variables, tokens, or unrelated credentials are requested. The skill does write/read temporary files (e.g., /tmp/reel.mp4, /tmp/reel.wav) and optionally writes to a local memory file (memory/recipe-wishlist.json) when the user opts in; these file operations are proportional to the described functionality.

✓ Persistence & Privilege

The skill is not always-enabled and does not request elevated or cross-skill privileges. It may write an optional wishlist JSON and uses /tmp for transient files; this is normal and scoped to its purpose.

Version History

v0.1.0

Initial release of clawdbites: Extract recipes from Instagram reels.

- Parses Instagram reel captions and video audio to extract ingredients, instructions, and macros.
- Ensures recipes are complete by automatically transcribing audio if instructions are missing from the caption.
- Presents results in a clean, structured recipe format with source and macros.
- Provides clear fallback and error messages if recipes are not detected.
- Supports wishlist storage for saving untried recipes and integrates with meal-planner skills.
- Infers ingredient quantities when not specified, ensuring recipes always contain measurements.

Metadata

Slug clawdbites

Version 0.1.0

License —

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is ClawdBites?

Extract recipes from Instagram reels. Use when a user sends an Instagram reel link and wants to get the recipe from the caption. Parses ingredients, instructions, and macros into a clean format. It is an AI Agent Skill for Claude Code / OpenClaw, with 2030 downloads so far.

How do I install ClawdBites?

Run "/install clawdbites" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is ClawdBites free?

Yes, ClawdBites is completely free (open-source). You can download, install and use it at no cost.

Which platforms does ClawdBites support?

ClawdBites is cross-platform and runs anywhere OpenClaw / Claude Code is available (darwin, linux).

Who created ClawdBites?

It is built and maintained by kylelol (@kylelol); the current version is v0.1.0.
