Case Study: Data Analysis Assistant — NL2SQL, Chart Interpretation and Auto-Reports
Chapter 24: Project — Data Analysis Assistant — NL2SQL, Chart Interpretation and Report Automation
Let business users query databases, interpret charts, and generate reports in natural language — this chapter shows how to use Dify to put data analysis capability in everyone's hands.
Chapter Overview
"What were last week's sales by region?" "What does this chart tell us?" "Write me a quarterly data analysis report" — these requests flood data analytics teams daily, consuming analyst time on tasks that, while individually important, follow predictable patterns.
The core capability is NL2SQL (Natural Language to SQL): converting a user's natural language question into a precise SQL query, executing it, and presenting results in a human-readable way. This is not a new idea, but in the LLM era, quality and ease of use have improved dramatically.
Case background: A retail chain (500 stores nationwide), an 8-person data team handling approximately 150 data query requests daily from store management, marketing, and supply chain departments. Core databases contain:
- Sales data: daily transaction records (last 3 years, 200 million rows)
- Inventory data: real-time store/warehouse inventory
- Member data: 8 million member profiles and purchase records
- Store data: basic information and operational metrics for 500 stores
Goals:
- Business users can self-serve 80% of daily data queries (currently they must engage the data team)
- Analysts freed from repetitive queries to focus on high-value analysis
- Weekly/monthly reports automatically generated, reducing report creation time by 60%
Level 1: Core Concepts (1–3 Years Experience)
How NL2SQL Works
NL2SQL is not magic. The workflow is straightforward:
User question (natural language)
↓
Database Schema context injection
↓
LLM generates SQL
↓
SQL safety validation (injection prevention)
↓
Execute SQL (read-only permission)
↓
Result formatting (table/natural language)
↓
Return to user
Why can an LLM generate correct SQL?
LLMs have seen vast amounts of SQL in training data. But to generate SQL for your company's database, they need to know your table structure (Schema). This is why "Schema injection" is the core of NL2SQL.
Analogy: Getting an LLM to write SQL is like hiring a smart SQL developer to query your database. They know SQL perfectly — but they need to read your database documentation (Schema) before they can write accurate queries.
Configuring NL2SQL in Dify
Step 1: Prepare the Schema document
Format the database table structure for LLM comprehension:
# Database Schema
## Database: sales_db
### Table: daily_sales (Daily Sales Records)
| Column | Type | Description | Example |
|--------|------|-------------|---------|
| id | BIGINT | Primary key | 10001 |
| store_id | INT | Store ID | 42 |
| sale_date | DATE | Sale date | 2024-03-15 |
| product_id | INT | Product ID | 5001 |
| product_name | VARCHAR | Product name | "Phone Case" |
| category | VARCHAR | Product category | "Phone Accessories" |
| quantity | INT | Units sold | 5 |
| unit_price | DECIMAL | Unit price (USD) | 9.99 |
| total_amount | DECIMAL | Sale amount (USD) | 49.95 |
### Table: stores (Store Information)
| Column | Type | Description | Example |
|--------|------|-------------|---------|
| id | INT | Store ID | 42 |
| store_name | VARCHAR | Store name | "NYC Times Square" |
| city | VARCHAR | City | "New York" |
| state | VARCHAR | State | "NY" |
| region | VARCHAR | Region (Northeast/South/Midwest/West) | "Northeast" |
| store_type | VARCHAR | Type (flagship/standard/small) | "flagship" |
### Key relationships
- daily_sales.store_id = stores.id
### Business rules
- "This month" = from 1st of current month to today
- "Last week" = the previous complete calendar week (Monday to Sunday)
- Revenue = SUM(total_amount)
Step 2: Dify Workflow configuration
Node 1: Schema context preparation
→ Dynamically fetch relevant table schemas
Node 2: SQL generation (LLM)
Input: user question + schema context
Output: SQL query
Node 3: SQL safety check (Code node)
→ Reject INSERT/UPDATE/DELETE/DROP operations
→ Check for SQL injection patterns
Node 4: SQL execution (HTTP node)
→ Call query API (never connect to DB directly from Dify)
Node 5: Result interpretation (LLM)
Input: SQL query results
Output: natural language interpretation
Node 6: Visualization suggestion (optional)
→ Recommend chart type based on data characteristics
Core SQL generation prompt:
You are a professional data analyst with expert knowledge of MySQL and business analytics.
## Database information:
{{schema_context}}
## Business rules:
- All monetary values are in USD
- "This month" = from the 1st of the current month to today
- "Last week" = the previous complete calendar week
## Task:
Convert the user's natural language question to a MySQL query.
## Rules:
1. Generate only SELECT statements — no INSERT/UPDATE/DELETE/DROP
2. For time ranges, use precise date functions:
- This month: WHERE sale_date >= DATE_FORMAT(NOW(), '%Y-%m-01') AND sale_date <= NOW()
- Last week: WHERE YEARWEEK(sale_date) = YEARWEEK(NOW()) - 1
3. Round monetary amounts to 2 decimal places: ROUND(value, 2)
4. Queries expected to return more than 1,000 rows must include a LIMIT clause
5. If the question is ambiguous, choose the most likely interpretation and note assumptions in comments
## User question:
{{user_question}}
Output only SQL (no explanations):
```sql
-- Intent: [describe your understanding of the query]
-- Assumptions: [list any assumptions made]
[SQL statement]
---
## Level 2: Mechanism Deep Dive (3–5 Years Experience)
### Complete NL2SQL Service Implementation
```python
import re
import json
import sqlparse
import mysql.connector
from typing import Optional
class NL2SQLService:
def __init__(self, db_config: dict, dify_config: dict):
self.db_config = db_config
self.dify_config = dify_config
self.schema_manager = SchemaManager(db_config)
def process_query(self, user_question: str) -> dict:
# Step 1: Smart schema selection
relevant_tables = self._identify_relevant_tables(user_question)
schema_context = self.schema_manager.get_schema_context(relevant_tables)
# Step 2: Generate SQL via Dify
sql = self._generate_sql(user_question, schema_context)
if not sql:
return {'error': 'SQL generation failed'}
# Step 3: Safety check
is_safe, safety_error = self._validate_sql_safety(sql)
if not is_safe:
return {'error': f'SQL safety check failed: {safety_error}'}
# Step 4: Execute query
result, execute_error = self._execute_sql(sql)
if execute_error:
corrected_sql = self._fix_sql(sql, execute_error, schema_context)
if corrected_sql:
result, execute_error = self._execute_sql(corrected_sql)
sql = corrected_sql
if execute_error:
return {'sql': sql, 'error': f'Query execution failed: {execute_error}'}
# Step 5: Interpret results
interpretation = self._interpret_results(user_question, sql, result)
# Step 6: Chart suggestion
chart = self._suggest_chart(result)
return {
'sql': sql,
'result': result,
'interpretation': interpretation,
'chart_suggestion': chart,
'row_count': len(result.get('rows', [])),
}
def _validate_sql_safety(self, sql: str) -> tuple:
sql_clean = re.sub(r'--.*$', '', sql, flags=re.MULTILINE)
sql_clean = re.sub(r'/\*.*?\*/', '', sql_clean, flags=re.DOTALL)
sql_clean = sql_clean.strip().upper()
if not sql_clean.startswith('SELECT'):
return False, f"Only SELECT statements are allowed; got: {sql_clean[:20]}"
dangerous_keywords = [
'INSERT', 'UPDATE', 'DELETE', 'DROP', 'CREATE', 'ALTER',
'TRUNCATE', 'EXEC', 'EXECUTE', 'CALL', 'GRANT', 'REVOKE',
]
for keyword in dangerous_keywords:
if keyword in sql_clean:
return False, f"Forbidden keyword detected: {keyword}"
return True, None
def _execute_sql(self, sql: str, max_rows: int = 1000) -> tuple:
try:
connection = mysql.connector.connect(
host=self.db_config['host'],
user=self.db_config['readonly_user'],
password=self.db_config['readonly_password'],
database=self.db_config['database'],
connection_timeout=30
)
cursor = connection.cursor(dictionary=True)
# Add LIMIT protection
if 'LIMIT' not in sql.upper():
sql = sql.rstrip(';') + f' LIMIT {max_rows}'
cursor.execute(sql)
rows = cursor.fetchall()
rows = self._serialize_rows(rows)
columns = [desc[0] for desc in cursor.description] if cursor.description else []
return {'columns': columns, 'rows': rows, 'total_rows': len(rows)}, None
except mysql.connector.Error as e:
return None, str(e)
finally:
if connection and connection.is_connected():
connection.close()
def _suggest_chart(self, result: dict) -> dict:
if not result or not result.get('rows'):
return {'type': 'none', 'reason': 'No data'}
columns = result['columns']
rows = result['rows']
row_count = len(rows)
has_date_column = any(
col.lower() in ['date', 'month', 'week', 'sale_date', 'created_at']
for col in columns
)
has_numeric = sum(1 for col in columns
if isinstance(rows[0].get(col), (int, float)))
has_category = any(
col.lower() in ['category', 'region', 'state', 'store_type', 'product_name']
for col in columns
)
if has_date_column and has_numeric >= 1 and row_count > 5:
return {'type': 'line', 'reason': 'Time series data — line chart shows trends best'}
elif has_category and has_numeric >= 1 and row_count <= 20:
return {'type': 'bar', 'reason': 'Category comparison — bar chart shows rankings clearly'}
elif has_numeric == 1 and row_count <= 10 and has_category:
return {'type': 'pie', 'reason': 'Share data — pie chart shows proportions intuitively'}
else:
return {'type': 'table', 'reason': 'Complex data — table shows full detail'}
Report Automation
import plotly.express as px
import base64
class ReportGenerator:
def __init__(self, nl2sql_service: NL2SQLService):
self.nl2sql = nl2sql_service
def generate_weekly_report(self, week_start: str, week_end: str) -> dict:
report_queries = [
{'title': 'Overall Sales Summary',
'question': f'Total revenue, order count, and average order value from {week_start} to {week_end}'},
{'title': 'Sales by Region',
'question': f'Ranked regional sales from {week_start} to {week_end}'},
{'title': 'Daily Revenue Trend',
'question': f'Daily revenue trend from {week_start} to {week_end}'},
{'title': 'Top 10 Best-Selling Products',
'question': f'Top 10 products by units sold from {week_start} to {week_end}'},
{'title': 'Category Revenue Share',
'question': f'Revenue share by product category from {week_start} to {week_end}'},
]
report_sections = []
for query_config in report_queries:
result = self.nl2sql.process_query(query_config['question'])
if 'error' in result:
report_sections.append({'title': query_config['title'], 'error': result['error']})
continue
chart = self._generate_chart(result['result'], result['chart_suggestion'])
report_sections.append({
'title': query_config['title'],
'chart': chart,
'data': result['result'],
'analysis': result.get('interpretation', ''),
'sql': result['sql']
})
executive_summary = self._generate_executive_summary(
week_start, week_end, report_sections
)
return {
'report_type': 'weekly',
'period': f'{week_start} to {week_end}',
'generated_at': datetime.now().isoformat(),
'executive_summary': executive_summary,
'sections': report_sections
}
def _generate_chart(self, data: dict, chart_suggestion: dict) -> Optional[str]:
if not data or not data.get('rows'):
return None
rows = data['rows']
columns = data['columns']
chart_type = chart_suggestion.get('type', 'table')
if chart_type == 'bar' and len(columns) >= 2:
fig = px.bar(rows, x=columns[0], y=columns[1], title='')
elif chart_type == 'line' and len(columns) >= 2:
fig = px.line(rows, x=columns[0], y=columns[1], markers=True)
elif chart_type == 'pie' and len(columns) >= 2:
fig = px.pie(rows, names=columns[0], values=columns[1])
else:
return None
img_bytes = fig.to_image(format='png', width=800, height=400, scale=2)
return base64.b64encode(img_bytes).decode('utf-8')
def _generate_executive_summary(self, week_start, week_end, sections) -> str:
summary_data = {
s['title']: s.get('analysis', '')
for s in sections if 'error' not in s
}
prompt = f"""
You are a senior retail data analyst. Based on the following data, write a concise
200-word executive summary for the period {week_start} to {week_end}.
Data summaries:
{json.dumps(summary_data, indent=2)}
Requirements:
1. Highlight the 3-5 most important business insights
2. Clearly indicate which metrics exceeded expectations and which need attention
3. Provide 2-3 actionable recommendations
4. Keep language concise and professional
Executive Summary:
"""
return self._call_llm(prompt)
def export_to_markdown(self, report: dict) -> str:
md = f"# Weekly Report: {report['period']}\n\n"
md += f"**Generated**: {report['generated_at']}\n\n"
md += "## Executive Summary\n\n"
md += report['executive_summary'] + "\n\n---\n\n"
for section in report['sections']:
md += f"## {section['title']}\n\n"
if 'error' in section:
md += f"Data unavailable: {section['error']}\n\n"
continue
md += section.get('analysis', '') + "\n\n"
return md
Level 3: Source Code and Architecture (5+ Years)
Improving NL2SQL Accuracy
Pure LLM SQL generation achieves approximately 70–80% accuracy. These techniques push it to 90%+:
Technique 1: Few-Shot Examples
FEW_SHOT_EXAMPLES = [
{
"question": "What were last week's sales by region?",
"sql": """
SELECT
s.region,
ROUND(SUM(ds.total_amount), 2) AS revenue,
COUNT(*) AS order_count
FROM daily_sales ds
JOIN stores s ON ds.store_id = s.id
WHERE YEARWEEK(ds.sale_date) = YEARWEEK(NOW()) - 1
GROUP BY s.region
ORDER BY revenue DESC;
"""
},
{
"question": "Top 5 products by units sold this month",
"sql": """
SELECT
product_name,
SUM(quantity) AS units_sold,
ROUND(SUM(total_amount), 2) AS revenue
FROM daily_sales
WHERE sale_date >= DATE_FORMAT(NOW(), '%Y-%m-01')
GROUP BY product_id, product_name
ORDER BY units_sold DESC
LIMIT 5;
"""
},
]
Technique 2: Chain-of-Thought prompting
COT_PROMPT_SUFFIX = """
Before generating the SQL, analyze:
1. Which data dimensions does this question involve? (time/geography/product/store)
2. Which metrics need aggregation? (SUM/COUNT/AVG)
3. Which tables need to be joined?
4. How should the time range be precisely expressed?
After analysis, output the SQL:
"""
Technique 3: Self-consistency validation
async def generate_sql_with_self_check(question: str, schema: str) -> str:
sql = await generate_sql(question, schema)
check_prompt = f"""
Review whether the following SQL correctly answers the user's question:
User question: {question}
SQL: {sql}
Check:
1. Does the SQL logic match the question's intent?
2. Is the time range correctly expressed?
3. Is the aggregation correct?
4. Are there any syntax errors?
If there are issues, provide corrected SQL. If correct, output "CORRECT":
"""
check_result = await call_llm(check_prompt)
if check_result.strip().upper() != 'CORRECT':
corrected = extract_sql_from_text(check_result)
return corrected or sql
return sql
Multi-turn Conversational NL2SQL
class ConversationalNL2SQL:
def __init__(self, nl2sql_service: NL2SQLService):
self.nl2sql = nl2sql_service
self.conversation_history = []
def process(self, user_question: str) -> dict:
if self.conversation_history:
contextual_question = self._rewrite_with_context(
user_question, self.conversation_history
)
else:
contextual_question = user_question
result = self.nl2sql.process_query(contextual_question)
self.conversation_history.append({
'user': user_question,
'sql': result.get('sql'),
'summary': result.get('interpretation', '')[:200]
})
if len(self.conversation_history) > 5:
self.conversation_history = self.conversation_history[-5:]
return result
def _rewrite_with_context(self, question: str, history: list) -> str:
context = "\n".join([
f"Q: {h['user']}\nSummary: {h['summary']}"
for h in history[-3:]
])
rewrite_prompt = f"""
Based on the following conversation history, rewrite the follow-up question
as a complete standalone question:
History:
{context}
Follow-up question: {question}
Rewritten standalone question (output question text only):
"""
return self._call_llm(rewrite_prompt)
Level 4: Production Traps and Decisions (Expert Perspective)
Trap 1: LLM Generates Dangerous Full Table Scans
Symptom: User asks "which products have never been sold?" and the LLM generates an unconstrained full table scan on a 200-million-row table — crashing the database.
Prevention:
- Query protection wrapper:
def add_protection(sql: str, max_rows: int = 1000, timeout_ms: int = 30000) -> str:
protected = f"/*+ MAX_EXECUTION_TIME({timeout_ms}) */ {sql}"
if 'LIMIT' not in sql.upper():
protected = protected.rstrip(';') + f' LIMIT {max_rows}'
return protected
- Always route NL2SQL queries to the read replica, never the primary:
connection = mysql.connector.connect(
host=REPLICA_HOST, # Read replica
user='readonly_user',
...
)
- Execution timeout with query kill:
import threading
def execute_with_timeout(cursor, sql: str, timeout: int = 30) -> list:
result, error = [], []
def target():
try:
cursor.execute(sql)
result.extend(cursor.fetchall())
except Exception as e:
error.append(str(e))
thread = threading.Thread(target=target)
thread.start()
thread.join(timeout=timeout)
if thread.is_alive():
cursor.execute("KILL QUERY " + str(cursor.connection.connection_id))
raise TimeoutError(f"Query timed out after {timeout}s")
if error:
raise Exception(error[0])
return result
Trap 2: Schema Exposure Risk
NL2SQL requires sending table structures to the LLM, potentially exposing sensitive business data schema.
Mitigations:
- Sanitize schema descriptions — remove sensitive column comments
- Use Azure OpenAI or a privately deployed LLM — data stays on-premises
- Send only relevant table schemas (smart schema selection), never the full database schema
Trap 3: Hallucinated Interpretations
User asks "What were Beijing store sales?" The database returns ¥1,234,567. The LLM may interpret: "Beijing store monthly sales reached ¥1.23M, up 15% year-over-year" — but "up 15% YoY" was fabricated!
Fix: Strictly constrain interpretation to only the actual returned data:
def interpret_results_safely(question: str, sql: str, result: dict) -> str:
data_json = json.dumps(result['rows'][:20], indent=2)
prompt = f"""
User question: {question}
SQL executed: {sql}
Query results: {data_json}
Interpret the query results in natural language.
Strict rules:
1. Only describe what is present in the query results
2. Never infer or calculate metrics not present in the data (no YoY, MoM)
3. If there is no comparison data, do not mention "growth" or "decline"
4. If results are empty, say "No data matching the conditions was found"
Interpretation:
"""
return call_llm(prompt)
Real Business Outcomes
After 3 months of the retail chain's data analysis assistant:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Daily queries handled by data team | 150 | 35 | -77% |
| Business user wait time for data | avg. 4 hours | < 2 minutes | -97% |
| Weekly report preparation time | 8 hours | 45 minutes | -91% |
| Analyst time on high-value analysis | 30% | 70% | +133% |
| NL2SQL accuracy (correct executable SQL) | — | 91% | — |
| User satisfaction with self-service | — | 4.3/5 | — |
Chapter Summary
Core technical takeaways for data analysis assistants:
-
Schema context is the foundation of NL2SQL: LLMs know nothing about your business data structure. A carefully designed Schema document with examples, business rules, and relationship explanations is the single biggest factor in accuracy.
-
Safety is non-negotiable: Read-only accounts, SQL allowlist (SELECT only), execution timeouts, and row limits — all four safeguards are essential.
-
Few-Shot + CoT pushes accuracy from 75% to 90%+: Few-shot examples and chain-of-thought prompting are practical, high-ROI improvements worth implementing before anything else.
-
Report automation = predefined queries + LLM insights: Weekly and monthly reports shouldn't be "let AI do whatever" — they're structured around predefined key metrics queries, with LLM generating the narrative and executive summary.
-
Result interpretation must be strictly constrained: LLMs will confidently fabricate YoY/MoM comparisons if not restricted. In business decision-making contexts, this kind of hallucination is extremely dangerous.
-
Multi-turn dialogue requires question rewriting: Follow-up questions must be rewritten into complete standalone questions before entering the NL2SQL pipeline — this is the correct way to ensure context consistency.
NL2SQL accuracy improvement roadmap:
| Stage | Techniques | Expected Accuracy |
|---|---|---|
| Basic | LLM + Schema | 70–75% |
| Improved | + Few-Shot + CoT | 85–90% |
| Production | + Self-validation + Auto-correction | 90–93% |
| Expert | + Vector schema retrieval + Fine-tuning | 95%+ |