Chapter 24

Case Study: Data Analysis Assistant — NL2SQL, Chart Interpretation and Auto-Reports

Chapter 24: Project — Data Analysis Assistant — NL2SQL, Chart Interpretation and Report Automation

Let business users query databases, interpret charts, and generate reports in natural language — this chapter shows how to use Dify to put data analysis capability in everyone's hands.

Chapter Overview

"What were last week's sales by region?" "What does this chart tell us?" "Write me a quarterly data analysis report" — these requests flood data analytics teams daily, consuming analyst time on tasks that, while individually important, follow predictable patterns.

The core capability is NL2SQL (Natural Language to SQL): converting a user's natural language question into a precise SQL query, executing it, and presenting results in a human-readable way. This is not a new idea, but in the LLM era, quality and ease of use have improved dramatically.

Case background: A retail chain (500 stores nationwide), an 8-person data team handling approximately 150 data query requests daily from store management, marketing, and supply chain departments. Core databases contain:

Sales data: daily transaction records (last 3 years, 200 million rows)
Inventory data: real-time store/warehouse inventory
Member data: 8 million member profiles and purchase records
Store data: basic information and operational metrics for 500 stores

Goals:

Business users can self-serve 80% of daily data queries (currently they must engage the data team)
Analysts freed from repetitive queries to focus on high-value analysis
Weekly/monthly reports automatically generated, reducing report creation time by 60%

Level 1: Core Concepts (1–3 Years Experience)

How NL2SQL Works

NL2SQL is not magic. The workflow is straightforward:

User question (natural language)
        ↓
  Database Schema context injection
        ↓
  LLM generates SQL
        ↓
  SQL safety validation (injection prevention)
        ↓
  Execute SQL (read-only permission)
        ↓
  Result formatting (table/natural language)
        ↓
  Return to user

Why can an LLM generate correct SQL?

LLMs have seen vast amounts of SQL in training data. But to generate SQL for your company's database, they need to know your table structure (Schema). This is why "Schema injection" is the core of NL2SQL.

Analogy: Getting an LLM to write SQL is like hiring a smart SQL developer to query your database. They know SQL perfectly — but they need to read your database documentation (Schema) before they can write accurate queries.

Configuring NL2SQL in Dify

Step 1: Prepare the Schema document

Format the database table structure for LLM comprehension:

# Database Schema

## Database: sales_db

### Table: daily_sales (Daily Sales Records)
| Column | Type | Description | Example |
|--------|------|-------------|---------|
| id | BIGINT | Primary key | 10001 |
| store_id | INT | Store ID | 42 |
| sale_date | DATE | Sale date | 2024-03-15 |
| product_id | INT | Product ID | 5001 |
| product_name | VARCHAR | Product name | "Phone Case" |
| category | VARCHAR | Product category | "Phone Accessories" |
| quantity | INT | Units sold | 5 |
| unit_price | DECIMAL | Unit price (USD) | 9.99 |
| total_amount | DECIMAL | Sale amount (USD) | 49.95 |

### Table: stores (Store Information)
| Column | Type | Description | Example |
|--------|------|-------------|---------|
| id | INT | Store ID | 42 |
| store_name | VARCHAR | Store name | "NYC Times Square" |
| city | VARCHAR | City | "New York" |
| state | VARCHAR | State | "NY" |
| region | VARCHAR | Region (Northeast/South/Midwest/West) | "Northeast" |
| store_type | VARCHAR | Type (flagship/standard/small) | "flagship" |

### Key relationships
- daily_sales.store_id = stores.id

### Business rules
- "This month" = from 1st of current month to today
- "Last week" = the previous complete calendar week (Monday to Sunday)
- Revenue = SUM(total_amount)

Step 2: Dify Workflow configuration

Node 1: Schema context preparation
  → Dynamically fetch relevant table schemas

Node 2: SQL generation (LLM)
  Input: user question + schema context
  Output: SQL query

Node 3: SQL safety check (Code node)
  → Reject INSERT/UPDATE/DELETE/DROP operations
  → Check for SQL injection patterns

Node 4: SQL execution (HTTP node)
  → Call query API (never connect to DB directly from Dify)

Node 5: Result interpretation (LLM)
  Input: SQL query results
  Output: natural language interpretation

Node 6: Visualization suggestion (optional)
  → Recommend chart type based on data characteristics

Core SQL generation prompt:

You are a professional data analyst with expert knowledge of MySQL and business analytics.

## Database information:
{{schema_context}}

## Business rules:
- All monetary values are in USD
- "This month" = from the 1st of the current month to today
- "Last week" = the previous complete calendar week

## Task:
Convert the user's natural language question to a MySQL query.

## Rules:
1. Generate only SELECT statements — no INSERT/UPDATE/DELETE/DROP
2. For time ranges, use precise date functions:
   - This month: WHERE sale_date >= DATE_FORMAT(NOW(), '%Y-%m-01') AND sale_date <= NOW()
   - Last week: WHERE YEARWEEK(sale_date) = YEARWEEK(NOW()) - 1
3. Round monetary amounts to 2 decimal places: ROUND(value, 2)
4. Queries expected to return more than 1,000 rows must include a LIMIT clause
5. If the question is ambiguous, choose the most likely interpretation and note assumptions in comments

## User question:
{{user_question}}

Output only SQL (no explanations):
```sql
-- Intent: [describe your understanding of the query]
-- Assumptions: [list any assumptions made]
[SQL statement]


---

## Level 2: Mechanism Deep Dive (3–5 Years Experience)

### Complete NL2SQL Service Implementation

```python
import re
import json
import sqlparse
import mysql.connector
from typing import Optional

class NL2SQLService:
    
    def __init__(self, db_config: dict, dify_config: dict):
        self.db_config = db_config
        self.dify_config = dify_config
        self.schema_manager = SchemaManager(db_config)
    
    def process_query(self, user_question: str) -> dict:
        # Step 1: Smart schema selection
        relevant_tables = self._identify_relevant_tables(user_question)
        schema_context = self.schema_manager.get_schema_context(relevant_tables)
        
        # Step 2: Generate SQL via Dify
        sql = self._generate_sql(user_question, schema_context)
        if not sql:
            return {'error': 'SQL generation failed'}
        
        # Step 3: Safety check
        is_safe, safety_error = self._validate_sql_safety(sql)
        if not is_safe:
            return {'error': f'SQL safety check failed: {safety_error}'}
        
        # Step 4: Execute query
        result, execute_error = self._execute_sql(sql)
        if execute_error:
            corrected_sql = self._fix_sql(sql, execute_error, schema_context)
            if corrected_sql:
                result, execute_error = self._execute_sql(corrected_sql)
                sql = corrected_sql
        
        if execute_error:
            return {'sql': sql, 'error': f'Query execution failed: {execute_error}'}
        
        # Step 5: Interpret results
        interpretation = self._interpret_results(user_question, sql, result)
        
        # Step 6: Chart suggestion
        chart = self._suggest_chart(result)
        
        return {
            'sql': sql,
            'result': result,
            'interpretation': interpretation,
            'chart_suggestion': chart,
            'row_count': len(result.get('rows', [])),
        }
    
    def _validate_sql_safety(self, sql: str) -> tuple:
        sql_clean = re.sub(r'--.*$', '', sql, flags=re.MULTILINE)
        sql_clean = re.sub(r'/\*.*?\*/', '', sql_clean, flags=re.DOTALL)
        sql_clean = sql_clean.strip().upper()
        
        if not sql_clean.startswith('SELECT'):
            return False, f"Only SELECT statements are allowed; got: {sql_clean[:20]}"
        
        dangerous_keywords = [
            'INSERT', 'UPDATE', 'DELETE', 'DROP', 'CREATE', 'ALTER',
            'TRUNCATE', 'EXEC', 'EXECUTE', 'CALL', 'GRANT', 'REVOKE',
        ]
        for keyword in dangerous_keywords:
            if keyword in sql_clean:
                return False, f"Forbidden keyword detected: {keyword}"
        
        return True, None
    
    def _execute_sql(self, sql: str, max_rows: int = 1000) -> tuple:
        try:
            connection = mysql.connector.connect(
                host=self.db_config['host'],
                user=self.db_config['readonly_user'],
                password=self.db_config['readonly_password'],
                database=self.db_config['database'],
                connection_timeout=30
            )
            cursor = connection.cursor(dictionary=True)
            
            # Add LIMIT protection
            if 'LIMIT' not in sql.upper():
                sql = sql.rstrip(';') + f' LIMIT {max_rows}'
            
            cursor.execute(sql)
            rows = cursor.fetchall()
            rows = self._serialize_rows(rows)
            columns = [desc[0] for desc in cursor.description] if cursor.description else []
            
            return {'columns': columns, 'rows': rows, 'total_rows': len(rows)}, None
        
        except mysql.connector.Error as e:
            return None, str(e)
        finally:
            if connection and connection.is_connected():
                connection.close()
    
    def _suggest_chart(self, result: dict) -> dict:
        if not result or not result.get('rows'):
            return {'type': 'none', 'reason': 'No data'}
        
        columns = result['columns']
        rows = result['rows']
        row_count = len(rows)
        
        has_date_column = any(
            col.lower() in ['date', 'month', 'week', 'sale_date', 'created_at']
            for col in columns
        )
        has_numeric = sum(1 for col in columns 
                         if isinstance(rows[0].get(col), (int, float)))
        has_category = any(
            col.lower() in ['category', 'region', 'state', 'store_type', 'product_name']
            for col in columns
        )
        
        if has_date_column and has_numeric >= 1 and row_count > 5:
            return {'type': 'line', 'reason': 'Time series data — line chart shows trends best'}
        elif has_category and has_numeric >= 1 and row_count <= 20:
            return {'type': 'bar', 'reason': 'Category comparison — bar chart shows rankings clearly'}
        elif has_numeric == 1 and row_count <= 10 and has_category:
            return {'type': 'pie', 'reason': 'Share data — pie chart shows proportions intuitively'}
        else:
            return {'type': 'table', 'reason': 'Complex data — table shows full detail'}

Report Automation

import plotly.express as px
import base64

class ReportGenerator:
    
    def __init__(self, nl2sql_service: NL2SQLService):
        self.nl2sql = nl2sql_service
    
    def generate_weekly_report(self, week_start: str, week_end: str) -> dict:
        report_queries = [
            {'title': 'Overall Sales Summary', 
             'question': f'Total revenue, order count, and average order value from {week_start} to {week_end}'},
            {'title': 'Sales by Region',
             'question': f'Ranked regional sales from {week_start} to {week_end}'},
            {'title': 'Daily Revenue Trend',
             'question': f'Daily revenue trend from {week_start} to {week_end}'},
            {'title': 'Top 10 Best-Selling Products',
             'question': f'Top 10 products by units sold from {week_start} to {week_end}'},
            {'title': 'Category Revenue Share',
             'question': f'Revenue share by product category from {week_start} to {week_end}'},
        ]
        
        report_sections = []
        for query_config in report_queries:
            result = self.nl2sql.process_query(query_config['question'])
            if 'error' in result:
                report_sections.append({'title': query_config['title'], 'error': result['error']})
                continue
            
            chart = self._generate_chart(result['result'], result['chart_suggestion'])
            
            report_sections.append({
                'title': query_config['title'],
                'chart': chart,
                'data': result['result'],
                'analysis': result.get('interpretation', ''),
                'sql': result['sql']
            })
        
        executive_summary = self._generate_executive_summary(
            week_start, week_end, report_sections
        )
        
        return {
            'report_type': 'weekly',
            'period': f'{week_start} to {week_end}',
            'generated_at': datetime.now().isoformat(),
            'executive_summary': executive_summary,
            'sections': report_sections
        }
    
    def _generate_chart(self, data: dict, chart_suggestion: dict) -> Optional[str]:
        if not data or not data.get('rows'):
            return None
        
        rows = data['rows']
        columns = data['columns']
        chart_type = chart_suggestion.get('type', 'table')
        
        if chart_type == 'bar' and len(columns) >= 2:
            fig = px.bar(rows, x=columns[0], y=columns[1], title='')
        elif chart_type == 'line' and len(columns) >= 2:
            fig = px.line(rows, x=columns[0], y=columns[1], markers=True)
        elif chart_type == 'pie' and len(columns) >= 2:
            fig = px.pie(rows, names=columns[0], values=columns[1])
        else:
            return None
        
        img_bytes = fig.to_image(format='png', width=800, height=400, scale=2)
        return base64.b64encode(img_bytes).decode('utf-8')
    
    def _generate_executive_summary(self, week_start, week_end, sections) -> str:
        summary_data = {
            s['title']: s.get('analysis', '')
            for s in sections if 'error' not in s
        }
        
        prompt = f"""
You are a senior retail data analyst. Based on the following data, write a concise 
200-word executive summary for the period {week_start} to {week_end}.

Data summaries:
{json.dumps(summary_data, indent=2)}

Requirements:
1. Highlight the 3-5 most important business insights
2. Clearly indicate which metrics exceeded expectations and which need attention
3. Provide 2-3 actionable recommendations
4. Keep language concise and professional

Executive Summary:
"""
        return self._call_llm(prompt)
    
    def export_to_markdown(self, report: dict) -> str:
        md = f"# Weekly Report: {report['period']}\n\n"
        md += f"**Generated**: {report['generated_at']}\n\n"
        md += "## Executive Summary\n\n"
        md += report['executive_summary'] + "\n\n---\n\n"
        
        for section in report['sections']:
            md += f"## {section['title']}\n\n"
            if 'error' in section:
                md += f"Data unavailable: {section['error']}\n\n"
                continue
            md += section.get('analysis', '') + "\n\n"
        
        return md

Level 3: Source Code and Architecture (5+ Years)

Improving NL2SQL Accuracy

Pure LLM SQL generation achieves approximately 70–80% accuracy. These techniques push it to 90%+:

Technique 1: Few-Shot Examples

FEW_SHOT_EXAMPLES = [
    {
        "question": "What were last week's sales by region?",
        "sql": """
SELECT 
    s.region,
    ROUND(SUM(ds.total_amount), 2) AS revenue,
    COUNT(*) AS order_count
FROM daily_sales ds
JOIN stores s ON ds.store_id = s.id
WHERE YEARWEEK(ds.sale_date) = YEARWEEK(NOW()) - 1
GROUP BY s.region
ORDER BY revenue DESC;
"""
    },
    {
        "question": "Top 5 products by units sold this month",
        "sql": """
SELECT 
    product_name,
    SUM(quantity) AS units_sold,
    ROUND(SUM(total_amount), 2) AS revenue
FROM daily_sales
WHERE sale_date >= DATE_FORMAT(NOW(), '%Y-%m-01')
GROUP BY product_id, product_name
ORDER BY units_sold DESC
LIMIT 5;
"""
    },
]

Technique 2: Chain-of-Thought prompting

COT_PROMPT_SUFFIX = """
Before generating the SQL, analyze:
1. Which data dimensions does this question involve? (time/geography/product/store)
2. Which metrics need aggregation? (SUM/COUNT/AVG)
3. Which tables need to be joined?
4. How should the time range be precisely expressed?

After analysis, output the SQL:
"""

Technique 3: Self-consistency validation

async def generate_sql_with_self_check(question: str, schema: str) -> str:
    sql = await generate_sql(question, schema)
    
    check_prompt = f"""
Review whether the following SQL correctly answers the user's question:

User question: {question}
SQL: {sql}

Check:
1. Does the SQL logic match the question's intent?
2. Is the time range correctly expressed?
3. Is the aggregation correct?
4. Are there any syntax errors?

If there are issues, provide corrected SQL. If correct, output "CORRECT":
"""
    
    check_result = await call_llm(check_prompt)
    
    if check_result.strip().upper() != 'CORRECT':
        corrected = extract_sql_from_text(check_result)
        return corrected or sql
    
    return sql

Multi-turn Conversational NL2SQL

class ConversationalNL2SQL:
    def __init__(self, nl2sql_service: NL2SQLService):
        self.nl2sql = nl2sql_service
        self.conversation_history = []
    
    def process(self, user_question: str) -> dict:
        if self.conversation_history:
            contextual_question = self._rewrite_with_context(
                user_question, self.conversation_history
            )
        else:
            contextual_question = user_question
        
        result = self.nl2sql.process_query(contextual_question)
        
        self.conversation_history.append({
            'user': user_question,
            'sql': result.get('sql'),
            'summary': result.get('interpretation', '')[:200]
        })
        
        if len(self.conversation_history) > 5:
            self.conversation_history = self.conversation_history[-5:]
        
        return result
    
    def _rewrite_with_context(self, question: str, history: list) -> str:
        context = "\n".join([
            f"Q: {h['user']}\nSummary: {h['summary']}"
            for h in history[-3:]
        ])
        
        rewrite_prompt = f"""
Based on the following conversation history, rewrite the follow-up question 
as a complete standalone question:

History:
{context}

Follow-up question: {question}

Rewritten standalone question (output question text only):
"""
        return self._call_llm(rewrite_prompt)

Level 4: Production Traps and Decisions (Expert Perspective)

Trap 1: LLM Generates Dangerous Full Table Scans

Symptom: User asks "which products have never been sold?" and the LLM generates an unconstrained full table scan on a 200-million-row table — crashing the database.

Prevention:

Query protection wrapper:

def add_protection(sql: str, max_rows: int = 1000, timeout_ms: int = 30000) -> str:
    protected = f"/*+ MAX_EXECUTION_TIME({timeout_ms}) */ {sql}"
    if 'LIMIT' not in sql.upper():
        protected = protected.rstrip(';') + f' LIMIT {max_rows}'
    return protected

Always route NL2SQL queries to the read replica, never the primary:

connection = mysql.connector.connect(
    host=REPLICA_HOST,  # Read replica
    user='readonly_user',
    ...
)

Execution timeout with query kill:

import threading

def execute_with_timeout(cursor, sql: str, timeout: int = 30) -> list:
    result, error = [], []
    def target():
        try:
            cursor.execute(sql)
            result.extend(cursor.fetchall())
        except Exception as e:
            error.append(str(e))
    
    thread = threading.Thread(target=target)
    thread.start()
    thread.join(timeout=timeout)
    
    if thread.is_alive():
        cursor.execute("KILL QUERY " + str(cursor.connection.connection_id))
        raise TimeoutError(f"Query timed out after {timeout}s")
    
    if error:
        raise Exception(error[0])
    return result

Trap 2: Schema Exposure Risk

NL2SQL requires sending table structures to the LLM, potentially exposing sensitive business data schema.

Mitigations:

Sanitize schema descriptions — remove sensitive column comments
Use Azure OpenAI or a privately deployed LLM — data stays on-premises
Send only relevant table schemas (smart schema selection), never the full database schema

Trap 3: Hallucinated Interpretations

User asks "What were Beijing store sales?" The database returns ¥1,234,567. The LLM may interpret: "Beijing store monthly sales reached ¥1.23M, up 15% year-over-year" — but "up 15% YoY" was fabricated!

Fix: Strictly constrain interpretation to only the actual returned data:

def interpret_results_safely(question: str, sql: str, result: dict) -> str:
    data_json = json.dumps(result['rows'][:20], indent=2)
    
    prompt = f"""
User question: {question}
SQL executed: {sql}
Query results: {data_json}

Interpret the query results in natural language.

Strict rules:
1. Only describe what is present in the query results
2. Never infer or calculate metrics not present in the data (no YoY, MoM)
3. If there is no comparison data, do not mention "growth" or "decline"
4. If results are empty, say "No data matching the conditions was found"

Interpretation:
"""
    return call_llm(prompt)

Real Business Outcomes

After 3 months of the retail chain's data analysis assistant:

Metric	Before	After	Improvement
Daily queries handled by data team	150	35	-77%
Business user wait time for data	avg. 4 hours	< 2 minutes	-97%
Weekly report preparation time	8 hours	45 minutes	-91%
Analyst time on high-value analysis	30%	70%	+133%
NL2SQL accuracy (correct executable SQL)	—	91%	—
User satisfaction with self-service	—	4.3/5	—

Chapter Summary

Core technical takeaways for data analysis assistants:

Schema context is the foundation of NL2SQL: LLMs know nothing about your business data structure. A carefully designed Schema document with examples, business rules, and relationship explanations is the single biggest factor in accuracy.
Safety is non-negotiable: Read-only accounts, SQL allowlist (SELECT only), execution timeouts, and row limits — all four safeguards are essential.
Few-Shot + CoT pushes accuracy from 75% to 90%+: Few-shot examples and chain-of-thought prompting are practical, high-ROI improvements worth implementing before anything else.
Report automation = predefined queries + LLM insights: Weekly and monthly reports shouldn't be "let AI do whatever" — they're structured around predefined key metrics queries, with LLM generating the narrative and executive summary.
Result interpretation must be strictly constrained: LLMs will confidently fabricate YoY/MoM comparisons if not restricted. In business decision-making contexts, this kind of hallucination is extremely dangerous.
Multi-turn dialogue requires question rewriting: Follow-up questions must be rewritten into complete standalone questions before entering the NL2SQL pipeline — this is the correct way to ensure context consistency.

NL2SQL accuracy improvement roadmap:

Stage	Techniques	Expected Accuracy
Basic	LLM + Schema	70–75%
Improved	+ Few-Shot + CoT	85–90%
Production	+ Self-validation + Auto-correction	90–93%
Expert	+ Vector schema retrieval + Fine-tuning	95%+

Rate this chapter

4.6 / 5 (5 ratings)