Chapter 10

Power Query — AI-Driven Data Cleaning Pipeline

Chapter 10: Power Query Data Cleaning

If you've ever spent hours manually merging monthly Excel files, hand-deleting junk rows, or writing nested SUBSTITUTE formulas to strip unwanted characters — Power Query is about to change your life. It's Excel's built-in data transformation engine: you configure the cleaning steps once, and every future update takes one click to refresh.

What Is Power Query

Power Query (built into Excel 2016+, available as an add-in for 2010/2013) records every transformation step you apply to data. When the source data updates, clicking Refresh All re-runs all those steps automatically.

The core value proposition: convert one-time manual labor into a repeatable automated pipeline.

❌ Traditional Approach

Open each of 12 monthly files individually

Copy-paste into master sheet, fix formatting

Manually delete subtotal rows, merged cells, blank rows

Fix date/number formats with formulas

Repeat every month...

Time cost: 3–5 hours/month

✅ Power Query Approach

First time: set up data source and cleaning steps (~45 min)

Every subsequent month: drop new file in folder, click Refresh All

Wait 10–30 seconds for everything to update

Go get coffee

**After setup:

Access: Data tab → Get Data → choose source type. Or click "From Table/Range" to load current sheet data into the Power Query Editor.

Getting Started: Interface & Loading Data

The Power Query Editor has four key areas:

✅ Tip

Key insight: Every operation becomes a reversible step. You can delete, reorder, or insert steps at any point — nothing is permanent until you choose to load it to the worksheet.

6 Core Data Cleaning Operations

Case 1: Remove Duplicate Rows

❌ Old way: Sort then manually scan, or use COUNTIF + filter + delete ✅ Power Query: Home → Remove Rows → Remove Duplicates

Select specific columns first to deduplicate by those columns only (other columns may differ).
    No column selection = deduplicate on all columns (exact row match).

Time saved: 10,000 rows manually ≈ 30 min. Power Query ≈ 5 seconds. Future refreshes handle new duplicates automatically.

Case 2: Split Column (Name + Phone Combined → Two Columns)

❌ Old way: LEFT/FIND/MID formula combination, tedious to write and test ✅ Power Query: Transform → Split Column → By Delimiter

Enter the delimiter (e.g., "-"), choose "Each occurrence" or "Left/right-most".
    Other split options: by number of characters, by lowercase-to-uppercase transition, by digit-to-letter boundary.

Time saved: Writing and debugging FIND+MID formulas ≈ 15–30 min. Power Query split ≈ 30 seconds.

Case 3: Fill Down Empty Cells (Unmerge & Fill)

Scene: ERP export with merged department cells — only the first row per department has a value

Right-click the column → Fill → Down
    Each null is replaced with the nearest non-null value above it. Equivalent to "unmerge all and fill." Fill Up also available.

Time saved: Manual fill on 100 merged-cell rows ≈ 10 min. Power Query ≈ 3 seconds, reusable on every refresh.

Case 4: Replace & Clean (Remove Units Like "$", ",", "元")

Scene: Amount column has "$12,345.00" — needs to become the number 12345 1Select column → Transform → Replace Values → find "$", replace with nothing 2Replace Values again → find ",", replace with nothing 3Click type icon on column header → change to Decimal Number

Each replacement is a separate reversible step.
    Also useful: Transform → Format → Trim (remove leading/trailing spaces), Clean (remove non-printable characters).

Case 5: Data Type Conversion (Text → Date → Number)

Click the type icon (ABC / 123 / calendar) in any column header to change its type

Power Query enforces strict typing — set types early in your step sequence.
    Common problems:
    - "20260425" as text → parse with Date.FromText() or convert via integer first
    - "1,234.56" with thousand separator → replace comma first, then convert to decimal
    - Excel date serial number (e.g., 46000) → select Date type directly

Case 6: Conditional Column (IF Logic Without Writing Formulas)

❌ Old way: =IF(E2>10000,"Large",IF(E2>5000,"Medium","Small")) nested formula ✅ Power Query: Add Column → Conditional Column

Set the new column name, add IF/ELSE IF conditions with dropdowns:
    Order Amount > 10000 → "Large Customer"
    Order Amount > 5000 → "Medium Customer"
    Else → "Small Customer"
    Visual configuration — logic is immediately readable, no formula debugging needed.

Time saved: Writing and testing nested IF ≈ 15 min. Power Query conditional column ≈ 2 minutes.

AI Case: Describe Messy Data, Get a Cleaning Plan

I have a CSV exported from our ERP with these data quality issues:
1. Amount column shows "¥12,345.00" — need plain number 12345
2. Supplier column has blanks that should be filled from the row above (merged cell export problem)
3. Date column shows "April 25, 2026" (long English format) — need standard Excel date
4. Notes column has newline characters (\n) from the system — replace with spaces
5. Duplicate order rows exist (same order number) — keep only the row with the highest amount

Please give me the Power Query steps to fix all 5 issues, in order. Specify which menu to use for each, and flag any steps that require M language code.

Combining Tables: Power Query's Killer Feature

Append Queries — Stack Tables Vertically

Scenario: 12 monthly sales files with identical structure → one annual table.

1Load each query (or use Folder method below) 2Home → Append Queries → Three or more tables → select all monthly queries

Result: all rows stacked vertically. Column matching is by name — mismatched column names produce null values, so standardize headers first.

Merge Queries — Join Tables Horizontally (VLOOKUP-Style)

Scenario: Orders table has Product ID; Products table has name, category, cost price. Bring product details into orders.

1In the Orders query: Home → Merge Queries 2Select Products query, match columns (Orders.ProductID = Products.ProductID) 3Join kind: Left Outer (keep all order rows, fill null where no product match) 4Expand the merged column — select which product fields to bring in

Join types: Left Outer (most common, like VLOOKUP) / Inner (only matching rows) / Full Outer (all rows from both).
  Advantages over VLOOKUP: handles large data, supports multi-column keys, unaffected by column position changes.

Folder-Based Batch Combine — The Game Changer

1Data → Get Data → From File → From Folder → select the folder containing all monthly files 2Power Query shows all files — click "Combine & Transform Data" 3Choose sample file (Power Query learns the structure), select the worksheet to load 4Power Query applies identical transformations to every file and combines them. A "Source.Name" column tracks which file each row came from. 5Continue cleaning in the editor, then Close & Load to worksheet

✅ Tip

Next month: Drop the new file into that folder. Back in Excel, click Refresh All. Done in under 30 seconds. This is the core value of Power Query.

5 Multi-Table Cases (Summary)

Case Approach Time Savings
Merge 12 monthly sales files Folder combine → extract month from filename 2 hr → 30 sec/month after setup
Order table + product info Merge Query (Left Outer on ProductID) Replaces VLOOKUP, handles 100K+ rows faster
Multi-store daily reports Folder combine, use Source.Name for store ID 10 stores × 1 hr/day → 1 min/day
Weekly auto-refresh Set refresh interval in query properties Eliminates manual re-pull entirely
Design complex merge logic AI consultation (see prompt below) Saves planning time + avoids common pitfalls
I need to combine 4 data sources in Excel with Power Query to build an annual sales analysis dataset:

Source 1: Monthly sales detail (12 files, same format) — Order ID, Date, Product ID, Qty, Unit Price, Customer ID, Sales Rep ID
Source 2: Product table (1 file, ~500 rows) — Product ID, Name, Category, Cost, Supplier
Source 3: Customer table (1 file, ~1000 rows) — Customer ID, Name, City, Tier
Source 4: Employee table (1 file, ~50 rows) — Rep ID, Name, Region, Start Date

Goal: one wide table for full-year analysis by product / customer / region / sales rep.

Please design: 1) the recommended merge order and method; 2) key steps; 3) potential problems to watch for and how to prevent them.

M Language Basics

Every GUI operation generates M code behind the scenes. View it: View → Advanced Editor. You don't need to write M manually — but knowing it exists lets you leverage AI to handle cases the UI can't cover.

3 AI-Generated M Language Cases

In Power Query, I combined files from a folder. The Source.Name column has filenames like "2026-03-sales-data.xlsx". I need a new column "YearMonth" containing just "2026-03". Give me the M formula for a custom column.
I want my Power Query filter to automatically keep only the last 30 days of data on every refresh, rather than a fixed date. The UI date filter can't do dynamic dates. Give me the M code for a filter step.
My date column has values like "Apr 25, 2026" (English month abbreviation). Power Query won't auto-convert this to a date type. Give me the M code to convert it properly.

Comprehensive Case: Annual Sales Auto-Aggregation System

Build a fully automated reporting pipeline using everything in this chapter.

Query Source Output
Sales Detail Folder (12 monthly files) Clean combined annual data, load to sheet
Products Products Excel file Connection only (no sheet output)
Customers Customers Excel file Connection only
Sales Wide Sales Detail + merge Products + merge Customers Full wide table for pivot/analysis, load to sheet

✅ Tip

First-time setup: 2–4 hours (including testing and adjustment)

Monthly maintenance: ~2 minutes (drop file in folder + click refresh + wait)

Annual time savings: conservatively 20–40 hours

Power Query Cheat Sheet

Task Location Action
Load Excel file Data → Get Data From File → From Workbook
Load folder (batch) Data → Get Data From File → From Folder
Remove duplicate rows Home Remove Rows → Remove Duplicates
Remove blank rows Home Remove Rows → Remove Blank Rows
Split column Transform Split Column → By Delimiter or By Characters
Fill down blanks Right-click column Fill → Down
Replace values Transform / right-click Replace Values
Trim whitespace Transform → Format Trim
Change column type Click type icon in header Select target type
Add conditional column Add Column Conditional Column
Add custom column (M) Add Column Custom Column, enter M formula
Stack tables vertically Home Append Queries
Join tables horizontally Home Merge Queries, choose join type
Refresh all data Data tab Refresh All
View M code View Advanced Editor
Set auto-refresh Right-click query → Properties Set refresh interval

Previous ← Chapter 9: Conditional Formatting Next Chapter 11: Multi-Sheet Operations →

Rate this chapter
4.5  / 5  (31 ratings)

💬 Comments