Text Processing — Let AI Split, Merge, and Clean Your Data
Chapter 5: Text Functions
Real-world data is rarely clean. Customer records exported from systems carry trailing spaces, ID numbers are buried in descriptions, phone numbers need masking, and dozens of columns need joining into a single sentence. Text functions are the specialized tools for taming this messy data — and combined with AI, even complex extraction logic translates to formulas quickly.
Where Text Functions Are Used
Text functions are essential in these high-frequency workplace scenarios:
- ID number extraction — pull birthdate, gender, or province from an 18-digit national ID
- Phone number masking — convert 13812345678 to 138****5678 for privacy compliance
- Name / address splitting — break "John Smith | Sales | New York" into separate columns
- Data cleaning — strip trailing spaces, remove special characters, normalize case
- Field concatenation — combine name + department + title into a display string
- Format conversion — turn numbers into currency text, dates into custom string formats
Text functions fall into four categories: Extract, Clean, Concat, Convert.
ℹ️ Note
Why AI excels here: You can give AI actual sample data ("input is '123 Main St, Suite 400, New York, NY 10001' — I want to extract the state code") and it immediately understands the intent and writes the matching formula.
Extract Functions: LEFT / RIGHT / MID / FIND / SEARCH
| Function | Purpose | Syntax | Notes |
|---|---|---|---|
| LEFT | Extract N chars from left | =LEFT(text, n) | Fixed from start |
| RIGHT | Extract N chars from right | =RIGHT(text, n) | Fixed from end |
| MID | Extract from middle | =MID(text, start, n) | Specify start position + length |
| FIND | Find character position (case-sensitive) | =FIND(find, text, [start]) | Returns position number |
| SEARCH | Find position (case-insensitive) | =SEARCH(find, text, [start]) | Supports wildcards * and ? |
6 Extraction Cases
Extract Case 1: Birthdate from National ID (MID) Column A has 18-digit national IDs. Extract birth year (B), month (C), and full date string (D).
Extract birthdate from ID number.
Extract Case 2: Username from Email (LEFT + FIND) A2 = "[email protected]" → B2 should extract "jsmith" (everything before @).
Extract Case 3: Extract Content Inside Brackets (MID + FIND) A2 = "Annual Report (2024 Edition)" → B2 should extract "2024 Edition".
Extract Case 4: Split Product Code by Delimiter A2 = "ELE-001-2024" (category-sequence-year). Extract each segment.
Extract Case 5: Let AI Write the Complex Formula When data format is irregular, give AI real samples instead of describing the format.
Clean Functions: TRIM / CLEAN / SUBSTITUTE / REPLACE
| Function | Purpose | Syntax | Typical Use |
|---|---|---|---|
| TRIM | Remove extra spaces (leading, trailing, double-spaces) | =TRIM(text) | System export cleanup |
| CLEAN | Remove non-printable characters (newlines, tabs) | =CLEAN(text) | PDF or web paste cleanup |
| SUBSTITUTE | Replace text by content match | =SUBSTITUTE(text, old, new, [nth]) | Bulk keyword replacement |
| REPLACE | Replace by position | =REPLACE(text, start, n_chars, new) | Phone number masking |
Clean Case 1: Remove Extra Spaces (TRIM)
Clean Case 2: Bulk Keyword Replacement (SUBSTITUTE)
Clean Case 3: Phone Number Masking (REPLACE)
Clean Case 4: Remove Hidden Characters (CLEAN)
Concat Functions: CONCATENATE / CONCAT / TEXTJOIN / &
| Method | Syntax | Key Feature | Best For |
|---|---|---|---|
| & operator | ="Hello "&A2&"!" | Simplest, one by one | A few fields |
| CONCAT | =CONCAT(A2:C2) | Accepts ranges, no separator | Adjacent columns |
| TEXTJOIN | =TEXTJOIN(",",TRUE,range) | Custom separator, ignores blanks | Multi-row or multi-column joins |
| CONCATENATE | =CONCATENATE(A2,B2) | Legacy function | Older Excel compatibility |
Concat Case 1: Name + Department Display String
Concat Case 2: Date Format Reassembly
Concat Case 3: Join Multiple Rows into One Cell (TEXTJOIN)
Concat Case 4: Generate SQL Statements
Convert Functions: UPPER / LOWER / PROPER / TEXT / VALUE
| Function | Purpose | Example |
|---|---|---|
| UPPER | Convert to uppercase | =UPPER("hello") → "HELLO" |
| LOWER | Convert to lowercase | =LOWER("HELLO") → "hello" |
| PROPER | Capitalize each word | =PROPER("hello world") → "Hello World" |
| TEXT | Format number/date as string | =TEXT(1234.5,"#,##0.00") → "1,234.50" |
| VALUE | Convert text number to numeric value | =VALUE("1,234") → 1234 |
Convert Case 1: Normalize English Case (PROPER)
Convert Case 2: Number to Currency Text (TEXT)
Convert Case 3: Date to Custom String Format
Convert Case 4: Text Number to Numeric Value (VALUE)
LEN & LENB: Character Count
LEN(text) → character count (each character = 1)
LENB(text) → byte count (ASCII = 1 byte, CJK = 2 bytes)
Detect if text contains multi-byte characters (e.g., CJK):
=LENB(A2) > LEN(A2) ← TRUE = contains double-byte characters
Check if phone number is exactly 10 digits:
=IF(LEN(A2)=10, "Valid", "Invalid — got "&LEN(A2)&" digits")
Enforce character limit (flag text over 280 chars):
=IF(LEN(A2)>280, "Over by "&(LEN(A2)-280), "OK")
Comprehensive Case: Cleaning a Messy Customer Table
In practice, text functions are rarely used alone — they stack together. Here is a full workflow from chaotic data to structured output.
Text Function Cheat Sheet
| Category | Function | Core Use | Quick Example |
|---|---|---|---|
| Extract | LEFT | Take N chars from left | =LEFT(A2, 3) |
| RIGHT | Take N chars from right | =RIGHT(A2, 4) | |
| MID | Take N chars from position | =MID(A2, 7, 4) | |
| FIND | Find char position (case-sensitive) | =FIND("@", A2) | |
| SEARCH | Find position (wildcard support) | =SEARCH("abc", A2) | |
| Clean | TRIM | Remove extra spaces | =TRIM(A2) |
| CLEAN | Remove non-printable chars | =CLEAN(A2) | |
| SUBSTITUTE | Replace by content | =SUBSTITUTE(A2,"old","new") | |
| REPLACE | Replace by position | =REPLACE(A2,4,4,"****") | |
| Concat | & operator | Join strings | =A2&" - "&B2 |
| CONCAT | Join a range (no separator) | =CONCAT(A2:C2) | |
| TEXTJOIN | Join with separator, skip blanks | =TEXTJOIN(", ",TRUE,A2:A10) | |
| Convert | UPPER/LOWER | Change English case | =UPPER(A2) |
| PROPER | Capitalize each word | =PROPER(A2) | |
| TEXT | Number/date to formatted string | =TEXT(A2,"$#,##0.00") | |
| VALUE | Text number to numeric | =VALUE(A2) | |
| LEN / LENB | Count characters / bytes | =LEN(A2) |
✅ Tip
Chapter Summary:
For fixed-position extraction use LEFT/RIGHT/MID directly. For delimiter-based extraction, combine FIND/SEARCH with MID/LEFT.
Standard cleaning combo: =TRIM(CLEAN(A2)) — handles most space and hidden character issues.
For joining: use & for a few fields, CONCAT for adjacent ranges, TEXTJOIN when you need a separator or want to ignore blanks.
TEXT converts numbers to display strings (loses numeric value). VALUE converts text strings back to numbers (restores numeric value).
For complex formats, paste 3–5 rows of real data + expected output into AI — faster than describing the format abstractly.
Previous ← Chapter 4: Conditional Functions Next Chapter 6: Date & Time Functions →