
CSV Text Manipulation Guide

2026-04-16 · 5 min read


CSV Format Basics

CSV (Comma-Separated Values) is the most universal tabular data text format. Each line represents one record; fields are separated by commas; the first line is typically a header row. When a field value itself contains a comma or line break, enclose it in double quotes (commas inside quotes are not treated as separators). The CSV format is simple but has many edge-case pitfalls, and different tools handle them differently.
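
As a small illustration of the quoting rule, the sketch below uses Python's standard `csv` module. The sample row is invented for the example. It contrasts a naive split on commas with a CSV-aware parse, and also shows the common convention of writing a literal double quote inside a quoted field as two double quotes.

```python
import csv
import io

# Hypothetical sample: two fields contain commas, the last one contains escaped quotes.
raw = 'name,address,note\n"Lee, Ann","12 Main St, Apt 4","said ""hi"""\n'

# Naive splitting treats every comma as a separator and breaks quoted fields apart.
naive = raw.splitlines()[1].split(',')

# csv.reader honors the double quotes, so embedded commas stay inside their field.
rows = list(csv.reader(io.StringIO(raw)))

print(len(naive))  # more pieces than the three real fields
print(rows[1])     # three fields, with commas and quotes preserved
```

This is also why tools that split on every comma can silently misalign columns once a quoted field appears in the data.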

Common CSV Issues

Processing CSV with Command Line

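# View the first 10 lines of the CSV file
head -n 10 data.csv

# Sort by the second column (numeric sort, comma-delimited), keeping the header row first
# Note: sort and cut split on every comma, so they are only safe when no field contains a quoted comma
(head -n 1 data.csv && tail -n +2 data.csv | sort -t',' -k2,2n)

# Extract the first and third columns
cut -d',' -f1,3 data.csv

# Count data rows (excluding the header row)
tail -n +2 data.csv | wc -l

# Using csvkit (requires installation)
# Sort by column name
csvsort -c "column_name" data.csv

# View specific columns
csvcut -c "name,email" data.csv | csvlook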

Python pandas for CSV Processing

import pandas as pd

# Read the CSV (utf-8-sig handles a UTF-8 encoding with or without a BOM)
df = pd.read_csv('data.csv', encoding='utf-8-sig')

# Basic cleaning
df.columns = df.columns.str.strip()  # strip whitespace from column names
df = df.dropna(how='all')  # drop rows that are entirely empty
df = df.drop_duplicates()  # drop duplicate rows

# Clean text columns
df['name'] = df['name'].str.strip().str.title()
df['email'] = df['email'].str.strip().str.lower()

# Sort by a column
df_sorted = df.sort_values('date', ascending=False)

# Export (UTF-8 with BOM so Excel displays the text correctly)
df.to_csv('output.csv', index=False, encoding='utf-8-sig')

Converting Between CSV and Other Formats

CSV frequently needs to be converted to and from other formats: CSV to JSON (for API data exchange); CSV to Excel (when formatting and formulas are needed, since CSV itself stores neither); CSV to SQL INSERT statements (for database import); CSV to Markdown tables (for documentation); and TSV (tab-separated) to CSV and back. Online format converters or Python pandas can handle these conversions efficiently.
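
A few of these conversions can be sketched with nothing but Python's standard library. The example below is a minimal illustration, not a full converter: the sample data and the `to_markdown` helper are hypothetical, and it assumes a small, non-empty file with a header row.

```python
import csv
import io
import json

# Hypothetical sample data; in practice this would be read from a file.
csv_text = "name,city\nAnn,Oslo\nBo,Lima\n"

# CSV to JSON: DictReader uses the header row as keys, one object per record.
records = list(csv.DictReader(io.StringIO(csv_text)))
json_text = json.dumps(records, ensure_ascii=False)

def to_markdown(rows):
    """Render a non-empty list of dicts as a Markdown table, escaping pipes in cells."""
    headers = list(rows[0])
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    for row in rows:
        cells = [str(row[h]).replace("|", "\\|") for h in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)

# CSV to Markdown table.
markdown_text = to_markdown(records)

# CSV to TSV: the same records written with a tab delimiter.
tsv_buffer = io.StringIO()
writer = csv.DictWriter(tsv_buffer, fieldnames=list(records[0]),
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(records)
tsv_text = tsv_buffer.getvalue()

print(json_text)
print(markdown_text)
```

Note that every value comes out of `csv` as a string, so a JSON consumer expecting numbers would need an explicit type conversion step.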

CSV Considerations for Excel Users

Excel users working with CSV should note the following. Double-clicking a CSV file to open it lets Excel interpret purely numeric fields as numbers, so the postal code "01234" becomes the number 1234 and loses its leading zero. Importing through "Data Import" instead of double-clicking lets you specify the data type of each column in the import wizard. When saving, select "CSV (UTF-8)" rather than the regular "CSV" option to avoid garbled non-ASCII text such as Chinese characters. Excel's CSV export may also handle fields containing line breaks incorrectly.

Strategies for Large CSV Files

For large CSV files that exceed memory capacity (GB scale), loading the whole file at once fails. Processing strategies: use pandas chunked reading (the chunksize parameter) to process the file in blocks; use the Polars library (more memory-efficient than pandas, with support for lazy evaluation); use DuckDB to query CSV files as virtual database tables; use command-line tools (awk, sort) for stream processing; or consider importing the CSV data into a real database for querying.
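
The chunked-reading strategy can be sketched as follows. This is a minimal example under stated assumptions: the in-memory `big_csv` buffer stands in for a large file on disk, and the `region` and `amount` columns are hypothetical. Each chunk is aggregated on its own, and only the small running totals stay in memory.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file; a real file path would be passed to read_csv instead.
big_csv = io.StringIO("region,amount\n" + "\n".join(
    f"{'north' if i % 2 else 'south'},{i}" for i in range(1, 11)))

totals = {}
# With chunksize set, read_csv returns an iterator of DataFrames rather than one large frame.
for chunk in pd.read_csv(big_csv, chunksize=4):
    partial = chunk.groupby("region")["amount"].sum()
    # Merge this chunk's partial sums into the running totals.
    for region, amount in partial.items():
        totals[region] = totals.get(region, 0) + int(amount)

print(totals)
```

The chunk size here is tiny so the loop runs several times on the sample data; for a real file it would typically be set to tens or hundreds of thousands of rows, depending on available memory.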
