โ† Back to Blog

How to Convert Markdown to Plain Text

2026-04-18 · 5 min read

Why Convert Markdown to Plain Text

Markdown is an excellent lightweight markup language, but some situations call for clean plain text instead: pasting document content into systems that do not support Markdown (such as legacy CRM or ERP systems); preparing text for text-to-speech (TTS); extracting an article body for SEO analysis or word counts; serving as an intermediate step when exporting to PDF; and providing plain-text versions for non-technical users.

Markdown Elements to Remove

Python Code Implementation

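import re

def markdown_to_plain(text):
    """Strip common Markdown markers with regular expressions (best effort)."""
    # Remove code fence lines (``` or ```lang); the code inside is kept
    # and still passes through the rules below
    text = re.sub(r'^[ \t]*```\w*[ \t]*\n?', '', text, flags=re.MULTILINE)

    # Remove horizontal rules (---, ***, ___). This runs before the emphasis
    # rules, which would otherwise misread *** as italic markers
    text = re.sub(r'^[-*_]{3,}[ \t]*$', '', text, flags=re.MULTILINE)

    # Remove blockquote markers (>)
    text = re.sub(r'^>\s?', '', text, flags=re.MULTILINE)

    # Remove heading markers (#)
    text = re.sub(r'^#{1,6}\s+', '', text, flags=re.MULTILINE)

    # Remove list markers (keep the content). This runs before the emphasis
    # rules so a leading "* " is not mistaken for an italic marker
    text = re.sub(r'^[ \t]*[*+-]\s+', '', text, flags=re.MULTILINE)
    text = re.sub(r'^[ \t]*\d+\.\s+', '', text, flags=re.MULTILINE)

    # Images: ![alt](URL) -> alt. This must run before the link rule,
    # otherwise the link rule leaves a stray "!" behind
    text = re.sub(r'!\[(.*?)\]\(.+?\)', r'\1', text)

    # Links: [text](URL) -> text
    text = re.sub(r'\[(.+?)\]\(.+?\)', r'\1', text)

    # Inline code: `code` -> code
    text = re.sub(r'`(.+?)`', r'\1', text)

    # Remove bold and italic markers (keep the text). The underscore rules
    # skip underscores inside words so snake_case names stay intact
    text = re.sub(r'\*\*(.+?)\*\*', r'\1', text)              # **bold**
    text = re.sub(r'(?<!\w)__(.+?)__(?!\w)', r'\1', text)     # __bold__
    text = re.sub(r'\*(.+?)\*', r'\1', text)                  # *italic*
    text = re.sub(r'(?<!\w)_(.+?)_(?!\w)', r'\1', text)       # _italic_

    return text.strip()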
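
The order of the substitutions matters in a regex pipeline like this one. As a minimal sketch, using a made-up sample string and two hypothetical pattern names (IMAGE and LINK), here is what happens when the link pattern runs before the image pattern:

```python
import re

IMAGE = re.compile(r'!\[(.*?)\]\(.+?\)')   # ![alt](URL)
LINK = re.compile(r'\[(.+?)\]\(.+?\)')     # [text](URL)

sample = "See ![diagram](img/flow.png) and [the docs](https://example.com)."

# Links first: the link pattern also matches the "[diagram](...)" part of
# the image, so a stray "!" is left behind.
links_first = IMAGE.sub(r'\1', LINK.sub(r'\1', sample))

# Images first: the whole image is consumed before the link pattern runs.
images_first = LINK.sub(r'\1', IMAGE.sub(r'\1', sample))

print(links_first)    # See !diagram and the docs.
print(images_first)   # See diagram and the docs.
```

The same reasoning applies to horizontal rules and list markers, which are safer to strip before the emphasis patterns run.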

Using Existing Libraries for Conversion

Writing regular expressions that handle every Markdown edge case is complex, so an existing library is usually the better choice. In Python, the markdown library can convert the source to HTML, after which BeautifulSoup extracts the plain text. Note that markdownify converts in the opposite direction (HTML to Markdown), so it suits the reverse operation rather than this one. In JavaScript, the remark toolchain fills the same role. Parser-based tools like these handle nested structures, escaped characters, and Markdown dialects (CommonMark, GFM, etc.) far more reliably than hand-written regular expressions.
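
The HTML route can be sketched roughly as follows. To keep the example free of extra dependencies, the text extraction step uses the standard library's html.parser in place of BeautifulSoup, and only the final helper needs the third-party markdown package. The function and class names are made up for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and adds a newline after block-level elements."""

    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6",
                  "li", "pre", "blockquote", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")


def html_to_plain(html):
    """Return the visible text of an HTML fragment, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Strip surrounding whitespace from each line and drop blank lines
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)


def markdown_to_plain_via_html(md_text):
    """Markdown -> HTML -> text. Assumes `pip install markdown` has been run."""
    import markdown  # third-party package (Python-Markdown)
    return html_to_plain(markdown.markdown(md_text))


print(html_to_plain("<h1>Title</h1>\n<p>Some <strong>bold</strong> text.</p>"))
```

Because the Markdown is parsed rather than pattern-matched, nested emphasis and escaped characters are resolved by the parser before any text is extracted.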

Pandoc Command-Line Conversion

# Use Pandoc to convert Markdown to plain text
pandoc input.md -t plain -o output.txt

# Convert to HTML (intermediate step)
pandoc input.md -t html -o output.html

# Batch-convert all .md files in the current directory
for f in *.md; do pandoc "$f" -t plain -o "${f%.md}.txt"; done
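
If the conversion is part of a larger Python script, the same batch loop can be driven through subprocess. This is a sketch under the assumption that pandoc is installed and on PATH. The helper names are hypothetical, and the flags are the same ones used in the commands above.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src):
    """Build the argument list for: pandoc SRC -t plain -o SRC-with-.txt"""
    src = Path(src)
    return ["pandoc", str(src), "-t", "plain", "-o", str(src.with_suffix(".txt"))]


def convert_directory(directory="."):
    """Convert every .md file in `directory`; return the output paths."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    outputs = []
    for src in sorted(Path(directory).glob("*.md")):
        cmd = pandoc_command(src)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        outputs.append(Path(cmd[-1]))
    return outputs
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.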

Preserving Structure vs. Complete Removal

Choose the conversion strategy to match the target use: remove all formatting completely (for TTS and simple text analysis); preserve basic structure, with headings separated by blank lines and lists indented (for plain text that needs to stay readable); or convert to reStructuredText or another lightweight markup language (when the structure must survive but the format has to change).
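
The structure-preserving option could look something like the sketch below. It only handles headings and list items, and the two-space indent and the function name are arbitrary choices made for illustration. Inline markers such as bold or links would still need a separate pass.

```python
import re


def markdown_to_readable_text(text):
    """Drop heading and list markers but keep them visually distinct."""
    out = []
    for line in text.splitlines():
        heading = re.match(r'^#{1,6}\s+(.*)$', line)
        item = re.match(r'^\s*(?:[*+-]|\d+\.)\s+(.*)$', line)
        if heading:
            # Surround the heading with blank lines so it stands apart
            if out and out[-1] != "":
                out.append("")
            out.append(heading.group(1).strip())
            out.append("")
        elif item:
            # Replace the list marker with a two-space indent
            out.append("  " + item.group(1))
        else:
            out.append(line)
    # Collapse runs of blank lines that the heading padding may have created
    result = re.sub(r'\n{3,}', '\n\n', "\n".join(out))
    return result.strip("\n")


sample = "# Title\nIntro text.\n\n## Steps\n- first\n- second\n"
print(markdown_to_readable_text(sample))
```

For the sample input, the headings end up on their own lines with blank lines around them, and the two list items are indented under "Steps".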

Reverse Operation: Plain Text to Markdown

Sometimes existing plain text needs to be converted to Markdown, for example when migrating old documents to a blog system. Fully automatic conversion is hard to get right, but the process can be semi-automated: a script can infer headings from the format of a paragraph's first line, convert lines that start with a hyphen into list items, and treat all-caps words as emphasis. Manual review is still needed at the end; automation only reduces the workload.
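
A first pass along these lines might look like the following sketch. The heading rule (a short standalone line with no closing punctuation), the 60-character limit, and the three-letter threshold for all-caps words are assumptions chosen for illustration. Acronyms such as PDF will be bolded too, which is one reason the output still needs a manual pass.

```python
import re


def plain_to_markdown(text):
    """Heuristic first pass; the output still needs manual review."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        prev_blank = i == 0 or not lines[i - 1].strip()
        next_blank = i + 1 >= len(lines) or not lines[i + 1].strip()

        if re.match(r'^-\s+\S', stripped):
            # Leading hyphen -> Markdown list item
            out.append("- " + stripped[1:].strip())
        elif (stripped and prev_blank and next_blank
              and len(stripped) <= 60 and stripped[-1] not in ".!?:;,"):
            # Short standalone line without closing punctuation -> heading
            out.append("## " + stripped)
        else:
            # ALL-CAPS words of three or more letters -> bold
            out.append(re.sub(r'\b[A-Z]{3,}\b',
                              lambda m: f"**{m.group(0)}**", line))
    return "\n".join(out)


sample = ("Getting Started\n\nThis step is IMPORTANT for setup.\n\n"
          "- install the tool\n- run it\n")
print(plain_to_markdown(sample))
```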
