How to Convert Markdown to Plain Text

2026-04-18 · 5 min read

Why Convert Markdown to Plain Text

Markdown is an excellent lightweight markup language, but in some scenarios Markdown-formatted text needs to be converted to clean plain text: copying document content into systems that do not support Markdown (such as older CRM or ERP systems), preparing text for text-to-speech (TTS), extracting an article's body for SEO analysis or word counting, serving as an intermediate step when exporting to PDF, or providing plain-text versions for non-technical users.

Markdown Elements to Remove

Heading markers: # H1, ## H2, ### H3, etc. (remove # symbols, keep text)
Bold and italic: **bold**, *italic*, __bold__, _italic_ (remove the * and _ symbols, keep the text)
Links: [link text](URL) (typically keep the link text, delete the URL and brackets)
Images: ![alt text](image URL) (keep the alt text or remove the image entirely)
Code: `inline code` and fenced code blocks opened and closed with triple backticks (keep the code content, remove the backticks)
List markers: unordered items starting with -, *, or +, and numbered items such as 1. (remove the markers, keeping the item text)
Blockquotes: > quote (remove > symbols)
Horizontal rules: --- or *** (delete or replace with blank line)

Python Code Implementation

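import re


def markdown_to_plain(text):
    # Remove code fence lines (``` or ```python), keeping the code between them.
    # Note: the rules below still run on that code, so '#', '*' or '_' inside
    # it can be altered; a parser-based library avoids this limitation.
    text = re.sub(r'^[ \t]*```[\w+-]*[ \t]*\n?', '', text, flags=re.MULTILINE)

    # Remove horizontal rules (---, ***, ___, also spaced like "- - -").
    # This runs before list handling so a rule is not mistaken for a list item.
    text = re.sub(r'^[ \t]*([-*_])(?:[ \t]*\1){2,}[ \t]*$', '', text,
                  flags=re.MULTILINE)

    # Remove blockquote markers, including nested ones such as "> >"
    text = re.sub(r'^[ \t]*(?:>[ \t]?)+', '', text, flags=re.MULTILINE)

    # Remove heading markers (# to ######), keeping the heading text
    text = re.sub(r'^#{1,6}[ \t]+', '', text, flags=re.MULTILINE)

    # Remove list markers (-, *, + and "1." / "1)"), keeping the content
    text = re.sub(r'^[ \t]*[-*+][ \t]+', '', text, flags=re.MULTILINE)
    text = re.sub(r'^[ \t]*\d+[.)][ \t]+', '', text, flags=re.MULTILINE)

    # Images: ![alt](URL) -> alt. This must run before the link rule,
    # otherwise the link rule matches "[alt](URL)" and leaves a stray "!"
    text = re.sub(r'!\[([^\]]*)\]\([^)]*\)', r'\1', text)

    # Links: [text](URL) -> text
    text = re.sub(r'\[([^\]]+)\]\([^)]*\)', r'\1', text)

    # Inline code: `code` -> code
    text = re.sub(r'`([^`]+)`', r'\1', text)

    # Bold and italic markers (keep the text); double markers go first
    text = re.sub(r'\*\*(.+?)\*\*', r'\1', text)  # **bold**
    text = re.sub(r'__(.+?)__', r'\1', text)      # __bold__
    text = re.sub(r'\*(.+?)\*', r'\1', text)      # *italic*
    # _italic_ only at word boundaries, so snake_case names are left intact
    text = re.sub(r'(?<!\w)_(.+?)_(?!\w)', r'\1', text)

    return text.strip()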

Using Existing Libraries for Conversion

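Writing regular expressions that cover every Markdown edge case is hard, so an existing library is usually the better choice. In Python, render the Markdown to HTML with the markdown package and then extract the text with BeautifulSoup; in JavaScript, use the remark toolchain (for example with the strip-markdown plugin). Note that markdownify works in the opposite direction, converting HTML to Markdown, so it suits the reverse operation rather than stripping Markdown. Parser-based libraries handle nested structures and escaped characters far more reliably than regular expressions, and many target a specific dialect such as CommonMark or GitHub Flavored Markdown (GFM).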
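The first route above renders Markdown to HTML and then pulls out the text. Below is a minimal sketch of the extraction step only. To stay runnable without third-party packages it swaps BeautifulSoup for the standard library's html.parser; the names TextExtractor and html_to_text, and the choice of which tags end a paragraph or a line, are assumptions for illustration. The input is assumed to be HTML of the kind Python-Markdown's markdown.markdown() produces.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, keeping paragraph breaks."""

    # Assumed tag sets: elements that end a paragraph vs. a single line
    PARAGRAPH_TAGS = {'p', 'pre', 'blockquote', 'ul', 'ol', 'table',
                      'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}
    LINE_TAGS = {'li', 'tr', 'div'}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.parts = []

    def handle_data(self, data):
        # Skip the whitespace-only newlines renderers place between tags
        if data.strip() or '\n' not in data:
            self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        if tag in ('br', 'hr'):
            self.parts.append('\n')

    def handle_endtag(self, tag):
        if tag in self.PARAGRAPH_TAGS:
            self.parts.append('\n\n')
        elif tag in self.LINE_TAGS:
            self.parts.append('\n')


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Drop trailing spaces on each line, then collapse runs of blank lines
    lines = [line.rstrip() for line in ''.join(parser.parts).splitlines()]
    return re.sub(r'\n{3,}', '\n\n', '\n'.join(lines)).strip('\n')
```

With the markdown package installed, the whole pipeline would be html_to_text(markdown.markdown(source)), adding extensions=['fenced_code'] so fenced code blocks are recognized. Because the extractor never sees Markdown syntax, asterisks and underscores inside inline code come through unchanged, unlike with the regex approach.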

Pandoc Command-Line Conversion

# Convert Markdown to plain text with Pandoc
pandoc input.md -t plain -o output.txt

# Convert to HTML (an intermediate step)
pandoc input.md -t html -o output.html

# Batch-convert every .md file in the current directory
for f in *.md; do pandoc "$f" -t plain -o "${f%.md}.txt"; done
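The batch loop relies on a POSIX shell. When the batch step is driven from Python instead (for example on Windows, or inside a larger build script), it could be sketched with subprocess and pathlib as below. The function names are hypothetical, the code assumes the pandoc executable is installed and on PATH, and the pandoc arguments are the same ones the shell loop uses.

```python
import subprocess
from pathlib import Path


def pandoc_plain_command(md_path):
    """Build the pandoc argument list for one file: name.md -> name.txt."""
    md = Path(md_path)
    # with_suffix replaces only the final ".md", like ${f%.md} in the shell loop
    return ['pandoc', str(md), '-t', 'plain', '-o', str(md.with_suffix('.txt'))]


def convert_directory(directory='.'):
    """Convert every *.md file in `directory`; assumes pandoc is on PATH."""
    for md in sorted(Path(directory).glob('*.md')):
        # An argument list (no shell) keeps spaces in file names safe;
        # check=True raises CalledProcessError if pandoc fails
        subprocess.run(pandoc_plain_command(md), check=True)
```

Building the argument list in a separate function keeps it easy to inspect or log before anything is executed.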

Preserving Structure vs. Complete Removal

Choose a conversion strategy based on the target use: remove all formatting completely (for TTS or simple text analysis); preserve the basic structure, with headings separated by blank lines and lists indented (for plain text that must stay readable); or convert to reStructuredText or another lightweight markup language (when the structure must survive but the format has to change).
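The "preserve the basic structure" strategy can be sketched as a single line-by-line pass. This is an illustration under stated assumptions: the function name is hypothetical, only # headings and list items are handled, every unordered marker is normalized to "-" with two spaces of indentation, and inline markup is left for rules like those in markdown_to_plain above.

```python
import re


def markdown_to_structured_plain(text):
    """Strip heading/list markers but keep a readable plain-text layout."""
    out = []
    for line in text.splitlines():
        heading = re.match(r'^#{1,6}[ \t]+(.+?)[ \t]*$', line)
        bullet = re.match(r'^([ \t]*)[-*+][ \t]+(.*)$', line)
        numbered = re.match(r'^([ \t]*)(\d+)[.)][ \t]+(.*)$', line)
        if heading:
            # Blank lines before and after so the heading stands on its own
            out.extend(['', heading.group(1), ''])
        elif bullet:
            # Keep nesting indentation, add two spaces, normalize the marker
            out.append('  ' + bullet.group(1) + '- ' + bullet.group(2))
        elif numbered:
            # Keep the original number so the order stays visible
            out.append('  ' + numbered.group(1) + numbered.group(2) + '. '
                       + numbered.group(3))
        else:
            out.append(line)
    # Collapse the extra blank lines that headings may have introduced
    result = re.sub(r'\n{3,}', '\n\n', '\n'.join(out))
    # Strip only newlines so a leading list item keeps its indentation
    return result.strip('\n')
```

For example, "# Title\nIntro\n- one\n* two" becomes "Title", a blank line, "Intro", and two indented "- " items.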

Reverse Operation: Plain Text to Markdown

Sometimes existing plain text needs to be converted to Markdown, for example when migrating old documents to a blog system. Automatic conversion is hard to get right, but it can be semi-automated: a script can infer headings from the format of each paragraph's first line, turn line-leading hyphens into list items, and treat all-caps words as emphasis. The result still needs manual review; automation only reduces the workload.
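The semi-automatic steps just described might be sketched as follows. The function name, the 60-character limit, and the exact heading rule (a short one-line paragraph that does not end like a sentence, rendered as a level-2 heading) are assumptions made for illustration, not a standard.

```python
import re

# Assumed heuristics; tune both for the documents being migrated
MAX_HEADING_LEN = 60
SENTENCE_END = ('.', '!', '?', ':', ';', ',')


def plain_to_markdown(text):
    """Heuristic plain text -> Markdown; the output still needs manual review."""
    if not text.strip():
        return ''
    # Paragraphs are separated by one or more blank lines
    blocks = re.split(r'\n(?:[ \t]*\n)+', text.strip())
    out = []
    for block in blocks:
        lines = block.splitlines()
        first = lines[0].strip()
        # Heading guess: a one-line paragraph that is short, is not a list
        # item, and does not end like a sentence
        if (len(lines) == 1 and len(first) <= MAX_HEADING_LEN
                and not first.startswith('-')
                and not first.endswith(SENTENCE_END)):
            out.append('## ' + first)
            continue
        converted = []
        for line in lines:
            # A line-leading hyphen ("-item" or "- item") becomes "- item"
            line = re.sub(r'^[ \t]*-(?!-)[ \t]*(?=\S)', '- ', line)
            # Words of two or more capital letters become **bold** emphasis
            line = re.sub(r'\b([A-Z]{2,})\b', r'**\1**', line)
            converted.append(line)
        out.append('\n'.join(converted))
    return '\n\n'.join(out)
```

These rules produce false positives by design: a line such as "-5 degrees" would also become a list item, and acronyms like "HTTP" would be bolded. Catching such cases is exactly what the manual review is for.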
