How to Convert Markdown to Plain Text
Why Convert Markdown to Plain Text
Markdown is an excellent lightweight markup language, but some situations call for clean plain text instead: pasting document content into systems that do not render Markdown (such as legacy CRM or ERP software); preparing text for text-to-speech (TTS); extracting an article body for SEO analysis or word counting; serving as an intermediate step when exporting to PDF; and providing plain-text versions for non-technical users.
Markdown Elements to Remove
- Heading markers: # H1, ## H2, ### H3, etc. (remove # symbols, keep text)
- Bold and italic: **bold**, *italic*, __bold__ (remove * and _ symbols)
- Links: [link text](URL) (typically keep the link text, delete the URL and brackets)
- Images: ![alt text](URL) (keep the alt text or remove entirely)
- Code blocks: inline code and code fences with backticks (keep code content, remove backticks)
- List markers: - and * lists, numbered lists (remove markers or convert to plain text)
- Blockquotes: > quote (remove > symbols)
- Horizontal rules: --- or *** (delete or replace with blank line)
Python Code Implementation
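import re

def markdown_to_plain(text):
    # Remove code fence lines (``` or ```lang); the code inside is kept
    text = re.sub(r'```[\w]*\n?', '', text)
    # Remove horizontal rules first, so *** is not mistaken for emphasis
    text = re.sub(r'^[-*]{3,}\s*$', '', text, flags=re.MULTILINE)
    # Remove heading markers (#), keep the heading text
    text = re.sub(r'^#{1,6}\s+', '', text, flags=re.MULTILINE)
    # Remove blockquote markers (>)
    text = re.sub(r'^>\s?', '', text, flags=re.MULTILINE)
    # Remove list markers (keep content); must run before the italic rule,
    # otherwise a leading "* " is read as the start of *italic*
    text = re.sub(r'^[ \t]*[\*\-\+]\s+', '', text, flags=re.MULTILINE)
    text = re.sub(r'^[ \t]*\d+\.\s+', '', text, flags=re.MULTILINE)
    # Images: ![alt](URL) -> alt; must run before links,
    # otherwise the link rule leaves a stray "!" behind
    text = re.sub(r'!\[(.*?)\]\(.+?\)', r'\1', text)
    # Links: [text](URL) -> text
    text = re.sub(r'\[(.+?)\]\(.+?\)', r'\1', text)
    # Remove bold and italic markers (keep the text)
    text = re.sub(r'\*\*(.+?)\*\*', r'\1', text)  # **bold**
    text = re.sub(r'__(.+?)__', r'\1', text)      # __bold__
    text = re.sub(r'\*(.+?)\*', r'\1', text)      # *italic*
    # _italic_, but leave underscores inside words (snake_case) alone
    text = re.sub(r'(?<!\w)_(.+?)_(?!\w)', r'\1', text)
    # Remove inline code backticks
    text = re.sub(r'`(.+?)`', r'\1', text)
    return text.strip()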
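One limitation of the regex approach above is that the substitutions also run over the code inside fences, so an expression such as a * b * c loses its asterisks. A minimal sketch of one way around this, assuming fences always start at the beginning of a line: split the fenced blocks out first and apply the stripping function only to the text around them. The names strip_outside_code and strip_emphasis are hypothetical; strip_emphasis is a small stand-in for a full function like markdown_to_plain.

```python
import re

# A fenced block: an opening ``` line, the code body, a closing ``` line.
FENCE = re.compile(r'^```[^\n]*\n(.*?)^```[ \t]*$', re.MULTILINE | re.DOTALL)

def strip_outside_code(text, strip_prose):
    """Apply strip_prose only to the text outside fenced code blocks."""
    parts = []
    pos = 0
    for match in FENCE.finditer(text):
        parts.append(strip_prose(text[pos:match.start()]))
        parts.append(match.group(1))  # code body, kept verbatim, fences dropped
        pos = match.end()
    parts.append(strip_prose(text[pos:]))
    return ''.join(parts)

def strip_emphasis(text):
    # Hypothetical stand-in for a full stripping function.
    text = re.sub(r'\*\*(.+?)\*\*', r'\1', text)
    return re.sub(r'\*(.+?)\*', r'\1', text)

sample = "Some **bold** text.\n```python\nresult = a * b * c\n```\nDone.\n"
print(strip_outside_code(sample, strip_emphasis))
```

Passing the stripping function as a parameter keeps the fence handling separate from the formatting rules, so either part can be changed without touching the other.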
Using Existing Libraries for Conversion
Covering every Markdown edge case with regular expressions is hard, so an existing library is usually the better choice. In Python, the markdown library can convert the text to HTML, after which BeautifulSoup extracts the plain text. In JavaScript, the remark toolchain does the same job. Note that markdownify works in the opposite direction (HTML to Markdown), so it suits the reverse operation rather than this one. Parser-based libraries handle nested structures and escaped characters far more reliably than regular expressions, though the dialect each one supports (CommonMark, GFM, etc.) varies, so check it against your input.
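The HTML route can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the third-party markdown package is installed (pip install markdown) and, to stay dependency-light, uses the standard library's html.parser for the extraction step; BeautifulSoup's get_text() would serve the same purpose. The names html_to_text and markdown_to_text are hypothetical.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text nodes, adding a line break after block-level elements."""
    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6",
                  "li", "blockquote", "pre", "tr"}

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

def html_to_text(html):
    """Return the visible text of an HTML fragment, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)

def markdown_to_text(md_text):
    """Markdown -> HTML -> plain text (assumes the markdown package)."""
    import markdown  # third-party, imported lazily
    return html_to_text(markdown.markdown(md_text))

html = '<h1>Title</h1>\n<p>Some <strong>bold</strong> and <a href="https://example.com">a link</a>.</p>'
print(html_to_text(html))
```

With the markdown package available, markdown_to_text("# Title") runs the whole pipeline in one call.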
Pandoc Command-Line Conversion
# Convert Markdown to plain text with Pandoc
pandoc input.md -t plain -o output.txt
# Convert to HTML (intermediate step)
pandoc input.md -t html -o output.html
# Batch-convert all .md files in the current directory
for f in *.md; do pandoc "$f" -t plain -o "${f%.md}.txt"; done
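The same batch conversion can be driven from Python when it needs to sit inside a larger script. The sketch below assumes pandoc is installed and on the PATH; pandoc_command and convert_directory are hypothetical helper names. It also passes --wrap=none, since Pandoc's plain writer otherwise hard-wraps long lines, which is often unwanted for TTS or text analysis.

```python
import subprocess
from pathlib import Path

def pandoc_command(src, wrap=False):
    """Build the pandoc argument list for one Markdown file."""
    src = Path(src)
    cmd = ["pandoc", str(src), "-t", "plain",
           "-o", str(src.with_suffix(".txt"))]
    if not wrap:
        cmd.append("--wrap=none")  # keep each paragraph on one line
    return cmd

def convert_directory(directory="."):
    """Convert every .md file in the directory to a .txt file beside it."""
    for src in sorted(Path(directory).glob("*.md")):
        # check=True raises CalledProcessError if pandoc fails
        subprocess.run(pandoc_command(src), check=True)

print(pandoc_command("input.md"))
```

Building the command as a list avoids shell quoting problems with file names that contain spaces.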
Preserving Structure vs. Complete Removal
Choose the conversion strategy to match the target use: remove all formatting completely (for TTS or simple text analysis); preserve basic structure, with headings separated by blank lines and lists indented (for plain text that still needs to read well); or convert to reStructuredText or another lightweight markup language (when the structure must survive but the format has to change).
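The structure-preserving strategy can be sketched as follows. This is a minimal illustration under stated assumptions: it handles only headings and list items, leaves inline markers such as bold or links to a separate stripping pass, and the name markdown_to_structured_text is hypothetical.

```python
import re

def markdown_to_structured_text(text):
    """Drop heading markers but keep headings set off by blank lines,
    and keep list items as indented lines."""
    out = []
    for line in text.splitlines():
        heading = re.match(r'^#{1,6}\s+(.*)$', line)
        item = re.match(r'^(\s*)([*+-]|\d+\.)\s+(.*)$', line)
        if heading:
            if out and out[-1] != "":
                out.append("")            # blank line before the heading
            out.append(heading.group(1).strip())
            out.append("")                # blank line after the heading
        elif item:
            indent, marker, body = item.groups()
            if not marker.endswith("."):
                marker = "-"              # normalize *, + and - bullets
            out.append(f"{indent}  {marker} {body}")
        else:
            out.append(line)
    # Collapse runs of blank lines left over from the source
    return re.sub(r'\n{3,}', '\n\n', "\n".join(out)).strip()

sample = "# Title\nIntro text.\n## Steps\n- first\n- second\n  - nested\n1. numbered\n"
print(markdown_to_structured_text(sample))
```

Existing indentation is carried over, so nested list items stay visibly nested in the output.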
Reverse Operation: Plain Text to Markdown
Sometimes existing plain text needs to be converted to Markdown, for example when migrating old documents to a blog system. Fully automatic conversion is hard to get right, but the work can be semi-automated: a script can treat short standalone lines as headings, convert line-leading hyphens to list items, and treat all-caps words as emphasis. A manual review is still needed at the end; automation only reduces the workload.
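The heuristics above can be sketched as follows. The thresholds are assumptions chosen for illustration (a heading is a standalone line of at most 60 characters without closing punctuation; emphasis is any all-caps word of three or more letters), and plain_to_markdown is a hypothetical name. Acronyms such as PDF will be bolded as false positives, which is one reason the manual review remains necessary.

```python
import re

def plain_to_markdown(text):
    """Heuristically add Markdown markup to plain text."""
    lines = text.splitlines()
    out = []
    seen_heading = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        prev_blank = i == 0 or not lines[i - 1].strip()
        next_blank = i + 1 >= len(lines) or not lines[i + 1].strip()
        item = re.match(r'^-\s+(.*)$', stripped)
        if item:
            # Line-leading hyphen -> list item
            out.append("- " + item.group(1))
        elif (stripped and prev_blank and next_blank
              and len(stripped) <= 60
              and stripped[-1] not in ".!?:;,"
              and any(c.isalnum() for c in stripped)):
            # Short standalone line without closing punctuation -> heading;
            # the first one becomes the title
            out.append(("## " if seen_heading else "# ") + stripped)
            seen_heading = True
        else:
            # All-caps words of three or more letters -> bold
            out.append(re.sub(r'\b[A-Z]{3,}\b',
                              lambda m: f"**{m.group(0)}**", line))
    return "\n".join(out)

sample = ("Release Notes\n\nThis update is IMPORTANT for all users.\n\n"
          "- fixed login\n- faster export\n")
print(plain_to_markdown(sample))
```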