Fetch Methods Reference

Proxy Cascade (General URLs)

Use scripts/fetch.sh for an automatic proxy cascade with fallback. It tries each method in order until one succeeds:

1. r.jina.ai

bash ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch.sh "https://example.com"

Wide coverage, preserves image links. Try this first.

2. defuddle.md

Automatically tried by fetch.sh if r.jina.ai fails. Cleaner output with YAML frontmatter.

3. agent-fetch

Last resort local tool, automatically tried if both proxies fail.
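The cascade described above amounts to a loop that stops at the first method returning acceptable content. The following is a minimal Python sketch of that idea, not the actual contents of fetch.sh. The function name `fetch_with_cascade` and the three stub fetchers are hypothetical stand-ins so the example runs without network access.

```python
from typing import Callable, Iterable, Optional, Tuple

Fetcher = Callable[[str], Optional[str]]


def fetch_with_cascade(
    url: str,
    fetchers: Iterable[Tuple[str, Fetcher]],
    is_valid: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order; return the first valid (name, content)."""
    for name, fetch in fetchers:
        try:
            content = fetch(url)
        except Exception:
            # A failing method is not fatal; fall through to the next one.
            continue
        if content and is_valid(content):
            return name, content
    raise RuntimeError(f"all fetch methods failed for {url}")


# Hypothetical stubs standing in for the real proxies and the local tool.
def jina_stub(url: str) -> Optional[str]:
    raise ConnectionError("proxy unreachable")


def defuddle_stub(url: str) -> Optional[str]:
    return "---\ntitle: Example\n---\n" + "\n".join(f"line {i}" for i in range(10))


def agent_fetch_stub(url: str) -> Optional[str]:
    return "local fallback"


METHODS = [
    ("r.jina.ai", jina_stub),
    ("defuddle.md", defuddle_stub),
    ("agent-fetch", agent_fetch_stub),
]

name, content = fetch_with_cascade(
    "https://example.com",
    METHODS,
    is_valid=lambda text: len(text.splitlines()) > 5,
)
print(name)  # the first method whose output passed validation
```

Passing the validity check in as a parameter keeps the ordering logic separate from the question of what counts as a usable page.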

With Custom Proxy

bash ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch.sh "https://example.com" "http://127.0.0.1:7890"
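For readers who want to reproduce the proxied request outside the script, here is a rough Python sketch using only the standard library. It assumes the second argument is an ordinary HTTP proxy URL applied to both http and https traffic; how fetch.sh actually passes the proxy along is not shown in this document, and `opener_for` is a hypothetical helper name.

```python
from typing import Optional
from urllib.request import OpenerDirector, ProxyHandler, build_opener


def opener_for(proxy: Optional[str]) -> OpenerDirector:
    """Build a urllib opener that routes requests through `proxy` when one is given."""
    if proxy:
        # Assumption: the same proxy endpoint is used for both schemes.
        return build_opener(ProxyHandler({"http": proxy, "https": proxy}))
    # Without an explicit proxy, urllib falls back to environment settings.
    return build_opener()


opener = opener_for("http://127.0.0.1:7890")
# opener.open("https://r.jina.ai/https://example.com") would now go through the proxy.
```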

PDF to Markdown

Remote PDF URL

r.jina.ai handles PDF URLs directly:

curl -sL "https://r.jina.ai/https://example.com/paper.pdf"

If that fails, download and extract locally:

curl -sL "https://example.com/paper.pdf" -o /tmp/input.pdf
bash ~/.claude/skills/qiaomu-markdown-proxy/scripts/extract_pdf.sh /tmp/input.pdf
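The reader URL in the first command is just the target URL appended to the r.jina.ai prefix. A small Python sketch of the same request follows; `reader_url` and `fetch_via_reader` are hypothetical helper names, and the timeout value is an arbitrary choice for illustration.

```python
from urllib.request import urlopen

READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Prefix the target URL with the reader endpoint, mirroring the curl command."""
    return READER_PREFIX + target_url


def fetch_via_reader(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the Markdown rendering of target_url (needs network access)."""
    with urlopen(reader_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(reader_url("https://example.com/paper.pdf"))
# https://r.jina.ai/https://example.com/paper.pdf
```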

Local PDF File

bash ~/.claude/skills/qiaomu-markdown-proxy/scripts/extract_pdf.sh /path/to/file.pdf

The script tries three methods in order:

1. marker-pdf (best quality, requires: pip install marker-pdf)
   - Best for papers, tables, complex layouts
   - Preserves formatting and structure
2. pdftotext (fast, requires: brew install poppler)
   - Good for text-heavy PDFs
   - Fast extraction with layout preservation
3. pypdf (pure-Python fallback with no system dependencies, requires: pip install pypdf)
   - Works everywhere Python is available
   - Basic text extraction
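One way to picture the ordering is a function that walks the list and picks the first tool that is installed. This is a sketch under assumptions, not the logic of extract_pdf.sh: the real script may also fall back when an extraction fails, and the check for marker-pdf assumes the package exposes a `marker_single` command. `pick_extractor` is a hypothetical name.

```python
import importlib.util
import shutil
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]

# Availability checks in the same priority order as the list above.
DEFAULT_CHECKS: List[Check] = [
    # Assumption: marker-pdf provides a `marker_single` executable on PATH.
    ("marker-pdf", lambda: shutil.which("marker_single") is not None),
    ("pdftotext", lambda: shutil.which("pdftotext") is not None),
    ("pypdf", lambda: importlib.util.find_spec("pypdf") is not None),
]


def pick_extractor(checks: List[Check]) -> Optional[str]:
    """Return the name of the first available extractor, or None if none is installed."""
    for name, is_available in checks:
        if is_available():
            return name
    return None


print(pick_extractor(DEFAULT_CHECKS))
```

The result depends on what is installed locally, so the printed name will differ between machines.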

WeChat Public Account (公众号)

Use the proxy cascade first (r.jina.ai / defuddle.md). It works for most articles without extra tools. If the proxies are blocked, use the built-in Playwright script as a last resort:

python3 ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch_weixin.py "https://mp.weixin.qq.com/s/abc123"

Requirements (one-time setup, ~300 MB):

pip install playwright beautifulsoup4 lxml
playwright install chromium

Output: YAML frontmatter (title, author, date, url) + Markdown body

JSON output:

python3 ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch_weixin.py "URL" --json
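To consume the default (non-JSON) output downstream, the frontmatter can be separated from the Markdown body. The sketch below assumes the frontmatter is delimited by `---` lines and holds simple `key: value` pairs; that layout, the helper name `split_frontmatter`, and the sample values are illustrative assumptions, not captured output from fetch_weixin.py.

```python
from typing import Dict, Tuple


def split_frontmatter(document: str) -> Tuple[Dict[str, str], str]:
    """Split a document into (frontmatter dict, Markdown body)."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document  # no frontmatter block at the top
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, document  # opening delimiter without a closing one
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


# Made-up sample using the four documented fields.
sample = """---
title: Example Article
author: Example Author
date: 2024-01-01
url: https://mp.weixin.qq.com/s/abc123
---

# Example Article

Body text.
"""

meta, body = split_frontmatter(sample)
print(meta["url"])
print(body.splitlines()[0])
```

A real YAML parser would be the safer choice if values can contain quotes, lists, or multi-line strings.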

Feishu / Lark Document

Built-in API script for Feishu documents. Requires app credentials:

export FEISHU_APP_ID=your_app_id
export FEISHU_APP_SECRET=your_app_secret
python3 ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch_feishu.py "https://xxx.feishu.cn/docx/xxxxxxxx"

Supported types:

docx - New-style documents
doc - Legacy documents
wiki - Wiki pages (auto-resolves to actual document)

Required permissions: docx:document:readonly, wiki:wiki:readonly

Output: YAML frontmatter (title, document_id, url) + Markdown body

JSON output:

python3 ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch_feishu.py "URL" --json
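The document type can usually be read off the URL path. The sketch below assumes URLs follow the shape shown above, with the type as the first path segment and the document token as the second. Whether fetch_feishu.py parses URLs this way is not confirmed here, and `parse_feishu_url` is a hypothetical helper.

```python
from typing import Tuple
from urllib.parse import urlparse

# The three types listed under "Supported types".
SUPPORTED_TYPES = ("docx", "doc", "wiki")


def parse_feishu_url(url: str) -> Tuple[str, str]:
    """Return (document type, token) from a Feishu URL, or raise ValueError."""
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(parts) < 2 or parts[0] not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported Feishu URL: {url}")
    return parts[0], parts[1]


print(parse_feishu_url("https://xxx.feishu.cn/docx/xxxxxxxx"))
# ('docx', 'xxxxxxxx')
```

A wiki URL would still need a further API lookup to resolve to the underlying document, which this sketch does not attempt.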

YouTube Videos

Use the dedicated yt-search-download skill for YouTube content. It handles:

Video download
Subtitle extraction
Transcript generation

Do not use qiaomu-markdown-proxy for YouTube URLs.

Content Validation

The fetch.sh script validates content before returning:

Must have more than 5 lines
Filters out common error pages:
- “Don’t miss what’s happening” (Twitter login wall)
- “Access Denied”
- “404 Not Found”

If validation fails, automatically tries the next method.
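The validation rules above can be expressed as a short predicate. This is a Python approximation of the described behavior, not the shell code in fetch.sh. It assumes the error phrases are matched anywhere in the content and that curly and straight apostrophes should be treated alike; `looks_valid` is a hypothetical name.

```python
MIN_LINES = 5  # content must have more than this many lines

ERROR_MARKERS = (
    "Don't miss what's happening",  # Twitter login wall
    "Access Denied",
    "404 Not Found",
)


def looks_valid(content: str) -> bool:
    """Return True if content is long enough and contains no known error phrase."""
    if len(content.splitlines()) <= MIN_LINES:
        return False
    # Normalize typographic apostrophes so both spellings of the phrase match.
    normalized = content.replace("\u2019", "'")
    return not any(marker in normalized for marker in ERROR_MARKERS)


page = "\n".join(f"paragraph {i}" for i in range(8))
print(looks_valid(page))
print(looks_valid("Access Denied\n" + page))
```

Matching phrases anywhere in the text can reject a legitimate article that happens to quote one of them, so a stricter variant might inspect only the first few lines.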
