跳转到主要内容

Fetch Methods Reference

Proxy Cascade (General URLs)

Use scripts/fetch.sh for automatic proxy cascade with fallback. Try in order until success:

1. r.jina.ai

bash ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch.sh "https://example.com"
Wide coverage, preserves image links. Try this first.

2. defuddle.md

Automatically tried by fetch.sh if r.jina.ai fails. Cleaner output with YAML frontmatter.

3. agent-fetch

Last resort local tool, automatically tried if both proxies fail.

With Custom Proxy

bash ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch.sh "https://example.com" "http://127.0.0.1:7890"

PDF to Markdown

Remote PDF URL

r.jina.ai handles PDF URLs directly:
curl -sL "https://r.jina.ai/https://example.com/paper.pdf"
If that fails, download and extract locally:
curl -sL "https://example.com/paper.pdf" -o /tmp/input.pdf
bash ~/.claude/skills/qiaomu-markdown-proxy/scripts/extract_pdf.sh /tmp/input.pdf

Local PDF File

bash ~/.claude/skills/qiaomu-markdown-proxy/scripts/extract_pdf.sh /path/to/file.pdf
The script tries three methods in order:
  1. marker-pdf (best quality, requires: pip install marker-pdf)
    • Best for papers, tables, complex layouts
    • Preserves formatting and structure
  2. pdftotext (fast, requires: brew install poppler)
    • Good for text-heavy PDFs
    • Fast extraction with layout preservation
  3. pypdf (no-dependency fallback, requires: pip install pypdf)
    • Works everywhere Python is available
    • Basic text extraction

WeChat Public Account (公众号)

Use the proxy cascade first (r.jina.ai / defuddle.md). Works for most articles without extra tools. If proxies are blocked, use the built-in Playwright script as last resort:
python3 ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch_weixin.py "https://mp.weixin.qq.com/s/abc123"
Requirements (one-time setup, ~300 MB):
pip install playwright beautifulsoup4 lxml
playwright install chromium
Output: YAML frontmatter (title, author, date, url) + Markdown body JSON output:
python3 ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch_weixin.py "URL" --json

Feishu / Lark Document

Built-in API script for Feishu documents. Requires app credentials:
export FEISHU_APP_ID=your_app_id
export FEISHU_APP_SECRET=your_app_secret
python3 ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch_feishu.py "https://xxx.feishu.cn/docx/xxxxxxxx"
Supported types:
  • docx - New-style documents
  • doc - Legacy documents
  • wiki - Wiki pages (auto-resolves to actual document)
Required permissions: docx:document:readonly, wiki:wiki:readonly Output: YAML frontmatter (title, document_id, url) + Markdown body JSON output:
python3 ~/.claude/skills/qiaomu-markdown-proxy/scripts/fetch_feishu.py "URL" --json

YouTube Videos

Use the dedicated yt-search-download skill for YouTube content. It handles:
  • Video download
  • Subtitle extraction
  • Transcript generation
Do not use qiaomu-markdown-proxy for YouTube URLs.

Content Validation

The fetch.sh script validates content before returning:
  • Must have more than 5 lines
  • Filters out common error pages:
    • “Don’t miss what’s happening” (Twitter login wall)
    • “Access Denied”
    • “404 Not Found”
If validation fails, automatically tries the next method.