Fetch Methods Reference
Proxy Cascade (General URLs)
Use scripts/fetch.sh for an automatic proxy cascade with fallback. It tries each method in order until one succeeds:
1. r.jina.ai
2. defuddle.md
Automatically tried by fetch.sh if r.jina.ai fails. Produces cleaner output with YAML frontmatter.
3. agent-fetch
Last-resort local tool, tried automatically if both proxies fail.
With Custom Proxy
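As a rough illustration of the cascade, and of routing it through a custom proxy, the sketch below tries a list of fetchers in order and returns the first non-empty result. It is not the actual fetch.sh: the names cascade, proxy_url, fetch_jina, and fetch_defuddle are hypothetical, the URL-prefix form for defuddle.md is assumed to mirror r.jina.ai, and the proxy address in the comment is a placeholder. curl itself honors the https_proxy environment variable; whether fetch.sh reads the same variable is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a fetch cascade; not the real scripts/fetch.sh.

# Build a proxied URL by prefixing the target with the service base URL.
proxy_url() {
  printf '%s/%s\n' "$1" "$2"
}

# Run each fetcher (a command or function name) against the URL, in order.
# Print the first successful, non-empty output and stop; fail if none works.
cascade() {
  local url="$1"
  shift
  local fetcher out
  for fetcher in "$@"; do
    if out="$("$fetcher" "$url" 2>/dev/null)" && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  return 1
}

# Fetchers for the two proxies (the defuddle.md URL form is an assumption).
fetch_jina()     { curl -fsSL "$(proxy_url https://r.jina.ai "$1")"; }
fetch_defuddle() { curl -fsSL "$(proxy_url https://defuddle.md "$1")"; }

# Example usage with a custom proxy (placeholder address):
#   export https_proxy=http://127.0.0.1:7890
#   cascade "https://example.com" fetch_jina fetch_defuddle
```

A third fetcher wrapping agent-fetch could be appended to the argument list in the same way; its command line is left out here because this reference does not show it.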
PDF to Markdown
Remote PDF URL
r.jina.ai handles PDF URLs directly: prefix the PDF URL with https://r.jina.ai/ and fetch it like any other page.
Local PDF File
- marker-pdf (best quality; requires: pip install marker-pdf)
  - Best for papers, tables, complex layouts
  - Preserves formatting and structure
- pdftotext (fast; requires: brew install poppler)
  - Good for text-heavy PDFs
  - Fast extraction with layout preservation
- pypdf (no-dependency fallback; requires: pip install pypdf)
  - Works everywhere Python is available
  - Basic text extraction
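The three local options can be chained by availability. The sketch below is an illustration under stated assumptions, not part of the documented scripts: first_available and pdf_to_text are hypothetical helper names, the marker_single entry point and its --output_dir option reflect recent marker-pdf releases as far as known and should be checked against marker_single --help, and the output file naming is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and output layout are assumptions.

# Print the first command from the argument list that exists on PATH.
first_available() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  return 1
}

# Convert a local PDF with the best tool that is installed.
pdf_to_text() {
  local pdf="$1" outdir="$2" base tool
  base="$(basename "$pdf" .pdf)"
  tool="$(first_available marker_single pdftotext python3)" || return 1
  mkdir -p "$outdir"
  case "$tool" in
    # Assumed CLI shape for marker-pdf; verify with: marker_single --help
    marker_single) marker_single "$pdf" --output_dir "$outdir" ;;
    # -layout keeps the original physical layout of the text.
    pdftotext) pdftotext -layout "$pdf" "$outdir/$base.txt" ;;
    # Fallback: plain text via pypdf (fails if pypdf is not installed).
    python3)
      python3 -c 'import sys; from pypdf import PdfReader; print("\n".join((p.extract_text() or "") for p in PdfReader(sys.argv[1]).pages))' "$pdf" > "$outdir/$base.txt"
      ;;
  esac
}

# Example usage:
#   pdf_to_text paper.pdf ./out
```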
WeChat Public Account (公众号)
Use the proxy cascade first (r.jina.ai / defuddle.md); it works for most articles without extra tools. If the proxies are blocked, use the built-in Playwright script as a last resort:
Feishu / Lark Document
Built-in API script for Feishu documents. Requires app credentials.
Supported document types:
- docx - New-style documents
- doc - Legacy documents
- wiki - Wiki pages (auto-resolves to actual document)
Required permissions: docx:document:readonly, wiki:wiki:readonly
Output: YAML frontmatter (title, document_id, url) + Markdown body
JSON output:
YouTube Videos
Use the dedicated yt-search-download skill for YouTube content. It handles:
- Video download
- Subtitle extraction
- Transcript generation
Content Validation
The fetch.sh script validates content before returning:
- Must have more than 5 lines
- Filters out common error pages:
  - “Don’t miss what’s happening” (Twitter login wall)
  - “Access Denied”
  - “404 Not Found”
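The two validation rules above can be sketched as a single shell function. This is a minimal illustration, not the code in fetch.sh: the name is_valid_content is hypothetical, and it assumes the error markers are matched as plain substrings anywhere in the content, with both straight and typographic apostrophes accepted for the Twitter marker.

```shell
#!/usr/bin/env bash
# Hypothetical validator; substring matching is an assumption.

# Succeed only if the content has more than 5 lines and contains
# none of the known error-page markers.
is_valid_content() {
  local content="$1" lines
  lines="$(printf '%s\n' "$content" | wc -l | tr -d ' ')"
  [ "$lines" -gt 5 ] || return 1
  if printf '%s\n' "$content" | grep -qF \
      -e "Don't miss what's happening" \
      -e "Don’t miss what’s happening" \
      -e "Access Denied" \
      -e "404 Not Found"; then
    return 1
  fi
  return 0
}

# Example usage:
#   page="$(curl -fsSL "https://r.jina.ai/https://example.com")"
#   is_valid_content "$page" && printf '%s\n' "$page"
```

Rejecting short or error-page responses is what lets a cascade move on to the next method instead of returning a login wall as if it were the article.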