Paper2Any - Paper Multimodal Workflow

Overview

Paper2Any is an open-source multimodal workflow platform for academic papers. It converts paper PDFs, screenshots, or text into model architecture diagrams, technical roadmaps, experimental plots, PPT presentations, and more — all with one click.

Project Info

🔗 Source Code: github.com/OpenDCAI/Paper2Any
📜 License: Open Source
👤 Organization: OpenDCAI
⭐ Community contributed, supports multiple LLMs via APIYI

Why Paper2Any

Multiple Output Formats

Convert papers to architecture diagrams, roadmaps, PPTs, rebuttals, and more — one tool for the entire research workflow

Flexible Model Selection

Dynamically switch between GPT-4o, Claude Sonnet, Qwen-VL and more via API parameters — no hardcoding needed

CLI + Web Dual Mode

Both command-line scripts and web interface available to suit different workflows

OpenAI-Compatible API

Native support for OpenAI-compatible API format — just configure APIYI’s Base URL to access 200+ models

Core Modules

Module	Description	Output Formats
Paper2Figure	Generate scientific visualizations from papers	Architecture diagrams, roadmaps (PPTX + SVG), plots
Paper2Diagram	Create flowcharts from papers/text/images	draw.io / PNG / SVG
Paper2PPT	Convert papers to presentations	PPTX (supports 40+ slides)
Paper2Rebuttal	Generate structured rebuttal responses	Rebuttal docs with evidence grounding
PDF2PPT	Layout-preserving PDF to editable PPT	PPTX
Image2PPT	Transform images/screenshots into slides	PPTX
PPTPolish	AI-driven layout optimization	PPTX
Knowledge Base	File ingestion, semantic search, KB-driven generation	Multiple formats

Connect to LLMs via APIYI

Paper2Any supports OpenAI-compatible API format. After configuring APIYI as the LLM endpoint, you can use GPT, Claude, Gemini, DeepSeek, and 200+ other models.

Docker Deployment

Step 1: Get Your APIYI API Key

Visit APIYI Console to register/login
Go to the Tokens section
Generate a new API key
Copy the key (starts with sk-)

Step 2: Clone and Configure Backend

After cloning the repository, edit fastapi_app/.env to set APIYI as the LLM endpoint:

# fastapi_app/.env
DEFAULT_LLM_API_URL=https://api.apiyi.com/v1
BACKEND_API_KEY=sk-your-apiyi-key

Optionally, specify default models for different workflows:

PAPER2PPT_DEFAULT_MODEL=gpt-4o
PDF2PPT_DEFAULT_MODEL=gpt-4o

Step 3: Configure Frontend

Edit frontend-workflow/.env to default the web UI to APIYI:

# frontend-workflow/.env
VITE_DEFAULT_LLM_API_URL=https://api.apiyi.com/v1
VITE_LLM_API_URLS=https://api.apiyi.com/v1

Step 4: Launch

Start everything with Docker Compose:

docker compose up -d --build

Once started, open the frontend to begin using Paper2Any.

CLI Usage

Paper2Any provides standalone CLI scripts with --api-url and --api-key parameters for direct APIYI integration:

# Paper to PPT
python script/run_paper2ppt_cli.py \
  --input paper.pdf \
  --api-url https://api.apiyi.com/v1 \
  --api-key sk-your-apiyi-key \
  --model gpt-4o

# Paper to Figure
python script/run_paper2figure_cli.py \
  --input paper.pdf \
  --api-url https://api.apiyi.com/v1 \
  --api-key sk-your-apiyi-key \
  --graph-type model_arch

Model Recommendations: For paper-to-PPT conversion, GPT-4o or Claude Sonnet 4.5 are recommended for their strong long-document understanding and structured output capabilities. For diagram generation, vision models like Qwen-VL are also worth trying.

Deployment Options

Method	Requirements	Best For
Docker (Recommended)	One-click frontend + backend startup	Quick start, production
Linux Native	Python 3.11+, LaTeX, Inkscape, LibreOffice	Development, customization
Windows	Python 3.12, Inkscape	Local use

GPU-dependent features like PDF2PPT and Image2PPT require a separate SAM3 model server. See the project README for GPU deployment instructions.

FAQ

How do I connect Paper2Any to APIYI models?

Set DEFAULT_LLM_API_URL to https://api.apiyi.com/v1 and BACKEND_API_KEY to your APIYI key in the environment variables. For CLI mode, use the --api-url and --api-key parameters.

Which models are supported?

Via APIYI, you can access 200+ models including GPT-4o, Claude Sonnet 4.5, Gemini, DeepSeek, Qwen, and more. Models can be dynamically switched in the web interface without code changes.

Docker startup fails — what should I check?

Verify that:

Docker and Docker Compose are properly installed
.env files are correctly configured
Required ports are not in use
Check docker compose logs for detailed error messages

PPT generation errors or incomplete content?

Ensure your APIYI account has sufficient balance
For long papers, use models with larger context windows (e.g., GPT-4o 128K)
Check that the paper PDF is searchable text (scanned PDFs may produce poor results)

How do I get an APIYI API key?

Visit the APIYI Console, create an account, and generate a new key in the Tokens section. New users receive free trial credits.

APIYI Model List

View the complete list of 200+ models supported by APIYI

Base URL Configuration

Learn how to configure APIYI Base URL in various tools

APIYI Token Management

Manage API keys, check usage and balance

APIYI Pricing

View model pricing and top-up offers

Overview

AI Agent

Conversational AI

Programming

Engineering

Translation

Community Contributions

Paper2Any - Paper Multimodal Workflow

Overview

Why Paper2Any

Multiple Output Formats

Flexible Model Selection

CLI + Web Dual Mode

OpenAI-Compatible API

Core Modules

Connect to LLMs via APIYI

Docker Deployment

CLI Usage

Deployment Options

FAQ

APIYI Model List

Base URL Configuration

APIYI Token Management

APIYI Pricing

Overview

AI Agent

Conversational AI

Programming

Engineering

Translation

Community Contributions

Documentation Index

​Overview

​Why Paper2Any

Multiple Output Formats

Flexible Model Selection

CLI + Web Dual Mode

OpenAI-Compatible API

​Core Modules

​Connect to LLMs via APIYI

​Docker Deployment

​CLI Usage

​Deployment Options

​FAQ

​Related Resources

APIYI Model List

Base URL Configuration

APIYI Token Management

APIYI Pricing

Overview

Why Paper2Any

Core Modules

Connect to LLMs via APIYI

Docker Deployment

CLI Usage

Deployment Options

FAQ

Related Resources