The open source vs cloud dilemma
PDF generation is deceptively simple. Render some data into a template, produce a file. In practice, the infrastructure around that simple operation — scaling, monitoring, error handling, font management, security patching, queue management — is where engineering time disappears.
Open source tools give you full control and zero vendor lock-in. Cloud APIs give you zero infrastructure and predictable costs. Neither is universally better. The right choice depends on your volume, your team's DevOps capacity, your compliance requirements, and how central document generation is to your product.
Let's look at the actual options in each camp.
The open source landscape
Gotenberg — Docker-based conversion engine
Gotenberg wraps Chromium and LibreOffice inside a Docker container and exposes a REST API for converting HTML, Markdown, Word, and other formats to PDF. It is one of the most popular self-hosted PDF tools in the Docker ecosystem.
- Strengths: Clean REST API, supports multiple input formats (HTML, DOCX, ODT), good Docker support, active community, handles merging and converting office documents.
- Weaknesses: Relies on headless Chrome under the hood, so you inherit Chromium's memory overhead (~300MB per instance). Scaling requires managing container orchestration. No built-in template system — you feed it pre-rendered HTML.
- Best for: Teams already running Kubernetes or Docker Swarm who need a general-purpose document converter.
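To make "you feed it pre-rendered HTML" concrete, here is a minimal sketch of a Gotenberg call using only the Python standard library. It assumes a container listening on http://localhost:3000 and the Chromium HTML route `/forms/chromium/convert/html`, which expects a multipart upload with a file named `index.html`; check both against the Gotenberg version you run. The helper names are illustrative, not part of any SDK.

```python
import urllib.request
import uuid


def build_multipart(filename: str, content: bytes, boundary: str) -> bytes:
    """Encode a single HTML file as a multipart/form-data body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        "Content-Type: text/html\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail


def html_to_pdf(html: str, base_url: str = "http://localhost:3000") -> bytes:
    """POST already-rendered HTML to Gotenberg and return the PDF bytes.

    base_url and the route below are assumptions about your deployment.
    """
    boundary = uuid.uuid4().hex
    body = build_multipart("index.html", html.encode("utf-8"), boundary)
    request = urllib.request.Request(
        f"{base_url}/forms/chromium/convert/html",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Templating is not part of this call: whatever produces the HTML string (Jinja2, a frontend build, string formatting) lives in your own code.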
Stirling-PDF — the Swiss army knife
Stirling-PDF is a self-hosted web application for PDF manipulation: merge, split, compress, convert, OCR, add watermarks, and more. It is not a generation tool per se — it's a processing tool.
- Strengths: Impressive breadth of PDF operations. Self-contained Docker image. Good web UI for non-technical users. Active development.
- Weaknesses: Not designed for programmatic document generation from templates. The project is built around its web UI rather than an API-first design, so it is better suited to manual PDF operations than to automated backend pipelines.
- Best for: Internal tools where users need to manipulate existing PDFs. Not a direct comparison for template-based generation.
Carbone.io — template-driven document engine
Carbone is a template engine that takes DOCX, XLSX, or ODT files as templates, injects JSON data, and produces PDF output (via LibreOffice conversion). The community edition is open source; advanced features require the paid cloud version.
- Strengths: Designers create templates in Word or LibreOffice — no coding required. JSON data injection is straightforward. Handles complex repeating sections, conditionals, and formatters.
- Weaknesses: PDF output depends on LibreOffice rendering, which can differ from what designers see in Word. Template debugging is painful when formatting breaks. The open source version lacks some features of the cloud offering.
- Best for: Teams where business users create templates in Word and developers inject data via API.
WeasyPrint — Python CSS-to-PDF engine
WeasyPrint is a Python library that converts HTML and CSS to PDF without a browser. It implements its own CSS rendering engine, so there is no Chromium dependency.
- Strengths: Lightweight compared to Puppeteer/Chromium. Good CSS support (Flexbox, Grid, paged media). Native Python integration. No browser binary required.
- Weaknesses: Python-only. No JavaScript execution, so dynamic content must be pre-rendered. CSS support is good but not complete — some edge cases behave differently from Chrome. Performance degrades on complex documents with many images.
- Best for: Python teams generating documents that don't require JavaScript rendering.
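Because WeasyPrint never runs JavaScript, every value has to be in the HTML before conversion. The sketch below shows one way to do that: fill a template in Python, then hand the finished markup to WeasyPrint. The invoice template and function names are made up for illustration; `HTML(string=...).write_pdf(...)` is WeasyPrint's documented entry point, and the package must be installed separately.

```python
import html
from string import Template

# Hypothetical invoice template; $placeholders are filled in Python, not in a browser.
INVOICE_TEMPLATE = Template(
    "<html><body><h1>Invoice $number</h1>"
    "<p>Customer: $customer</p><p>Total: $total</p></body></html>"
)


def render_invoice_html(number: str, customer: str, total: str) -> str:
    """Pre-render the document, escaping values so they are safe inside HTML."""
    return INVOICE_TEMPLATE.substitute(
        number=html.escape(number),
        customer=html.escape(customer),
        total=html.escape(total),
    )


def write_pdf(html_text: str, path: str) -> None:
    """Convert finished HTML to a PDF file using the third-party weasyprint package."""
    from weasyprint import HTML  # lazy import: the rendering step above works without it

    HTML(string=html_text).write_pdf(path)
```

A call such as `write_pdf(render_invoice_html("INV-7", "Acme & Sons", "120.00 EUR"), "invoice.pdf")` would then produce the file. Anything a browser would normally compute at load time, such as charts, has to be rendered to static markup or images first.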
Puppeteer — headless Chrome
Puppeteer (and Playwright) launch headless Chrome and use page.pdf() to generate PDFs. This gives you pixel-perfect Chrome rendering with full CSS and JavaScript support.
- Strengths: Full browser rendering. If it looks right in Chrome, the PDF will match. JavaScript execution for dynamic charts and content. Mature ecosystem.
- Weaknesses: Heavy resource usage (~300MB RAM per browser instance). Cold start latency of 2–4 seconds. Requires browser pool management in production. Chromium updates can break rendering.
- Best for: Documents that require JavaScript execution or pixel-perfect Chrome CSS rendering.
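"Browser pool management" mostly means capping how many renders run at once, so memory stays within what the host can give. Below is a minimal sketch of that idea with an `asyncio.Semaphore`. The `fake_render` coroutine is a stand-in for a real `page.pdf()` call, and the class and function names are hypothetical.

```python
import asyncio


class RenderLimiter:
    """Cap concurrent renders so total browser memory stays bounded."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)
        self.active = 0
        self.peak = 0  # highest concurrency observed, useful for tuning

    async def run(self, render, *args):
        async with self._slots:  # waits here when all slots are busy
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await render(*args)
            finally:
                self.active -= 1


async def fake_render(doc_id: int) -> str:
    """Stand-in for a real browser render; sleeps briefly instead."""
    await asyncio.sleep(0.01)
    return f"doc-{doc_id}.pdf"


async def render_all(doc_ids, max_concurrent: int):
    """Render every document while never exceeding max_concurrent at once."""
    limiter = RenderLimiter(max_concurrent)
    results = await asyncio.gather(*(limiter.run(fake_render, i) for i in doc_ids))
    return results, limiter.peak
```

With the roughly 300MB per instance quoted above, `max_concurrent` is the number you would derive from the memory you are willing to dedicate to rendering.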
The cloud API landscape
Typsetter — Typst-powered API
Typsetter is a PDF generation API built on the Typst typesetting language. Templates are written in Typst with Tera (Jinja2-like) templating for data injection. No browser, no LibreOffice — the Typst compiler produces PDFs directly.
- Strengths: Fast renders (~340ms average). Built-in template editor with visual preview. Batch generation from CSV. Scheduled generation. No infrastructure to manage. REST API with SDKs.
- Weaknesses: Templates use Typst, not HTML/CSS. Learning curve if your team is CSS-heavy. Data leaves your infrastructure (though encryption in transit and at rest is standard).
PDFMonkey — HTML template cloud API
PDFMonkey lets you design HTML/CSS templates in a web editor and generate PDFs via API. It uses a Chrome-based renderer in the cloud.
- Strengths: Familiar HTML/CSS templates. Web-based editor. Webhook notifications. Good for teams coming from an HTML background.
- Weaknesses: Chrome-based rendering means inheriting browser quirks. Slower than native renderers. Pricing scales with volume.
CraftMyPDF — drag-and-drop builder
CraftMyPDF provides a visual drag-and-drop template builder with a REST API for generation. Designed for non-developers to create templates.
- Strengths: No-code template builder. Good for teams where designers create templates. Integrations with Zapier and Make.
- Weaknesses: Less flexibility than code-based template systems. Complex layouts can be difficult in the visual editor.
DocRaptor — Prince XML in the cloud
DocRaptor is a cloud API powered by the Prince XML engine, which is widely regarded as having the best CSS-to-PDF output quality in the industry.
- Strengths: Excellent CSS paged media support. Best-in-class print CSS rendering. Mature and reliable.
- Weaknesses: Expensive at scale. No visual template editor. Requires HTML/CSS input.
The comparison
| Criteria | Open Source (self-hosted) | Cloud APIs (general) | Typsetter |
|---|---|---|---|
| Cost at 1K PDFs/mo | $10–$50 (server) | $15–$49 (plan) | $0 (free tier) |
| Cost at 10K PDFs/mo | $50–$150 (server + ops) | $49–$199 (plan) | $99 (Pro plan) |
| Cost at 100K PDFs/mo | $200–$500 (cluster) | $500–$2,000+ | Custom pricing |
| Setup time | Hours to days | Minutes | Minutes |
| Ongoing maintenance | You own it all | Zero | Zero |
| Scaling | Manual (containers, LB) | Automatic | Automatic |
| Template system | Varies (HTML, DOCX) | Varies (HTML, drag-drop) | Typst + visual editor |
| Render speed | 500ms–4s (depends on tool) | 500ms–3s | ~340ms avg |
| Data sovereignty | Full control | Third-party servers | EU servers, encrypted |
| Batch processing | Build it yourself | Some support it | Native CSV batch |
The "Cost at 100K PDFs/mo" row is where open source shines. At extreme volume, the per-document cost of a self-hosted solution drops to fractions of a cent, while cloud APIs charge per document. If you're generating hundreds of thousands of PDFs monthly, the infrastructure investment can pay for itself — assuming you have the team to maintain it.
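The "fractions of a cent" claim follows directly from the table. As a quick check, this sketch divides the self-hosted monthly ranges by volume. The figures are copied from the self-hosted column of the table above; the function names are illustrative.

```python
# (low, high) monthly cost in USD, copied from the self-hosted column of the table
SELF_HOSTED = {1_000: (10, 50), 10_000: (50, 150), 100_000: (200, 500)}


def cost_per_pdf(monthly_cost_usd: float, pdfs_per_month: int) -> float:
    """Average cost of one document at a given monthly volume."""
    return monthly_cost_usd / pdfs_per_month


def per_pdf_ranges(costs: dict) -> dict:
    """Map each volume to its (low, high) per-document cost."""
    return {
        volume: (cost_per_pdf(low, volume), cost_per_pdf(high, volume))
        for volume, (low, high) in costs.items()
    }


if __name__ == "__main__":
    for volume, (low, high) in per_pdf_ranges(SELF_HOSTED).items():
        print(f"{volume:>7} PDFs/mo: ${low:.4f} to ${high:.4f} per PDF")
```

On these numbers the per-document cost falls from 1 to 5 cents at 1K PDFs to 0.2 to 0.5 cents at 100K, which is the effect the paragraph describes. Engineering time is not included here.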
The hidden costs of self-hosting
The comparison table above tells only half the story. Self-hosted solutions have real costs that don't appear in the server bill:
DevOps and infrastructure
Running Gotenberg or Puppeteer in production means managing Docker containers, configuring health checks, setting up auto-scaling rules, and maintaining a CI/CD pipeline for your PDF service. For Chromium-based tools, you're managing a browser pool — one of the more error-prone pieces of infrastructure you can own.
Monitoring and alerting
When your PDF service throws a 500 at 2am because Chromium ran out of memory and the OOM killer terminated the process, someone on your team gets paged. You need metrics on render times, memory usage, queue depth, and error rates. This is not optional infrastructure — it's the cost of running a production service.
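As a rough illustration of the minimum you would record, the sketch below tracks render durations and error counts around each render call. It is an in-process stand-in, not a replacement for a real metrics system, and the `RenderMetrics` class is hypothetical.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class RenderMetrics:
    """Collect render durations and failures for later export to a metrics backend."""

    durations_ms: list = field(default_factory=list)
    errors: int = 0

    @contextmanager
    def track(self):
        """Time the wrapped block; count it as an error if it raises."""
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors += 1
            raise  # the caller still sees the failure
        finally:
            self.durations_ms.append((time.perf_counter() - start) * 1000)

    @property
    def error_rate(self) -> float:
        """Fraction of tracked renders that failed."""
        total = len(self.durations_ms)
        return self.errors / total if total else 0.0
```

Wrapping each render in `with metrics.track():` gives you the two numbers most alerts are built on. Memory usage and queue depth need separate instrumentation.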
Security updates
Chromium vulnerabilities are disclosed regularly. When a critical CVE drops for the browser engine your PDF service depends on, you need to patch and redeploy. LibreOffice (used by Carbone and Gotenberg for office format conversion) has its own security surface. These updates cannot be deferred indefinitely.
Scaling under load
A batch job that generates 10,000 invoices at month-end will saturate a single Gotenberg container. You need horizontal scaling, a job queue (Redis, RabbitMQ, SQS), retry logic for transient failures, and dead-letter handling for documents that consistently fail. This is a meaningful engineering project.
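The retry and dead-letter logic is the part teams most often underestimate. Here is a minimal in-memory sketch of it; a real deployment would put the queue in Redis, RabbitMQ, or SQS rather than a `deque`. `TransientError` and `process_jobs` are illustrative names.

```python
from collections import deque


class TransientError(Exception):
    """Raised by a renderer for failures worth retrying (timeouts, restarts)."""


def process_jobs(jobs, render, max_attempts: int = 3):
    """Run every job, retrying transient failures; return (done, dead_letter)."""
    queue = deque((job, 1) for job in jobs)
    done, dead_letter = [], []
    while queue:
        job, attempt = queue.popleft()
        try:
            done.append(render(job))
        except TransientError:
            if attempt < max_attempts:
                queue.append((job, attempt + 1))  # requeue at the back
            else:
                dead_letter.append(job)  # park for manual inspection
    return done, dead_letter
```

Requeueing at the back keeps one failing document from blocking the rest of the batch. A production version would also add backoff between attempts and persist the dead-letter list.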
Font and rendering consistency
Chromium renders fonts differently across operating systems. A PDF rendered on your Mac in development may look different from what your Ubuntu container produces. Font installation, fallback configuration, and rendering consistency testing are ongoing concerns.
We estimate the typical engineering cost of maintaining a self-hosted PDF pipeline at 4–10 hours per month for a healthy setup. At a blended rate of $100/hr, that's $400–$1,000/month in developer time alone — before the server bill. This is not a criticism of self-hosting; it is a cost that should be factored into the decision.
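To see how the estimate above changes the comparison, add the developer time to the server bill. The sketch below does that for the 10K PDFs/month row, treating the table's $50 to $150 figure as the server bill, which is a simplifying assumption. The function name is illustrative.

```python
def self_hosted_monthly_cost(
    server_usd: float, eng_hours: float, hourly_rate_usd: float = 100
) -> float:
    """Server bill plus maintenance time, priced at a blended hourly rate."""
    return server_usd + eng_hours * hourly_rate_usd


if __name__ == "__main__":
    low = self_hosted_monthly_cost(server_usd=50, eng_hours=4)
    high = self_hosted_monthly_cost(server_usd=150, eng_hours=10)
    print(f"Self-hosted at 10K PDFs/mo: ${low:,.0f} to ${high:,.0f}")
```

Under these inputs the self-hosted total lands between $450 and $1,150 per month, against the $49 to $199 cloud plan range listed in the table for the same volume.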
When to choose open source
Self-hosting is the right call in several scenarios. Being honest about this matters, because the wrong choice in either direction wastes money and engineering time.
Data sovereignty and compliance
If your documents contain personally identifiable information (PII), financial data, or health records, and your compliance framework (HIPAA, SOC 2 Type II, specific GDPR interpretations) requires that data never leave your infrastructure, self-hosting is the only option. No cloud API can satisfy a requirement that prohibits third-party data processing.
Air-gapped and offline environments
Government, defense, and certain financial systems operate in environments with no internet access. A self-hosted tool running inside your network is the only viable approach. Gotenberg and WeasyPrint both work well in air-gapped Docker environments.
Extreme volume
At 500K+ PDFs per month, the per-document economics of cloud APIs become difficult to justify. A well-tuned Gotenberg cluster on dedicated hardware can produce PDFs at a fraction of a cent each. The infrastructure investment is significant, but at this scale you likely have the DevOps team to support it.
Custom rendering requirements
If your documents require specific rendering behavior — custom font shaping, specialized page layout algorithms, or deep integration with internal systems — owning the rendering pipeline gives you control that no API can match.
Good fit for self-hosting
- Strict data residency requirements
- Air-gapped environments
- 500K+ PDFs/month with a DevOps team to support the infrastructure
- Need for custom rendering behavior
Bad fit for self-hosting
- Small team without dedicated DevOps
- Variable or unpredictable volume
- Need to ship fast without infrastructure work
- Limited budget for ongoing maintenance
When to choose cloud
Cloud APIs earn their keep when infrastructure work is a distraction from your core product.
Speed to market
Integrating Typsetter takes about an hour: create an account, pick a template, call the API. Integrating Gotenberg takes a day minimum: set up Docker, configure the container, write the HTML rendering layer, build the API wrapper, set up health checks. If you're building an MVP or adding PDF generation as a feature (not your core product), the cloud path is faster by an order of magnitude.
No DevOps team
A startup with three developers does not have bandwidth to manage a Chromium-based PDF service alongside their product. The maintenance burden is real and ongoing. A cloud API converts that variable cost into a predictable monthly bill.
Predictable costs
Cloud pricing is simple: you pay per document or per plan tier. Self-hosting costs are unpredictable — a memory leak, a scaling incident, or a failed Chromium update can consume days of engineering time in a single month.
Automatic scaling
Month-end invoice runs, seasonal spikes, marketing campaigns that trigger document generation — cloud APIs handle these transparently. With self-hosting, you either over-provision (wasting money) or under-provision (risking failures during peaks).
The hybrid approach: open source for dev, cloud for prod
There is a middle path that some teams adopt successfully: use an open source tool during development and staging, then switch to a cloud API for production.
How it works
In local development and CI, you run Gotenberg or WeasyPrint in a Docker container. Templates are developed and tested against the self-hosted renderer. In production, the same data payload is sent to a cloud API like Typsetter instead. A simple environment variable switches the rendering backend.
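The environment-variable switch can be as small as a lookup table. In this sketch the variable name `PDF_BACKEND` and both renderer functions are hypothetical; the stubs return strings where real code would call the local container or the cloud API.

```python
import os


def render_local(payload: dict) -> str:
    """Stub for the self-hosted renderer used in development and CI."""
    return f"local:{payload['template']}"


def render_cloud(payload: dict) -> str:
    """Stub for the managed API used in production."""
    return f"cloud:{payload['template']}"


BACKENDS = {"local": render_local, "cloud": render_cloud}


def get_renderer(env=None):
    """Pick the rendering backend from PDF_BACKEND, defaulting to local."""
    env = os.environ if env is None else env
    name = env.get("PDF_BACKEND", "local")
    try:
        return BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown PDF_BACKEND: {name!r}") from None
```

Calling code stays identical in every environment: `get_renderer()(payload)`. Failing loudly on an unknown value avoids silently rendering production documents with the development backend.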
Trade-offs of the hybrid approach
- Pro: No cloud costs during development. Full local control for rapid iteration. Production reliability of a managed service.
- Pro: Developers can work offline and still generate test PDFs.
- Con: Rendering differences between local and cloud engines. A document that looks perfect in Gotenberg may render slightly differently in Typsetter, because the two use different rendering engines.
- Con: Two rendering paths means two sets of templates if the engines use different input formats. This works when both accept HTML, but not when one uses Typst and the other uses HTML.
The hybrid approach works best when both environments use the same input format. If your cloud API accepts HTML (like PDFMonkey or DocRaptor), pairing it with a local Gotenberg instance is seamless. If your cloud API uses a different template language (like Typsetter's Typst), the hybrid approach is less practical — you'd maintain two template sets.
Decision framework
Use this quick checklist to narrow your choice:
- Is data sovereignty a hard requirement? If yes, self-host. No exceptions.
- Are you generating 100K+ PDFs/month? If yes, evaluate the TCO of self-hosting vs cloud. At extreme volume, self-hosting often wins on cost.
- Do you have a dedicated DevOps team? If no, the maintenance burden of self-hosting will fall on developers who should be building features.
- Is PDF generation core to your product? If yes, owning the infrastructure may be strategic. If it's a supporting feature, delegate it.
- Do you need to ship this week? If yes, cloud. No contest.
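The checklist can also be read as a small decision function. The sketch below is one possible encoding: the order in which the questions are applied and the returned labels are an interpretation, not a rule stated in the checklist itself.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    data_sovereignty_required: bool
    pdfs_per_month: int
    has_devops_team: bool
    pdf_is_core_product: bool
    must_ship_this_week: bool


def recommend(s: Situation) -> str:
    """Apply the checklist questions in a fixed (assumed) priority order."""
    if s.data_sovereignty_required:
        return "self-host"  # hard requirement overrides everything else
    if s.must_ship_this_week or not s.has_devops_team:
        return "cloud"
    if s.pdfs_per_month >= 100_000 or s.pdf_is_core_product:
        return "evaluate self-hosting TCO"
    return "cloud"
```

Writing the priorities down this way tends to surface disagreements early, for example whether a deadline should outrank volume.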
Conclusion
The open source tools in this space are genuinely excellent. Gotenberg is well-engineered. WeasyPrint solves a real problem elegantly. Carbone's template approach is clever. Puppeteer gives you unmatched rendering fidelity. There is no shame in choosing any of them.
But for most teams — especially those without dedicated infrastructure engineers — the total cost of self-hosting exceeds the subscription price of a cloud API. The hidden costs (monitoring, scaling, security patches, debugging rendering inconsistencies at 2am) are real and recurring.
Typsetter exists specifically for teams that want to skip the infrastructure work. Fast renders, a proper template system, batch processing, and scheduled generation — all accessible through a REST API with no Docker containers to manage, no Chromium to babysit, and no scaling surprises at month-end.
Start with the free tier. If self-hosting turns out to be the right call for your use case, you'll know within a week. If it isn't, you'll have a working PDF pipeline in production before lunch.
Try Typsetter for free
100 PDFs/month on the free plan. API key in 30 seconds. No credit card required.