Why your database is your document engine
Every invoice you send is a SELECT query away from being a PDF. The customer name, the line items, the totals, the due date — it all lives in your database already. The missing piece is the transformation layer: something that takes structured data and produces a typeset document.
Most teams solve this with HTML templates rendered through a headless browser. That works, but it introduces heavy infrastructure (Chromium), slow render times (2–4 seconds), and a maintenance burden that grows with scale. A purpose-built PDF API eliminates that middleware entirely.
The pattern is straightforward: query your database, shape the result into a JSON payload, POST it to an API, get a PDF binary back. No browser. No server-side rendering. No LaTeX installation.
Architecture overview
The data flow for database-to-PDF generation has four stages:
Your application queries the database, maps the rows to the variable schema your template expects, sends the JSON to Typsetter, and receives a PDF in the response body. The template itself — layout, fonts, colors, page structure — is managed in the Typsetter dashboard. Your code only sends data.
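As a rough illustration, stages three and four can be sketched with nothing but the Python standard library. This is not Typsetter's documented API: the endpoint URL, the `template_id` and `variables` payload keys, and the bearer-token header are all assumptions made for the example, so check the API reference for the real names. The request is built here but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from the Typsetter API docs.
RENDER_URL = "https://api.typsetter.example/v1/render"


def build_render_request(template_id, variables, api_key):
    """Stage 3: wrap template variables in a JSON POST request.

    The payload keys and the Authorization scheme are assumptions.
    """
    body = json.dumps({"template_id": template_id, "variables": variables})
    return urllib.request.Request(
        RENDER_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def render_pdf(request, timeout=30):
    """Stage 4: send the request and return the PDF bytes from the response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Stages 1 and 2 (query, then map rows to variables) are represented here
# by a hand-written dict standing in for a database row.
row = {"customer_name": "Acme GmbH", "total": "1250.00", "due_date": "2025-07-31"}
request = build_render_request("invoice-template", row, api_key="YOUR_API_KEY")
```

Calling `render_pdf(request)` would perform the actual network call and return the binary, which you can then write to disk or upload.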
Example 1: Node.js + PostgreSQL — monthly invoices
This example queries a PostgreSQL database for all customers with unbilled orders, generates an invoice PDF for each one, and writes the files to disk. In production you would upload to S3 or attach to an email — the PDF generation step is identical.
The key insight: your SQL query does the heavy lifting of joining tables and aggregating line items. The API call is a simple POST with the query result mapped to template variables. No HTML rendering, no CSS debugging, no browser process management.
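The mapping step is the same in any language. Here is a minimal sketch in Python of how joined rows (one row per line item) might be grouped into one payload per customer. The column names and the payload shape are hypothetical; your template's variable schema decides the real ones.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter


def rows_to_invoices(rows):
    """Group joined (customer, line item) rows into one payload per customer.

    Assumes rows are dicts sorted by customer_id, as an ORDER BY would return.
    """
    invoices = []
    for customer_id, group in groupby(rows, key=itemgetter("customer_id")):
        items = list(group)
        line_items = [
            {
                "description": r["description"],
                "quantity": r["quantity"],
                "unit_price": str(r["unit_price"]),
                "amount": str(r["quantity"] * r["unit_price"]),
            }
            for r in items
        ]
        total = sum((r["quantity"] * r["unit_price"] for r in items), Decimal("0"))
        invoices.append({
            "customer_id": customer_id,
            "customer_name": items[0]["customer_name"],
            "line_items": line_items,
            "total": str(total),
        })
    return invoices


rows = [
    {"customer_id": 1, "customer_name": "Acme", "description": "Widget",
     "quantity": 2, "unit_price": Decimal("9.50")},
    {"customer_id": 1, "customer_name": "Acme", "description": "Setup fee",
     "quantity": 1, "unit_price": Decimal("40.00")},
    {"customer_id": 2, "customer_name": "Globex", "description": "Widget",
     "quantity": 5, "unit_price": Decimal("9.50")},
]
invoices = rows_to_invoices(rows)
```

Using `Decimal` rather than floats keeps currency arithmetic exact, and converting to strings at the end keeps the payload JSON-serializable.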
Example 2: Python + MySQL — employee pay slips
This example connects to a MySQL database, queries employee payroll data for the current month, and generates a pay slip PDF for each employee. Python's requests library keeps the API call minimal.
The pattern is identical regardless of language or database: query rows, map to template variables, call the API. The Typsetter template handles all formatting — columns, alignment, headers, footers, page breaks — so your application code stays focused on data.
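For the pay slip case, the mapping step might look like the sketch below. The column names and template variables are assumptions for illustration. The function takes a plain dict, which is what mysql-connector returns per row when the cursor is created with `cursor(dictionary=True)`.

```python
from datetime import date
from decimal import Decimal


def payslip_payload(row, period):
    """Map one payroll row (a dict) to hypothetical pay slip template variables."""
    gross = Decimal(row["gross_pay"])
    deductions = Decimal(row["tax"]) + Decimal(row["pension"])
    return {
        "employee_name": f'{row["first_name"]} {row["last_name"]}',
        "employee_id": row["employee_id"],
        "period": period.strftime("%B %Y"),
        "gross_pay": f"{gross:.2f}",
        "deductions": f"{deductions:.2f}",
        "net_pay": f"{gross - deductions:.2f}",
    }


# A hand-written row standing in for one result of the payroll query.
row = {"employee_id": 17, "first_name": "Ada", "last_name": "Lovelace",
       "gross_pay": "4200.00", "tax": "840.00", "pension": "210.00"}
payload = payslip_payload(row, date(2025, 6, 1))
```

Formatting amounts in application code is a design choice made here for clarity; if your template formats numbers itself, send the raw values instead.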
Batch approach: generate hundreds of PDFs at once
When you need to generate documents for an entire table — monthly invoices for all customers, quarterly statements for all accounts — looping through individual API calls works but is not optimal. Typsetter supports batch generation natively.
Option A: CSV batch upload
Export your query result as a CSV file and upload it to the Typsetter batch endpoint. Each row becomes one PDF. The response is a ZIP file containing all generated documents.
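The export half of this option needs only the standard library. The sketch below assumes the CSV column headers should match the template's variable names, which is a guess about the batch endpoint's expectations; the upload call itself is left out because its URL and parameters are not covered here.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize query rows (dicts) to CSV text, one row per document."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops columns the template does not use.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"customer_name": "Acme, Inc.", "total": "59.00", "internal_note": "vip"},
    {"customer_name": "Globex", "total": "47.50", "internal_note": ""},
]
csv_text = rows_to_csv(rows, fieldnames=["customer_name", "total"])
```

The `csv` module handles quoting for you, so a value such as `Acme, Inc.` stays in one column.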
Option B: parallel API calls with concurrency control
If you need more control over the process — custom error handling per record, progress tracking, conditional logic — use parallel API calls with a concurrency limiter.
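One way to get a concurrency limit in Python is a thread pool with a fixed number of workers. In this sketch, `render` is any callable that takes a record and returns PDF bytes; `fake_render` is a stand-in so the example runs offline, and the record fields are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_all(records, render, max_workers=10, on_progress=None):
    """Render every record with at most max_workers calls in flight.

    Returns (results, failures), both keyed by record ID.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(render, rec): rec["id"] for rec in records}
        for done, future in enumerate(as_completed(futures), start=1):
            record_id = futures[future]
            try:
                results[record_id] = future.result()
            except Exception as exc:  # one bad record must not stop the batch
                failures[record_id] = exc
            if on_progress:
                on_progress(done, len(futures))
    return results, failures


def fake_render(record):
    """Stand-in for the real API call, so the sketch runs offline."""
    if record.get("customer_name") is None:
        raise ValueError("customer_name is required")
    return b"%PDF-1.7 fake"


records = [{"id": 1, "customer_name": "Acme"}, {"id": 2, "customer_name": None}]
results, failures = render_all(records, fake_render, max_workers=2)
```

Because failures are collected by record ID instead of raised, you can log them and re-run only those records afterwards.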
For batches over 1,000 documents, use the CSV batch endpoint. It processes documents server-side in parallel and returns a single ZIP — no need to manage concurrency or handle individual failures in your code.
Scheduling: automate recurring document generation
Monthly invoices, weekly reports, quarterly statements — these are recurring tasks. You have two options for scheduling.
Option 1: Typsetter schedules (zero infrastructure)
Typsetter has a built-in schedules feature. You configure a schedule in the dashboard, point it at a data source (webhook URL that returns JSON, or a static data payload), and Typsetter generates the PDFs on the schedule you define. The generated documents are delivered via webhook, email, or stored in your Typsetter account.
This is the simplest approach — no cron job, no server, no Lambda function. You expose one endpoint that returns the current data, and Typsetter handles the rest.
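The endpoint you expose can be very small. Below is a minimal WSGI sketch; the `/invoice-data` path and the `{"documents": [...]}` response shape are assumptions for illustration, since the JSON structure a schedule expects is defined by your Typsetter configuration.

```python
import json
from datetime import date


def current_invoice_data():
    """Placeholder for the real database query."""
    return [{"customer_name": "Acme", "total": "59.00",
             "issued_on": date(2025, 7, 1).isoformat()}]


def app(environ, start_response):
    """Minimal WSGI app: /invoice-data returns the current data as JSON."""
    if environ.get("PATH_INFO") != "/invoice-data":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = json.dumps({"documents": current_invoice_data()}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, serve `app` with `wsgiref.simple_server.make_server` from the standard library. A real endpoint should also require authentication, since it returns customer data.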
Option 2: cron job or task scheduler
If you need full control, run the generation script on a schedule using your platform's native scheduler, such as cron on Linux or Task Scheduler on Windows.
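With cron, a monthly job usually runs on the first of the month and bills the month that just ended. The sketch below shows a sample crontab entry in a comment (the script path is hypothetical) and a helper the script could use to work out the billing period.

```python
from datetime import date, timedelta

# Example crontab entry (the script path is hypothetical):
#   0 6 1 * *  /usr/bin/python3 /opt/billing/generate_invoices.py
# It runs at 06:00 on the first day of every month.


def previous_month(today):
    """Return (first_day, last_day) of the month before `today`."""
    last_day = today.replace(day=1) - timedelta(days=1)
    return last_day.replace(day=1), last_day


start, end = previous_month(date(2025, 3, 1))
```

Deriving the period from the run date, rather than hard-coding it, also makes it safe to re-run the job a few days late.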
Storing generated PDFs
Once you have the PDF binary, you need to put it somewhere. The right choice depends on your architecture.
Upload to S3 (or any object storage)
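A minimal sketch of the upload step, assuming boto3 as the S3 client. The key layout and bucket name are hypothetical. The client is passed in as an argument, so the function itself does not import boto3 and can be tested with a stub.

```python
def invoice_key(customer_id, year, month):
    """Build a predictable object key, e.g. invoices/2025/07/customer-42.pdf."""
    return f"invoices/{year:04d}/{month:02d}/customer-{customer_id}.pdf"


def upload_pdf(s3_client, bucket, key, pdf_bytes):
    """Upload PDF bytes with the client's put_object and return the s3:// URI."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=pdf_bytes,
        ContentType="application/pdf",
    )
    return f"s3://{bucket}/{key}"


# Real usage (requires boto3 and AWS credentials):
#   import boto3
#   uri = upload_pdf(boto3.client("s3"), "my-invoice-bucket",
#                    invoice_key(42, 2025, 7), pdf_bytes)
```

Setting `ContentType` means browsers display the file as a PDF instead of downloading it as an unknown binary.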
Save URL reference in the database
A common pattern is to store the S3 URL (or file path) back in the database, linked to the record that generated it. This lets you serve the PDF on demand without regenerating it.
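Here is a sketch of that write-back, using an in-memory SQLite database as a stand-in for your production database. The `invoices` table and its `pdf_url` and `pdf_generated_at` columns are assumptions for illustration.

```python
import sqlite3


def save_pdf_url(conn, invoice_id, pdf_url):
    """Record where the generated PDF lives, next to the invoice it belongs to."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE invoices SET pdf_url = ?, pdf_generated_at = CURRENT_TIMESTAMP "
            "WHERE id = ?",
            (pdf_url, invoice_id),
        )
    return cursor.rowcount == 1


# In-memory SQLite stands in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, pdf_url TEXT, pdf_generated_at TEXT)"
)
conn.execute("INSERT INTO invoices (id) VALUES (42)")
updated = save_pdf_url(conn, 42, "s3://my-invoice-bucket/invoices/2025/07/customer-42.pdf")
```

With psycopg2 or mysql-connector the statement is the same, except the placeholder is `%s` instead of `?`.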
Other storage options
- Local filesystem: Simple for small-scale or development. Not recommended for production — no redundancy, no CDN, no access control.
- Google Cloud Storage / Azure Blob: Same pattern as S3 with their respective SDKs.
- Email attachment: Generate the PDF and attach it directly to an outbound email using Nodemailer, SendGrid, or Postmark.
- Database BLOB column: Technically possible but generally discouraged. Databases are not optimized for serving binary files at scale.
Error handling and retries
When generating PDFs from database data at scale, things will go wrong: missing fields, malformed data, network timeouts, rate limits. Robust error handling makes the difference between a script that works in development and one that works in production.
Always validate your data before sending it to the API. A missing required field produces a clearer error from your validation layer than from the API response. Log every failure with the record ID so you can re-run just the failed records.
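The two ideas above, validating first and logging failures by record ID, can be sketched as follows. `render` is any callable that performs the API call. Retrying only on `TimeoutError` and `ConnectionError` is an assumption made for this example; in practice you would also retry on whatever your HTTP client raises for rate-limit responses.

```python
import logging
import time

logger = logging.getLogger("pdf_generation")


def missing_fields(payload, required):
    """Return the required fields that are absent or empty in the payload."""
    return [name for name in required if payload.get(name) in (None, "")]


def render_with_retry(render, record_id, payload, attempts=3, base_delay=1.0,
                      retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call render(payload), retrying transient errors with exponential backoff.

    Returns PDF bytes, or None after the final failure, which is logged
    with the record ID so the record can be re-run later.
    """
    for attempt in range(1, attempts + 1):
        try:
            return render(payload)
        except retry_on as exc:
            if attempt == attempts:
                logger.error("record %s failed after %d attempts: %s",
                             record_id, attempts, exc)
                return None
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

Run `missing_fields` before calling the API and skip any record with a non-empty result; errors outside `retry_on` propagate immediately, because retrying bad data only wastes requests.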
Performance tips
When generating thousands of PDFs from database queries, performance bottlenecks are more likely in your database and network layer than in the PDF API itself. Here are the optimizations that matter most.
1. Use connection pooling
Opening a new database connection per query is expensive, so use a connection pool (for example, the Pool class in Node.js pg or the pooling module in Python's mysql-connector). Size the pool to the number of workers that query the database at the same time: if 10 parallel workers each run their own queries, you need at least 10 pool connections.
2. Paginate large result sets
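Do not load 100,000 customer rows into memory at once. Process them in chunks of 100–500 rows instead. LIMIT and OFFSET work, but OFFSET gets slower as it grows because the database still has to scan the skipped rows. On large tables, prefer cursor-based pagination with WHERE id > last_id.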
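A minimal sketch of cursor-based pagination, again with in-memory SQLite standing in for the real database. The `customers` table is hypothetical, and the starting value of 0 assumes IDs are positive integers.

```python
import sqlite3


def iter_chunks(conn, chunk_size=500):
    """Yield lists of customer rows, using cursor-based (keyset) pagination."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM customers WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last row we saw


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in range(1, 1202)])
chunk_sizes = [len(chunk) for chunk in iter_chunks(conn, chunk_size=500)]
```

Each query uses the primary key index to jump straight to the next chunk, so the cost per chunk stays flat no matter how far into the table you are.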
3. Run API calls in parallel
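Typsetter's API handles concurrent requests well. Assuming roughly 340 ms per call, N sequential calls take about N × 340 ms; with 10 calls running in parallel, the same work takes about N/10 × 340 ms. Running 10–20 calls in parallel is a reasonable range. The concurrency limiter pattern described in the batch section prevents overwhelming either the API or your own server.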
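To make the arithmetic concrete, here is a back-of-the-envelope estimate. It treats the 340 ms figure as a fixed per-call latency and ignores database and upload time, so it is a rough model, not a benchmark.

```python
import math


def estimated_seconds(documents, concurrency, latency_ms=340):
    """Rough wall-clock estimate: waves of `concurrency` calls, one latency each."""
    waves = math.ceil(documents / concurrency)
    return waves * latency_ms / 1000


sequential = estimated_seconds(5000, concurrency=1)   # 1700.0 s, about 28 minutes
parallel = estimated_seconds(5000, concurrency=10)    # 170.0 s, just under 3 minutes
```

For 5,000 documents, that is the difference between a coffee break and most of a half-hour meeting.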
4. Cache templates
If you are using multiple templates, the Typsetter API caches compiled templates server-side. The first render of a template in a session may take slightly longer; subsequent renders are faster. Group your generation by template to take advantage of this.
5. Keep payloads lean
Only send the data your template actually uses. Sending the entire customer object with 50 fields when the template only uses 8 adds unnecessary serialization and transfer time. Map your query results to the exact shape the template expects.
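Trimming a record down to the template's fields is a one-liner, but it helps to make it fail loudly when a field is missing. The eight field names below are hypothetical; use the ones your template declares.

```python
# Hypothetical list of the variables the invoice template uses.
TEMPLATE_FIELDS = ("customer_name", "billing_address", "invoice_number",
                   "issue_date", "due_date", "line_items", "subtotal", "total")


def lean_payload(record, fields=TEMPLATE_FIELDS):
    """Keep only the keys the template uses; raise if one is missing."""
    missing = [f for f in fields if f not in record]
    if missing:
        raise KeyError(f"record is missing template fields: {missing}")
    return {f: record[f] for f in fields}


record = {name: "..." for name in TEMPLATE_FIELDS}
record.update({"internal_notes": "do not print", "crm_score": 87})
payload = lean_payload(record)
```

Keeping the field list in one place also documents the contract between your query and the template.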
With 10 parallel requests and cursor-based pagination, a test run generating 5,000 invoices from a PostgreSQL database completed in 2 minutes and 48 seconds — an effective rate of ~30 PDFs per second including database query time, API calls, and S3 uploads.
Start generating PDFs from your database today
The code in this guide is production-ready. The pattern is the same whether you are generating 10 invoices or 100,000 statements: query your database, shape the data, call the API. Typsetter handles the typesetting, page layout, fonts, and PDF compilation — your code handles the data.
The free tier gives you 100 PDFs per month to build and test your integration. No credit card, no approval process — create an account, get an API key, and start rendering.