How to translate a scanned PDF file

Translating a scanned PDF document into another language while preserving the original layout

Scanned PDFs are images of text, not actual text, which is why most translators, including Google Translate, either reject them, return an empty result, or show a "can't translate this file" error. To translate a scanned PDF, you need OCR (optical character recognition, which extracts the text from the page images) before translation. DocTranslating runs OCR automatically as part of the translation pipeline, supports 100+ languages, and rebuilds the translated text into a copy of the original PDF. For accuracy on important documents, verify the OCR output on PDFEquips first so extraction errors don't compound with translation errors.

Updated June 5, 2026 · 8 min read

If you've ever uploaded a scanned PDF to a free translator and gotten back an empty file, a "can't translate this file" error, or a translated copy with all the text missing — you're not doing anything wrong. Most online translators, including the free document upload in Google Translate, don't run OCR on scanned content. This guide explains why that happens, what you actually need to translate a scanned PDF, and how to do it without losing the original layout.

Why scanned PDFs won't translate normally

A normal PDF — one exported from Word, an editor, or a browser — has a hidden text layer that translators read directly. A scanned PDF doesn't. When you scan a document, your scanner or phone camera captures a picture of each page. The result looks like text, but to a computer it's just an image — there's nothing extractable underneath. That's why selecting text in a scanned PDF usually doesn't work either: there are no characters to select, only pixels.

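Most translation tools assume the text layer is already there. When they don't find it, they fail in confusing ways. Common symptoms include an empty output file, a "can't translate this file" error, or a translated copy with all the text missing.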

What you actually need: OCR + translation

Translating a scanned PDF is a two-step process under the hood, even when a single tool handles both:

  1. OCR reads each page image and extracts the recognisable text — words, numbers, and basic layout.
  2. Translation takes that extracted text, translates it, and writes it back into a copy of the document.

DocTranslating runs both steps automatically when you upload a scanned PDF — you don't need to OCR it yourself first. The catch worth understanding upfront: translation quality can only ever be as good as the OCR that feeds it. A blurry scan produces blurry OCR, and blurry OCR plus translation compounds the errors. The result can look fluent and still be subtly wrong, so important documents are worth verifying before relying on them.
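To see how quickly small OCR errors add up, here is a rough back-of-the-envelope sketch in Python. It assumes every word is recognised correctly with the same probability and that errors are independent, which real scans will not match exactly. The function name and the 20-word sentence length are illustrative choices, not anything DocTranslating exposes.

```python
def clean_sentence_rate(word_accuracy, words_per_sentence):
    """Share of sentences in which every word was recognised correctly.

    Assumes independent, equally likely word errors (a simplification).
    """
    return word_accuracy ** words_per_sentence


for accuracy in (0.99, 0.95, 0.90):
    rate = clean_sentence_rate(accuracy, 20)
    print(f"{accuracy:.0%} word accuracy -> {rate:.0%} of 20-word sentences fully correct")

# 99% word accuracy -> 82% of 20-word sentences fully correct
# 95% word accuracy -> 36% of 20-word sentences fully correct
# 90% word accuracy -> 12% of 20-word sentences fully correct
```

Under these assumptions, even a scan that looks "mostly right" hands the translator a damaged sentence more often than not, which is why checking the OCR output first pays off on important documents.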

Step-by-step: translate a scanned PDF

  1. Open DocTranslating and upload your scanned PDF

    Drag the file onto the upload area, or click to browse. The tool detects the file is a PDF; you don't need to do anything special to flag it as scanned — OCR runs automatically when needed.

  2. Set your source and target languages

    Pick the language the document is written in and the language you want to translate it into. For scanned PDFs, set the source language explicitly rather than relying on auto-detect — auto-detection is less reliable on OCR'd text than on clean text.

  3. Choose the Gemini engine

    For scanned PDFs, Gemini is the strongest choice. It's LLM-based, so it uses surrounding context to infer meaning when OCR produces partially garbled words, while sentence-level engines like DeepL pass garbled words through unchanged. You can also write custom instructions to keep terminology consistent across the document.

  4. Translate, then review the result carefully

    Start the translation, download the file when it's ready, and compare it page-by-page with the original. Pay special attention to numbers, dates, proper nouns, addresses and anything legally important — these are where OCR errors typically hide because they don't have surrounding context the translator can use to self-correct.
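One way to make the number check less tedious is to compare the digits found in the original text with those in the translation. The sketch below is a hypothetical helper, not a DocTranslating feature. It assumes you have both texts as plain strings (for example, copied out of the two PDFs), and it ignores separators so that 1.250,00 and 1,250.00 count as the same figure. Dates written out in words will still show up as differences, so treat the output as a list of places to look, not a verdict.

```python
import re
from collections import Counter

# A run of digits, optionally joined by common separators (1.250,00 or 12/03/2024).
NUMBER = re.compile(r"\d+(?:[.,/:-]\d+)*")


def number_tokens(text):
    """Count the numbers in text, reduced to their digits only."""
    return Counter(re.sub(r"\D", "", token) for token in NUMBER.findall(text))


def number_mismatches(source_text, translated_text):
    """Return (missing, unexpected) digit strings between source and translation."""
    source = number_tokens(source_text)
    translated = number_tokens(translated_text)
    missing = sorted((source - translated).elements())
    unexpected = sorted((translated - source).elements())
    return missing, unexpected


source = "Invoice 4711, due 12.03.2024, total 1.250,00 EUR"
translated = "Facture 4711, échéance 12.03.2024, total 1.280,00 EUR"
missing, unexpected = number_mismatches(source, translated)
print(missing)     # ['125000']
print(unexpected)  # ['128000']
```

Here the total changed from 1.250,00 to 1.280,00 somewhere along the way, which is exactly the kind of single-character slip that reads fluently and is easy to miss by eye.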

Which translation engine is best for scanned PDFs?

All DocTranslating engines that accept PDFs run OCR on scanned content, but they handle imperfect OCR output very differently. No OCR is 100% accurate — the real question is how the translator copes when it sees a partially garbled word.

| Engine | Behaviour on OCR output | When to use it |
| --- | --- | --- |
| Gemini | LLM-based; uses context to infer meaning when OCR is imperfect | Default choice for any scanned PDF |
| DeepL | Sentence-level translation; garbled words come out garbled | Clean, high-quality scans only |
| Google Cloud | Robust to noise, but adds a small watermark to translated PDFs | Widest language coverage; files under 10 MB |
| Microsoft Azure | Doesn't accept PDFs at all | Convert the PDF to Word first (see below) |

Table: Translation engines on scanned PDFs

Improving OCR before you translate

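OCR quality depends almost entirely on the input. A clean, properly rotated scan at decent resolution produces near-perfect OCR; a faded, skewed, low-resolution scan produces unreliable OCR no matter which tool runs it. Before you upload, check that every page is upright and straight, that the text is dark enough to read, and that the resolution is high enough for small print to stay legible.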

Edge cases and current limitations

Handwritten documents

OCR for printed text is mature and reliable. OCR for handwritten text is much harder, and results are inconsistent across the whole industry — not just one tool. If your scanned PDF is handwritten, expect significant manual cleanup, and for anything legally sensitive prefer manual transcription over machine OCR.

Large or long scans

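The Gemini engine caps each file at 25 pages and 100 MB. Longer or larger scans need a workaround: compress an oversized file with the PDF compressor on PDFEquips, and split a long one into chunks of 25 pages or fewer, translate each piece, then merge the results back into a single document.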

Scanned PDFs in right-to-left languages

If you're translating a scanned PDF written in Arabic, Hebrew or Persian, there is a current limitation worth knowing: the PDF text-extraction layer can return right-to-left (RTL) content in visual draw order rather than logical reading order, which means even OCR'd words can come out scrambled before translation starts. RTL Word and PowerPoint files work fine, and translating into an RTL language works fine; only RTL PDF sources are affected. If you have access to the original editable file, translate that instead. If you don't, note that a fix is being worked on but the problem isn't solved yet.
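The difference between visual and logical order is easier to see with a tiny example. The Python sketch below is a simplified illustration, not how DocTranslating or any PDF library works internally: it treats a line made up only of right-to-left letters and spaces, where draw order is simply the reading order reversed. Mixed-direction text follows the full Unicode Bidirectional Algorithm and is more involved. The helper names are hypothetical.

```python
import unicodedata


def is_rtl_char(ch):
    """True for characters with a strong right-to-left bidi class."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def rtl_share(text):
    """Fraction of the letters in text that are right-to-left."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_rtl_char(ch) for ch in letters) / len(letters)


# Two Hebrew words ("shalom olam") stored in logical (reading) order.
logical = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"

# What an extractor that follows left-to-right draw order might return
# for this pure-RTL line: the same characters, reversed.
visual = logical[::-1]

print(visual == logical)        # False
print(visual[::-1] == logical)  # True
print(rtl_share(logical))       # 1.0
```

A translator that receives the `visual` string sees words spelled backwards in the wrong sequence, so no amount of translation quality can recover the meaning. A check like `rtl_share` could be used to flag a source file that is likely to be affected before you spend time translating it.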

Frequently asked questions

Why can't Google Translate translate my scanned PDF?

Google Translate's document upload reads the existing text layer of a PDF — it doesn't OCR image-based pages. Because a scanned PDF has no text layer, there's nothing to read, so Google Translate either returns an empty file or a "can't translate this file" message. The fix is to use a translator that includes OCR, or to OCR the PDF separately first and then upload the searchable copy.

How can I tell if my PDF is scanned or has a real text layer?

Open the PDF and try to select a sentence with your cursor. If text highlights and you can copy it, the PDF has a real text layer and any translator should handle it. If nothing happens — or you can only select the whole page as one image — it's scanned and needs OCR before translation.
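If you have many files to check, the same test can be automated once you have the text that a PDF library extracts from each page. The Python sketch below covers only the decision step and is a hypothetical helper: `page_texts` is assumed to be one string per page from whatever extractor you use, and the 20-character threshold is an arbitrary starting point, not a standard.

```python
def classify_pages(page_texts, min_chars=20):
    """Label each page 'text' or 'scanned' based on its extracted text.

    page_texts: one string per page from a PDF text extractor (not shown).
    min_chars: assumed cut-off below which a page counts as image-only.
    """
    labels = []
    for text in page_texts:
        visible = "".join(text.split())  # ignore spaces and line breaks
        labels.append("text" if len(visible) >= min_chars else "scanned")
    return labels


def needs_ocr(page_texts, min_chars=20):
    """True if at least one page appears to have no usable text layer."""
    return "scanned" in classify_pages(page_texts, min_chars)


pages = ["Invoice 2024-118\nTotal due: 1,250.00 EUR", "", "  \n "]
print(classify_pages(pages))  # ['text', 'scanned', 'scanned']
print(needs_ocr(pages))       # True
```

Checking page by page matters because mixed files are common: a digital cover letter followed by scanned attachments would pass a whole-file check and still come back with pages missing.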

Can I translate a scanned PDF for free?

Most free translators, including the document upload in Google Translate, don't run OCR on scanned PDFs, so they'll return an empty result or an error. Free tools that do include OCR usually have low size limits and limited language coverage. DocTranslating runs OCR automatically and supports 100+ languages with usage-based pricing, so you pay for what you translate rather than a recurring subscription.

Which translation engine is best for scanned PDFs?

Gemini is the strongest choice in DocTranslating. As an LLM-based engine it uses surrounding context to interpret meaning even when OCR introduces small errors, while sentence-level engines like DeepL pass garbled words through unchanged. Google Cloud is also robust on scans but adds a small watermark to translated PDFs.

Can I translate a handwritten scanned document?

OCR on handwriting is much less reliable than OCR on printed text — this is true across the whole industry, not just one tool. For anything legally sensitive or requiring high accuracy, manual transcription before translation is the safer route. For casual handwritten notes, OCR plus translation may produce a workable draft you can clean up afterward.

What if my scanned PDF is larger than the file size limit?

Compress the PDF using the PDF compressor on PDFEquips — it can typically halve a scan's size without visible quality loss. If the PDF is also long, split it into chunks of 25 pages or fewer with PDFEquips' splitter, translate each piece, and merge them back into a single document.
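Working out the split points by hand is error-prone on long documents. The short Python sketch below computes the page ranges for a given page count, using the 25-page limit mentioned above as the default. It only does the arithmetic; the actual splitting and merging still happen in your PDF tool, and the function name is illustrative.

```python
def page_chunks(total_pages, max_pages=25):
    """Return 1-based, inclusive (first, last) page ranges of at most max_pages."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


print(page_chunks(60))  # [(1, 25), (26, 50), (51, 60)]
```

Keeping the ranges in order and translating every chunk with the same language pair and engine settings makes the merged result more consistent.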

Will the translated PDF keep the original layout?

Yes — DocTranslating rebuilds the translated text into a copy of the original document, preserving paragraphs, tables, headings and images. For scanned PDFs specifically, layout fidelity depends on how clearly the original was structured: simple documents come out almost identical; densely-formatted scans may show some drift.

How do I check the OCR is accurate before committing to translate?

Run OCR separately first using the OCR tool on PDFEquips. It produces a searchable PDF where you can copy out the recognised text and read through it. If any names, dates or critical phrases came out wrong, fix them at the source before sending the file to translation — errors at the OCR stage compound with translation errors and are much easier to catch early.

I'm translating from a scanned Arabic PDF — does it work?

Translating into Arabic works correctly. Translating from a scanned Arabic (or Hebrew, Persian) PDF currently has a limitation: the PDF text-extraction layer can return right-to-left text in visual order rather than logical reading order, so the words can come out scrambled. RTL Word and PowerPoint files are fine; it's RTL PDF sources specifically that are affected, and this is a known limitation being worked on.

Is the translated scanned PDF editable?

The output is a copy of the input format, so a scanned PDF input gives you a translated PDF. If you want an editable file at the end, convert the original scanned PDF to Word first using PDFEquips' PDF-to-Word converter (it runs OCR as part of the conversion), then translate the .docx — you'll get an editable Word document back instead of a PDF.
