PDF text extraction

Syntax

	$input.pdf2txt()

Description

Extracts the text from a given PDF document. If the PDF is a scanned document, the text is extracted using OCR software; in that case the result may be inaccurate or incorrect.

The function uses external tools that may need to be installed on the application server: pdf2txt for simple text extraction, pdftoppm for image generation in preparation for OCR, and tesseract for the OCR process of scanned documents.
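
The way the three tools fit together can be sketched as follows. This is only an illustration in Python, not the function's actual implementation: the names extract_text and has_usable_text, the 20-character threshold for deciding that OCR is needed, the 300 dpi rendering resolution, and the exact way pdf2txt is invoked (a single file argument, text written to standard output) are all assumptions.

```python
import subprocess
import tempfile
from pathlib import Path


def has_usable_text(text: str, min_chars: int = 20) -> bool:
    """Assumed heuristic: enough non-whitespace characters means a real text layer."""
    return len("".join(text.split())) >= min_chars


def extract_text(pdf_bytes: bytes, run=subprocess.run) -> str:
    """Hypothetical pipeline: direct extraction first, OCR only as a fallback."""
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "input.pdf"
        pdf.write_bytes(pdf_bytes)

        # Step 1: simple text extraction (invocation of pdf2txt is an assumption).
        direct = run(["pdf2txt", str(pdf)],
                     capture_output=True, text=True, check=True).stdout
        if has_usable_text(direct):
            return direct

        # Step 2: render each page to a PNG image named page-<n>.png.
        prefix = str(Path(tmp) / "page")
        run(["pdftoppm", "-png", "-r", "300", str(pdf), prefix], check=True)

        # Step 3: OCR every page image; "stdout" makes tesseract print the text.
        pages = []
        for image in sorted(Path(tmp).glob("page-*.png")):
            ocr = run(["tesseract", str(image), "stdout"],
                      capture_output=True, text=True, check=True)
            pages.append(ocr.stdout)
        return "\n".join(pages)
```

In this sketch the OCR branch is only reached when direct extraction yields almost nothing, which matches the description above: documents with a text layer never pass through tesseract, so only scanned documents are exposed to recognition errors. The run parameter exists so that the external commands can be replaced in tests.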

Parameters

Name    Type                         Description                     Mandatory   Default
input   Binary data in PDF format    The document to be converted.   yes         -

Return value

Type: String

The text extracted from the PDF.