Product

Fusion 5.8

Apache Tika Parser

This stage is deprecated.

Apache Tika is a versatile parser that supports many unstructured document formats, including HTML, PDF, Microsoft Office documents, OpenOffice, RTF, audio, video, and images. A complete list of supported formats is available at http://tika.apache.org/.

To extract text from images (OCR) when Include images is enabled, Tesseract should be installed on the server hosting Fusion.
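One way to confirm the Tesseract prerequisite is to check whether its executable is on the PATH of the host. The following is a minimal sketch, not part of Fusion: it assumes the executable is named `tesseract` (the default for standard Tesseract installs) and that the check runs as the same user and on the same host as Fusion. The function name `tesseract_available` is made up for this example.

```python
import shutil


def tesseract_available(binary_name: str = "tesseract") -> bool:
    """Return True if an executable with this name is found on PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(binary_name) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found on PATH")
    else:
        print("Tesseract not found on PATH; image text extraction may not work")
```

A PATH lookup only shows that the binary exists for the current user; it does not verify the installed version or language data.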

When entering configuration values in the UI, type special characters without extra escaping, such as \t for the tab character. When sending configuration values through the API, escape the backslash, such as \\t for the tab character.
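The difference comes from JSON string escaping: a backslash typed in the UI has to be written as two backslashes inside a JSON request body. The sketch below illustrates this with Python's standard `json` module. The field name `exampleField` is hypothetical and does not correspond to an actual parser property.

```python
import json

# The two characters typed into a UI field: a backslash followed by "t".
ui_value = r"\t"

# Hypothetical configuration fragment; "exampleField" is an illustrative name only.
config = {"exampleField": ui_value}

# Serializing to JSON escapes the backslash, producing \\t in the request body.
body = json.dumps(config)
print(body)  # {"exampleField": "\\t"}

# Parsing the body recovers the same two-character value entered in the UI.
assert json.loads(body)["exampleField"] == ui_value
```

Building the request body with a JSON library, rather than by hand, applies this escaping automatically.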
