Apache Tika Parser Stage
Apache Tika is a versatile parser that supports many unstructured document formats, including HTML, PDF, Microsoft Office documents, OpenOffice documents, RTF, audio, video, and images. A complete list of supported formats is available at http://tika.apache.org/.
To extract text from images when the Include images option is enabled, Tesseract must be installed on the server that hosts Fusion.
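Before you enable Include images, you can confirm that Tesseract is reachable on the Fusion host. The following is a minimal sketch that assumes Tika locates the `tesseract` binary on the system PATH, which is its usual default unless configured otherwise. It also assumes the Fusion service runs with the same PATH as the shell where you run the script. The helper names `find_tesseract` and `tesseract_version` are hypothetical and are not part of Fusion or Tika. The script only calls the real `tesseract --version` command.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable found on PATH, or None.

    Assumption: the Fusion service sees the same PATH as this process.
    """
    return shutil.which("tesseract")


def tesseract_version(path: str) -> Optional[str]:
    """Run `tesseract --version` and return the first line of its output, or None on failure."""
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing or non-executable binary, or the command hung.
        return None
    # Some Tesseract releases print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract not found on PATH; image text extraction will not work")
    else:
        print(f"found {exe}: {tesseract_version(exe) or 'version unknown'}")
```

Run the script as the same operating-system user that runs Fusion. A service account may have a different PATH than your interactive shell.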